Unemployment data - India with Polynomial analysis
Description
Data Structure: The data is organized as a dictionary with two keys: "State/UT" and "Unemployment Rate (2022-23)". Each state or union territory is listed alongside its unemployment rate for 2022-23. The list also includes an "All India" entry (3.2%).

Unemployment Rates: The unemployment rates vary significantly across states and union territories, with some, like Goa, showing a high rate of 9.7%, while others, like Assam, have a notably low rate of 1.7%. The mean rate across all entries is also calculated, providing a national perspective on employment challenges.

Future Projections: The dataset includes a projection of the unemployment rates for 2026, assuming a constant annual increase of 2% from the 2022-23 rates, compounded over three years. This projection allows for an analysis of potential future trends in unemployment across states.

Statistical Analysis: The data is converted into a Pandas DataFrame, facilitating further statistical analysis and visualization. A linear regression model is fitted to the mean rates for 2023 and 2026 to establish a trend line for how unemployment rates may evolve over time. A polynomial regression model (degree 2) is also applied to the same two points with the aim of capturing non-linear trends.

Visualization: The results are visualized using Matplotlib, showing both the actual unemployment rates for 2022-23 and the projected rates for 2026. Scatter plots highlight the disparities between states, while trend lines illustrate the overall direction of unemployment rates over the period.

Contextual Relevance: This analysis is relevant to understanding regional economic conditions and labor market dynamics in India. It provides insights that can inform policymakers and stakeholders about potential areas of concern for employment and economic planning.
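The projection rule described above can be sketched in a few lines. This is a minimal illustration, assuming the 2% increase is compounded over three years (2023 to 2026), which gives a growth factor of 1.02^3 ≈ 1.0612. The helper name `project_rate` is hypothetical and is not part of the dataset's own script; the two sample rates are taken from the data.

```python
def project_rate(rate, annual_growth=0.02, years=3):
    """Project a rate forward by compounding a fixed annual increase.

    `project_rate` is a hypothetical helper used only for illustration.
    """
    return rate * (1 + annual_growth) ** years


# Two rates from the 2022-23 data: Goa (9.7%) and Assam (1.7%)
goa_2026 = project_rate(9.7)
assam_2026 = project_rate(1.7)

print(f"Goa 2026 projection:   {goa_2026:.2f}%")
print(f"Assam 2026 projection: {assam_2026:.2f}%")
```

Because every rate is multiplied by the same factor, the projection preserves the ranking of states and widens the absolute gaps between them by about 6%.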
Steps to reproduce
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Given data for 2022-23 unemployment rates
data = {
    "State/UT": [
        "Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
        "Delhi", "Goa", "Gujarat", "Haryana", "Himachal Pradesh", "Jharkhand",
        "Karnataka", "Kerala", "Madhya Pradesh", "Maharashtra", "Manipur",
        "Meghalaya", "Mizoram", "Nagaland", "Odisha", "Punjab", "Rajasthan",
        "Sikkim", "Tamil Nadu", "Telangana", "Tripura", "Uttarakhand",
        "Uttar Pradesh", "West Bengal", "Andaman & N. Island", "Chandigarh",
        "Dadra & Nagar Haveli & Daman & Diu", "Jammu & Kashmir", "Ladakh",
        "Lakshadweep", "Puducherry", "All India"
    ],
    "Unemployment Rate (2022-23)": [
        4.1, 4.8, 1.7, 3.9, 2.4, 1.9, 9.7, 1.7, 6.1, 4.3, 1.7, 2.4, 7.0, 1.6,
        3.1, 4.7, 6.0, 2.2, 4.3, 3.9, 6.1, 4.4, 2.2, 4.3, 4.4, 1.4, 4.5, 2.4,
        2.2, 9.7, 4.0, 2.5, 4.4, 6.1, 11.1, 5.6, 3.2
    ]
}

# Convert data to DataFrame
df = pd.DataFrame(data)

# Estimate the unemployment rate for 2026, assuming a 2% annual increase
# compounded over three years (2023 to 2026)
annual_growth_rate = 0.02  # 2% increase per year
df["Unemployment Rate (2026)"] = (
    df["Unemployment Rate (2022-23)"] * ((1 + annual_growth_rate) ** 3)
)

# Extract X (independent variable: years) and y (dependent variable: mean rates).
# Note: the mean is taken over all rows, including the "All India" entry.
X = np.array([2023, 2026]).reshape(-1, 1)
y = np.array([
    df["Unemployment Rate (2022-23)"].mean(),
    df["Unemployment Rate (2026)"].mean()
])

# Fit a linear regression model for the trend line.
# With only two points, the fitted line passes exactly through both.
linear_regressor = LinearRegression()
linear_regressor.fit(X, y)
y_pred = linear_regressor.predict(X)

# Fit a polynomial regression (degree 2).
# Note: two points do not uniquely determine a degree-2 polynomial, so the
# curvature of this fit is not constrained by the data.
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
poly_regressor = LinearRegression()
poly_regressor.fit(X_poly, y)

# Generate predictions for a smoother trend line
X_range = np.linspace(2023, 2026, 100).reshape(-1, 1)
X_poly_range = poly.transform(X_range)
y_poly_pred = poly_regressor.predict(X_poly_range)

# Plot actual and predicted values
plt.figure(figsize=(10, 6))

# Scatter plot of actual and projected data
plt.scatter([2023] * len(df), df["Unemployment Rate (2022-23)"],
            color='blue', label="2022-23 Unemployment Rates")
plt.scatter([2026] * len(df), df["Unemployment Rate (2026)"],
            color='red', label="Predicted 2026 Unemployment Rates")

# Trend lines
plt.plot(X, y_pred, color='green', linestyle='--', label="Optimal Linear Trend")
plt.plot(X_range, y_poly_pred, color='purple', linestyle='-',
         label="Polynomial Trend Line")

plt.xlabel("Year")
plt.ylabel("Unemployment Rate (%)")
plt.title("Unemployment Rate Trend: 2022-23 to 2026")
plt.legend()
plt.grid(True)
plt.show()
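The regression step fits a trend to only two points (the mean rate at 2023 and at 2026), so the linear fit is simply the straight line through them. The sketch below checks this with NumPy's `np.polyfit`. It is an illustration under stated assumptions, not the script's own method: the starting mean of 4.0 is a hypothetical placeholder rather than a value computed from the dataset, and the years are shifted to start at 0 for numerical stability.

```python
import numpy as np

# Hypothetical placeholder for the 2022-23 mean rate (not computed from the data)
mean_2023 = 4.0
years = np.array([2023.0, 2026.0])
mean_rates = np.array([mean_2023, mean_2023 * 1.02 ** 3])

# Shift years so the fit works with x = 0 and x = 3
x = years - 2023.0

# Degree-1 least-squares fit through the two points
slope, intercept = np.polyfit(x, mean_rates, deg=1)

# Slope of the straight line through two points, computed directly
expected_slope = (mean_rates[1] - mean_rates[0]) / (x[1] - x[0])

# Residuals of the fitted line at the two input points
residuals = mean_rates - (slope * x + intercept)

print(f"fitted slope:   {slope:.6f} percentage points per year")
print(f"expected slope: {expected_slope:.6f} percentage points per year")
print("residuals:", residuals)
```

A degree-2 polynomial has three coefficients, so two points cannot determine it uniquely. A meaningful comparison between linear and polynomial trends would need observed rates for at least three distinct years.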