Revenue trend analysis - Brand Identity - Economics

Published: 27 January 2025 | Version 1 | DOI: 10.17632/zd3vj49c3t.1
Contributor:
Sunil Maria Benedict

Description

Dataset Description: Revenue Trend Analysis (2020-2023)

Overview
This dataset contains annual revenue figures for three reporting periods (2020, 2021 and 2022-2023), expressed in crores of Indian Rupees (₹), together with a Python script that analyzes them. The script fits a linear regression to identify the revenue trend and visualizes the results with a line plot and a correlation heatmap.

Data Structure
The dataset consists of two columns:
Year: the fiscal year, stored as a string. The most recent period is given as the range "2022-2023".
Revenue (₹ Crores): the revenue for that year, stored as a formatted string that includes the currency symbol and thousands separators (for example, "₹ 1,104.30").

Data Preparation
Cleaning revenue data: the currency symbol (₹) and commas are removed, and the values are converted to floats for numerical analysis.
Handling the year format: "2022-2023" is converted to 2023 (the part after the hyphen) so that every year is a single value. As a result, the series has no 2022 entry, leaving a one-year gap between 2021 and 2023.
Growth rate calculation: a new column, "Growth Rate (%)", holds the percentage change in revenue relative to the previous row, computed with the pandas pct_change() method. The first year has no predecessor and is recorded as 0, and the final value compares 2021 with 2022-2023 rather than two adjacent calendar years.

Statistical Analysis
Linear regression model: an ordinary least-squares linear regression is fitted with year as the feature and revenue as the target. Its predictions form the trend line, which can be compared directly with the actual revenue figures.

Visualizations
Revenue trend plot: a line plot shows actual revenue for each year (blue points joined by a solid line) alongside the fitted trend line (dashed red line), with labeled axes and a grid for readability.
Correlation heatmap: the Pearson correlation matrix of year, revenue and growth rate is shown as a color-coded heatmap, with each cell annotated with its coefficient to two decimal places. With only three observations, these coefficients are very sensitive to individual values and should be interpreted with caution.

Conclusion
This dataset offers a small, reproducible example of revenue trend analysis: it cleans formatted financial figures, fits a linear trend, and visualizes revenue and growth over the periods from 2020 to 2022-2023.
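The cleaning, growth-rate and trend calculations described above can be checked by hand. The following is a minimal sketch, not part of the original script, that repeats them with plain pandas and numpy.polyfit on the three published revenue figures. It assumes, as the reproduction script does, that "2022-2023" is mapped to 2023. A degree-1 polyfit returns the same ordinary least-squares line as a single-feature scikit-learn LinearRegression with an intercept.

```python
import numpy as np
import pandas as pd

# The three published revenue figures (₹ crores), as formatted strings
raw = pd.Series(["₹ 1,104.30", "₹ 1,122.57", "₹ 1,786.60"])
# Assumption (as in the reproduction script): "2022-2023" is treated as 2023
years = np.array([2020, 2021, 2023])

# Cleaning: drop the currency symbol and thousands separators, then parse as float
revenue = (
    raw.str.replace("₹", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
    .astype(float)
)

# Growth rate: change relative to the previous row, in percent (first value is NaN)
growth = revenue.pct_change() * 100

# Least-squares trend line: a degree-1 polynomial fit returns [slope, intercept]
slope, intercept = np.polyfit(years, revenue.to_numpy(), 1)
trend = slope * years + intercept

print(growth.round(2).tolist())
print(f"slope = {slope:.2f} crores per year")
```

On these figures the row-to-row growth rates are about 1.65 % and 59.15 %, and the fitted slope is about 242.37 ₹ crores per year. Because the x-values are 2020, 2021 and 2023, the regression accounts for the missing year, whereas pct_change() only compares consecutive rows.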

Steps to reproduce

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

# Data preparation: revenue in ₹ crores, stored as formatted strings
data = {
    "Year": ["2020", "2021", "2022-2023"],
    "Revenue (₹ Crores)": ["₹ 1,104.30", "₹ 1,122.57", "₹ 1,786.60"],
}
df = pd.DataFrame(data)

# Clean the revenue data: remove the currency symbol, commas and spaces, then convert to float
df['Revenue (₹ Crores)'] = (
    df['Revenue (₹ Crores)']
    .str.replace("₹", "", regex=False)
    .str.replace(",", "", regex=False)
    .str.strip()
    .astype(float)
)

# Handle the non-standard year format: "2022-2023" becomes 2023 (the last part after "-").
# Non-breaking spaces are stripped first in case the data was copied from a web page.
def handle_year_format(year_str):
    year_str = year_str.replace("\xa0", "").strip()
    if "-" in year_str:
        return int(year_str.split("-")[-1])
    return int(year_str)

df['Year'] = df['Year'].apply(handle_year_format)

# Check data types: Year should now be an integer type and Revenue a float
print(df.dtypes)

# Linear regression model. Year is kept as an integer so that it can serve as a
# numeric feature here and as a numeric column in the correlation matrix below.
X = df[['Year']].to_numpy()  # Feature (Year)
y = df['Revenue (₹ Crores)'].to_numpy()  # Target (Revenue)
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Plot the actual revenue together with the least-squares trend line
plt.figure(figsize=(10, 6))
plt.plot(df['Year'].to_numpy(), y, marker='o', linestyle='-', color='b', label='Actual Revenue')
plt.plot(df['Year'].to_numpy(), y_pred, linestyle='--', color='r', label='Trend Line (Optimal)')
plt.xticks(df['Year'].to_numpy())  # label whole years only
plt.title('Revenue Trend with Optimal Line (2020-2023)', fontsize=14)
plt.xlabel('Year', fontsize=12)
plt.ylabel('Revenue (₹ Crores)', fontsize=12)
plt.legend()
plt.grid(True)
plt.show()

# Growth rate: percentage change between consecutive rows.
# The first row has no predecessor; fillna(0) records it as 0 %, and that value
# also enters the correlation matrix below.
df['Growth Rate (%)'] = df['Revenue (₹ Crores)'].pct_change().fillna(0) * 100

# Compute the (Pearson) correlation matrix and plot it as an annotated heatmap
corr_matrix = df[['Year', 'Revenue (₹ Crores)', 'Growth Rate (%)']].corr()
plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm", cbar=True)
plt.title('Correlation Heatmap of Metrics', fontsize=14)
plt.show()
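Because "2022-2023" is mapped to 2023 and there is no 2022 row, pct_change() reports the step from 2021 to 2022-2023 as if it covered a single year. One option, sketched below as an addition rather than part of the original analysis, is to annualize each step by the number of years between rows: (R_t / R_prev) ** (1 / gap) - 1. The column name "Annualized Growth (%)" is hypothetical, and the input values are the cleaned figures the script above produces.

```python
import pandas as pd

# Cleaned values as produced by the reproduction script (Year already numeric)
df = pd.DataFrame({
    "Year": [2020, 2021, 2023],
    "Revenue (₹ Crores)": [1104.30, 1122.57, 1786.60],
})

revenue = df["Revenue (₹ Crores)"]
year_gap = df["Year"].diff()  # years between consecutive rows: NaN, 1, 2

# Compound (annualized) growth per step: (R_t / R_prev) ** (1 / gap) - 1
ratio = revenue / revenue.shift(1)
df["Annualized Growth (%)"] = (ratio ** (1 / year_gap) - 1) * 100  # hypothetical column

print(df)
```

For the 2021 to 2022-2023 step this gives roughly 26.16 % per year, compared with the 59.15 % single-step figure from pct_change(); for consecutive years (2020 to 2021) the two measures coincide.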

Institutions

Independent

Categories

Economics, Brand Management, Brand Personality, Linear Function

Licence