From Expansion to Elimination, DATA

Name: From Expansion to Elimination, DATA
Creator: Devan Wiley
Published: 2025-10-07T08:22:20.704Z
Keywords: Energy Economics, Political Issues of Energy Policy, Temporal Variability, Clean Energy Investment, Bayesian Hierarchical Modeling, Tax Credit, Residential Energy Efficiency

Wiley, Devan

doi:10.17632/5v54mctvxs.1

Published: 7 October 2025 | Version 1 | DOI: 10.17632/5v54mctvxs.1

Contributor:

Devan Wiley

Description

This project performs a Bayesian hierarchical analysis of the factors that influence energy cost burden across ZIP codes and years. Using panel data combined from multiple Excel files spanning 2012-2022, it models the relationship between energy cost burden and predictors including tax_returns, uptake (presumably related to program participation or energy efficiency measures), and percent_white. The core of the analysis involves:

- Data loading and preprocessing: combining data from multiple years, handling missing values, and standardizing the predictor variables.
- Hierarchical modeling: building a Bayesian hierarchical model in PyMC that captures variation across both ZIP codes and years through random effects.
- Inference: estimating the posterior distributions of the model parameters with both variational inference (ADVI) and Markov chain Monte Carlo (MCMC), specifically the No-U-Turn Sampler (NUTS).
- Diagnostics and comparison: checking MCMC convergence (R-hat, effective sample size (ESS), divergences) and comparing the ADVI and NUTS results to judge how reliable each inference method is for this model and dataset.
- Exploratory analysis: summary statistics, correlation analysis, and time trends of key variables.

The project highlights the importance of robust MCMC methods such as NUTS for complex models, especially when a simpler approximation such as ADVI can yield conflicting conclusions, and it includes steps to improve sampler performance and assess convergence.
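The data loading and preprocessing step can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the project's notebook code: only tax_returns, uptake, and percent_white come from the description, while the column names zip and energy_cost_burden and the helpers combine_years and preprocess are hypothetical. The sketch stacks one table per year, drops incomplete rows, z-scores the predictors, and adds integer ZIP and year indices of the kind a PyMC model can use to look up its random effects. The toy values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Predictor names come from the project description; the ZIP, year and outcome
# column names are HYPOTHETICAL placeholders for whatever the Excel files use.
PREDICTORS = ["tax_returns", "uptake", "percent_white"]
OUTCOME = "energy_cost_burden"  # hypothetical column name
ZIP_COL = "zip"                 # hypothetical; ZIPs kept as strings to preserve leading zeros


def combine_years(frames_by_year: dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Stack one DataFrame per year into a long panel with an added 'year' column."""
    parts = [df.assign(year=year) for year, df in frames_by_year.items()]
    return pd.concat(parts, ignore_index=True)


def preprocess(panel: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows, z-score the predictors, and add integer group indices."""
    required = [ZIP_COL, "year", OUTCOME, *PREDICTORS]
    out = panel.dropna(subset=required).reset_index(drop=True)
    for col in PREDICTORS:
        # Standardize to mean 0 and (population) standard deviation 1.
        out[f"{col}_z"] = (out[col] - out[col].mean()) / out[col].std(ddof=0)
    # Integer codes 0..K-1, used to index ZIP-level and year-level random effects.
    out["zip_idx"] = pd.factorize(out[ZIP_COL], sort=True)[0]
    out["year_idx"] = pd.factorize(out["year"], sort=True)[0]
    return out


# Toy, made-up values for illustration only. With real data, each year's table
# might come from pd.read_excel("/content/Correct2016_Diagnostics.xlsx"); the
# file-to-year mapping is an assumption, not something this sketch knows.
frames = {
    2013: pd.DataFrame({
        "zip": ["37201", "37203", "37205"],
        "energy_cost_burden": [0.05, 0.08, 0.03],
        "tax_returns": [1200, 800, 1500],
        "uptake": [0.10, np.nan, 0.20],
        "percent_white": [0.60, 0.40, 0.80],
    }),
    2016: pd.DataFrame({
        "zip": ["37201", "37203"],
        "energy_cost_burden": [0.04, 0.07],
        "tax_returns": [1250, 820],
        "uptake": [0.15, 0.05],
        "percent_white": [0.60, 0.45],
    }),
}
clean = preprocess(combine_years(frames))
print(clean[["zip", "year", "zip_idx", "year_idx", "uptake_z"]])
```

Sorting before factorizing keeps the index assignment deterministic, so the same ZIP code or year always maps to the same random-effect slot across reruns.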

Files

Steps to reproduce

To reproduce this analysis in Google Colab:

1. Upload data: upload the following Excel files to the /content/ directory:
   - Correct 2013_Diagnostics.xlsx
   - Correct2011_Diagnostics.xlsx
   - Correct2016_Diagnostics.xlsx
   - Correct2019_Diagnostics.xlsx
2. Run the notebook: execute all code cells of the Google Colab notebook in order.
3. Verify the environment: ensure the Colab runtime uses Python 3.12.11 and that the specified library versions (PyMC 5.25.1, ArviZ 0.22.0, NumPy 2.0.2, pandas 2.2.2) are installed; the first code cell handles installation.
4. Check random seeds: confirm that the random seeds (random_state=42 or random_seed=42) are kept in the relevant data preprocessing and sampling steps.
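The environment check can be scripted so that version drift is caught before any sampling runs. The following is a minimal sketch using only the standard library's importlib.metadata; the helper names installed_version, version_mismatches, and python_version are hypothetical, and the pinned versions are the ones listed in the steps above. It reports every pinned package whose installed version differs from its pin, with None meaning the package is not installed.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Versions pinned in "Steps to reproduce" (helper names below are hypothetical).
PINNED_PYTHON = "3.12.11"
PINNED = {"pymc": "5.25.1", "arviz": "0.22.0", "numpy": "2.0.2", "pandas": "2.2.2"}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    pinned: dict[str, str],
    get_version: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches: dict[str, tuple[str, Optional[str]]] = {}
    for dist, expected in pinned.items():
        found = get_version(dist)
        if found != expected:
            mismatches[dist] = (expected, found)
    return mismatches


def python_version() -> str:
    """Return the running interpreter's version as 'major.minor.micro'."""
    return ".".join(str(part) for part in sys.version_info[:3])


if __name__ == "__main__":
    print(f"Python {python_version()} (pinned {PINNED_PYTHON})")
    for dist, (expected, found) in version_mismatches(PINNED).items():
        print(f"{dist}: expected {expected}, found {found}")
```

The comparison is an exact string match by design: any difference from the pinned version, including a patch release, is reported. The get_version parameter makes the check easy to exercise without touching the real environment.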

Institutions

Vanderbilt University
