ENDID_IV

Name: ENDID_IV
Creator: Sandy Dall'Erba
Published: 2026-01-26T15:46:10.066Z
Keywords: Economics, Econometrics, Agricultural Economics, Spatial Econometrics

Dall'Erba, Sandy; Chagas, Andre LS; Ridley, William; Xu, Yilan; Yuan, Lilin

doi:10.17632/dpzjjnh6hf.1

Published: 26 January 2026 | Version 1 | DOI: 10.17632/dpzjjnh6hf.1

Contributors: Sandy Dall'Erba, Andre LS Chagas, William Ridley, Yilan Xu, Lilin Yuan

Description

This repository reproduces the paper’s main empirical results from “Difference-in-differences with endogenous externalities: Model and application to climate econometrics”, Papers in Regional Science, 104 (2025) 100125. The code implements the paper’s strategy by combining (i) a PPML first stage to predict bilateral flows and construct a country–year exposure instrument, and (ii) an IV second stage to estimate causal effects on country–year outcomes.

First stage (PPML). We estimate a Poisson model with high-dimensional fixed effects (exporter–importer and year) using fixest::fepois. From the estimated coefficients, we compute predicted bilateral flows and build WD_row_hat, an instrument that measures country i’s exposure in year t to shocks occurring in node j (e.g., drought). Exposure is weighted by shares derived from predicted bilateral flows, using either row-standardization (by exporter-year) or a year-global normalization, depending on the option selected.

Second stage (IV). We estimate country–year panel models for outcomes such as ln_production (and optionally ln_area, ln_yield) as functions of observed exposure WD_row_obs, treated as endogenous and instrumented by WD_row_hat. Specifications include fixed effects and climate/structural controls (temperature, precipitation, irrigation, and drought indicators), estimated with fixest::feols using its IV formula syntax.

Inference. Standard errors are obtained via a two-level bootstrap: (1) draw PPML coefficient vectors from a multivariate normal approximation based on the first-stage variance–covariance matrix; (2) resample country clusters (or a user-defined cluster key) with replacement and re-estimate the second-stage and IV models for each replication. This yields bootstrap standard errors that incorporate both first-stage uncertainty and within-cluster dependence. Parallelization uses foreach/doParallel, and reproducibility is ensured via doRNG with a fixed seed.

Inputs are: (a) a bilateral panel (isoi, isoj, year) for the PPML stage and (b) a country–year panel (iso, year) for the IV stage. Running the replication script produces point estimates and a coefficient table with bootstrap standard errors and key diagnostics (e.g., first-stage F-statistics when available).
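
The exposure weighting described above can be illustrated on a toy example. The sketch below is not the repository's code: it uses base R, made-up flows, a hypothetical shock column D, and a hypothetical name WD_glob_hat for the year-global variant. It also assumes that "year-global normalization" means dividing each predicted flow by the total of all predicted flows in that year, which is one plausible reading of the description.

```r
# Toy predicted bilateral flows for a single year (values are made up).
flows <- data.frame(
  isoi = c("A", "A", "B", "B", "C", "C"),
  isoj = c("B", "C", "A", "C", "A", "B"),
  year = 2000,
  flow_hat = c(30, 10, 20, 20, 5, 15)
)
# Hypothetical shock indicator at partner j (1 = drought in that year).
shock <- data.frame(isoj = c("A", "B", "C"), year = 2000, D = c(0, 1, 1))

d <- merge(flows, shock, by = c("isoj", "year"))

# Row-standardization: weights sum to 1 within each exporter-year.
w_row <- d$flow_hat / ave(d$flow_hat, d$isoi, d$year, FUN = sum)
# Year-global normalization (assumed reading): weights sum to 1 within each year.
w_glob <- d$flow_hat / ave(d$flow_hat, d$year, FUN = sum)

d$WD_row_hat <- w_row * d$D
d$WD_glob_hat <- w_glob * d$D   # hypothetical name, not from the repository

# Sum the weighted partner shocks to get one exposure value per (isoi, year).
exposure <- aggregate(cbind(WD_row_hat, WD_glob_hat) ~ isoi + year,
                      data = d, FUN = sum)
exposure
```

In this toy setup country A trades only with drought-hit partners, so its row-standardized exposure is 1, while under the year-global weights the same exposure is scaled by A's share of total flows.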

Steps to reproduce

This project estimates:

First stage (PPML): bilateral trade/flows with fixed effects using fixest::fepois, generating predicted bilateral flows and a country-year exposure instrument (e.g., WD_row_hat).
Second stage (IV): country-year outcomes (e.g., ln_production) on endogenous exposure WD_row_obs, instrumented by WD_row_hat via fixest::feols IV syntax.
Inference: a two-level bootstrap that (i) draws PPML coefficients from a multivariate normal approximation and (ii) resamples country clusters, then re-estimates stages 2 and 3.

A machine-readable summary of inputs/parameters is in replication_config.csv.

Software

R (>= 4.1 recommended)
Packages: data.table, fixest, MASS, foreach, doParallel, doRNG
Install: install.packages(c("data.table","fixest","MASS","foreach","doParallel","doRNG"))

Folder structure

<project_root>/
  data_article/
    dt_bilateral_wheat.rds
    df_country_year_wheat.rds
  output/
  your_script.R

output/ is created automatically if missing.

Input data

Bilateral data (data_article/dt_bilateral_wheat.rds): panel at (exporter i, importer j, year t) for PPML. Minimum fields:
  IDs/FE: isoi, isoj, year, iso_pair
  PPML clusters: cl_iy, cl_jy
  Dependent variable: value
  Shock: speij_d_crop07 (or shock_var)
  All regressors in rhs_ppml

Country-year data (data_article/df_country_year_wheat.rds): one row per (iso, year) for IV. Minimum fields:
  Keys: iso, year
  FE: isoid, year (or chosen FE)
  Endogenous regressor: WD_row_obs
  Outcomes: ln_production (optionally ln_area, ln_yield)
  Controls in rhs_stage2 (e.g., temp, temp2, precip, precip2, irrigation2, spei_d_crop07)

Run

From project root: source("your_script.R")
Main output example: summary(resrob$ln_production$stage3$coeftest)

Key objects

First stage (mrob): mrob$model, mrob$dt_for_W, mrob$beta_hat, mrob$vcv, mrob$W_hat (instrument at iso-year).
Second stage (resrob), per outcome: stage3$model_point, stage3$se_boot, stage3$coeftest.
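
The two fixest calls named in this section can be sketched on simulated data. This is a minimal illustration rather than the replication script: every number is simulated, the control set is cut down to temp and precip, and WD_row_hat is a random stand-in instead of an instrument built from predicted flows. Column names follow the minimum fields listed above; m_ppml, m_iv, bil, cy and conf_u are hypothetical names.

```r
library(fixest)

set.seed(1)

# --- Simulated bilateral panel (hypothetical values) ---
iso <- paste0("C", 1:12)
bil <- expand.grid(isoi = iso, isoj = iso, year = 2001:2010,
                   stringsAsFactors = FALSE)
bil <- bil[bil$isoi != bil$isoj, ]
bil$iso_pair <- paste(bil$isoi, bil$isoj, sep = "_")
bil$cl_iy <- paste(bil$isoi, bil$year, sep = "_")
bil$speij_d_crop07 <- rbinom(nrow(bil), 1, 0.3)   # shock indicator
pair_fe <- rnorm(length(unique(bil$iso_pair)), sd = 0.5)
names(pair_fe) <- unique(bil$iso_pair)
bil$value <- rpois(nrow(bil),
                   exp(2 + pair_fe[bil$iso_pair] - 0.4 * bil$speij_d_crop07))

# First stage: PPML with exporter-importer and year fixed effects.
m_ppml <- fepois(value ~ speij_d_crop07 | iso_pair + year,
                 data = bil, cluster = ~cl_iy)
bil$flow_hat <- predict(m_ppml, newdata = bil)    # predicted flows (response scale)

# --- Simulated country-year panel (hypothetical values) ---
cy <- expand.grid(iso = iso, year = 2001:2010, stringsAsFactors = FALSE)
cy$isoid <- as.integer(factor(cy$iso))
cy$WD_row_hat <- runif(nrow(cy))                  # stand-in for the real instrument
conf_u <- rnorm(nrow(cy))                         # unobserved confounder
cy$WD_row_obs <- 0.7 * cy$WD_row_hat + 0.3 * conf_u + rnorm(nrow(cy), sd = 0.2)
cy$temp <- rnorm(nrow(cy), 15, 3)
cy$precip <- rnorm(nrow(cy), 80, 20)
cy$ln_production <- -0.5 * cy$WD_row_obs + 0.02 * cy$temp + conf_u +
  rnorm(nrow(cy), sd = 0.3)

# IV stage: outcome ~ controls | fixed effects | endogenous ~ instrument.
m_iv <- feols(ln_production ~ temp + precip | isoid + year |
                WD_row_obs ~ WD_row_hat,
              data = cy, cluster = ~iso)
coef(m_iv)["fit_WD_row_obs"]   # coefficient on the instrumented exposure
fitstat(m_iv, "ivf")           # first-stage F-statistic
```

The three-part formula is what the text calls the feols IV syntax: fixest reports the instrumented regressor with a fit_ prefix, which is why the coefficient is looked up as fit_WD_row_obs.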

Customization

Edit: dep_vars, rhs_stage2, rhs_ppml, and bootstrap settings (B, n_cores, chunk_size, seed).

Reproducibility & troubleshooting

Bootstrap uses doRNG, so results are reproducible given identical data/software (minor floating-point differences may occur).
If bootstrap SEs are NA, too many draws likely failed (collinearity / weak within-cluster variation). Increase B, simplify the model, or ensure cluster_id has many clusters; also verify all cluster columns exist in df_iv.
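
The logic of the two-level bootstrap, and why failed draws matter for the reported standard errors, can be sketched serially in a few lines. This is a simplified stand-in and not the repository's implementation: it replaces the fixest models with a just-identified IV slope without controls or fixed effects, runs in a plain loop instead of foreach/doParallel/doRNG, and uses simulated data. The names beta_hat and vcv mirror the first-stage objects listed above; z1, z2, iv_slope and the handling of failed draws via na.rm are assumptions made for illustration.

```r
library(MASS)  # mvrnorm(); MASS is among the packages listed above

set.seed(2026)

# --- Simulated country-year panel (all numbers are made up) ---
n_iso <- 30; n_year <- 10
df <- expand.grid(iso = seq_len(n_iso), year = seq_len(n_year))
n <- nrow(df)
df$z1 <- rnorm(n)                      # two stand-in first-stage regressors
df$z2 <- rnorm(n)
u <- rnorm(n)                          # unobserved confounder
df$x <- 0.6 * df$z1 + 0.3 * df$z2 + u + rnorm(n)   # endogenous exposure
df$y <- 1.0 * df$x + 2 * u + rnorm(n)              # outcome; true effect is 1

# Pretend first-stage output: coefficient vector and its variance-covariance matrix.
beta_hat <- c(0.6, 0.3)
vcv <- diag(c(0.01, 0.01))

# Just-identified IV slope (no controls or fixed effects, to keep the sketch short).
iv_slope <- function(d) cov(d$z, d$y) / cov(d$z, d$x)

B <- 200
draws <- rep(NA_real_, B)
isos <- unique(df$iso)
for (b in seq_len(B)) {
  # Level 1: draw first-stage coefficients and rebuild the instrument.
  beta_b <- mvrnorm(1, mu = beta_hat, Sigma = vcv)
  df$z <- beta_b[1] * df$z1 + beta_b[2] * df$z2
  # Level 2: resample whole countries (clusters) with replacement.
  ids <- sample(isos, length(isos), replace = TRUE)
  boot <- do.call(rbind, lapply(ids, function(i) df[df$iso == i, ]))
  # A failed replication stays NA instead of stopping the loop.
  draws[b] <- tryCatch(iv_slope(boot), error = function(e) NA_real_)
}

se_boot <- sd(draws, na.rm = TRUE)     # bootstrap standard error
n_failed <- sum(is.na(draws))          # many failures would make se_boot unreliable
```

Counting n_failed alongside se_boot is a cheap diagnostic: if most replications fail, the surviving draws are too few to support a standard error, which matches the troubleshooting advice above.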

Institutions

Universidade de Sao Paulo, University of Illinois at Urbana-Champaign

Funders

National Council for Scientific and Technological Development
Ministry of Science, Technology and Innovation
Brazil
