data and code

Published: 1 October 2025 | Version 1 | DOI: 10.17632/s9zdjnrrhv.1

Description

This dataset provides the data and code for the paper "Central Bank Green Communication and Pollution Premium: Evidence from China." The paper's central hypothesis is that green communication by a central bank creates a "pollution premium": investors demand higher expected returns as compensation for the risk of holding stocks of high-polluting firms.
The sample covers A-share listed companies in China from Q1 2007 to Q4 2023, with firm-level financial, governance, and market data sourced from the CSMAR and RESSET databases. The core explanatory variable, the Central Bank Green Communication (DGC) index, was constructed through text analysis of the China Monetary Policy Implementation Reports issued by the People's Bank of China. A specialized Chinese green finance dictionary was first built with a machine learning algorithm (Word2Vec); the central bank's green focus was then quantified by calculating the frequency of green-related terms in the reports.
The results show that stronger central bank green communication significantly increases the expected stock returns of high-polluting firms and lowers their valuation (e.g., price-to-earnings ratio). The effect is transmitted through three primary channels: higher risk exposure of polluting firms, tighter lending restrictions from commercial banks, and a shift in investor preferences toward sustainability.
The folder contains the main panel dataset for baseline regressions (data.dta), as well as separate datasets for the event study (eventsd.dta), Granger causality tests (granger.dta), and portfolio analysis (portfolio.dta). For a detailed description of all files and variables, see the README document (Readme.docx) included in the package. Researchers can run the provided Stata code file (code.do) on these data to replicate all tables and figures in the manuscript and thereby verify its conclusions.
The data can also be used to test alternative model specifications or serve as a foundation for future research in green finance, central bank communication, and asset pricing.
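The dictionary-based frequency step behind the DGC index can be illustrated with a minimal Python sketch. This is not the authors' code: the term list, the toy report texts, the function names, and the choice to normalize by total token count are all assumptions made for illustration. The actual dictionary is in Chinese, was expanded with Word2Vec, and would require a Chinese word segmenter rather than the simple regular-expression tokenizer used here.

```python
import re
from collections import Counter

# Hypothetical green-finance dictionary (English stand-ins for illustration;
# the dataset's actual Chinese dictionary is not reproduced here).
GREEN_TERMS = {"green", "carbon", "emission", "environmental", "sustainable"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Chinese reports would need a word segmenter instead of this regex.
    """
    return re.findall(r"[a-z]+", text.lower())


def green_share(text, dictionary=GREEN_TERMS):
    """Return the share of tokens in `text` that appear in `dictionary`.

    Normalizing by the total token count is an assumption; a raw count
    of dictionary hits would be an equally plausible definition.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[term] for term in dictionary)
    return hits / len(tokens)


# Toy "reports" keyed by quarter (invented text, not real report content).
reports = {
    "2021Q1": "The bank will support green finance and carbon emission reduction tools.",
    "2021Q2": "Monetary policy will remain prudent and liquidity reasonably ample.",
}

# One index value per quarter, ready to merge onto a firm-quarter panel.
index = {quarter: green_share(text) for quarter, text in reports.items()}
```

A quarterly series built this way could then be merged onto the firm-level panel by quarter before estimation; the variable definitions actually used in the paper are documented in Readme.docx.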

Files

Categories

Environmental Economics, Asset Pricing, Policy of Central Banks

Licence