Data and code for: Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market

Published: 5 June 2025 | Version 1 | DOI: 10.17632/6xy9d4bp28.1
Contributors:
Zhensong Chen, Wenjun Zhang

Description

This dataset accompanies the study “Contagion risk prediction with Chart Graph Convolutional Network: Evidence from Chinese stock market”, which proposes a framework for predicting contagion risk by comprehensively mining the features of technical charts and technical indicators. The data include the closing prices of the 28 sectors of the Shenwan primary industry index, the closing price of the CSI-300 Index, and eight classes of trading indicators: Turnover Rate, Price-to-Earnings Ratio, Trading Volume, Relative Strength Index, Moving Average Convergence Divergence, Moving Average, Bollinger Bands, and Stochastic Oscillator. The sample period runs from 5 Jan 2007 to 30 Dec 2022. The closing prices of the 28 sectors are downloaded from the Choice database; the closing price of the CSI-300 Index and the eight classes of trading indicators are downloaded from the Wind database. The dataset includes two raw data files, one predefined temporary file, and eighteen code files, described as follows:
  • Sector_data.csv stores the closing prices of the 28 sectors.
  • CSI_300_data.csv contains the closing price of the CSI-300 Index and the eight classes of trading indicators.
  • DCC_temp.csv is a predefined temporary file used to store correlation results.
  • Descriptive_code.py calculates the descriptive statistics.
  • ADF Test.py tests the stationarity of the data.
  • Min-max normalization.py rescales the data using min-max normalization.
  • ADCC-GJR-GARCH.R calculates the dynamic conditional correlations between sectors.
  • MST_figure.py constructs a complex network that illustrates the inter-sector relationships.
  • Correlation.py calculates inter-industry correlations.
  • Corr_up.py, corr_mid.py, and corr_down.py calculate the dynamic correlations of the upstream, midstream, and downstream sectors.
  • Centrality.py quantifies the importance or influence of nodes within the network, particularly across the upstream, midstream, and downstream sectors.
  • Averaging_corr_over_a_5-day_period.py calculates 5-day rolling averages of the correlation and centrality metrics to quantify contagion risk on a weekly cycle.
  • Convert technical charts using PIP and VG methods.py extracts the significant nodes of the technical charts, converts them into graphs, and saves the results in Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv.
  • Convert_CSV_to_TXT.py converts Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv into TXT files for later use.
  • The folder "Generating and normalizing the subgraphs" contains four files that generate the subgraphs and then normalize them. The main program, receptive_field.py, calls the other three files; among them, stock_graph_indicator.py calculates the topological structure data used in later steps.
  • Predictive_model.py takes the normalized subgraphs and the Y values defined by contagion risk as inputs and tunes the model parameters to achieve optimal results.
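As an illustration of the preprocessing performed by Descriptive_code.py and Min-max normalization.py, the following NumPy sketch computes daily log returns, basic descriptive statistics, and a column-wise min-max rescaling to [0, 1]. It assumes the prices are held in a 2-D array with one column per series; the function names are hypothetical, and the authors' scripts may differ in details such as sample versus population moments or the target range.

```python
import numpy as np


def log_returns(prices):
    """Daily log returns r_t = ln(P_t) - ln(P_{t-1}), computed column-wise."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices), axis=0)


def describe(returns):
    """Mean, sample standard deviation, skewness and excess kurtosis per column."""
    r = np.asarray(returns, dtype=float)
    mean = r.mean(axis=0)
    std = r.std(axis=0, ddof=1)
    centred = r - mean
    m2 = (centred ** 2).mean(axis=0)  # population moments for the shape statistics
    skew = (centred ** 3).mean(axis=0) / m2 ** 1.5
    kurt = (centred ** 4).mean(axis=0) / m2 ** 2 - 3.0
    return {"mean": mean, "std": std, "skew": skew, "excess_kurtosis": kurt}


def min_max(x):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero
    return (x - lo) / span


prices = np.array([[100.0, 10.0], [101.0, 10.5], [99.5, 10.2], [102.0, 10.8]])
print(describe(log_returns(prices))["mean"])
print(min_max(prices))
```

Mapping constant columns to 0 is one possible convention; a script could equally drop such columns.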
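ADF Test.py is not reproduced here. To show what an augmented Dickey-Fuller test computes, the sketch below runs the ADF regression with a constant and a fixed number of lagged differences and returns the t-statistic of the lagged level, compared against the asymptotic 5% critical value of about -2.86. It is a hypothetical illustration; in practice a library routine with automatic lag selection and MacKinnon p-values, such as statsmodels' `adfuller`, would normally be used.

```python
import numpy as np

# Asymptotic 5% critical value of the ADF t-statistic for a regression with a
# constant and no trend (MacKinnon); more negative values reject a unit root.
ADF_CRIT_5PCT = -2.86


def adf_statistic(y, lags=1):
    """t-statistic of gamma in dy_t = alpha + gamma*y_{t-1} + sum_i beta_i*dy_{t-i} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)                                # dy[j] = y[j+1] - y[j]
    target = dy[lags:]                             # dependent variable
    columns = [np.ones_like(target), y[lags:-1]]   # constant and lagged level
    for i in range(1, lags + 1):
        columns.append(dy[lags - i:len(dy) - i])   # i-th lagged difference
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / (len(target) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


rng = np.random.default_rng(0)
returns = rng.standard_normal(500)        # stationary by construction
stat = adf_statistic(returns, lags=1)
print(stat, stat < ADF_CRIT_5PCT)         # a unit root is rejected for white noise
```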
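ADCC-GJR-GARCH.R fits an asymmetric DCC model with GJR-GARCH margins in R. To illustrate only the correlation recursion at the core of such models, the sketch below implements the standard symmetric DCC(1,1) update for already-standardized residuals and fixed parameters a and b. It omits the asymmetric (negative-shock) term of the ADCC model, the GJR-GARCH marginal fits, and parameter estimation, so it is not a substitute for the R script; the parameter values are illustrative assumptions.

```python
import numpy as np


def dcc_correlations(z, a=0.05, b=0.90):
    """Conditional correlation matrices R_t from standardized residuals z (T x N).

    Q_t = (1 - a - b) * Qbar + a * z_{t-1} z_{t-1}' + b * Q_{t-1}
    R_t = diag(Q_t)^{-1/2} Q_t diag(Q_t)^{-1/2}
    """
    z = np.asarray(z, dtype=float)
    T, N = z.shape
    if a < 0 or b < 0 or a + b >= 1:
        raise ValueError("need a, b >= 0 and a + b < 1")
    qbar = z.T @ z / T          # unconditional second-moment matrix of z
    q = qbar.copy()             # initialise Q_1 at Qbar
    R = np.empty((T, N, N))
    for t in range(T):
        if t > 0:
            zt = z[t - 1][:, None]
            q = (1 - a - b) * qbar + a * (zt @ zt.T) + b * q
        d = 1.0 / np.sqrt(np.diag(q))
        R[t] = q * np.outer(d, d)   # rescale Q_t to a correlation matrix
    return R


rng = np.random.default_rng(1)
R = dcc_correlations(rng.standard_normal((250, 3)))
print(R[-1].round(3))               # pairwise correlations on the last day
```

A pairwise series such as `R[:, 0, 1]` corresponds to the kind of sector-pair correlation series that step 2 stores for each pair.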
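MST_figure.py builds the network from correlation-transformed distances, but the transform is not stated here. A common choice, assumed in the sketch below, is d_ij = sqrt(2(1 - ρ_ij)), which maps perfectly correlated sectors to distance 0. The sketch builds the minimum spanning tree with Kruskal's algorithm and, because the centrality measure used in Centrality.py is not specified, reports degree centrality only as one simple example; all function names are hypothetical.

```python
import numpy as np


def correlation_distance(corr):
    """d_ij = sqrt(2 * (1 - rho_ij)), an assumed correlation-to-distance map."""
    corr = np.asarray(corr, dtype=float)
    return np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))


def minimum_spanning_tree(dist):
    """Kruskal's algorithm with union-find; returns a list of (i, j, weight) edges."""
    n = dist.shape[0]
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    edges = sorted((dist[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                       # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j, float(w)))
            if len(tree) == n - 1:
                break
    return tree


def degree_centrality(tree, n):
    """Fraction of the other n - 1 nodes each node is linked to in the tree."""
    deg = np.zeros(n)
    for i, j, _ in tree:
        deg[i] += 1
        deg[j] += 1
    return deg / (n - 1)


corr = np.array([[1.0, 0.9, 0.1, 0.2],
                 [0.9, 1.0, 0.3, 0.8],
                 [0.1, 0.3, 1.0, 0.4],
                 [0.2, 0.8, 0.4, 1.0]])
tree = minimum_spanning_tree(correlation_distance(corr))
print([(i, j) for i, j, _ in tree])
print(degree_centrality(tree, 4))
```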

Steps to reproduce

Contagion Risk Specification and Prediction

1. Collecting data and conducting preprocessing. The closing prices of the 28 sectors are downloaded from the Choice database, and the closing price of the CSI-300 Index and the eight classes of trading indicators are downloaded from the Wind database. Daily returns are computed as log differences of the closing prices, and descriptive statistics are computed using Python scripts. In addition, the closing prices and the eight technical indicators are rescaled using min-max normalization.

2. Constructing the complex network. Dynamic conditional correlations between each pair of sectors are calculated sequentially in the R environment, and each output file is then renamed manually to a target name (e.g., A&F_Sector_DCC.csv). Sectors are abstracted as nodes and edges are defined by correlation-transformed distances, and the interconnected complex network is constructed as a minimum spanning tree (MST). The resulting MST graph is saved to MST_figure.svg.

3. Specifying contagion risk. The correlation and centrality metrics of the complex network are computed and stored in correlation.csv and centrality.csv, respectively. Both indicators are then averaged over 5-day periods. Contagion risk is confirmed when a sharp increase in correlation coincides with rising centrality, and the risk period ends once correlation has declined for five consecutive periods. The correlation data for the upstream, midstream, and downstream sectors are then saved to correlation_upstream_data.csv, correlation_midstream_data.csv, and correlation_downstream_data.csv, respectively.

4. Converting technical charts. The perceptually important points (PIP) method is first used to produce two outputs: a list of nodes ranked by importance and their associated significance scores. The natural visibility graph method is then used to transform the time series of the extracted key nodes into a graph.
This process yields three outputs (Daily Importance Score.csv, Daily Threshold Matrix.csv, and Daily Technical Indicators.csv), which contain, respectively, the importance scores of the nodes in the technical graphs, the topological structure of the edge connections in the technical graphs, and the eight technical indicators corresponding to the nodes in the technical graphs.

5. Generating and normalizing the subgraphs. The CSV files obtained in step 4 are converted to TXT files (IS.txt, stock_A.txt, stock_node_attributes.txt). The subgraphs are then generated and normalized by running the main program receptive_field.py.

6. Contagion risk prediction. The Y values and the normalized subgraphs, stored in subgraph_data.npy, are fed into convolutional layers with attention mechanisms for prediction. The hyperparameter configurations of the model are detailed in Table 4 of the paper, and the code averages the prediction results over five independent runs to ensure stability. This step is implemented in Predictive_model.py.
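The contagion-risk rule in step 3 can be sketched as follows. The sketch assumes non-overlapping 5-day blocks (one value per trading week); since the averaging script's name speaks of rolling averages, `pandas.Series.rolling(5).mean()` is the alternative reading. The paper's exact definition of a "sharp increase" is not given here, so the `jump` threshold is a hypothetical parameter, and both function names are illustrative.

```python
import numpy as np


def block_average(x, window=5):
    """Means of consecutive, non-overlapping blocks of `window` days (incomplete tail dropped)."""
    x = np.asarray(x, dtype=float)
    usable = len(x) // window * window
    return x[:usable].reshape(-1, window).mean(axis=1)


def label_contagion(corr, cent, jump=0.05, exit_run=5):
    """0/1 contagion-risk label per (weekly) period.

    An episode starts when correlation rises by more than `jump` (a hypothetical
    threshold for a "sharp increase") while centrality also rises, and it ends
    after correlation has declined for `exit_run` consecutive periods.
    """
    corr = np.asarray(corr, dtype=float)
    cent = np.asarray(cent, dtype=float)
    y = np.zeros(len(corr), dtype=int)
    in_risk, falls = False, 0
    for t in range(1, len(corr)):
        d_corr = corr[t] - corr[t - 1]
        if not in_risk:
            if d_corr > jump and cent[t] > cent[t - 1]:
                in_risk, falls = True, 0
        else:
            falls = falls + 1 if d_corr < 0 else 0  # any non-decline resets the run
        y[t] = int(in_risk)
        if in_risk and falls >= exit_run:
            in_risk = False  # the exit_run-th consecutive decline closes the episode
    return y


weekly_corr = [0.30, 0.40, 0.45, 0.44, 0.43, 0.42, 0.41, 0.40, 0.39]
weekly_cent = [0.1] + [0.2] * 8
print(label_contagion(weekly_corr, weekly_cent))
```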
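For step 4, the PIP method starts from the two endpoints of a chart and repeatedly adds the point farthest from the straight line joining its neighbouring PIPs. The sketch below uses the vertical-distance variant and records that distance as the point's importance score, giving both a ranked node list and the scores. The distance measure, the number of points kept, and the infinite score assigned to the endpoints are assumptions, not details taken from the authors' script.

```python
import numpy as np


def pip(series, n_points):
    """Perceptually important points (vertical-distance variant); needs len(series) >= 2.

    Returns the selected indices in order of selection and a score for each:
    the vertical distance from the point to the line joining its two
    neighbouring PIPs at the moment it was chosen (endpoints get +inf).
    """
    y = np.asarray(series, dtype=float)
    n = len(y)
    n_points = max(2, min(n_points, n))
    order, scores = [0, n - 1], [np.inf, np.inf]
    while len(order) < n_points:
        best_idx, best_dist = -1, -1.0
        pts = sorted(order)
        for left, right in zip(pts[:-1], pts[1:]):
            if right - left < 2:
                continue  # no interior point in this gap
            xs = np.arange(left + 1, right)
            # Height of the straight line joining the two neighbouring PIPs.
            line = y[left] + (y[right] - y[left]) * (xs - left) / (right - left)
            dist = np.abs(y[xs] - line)
            k = int(np.argmax(dist))
            if dist[k] > best_dist:
                best_idx, best_dist = int(xs[k]), float(dist[k])
        order.append(best_idx)
        scores.append(best_dist)
    return order, scores


order, scores = pip([0, 1, 5, 1, 0, -3, 0], n_points=4)
print(order, scores)
```

Sorting the selected indices by position gives the key nodes of the chart in time order, which is the form a visibility-graph construction expects.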
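In the natural visibility graph, two observations (t_a, y_a) and (t_b, y_b) are linked when every intermediate observation lies strictly below the straight line joining them. The sketch below builds the adjacency matrix for points with explicit time stamps, so it can be applied to the irregularly spaced key nodes returned by PIP once they are sorted by time. How the authors threshold or store the resulting graph (Daily Threshold Matrix.csv) is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def natural_visibility_graph(times, values):
    """Adjacency matrix of the natural visibility graph (times must be increasing).

    Points a < b are linked if every intermediate point c lies strictly below
    the straight line joining (t_a, y_a) and (t_b, y_b).
    """
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    n = len(y)
    adj = np.zeros((n, n), dtype=int)
    for a in range(n - 1):
        for b in range(a + 1, n):
            c = np.arange(a + 1, b)
            line = y[b] + (y[a] - y[b]) * (t[b] - t[c]) / (t[b] - t[a])
            if np.all(y[c] < line):  # vacuously true for consecutive points
                adj[a, b] = adj[b, a] = 1
    return adj


adj = natural_visibility_graph([0, 1, 2, 3], [1.0, 3.0, 2.0, 4.0])
print(adj)
```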
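The TXT file names in step 5 (stock_A.txt, stock_node_attributes.txt) resemble the plain-text layout of the TU Dortmund graph benchmark collection, where DS_A.txt lists one edge per line as a pair of 1-based global node ids, DS_graph_indicator.txt gives the graph id of each node, and DS_node_attributes.txt holds one comma-separated attribute row per node. Assuming Convert_CSV_to_TXT.py follows that convention (the source does not confirm it), the sketch below serialises a list of daily graphs, with each adjacency matrix taken from the threshold matrix and each attribute row from the eight indicators. The function name and the in-memory return value are hypothetical.

```python
import numpy as np


def to_tu_format(adjs, feats, prefix="stock"):
    """Serialise graphs into TU-style text (assumed layout; 1-based global node ids).

    adjs:  list of (n_g x n_g) 0/1 adjacency matrices, one per daily graph
    feats: list of (n_g x d) node-attribute arrays (e.g. the eight indicators)
    Returns {filename: file contents}.
    """
    a_lines, gi_lines, attr_lines = [], [], []
    offset = 0
    for g, (adj, x) in enumerate(zip(adjs, feats), start=1):
        adj = np.asarray(adj)
        rows, cols = np.nonzero(adj)  # lists both directions of an undirected edge
        a_lines += [f"{r + offset + 1}, {c + offset + 1}" for r, c in zip(rows, cols)]
        gi_lines += [str(g)] * adj.shape[0]
        attr_lines += [", ".join(f"{v:g}" for v in row)
                       for row in np.asarray(x, dtype=float)]
        offset += adj.shape[0]
    return {
        f"{prefix}_A.txt": "\n".join(a_lines) + "\n",
        f"{prefix}_graph_indicator.txt": "\n".join(gi_lines) + "\n",
        f"{prefix}_node_attributes.txt": "\n".join(attr_lines) + "\n",
    }


files = to_tu_format([[[0, 1], [1, 0]]], [[[0.5, 1.0], [0.25, 2.0]]])
for name, text in files.items():
    print(name, repr(text))
```

Writing each entry to disk is then a plain `open(name, "w").write(text)` per file.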
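The name receptive_field.py, together with the "generate, then normalize" description in step 5, suggests a receptive-field construction in the style of PATCHY-SAN: pick root nodes, gather a neighbourhood around each, and impose a canonical node order so that every subgraph has the same size. This is an assumption, not a description of the authors' code. The sketch ranks roots and neighbours by importance score (as stored in IS.txt), gathers neighbours breadth-first, orders them by hop distance and then by score, and pads short neighbourhoods with a dummy node (-1); all names are hypothetical.

```python
from collections import deque

import numpy as np


def select_roots(scores, w):
    """Top-w nodes by importance score (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda v: (-scores[v], v))[:w]


def receptive_field(adj, scores, root, k):
    """k nodes around `root`, ordered by (hop distance, -score, index); -1 pads."""
    adj = np.asarray(adj)
    dist = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search over the whole component
        u = queue.popleft()
        for v in np.flatnonzero(adj[u]):
            v = int(v)
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    ranked = sorted(dist, key=lambda v: (dist[v], -scores[v], v))
    field = ranked[:k]
    return field + [-1] * (k - len(field))


def normalized_subgraph(adj, feats, field):
    """k x k adjacency and k x d features in receptive-field order; dummies stay zero."""
    adj = np.asarray(adj)
    feats = np.asarray(feats, dtype=float)
    k = len(field)
    sub_adj = np.zeros((k, k), dtype=adj.dtype)
    sub_x = np.zeros((k, feats.shape[1]))
    for a, u in enumerate(field):
        if u < 0:
            continue
        sub_x[a] = feats[u]
        for b, v in enumerate(field):
            if v >= 0:
                sub_adj[a, b] = adj[u, v]
    return sub_adj, sub_x


adj = np.zeros((5, 5), dtype=int)
for u, v in [(0, 1), (1, 2), (2, 3), (1, 4)]:
    adj[u, v] = adj[v, u] = 1
scores = [0.1, 0.5, 0.9, 0.2, 0.3]
for root in select_roots(scores, 2):
    print(root, receptive_field(adj, scores, root, k=3))
```

Stacking the fixed-size outputs of `normalized_subgraph` across roots and days would give tensors of the kind stored in subgraph_data.npy, although the actual array layout used by Predictive_model.py is not documented here.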

Institutions

  • Capital University of Economics and Business

Categories

Empirical Finance

Funders

  • Humanities and Social Science Research Youth Fund project of Ministry of Education
    Grant ID: 24YJCZH032, 23YJCZH146
  • R&D Program of Beijing Municipal Education Commission
    Grant ID: KM202210038001
  • Scientific and Technological Innovation Projects for Academic Degree Postgraduates of Capital University of Economics and Business

Licence