Nairobi Securities Exchange Prices 2008-2012 for 6 selected stocks
Description
Stock market prediction remains an active area of research, driven by the need to inform investors when to trade (buy or sell) at the most opportune time. The methods most commonly used by stock market players to predict likely future trade prices are technical analysis, fundamental analysis and time series analysis. This research explored machine learning methods as an alternative to these prevalent methods. Artificial neural networks (ANNs) tend to be the preferred machine learning method for this type of application. However, ANNs need historical data to learn from before they can make predictions.
The research used an ANN model to test the hypothesis that the next-day price (the prediction) can be determined from the stock prices of the immediately preceding five days. The final ANN model used for the tests was a feedforward multi-layer perceptron (MLP) trained with error backpropagation, using a sigmoid activation function and a 5:21:21:1 network configuration. The dataset covers a 5-year period (2008 to 2012), with 80% of the data (4 years) used for training and the remaining 20% (the last year) used for testing.
The original raw data for the Nairobi Securities Exchange (NSE) was scraped from the publicly accessible website of a stock market analysis company in Kenya (Synergy, 2020). The daily price data was first exported to a spreadsheet, then cleaned of headers and other redundant information, leaving only the stock name, the date of trade and related fields such as volumes, low prices, high prices and adjusted prices. The data was then sorted by stock name and by trading date. Finally, the data was reduced to the fields needed for the research: the stock name, the date of trade and the adjusted price (average trade price). This final dataset is in CSV format, as presented here.
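The model described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it builds five-day input windows and runs one forward pass through an untrained 5:21:21:1 sigmoid network. The function names, the random weight range, the min-max scaling step and the sample prices are all assumptions made for the example; the description does not state how prices were scaled, and backpropagation training is omitted.

```python
import math
import random

# Network configuration from the description: 5 inputs, two hidden
# layers of 21 neurons each, 1 output.
LAYER_SIZES = [5, 21, 21, 1]


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def init_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random values.

    The range [-0.5, 0.5] is an assumption for illustration only.
    """
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Propagate one input vector through every layer and return the outputs."""
    activations = list(inputs)
    for weights, biases in network:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


def scale_to_unit(values):
    """Min-max scale values to [0, 1] (assumed here because a sigmoid
    output neuron can only produce values between 0 and 1)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def make_windows(prices, window=5):
    """Pair each run of `window` consecutive prices with the next price."""
    inputs, targets = [], []
    for i in range(len(prices) - window):
        inputs.append(prices[i:i + window])
        targets.append(prices[i + window])
    return inputs, targets


# Made-up prices for illustration; not taken from the dataset.
prices = [10.0, 10.5, 10.25, 10.75, 11.0, 11.25, 11.5, 11.0,
          10.75, 11.25, 11.5, 12.0, 12.25, 12.0, 12.5]
scaled = scale_to_unit(prices)
inputs, targets = make_windows(scaled)
network = init_network(LAYER_SIZES)
prediction = forward(network, inputs[0])[0]  # untrained, so not meaningful yet
print(len(inputs), prediction)
```

Fifteen prices yield ten input/target pairs, because the first five prices have no earlier window to predict them from.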
The research tested three NSE stocks. Over the 3-month testing period, the mean absolute percentage error (MAPE) ranged from 0.77% to 1.91%, while the root mean squared error (RMSE) ranged from 1.83 to 3.07. This raw data can be used to train and test any machine learning model, and to validate and reproduce the results presented in this research. Reproduced results may vary slightly because of differences in the exact final weights that the trained ANN model reaches. However, these differences should not be significant.
List of data files in this dataset:
stock01_NSE_01jan2008_to_31dec2012_Kakuzi.csv
stock02_NSE_01jan2008_to_31dec2012_StandardBank.csv
stock03_NSE_01jan2008_to_31dec2012_KenyaAirways.csv
stock04_NSE_01jan2008_to_31dec2012_BamburiCement.csv
stock05_NSE_01jan2008_to_31dec2012_Kengen.csv
stock06_NSE_01jan2008_to_31dec2012_BAT.csv
References:
Synergy Systems Ltd. (2020). MyStocks. Retrieved March 9, 2020, from http://live.mystocks.co.ke/
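For readers who want to compute the same two error measures on their own predictions, the standard definitions of MAPE and RMSE can be sketched as follows. The function names and the sample values are assumptions for illustration; the description does not say which tool or formula variant was used to produce the reported figures.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, which holds for stock prices.
    """
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Made-up values for illustration; not results from the research.
actual = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mape(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, so it can be compared across stocks with different price levels, whereas RMSE depends on the price level of each stock.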
Files
Steps to reproduce
1) Download Neuroph Studio (Neuroph, 2020) or Encog Workbench (Heaton, 2020), which are freely available machine learning tools
2) Set up the ANN parameters as described in the data description
3) Split the data into a training set (80%) and a testing set (20%), then point the model to the location of the CSV data file
4) Train the network to the lowest error possible
5) Test the network on the testing set (20%)
References:
Heaton Research. (2020). Encog Machine Learning Framework. Retrieved March 9, 2020, from http://www.heatonresearch.com/encog
Neuroph. (2020). Neuroph Download. Retrieved March 9, 2020, from http://neuroph.sourceforge.net/download.html
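The data split in the steps above can also be prepared outside the GUI tools. The sketch below is one possible way to do it, under stated assumptions: it assumes each CSV row holds the stock name, the trade date and the adjusted price in that order, with no header row, and that rows are already in date order. The column position, the helper names and the sample rows are hypothetical, so check them against the actual files before use.

```python
import csv
import io


def read_prices(handle, price_column=2):
    """Read the price column from an open CSV file.

    price_column=2 assumes the layout: stock name, trade date, adjusted price.
    """
    prices = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        prices.append(float(row[price_column]))
    return prices


def chronological_split(values, train_fraction=0.8):
    """Keep the earliest 80% for training and the latest 20% for testing.

    The data is not shuffled, so the test set is always the most recent period.
    """
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


# Made-up rows standing in for one of the CSV files; not real NSE data.
sample = io.StringIO(
    "EXAMPLE,2008-01-02,30.00\n"
    "EXAMPLE,2008-01-03,30.25\n"
    "EXAMPLE,2008-01-04,30.50\n"
    "EXAMPLE,2008-01-07,30.25\n"
    "EXAMPLE,2008-01-08,30.75\n"
    "EXAMPLE,2008-01-09,31.00\n"
    "EXAMPLE,2008-01-10,31.25\n"
    "EXAMPLE,2008-01-11,31.00\n"
    "EXAMPLE,2008-01-14,31.50\n"
    "EXAMPLE,2008-01-15,31.75\n"
)
prices = read_prices(sample)
train, test = chronological_split(prices)
print(len(train), len(test))
```

To use a real file, replace the in-memory sample with something like `open("stock01_NSE_01jan2008_to_31dec2012_Kakuzi.csv", newline="")` and pass the resulting handle to `read_prices`.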