Two Daily Weather Datasets: Chiang Mai International Airport and Theodore Francis Green State Airport

Published: 8 Mar 2020 | Version 7 | DOI: 10.17632/95mr7pr8rj.7

Description of this data

Two daily weather datasets for experimenting with data-driven models on two different weather types:

  1. Chiang Mai International Airport, Chiang Mai, Thailand from January 1st 1998 to July 31st 2019. The data were acquired from the station via personal communication. The following files are provided:
  • chiang_mai_1998-2019_raw.csv : the raw data.
  • chiang_mai_1998-2019.csv : the preprocessed data. The dates and redundant variables were removed, missing values were imputed with the MICE (Multivariate Imputation by Chained Equations) algorithm, and all units were converted to SI units.
  2. Theodore Francis Green State Airport, Providence, RI from January 1st 2006 to October 31st 2019. The data were acquired from the National Oceanic and Atmospheric Administration (https://www.ncdc.noaa.gov/cdo-web/datatools/lcd). The following files are provided:
  • providence_2006-2019_raw.csv : the raw data.
  • providence_2006-2019.csv : the preprocessed data. The dates were removed, missing values were imputed with the MICE algorithm, and all units were converted to SI units.
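The MICE step fills each incomplete variable by regressing it on the other variables and cycling through the variables until the filled-in values settle. The sketch below illustrates that chained-equations idea with NumPy. It is a simplified, single-imputation variant (ordinary least squares, no posterior sampling, no multiple imputed datasets) and is not the code used to prepare these files. The function name `chained_regression_impute` is hypothetical. scikit-learn's `IterativeImputer` is a fuller implementation of the same round-robin approach.

```python
import numpy as np


def chained_regression_impute(X, n_iter=10):
    """Hypothetical helper: fill NaNs by repeatedly regressing each incomplete
    column on all other columns (chained equations, single imputation only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column means so every regression has complete predictors.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            # Design matrix: intercept plus every other (currently filled) column.
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(filled)), others])
            # Ordinary least squares fitted only on rows where column j was observed.
            coef, *_ = np.linalg.lstsq(A[~miss_j], filled[~miss_j, j], rcond=None)
            # Overwrite only the originally missing entries with predictions.
            filled[miss_j, j] = A[miss_j] @ coef
    return filled


if __name__ == "__main__":
    # Column 1 equals 2 * column 0 + 1, so the missing entry should be near 9.
    data = [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, np.nan], [5.0, 11.0]]
    print(chained_regression_impute(data)[3, 1])
```

Observed values are never modified. Only the cells that were NaN in the input are re-estimated on each pass.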

Additionally, we provide Python code and shell scripts to reproduce the three autoencoder models in "Short-term Daily Precipitation Forecasting with Seasonally-Integrated Autoencoder". The code has the following requirements:

  • Python 3.6 or higher
  • Keras 2.2 or higher (Python library)
  • TensorFlow 1.x.y, where x.y is 12.0 or higher, i.e. 1.12.0 or a later 1.x release (Python library)
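One way to confirm that an environment meets these requirements is to compare the installed version strings against them. The following is a minimal sketch. The helper names are hypothetical, and it assumes "1.x" means that TensorFlow 2.x is excluded.

```python
import sys


def parse_version(version):
    """Turn '1.14.0' or '1.15.0rc1' into a tuple of leading integers, e.g. (1, 14, 0)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def tensorflow_ok(version):
    # Assumption: at least 1.12.0, and still within the 1.x series.
    return (1, 12, 0) <= parse_version(version) < (2, 0, 0)


def keras_ok(version):
    # Keras 2.2 or higher; (2, 2, 4) >= (2, 2) holds under tuple comparison.
    return parse_version(version) >= (2, 2)


if __name__ == "__main__":
    print("Python >= 3.6:", sys.version_info >= (3, 6))
    try:
        import tensorflow as tf
        print("TensorFlow OK:", tensorflow_ok(tf.__version__))
    except ImportError:
        print("TensorFlow is not installed")
    try:
        import keras
        print("Keras OK:", keras_ok(keras.__version__))
    except ImportError:
        print("Keras is not installed")
```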

The proposed model (SSAE) can be trained on the Providence dataset by running the following command:

./Providence.sh

After training finishes, the RMSE and CORR (correlation) scores are reported and the forecast values are saved to path/to/data_XXXXXX-xxxxxx.csv. The README.md file provides additional information on code usage.
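Both scores can be written in a few lines. This sketch assumes CORR is the Pearson correlation between the observed and forecast series. The repository's own definition (for example, how it aggregates over variables or horizons) may differ, so treat these functions as illustrative rather than as the evaluation code.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def corr(y_true, y_pred):
    """Pearson correlation between observations and forecasts (assumed definition of CORR)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


if __name__ == "__main__":
    observed = [0.0, 2.5, 0.0, 10.1]  # illustrative values, not from the dataset
    forecast = [0.3, 1.9, 0.0, 8.7]
    print(rmse(observed, forecast), corr(observed, forecast))
```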

#### Changing arguments

You can modify the arguments in the script files. For example, --model and --horizon let you specify the model and the forecast horizon, respectively. The descriptions of all available options can be accessed via the command:

python3 main.py -h
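If `main.py` uses Python's standard `argparse` module, the `-h` help text is generated automatically from the declared options. The following hypothetical sketch shows how the flags named in this description could be declared. It is not the repository's `main.py`: the defaults and help strings are assumptions, and the real script very likely defines more options.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags mentioned in this description."""
    parser = argparse.ArgumentParser(description="Daily precipitation forecasting (sketch)")
    parser.add_argument("--model", choices=["S2S1", "S2S2", "SSAE"], default="SSAE",
                        help="which autoencoder model to use (default is an assumption)")
    parser.add_argument("--horizon", type=int, default=1,
                        help="forecast horizon in days")
    parser.add_argument("--load", default=None,
                        help="path to pretrained weights, e.g. model/pvd_ssae_3.h5")
    parser.add_argument("--test_data", default=None,
                        help="CSV file used in prediction or evaluation mode")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()  # `-h` prints the help strings above
    print(args.model, args.horizon, args.load, args.test_data)
```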

#### Running the script in different modes

We provide scripts for prediction mode and evaluation mode, namely Providence_predict.sh and Providence_eval.sh, as well as pretrained weights for all three models in the model folder. To use either mode, specify the location of the pretrained weights with the --load option. For example, the weights of the SSAE model that forecasts over the next three days on the Providence dataset are stored in pvd_ssae_3.h5:

--load model/pvd_ssae_3.h5

You also need to specify the test data with the --test_data option:

--test_data Data/data_name.csv
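The two options are passed together. The sketch below assembles them. It assumes weight files follow the pattern `<prefix>_<model>_<horizon>.h5`, which is inferred only from the single example `pvd_ssae_3.h5`. `weight_path` and `load_options` are therefore hypothetical helpers, the prefix for other datasets is left to the caller, and `Data/data_name.csv` remains a placeholder for the actual test file.

```python
from pathlib import PurePosixPath


def weight_path(dataset_prefix, model, horizon, folder="model"):
    """Hypothetical: build a weights path assuming <prefix>_<model lowercase>_<horizon>.h5."""
    return str(PurePosixPath(folder) / f"{dataset_prefix}_{model.lower()}_{horizon}.h5")


def load_options(weights, test_csv):
    """Return the --load and --test_data arguments used in prediction/evaluation mode."""
    return ["--load", weights, "--test_data", test_csv]


if __name__ == "__main__":
    w = weight_path("pvd", "SSAE", 3)  # reproduces the documented model/pvd_ssae_3.h5
    print(" ".join(load_options(w, "Data/data_name.csv")))
```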

Steps to reproduce

Run the following commands to train SSAE at forecast horizon 1 on the Providence and Chiang Mai datasets, respectively:
./Providence.sh
./Chiang_Mai.sh
Modify the following arguments in these scripts to change the model and the forecast horizon:
--model : S2S1, S2S2 or SSAE
--horizon : 1, 2 or 3 (or any other positive integer)
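To train every combination of the three models and three horizons, the commands can be generated in a loop instead of editing the scripts by hand. This is a hypothetical sketch: it calls `main.py` with only `--model` and `--horizon`, so any other arguments set in `Providence.sh` or `Chiang_Mai.sh` (for example, the data file) must be copied into `extra_args`. With `dry_run=True` it only prints the commands.

```python
import itertools
import subprocess

MODELS = ("S2S1", "S2S2", "SSAE")
HORIZONS = (1, 2, 3)


def sweep_commands(models=MODELS, horizons=HORIZONS, extra_args=()):
    """Build one main.py command per (model, horizon) pair.

    Hypothetical: extra_args must carry whatever else the shell scripts pass.
    """
    commands = []
    for model, horizon in itertools.product(models, horizons):
        cmd = ["python3", "main.py", "--model", model, "--horizon", str(horizon)]
        cmd.extend(extra_args)
        commands.append(cmd)
    return commands


def run_sweep(commands, dry_run=True):
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_sweep(sweep_commands())
```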

Cite this dataset

Pornnopparath, Donlapark (2020), “Two Daily Weather Datasets: Chiang Mai International Airport and Theodore Francis Green State Airport”, Mendeley Data, v7 http://dx.doi.org/10.17632/95mr7pr8rj.7

Categories

Meteorology, Weather Forecasting, Forecasting, Time Series, Weather, Time Series Forecasting, Recurrent Neural Network, Deep Learning, Autoencoder (Artificial Neural Networks)

Licence

CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.
