COMPARISON OF PRINCIPAL COMPONENT ANALYSIS ALGORITHMS FOR IMPUTATION IN AGROMETEOROLOGICAL DATA IN HIGH DIMENSION AND REDUCED SAMPLE SIZE

Creator: Valter De Souza
Published: 2024-06-07T08:29:25.119Z
Keywords: Database

De Souza, Valter; Sergio Augusto, Sérgio; Gabriel Filho, Luís Roberto Almeida

doi:10.17632/2ptckpw94f.1


Published: 7 June 2024 | Version 1 | DOI: 10.17632/2ptckpw94f.1

Contributors:

Valter De Souza

Description

Hourly data provided by the National Meteorological Institute (INMET) were used for each meteorological variable, covering January 1, 2012, to December 31, 2021, at 45 automatic weather stations in the State of São Paulo, Brazil. For each station, the hourly data for this period were downloaded from the INMET website as .csv files, totaling 450 files (10 years × 45 stations).


Steps to reproduce

EXTRACTING WEATHER DATA FROM INMET

Downloading weather data from INMET's historical series requires the following steps:

1. Access the INMET website: https://bdmep.inmet.gov.br/
2. Choose the option for annual data packages of all automatic stations, separated by year. This opens the annual historical data page.
3. Choose the years of interest (data are available from 2000 onwards). For each selected year, one .csv file is provided per station.
4. Choose the stations of interest for the study. Their geographical distribution can be viewed on the station map at https://mapas.inmet.gov.br/
5. Rename all the .csv files, either manually or automatically. The automatic method is preferable because of the number of files a processing routine must handle: for example, 45 stations over 10 years yields 450 files. Standardized names are needed to merge and process the data automatically, shortening a full name such as INMET_SE_SP_A725_AVARE_01-01-2011_A_31-12-2011 to A725_2011.
6. To simplify reading the files, create one folder per station containing its files for the years of interest.
7. Write a script in the R environment that automatically reads the downloaded data files, performing the following steps:
• Read the files for each station for all years of the study;
• Exclude the first 9 lines, which contain information about the weather station from the data source (INMET);
• Assign common column names to all the data sets read into R;
• Replace all "-9999" values with "NA";
• Convert the date-time from Greenwich time (UTC) to local time, applying the São Paulo adjustment of subtracting three hours;
• Create a database aggregating all the files;
• Recalculate the variables of interest on a daily basis.
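The authors carried out step 7 in R. The sketch below restates steps 5 to 7 in Python using only the standard library, as an illustration under stated assumptions rather than the authors' script. The following details are assumptions and do not come from the source: that INMET files are semicolon-delimited, use a comma as the decimal separator, are Latin-1 encoded, and carry the date and UTC hour in their first two columns. Also hypothetical are the helper names (`short_name`, `organise`, `read_station_file`, `load_station`, `daily`), the positional variable names passed in `columns` (for example `precip`, `temp`), and the daily rules (sum for precipitation, mean for temperature).

```python
from __future__ import annotations

import csv
import re
import shutil
from collections import defaultdict
from datetime import date, datetime, timedelta
from pathlib import Path
from statistics import mean

# Step 5: INMET_SE_SP_A725_AVARE_01-01-2011_A_31-12-2011 -> A725_2011
INMET_NAME = re.compile(r"^INMET_[A-Z]+_[A-Z]{2}_(?P<code>[A-Z]\d{3})_.*-(?P<year>\d{4})$")
SKIP_LINES = 9                      # station-information lines at the top (step 7)
MISSING = -9999.0                   # INMET missing-value code, mapped to None (NA)
UTC_TO_LOCAL = timedelta(hours=-3)  # fixed São Paulo adjustment described in step 7


def short_name(stem: str) -> str:
    """Shorten a full INMET file stem to CODE_YEAR."""
    m = INMET_NAME.match(stem)
    if m is None:
        raise ValueError(f"unrecognised INMET file name: {stem}")
    return f"{m['code']}_{m['year']}"


def organise(src: Path, dest: Path) -> None:
    """Steps 5-6: copy each download to dest/<code>/<code>_<year>.csv."""
    for f in src.iterdir():
        if f.suffix.lower() != ".csv":
            continue
        name = short_name(f.stem)
        folder = dest / name.split("_")[0]
        folder.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, folder / f"{name}.csv")


def parse_value(raw: str) -> float | None:
    """Comma-decimal text -> float; blanks and -9999 become None (NA)."""
    raw = raw.strip()
    if not raw:
        return None
    x = float(raw.replace(",", "."))
    return None if x == MISSING else x


def read_station_file(text: str, columns: list[str]) -> list[dict]:
    """Step 7: skip the header lines, name columns, map -9999 to NA, shift UTC to local."""
    rows = []
    for fields in csv.reader(text.splitlines()[SKIP_LINES:], delimiter=";"):
        if len(fields) < 2:
            continue  # blank or truncated line
        day = fields[0].replace("/", "-")         # accepts 2012-01-01 or 2019/01/01
        hhmm = re.sub(r"\D", "", fields[1])[:4]   # accepts "00:00" or "0000 UTC"
        utc = datetime.strptime(f"{day} {hhmm}", "%Y-%m-%d %H%M")
        row = {"time": utc + UTC_TO_LOCAL}
        # Variable columns after date and hour are named positionally.
        row.update({c: parse_value(v) for c, v in zip(columns, fields[2:])})
        rows.append(row)
    return rows


def load_station(folder: Path, columns: list[str]) -> list[dict]:
    """Aggregate every yearly file of one station into a single list of rows."""
    rows = []
    for f in sorted(folder.glob("*.csv")):
        rows += read_station_file(f.read_text(encoding="latin-1"), columns)
    return rows


RULES = {"sum": sum, "mean": mean, "max": max, "min": min}


def daily(rows: list[dict], how: dict[str, str]) -> dict[date, dict[str, float | None]]:
    """Recompute each variable per local calendar day; days with no valid value give None."""
    buckets: dict = defaultdict(lambda: defaultdict(list))
    for r in rows:
        bucket = buckets[r["time"].date()]
        for col in how:
            if r.get(col) is not None:
                bucket[col].append(r[col])
    return {
        d: {c: (RULES[rule](buckets[d][c]) if buckets[d][c] else None)
            for c, rule in how.items()}
        for d in sorted(buckets)
    }
```

For example, after `organise(Path("downloads"), Path("stations"))`, the call `daily(load_station(Path("stations/A725"), ["precip", "temp"]), {"precip": "sum", "temp": "mean"})` returns one record per local day. The fixed three-hour shift mirrors the procedure described above. `zoneinfo.ZoneInfo("America/Sao_Paulo")` would instead apply the historical daylight-saving rules that were in force in São Paulo during part of 2012 to 2021.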

Institutions

Universidade Estadual Paulista Júlio de Mesquita Filho

Funders

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Brazil
Grant ID: 001
