Data for: Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model

Published: 8 November 2017 | Version 2 | DOI: 10.17632/d37mzs3b3m.2
Contributors:
Lam Huynh

Description

The "Dataset_HIR" folder contains the data needed to reproduce the results of the data mining approach proposed in the manuscript titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model". Specifically, the folder contains the raw electronic structure calculation input data provided by the domain experts, as well as the training and testing datasets with the extracted features. The "Dataset_HIR" folder contains the following subfolders:

1. Electronic structure calculation input data: contains the electronic structure calculation input generated by the Gaussian program.
1.1. Training data: contains the raw data of all training species (each stored in a separate folder), used to extract the dataset for the training and validation phases.
1.2. Testing data: contains the raw data of all testing species (each stored in a separate folder), used to extract the dataset for the testing phase.

2. Dataset
2.1. Training dataset: used to produce the results in Tables 3 and 4 in the manuscript.
+ datasetTrain_raw.csv: contains the features of all vibrational modes, each associated with its labeled species, so that chemists can easily select the Hindered Internal Rotor modes from the list for the training and validation steps.
+ datasetTrain.csv: a refined version of datasetTrain_raw.csv in which the species names are removed, putting the dataset into a form suitable for the modeling and validation steps.
2.2. Testing dataset: used to produce the results of the data mining approach in Table 5 in the manuscript.
+ datasetTest_raw.csv: contains the features of all vibrational modes of each labeled species, so that chemists can select the Hindered Internal Rotor modes from the list for the testing step.
+ datasetTest.csv: a refined version of datasetTest_raw.csv in which the species names are removed, putting the dataset into a form suitable for the testing step.

Note on the Result feature in the datasets: 1 marks a mode that needs to be treated as a Hindered Internal Rotor, and 0 marks any other mode.
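
The relationship between the raw and refined CSV files, and the 0/1 encoding of the Result feature, can be illustrated with a minimal Python sketch. Only the column name Result comes from the description above. The column name Species, the feature names Feature1 and Feature2, and the toy rows are hypothetical placeholders for illustration and are not taken from the dataset, so the real headers may differ.

```python
import csv
import io
from collections import Counter

LABEL_COLUMN = "Result"     # named in the dataset description
SPECIES_COLUMN = "Species"  # hypothetical header; the real file may use another name


def drop_species_column(rows, species_column=SPECIES_COLUMN):
    """Return copies of the rows without the species-name column."""
    return [{k: v for k, v in row.items() if k != species_column} for row in rows]


def split_features_and_labels(rows, label_column=LABEL_COLUMN):
    """Separate the 0/1 label from the remaining feature columns."""
    features, labels = [], []
    for row in rows:
        label = int(float(row[label_column]))
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r}")
        labels.append(label)
        features.append({k: v for k, v in row.items() if k != label_column})
    return features, labels


# Toy rows for illustration only; these values are NOT from the dataset.
TOY_CSV = """Species,Feature1,Feature2,Result
A,0.1,0.2,1
A,0.3,0.4,0
B,0.5,0.6,0
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
refined = drop_species_column(rows)            # mirrors raw -> refined
features, labels = split_features_and_labels(refined)
label_counts = Counter(labels)                 # how many modes carry each label
```

To try this on the actual files, one would replace io.StringIO(TOY_CSV) with open("datasetTrain_raw.csv", newline="") and adjust SPECIES_COLUMN to match the real header.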

Steps to reproduce

The steps to reproduce the results are given in the paper titled "Identification of hindered internal rotational mode for complex chemical species: A data mining approach with multivariate logistic regression model" published in Chemometrics and Intelligent Laboratory Systems.

Institutions

International University

Categories

Chemistry, Data Mining, Machine Learning, Chemical Compound, Data Analysis, Chemometrics, Electronic Structure Calculations, Molecular Internal Rotation, Multivariate Logistical Regression
