Data for: Penalized Logistic Regression for Classification and Feature Selection with its Application to Detection of Two Official Species of Ganoderma

Published: 14 October 2017 | Version 2 | DOI: 10.17632/5sg7pgwrp9.2
Contributors:
Ying Zhu,
Wai Kwong Cheang,
Tan Augustine

Description

Two Linzhi data sets are provided: a training set and a test set. Each set consists of two CSV files, one holding the spectra and one holding the response-variable information.

Training set. 'linzhi.train-spectra.csv' is a 480 × 763 data matrix of 480 spectral observations: 240 G. lucidum spectra (from 40 samples) and 240 G. sinense spectra (from 40 samples). Each row contains the intensity values of one spectral observation at 763 spectral points, measured at wavenumbers from 3800 to 750 cm−1. 'linzhi.train-vars.csv' is a 480 × 2 data matrix holding the response-variable information (class type and sample index) for the 480 training spectra. Each of its rows corresponds to the spectrum in the same row of 'linzhi.train-spectra.csv'.

Test set. 'linzhi.test-spectra.csv' is a 240 × 763 data matrix of 240 spectral observations: 120 G. lucidum spectra (from 20 samples) and 120 G. sinense spectra (from 20 samples). Each row contains the intensity values of one spectral observation at the same 763 spectral points, measured at wavenumbers from 3800 to 750 cm−1. 'linzhi.test-vars.csv' is a 240 × 2 data matrix holding the response-variable information (class type and sample index) for the 240 test spectra. Each of its rows corresponds to the spectrum in the same row of 'linzhi.test-spectra.csv'.

In both response-variable files, the column 'class' gives the class of each spectrum, coded as 1 (G. sinense) or 2 (G. lucidum), and the column 'sample' gives the sample index of each spectrum, since multiple spectra were collected from each sample.

All spectral data have been pre-treated as described in Section 2.3 of the manuscript.
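The following is a minimal Python sketch (standard library only) for reading one spectra/vars pair and checking it against the layout described above. The function names are hypothetical and are not part of the dataset. Several details are assumptions, because the description does not state them: that the spectra file starts with a header row, that the 763 points are evenly spaced over 3800 to 750 cm−1 (only the range is given), and that sample indices may repeat across classes, so groups are keyed on (class, sample). The fold helper reflects why the 'sample' column matters: with 6 spectra per sample (240/40 in training, 120/20 in test), spectra from one sample are probably correlated, so assigning whole samples to folds avoids leaking a sample into both the fitting and validation parts of a cross-validation.

```python
import csv
from collections import defaultdict

# Class coding as given in the dataset description.
CLASS_NAMES = {1: "G. sinense", 2: "G. lucidum"}
N_POINTS = 763  # spectral points per observation


def wavenumber_axis(n_points=N_POINTS, start=3800.0, stop=750.0):
    """Hypothetical axis: ASSUMES the points are evenly spaced over the stated range."""
    step = (stop - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def parse_spectra(lines, has_header=True):
    """Parse spectra CSV lines into lists of floats.

    The header row is an ASSUMPTION; pass has_header=False if the file has none.
    """
    rows = csv.reader(lines)
    if has_header:
        next(rows, None)
    return [[float(x) for x in row] for row in rows if row]


def parse_vars(lines):
    """Parse the vars CSV into (class, sample) tuples using the 'class'/'sample' columns."""
    return [(int(r["class"]), r["sample"]) for r in csv.DictReader(lines)]


def check_pair(spectra, labels, n_rows):
    """Raise ValueError if a spectra/vars pair does not match the described layout."""
    if not (len(spectra) == len(labels) == n_rows):
        raise ValueError("row counts of spectra and vars do not match")
    if any(len(s) != N_POINTS for s in spectra):
        raise ValueError("a spectrum does not have %d points" % N_POINTS)
    unknown = {c for c, _ in labels} - set(CLASS_NAMES)
    if unknown:
        raise ValueError("unknown class codes: %s" % sorted(unknown))


def sample_folds(labels, k=5):
    """Assign whole samples to k folds so one sample's spectra never straddle folds.

    Groups are keyed on (class, sample) because the description does not say
    whether sample indices are unique across the two classes.
    """
    groups = defaultdict(list)
    for row, key in enumerate(labels):
        groups[key].append(row)
    folds = [[] for _ in range(k)]
    for i, key in enumerate(sorted(groups)):
        folds[i % k].extend(groups[key])  # round-robin over samples, not spectra
    return folds


# Example usage with the training files named in the description:
# with open("linzhi.train-spectra.csv", newline="") as f:
#     X = parse_spectra(f)
# with open("linzhi.train-vars.csv", newline="") as f:
#     y = parse_vars(f)
# check_pair(X, y, n_rows=480)
# species = [CLASS_NAMES[c] for c, _ in y]
# folds = sample_folds(y, k=5)
```

If the files turn out to carry a leading row-name column, each parsed spectrum will have 764 values and `check_pair` will flag it; dropping the first field in `parse_spectra` would then be the fix.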

Files