Dataset_Machine_Learning_Exoplanets_2024
Description
The dataset used in this study consists of 5302 light curves collected by the Kepler telescope, each with approximately 60,000 data points. The data were sourced from NASA's Exoplanet Archive, focusing on Kepler Objects of Interest (KOIs). Relevant columns such as kepid, koi_disposition, koi_period, koi_time0bk, koi_duration, and koi_quarters were selected. The Lightkurve library was used to extract the light curves, yielding SAP (Simple Aperture Photometry) and PDCSAP (Pre-search Data Conditioning Simple Aperture Photometry) fluxes. Because of its precision, the PDCSAP flux was used for exoplanet detection. The light curves were normalized, missing data were filled by linear interpolation, and outliers were removed using a threshold of 2 standard deviations.

In an experimental evaluation of 16 algorithms with different parameter settings, the LightGBM algorithm performed best, achieving an accuracy of 82.92%.

For more details, refer to the article: Macedo, B. H. D., & Zalewski, W. (2024). Automated Light Curve Processing for Exoplanet Detection Using Machine Learning Algorithms. Rev. Bras. de Iniciação Científica (RBIC), IFSP Itapetininga, 11, e024021, 1-27. Access the code at: https://brunohdmacedo.engineer/project.html.
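The preprocessing described above (linear interpolation of missing data, normalization, and removal of outliers beyond 2 standard deviations) can be sketched with NumPy as follows. This is a minimal illustration, not the study's own script: the function name `preprocess_flux`, the choice of median division as the normalization, the order of the three steps, and the example values are all assumptions made here for clarity.

```python
import numpy as np


def preprocess_flux(time, flux, n_sigma=2.0):
    """Interpolate gaps, normalize, and clip outliers in one light curve.

    Hypothetical helper: assumes `time` is finite and sorted in
    increasing order, and that `flux` marks missing samples as NaN.
    """
    time = np.asarray(time, dtype=float)
    flux = np.asarray(flux, dtype=float)

    # 1. Fill missing samples by linear interpolation over time.
    good = np.isfinite(flux)
    flux = np.interp(time, time[good], flux[good])

    # 2. Normalize. Dividing by the median is an assumption; the
    #    description above does not name the normalization used.
    flux = flux / np.median(flux)

    # 3. Keep only points within n_sigma standard deviations of the mean.
    keep = np.abs(flux - flux.mean()) <= n_sigma * flux.std()
    return time[keep], flux[keep]


# Made-up values for illustration: one missing sample and one spike.
example_time = np.arange(10.0)
example_flux = np.array(
    [100.0, 101.0, np.nan, 99.0, 100.0, 100.0, 150.0, 100.0, 101.0, 99.0]
)
clean_time, clean_flux = preprocess_flux(example_time, example_flux)
print(clean_time)
print(clean_flux)
```

In this example the missing sample is filled from its two neighbours and the spike at time 6 falls outside the 2-standard-deviation band, so it is dropped together with its timestamp.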
Files
Steps to reproduce
This tutorial describes how to use the code and data from the study to process light curves and detect exoplanets with the LightGBM algorithm.

1. Ensure that Python (version 3.6 or higher) is installed.
2. Clone the GitHub repository and install the required libraries: `numpy`, `pandas`, `lightgbm`, `scikit-learn` (imported as `sklearn`), and `lightkurve`.
3. Download the Kepler Objects of Interest (KOI) table from NASA's Exoplanet Archive and save it as a CSV file. Ensure that it includes columns such as `kepid`, `koi_disposition`, `koi_period`, `koi_time0bk`, `koi_duration`, and `koi_quarters`.
4. Use the Lightkurve library to extract the light curves for the selected KOIs.
5. Load and preprocess the data with the provided scripts: normalize the light curves, fill missing data by linear interpolation, and remove outliers using a threshold of 2 standard deviations.
6. Train the LightGBM model on the preprocessed data and evaluate its performance. The study reports an accuracy of 82.92%, so a result close to that value is expected.

For comprehensive details, refer to the original article: Macedo, B. H. D., & Zalewski, W. (2024). Automated Light Curve Processing for Exoplanet Detection Using Machine Learning Algorithms. Rev. Bras. de Iniciação Científica (RBIC), IFSP Itapetininga, 11, e024021, 1-27.

Access the article at: https://periodicoscientificos.itp.ifsp.edu.br/index.php/rbic/article/view/1403.
Access the code at: https://brunohdmacedo.engineer/project.html.
Access the data at: https://doi.org/10.17632/wctcv34962.3.
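As a rough sketch of the KOI table step, the snippet below reads a CSV with pandas and keeps the columns named above. It is not taken from the study's code: `load_koi_table` and `KOI_COLUMNS` are hypothetical names, skipping `#`-prefixed header lines and reading `koi_quarters` as text are assumptions about the downloaded file, and the two rows are made-up placeholders rather than real KOI entries.

```python
import io

import pandas as pd

# Columns named in the description of the dataset.
KOI_COLUMNS = [
    "kepid",
    "koi_disposition",
    "koi_period",
    "koi_time0bk",
    "koi_duration",
    "koi_quarters",
]


def load_koi_table(source):
    """Read a KOI table CSV and keep only the columns listed above.

    Hypothetical helper. Assumptions: header lines, if any, start
    with '#', and koi_quarters is a string of per-quarter flags that
    should be kept as text rather than parsed as a number.
    """
    table = pd.read_csv(source, comment="#", dtype={"koi_quarters": str})
    missing = [name for name in KOI_COLUMNS if name not in table.columns]
    if missing:
        raise ValueError(f"KOI table is missing columns: {missing}")
    return table[KOI_COLUMNS].copy()


# Made-up rows for illustration only; these are not real KOI entries.
example_csv = io.StringIO(
    "# illustrative header comment\n"
    "kepid,kepoi_name,koi_disposition,koi_period,koi_time0bk,"
    "koi_duration,koi_quarters\n"
    "1,K00001.01,CONFIRMED,10.0,130.0,3.0,1111\n"
    "2,K00002.01,FALSE POSITIVE,2.5,131.5,1.5,1011\n"
)
koi = load_koi_table(example_csv)
print(koi)
```

With a real download, passing the path of the saved CSV file instead of `example_csv` should work the same way, provided the file uses these column names.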