Datasets and R Markdown files for the article "Effect size and inferential statistical techniques coupled with machine learning for assessing the association between prolactin concentration and metabolic homeostasis" submitted to Clinica Chimica Acta

Published: 27 November 2023 | Version 6 | DOI: 10.17632/z7h7mndnwc.6
Contributors:
, Rafael Henriques Jácomo, Lidia Freire Abdalla Nery,

Description

Datasets

The dataset dataset_individual_results.xlsx contains 65,795 anonymized laboratory results from tests of Prolactin (PRL), Glucose (GLU), Insulin (INS), Total Cholesterol (TC), HDL-c, LDL-c, and Triglycerides (TG) performed on adult patients of both sexes in the first half of 2018. A second dataset, dataset_mean_results.xlsx, contains 106 average results obtained by stratifying the 65,795 results into 106 partitions according to prolactin concentration range. Within each partition, the mean concentrations of prolactin and of the glucose and lipid metabolism tests were computed.
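The stratification of individual results into prolactin partitions can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: the partition boundaries in `edges`, the toy `records`, and the helper `partition_means` are all hypothetical, and only one test (glucose) is averaged alongside prolactin.

```python
from bisect import bisect_right
from statistics import mean

# Toy (prolactin, glucose) pairs: illustrative values only, NOT taken from the datasets.
records = [
    (5.2, 88.0), (7.9, 91.0), (12.4, 95.0),
    (14.8, 97.0), (21.0, 102.0), (26.5, 104.0),
]

# Hypothetical prolactin partition boundaries; the 106 real ranges are not given here.
edges = [0.0, 10.0, 20.0, 30.0]


def partition_means(records, edges):
    """Assign each record to the range [edges[i], edges[i+1]) containing its
    prolactin value and return (mean prolactin, mean glucose) per non-empty range."""
    groups = {}
    for prl, glu in records:
        i = bisect_right(edges, prl) - 1
        if 0 <= i < len(edges) - 1:  # skip values outside the covered ranges
            groups.setdefault(i, []).append((prl, glu))
    return [
        (mean(p for p, _ in groups[i]), mean(g for _, g in groups[i]))
        for i in sorted(groups)
    ]


if __name__ == "__main__":
    for prl_mean, glu_mean in partition_means(records, edges):
        print(prl_mean, glu_mean)
```

Each returned tuple summarizes one partition; extending the sketch to insulin and the lipid tests would mean averaging each additional column in the same way.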

Steps to reproduce

The files named 1_PROJECT.Rproj are RStudio project files that store project settings and keep all related analyses, scripts, and data in a central location. Opening one in RStudio automatically sets the project directory as the working directory, which simplifies project management and collaboration.

There are five .Rmd files:

1) INSTALLATION_R_Packages.Rmd: This R Markdown file contains a script that installs and loads the R packages required by the project. The script lists the packages, checks whether each one is already installed, and installs any missing packages along with their dependencies. It then loads the packages into the R session so that all required libraries are available. The script thus serves as a preliminary setup step that streamlines package management for the project.

2) SCRIPT_Desc.Stats,HiMADiG-SEA,ML_model.Rmd: This R Markdown file performs descriptive and inferential statistical analyses, calculates effect sizes, and implements an approach termed HiMADiG-SEA (Hierarchical Multicriteria Analysis of Differences between Groups - Statistical and Effect size Approach). It also applies a machine learning model to estimate the inflection point and to predict the mean results of the glucose and lipid metabolism tests from the mean prolactin results. Finally, it generates all the graphs needed to interpret the results.

3) INSTALLATION_R_Packages.Rmd: As in 1), this script lists the required R packages, installs any that are missing along with their dependencies, and loads them into the R session.

4) SCRIPT_IPPlo_CI95_pvalue.Rmd: This R Markdown file contains a script that plots, in a single graph, the inflection points and their 95% confidence intervals (CI95%) for several metabolic parameters: HOMA-IR, glucose, total cholesterol, LDL-c, HDL-c, and triglycerides. The script also computes p-values using the method of Knol et al. (cited in the article) for the scenario of 0% overlap between two adjacent confidence intervals. The resulting analyses and visualizations help compare and interpret the inflection points of the glucose and lipid metabolism tests.

5) RMask_Anonymizer.Rmd: Developed by the LabR group, this script anonymizes data using a hashing method with a "salt" (secret key). When combined with an "Age" column, it generates irreversible identifiers. It supports CSV, XLSX, and XLS files.
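The effect-size calculation in SCRIPT_Desc.Stats,HiMADiG-SEA,ML_model.Rmd is not detailed in this description. As an illustration only, the sketch below uses Cohen's d with a pooled standard deviation, one common effect-size measure for the difference between two groups (for example, two prolactin partitions). Whether HiMADiG-SEA uses this measure or another one is an assumption to check against the script itself; the helper name `cohens_d` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, standardized by the pooled
    sample standard deviation (hypothetical helper, for illustration)."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two values")
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Toy values only, not from the datasets.
    group_low = [1, 2, 3, 4, 5]
    group_high = [3, 4, 5, 6, 7]
    print(round(cohens_d(group_low, group_high), 3))  # -1.265
```

Under Cohen's conventional benchmarks, |d| values of about 0.2, 0.5, and 0.8 are read as small, medium, and large effects. Unlike a p-value, d does not shrink or grow with sample size alone, which is why effect sizes are reported alongside inferential tests.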
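The machine learning model that SCRIPT_Desc.Stats,HiMADiG-SEA,ML_model.Rmd uses to locate the inflection point is not specified in this description. To make the idea concrete, the sketch below swaps in a simpler, classical technique: two-segment linear regression, in which every admissible split of the x-sorted data is tried and the split that minimizes the total residual sum of squares is taken as the breakpoint. Here x would play the role of mean prolactin and y a mean metabolic result; the names `ols_sse`, `find_breakpoint`, and `min_points` are hypothetical.

```python
def ols_sse(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return the residual sum of squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0  # flat fit if all x values are identical
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def find_breakpoint(xs, ys, min_points=3):
    """Return the split location (midpoint between adjacent x values) that minimizes
    the combined SSE of two separately fitted least-squares lines. Each segment
    keeps at least `min_points` points (hypothetical parameter)."""
    pts = sorted(zip(xs, ys))
    sx = [p[0] for p in pts]
    sy = [p[1] for p in pts]
    if len(sx) < 2 * min_points:
        raise ValueError("not enough points for two segments")
    best_sse, best_k = None, None
    for k in range(min_points, len(sx) - min_points + 1):
        sse = ols_sse(sx[:k], sy[:k]) + ols_sse(sx[k:], sy[k:])
        if best_sse is None or sse < best_sse:
            best_sse, best_k = sse, k
    return (sx[best_k - 1] + sx[best_k]) / 2


if __name__ == "__main__":
    # Toy data (illustrative only): flat, then a jump and a rising trend after x = 4.
    xs = list(range(10))
    ys = [0, 0, 0, 0, 0, 15, 16, 17, 18, 19]
    print(find_breakpoint(xs, ys))  # 4.5
```

Fitting the two segments independently allows a jump at the breakpoint; constraining the lines to meet (a continuous hinge model) is a common refinement when the response is expected to change slope rather than level.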
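The p-value for two adjacent confidence intervals, as computed in SCRIPT_IPPlo_CI95_pvalue.Rmd, can be illustrated with a normal approximation: recover each standard error from its 95% CI, then test the difference between the two estimates with a z-test. This is a minimal sketch assuming symmetric, normal-based CIs and independent estimates. It conveys the idea behind the CI-overlap calculation attributed to Knol et al., but it is not necessarily the exact formula implemented in the script; the name `p_from_two_cis` is hypothetical.

```python
from math import erfc, sqrt

Z95 = 1.959963984540054  # two-sided 95% quantile of the standard normal distribution


def p_from_two_cis(est1, lo1, hi1, est2, lo2, hi2):
    """Two-sided p-value for the difference between two independent estimates,
    recovering each standard error from its symmetric 95% CI."""
    se1 = (hi1 - lo1) / (2 * Z95)
    se2 = (hi2 - lo2) / (2 * Z95)
    z = abs(est1 - est2) / sqrt(se1 ** 2 + se2 ** 2)
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(z))


if __name__ == "__main__":
    # Two equal-width intervals that just touch (0% overlap).
    p = p_from_two_cis(0.0, -Z95, Z95, 2 * Z95, Z95, 3 * Z95)
    print(round(p, 4))  # 0.0056
```

For equal standard errors, intervals that just touch give z = 1.96 × √2 ≈ 2.77, so p ≈ 0.0056: well below 0.05, which is why non-overlapping 95% CIs are a conservative criterion for a significant difference.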
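The salted hashing idea behind RMask_Anonymizer.Rmd can be sketched in a few lines. This is not the LabR implementation: the SHA-256 algorithm, the `|` separator, and the function and argument names are assumptions made for illustration.

```python
import hashlib


def anonymize(patient_id: str, age: int, salt: str) -> str:
    """Return a one-way pseudonymous identifier: the SHA-256 hex digest of the
    salt, patient identifier, and age joined by '|' (hypothetical scheme)."""
    payload = f"{salt}|{patient_id}|{age}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    secret = "keep-this-secret"  # hypothetical salt; store it separately from the data
    print(anonymize("ID-0001", 42, secret))  # 64 hexadecimal characters
```

Because the same salt, identifier, and age always produce the same digest, records belonging to one patient can still be linked after anonymization, while the original identifier cannot be read back from the digest. The salt must stay secret: without it, short identifiers could be recovered by hashing every candidate value.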

Institutions

Universidade de Brasilia

Categories

Neuroscience, Health Sciences, Algorithms, Homeostasis, Machine Learning, Inferential Statistics, Metabolism, Endocrinologist

Funding

Sabin Diagnóstico e Saúde

Licence