Nutritional Composition and Derived Indices of Fish Species Sold in Portugal

Published: 25 December 2025 | Version 1 | DOI: 10.17632/3wvbgtwkfz.1
Contributors:
Ana Olívia Jorge

Description

This repository contains the data and analytical materials supporting the study “Nutritional Composition and Derived Indices of Fish Species Sold in Portugal.” The dataset comprises curated nutritional composition data for 40 fish species commonly available in the Portuguese market, harmonized from the Portuguese Food Composition Table (INSA, 2023) and complementary European and international sources. The repository includes: (i) an Excel file (Annex 2) containing the full curated dataset with macronutrients, fatty acid classes, vitamins, minerals, and derived nutritional indices (PUFA/SFA, Na/K, Ca/P, Fe/Zn, EPA/DHA, and a modified Nutrient Rich Foods Index, NRFfish); (ii) a Python analysis file used to perform all data processing, statistical analyses, and figure generation; and (iii) two annexes in PDF format. Annex 1 provides detailed methodological descriptions and formulas for the calculation of nutritional indices and ratios. Annex 3 documents the principal component analysis (PCA) dimensionality, nutrient loadings, and k-means clustering diagnostics. Together, these materials make the analyses presented in the associated manuscript transparent and reproducible, including nutrient density calculations, multivariate analysis, and clustering of fish species into lean, oily, and outlier nutritional profiles.
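As an illustration of how the simple ratio indices listed above could be derived from a composition table, the following is a minimal sketch and not the repository's own script. The column names, the two example rows, and the choice of plain mass ratios are assumptions made for the example; the authoritative definitions and formulas are those in Annex 1. The modified Nutrient Rich Foods Index is deliberately left out because its formula is given only there.

```python
import numpy as np
import pandas as pd

# Hypothetical column names (per 100 g edible portion); the headers in the
# real Excel dataset may differ and should be mapped accordingly.
RATIOS = {
    "PUFA/SFA": ("pufa_g", "sfa_g"),
    "Na/K": ("na_mg", "k_mg"),
    "Ca/P": ("ca_mg", "p_mg"),
    "Fe/Zn": ("fe_mg", "zn_mg"),
    "EPA/DHA": ("epa_g", "dha_g"),
}


def add_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with one new column per ratio in RATIOS."""
    out = df.copy()
    for name, (numerator, denominator) in RATIOS.items():
        # Assumption: a zero denominator is treated as missing, not as infinity.
        out[name] = out[numerator] / out[denominator].replace(0, np.nan)
    return out


# Placeholder values invented for this example; they are NOT taken from the dataset.
example = pd.DataFrame(
    {
        "pufa_g": [1.2, 0.3], "sfa_g": [0.6, 0.0],
        "na_mg": [60.0, 90.0], "k_mg": [300.0, 360.0],
        "ca_mg": [20.0, 30.0], "p_mg": [200.0, 240.0],
        "fe_mg": [1.0, 0.4], "zn_mg": [0.5, 0.8],
        "epa_g": [0.3, 0.1], "dha_g": [0.6, 0.2],
    },
    index=["species_a", "species_b"],
)

ratios = add_ratios(example)
print(ratios[list(RATIOS)])
```

Mapping a zero denominator to a missing value keeps infinite ratios out of later steps such as standardization and PCA; whether the original analysis handles this case the same way is not stated here.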

Steps to reproduce

1. Download repository contents. Download all files from this repository, including the Excel dataset (Annex 2), the Python analysis file, and the accompanying PDF annexes (Annex 1 and Annex 3).
2. Review methodological details. Consult Annex 1 for the definitions, formulas, and assumptions used to calculate derived nutritional indices and ratios (e.g., NRFfish, PUFA/SFA, Na/K, Ca/P, Fe/Zn, EPA/DHA). Review Annex 3 for details on the principal component analysis (PCA), dimensionality selection, nutrient loadings, and k-means clustering diagnostics.
3. Set up the computational environment. Use Python (version 3.11 or compatible). Required packages include: pandas, numpy, scipy, scikit-learn, matplotlib, and seaborn. The analyses were originally run in JupyterLab, but the script can be executed in any compatible Python environment.
4. Run the analysis code. Execute the provided Python file, ensuring that the Excel dataset is located in the same directory (or that the file path in the script is updated accordingly). The script performs data loading, calculation of derived indices, descriptive statistics, PCA, k-means clustering, and figure generation.
5. Reproduce results and figures. Running the script will reproduce all numerical results, clustering outcomes, and figures reported in the associated manuscript. Output figures correspond to nutrient density, fatty-acid ratios, mineral ratios, PCA score plots, and cluster visualizations.
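For orientation before running the provided script, the multivariate part of the workflow (standardization, PCA, k-means) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `pca_kmeans`, the choice of two components and three clusters, clustering on the PCA scores rather than on the standardized nutrients, and the random demonstration matrix are all hypothetical. The settings actually used are documented in Annex 3 and in the repository's Python file.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_kmeans(features: pd.DataFrame, n_components: int = 2,
               n_clusters: int = 3, seed: int = 0):
    """Standardize nutrients, project onto principal components, then cluster.

    Returns (scores, loadings, labels). All defaults are assumptions made
    for this sketch, not values taken from the study.
    """
    # Put nutrients measured in different units on a common scale.
    scaled = StandardScaler().fit_transform(features)

    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)

    # Loadings table: one row per nutrient, one column per component.
    loadings = pd.DataFrame(
        pca.components_.T,
        index=features.columns,
        columns=[f"PC{i + 1}" for i in range(n_components)],
    )

    # Assumption: clustering is run on the PCA scores.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scores)
    return scores, loadings, labels


# Synthetic stand-in for the 40-species table; random numbers, not real data.
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(40, 6)),
    columns=[f"nutrient_{i}" for i in range(6)],
)

scores, loadings, labels = pca_kmeans(demo)
print(scores.shape, loadings.shape, np.bincount(labels))
```

To try the sketch on the real data, `demo` would be replaced by the numeric nutrient columns loaded from the Excel dataset, for example with `pandas.read_excel`; the file name and sheet layout are not given here and must be taken from the downloaded files.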

Institutions

Universidade do Porto, REQUIMTE LAQV Porto, Universidade de Vigo

Categories

Food Science, Public Health, Nutrition, Data Analysis
