Cultural Infrastructure and Reading Inequality

Published: 12 May 2026 | Version 1 | DOI: 10.17632/zktmk9gn4w.1
Contributor:
Sergio A Berumen

Description

This dataset provides a regional, place-based compilation linking annual book-reading participation to socioeconomic context and cultural infrastructure across Spain’s 17 autonomous communities. It integrates six data blocks: (1) annual reading participation (share of respondents who read at least one book in the previous year), (2) official public library statistics (e.g., libraries per 100,000 inhabitants and registered users), (3) regional income indicators (net income per person), (4) local public expenditure on public libraries, (5) directory-based counts of municipal public libraries, and (6) independent bookshop density. The merged master file contains 17 territorial units and 152 harmonized variables capturing reading outcomes, infrastructure provision and partial uptake, public funding proxies, and market-based access points. The dataset is designed for exploratory territorial analysis and descriptive modeling (e.g., correlations and parsimonious regressions) rather than causal inference, given the small number of regional observations and the use of aggregated indicators. It supports research on cultural participation, territorial inequality, and cultural policy evaluation, and it enables comparative diagnostics of how socioeconomic gradients relate to reading outcomes relative to standard infrastructure measures. Source documentation and variable definitions are provided in accompanying metadata files.

Files

Steps to reproduce

Files required:
master_dataset.csv (or .xlsx): merged regional dataset (17 autonomous communities, 152 variables).
analytic_dataset.csv (optional but recommended): reduced dataset used for modeling (annual reading participation + selected predictors).
data_dictionary.csv (or .xlsx): variable definitions, units, and source notes.
replication_script.R or replication_script.py: code to reproduce descriptive statistics, correlations, models, and figures/tables.

Software / environment:
R (≥4.2) with packages such as tidyverse, broom, ggplot2; or Python (≥3.10) with pandas, numpy, statsmodels, matplotlib. (No internet access required once files are downloaded.)

Reproduction workflow:
Download and unzip the dataset package to a local folder.
Set the working directory to that folder (R: setwd("..."); Python: run from the folder or set a project path).
Run the replication script end-to-end.

The script performs:
Data loading and consistency checks (17 unique autonomous communities).
Construction (or loading) of the reduced analytical file used for modeling (annual reading rate + net income per person + infrastructure indicators such as libraries per 100,000 inhabitants, registered users, local library expenditure, municipal libraries, and bookshop density).
Descriptive statistics and regional distributions (means, ranges, SDs).
Correlation analysis (Pearson and Spearman) between annual reading and candidate predictors.
Estimation of parsimonious OLS models, including Model A (Reading ~ net income per person), expanded models adding one infrastructure variable at a time, and one combined model.
Model comparison metrics: R², adjusted R², and leave-one-out cross-validation (LOOCV) RMSE.

Outputs produced (saved to /outputs/ or printed to console, depending on the script):
Correlation table(s) and ranking of predictors.
Regression tables for Models A–G and model-comparison summary (R²/adjusted R²/LOOCV RMSE).
Figures used in the manuscript (e.g., reading rates by region; income–reading scatter; model-comparison plot).

Expected results (high-level checks):
Income variables show the strongest bivariate associations with annual reading; net income per person explains a substantial share of regional variation in the benchmark model.
Adding libraries per 100,000 inhabitants improves in-sample fit but weakens LOOCV predictive stability relative to the income-only model.
Bookshop density, library expenditure, and directory-based library counts show weaker and less consistent associations.
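
For readers who want to see what the correlation and model-comparison steps involve before opening the replication script, the following is a minimal sketch, not the script shipped with the dataset. It uses only numpy (instead of statsmodels) to keep the example self-contained, and it runs on synthetic data: the variable names (income, libraries, reading), the generating coefficients, and the helper functions are all assumptions made for illustration and do not reproduce any value in the dataset.

```python
import numpy as np


def rank(values):
    """Return 1-based ranks, averaging ranks within ties."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        mask = values == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def ols_metrics(X, y):
    """Fit OLS with an intercept; return R2, adjusted R2 and LOOCV RMSE."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])

    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

    # Leave-one-out: refit without region i, then predict region i.
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        b, *_ = np.linalg.lstsq(design[keep], y[keep], rcond=None)
        errors[i] = y[i] - design[i] @ b
    loocv_rmse = float(np.sqrt(np.mean(errors ** 2)))

    return {"r2": r2, "adj_r2": adj_r2, "loocv_rmse": loocv_rmse}


# Synthetic stand-in for 17 regions (NOT the real data; names are hypothetical).
rng = np.random.default_rng(0)
n_regions = 17
income = rng.normal(12000.0, 2000.0, n_regions)   # net income per person
libraries = rng.normal(10.0, 3.0, n_regions)      # libraries per 100,000
reading = 40.0 + 0.002 * income + rng.normal(0.0, 2.0, n_regions)

print("Pearson  (reading, income):", pearson(reading, income))
print("Spearman (reading, income):", spearman(reading, income))

# Income-only benchmark versus income plus one infrastructure variable.
model_a = ols_metrics(income, reading)
model_b = ols_metrics(np.column_stack([income, libraries]), reading)
print("Model A:", model_a)
print("Model B:", model_b)
```

Because the second model nests the first, its in-sample R² can never be lower, which is why the workflow also reports adjusted R² and LOOCV RMSE: with only 17 observations, an added predictor can raise in-sample fit while making out-of-sample predictions less stable.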

Institutions

Categories

Communication, Cultural Sociology, Sociological Analysis, Cultural Framework

Licence