LIWC Dataset of Known-Victim Rape Suspect Statements (Chile, 2017–2021)

Published: 13 April 2026 | Version 3 | DOI: 10.17632/8d5z5wp3j6.3
Contributor:
Francisco Ceballos-Espinoza

Description

This repository contains a fully de-identified, public-use analytic dataset derived from a cross-sectional secondary analysis of routinely collected police records from Chile. The public dataset does not include raw police statements, transcripts, narrative text, audio, direct identifiers, or any other source material from individual case files. It contains only numeric variables produced by prior text processing with LIWC-22 (v1.3.0): 51 psycholinguistic indicators across seven dimensions, plus total word count, together with the non-identifying analytic variables required for reproducibility. The underlying study examined 250 cases from 2017–2021, grouped by victim age category (<14 years vs ≥14 years), and evaluated between-group differences in LIWC-derived linguistic markers. LIWC-22 was applied to the original statements before public release, so the repository provides only derived quantitative metrics, not language samples or recoverable verbal content. The shared files are intended exclusively to support transparency, reproducibility, and secondary methodological work in forensic psycholinguistics. Because the repository contains only de-identified, derived numeric outputs, it does not permit reconstruction of the original statements or re-identification of individual participants. These materials should not be interpreted as evidence of guilt, credibility, or investigative priority; they are reproducible research data documenting group-level linguistic associations within a police-investigation sample.
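To illustrate what a between-group comparison of a single LIWC-derived marker might look like, the following is a minimal sketch only. It computes a pooled-SD standardized mean difference (Cohen's d); the description does not state which tests or effect sizes the study used, so this is an assumed example, not the authors' method. The column names `victim_age_group` and `marker_x`, the helper names, and all values are hypothetical toy data, not taken from the dataset.

```python
from math import sqrt
from statistics import mean, stdev


def split_by_group(rows, group_key, value_key):
    """Collect the numeric values of one column, keyed by group label."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(float(row[value_key]))
    return groups


def cohens_d(a, b):
    """Standardized mean difference between two samples (pooled SD)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


# Toy rows shaped like csv.DictReader output (string values).
# Column names and numbers are made up for illustration only.
rows = [
    {"victim_age_group": "<14", "marker_x": "2.0"},
    {"victim_age_group": "<14", "marker_x": "4.0"},
    {"victim_age_group": "<14", "marker_x": "6.0"},
    {"victim_age_group": ">=14", "marker_x": "1.0"},
    {"victim_age_group": ">=14", "marker_x": "3.0"},
    {"victim_age_group": ">=14", "marker_x": "5.0"},
]

groups = split_by_group(rows, "victim_age_group", "marker_x")
d = cohens_d(groups["<14"], groups[">=14"])
print(round(d, 3))  # 0.5 for these toy values
```

With the real CSV, the rows would come from `csv.DictReader`, and the group and marker column names would have to be taken from the dataset's own header.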

Steps to reproduce

1. Load the public dataset: data/public/LIWC_known_victim_public_dataset_2026-04-02_v1.0.0.csv
2. Run the public analytical scripts in the following order:
   script/public/00_config.R
   script/public/02_descriptive_tables_S1_S7.R
   script/public/03_inferential_tables_S8_S9.R
   script/public/03b_sensitivity_narrative_length_publi.R
   script/public/04_export_outputs.R
   script/public/05_figure1_original_metrics_dual_panel_labels_AB.R
   script/public/06_figure2_distribution_plots_FDR_robust_markers.R
   script/public/07_figureS2_lollipop_all_markers_ordered_by_effect_size.R
3. Outputs (tables, logs, manifest, and figures) will be generated in: data/outputs/[RUN_TAG]/
4. Note: Scripts in `script/restricted/` require access to the restricted source matrix `data/restricted/liwctotal.xlsx` and are included for transparency only; they are not executable using the public dataset alone.
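The run order in step 2 can be scripted. The following is a minimal sketch, assuming `Rscript` is available on the PATH and that commands are issued from the repository root. The helper names `build_commands` and `run_all` are hypothetical and not part of the repository; the script paths are copied verbatim from the steps above. As written, the sketch only prints the commands (a dry run) and does not execute anything.

```python
import subprocess

# Script paths copied verbatim from step 2, in the stated run order.
PUBLIC_SCRIPTS = [
    "script/public/00_config.R",
    "script/public/02_descriptive_tables_S1_S7.R",
    "script/public/03_inferential_tables_S8_S9.R",
    "script/public/03b_sensitivity_narrative_length_publi.R",
    "script/public/04_export_outputs.R",
    "script/public/05_figure1_original_metrics_dual_panel_labels_AB.R",
    "script/public/06_figure2_distribution_plots_FDR_robust_markers.R",
    "script/public/07_figureS2_lollipop_all_markers_ordered_by_effect_size.R",
]


def build_commands(scripts=PUBLIC_SCRIPTS, rscript="Rscript"):
    """Return one [rscript, path] argument list per script, preserving order."""
    return [[rscript, path] for path in scripts]


def run_all(commands):
    """Run each command in order; check=True raises on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Dry run: show what would be executed, without calling Rscript.
for cmd in build_commands():
    print(" ".join(cmd))
```

Calling `run_all(build_commands())` would execute each script in its own R process. If the later scripts depend on objects created by `00_config.R` in a shared session (the steps above do not say), they would instead need to be sourced within a single R session.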

Categories

Applied Linguistics, Computational Linguistics, Forensic Psychology, Criminal Profiling, Criminal Behavior

Licence