INDIAN CRIME DATA, 2020

Published: 30 October 2025 | Version 1 | DOI: 10.17632/59hn25bn9f.1
Contributor:
SHRUTI SHRUTI

Description

The framework analyses data sources, employs data pre-processing techniques, applies machine learning algorithms, and incorporates investigative support to enhance crime detection and prediction capabilities. It encompasses data collection, pre-processing, exploratory data analysis, feature selection and engineering, crime detection, crime prediction, investigation support, deployment and monitoring, as well as collaboration and knowledge sharing. By collecting relevant data from multiple sources (crime reports, arrest records, and footage), it ensures a comprehensive dataset. Data pre-processing techniques are employed to clean, normalize, and transform the collected data. Exploratory data analysis provides insights into crime patterns, trends, and correlations. Feature selection and engineering help identify the most relevant features for crime detection and prediction. Clustering algorithms are used to identify spatial patterns and crime clusters. The framework also supports investigation by applying techniques that identify relationships, associations, and networks between individuals, locations, and events. Deployment and monitoring ensure its integration into operational systems for real-time or batch processing. The framework encourages collaboration and knowledge sharing between law enforcement agencies, researchers, and data scientists.

The Spatio-Temporal Contextual Crime Predictor (STCCP) framework proposed in this research for investigating, detecting, and predicting crime using data mining is built upon a multi-iteration pipeline. This pipeline extracts and preprocesses structured tabular crime data from complex, multi-page PDF documents, addressing the challenge that data from India's National Crime Records Bureau (NCRB) are often not machine-readable. For crime detection and prediction, the STCCP framework primarily employs a Random Forest Regressor to forecast crime rates and identify potential hotspots, building a model that predicts crime incidence.
A key novelty of STCCP is its contextual feature enrichment via Large Language Models (LLMs), which transform raw, disparate data into explainable, indexed knowledge units that provide richer narrative context and abstract indicators for crime analysis. Fusing this LLM-derived context with geospatial and temporal modelling makes the framework's decisions interpretable, an advantage over traditional "black-box" approaches. Investigation support is provided by applying the trained Random Forest model, which helps identify patterns and extract interpretable decision rules from crime data. Random Forest thus serves as the framework's core classification and prediction tool for investigating, detecting, and predicting crimes. STCCP is reported to outperform state-of-the-art models in predictive accuracy, with the aim of enhancing crime-fighting capabilities, improving resource allocation, and contributing to safer communities.
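
The clean, normalize, and transform step mentioned in the description could look roughly like the sketch below. It is an illustration only, not the contributor's pipeline: the row layout, the function names (parse_number, clean_rows, min_max_normalise), and all values are hypothetical, and the sketch assumes that a crime rate is expressed as cases per lakh of population.

```python
import re

# Hypothetical rows as they might come out of a PDF table extractor:
# (state name, reported cases, population in lakhs). All values are made up.
RAW_ROWS = [
    ("  State A ", "1,250", "500.0"),
    ("State B", "300", "150.0"),
    ("Total", "1,550", "650.0"),  # summary row, not a state
    ("State C", "-", "80.0"),     # missing count
]


def parse_number(cell):
    """Return the cell as a float, or None if it is not a plain number."""
    text = cell.strip().replace(",", "")
    if re.fullmatch(r"\d+(\.\d+)?", text):
        return float(text)
    return None


def clean_rows(raw_rows):
    """Drop summary and incomplete rows; add cases per lakh of population."""
    cleaned = []
    for name, cases, population in raw_rows:
        name = " ".join(name.split())  # collapse stray whitespace
        cases_n = parse_number(cases)
        pop_n = parse_number(population)
        # Skip the summary row, unparseable cells, and zero populations.
        if name.lower() == "total" or cases_n is None or not pop_n:
            continue
        cleaned.append({
            "state": name,
            "cases": int(cases_n),
            "rate_per_lakh": cases_n / pop_n,
        })
    return cleaned


def min_max_normalise(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


CLEANED = clean_rows(RAW_ROWS)
NORMALISED = min_max_normalise([row["rate_per_lakh"] for row in CLEANED])
```

Rows that fail parsing are dropped here for simplicity; a real pipeline might instead flag them for manual review, since silently discarding records can bias the resulting crime rates.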

Files

Institutions

  • IK Gujral Punjab Technical University

Categories

Crime by Area, Crime Analysis

Licence