ClusterSense-FP: A Large-Scale Failure Prediction Dataset for CPU Cluster Infrastructure

Published: 23 June 2026 | Version 1 | DOI: 10.17632/wvdhr8yvnr.1
Contributors:

Description

ClusterSense-FP is a dataset for predictive failure analysis in CPU cluster computing environments. It contains operational and performance metrics collected from cluster nodes: CPU utilization, memory usage, disk I/O, network activity, temperature, power consumption, and system health indicators. It is intended to support the development of machine learning and artificial intelligence models for early failure detection, predictive maintenance, anomaly detection, and reliability assessment in distributed computing infrastructures. Researchers and practitioners can use it to improve system availability, reduce downtime, and make large-scale cluster environments more resilient.

Files

Steps to reproduce

1. Collect operational metrics from CPU cluster nodes, including CPU utilization, memory utilization, disk I/O, network throughput, temperature, and power consumption.
2. Monitor the cluster nodes continuously and record system health indicators.
3. Label each instance as normal operation or a failure event.
4. Clean the raw data by removing duplicates, handling missing values, and correcting inconsistencies.
5. Normalize numerical attributes where necessary.
6. Store the processed data in CSV format.
7. Use the dataset to train and evaluate machine learning models for failure prediction, anomaly detection, and predictive maintenance.
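
The cleaning, normalization, and storage steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the procedure actually used to build the dataset: the column names (`cpu_util`, `mem_util`, `temperature_c`) are hypothetical because the description does not list the schema, and median imputation and min-max scaling are assumed choices for "handling missing values" and "normalize".

```python
import csv
import io
import statistics

# Hypothetical column names: the dataset description does not list its schema.
NUMERIC_COLUMNS = ["cpu_util", "mem_util", "temperature_c"]


def _is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or str(value).strip() == ""


def clean_rows(rows):
    """Drop exact duplicate rows, then fill missing numeric values.

    Assumption: missing values are replaced by the column median.
    """
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))
    for col in NUMERIC_COLUMNS:
        present = [float(r[col]) for r in unique if not _is_missing(r[col])]
        fill = statistics.median(present) if present else 0.0
        for r in unique:
            r[col] = fill if _is_missing(r[col]) else float(r[col])
    return unique


def normalize_rows(rows):
    """Min-max scale each numeric column to [0, 1] (an assumed scheme)."""
    out = [dict(r) for r in rows]
    if not out:
        return out
    for col in NUMERIC_COLUMNS:
        values = [r[col] for r in out]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in out:
            # A constant column carries no information; map it to 0.0.
            r[col] = (r[col] - lo) / span if span else 0.0
    return out


def to_csv_text(rows):
    """Serialize the processed rows as CSV text, header first."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = [
        {"cpu_util": "20", "mem_util": "40", "temperature_c": "50", "failure": "0"},
        {"cpu_util": "20", "mem_util": "40", "temperature_c": "50", "failure": "0"},
        {"cpu_util": "", "mem_util": "60", "temperature_c": "70", "failure": "1"},
    ]
    print(to_csv_text(normalize_rows(clean_rows(raw))))
```

The label column is left untouched so that the processed rows can be passed directly to a model-training step. If scaling is applied before a train/test split, the minimum and maximum should be computed on the training portion only to avoid leaking information from the test data.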

Institutions

Categories

Computer Science, Artificial Intelligence, Data Science, Machine Learning

Licence