ClusterSense-SS: A Large-Scale Smart Scheduling Dataset for Intelligent Cluster Resource Management

Published: 23 June 2026 | Version 1 | DOI: 10.17632/wt86wfky46.1

Description

The ClusterSense-SS dataset is a large-scale collection of job scheduling and resource allocation records from CPU cluster computing systems. It covers job characteristics, resource requirements, execution times, queue waiting times, priority levels, CPU and memory allocations, and scheduling outcomes. It is intended to support research in intelligent workload scheduling, resource optimization, and AI-driven cluster management. The dataset can be used to develop and evaluate machine learning models that aim to schedule tasks efficiently, reduce execution latency, balance resource utilization, and improve overall system performance in high-performance computing environments.
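
As an illustration of how such records might be consumed, the sketch below models a single scheduling record and computes two common scheduling metrics: mean queue waiting time and mean slowdown (turnaround time divided by execution time). The field names (`job_id`, `priority`, `cpu_cores`, `memory_gb`, `wait_s`, `run_s`) and the sample values are assumptions made for illustration only. The dataset's actual column names and units are not stated here and should be checked against the CSV header.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class JobRecord:
    # Hypothetical fields; the real column names and units may differ.
    job_id: str
    priority: int
    cpu_cores: int
    memory_gb: float
    wait_s: float  # queue waiting time, assumed to be in seconds
    run_s: float   # execution time, assumed to be in seconds and > 0


def slowdown(job: JobRecord) -> float:
    """Turnaround time divided by execution time (1.0 means no queueing delay)."""
    return (job.wait_s + job.run_s) / job.run_s


def summarize(jobs: list[JobRecord]) -> dict[str, float]:
    """Return mean waiting time and mean slowdown for a non-empty job list."""
    return {
        "mean_wait_s": mean(j.wait_s for j in jobs),
        "mean_slowdown": mean(slowdown(j) for j in jobs),
    }


# Made-up sample records, not taken from the dataset.
SAMPLE_JOBS = [
    JobRecord("j1", priority=1, cpu_cores=4, memory_gb=8.0, wait_s=10.0, run_s=100.0),
    JobRecord("j2", priority=3, cpu_cores=16, memory_gb=64.0, wait_s=30.0, run_s=60.0),
]

if __name__ == "__main__":
    print(summarize(SAMPLE_JOBS))
```

Metrics of this kind could serve as baselines when comparing a learned scheduling policy against the recorded scheduling outcomes.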

Steps to reproduce

1. Generate or collect workload scheduling records from a CPU cluster environment.
2. Record job-related attributes such as job priority, requested CPU cores, memory requirements, execution time, waiting time, and resource allocation.
3. Track scheduling decisions and execution outcomes.
4. Perform data cleaning to remove incomplete or inconsistent records.
5. Convert categorical variables into machine-readable formats if necessary.
6. Save the processed dataset in CSV format.
7. Apply scheduling and optimization algorithms or machine learning techniques to evaluate resource allocation and scheduling performance.
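
The cleaning, encoding, and CSV export steps above might be sketched as follows, using only the Python standard library. This is a minimal sketch under stated assumptions, not the procedure actually used to build the dataset: the column names, the priority labels (`low`, `medium`, `high`), and the consistency rules (no negative times, positive resource requests) are all hypothetical.

```python
import csv
import io

# Hypothetical column names; adjust to the actual CSV header.
NUMERIC_COLUMNS = ("cpu_cores", "memory_gb", "wait_s", "run_s")
# Assumed category labels and an assumed ordinal encoding.
PRIORITY_CODES = {"low": 0, "medium": 1, "high": 2}


def clean_rows(rows):
    """Yield cleaned rows, skipping incomplete or inconsistent records."""
    for row in rows:
        # Incomplete or malformed: a missing, empty, or surplus field.
        if any(not isinstance(v, str) or v.strip() == "" for v in row.values()):
            continue
        try:
            numbers = {col: float(row[col]) for col in NUMERIC_COLUMNS}
        except ValueError:
            continue  # non-numeric value in a numeric column
        # Inconsistent (assumed rules): negative times or non-positive requests.
        if numbers["wait_s"] < 0 or numbers["run_s"] < 0:
            continue
        if numbers["cpu_cores"] <= 0 or numbers["memory_gb"] <= 0:
            continue
        priority = row["priority"].strip().lower()
        if priority not in PRIORITY_CODES:
            continue  # unknown category label
        yield {"job_id": row["job_id"], "priority": PRIORITY_CODES[priority], **numbers}


def clean_csv(raw_text: str) -> str:
    """Read raw CSV text and return the cleaned dataset as CSV text."""
    reader = csv.DictReader(io.StringIO(raw_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["job_id", "priority", *NUMERIC_COLUMNS], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(clean_rows(reader))
    return out.getvalue()


# Made-up raw input: j2 has an empty field, j3 has a negative waiting time.
RAW = (
    "job_id,priority,cpu_cores,memory_gb,wait_s,run_s\n"
    "j1,High,4,8,10,100\n"
    "j2,low,,16,5,50\n"
    "j3,medium,2,4,-1,20\n"
)

if __name__ == "__main__":
    print(clean_csv(RAW), end="")
```

With the sample input, only `j1` survives cleaning. An ordinal encoding is used for priority because the assumed labels have a natural order; a one-hot encoding may suit categorical columns without one.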

Categories

Computer Science, Artificial Intelligence, Machine Learning, Resource Management
