Enhanced GoCJ: Google Cloud Jobs Dataset

Published: 20 April 2026 | Version 4 | DOI: 10.17632/r93dfjvvv6.4
Contributors: SHAHID KAMAL, Muazzam A Khan Khattak, Toqeer Ali Syed

Description

The GoCJ dataset comprises multiple files, each containing the sizes of a specified number of jobs, expressed in Million Instructions (MI) and derived from workload behaviors observed in Google cluster traces. The name of each file indicates the number of jobs it contains; for example, GoCJ_Dataset_1000 includes 1000 jobs along with their associated SLA classes and arrival times. In this study, a modified version of the GoCJ dataset is employed. Each dataset file consists of three columns: (i) job length in Million Instructions (MI), (ii) Service Level Agreement class (SLA: {1, 2, 3}), representing different priority levels, and (iii) job arrival time, which captures realistic workload submission behavior. The experimental evaluation is conducted using the following dataset files: GoCJ_Dataset_1000.csv, GoCJ_Dataset_2000.csv, GoCJ_Dataset_3000.csv, GoCJ_Dataset_4000.csv, GoCJ_Dataset_5000.csv, and GoCJ_Dataset_6000.csv, enabling performance analysis under increasing workload scales.

The file Original_Enhanced_Dataset.txt contains the 50 seed job sizes required as input for both the Java-based generator (EnhancedGoCJGenerator.java) and the Excel-based generator (GoCJ_Enhanced_Generator.xlsx) to reproduce datasets of any desired size while preserving the original workload distribution properties. The source code of the Java-based generator is also available online at https://github.com/Mohsin-Nawaz/Enhanced_GoCJ-Java-Generator
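As a reading aid, the three-column layout of the dataset files described above could be loaded along the following lines. This is a minimal sketch, not part of the published dataset or its generators: it assumes the files are comma-separated, that the columns appear in the order listed (job length, SLA class, arrival time), and that a file may or may not start with a header row. The names Job, parse_jobs, and load_jobs are hypothetical, and the sample values are invented purely for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Job:
    length_mi: float     # job length in Million Instructions (MI)
    sla: int             # SLA class, one of {1, 2, 3}
    arrival_time: float  # arrival time; the unit is not stated in the description


def parse_jobs(lines: Iterable[str]) -> List[Job]:
    """Parse rows of (length, SLA, arrival time).

    Assumptions: comma-separated values, columns in the order given in the
    dataset description, and an optional header row that is skipped.
    """
    jobs: List[Job] = []
    first_row = True
    for row in csv.reader(lines):
        if not any(cell.strip() for cell in row):
            continue  # ignore blank lines
        if first_row:
            first_row = False
            try:
                float(row[0])
            except ValueError:
                continue  # first row is not numeric: treat it as a header
        if len(row) < 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        sla = int(float(row[1]))
        if sla not in (1, 2, 3):
            raise ValueError(f"unexpected SLA class: {row[1]!r}")
        jobs.append(Job(float(row[0]), sla, float(row[2])))
    return jobs


def load_jobs(path: str) -> List[Job]:
    """Read one dataset file, e.g. GoCJ_Dataset_1000.csv, from disk."""
    with open(path, newline="") as handle:
        return parse_jobs(handle)


# Illustrative rows only; these are NOT values taken from the dataset.
sample = "length,sla,arrival\n15000,1,0.0\n525000,3,2.5\n"
jobs = parse_jobs(io.StringIO(sample))
total_mi = sum(job.length_mi for job in jobs)
```

Calling load_jobs("GoCJ_Dataset_1000.csv") on a downloaded file would then be expected to return 1000 Job records if the assumptions above hold; if the actual files use a different delimiter or column order, the parsing step needs to be adjusted accordingly.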

Categories

Cloud Computing, Cloud Computing Environment
