Enhanced GoCJ: Google Cloud Jobs Dataset
Description
The GoCJ dataset comprises multiple files, each containing the sizes of a specified number of jobs, expressed in Million Instructions (MI) and derived from workload behaviors observed in Google cluster traces. The name of each file indicates the number of jobs it contains; for example, GoCJ_Dataset_1000 includes 1000 jobs along with their associated SLA classes and arrival times. This study uses a modified version of the GoCJ dataset. Each dataset file consists of three columns: (i) job length in Million Instructions (MI), (ii) Service Level Agreement class (SLA: {1, 2, 3}), representing different priority levels, and (iii) job arrival time, which captures realistic workload submission behavior. The experimental evaluation uses the following dataset files: GoCJ_Dataset_1000.csv, GoCJ_Dataset_2000.csv, GoCJ_Dataset_3000.csv, GoCJ_Dataset_4000.csv, GoCJ_Dataset_5000.csv, and GoCJ_Dataset_6000.csv, enabling performance analysis under increasing workload scales. The file Original_Enhanced_Dataset.txt contains the 50 seed job sizes required as input for both the Java-based generator (EnhancedGoCJGenerator.java) and the Excel-based generator (GoCJ_Enhanced_Generator.xlsx), which reproduce datasets of any desired size while preserving the original workload distribution properties. The source code of the Java-based generator is also available online at https://github.com/Mohsin-Nawaz/Enhanced_GoCJ-Java-Generator
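The three-column layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the published dataset tooling. It assumes the columns appear in the order listed (job length, SLA class, arrival time), that the files are comma-separated, and that any header row is non-numeric; the names `Job`, `parse_jobs`, `load_jobs`, and `sla_counts` are hypothetical, and the sample rows in the demo are made-up values for illustration only.

```python
import csv
import io
from collections import Counter
from typing import NamedTuple


class Job(NamedTuple):
    length_mi: int        # job length in Million Instructions (MI)
    sla: int              # SLA class: 1, 2, or 3
    arrival_time: float   # arrival time (unit is not stated in the description)


def parse_jobs(handle):
    """Parse rows of (length, SLA, arrival time) from an open text handle.

    Assumption: columns are in the order given in the description.
    Rows that are too short or non-numeric (e.g. a header) are skipped.
    """
    jobs = []
    for row in csv.reader(handle):
        if len(row) < 3:
            continue
        try:
            job = Job(int(float(row[0])), int(float(row[1])), float(row[2]))
        except ValueError:
            continue  # header line or malformed row
        if job.sla not in (1, 2, 3):
            raise ValueError(f"unexpected SLA class: {job.sla}")
        jobs.append(job)
    return jobs


def load_jobs(path):
    """Load one dataset file, e.g. GoCJ_Dataset_1000.csv."""
    with open(path, newline="") as handle:
        return parse_jobs(handle)


def sla_counts(jobs):
    """Count how many jobs fall into each SLA class."""
    return Counter(job.sla for job in jobs)


if __name__ == "__main__":
    # Made-up sample rows for illustration; not taken from the dataset.
    sample = io.StringIO("length,sla,arrival\n15000,1,0.0\n59000,3,2.5\n")
    jobs = parse_jobs(sample)
    print(len(jobs), "jobs;", dict(sla_counts(jobs)))
```

With the real files, `load_jobs("GoCJ_Dataset_1000.csv")` would be expected to return 1000 jobs if the assumptions above hold; checking that the job count matches the number in the file name is a quick way to confirm the file was parsed as intended.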
Files
Institutions
- Institute of Space Technology, Islamabad