HEP-Frame: Improving the efficiency of pipelined data transformation & filtering for scientific analyses
Software to analyse very large sets of experimental data often relies on a pipeline of irregular computational tasks with decisions to remove irrelevant data from further processing. A user-centred framework was designed and deployed, HEP-Frame, which aids domain experts to develop applications for scientific data analyses and to monitor and control their efficient execution. The key feature of HEP-Frame is the performance portability of the code across different heterogeneous platforms, due to a novel adaptive multi-layer scheduler, seamlessly integrated into the tool, an approach not available in competing frameworks. The multi-layer scheduler transparently allocates parallel data/tasks across the available heterogeneous resources, dynamically balances threads among data input and computational tasks, adaptively reorders in run-time the parallel execution of the pipeline stages for each data stream, respecting data dependencies, and efficiently manages the execution of library functions in accelerators. Each layer implements a specific scheduling strategy: one balances the execution of the computational stages of the pipeline, distributing the execution of the stages of the same or different dataset elements among the available computing threads; another controls the order of the pipeline stages execution, so that most data is filtered out earlier and later stages execute the computationally heavy tasks; yet another adaptively balances the automatically created threads among data input and the computational tasks, taking into account the requirements of each application. Simulated data analyses from sensors in the ATLAS Experiment at CERN evaluated the scheduler efficiency, on dual multicore Xeon servers with and without accelerators, and on servers with the many-core Intel KNL. Experimental results show significant improved performance of these data analyses due to HEP-Frame features and the codes scaled well on multiple servers. Results also show the improved HEP-Frame scheduler performance over the key competitor, the HEFT list scheduler. The best overall performance improvement over a real fine tuned sequential data analysis was impressive in both homogeneous and heterogeneous multicore servers and in many-core servers: 81x faster in the homogeneous 24+24 core Skylake server, 86x faster in the heterogeneous 12+12 core Ivy Bridge server with the Kepler GPU, and 252x faster in the 64-core KNL server.