TabbyXL: Dataset for the Performance Evaluation of a Software Platform for Rule-Based Spreadsheet Data Extraction and Transformation

Published: 16 December 2019 | Version 6 | DOI: 10.17632/ydcr7mcrtp.6
Alexey Shigarov


This dataset is designed to evaluate TabbyXL (version 1.1.0), a software platform for the rule-based transformation of spreadsheet data from arbitrary tables into relational ones. TabbyXL is freely available on GitHub. The dataset provides all the data required to reproduce the performance evaluation, including running the program and automatically evaluating TabbyXL's performance. The evaluation confirms that the implemented rulesets are applicable to a variety of arbitrary tables of the same genre (government statistical websites). This demonstrates that TabbyXL can be used to develop programs that transform spreadsheet data into relational form. A file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.


Steps to reproduce

All steps to reproduce the experiment are presented in a file included in the dataset.


Institut dinamiki sistem i teorii upravlenia imeni V M Matrosova SO RAN


Spreadsheet, Document Analysis, Data Integration, Information Extraction, Database