TabbyXL2: Experiment Data

Published: 13 August 2018 | Version 3 | DOI: 10.17632/ydcr7mcrtp.3
Contributors:
Alexey Shigarov

Description

This dataset is designed to evaluate TabbyXL2 v1.0.1, a tool for the rule-based transformation of spreadsheet data from arbitrary tables to relational tables, which is freely available on GitHub (https://github.com/cellsrg/tabbyxl2/releases/tag/v1.0.1). Our source data are based on the existing table dataset Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), which contains 200 arbitrary tables as CSV files collected from 10 different government statistical websites. We use an earlier version of it that stores the original tables with style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data).

The dataset contains the following material:
1. All Troy_200 tables with style features, put into a single spreadsheet file.
2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of the table analysis.
3. The CRL and CLP rulesets designed for transforming the Troy_200 arbitrary tables into relational form.
4. The log files with the results of the program runs and of the performance evaluation of TabbyXL2.

The dataset provides all the data required to reproduce the automatic performance evaluation of TabbyXL2 using any of the following three options:
1. TabbyXL2 automatically generates Java source code from the CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with the JRE.
2. TabbyXL2 automatically maps the CRL rules to DRL rules using the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine.
3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine.

The performance evaluation confirms that the implemented rulesets are applicable to processing a collection of different arbitrary tables of the same genre (government statistical websites). The experiment demonstrates that our tool, TabbyXL2, can be used for developing programs that transform spreadsheet data into relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.

Steps to reproduce

All steps to reproduce the experiment are presented in the README.md file included in the dataset.

Categories

Spreadsheet, Document Analysis, Data Integration, Information Extraction, Database
