A Coral-Reef Approach to Extracting Information from HTML Tables

Published: 22 March 2021 | Version 3 | DOI: 10.17632/87gr74cr4r.3
Patricia Jiménez


This dataset describes the online materials that accompany the article "A Coral-Reef Approach to Extracting Information from HTML Tables", by Patricia Jiménez, Juan C. Roldán, and Rafael Corchuelo. The materials are provided in a zip file that contains the following folders:

- "DATA": contains the original HTML tables from which to extract the information, as well as some data to configure the different proposals.
- "NOTEBOOK": contains a Jupyter Notebook that provides the Python code required to run and test Coraline.

There is a "launch.cmd/sh" script that launches the experimentation according to the operating system. There is also a README.txt file and a requirements.txt file. The former contains the instructions to launch the notebook. The latter lists the packages that should be installed prior to launching the notebook. Note that the folder called "output" contains the csv files with the effectiveness and efficiency results achieved by every competitor, all of which are already implemented in the notebook.
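
The csv files in the "output" folder can be inspected without re-running the notebook. The following is only a minimal sketch using the Python standard library: the helper name `load_results` is hypothetical, and it assumes that the files are comma-separated, UTF-8 encoded, and start with a header row. The description does not state the file names, the column names, or where "output" sits inside the zip file, so none of those are assumed here.

```python
import csv
from pathlib import Path


def load_results(output_dir):
    """Read every csv file in output_dir into a list of row dictionaries.

    Hypothetical helper, not part of the published materials. It assumes
    comma-separated files whose first row holds the column names.
    """
    results = {}
    for csv_path in sorted(Path(output_dir).glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as handle:
            # DictReader uses the first row as the keys of each record.
            results[csv_path.name] = list(csv.DictReader(handle))
    return results


if __name__ == "__main__":
    # "output" is the folder name given in the description; its location
    # relative to the working directory is an assumption.
    for name, rows in load_results("output").items():
        print(f"{name}: {len(rows)} rows")
```

The function returns a dictionary that maps each file name to its rows, so the results of the different competitors can be compared side by side. If the folder does not exist or contains no csv files, the dictionary is empty.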



Universidad de Sevilla


Information Extraction, Clustering, Coral Reef