JSON Datasets for Exploratory OLAP

Published: 7 Jul 2017 | Version 1 | DOI: 10.17632/ct8f9skv97.1

Description of this data

These datasets have been used to evaluate the EXODuS approach: EXploratory OLAP over Document Stores.

  • The games dataset has been collected by Sports Reference LLC. It contains around 32K nested documents representing NBA games in the period 1985-2013. Each document represents a game between two teams with at least 11 players each. It contains 47 attributes; 40 of them are numeric and represent team and player results.

  • The DBLP dataset contains 2M documents scraped from DBLP in XML format and converted into JSON. Documents are flat and represent eight kinds of publications, including conference proceedings, journal articles, books, and theses. A third of the dataset represents author pages, which contain half as many fields as the other kinds. As a result, documents have shared attributes such as title, author, type, and year, and unshared ones such as journal and booktitle.

  • The Twitter dataset contains 2M tweets scraped from the Twitter API. Each document represents a tweet message and its metadata, which contains some nested objects: a user object that represents the author of the tweet, a place object that gives its location, and a retweet object if it is a reply. The dataset is heterogeneous: it mixes tweets with documents that the API returns for tweet deletions.
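
The mix of shared and unshared attributes described for the DBLP dataset can be inspected with a simple coverage profile. The following is a minimal sketch, not part of the EXODuS approach itself: the function names attribute_coverage and split_shared are hypothetical, the sample records are invented for illustration (only the attribute names come from the description above), and loading the actual files is left out because their on-disk layout is not stated here.

```python
from collections import Counter


def attribute_coverage(docs):
    """Return {attribute: fraction of documents that contain it}.

    Only top-level keys are counted, which suits flat documents.
    """
    docs = list(docs)
    total = len(docs)
    if total == 0:
        return {}
    # Keys are unique within a dict, so each count is a document count.
    counts = Counter(key for doc in docs for key in doc)
    return {key: n / total for key, n in counts.items()}


def split_shared(coverage):
    """Split attributes into those present in every document and the rest."""
    shared = sorted(k for k, frac in coverage.items() if frac == 1.0)
    unshared = sorted(k for k, frac in coverage.items() if frac < 1.0)
    return shared, unshared


# Hypothetical sample records; values are invented, not taken from the dataset.
sample = [
    {"type": "article", "title": "A", "author": "X", "year": 2001, "journal": "J1"},
    {"type": "inproceedings", "title": "B", "author": "Y", "year": 2005, "booktitle": "C1"},
    {"type": "article", "title": "C", "author": "Z", "year": 2010, "journal": "J2"},
]

coverage = attribute_coverage(sample)
shared, unshared = split_shared(coverage)
print("shared:", shared)
print("unshared:", unshared)
```

For nested documents such as those in the games and Twitter datasets, the same idea would need to flatten nested objects into attribute paths first; that step is omitted here.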

The sources of the datasets are listed in the Related links section.

Experiment data files

Related links

This data is associated with the following publication:

EXODuS: Exploratory OLAP over Document Stores

Published in: Information Systems

    Cite this dataset

    Chouder, Mohamed L.; Rizzi, Stefano; Chalal, Rachid (2017), “JSON Datasets for Exploratory OLAP”, Mendeley Data, v1 http://dx.doi.org/10.17632/ct8f9skv97.1




Information Systems, Big Data


CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

What does this mean?
You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.