Data for: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation

Published: 1 Aug 2018 | Version 1 | DOI: 10.17632/mdwvttzt48.1

Description of this data

We considered the following standard and shared collections, each track using 50 different topics:

• TREC Adhoc tracks T07 and T08: they focus on a news search task and adopt a corpus of about 528K news documents.
• TREC Web tracks T09 and T10: they focus on a Web search task and adopt a corpus of 1.7M Web pages.
• TREC Terabyte tracks T14 and T15: they focus on a Web search task and adopt a corpus of 25M Web pages.

We considered three main components of an IR system: stop list, stemmer, and IR model. We selected a set of alternative implementations of each component and, by using the Terrier v.4.02 open source system, we created a run for each system defined by combining the available components in all possible ways. The selected components are:

• Stop list: nostop, indri, lucene, snowball, smart, terrier;
• Stemmer: nolug, weakPorter, porter, snowballPorter, krovetz, lovins;
• Model: bb2, bm25, dfiz, dfree, dirichletlm, dlh, dph, hiemstralm, ifb2, inb2, inl2, inexpb2, jskls, lemurtfidf, lgd, pl2, tfidf.

Overall, these components define a 6 × 6 × 17 factorial design with a grid of points (GoP) consisting of 612 system runs. They represent nearly all the state-of-the-art components that form the common denominator of almost any IR system for English retrieval, and thus they give a good account of what can be found in many different operational settings.
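
As an illustration of the factorial design described above, the following minimal Python sketch enumerates every combination of stop list, stemmer and IR model and confirms the count of 612. The component names are copied from the lists above. The function names and the underscore-separated label format are assumptions made for this example only and do not necessarily match the run identifiers used in the dataset files.

```python
from itertools import product

# Component names as listed in the dataset description.
STOP_LISTS = ["nostop", "indri", "lucene", "snowball", "smart", "terrier"]
STEMMERS = ["nolug", "weakPorter", "porter", "snowballPorter", "krovetz", "lovins"]
MODELS = [
    "bb2", "bm25", "dfiz", "dfree", "dirichletlm", "dlh", "dph",
    "hiemstralm", "ifb2", "inb2", "inl2", "inexpb2", "jskls",
    "lemurtfidf", "lgd", "pl2", "tfidf",
]


def grid_of_points():
    """Return every (stop list, stemmer, model) triple of the design."""
    return list(product(STOP_LISTS, STEMMERS, MODELS))


def run_label(stop_list, stemmer, model):
    """Build a label for one system (hypothetical naming scheme)."""
    return f"{stop_list}_{stemmer}_{model}"


if __name__ == "__main__":
    grid = grid_of_points()
    print(len(grid))            # 612 = 6 * 6 * 17
    print(run_label(*grid[0]))  # nostop_nolug_bb2
```

Each triple corresponds to one system configuration, so a full run set for a track would contain one run per triple.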

This data is associated with the following peer reviewed publication:

CLAIRE: A combinatorial visual analytics system for information retrieval evaluation

Published in: Information Processing and Management

Cite this dataset

Silvello, Gianmaria; Ferro, Nicola; Santucci, Giuseppe; Fazzini, Vanessa; Angelini, Marco (2018), “Data for: CLAIRE: A combinatorial visual analytics system for information retrieval evaluation”, Mendeley Data, v1 http://dx.doi.org/10.17632/mdwvttzt48.1

Categories

Information Retrieval System

Licence

CC BY 4.0

The files associated with this dataset are licensed under a Creative Commons Attribution 4.0 International licence.

You can share, copy and modify this dataset so long as you give appropriate credit, provide a link to the CC BY license, and indicate if changes were made, but you may not do so in a way that suggests the rights holder has endorsed you or your use of the dataset. Note that further permission may be required for any content within the dataset that is identified as belonging to a third party.
