Information Retrieval Dataset - Internet Movie Database (IMDB)

Published: 8 June 2017 | Version 2 | DOI: 10.17632/rth2kr5hxf.2
Renato Montaleão Brum Alves


This dataset was constructed for an Information Retrieval research project carried out as part of a master's degree at the Federal University of Rio de Janeiro (UFRJ). It consists of nearly 115,000 documents in XML format, a subset of the Internet Movie Database (IMDB). Each XML file describes one movie in the collection and contains the following information:

· ID
· Title
· Year
· Country
· Actors (and their roles)
· Actresses (and their roles)
· Genre
· Color Info
· Language
· Sound Info
· Directors
· Writers
· Composers
· Certificates (by country)
· Duration
· Shooting location (cities and countries)
· Editors
· Release date (by country)
· Producers
· Type (film, TV series, etc.)
· Keywords
· Plot


Steps to reproduce

To use the collection, unzip the attached file.
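
For readers who prefer to script this step, the following is a minimal Python sketch of unzipping the archive and taking a first look at one document. It uses only the standard library. The function names are hypothetical, and the sketch deliberately assumes nothing about the XML tag names, which this page does not specify: it simply reports whatever child elements it finds under each file's root element.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_collection(archive_path, target_dir):
    """Unzip the archive and return the sorted paths of all extracted XML files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
    # Search recursively, since the folder layout inside the archive is not documented here.
    return sorted(target.rglob("*.xml"))


def summarize_document(xml_path):
    """Map each direct child tag of the root element to a list of its text values.

    No tag names are assumed; repeated tags are collected in document order.
    Nested elements (if any) are not descended into.
    """
    root = ET.parse(xml_path).getroot()
    summary = {}
    for child in root:
        summary.setdefault(child.tag, []).append((child.text or "").strip())
    return summary
```

Calling extract_collection with the path of the downloaded archive and an output folder of your choice, then passing one of the returned paths to summarize_document, gives a quick view of the fields actually present in a file before any indexing code is written.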


Universidade Federal do Rio de Janeiro


Cinema, Information Retrieval, Search Engine, Web Search Engine, File Searching, Navigation, Cluster Testing, Search