Relevant Image Dataset

Name: Relevant Image Dataset
Creator: Hayri Volkan Agun
Published: 2020-12-22T07:12:50.371Z
Keywords: Information Retrieval, Machine Learning, Web Mining, Feature Extraction, Text Processing

Agun, Hayri Volkan; Uzun, Erdinç

doi:10.17632/mbk294tthf.1


Published: 22 December 2020 | Version 1 | DOI: 10.17632/mbk294tthf.1

Contributors:

Hayri Volkan Agun, Erdinç Uzun

Description

The dataset contains the image tags found on Web pages from 125 different domains, covering both relevant and irrelevant images. Each record holds the Web domain, the file number, the text of the image HTML element, the attributes of the image element, the size attributes, the parent HTML element of the image, and the relevancy of the image. Each Web domain contributes 100 Web pages, each with a varying number of image elements.
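As a reading aid, the record described above could be modeled as in the following minimal sketch. The class name, the field names, the field order, and the choice to keep every value as a raw string are all assumptions made for illustration. The description lists the kinds of information stored but not the actual column names, their order, or their encoding, and the size attributes may span more than one column in the real file.

```python
from dataclasses import dataclass


@dataclass
class ImageSample:
    """One image element; names and order are hypothetical, not from the dataset."""
    domain: str          # Web domain the page belongs to
    file_number: str     # number of the Web page file within the domain
    image_text: str      # text of the image HTML element
    attributes: str      # attributes of the image element
    size: str            # size attributes (may be several columns in practice)
    parent_element: str  # parent HTML element of the image
    relevancy: str       # relevancy label, kept as the raw string

    @classmethod
    def from_fields(cls, fields):
        """Build a sample from an already split line (assumed 7 fields)."""
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}")
        return cls(*fields)
```

Keeping the values as strings avoids guessing how the relevancy label or the sizes are encoded; conversions can be added once the actual files have been inspected.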

Steps to reproduce

The file contains image tags with quotes, so regular CSV readers may split a line inside an image tag. For each line, first detect each image tag and replace it with a placeholder symbol, then split the line. After the split, put the image tag back in place of the placeholder. Each line corresponds to one sample, that is, one image element. Note that each domain should be trained and tested separately.
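The steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' own code: it assumes the image element appears in each line as a literal `<img ...>` tag with no `>` inside attribute values, that the field delimiter is a comma, and that the Web domain is the first field. The names `split_line` and `group_by_domain` are hypothetical helpers.

```python
import re
from collections import defaultdict

# Assumption: image elements appear as literal <img ...> tags in each line.
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
MARK = "\x00"  # placeholder delimiter, assumed never to occur in the data
RESTORE = re.compile(re.escape(MARK) + r"(\d+)" + re.escape(MARK))


def split_line(line, delimiter=","):
    """Mask image tags, split the line, then put the tags back."""
    tags = []

    def mask(match):
        # Remember the tag and leave a numbered placeholder in its position.
        tags.append(match.group(0))
        return f"{MARK}{len(tags) - 1}{MARK}"

    masked = IMG_TAG.sub(mask, line.rstrip("\r\n"))
    fields = masked.split(delimiter)
    # Restore each placeholder with the tag it stands for.
    return [RESTORE.sub(lambda m: tags[int(m.group(1))], f) for f in fields]


def group_by_domain(rows, domain_index=0):
    """Group split rows by domain so each domain can be handled separately."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[domain_index]].append(row)
    return dict(groups)
```

For example, `split_line('example.com,12,<img src="a,b.png" alt="x, y">,120,80,div,1')` keeps the whole tag in a single field even though it contains commas and quotes. The grouped rows can then be split into training and test sets one domain at a time, following the note above.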

Institutions

Bursa Teknik Universitesi
Namik Kemal Universitesi
