Data for: Machine Learning based Heterogeneous Web Advertisements Detection Using a Diverse Feature Set

Published: 29 Jun 2018 | Version 1 | DOI: 10.17632/5bzh52txpn.1
Contributor(s): Kuppusamy, KS; Shaqoor Nengroo, Ab

Description of this data

Identifying and filtering advertisements in web pages has gained significance due to factors such as accessibility, security, privacy, and obtrusiveness. Current practice relies on maintaining lists of URL-based regular expressions, called filter lists; each URL found on a web page is matched against the filter list. While effective, this procedure scales poorly because the filter list demands regular maintenance. To counter these limitations, we devise a machine learning based advertisement detection system that uses a diverse feature set to distinguish advertisement blocks from non-advertisement blocks. The method can serve as a base for accessibility-related features, such as smooth browsing and text summarization, for persons with visual impairments, cognitive impairments, or photosensitive epilepsy. A classifier trained on the proposed feature set achieves 93.4% accuracy in identifying advertisements.
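
The filter-list baseline mentioned in the description can be illustrated with a minimal Python sketch. It is not taken from the dataset or the associated publication: the patterns, the example URLs, and the function names `is_blocked` and `filter_urls` are hypothetical and chosen only to show how URLs might be matched against a list of regular expressions.

```python
import re

# Hypothetical filter list: these patterns are illustrative assumptions,
# not entries from any real filter list or from this dataset.
FILTER_PATTERNS = [
    r"^https?://ads\.",   # hosts that begin with "ads."
    r"/banners?/",        # paths containing /banner/ or /banners/
    r"[?&]ad_id=\d+",     # query strings carrying a numeric ad identifier
]
COMPILED = [re.compile(p) for p in FILTER_PATTERNS]


def is_blocked(url: str) -> bool:
    """Return True if any filter-list pattern matches the URL."""
    return any(p.search(url) for p in COMPILED)


def filter_urls(urls):
    """Split URLs into (blocked, allowed) lists using the filter list."""
    blocked = [u for u in urls if is_blocked(u)]
    allowed = [u for u in urls if not is_blocked(u)]
    return blocked, allowed


if __name__ == "__main__":
    page_urls = [
        "http://ads.example.com/script.js",
        "https://example.com/banners/top.png",
        "https://example.com/article.html",
    ]
    blocked, allowed = filter_urls(page_urls)
    print("blocked:", blocked)
    print("allowed:", allowed)
```

Every new advertisement URL scheme requires a new pattern in `FILTER_PATTERNS`, which is the maintenance burden the description attributes to filter lists.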

Experiment data files

This data is associated with the following publication:

Machine learning based heterogeneous web advertisements detection using a diverse feature set

Published in: Future Generation Computer Systems

Latest version

  • Version 1

    Published: 2018-06-29

    DOI: 10.17632/5bzh52txpn.1

    Cite this dataset

    Kuppusamy, KS; Shaqoor Nengroo, Ab (2018), “Data for: Machine Learning based Heterogeneous Web Advertisements Detection Using a Diverse Feature Set”, Mendeley Data, v1 http://dx.doi.org/10.17632/5bzh52txpn.1

Categories

Computer Science, World Wide Web, Machine Learning, Accessibility Issue

Licence

CC BY NC 3.0

The files associated with this dataset are licensed under an Attribution-NonCommercial 3.0 Unported licence.

What does this mean?

You are free to adapt, copy or redistribute the material, providing you attribute appropriately and do not use the material for commercial purposes.
