Ground truth dataset: objectionable web content
This ground truth dataset contains 8,000 labelled websites: 4,000 objectionable and 4,000 unobjectionable. Together, these websites comprise more than 2 million web pages. The dataset consists of two files. The "metadata.json" file gives an overview of the websites and their features. The "webpages_detail.json" file gives detailed information on the web pages (internal URLs) of each collected website and their features.
Steps to reproduce
The dataset is in JSON format, so it can be read, viewed, and modified with almost any programming language and a wide range of tools.
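As a minimal sketch of reading the files, the Python snippet below parses each one with the standard `json` module and reports its top-level structure. The internal schema is not described here, so the code makes no assumption about key names. The helper names `load_json` and `describe` are hypothetical, and the files are assumed to sit in the current working directory.

```python
import json
from pathlib import Path


def load_json(path):
    """Parse a JSON file and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data):
    """Return a short summary of the top-level JSON structure."""
    if isinstance(data, dict):
        return f"object with {len(data)} keys"
    if isinstance(data, list):
        return f"array with {len(data)} items"
    return f"scalar of type {type(data).__name__}"


if __name__ == "__main__":
    # File names come from the dataset description; their location is assumed.
    for name in ("metadata.json", "webpages_detail.json"):
        path = Path(name)
        if path.exists():
            print(name, "->", describe(load_json(path)))
        else:
            print(name, "not found in the current directory")
```

Note that `json.load` reads the whole file into memory. If "webpages_detail.json" turns out to be too large for that, a streaming JSON parser may be a better fit.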