Data for: Improving Named Entity Recognition in Noisy User-generated Text with Local Distance Neighbor Feature

Published: 31 March 2020 | Version 1 | DOI: 10.17632/nsfdt6m47j.1
Mhd Wesam AL-NABKI


NUToT Dataset (Noisy User-generated Text on Tor)
Name: Noisy User-generated Text on Tor
Acronym: NUToT
Description: The data is annotated for the Named Entity Recognition (NER) task and covers six entity categories: Person, Location, Group, Creative work, Corporation, and Product. The text comes from two categories of the DUTA dataset: Drugs and Weapons. The dataset has 851 sentences with 1200 named entities. The dataset is also available on our group website:



Natural Language Processing