Russian dataset for the thread reconstruction
Published: 12 June 2023 | Version 1 | DOI: 10.17632/7rms5vdhf8.1
Contributor:
Igor Buyanov
Description
This dataset supports the thread reconstruction task, in which chat messages must be linked so that they form meaningful conversation threads. The chats come from the Telegram messenger. The data were annotated using Label Studio v1.7.2. The four files fall under two name patterns:
* `raw_config/config_*` - the initial configuration files for annotation.
* `project-*` - the annotated data.

For more information, see the README.md file in the dataset and the corresponding paper.
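To make the task concrete, here is a minimal sketch of how messages could be grouped into threads once pairwise links between them are known. It does not read the dataset files: the input format (a chronological list of message ids plus `(child, parent)` link pairs) and the function name `reconstruct_threads` are assumptions made for illustration, and the actual annotation schema is described in the README.md and the paper.

```python
from collections import defaultdict


def reconstruct_threads(messages, links):
    """Group message ids into threads (hypothetical helper, not from the dataset).

    messages: list of message ids in chronological order (assumed format).
    links: iterable of (child_id, parent_id) pairs (assumed format).
    Returns a list of threads, each a list of ids in chronological order.
    """
    # Union-find: every message starts as the root of its own thread.
    parent = {m: m for m in messages}

    def find(x):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the threads of every linked pair of messages.
    for child, par in links:
        root_child, root_par = find(child), find(par)
        if root_child != root_par:
            parent[root_child] = root_par

    # Collect messages per root; iterating in chronological order keeps
    # each thread sorted and orders threads by their first message.
    threads = defaultdict(list)
    for m in messages:
        threads[find(m)].append(m)
    return list(threads.values())


if __name__ == "__main__":
    # Messages 2 and 5 reply to 1; message 4 replies to 3.
    print(reconstruct_threads([1, 2, 3, 4, 5], [(2, 1), (4, 3), (5, 1)]))
```

Messages without any link end up in single-message threads, which matches the intuition that an unanswered message starts its own conversation.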
Steps to reproduce
See the corresponding paper and GitHub repository.
Categories
Natural Language Processing, Chat, Russian Language, Natural-Language Understanding