Russian dataset for the thread reconstruction

Published: 12 June 2023 | Version 1 | DOI: 10.17632/7rms5vdhf8.1
Contributor:
Igor Buyanov

Description

This dataset supports the thread reconstruction task, in which chat messages must be linked so that they form meaningful conversation threads. The chats come from the Telegram messenger. The data were annotated using Label Studio v1.7.2. There are four files:

* `raw_config/config_*` - the initial configuration files for annotation.
* `project-*` - the annotated data.

For more information, see the README.md file in the dataset and the corresponding paper.
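
As a rough illustration of the task itself, the sketch below groups messages into threads once each message has been linked to the message it continues. The `build_threads` function, the `links` mapping, and the integer message ids are hypothetical and chosen only for this example; they do not reflect the dataset's actual schema, which is described in the README.md.

```python
from collections import defaultdict


def build_threads(reply_to):
    """Group message ids into threads.

    `reply_to` maps each message id to the id of the message it continues,
    or to None when the message starts a new thread (assumed toy format).
    """
    def root(msg_id):
        # Follow the links upward until reaching the message that started the thread.
        while reply_to[msg_id] is not None:
            msg_id = reply_to[msg_id]
        return msg_id

    threads = defaultdict(list)
    for msg_id in sorted(reply_to):
        threads[root(msg_id)].append(msg_id)
    return dict(threads)


# Hypothetical toy chat: two interleaved conversations.
links = {1: None, 2: 1, 3: None, 4: 2, 5: 3}
threads = build_threads(links)
print(threads)  # {1: [1, 2, 4], 3: [3, 5]}
```

Here messages 1, 2 and 4 form one thread and messages 3 and 5 form another, even though they are interleaved in the chat order.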

Steps to reproduce

See the corresponding paper and GitHub repository.

Categories

Natural Language Processing, Chat, Russian Language, Natural-Language Understanding

Licence