Dataset of sentences annotated with complex words and their synonyms to support lexical simplification

Published: 8 March 2021 | Version 2 | DOI: 10.17632/ywhmbnzvmx.2
Contributor:
Rodrigo Alarcon

Description

These datasets are part of the EASIER corpus and contain instances for evaluating the Generation and Selection tasks of lexical simplification. Each instance consists of a target (complex) word and its proposed synonyms, together with the necessary metadata: the sentence in which the word appears and the start and end offsets that locate the word within that sentence. The smallest dataset contains 575 instances in which the target word has three proposed substitutes, while the full dataset contains over 5,000 instances with at least one proposed substitute each. For more information, see our Git repository: https://github.com/LURMORENO/EASIER_CORPUS
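To illustrate how the offsets tie a target word to its sentence, here is a minimal sketch in Python. It assumes (hypothetically) that each instance has been loaded into a record with the fields `sentence`, `start`, `end` and `substitutes`, that offsets are character indices and that `end` is exclusive. The class, its field names, the helper `with_at_least`, and the example sentence are all illustrative assumptions, not the dataset's actual file layout; check the repository above for the real format before relying on them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimplificationInstance:
    """Hypothetical in-memory form of one dataset instance."""
    sentence: str
    start: int  # assumed: character index of the first letter of the target word
    end: int    # assumed: character index one past the last letter (exclusive)
    substitutes: List[str] = field(default_factory=list)

    @property
    def target(self) -> str:
        # Recover the complex word by slicing the sentence with its offsets.
        return self.sentence[self.start:self.end]

    def substitute_in(self, candidate: str) -> str:
        # Rebuild the sentence with a candidate synonym in place of the target.
        return self.sentence[: self.start] + candidate + self.sentence[self.end:]


def with_at_least(instances: List[SimplificationInstance], n: int) -> List[SimplificationInstance]:
    """Keep only instances that have at least n proposed substitutes."""
    return [inst for inst in instances if len(inst.substitutes) >= n]


if __name__ == "__main__":
    # Invented example instance; not taken from the EASIER corpus.
    inst = SimplificationInstance(
        sentence="El libro es muy extenso.",
        start=16,
        end=23,
        substitutes=["largo", "grande", "amplio"],
    )
    print(inst.target)                              # extenso
    print(inst.substitute_in(inst.substitutes[0]))  # El libro es muy largo.
```

Keeping the offsets rather than only the word string matters when the target occurs more than once in a sentence: the slice identifies exactly which occurrence was annotated.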


Institutions

Universidad Carlos III de Madrid - Campus de Leganes

Categories

Natural Language Processing, Lexical Processing, Lexical Semantics

Licence