Dataset of sentences annotated with complex words and their synonyms to support lexical simplification

Published: 08-03-2021 | Version 2 | DOI: 10.17632/ywhmbnzvmx.2


These datasets are part of the EASIER corpus and contain instances for evaluating the Generation and Selection tasks of lexical simplification. Each instance consists of a target word and its proposed synonyms, along with the metadata needed to use it, such as the sentence in which the word appears and the start and end offsets that locate it. The smallest dataset contains 575 instances in which each target word has three proposed substitutes, while the full dataset contains over 5000 instances with at least one proposed substitute. For more information, see our Git repository.
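As a rough illustration of how the start and end offsets can be used, the sketch below recovers the target word from its sentence and swaps in a proposed substitute. The field names (`sentence`, `target`, `start`, `end`, `substitutes`), the 0-based end-exclusive offset convention, and the sample sentence are all assumptions made for this example; they are not taken from the dataset files, so check the actual column names and offset convention before reusing it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    # Hypothetical field names; the real files may name these differently.
    sentence: str
    target: str
    start: int  # assumed: 0-based character offset of the target's first character
    end: int    # assumed: exclusive offset, one past the target's last character
    substitutes: List[str]


def locate_target(inst: Instance) -> str:
    """Return the span of the sentence selected by the offsets."""
    return inst.sentence[inst.start:inst.end]


def apply_substitute(inst: Instance, candidate: str) -> str:
    """Return the sentence with the target span replaced by one candidate."""
    return inst.sentence[:inst.start] + candidate + inst.sentence[inst.end:]


# Invented sample instance, for illustration only.
example = Instance(
    sentence="The committee will scrutinize the proposal.",
    target="scrutinize",
    start=19,
    end=29,
    substitutes=["examine", "review", "inspect"],
)

# Sanity check: the offsets should select exactly the annotated target word.
assert locate_target(example) == example.target

simplified = apply_substitute(example, example.substitutes[0])
print(simplified)
```

If the check on `locate_target` fails for real instances, the offsets most likely follow a different convention (for example an inclusive end offset), and the slicing should be adjusted accordingly.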