TOEFL iBT & Academic English Vocabulary Dataset with Lexical and Contextual Features
Description
This dataset contains 1,000 curated academic English words intended for TOEFL iBT preparation and Natural Language Processing (NLP) research. Each entry combines lexical features with contextual text, which makes the dataset suitable for multi-class text classification, clustering, and word difficulty prediction.

Features:
- word: The academic English vocabulary word.
- pos: Part of speech (noun, verb, adjective, etc.).
- difficulty: A numeric score reflecting the complexity and rarity of the word; can serve as a target variable for difficulty prediction.
- theme: The academic discipline or topic in which the word is most frequently used (e.g., Economics, Biology, Physics); can serve as a target variable for classification.
- synonyms: A list of similar words.
- definition_en: A comprehensive English definition.
- example_sentence: An academic sentence that uses the word in context.

Source: This dataset was generated and curated for the WordLevel platform, an AI-powered vocabulary learning application. For more information, or to see these words in an interactive learning environment, visit WordLevel.net.
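The following is a minimal sketch for loading the fields listed above into typed Python records. It assumes the data is distributed as a CSV file whose header row uses exactly the feature names above. The description does not specify how the synonyms list is stored, so the parser accepts either a Python-style list literal or a comma/semicolon-separated string. The names load_entries, parse_synonyms, and VocabEntry are hypothetical and not part of the dataset.

```python
import ast
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Column names taken from the feature list in the dataset description.
EXPECTED_COLUMNS = [
    "word", "pos", "difficulty", "theme",
    "synonyms", "definition_en", "example_sentence",
]


@dataclass
class VocabEntry:
    word: str
    pos: str
    difficulty: float  # assumption: the numeric scale parses as a float
    theme: str
    synonyms: List[str]
    definition_en: str
    example_sentence: str


def parse_synonyms(raw: str) -> List[str]:
    """Parse a synonyms cell. The storage format is an assumption:
    either a list literal like "['a', 'b']" or a string like "a, b" / "a; b"."""
    raw = raw.strip()
    if not raw:
        return []
    if raw.startswith("["):
        # Safely evaluate a Python-style list literal (no arbitrary code).
        return [str(s).strip() for s in ast.literal_eval(raw)]
    return [s.strip() for s in raw.replace(";", ",").split(",") if s.strip()]


def load_entries(lines: Iterable[str]) -> List[VocabEntry]:
    """Read CSV text (an open file handle or any iterable of lines)
    into VocabEntry records, checking that every expected column exists."""
    reader = csv.DictReader(lines)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    entries = []
    for row in reader:
        entries.append(VocabEntry(
            word=row["word"].strip(),
            pos=row["pos"].strip(),
            difficulty=float(row["difficulty"]),
            theme=row["theme"].strip(),
            synonyms=parse_synonyms(row["synonyms"]),
            definition_en=row["definition_en"].strip(),
            example_sentence=row["example_sentence"].strip(),
        ))
    return entries
```

Usage, with a hypothetical file name: open "toefl_vocab.csv" with newline="" and encoding="utf-8" and pass the handle to load_entries. Records in this shape can then be split by theme or difficulty to build classification or regression targets.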
Steps to reproduce
The vocabulary list was compiled by analyzing high-frequency words from academic texts and TOEFL iBT preparation materials. Definitions, synonyms, and example sentences were curated and verified to provide high-quality contextual data for the WordLevel platform.
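The frequency-analysis step above can be illustrated with a short word-counting sketch. This is not the authors' pipeline, which the description does not detail. It assumes the source texts are plain-text strings and uses a caller-supplied stopword set as a stand-in for whatever filtering separates academic vocabulary from common words. The function name rank_candidates and its parameters are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Lowercase alphabetic tokens, allowing internal hyphens (e.g. "cost-effective").
TOKEN_RE = re.compile(r"[a-z]+(?:-[a-z]+)*")


def rank_candidates(
    documents: Iterable[str],
    stopwords: Set[str],
    top_n: int = 1000,
    min_length: int = 4,
) -> List[Tuple[str, int]]:
    """Count tokens across all documents and return the top_n most frequent
    candidates that are not stopwords and have at least min_length letters."""
    counts: Counter = Counter()
    for doc in documents:
        for token in TOKEN_RE.findall(doc.lower()):
            if len(token) >= min_length and token not in stopwords:
                counts[token] += 1
    return counts.most_common(top_n)
```

Raw frequency alone favors general-purpose words, so in practice a list like this would still need filtering against a general-English frequency list and manual review, which matches the curation and verification step described above.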