Small Corpus of Colombian English as a Second Language Essays (SCoCESLE)
Description
The Small Corpus of Colombian English as a Second Language Essays (SCoCESLE) is a small learner corpus made up of 272 argumentative essays written by Colombian English as a Second Language (ESL) learners. It has a total of 81,994 tokens, 6,057 types, and 5,161 lemmas. The essays average about 270 words each. Essay topics include gender-related issues, education, information technology, environmental problems, personality traits, poverty, genetic engineering, globalisation, pet ownership, compulsory vaccination, transportation, compulsory military conscription, immigration, job satisfaction, the economy, foreign language learning, and employees’ working conditions. The texts in the corpus were written by male (n=157), female (n=114), and gender-fluid (n=1) adult learners (i.e., aged 18 or over). The learners’ first language is Colombian Spanish. The corpus is unannotated and is divided into a lower-proficiency sub-corpus (n=133) and a higher-proficiency sub-corpus (n=139).

The data presented here includes:
1. The corpus manual (.pdf)
2. The corpus metadata (.xls)
3. The corpus of 272 unannotated plain texts (.txt)
4. The sub-corpus of 133 lower-proficiency texts (.txt)
5. The sub-corpus of 139 higher-proficiency texts (.txt)
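
Because the corpus is distributed as unannotated plain-text files, token and type counts like those reported in the description can be approximated with a short script. The following is a minimal sketch, not the procedure used to produce the published figures: the regex tokenizer, the UTF-8 encoding, and the function names (`tokenize`, `corpus_stats`, `load_texts`) are all assumptions made for illustration, so the resulting counts will probably differ somewhat from the reported ones.

```python
import re
from pathlib import Path

# Assumption: a simple regex tokenizer that keeps alphabetic words and
# word-internal apostrophes. The tool used for the published counts is
# not stated in the description, so results may differ.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")


def tokenize(text):
    """Return the lowercased word tokens found in one essay."""
    return [tok.lower() for tok in TOKEN_RE.findall(text)]


def corpus_stats(texts):
    """Compute essay count, token count, type count, and mean essay length."""
    token_lists = [tokenize(text) for text in texts]
    n_tokens = sum(len(tokens) for tokens in token_lists)
    types = {tok for tokens in token_lists for tok in tokens}
    mean_len = n_tokens / len(token_lists) if token_lists else 0.0
    return {
        "essays": len(token_lists),
        "tokens": n_tokens,
        "types": len(types),
        "mean_tokens_per_essay": mean_len,
    }


def load_texts(folder):
    """Read every .txt file in a folder (assumed to be UTF-8 encoded)."""
    return [p.read_text(encoding="utf-8") for p in sorted(Path(folder).glob("*.txt"))]
```

For example, `corpus_stats(load_texts("path/to/texts"))` would summarise one folder of essays, where `path/to/texts` is a placeholder for wherever the .txt files have been unpacked. Running it separately on the lower-proficiency and higher-proficiency folders gives comparable figures for each sub-corpus.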