400 Compositions by EFL Learners of a University in Southwestern China (CEFLLUSC)

Published: 11 April 2022 | Version 1 | DOI: 10.17632/jxm2bwkc33.1


This is a self-built EFL learner corpus of 400 compositions written by 100 students in the English program at a university in Southwestern China. The data were collected in the first semester of the 2016-2017 academic year, which was the students' third semester. The students were enrolled in three classes of the course English Writing, and each student was required to write one composition in each of weeks 1, 3, 5, and 7 of that semester. The three classes had 115 students in total. To keep the writing samples consistent, the compositions of the 15 students who did not complete all four writing tasks were removed from the dataset.

The corpus is made up of:
100 compositions of genre “letters”: 15,185 words
100 compositions of genre “narration”: 19,555 words
100 compositions of genre “causes & effects”: 15,903 words
100 compositions of genre “argumentation”: 17,971 words
Total: 68,614 words

The instructions for the four writing tasks were as follows:

Writing Task 1 (Week 1), Letters: Your aunt sent you a sweater you need. Write a thank-you letter to her. You should write more than 150 words.
Writing Task 2 (Week 3), Narration: Write a memorable experience in your life. You should write about 150 words.
Writing Task 3 (Week 5), Causes & Effects: What are the reasons why the government of a developing country may want to send students abroad (to a developed country) to study?
Writing Task 4 (Week 7), Argumentation: Do you think college students should or should not do part-time jobs in their spare time? Write a paragraph to state your opinion.

Students were required to submit their compositions to www.pigai.org, an intelligent online system that automatically corrects and scores English compositions. By comparing each composition against large standard corpora, the system returns a score, general comments, and error corrections.
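As a quick consistency check on the corpus figures given above, the short Python sketch below recomputes the totals from the per-genre word counts and derives the average length of a composition in each genre. All numbers are taken from this description; the variable names are illustrative only, and the sketch assumes the reported word counts are used as given (the description does not say how words were counted).

```python
# Per-genre word counts as reported in the dataset description.
GENRE_WORDS = {
    "letters": 15_185,
    "narration": 19_555,
    "causes & effects": 15_903,
    "argumentation": 17_971,
}

# 115 enrolled students minus the 15 who did not complete all four tasks.
STUDENTS = 115 - 15
# One composition per student per writing task (one task per genre).
TASKS = len(GENRE_WORDS)

total_compositions = STUDENTS * TASKS
total_words = sum(GENRE_WORDS.values())

if __name__ == "__main__":
    print(total_compositions)  # 400
    print(total_words)  # 68614
    # Average words per composition within each genre (100 compositions each).
    for genre, words in GENRE_WORDS.items():
        print(f"{genre}: {words / STUDENTS:.1f} words per composition")
    print(f"overall: {total_words / total_compositions:.1f} words per composition")
```

The recomputed totals match the stated 400 compositions and 68,614 words. Narration produced the longest compositions on average and letters the shortest.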
The students were aged 18 to 22. Their demographic information and writing scores are provided in the file "Students' Demographic Information.xls". The demographic information covers gender, age, ethnic group, and place of origin (rural or urban area, and province). The file also gives the students' academic performance in the semester preceding the one in which they wrote the four compositions, namely their final examination scores in courses such as Basic English, College Physical Education, College English Viewing and Listening, English Reading, English Phonetics, and College Chinese. Scores for the students' compositions are also provided. Note that these writing scores were generated automatically by the online correction system mentioned above rather than marked manually by teachers, so they should be used with caution in research.



