Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students
Description
The dataset contains English words in column B. For each word, the other columns give its frequency (fre), length (len), part of speech (PS) and difficulty level (level). The dataset has a total of 5372 unique words. Of these, 691 words are marked as difficult at level 1 and 141 at level 2; all remaining words, viz., 4541, are easy and hence have difficulty level 0. A word is labeled "level 2" if it is difficult for postgraduate students, "level 1" if it is difficult for undergraduate students, and "level 0" if it is difficult for neither undergraduate nor postgraduate students. The data were collected from students in Jammu and Kashmir, a Union Territory of India (latitude and longitude: 32.2778° N, 75.3412° E).

The attached files are as follows. The dataset_level CSV file is the original dataset, containing the English words with their length, frequency, part of speech and level (difficulty level). The dataset_numerical CSV file contains the original dataset with its string fields transformed into numerical values. The "English language difficulty level measurement - Questionnaire (1-6)" and "PG1", "PG2", "PG3", "PG4" .docx files contain the questionnaires given to college and university students, who were asked to underline the difficult words in the English texts. The IGNOU English.zip file contains the Indira Gandhi National Open University (IGNOU) English textbooks for undergraduate and postgraduate students; the texts used in the questionnaires above were taken from these textbooks.
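As a quick check on the per-level counts stated above, and as one possible way of turning a string field such as PS into numbers, the following Python sketch reads a CSV export of the dataset using only the standard library. It assumes the file is saved as dataset_level.csv with a header row that uses the column names given in the description (fre, len, PS, level). The file name, the exact header text and the label-encoding scheme are assumptions; the procedure actually used to build dataset_numerical is not documented here.

```python
import csv
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row.

    Assumption: the header uses the column names from the description
    (fre, len, PS, level); the real file's header text may differ.
    """
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def level_distribution(rows, level_column="level"):
    """Count how many words carry each difficulty level (0, 1 or 2)."""
    # float() first so values stored as "1.0" are also accepted
    return Counter(int(float(r[level_column])) for r in rows)


def label_encode(rows, column):
    """Map each distinct string in `column` to an integer code, in order of
    first appearance.

    Illustrative scheme only: the encoding used for dataset_numerical
    is an unknown, and this is not claimed to reproduce it.
    """
    mapping = {}
    codes = []
    for r in rows:
        value = r[column].strip()
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer code
        codes.append(mapping[value])
    return codes, mapping
```

For example, `rows = load_rows("dataset_level.csv")` followed by `level_distribution(rows)` gives the number of words at each level, which can be compared with the counts in the description if the header assumption holds; `label_encode(rows, "PS")` returns integer codes for the part-of-speech column together with the value-to-code mapping. First-appearance label encoding is used here only because it is simple and deterministic; a fixed, documented mapping would be preferable for reproducibility.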