Dataset for classifying English words into difficulty levels by undergraduate and postgraduate students

Published: 24 October 2023 | Version 4 | DOI: 10.17632/p2wrs7hm4z.4
Nisar Kangoo


The dataset contains English words in column B. For each word, the other columns give its frequency (fre), length (len), part of speech (PS), the number of undergraduate students who marked it difficult (difficult_ug) and the number of postgraduate students who marked it difficult (difficult_pg). The dataset has a total of 5368 unique words: 680 were marked difficult by undergraduate students and 151 by postgraduate students. The remaining 4537 words were not marked difficult by either undergraduate or postgraduate students and are therefore considered easy.

A hyphen (-) in the difficult_ug column means that the word was not present in the text circulated to undergraduate students. Likewise, a hyphen (-) in the difficult_pg column means that the word was not present in the text circulated to postgraduate students.

The data was collected from students in Jammu and Kashmir, a Union Territory of India (latitude 32.2778° N, longitude 75.3412° E).

The attached files are described below.

The dataset_english CSV file is the original dataset, containing the English words together with their length, frequency, part of speech, and the numbers of undergraduate and postgraduate students who marked each word as difficult.

The dataset_numerical CSV file contains the original dataset with its string fields transformed into numerical values.

The English language difficulty level measurement -Questionnaire (1-6) and PG1, PG2, PG3, PG4 .docx files contain the questionnaires given to college and university students, who were asked to underline the difficult words in the English text.

The IGNOU file contains the Indira Gandhi National Open University (IGNOU) English textbooks for undergraduate and postgraduate students. The texts used in the above questionnaires were taken from these IGNOU English textbooks.
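The hyphen convention and the per-group counts can be checked mechanically. Below is a minimal Python sketch, using only the standard library, that reads rows in the layout described, treats a hyphen as "word not presented to that group" (distinct from a count of 0), and tallies the words marked difficult by each group. It assumes the CSV headers are literally difficult_ug and difficult_pg, and that a word counts as "marked difficult" when at least one student marked it. The description does not state the authors' threshold. The `word` header and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from typing import Dict, Iterable, Mapping, Optional

# Header names taken from the dataset description; assumed to match the CSV exactly.
UG_COL = "difficult_ug"
PG_COL = "difficult_pg"


def parse_count(cell: str) -> Optional[int]:
    """Number of students who marked the word difficult, or None for a hyphen.

    A hyphen means the word was not in the text circulated to that group,
    which is different from a count of 0 (seen, but not marked).
    """
    cell = cell.strip()
    if cell == "-":
        return None
    return int(cell)


def tally(rows: Iterable[Mapping[str, str]]) -> Dict[str, int]:
    """Count words marked difficult per group, plus words marked by neither.

    Assumption: "marked difficult" means a count of at least 1.
    A word marked by both groups is counted in both difficult_* totals.
    """
    counts = {"total": 0, "difficult_ug": 0, "difficult_pg": 0, "easy": 0}
    for row in rows:
        ug = parse_count(row[UG_COL])
        pg = parse_count(row[PG_COL])
        ug_hard = ug is not None and ug > 0
        pg_hard = pg is not None and pg > 0
        counts["total"] += 1
        counts["difficult_ug"] += int(ug_hard)
        counts["difficult_pg"] += int(pg_hard)
        counts["easy"] += int(not (ug_hard or pg_hard))
    return counts


if __name__ == "__main__":
    # Hypothetical rows for illustration only; not taken from the dataset.
    sample = (
        "word,difficult_ug,difficult_pg\n"
        "word_a,0,-\n"
        "word_b,2,-\n"
        "word_c,-,1\n"
        "word_d,-,0\n"
    )
    print(tally(csv.DictReader(io.StringIO(sample))))
    # → {'total': 4, 'difficult_ug': 1, 'difficult_pg': 1, 'easy': 2}
```

To run it on the published file, open the file with `open(path, newline="", encoding="utf-8")` and pass the handle to `csv.DictReader`. The file name and encoding are assumptions. The published figures satisfy 680 + 151 + 4537 = 5368, so the description treats the undergraduate and postgraduate difficult sets as non-overlapping. This sketch, by contrast, would count a word marked by both groups in both totals. The tally should reproduce the published numbers only if the at-least-one-student assumption matches the authors' criterion.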



Lovely Professional University, Faculty of Technology and Sciences


Machine Learning, Computer