Big Data Set from RateMyProfessor.com for Professors' Teaching Evaluation

Published: 4 March 2020 | Version 2 | DOI: 10.17632/fvtfjyvw7d.2
Contributor:
Jibo He

Description

This dataset is shared by Dr. Jibo HE, founder of USEE Eye Tracking Inc. and professor at Tsinghua University. Contact hejibo@usee.tech to request the full dataset, which is over 5 GB in size.

This is a dataset from RateMyProfessor.com (RMP) for professors' teaching evaluation. It is provided in two formats. The zip file contains many individual .csv files, one for each professor's webpage. RateMyProfessor_Sample data.csv is the combined version, with many professors' information in a single CSV file. We crawled information on almost one million professors.

The data crawled and extracted from RMP has 18 variables. Each variable is briefly described below.

Professor name: name of the professor who is rated.
School name: university where the professor is currently teaching.
Department name: department where the professor is currently working.
Local name: name by which the university is locally known.
State name: state in which the university is located.
Year since first review: the professor's teaching age, counted from the first student evaluation to the time of our analysis in 2019.
Star rating: the star rating of the professor's overall quality. According to RMP's official standard, 3.5-5.0 is good, 2.5-3.4 is average, and 1.0-2.4 is poor. This star rating is the average of the scores given to the professor across all student comments.
Take again: percentage of students who would choose this course again.
Difficulty index: the difficulty level of a course, where 1 is easiest and 5 is hardest. The difficulty index is the average of the scores given to the professor by all students.
Tags: the tags students chose to describe a professor.
Post date: the date when the student posted an evaluation of a course.
Student star: the star rating an individual student gave to a professor.
Student-rated difficulty: the difficulty index an individual student gave to a professor.
Attendance: whether attendance for a course is mandatory or not.
For credit: whether the student took the course for credit (yes or no).
Would take again: whether the student would choose the course again (yes or no).
Grade: the student's final grade in a course, such as A+, A, A-, B+, B, B-, C+, C, C-, D+, D, D-, F, WD, INC, Not, Audit/No. "WD" is Drop/Withdrawal, "INC" is Incomplete, "Not" is Not sure yet, and "Audit/No" is Audit/No Grade.
Comment: the comment the student wrote about the professor.
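The star rating bands and the grade codes described above can be turned into small helper functions. The following is a minimal sketch, not code shipped with the dataset: the names `rating_band`, `GRADE_CODES`, and `describe_grade` are hypothetical, and the handling of values that fall between the published bands (for example 3.45) is an assumption that uses 2.5 and 3.5 as cut-offs.

```python
# Hypothetical helpers for interpreting two of the 18 variables.

def rating_band(star_rating):
    """Map an average star rating (1.0-5.0) to RMP's good/average/poor bands.

    Assumption: values between the published bands are assigned by using
    2.5 and 3.5 as cut-off points.
    """
    if not 1.0 <= star_rating <= 5.0:
        raise ValueError("star rating must be between 1.0 and 5.0")
    if star_rating >= 3.5:
        return "good"
    if star_rating >= 2.5:
        return "average"
    return "poor"


# Meaning of the non-letter grade codes, as listed in the description.
GRADE_CODES = {
    "WD": "Drop/Withdrawal",
    "INC": "Incomplete",
    "Not": "Not sure yet",
    "Audit/No": "Audit/No Grade",
}


def describe_grade(grade):
    """Return the long form of a grade code; letter grades pass through unchanged."""
    return GRADE_CODES.get(grade, grade)
```

For example, `rating_band(4.2)` falls in the "good" band, and `describe_grade("WD")` expands to "Drop/Withdrawal".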

Files

Steps to reproduce

Using Python crawler code, 1.8 million original web pages were collected from the RMP website between April 2018 and July 2018. The original web pages are in HTML and contain many missing values, so the raw data were cleaned and preprocessed using Python. Professors can apply to have their profiles removed from RMP if comments violate its guidelines. RMP also removes professors who are no longer teaching at a university. In addition, the website was updated in 2015. For these reasons, many pages on the website are blank. After cleaning, the dataset contains 9,543,998 rows of comment records with valid information for 919,750 professors.
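The row and professor counts reported above could be checked against the CSV files with a short script. This is a sketch under stated assumptions, not the authors' cleaning code: the column names `professor_name` and `school_name` are hypothetical placeholders (check the header of the actual files), and treating a row as valid when it has a non-empty professor name is an assumption made for illustration.

```python
import csv

# Hypothetical column names; replace them with the headers used in the real files.
PROFESSOR_COL = "professor_name"
SCHOOL_COL = "school_name"


def summarize(csv_paths):
    """Count comment rows and distinct professors across one or more CSV files.

    Assumption: a row counts as valid if its professor name is non-empty,
    and a professor is identified by the (name, school) pair.
    """
    n_rows = 0
    professors = set()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                name = (row.get(PROFESSOR_COL) or "").strip()
                if not name:
                    continue  # skip rows with no professor name, e.g. from blank pages
                school = (row.get(SCHOOL_COL) or "").strip()
                n_rows += 1
                professors.add((name, school))
    return n_rows, len(professors)
```

The function accepts either the single combined file or the list of per-professor files extracted from the zip archive, since both formats are plain CSV.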

Institutions

Tsinghua University, Peking University

Categories

Big Data, University Student, Online Teaching

License