Data Science Academic Programs in U.S. Universities

Published: 15 April 2026 | Version 2 | DOI: 10.17632/5kc648nbgr.2
Contributors:
Jenny Li Wang

Description

This dataset contains structured information about Data Science academic programs offered by universities in the United States. It was created to help analyze and compare universities based on their degree programs, admission requirements, and academic statistics in recent years. The data was collected from multiple sources, including College Scorecard, College Board BigFuture, U.S. News & World Report, and the Center for World University Rankings (CWUR). Additional information was gathered directly from each university's website to verify that the data was accurate.

The dataset is organized into four related tables. The Universities table holds basic information about each university: name, location (city, state), type (public, private), and total enrollment. The Degree_Program table provides program details such as program name, degree level (Bachelor, Master, Certificate, PhD), and program duration. The Admission table covers admission requirements: minimum GPA, whether the SAT is required, application fee, and acceptance rate. Lastly, the University_Statistics table contains annual data such as academic year, ranking, total students, and tuition fees.

The dataset uses a relational database structure to keep information organized and reduce duplicate data. The tables are linked through primary and foreign keys. The dataset can be used to compare universities, study trends in tuition and rankings, and analyze differences in admission requirements. It is intended for students, researchers, and anyone interested in higher education data.
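To illustrate how four tables of this kind can be linked through primary and foreign keys, here is a minimal sketch. It is not the published schema: the table names come from the description above, but every column name, type, and constraint is an assumption, as is the choice to attach Admission to a program rather than to a university. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a database server.

```python
import sqlite3

# Hypothetical schema: the four table names come from the description,
# but all column names, types, and constraints are illustrative assumptions.
SCHEMA = """
CREATE TABLE Universities (
    university_id    INTEGER PRIMARY KEY,
    name             TEXT NOT NULL,
    city             TEXT,
    state            TEXT,
    type             TEXT CHECK (type IN ('public', 'private')),
    total_enrollment INTEGER
);
CREATE TABLE Degree_Program (
    program_id     INTEGER PRIMARY KEY,
    university_id  INTEGER NOT NULL REFERENCES Universities(university_id),
    program_name   TEXT NOT NULL,
    degree_level   TEXT CHECK (degree_level IN
                       ('Bachelor', 'Master', 'Certificate', 'PhD')),
    duration_years REAL
);
CREATE TABLE Admission (
    admission_id    INTEGER PRIMARY KEY,
    -- Assumption: admission rules are stored per program.
    program_id      INTEGER NOT NULL REFERENCES Degree_Program(program_id),
    min_gpa         REAL,
    sat_required    INTEGER,  -- 0 = no, 1 = yes
    application_fee REAL,
    acceptance_rate REAL
);
CREATE TABLE University_Statistics (
    stat_id        INTEGER PRIMARY KEY,
    university_id  INTEGER NOT NULL REFERENCES Universities(university_id),
    academic_year  TEXT,
    ranking        INTEGER,
    total_students INTEGER,
    tuition_fees   REAL
);
"""


def build_database():
    """Create an in-memory database containing the four linked tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = build_database()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

With foreign keys enforced, inserting a Degree_Program row that points to a university_id absent from Universities is rejected, which is the kind of consistency a key-linked design provides.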

Steps to reproduce

This dataset was created by collecting information about Data Science programs from several public websites, including College Scorecard, College Board BigFuture, U.S. News & World Report, the Center for World University Rankings (CWUR), and each university's official website. These sources provided data on universities, academic programs, admission requirements, rankings, tuition fees, and student enrollment numbers. The data was collected and organized in Microsoft Excel. Once collection was complete, the data was cleaned by handling missing values and removing duplicates.

Next, the data was converted into a relational database structure using normalization rules. The dataset was split into four tables: Universities, Degree_Program, Admission, and University_Statistics. Primary keys were added to uniquely identify each record, and foreign keys were used to connect the tables. The database was implemented in MySQL. SQL commands were written to create the database and its tables, define keys, and insert the cleaned data. Each column was also assigned a data type appropriate to the values it holds.

Finally, the database was tested with SQL queries to confirm that all data was present and all relationships worked correctly. Queries were run to join tables, check program listings by university, and review tuition and ranking data. Once testing was complete, the dataset was finalized and prepared for publication with documentation that allows others to understand and reproduce the work.
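The cleaning, loading, and testing steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the author's actual procedure: the rows are made-up placeholders rather than dataset values, the column names are hypothetical, missing tuition is stored as NULL (the source does not say how missing values were handled), and Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3

# Made-up placeholder rows standing in for the Excel export;
# none of these values come from the dataset.
raw_rows = [
    {"name": "Example State University", "state": "CA", "tuition": "12000"},
    {"name": "Example State University", "state": "CA", "tuition": "12000"},
    {"name": "Sample Institute", "state": "NY", "tuition": ""},
]


def clean(rows):
    """Drop duplicate universities and turn blank tuition into None."""
    seen = set()
    cleaned = []
    for row in rows:
        key = (row["name"].strip().lower(), row["state"].strip().upper())
        if key in seen:
            continue  # duplicate record, skip it
        seen.add(key)
        tuition = row["tuition"].strip()
        cleaned.append({
            "name": row["name"].strip(),
            "state": row["state"].strip().upper(),
            # Assumption: a missing value is stored as NULL.
            "tuition": float(tuition) if tuition else None,
        })
    return cleaned


cleaned = clean(raw_rows)

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Two of the four tables, with hypothetical column names.
conn.executescript("""
CREATE TABLE Universities (
    university_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    state         TEXT
);
CREATE TABLE University_Statistics (
    stat_id       INTEGER PRIMARY KEY,
    university_id INTEGER NOT NULL REFERENCES Universities(university_id),
    academic_year TEXT,
    tuition_fees  REAL
);
""")

for row in cleaned:
    cur = conn.execute(
        "INSERT INTO Universities (name, state) VALUES (?, ?)",
        (row["name"], row["state"]))
    conn.execute(
        "INSERT INTO University_Statistics "
        "(university_id, academic_year, tuition_fees) VALUES (?, ?, ?)",
        (cur.lastrowid, "2024-2025", row["tuition"]))  # placeholder year
conn.commit()

# Relationship check: count statistics rows with no matching university.
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM University_Statistics AS s
    LEFT JOIN Universities AS u ON u.university_id = s.university_id
    WHERE u.university_id IS NULL
""").fetchone()[0]

# Join check: review tuition by university.
report = conn.execute("""
    SELECT u.name, s.academic_year, s.tuition_fees
    FROM Universities AS u
    JOIN University_Statistics AS s ON s.university_id = u.university_id
    ORDER BY u.name
""").fetchall()

print(orphans)
for line in report:
    print(line)
```

An orphan count of zero indicates that every statistics row resolves to a university, which is one simple way to check that the relationships hold after loading.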

Categories

University, Education
