AI and Data Science Academic Programs in the United States (2020–2025): A Normalized Relational Dataset of Universities, Annual Statistics, AI/DS Degree Programs, and Admission Requirements
Description
This dataset contains a normalized relational database of Artificial Intelligence (AI) and Data Science (DS) academic programs offered by selected universities in the United States during the 2020–2025 period. It was created to analyze universities, their annual institutional changes, their AI/DS degree offerings, and the admissions expectations associated with those programs. It covers 20 universities drawn from at least five U.S. states and includes both public and private institutions to provide geographic and institutional diversity.
The data is organized into four related tables: University, University_Annual_Stats, Degree_Program, and Admission_Requirement. In its final form, the dataset contains 20 university records, 100 annual statistics records (five academic years per university), 43 AI/DS degree program records, and 43 linked admission requirement records. The annual statistics table captures general university information such as tuition, student population, acceptance rate, graduation rate, and the number of AI/DS programs recorded in the dataset. The program and admissions tables capture details such as program name, field, degree level, first offered year, duration, class type, minimum GPA, SAT requirement status, and prerequisite expectations.
The dataset was compiled from multiple publicly available sources, including official university websites, program pages, admissions pages, and public higher-education reference sources. Data elements were selected based on availability, consistency, and usefulness for modeling relational tables. The final structure reduces redundancy and improves data integrity through primary and foreign key relationships, making the dataset usable for database coursework, SQL practice, relational design analysis, and the study of higher-education offerings for AI and Data Science in the United States.
The published files include the normalized spreadsheet tables and the MySQL script used to create and populate the database.
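As a rough illustration of how the four tables might link together, the sketch below builds a small version of the schema and runs one join across it. It is not the published MySQL script: Python's built-in sqlite3 module stands in for MySQL, only the four table names come from the description above, and every column name and sample row is a hypothetical placeholder.

```python
import sqlite3

# SQLite stands in for MySQL; column names are assumptions, not the published schema.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE University (
    university_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    state         TEXT NOT NULL,
    control       TEXT NOT NULL CHECK (control IN ('Public', 'Private'))
);
CREATE TABLE University_Annual_Stats (
    stat_id         INTEGER PRIMARY KEY,
    university_id   INTEGER NOT NULL REFERENCES University(university_id),
    academic_year   TEXT NOT NULL,
    tuition         REAL,
    acceptance_rate REAL,
    UNIQUE (university_id, academic_year)  -- one stats row per university per year
);
CREATE TABLE Degree_Program (
    program_id    INTEGER PRIMARY KEY,
    university_id INTEGER NOT NULL REFERENCES University(university_id),
    program_name  TEXT NOT NULL,
    degree_level  TEXT
);
CREATE TABLE Admission_Requirement (
    requirement_id INTEGER PRIMARY KEY,
    -- assumed one requirement row per program (the dataset has 43 of each)
    program_id     INTEGER NOT NULL UNIQUE REFERENCES Degree_Program(program_id),
    min_gpa        REAL,
    sat_required   TEXT
);
""")

# Placeholder rows for illustration only; these are not records from the dataset.
conn.execute("INSERT INTO University VALUES (1, 'Example University', 'MA', 'Private')")
conn.execute("INSERT INTO Degree_Program VALUES (1, 1, 'MS in Data Science', 'Master')")
conn.execute("INSERT INTO Admission_Requirement VALUES (1, 1, 3.0, 'Not specified')")

# Join a university to its programs and their admission requirements.
rows = conn.execute("""
    SELECT u.name, p.program_name, a.min_gpa
    FROM University u
    JOIN Degree_Program p ON p.university_id = u.university_id
    JOIN Admission_Requirement a ON a.program_id = p.program_id
""").fetchall()
print(rows)

# A program that points at a non-existent university should be rejected.
try:
    conn.execute("INSERT INTO Degree_Program VALUES (2, 99, 'Orphan Program', 'Master')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The rejected insert at the end shows the kind of integrity check that foreign keys provide. In MySQL the same constraints would be declared with FOREIGN KEY clauses on InnoDB tables.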
Steps to reproduce
This dataset was created as a structured database using publicly available educational sources.
The first step was defining the project scope as AI and Data Science academic programs in the United States over the 2020–2025 period. The dataset was required to include at least 20 universities, at least 5 states, both public and private institutions, and five years of university-level statistics.
The second step was designing the relational tables. Four final tables were used: University, University_Annual_Stats, Degree_Program, and Admission_Requirement. Each table was assigned a primary key, and foreign key relationships were defined between related tables. The tables were designed around Third Normal Form (3NF) principles so that institution identity data, annual statistics, program details, and admission requirements were stored separately while remaining linked.
Note: The source URLs break this normalization, but they have been left in place so that each row of information can be traced to its source and reproduced more easily. To return to 3NF, move or remove the URLs as desired.
The third step was selecting institutions and gathering data. Universities were chosen to satisfy the geographic and institutional diversity requirements and to ensure that relevant AI or Data Science program information was available from trustworthy public sources. Large portions of the university data were gathered from NCES College Navigator, IPEDS, and CollegeTuitionCompare, which supplied institutional attributes such as city, state, institution size, tuition, student population, acceptance rate, and graduation rate. Program and admissions data were gathered primarily from official university websites, including academic catalogs, department pages, graduate program pages, and admissions pages from institutions such as Northeastern University, Boston University, WPI, UC Berkeley, UC San Diego, NYU, UVA, Rice, UT Austin, UF, and Carnegie Mellon. Program attributes such as program name, field, degree level, first offered year, duration, and class type were taken from these official program sources, while admissions-related values such as minimum GPA, SAT requirement status, and prerequisite expectations were taken from official admissions or application pages whenever possible. When a value was not directly published, it was left missing or standardized as “Not specified” rather than guessed.
The fourth step was cleaning and organizing the data in spreadsheet form. A separate Excel sheet was used for each table, wording was standardized across rows, and empty values were reviewed so they could later be converted to NULL in MySQL. The annual statistics table was expanded to include five academic years for each university.
Finally, the completed workbook was used to generate MySQL table-creation and insert statements so the dataset could be loaded directly into a relational database.
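The cleaning and loading steps could be approximated along the following lines. This is a minimal sketch under stated assumptions, not the procedure actually used. The sample rows, the column names, and the helpers clean_cell and to_record are hypothetical, and SQLite (through Python's standard sqlite3 module) stands in for MySQL. Empty or whitespace-only cells become NULL, while an explicit "Not specified" is kept as a value.

```python
import sqlite3

# Hypothetical rows as they might be read from the Admission_Requirement sheet;
# values and column names are placeholders, not records from the dataset.
sheet_rows = [
    {"program_id": 1, "min_gpa": "3.0", "sat_required": "No", "prerequisites": "Calculus"},
    {"program_id": 2, "min_gpa": "", "sat_required": "Not specified", "prerequisites": "  "},
]

def clean_cell(value):
    """Map empty or whitespace-only cells to None (SQL NULL); trim other strings."""
    if value is None:
        return None
    if isinstance(value, str):
        value = value.strip()
        return value if value else None
    return value

def to_record(row):
    """Clean every cell in a row and convert the GPA to a number when present."""
    cleaned = {key: clean_cell(val) for key, val in row.items()}
    if cleaned["min_gpa"] is not None:
        cleaned["min_gpa"] = float(cleaned["min_gpa"])
    return cleaned

records = [to_record(row) for row in sheet_rows]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Admission_Requirement (
        program_id    INTEGER PRIMARY KEY,
        min_gpa       REAL,
        sat_required  TEXT,
        prerequisites TEXT
    )
""")

# Named placeholders let the driver turn None into NULL without string formatting.
conn.executemany(
    "INSERT INTO Admission_Requirement "
    "VALUES (:program_id, :min_gpa, :sat_required, :prerequisites)",
    records,
)

missing_gpa = conn.execute(
    "SELECT COUNT(*) FROM Admission_Requirement WHERE min_gpa IS NULL"
).fetchone()[0]
print(missing_gpa)
```

Passing values as parameters rather than pasting them into the SQL text avoids quoting problems in program names and prerequisites. A generated .sql script would instead need to write the literal NULL for the same cells.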
Institutions
- Wentworth Institute of Technology, Boston, Massachusetts