Student Career Dataset

Published: 4 April 2024 | Version 1 | DOI: 10.17632/4spj4mbpjr.1
Sakir Hossain Faruque


The research aims to investigate the relationship between students' reported skills and their chosen career fields, hypothesizing that students with specific skills are more likely to pursue careers aligned with those skills.

Data Description: The dataset comprises two main columns: "Skill" and "Career." The "Skill" column contains textual data representing the skills reported by students, such as "web development," "mobile app development," etc. The "Career" column contains textual data representing the career fields students aspire to enter, such as "Development," "Software Engineering," etc. The data were collected through surveys or self-reported information provided by students.

Analysis and Findings: Analysis of the data indicates strong associations between students' reported skills and their chosen career fields. For instance, students who reported skills in "web development" often expressed interest in careers related to "Development" or "Software Engineering." Similarly, students with skills in "mobile app development" tended to aspire to careers in "Mobile Application Development" or "Software Engineering." The data suggest that students' reported skills play a significant role in shaping their career aspirations.

Interpretation and Implications: The data provide valuable insights for educators, career counselors, and students themselves. Educators and counselors can use this information to provide tailored guidance to students based on their reported skills, helping them align their skill sets with suitable career paths. Students can also benefit from understanding the correlation between their skills and potential career opportunities, enabling them to make more informed decisions about their academic and professional trajectories. Overall, the data shed light on the importance of skill development and its impact on career choices in the field of computer science and software engineering.
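The kind of skill-to-career association described above can be inspected with a simple cross-tabulation. The following is a minimal sketch, assuming the data are loaded into a pandas DataFrame with the "Skill" and "Career" columns named in the description. The rows shown are toy values invented for illustration, not records from the dataset, and the helper name `skill_career_shares` is hypothetical.

```python
import pandas as pd

# Toy rows invented for illustration only; they are NOT taken from the dataset.
toy = pd.DataFrame({
    "Skill": [
        "web development",
        "web development",
        "web development",
        "mobile app development",
        "mobile app development",
    ],
    "Career": [
        "Development",
        "Development",
        "Software Engineering",
        "Mobile Application Development",
        "Software Engineering",
    ],
})


def skill_career_shares(df: pd.DataFrame) -> pd.DataFrame:
    """For each skill (row), return the share of respondents naming each career (column)."""
    # normalize="index" divides every row by its own total, so each row sums to 1.
    return pd.crosstab(df["Skill"], df["Career"], normalize="index")


shares = skill_career_shares(toy)
print(shares.round(2))
```

Each row of the resulting table sums to 1, so a large value in a cell means that most respondents reporting that skill named that career. With the real file, the toy DataFrame would be replaced by whatever loader matches the published file format.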


Steps to reproduce

1. Data Collection: We collected data from various private and public universities in Bangladesh through Google Forms. We mainly targeted Computer Science (CS) and Software Engineering (SWE) students and collected career-related information such as their career goals, skills, interests, and skill-based activities.

2. Data Preprocessing: We manually preprocessed the data by grouping sub-fields under master fields and merging the corresponding columns from the raw data. We then used Python libraries to preprocess the dataset further so that it was suitable for machine analysis.

3. Data Analysis: After preprocessing, the dataset was analyzed using statistical and text-mining techniques to identify patterns and relationships between students' reported skills and their chosen career fields. This analysis involved techniques such as frequency analysis, association rule mining, and text classification.

4. Data Interpretation: The results of the analysis were interpreted to draw conclusions about the relationship between students' skills and career aspirations. This interpretation involved identifying significant associations and trends in the data and discussing their implications for career guidance and academic planning in the field of computer science and software engineering.

5. Reproducibility: To reproduce the research, the same survey instrument and methodology can be employed to collect data from a similar population of computer science and software engineering students. The collected responses can then be processed and analyzed using the same or similar data processing and analysis techniques outlined above. Ensuring transparency in data collection, preprocessing, analysis, and interpretation is essential for reproducibility, as it allows other researchers to validate the findings and build upon the work.
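Steps 2 and 3 can be sketched roughly as follows. This is an illustrative outline only, not the authors' pipeline: the sub-field to master-field mapping, the function names, and the sample rows are all assumptions made up for the example. Association rule mining is reduced here to computing support and confidence for single "skill -> career" rules with the standard library.

```python
from collections import Counter

# Hypothetical sub-field -> master-field mapping. The authors' actual
# categories are not listed in the description, so these are placeholders.
MASTER_FIELD = {
    "frontend": "web development",
    "backend": "web development",
    "android": "mobile app development",
    "ios": "mobile app development",
}


def normalize_skill(raw: str) -> str:
    """Trim and lowercase a raw answer, then map it to its master field if known."""
    key = raw.strip().lower()
    return MASTER_FIELD.get(key, key)


def rule_metrics(rows):
    """Compute (support, confidence) for every observed rule skill -> career.

    support    = share of all responses containing both the skill and the career
    confidence = share of responses with that skill that also name the career
    """
    cleaned = [(normalize_skill(skill), career.strip()) for skill, career in rows]
    total = len(cleaned)
    pair_counts = Counter(cleaned)
    skill_counts = Counter(skill for skill, _ in cleaned)
    return {
        (skill, career): (count / total, count / skill_counts[skill])
        for (skill, career), count in pair_counts.items()
    }


# Toy survey responses invented for illustration; NOT taken from the dataset.
toy_rows = [
    ("Frontend", "Development"),
    ("backend ", "Development"),
    ("Android", "Mobile Application Development"),
    ("iOS", "Software Engineering"),
]

metrics = rule_metrics(toy_rows)
for (skill, career), (support, confidence) in sorted(metrics.items()):
    print(f"{skill} -> {career}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence and non-trivial support corresponds to the kind of "significant association" mentioned in step 4. For larger data, a dedicated association rule mining library could replace the hand-rolled counts.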


Computer Science, Software Engineering, Natural Language Processing, Skill Development, Student Motivation, Career Planning in Decision Making