Codebook and Annotated Dataset for LLM Career Guidance Analysis Across Ten African Countries

Published: 26 January 2026 | Version 1 | DOI: 10.17632/xt42c2jfdg.1
Contributors:

Description

This dataset contains the annotated responses, coding schema, and reliability statistics used in a comparative analysis of large language models (LLMs) for computing career guidance across ten African countries. The underlying research examined how different LLMs articulate technical and professional competencies for entry-level computing roles and the extent to which these recommendations reflect local contextual factors. The dataset includes 60 LLM-generated responses (six models × ten countries) produced using a standardized prompt. Models analyzed include ChatGPT-4, Claude 3.5 Sonnet, Gemini 1.5 Pro, LLaMA 3, DeepSeek-V2, and Mistral. Countries represented are Egypt, South Africa, Senegal, Tunisia, Kenya, Nigeria, Ghana, Benin, Zambia, and Morocco. All responses were collected in English within a fixed time window and without follow-up prompts.

Responses were analyzed using a conceptual content analysis approach informed by the CC2020 Computing Curricula framework. Coding captured both technical competencies (e.g., programming, algorithms, AI/ML, cloud computing) and professional competencies (e.g., adaptability, teamwork, communication, ethics). In addition, a contextual awareness analysis assessed the presence of country-specific references, including local technology ecosystems, language or cultural considerations, national policies, and institutional mentions. Coding reflects the presence of concepts rather than strict keyword matching.

The dataset also includes a rubric-based scoring sheet evaluating response quality across four dimensions (technical coverage, contextual awareness, skills balance, and depth), inter-rater reliability (IRR) data from dual coders, pooled and category-level reliability statistics, and aggregated summaries. Together, these materials enable replication of the reported analyses, secondary analysis of LLM behavior in career guidance contexts, and methodological reuse of the coding framework.
The dataset is intended for research and educational purposes and reflects model behavior at the time of data collection.
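The unit of analysis described above is one model–country response carrying concept counts and binary contextual flags. The following is a minimal sketch of such a record; the field names and example values are illustrative assumptions, not the dataset's actual column labels.

```python
# Illustrative sketch of one annotated record; field names and values are
# assumptions for demonstration, not the dataset's actual column labels.
from dataclasses import dataclass, field

@dataclass
class AnnotatedResponse:
    country: str                 # one of the ten countries studied
    model: str                   # one of the six LLMs analyzed
    response_text: str           # verbatim LLM output
    technical_codes: dict = field(default_factory=dict)     # concept -> mention count
    professional_codes: dict = field(default_factory=dict)  # concept -> mention count
    contextual_flags: dict = field(default_factory=dict)    # indicator -> Yes/No as bool

record = AnnotatedResponse(
    country="Kenya",
    model="ChatGPT-4",
    response_text="...",
    technical_codes={"programming": 3, "cloud computing": 1},
    professional_codes={"communication": 2, "ethics": 1},
    contextual_flags={
        "local_tech_industry": True,
        "language_or_culture": False,
        "national_policy": False,
        "local_institution": True,
    },
)
print(record.country, record.model)
```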

Files

Steps to reproduce

1. Prompting and Data Collection: Use the standardized prompt provided in the accompanying documentation to generate career guidance responses for entry-level computing roles. Issue the prompt once per country and model combination (six models × ten countries). Collect responses in English using default model settings and no follow-up prompts. Responses should be saved verbatim.
2. Response Organization: Organize responses in a tabular format with columns for country, model, and full response text. Each row should represent one model–country response.
3. Conceptual Coding: Apply the provided codebook to conduct conceptual content analysis of each response. Code for the presence of technical competencies, professional competencies, and contextual indicators. Coding is concept-based rather than keyword-based; both explicit and implicit mentions should be captured. Multiple mentions of the same concept within a response may be counted when they represent distinct conceptual units.
4. Contextual Awareness Coding: For each response, code binary indicators (Yes/No) for local technology industry references, language or cultural considerations, national policy mentions, and local institutional references.
5. Rubric-Based Scoring: Use the scoring rubric included in the dataset to assign qualitative scores (0–5) across four dimensions: technical coverage, contextual awareness, skills balance, and depth of answer. Compute overall performance scores using the specified weighted formula.
6. Inter-Rater Reliability: Have at least two independent coders apply the codebook to a subset or full set of responses. Compute percent agreement and Cohen's kappa for each category, as well as pooled and category-level reliability statistics, following the structure provided in the dataset.
7. Aggregation and Analysis: Aggregate coded frequencies, coverage rates, and scores across models and countries to reproduce summary tables and figures.
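The overall performance score in step 5 is a weighted combination of the four rubric dimensions. A minimal sketch follows; the dataset documentation specifies the actual weights, so the equal weights used here are purely illustrative placeholders.

```python
# Rubric-based overall score sketch (step 5). The equal weights below are
# illustrative placeholders; substitute the dataset's specified weights.
DIMENSIONS = ["technical_coverage", "contextual_awareness", "skills_balance", "depth"]
WEIGHTS = {d: 0.25 for d in DIMENSIONS}  # placeholder, not the dataset's formula

def overall_score(scores: dict) -> float:
    """Weighted combination of 0-5 rubric scores across the four dimensions."""
    return sum(WEIGHTS[d] * scores[d] for d in DIMENSIONS)

example = {"technical_coverage": 4, "contextual_awareness": 2,
           "skills_balance": 3, "depth": 4}
print(overall_score(example))  # 3.25 with equal weights
```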
Interpret frequency counts as indicators of emphasis rather than quality, which is assessed separately through rubric-based scoring.
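The reliability statistics in step 6 can be reproduced with standard formulas. The sketch below computes percent agreement and Cohen's kappa for one binary (Yes/No) category from two coders; the coder data are hypothetical.

```python
# Inter-rater reliability sketch for one binary category (step 6):
# percent agreement and Cohen's kappa from two coders' Yes/No codes.
def percent_agreement(a, b):
    """Proportion of items on which the two coders assigned the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa for two binary coders: (Po - Pe) / (1 - Pe)."""
    n = len(a)
    po = percent_agreement(a, b)                    # observed agreement
    p_yes = (sum(a) / n) * (sum(b) / n)             # chance both code Yes
    p_no = ((n - sum(a)) / n) * ((n - sum(b)) / n)  # chance both code No
    pe = p_yes + p_no                               # expected chance agreement
    return (po - pe) / (1 - pe)

coder1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]  # hypothetical codes, 1 = Yes
coder2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(percent_agreement(coder1, coder2), 2))  # 0.8
print(round(cohens_kappa(coder1, coder2), 2))       # 0.58
```

Category-level kappas computed this way can then be pooled across categories following the structure provided in the dataset.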

Categories

Artificial Intelligence, Educational Technology, Applied Computing, Computer in Education, Natural Language Processing, Higher Education, Career Development, Human-Computer Integration, Large Language Model

Licence