Example Stata syntax and data construction for negative binomial time series regression

Published: 2 November 2022 | Version 2 | DOI: 10.17632/3mj526hgzx.2
Sarah Price


We include Stata syntax (dummy_dataset_create.do) that creates a panel dataset for negative binomial time series regression analyses, as described in our paper "Examining methodology to identify patterns of consulting in primary care for different groups of patients before a diagnosis of cancer: an exemplar applied to oesophagogastric cancer". We also include a sample dataset for clarity (dummy_dataset.dta), and a sample of that data in a spreadsheet (Appendix 2).

The variables in the dataset are defined as follows:

case: binary variable for case or control status (takes a value of 0 for controls and 1 for cases).
patid: a unique patient identifier.
time_period: a count variable denoting the time period. In this example, 0 denotes 10 months before diagnosis with cancer, and 9 denotes the month of diagnosis with cancer.
ncons: number of consultations per month.
period0 to period9: 10 unique inflection point variables (one for each month before diagnosis). These are used to test which aggregation period includes the inflection point.
burden: binary variable denoting membership of one of two multimorbidity burden groups.

We also include two Stata do-files for analysing the consultation rate, stratified by burden group, using the maximum likelihood method (1_menbregpaper.do and 2_menbregpaper_bs.do).

Note: In this example, for demonstration purposes, we create a dataset for the 10 months leading up to diagnosis. In the paper, we analyse the 24 months before diagnosis. Here, we study consultation rates over time, but the method could be used to study any countable event, such as the number of prescriptions.
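As a rough illustration of the long (panel) layout described above, the following Python sketch builds one row per patient per time period. It is not the authors' Stata code and does not reproduce dummy_dataset_create.do. The function name make_dummy_panel, the alternating case assignment, the random burden assignment and the uniform placeholder counts are all assumptions made for illustration. The period0 to period9 variables are omitted because their construction is not specified here.

```python
import random


def make_dummy_panel(n_patients=6, n_periods=10, seed=1):
    """Return a list of dicts, one per patient per time period (long format).

    Hypothetical helper for illustration only; it is not part of the dataset.
    """
    rng = random.Random(seed)
    rows = []
    for patid in range(1, n_patients + 1):
        case = patid % 2            # assumption: alternate cases (1) and controls (0)
        burden = rng.randint(0, 1)  # assumption: random multimorbidity burden group
        for time_period in range(n_periods):
            # Assumption: placeholder counts drawn uniformly, with a higher ceiling
            # for cases in later periods. These are not negative binomial draws.
            upper = 2 + (case * time_period) // 3
            rows.append({
                "patid": patid,
                "case": case,
                "burden": burden,
                "time_period": time_period,
                "ncons": rng.randint(0, upper),
            })
    return rows


panel = make_dummy_panel()
for row in panel[:3]:
    print(row)
```

Each patient contributes ten rows (time_period 0 to 9), and case and burden are constant within a patient. This is the shape a panel regression of ncons on time expects.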



University of Exeter


Early Cancer Detection