De-identified U.S. Occupational Injury Events with Standardized Injury and Industry Classifications
Description
This dataset contains a de-identified, incident-level collection of U.S. occupational injury events compiled from publicly available workplace injury reporting records. The dataset includes standardized categorical classifications for nature of injury (NAT), body part affected (BOD), event or exposure (EVT), and injury source (SRC), along with industry identifiers at both the six-digit and two-digit NAICS levels and derived temporal variables. Supporting lookup tables defining all classification systems used in the dataset are provided.
Steps to reproduce
The dataset was generated from publicly available U.S. workplace injury reporting records obtained from the Occupational Safety and Health Administration (OSHA) Severe Injury Reports data release. Source data were downloaded in tabular format and imported into a local data processing environment. Duplicate records were removed, as were fields containing personally identifiable information or direct calendar dates. Injury characteristics were mapped from source-specific codes and narrative descriptions into standardized categorical classifications for nature of injury (NAT), body part affected (BOD), event or exposure (EVT), and injury source (SRC) using predefined decision rules. Industry identifiers were retained at the six-digit NAICS level and recoded into two-digit NAICS groupings. Derived temporal variables, including month and day of week, were generated from the event dates before those dates were removed. Indicator variables were created to flag records occurring during defined COVID-19 public health emergency and epidemic periods. Lookup tables defining all standardized classification codes and labels were created alongside the main dataset. The final dataset was exported in CSV and spreadsheet formats, and all supporting lookup tables were saved as plain-text files.
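The de-identification and derivation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the processing code actually used. The field names (event_date, employer, address, naics6), the single covid_phe indicator, and the period bounds are hypothetical placeholders. The two-digit NAICS value is taken here as the first two digits of the six-digit code, which may differ from the grouping rules applied in the dataset, and the mapping to NAT, BOD, EVT, and SRC codes is omitted because the decision rules are not given in this description.

```python
from datetime import date

# Hypothetical period bounds, for illustration only; the dataset's actual
# COVID-19 period definitions are not stated in this description.
PHE_START = date(2020, 1, 31)
PHE_END = date(2023, 5, 11)

# Hypothetical names for fields that would carry identifying information.
PII_FIELDS = {"employer", "address"}

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def deidentify(records):
    """Drop exact duplicates, derive variables, then remove PII and the date."""
    seen = set()
    out = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:  # exact duplicate of an earlier record
            continue
        seen.add(key)

        # Derive temporal variables before the calendar date is discarded.
        event = date.fromisoformat(rec["event_date"])
        row = {k: v for k, v in rec.items()
               if k not in PII_FIELDS and k != "event_date"}
        row["month"] = event.month
        row["day_of_week"] = DAY_NAMES[event.weekday()]

        # Assumption: two-digit grouping = first two digits of the NAICS code.
        row["naics2"] = rec["naics6"][:2]

        # 1 if the event falls inside the (placeholder) emergency period.
        row["covid_phe"] = int(PHE_START <= event <= PHE_END)
        out.append(row)
    return out
```

Deriving the month, day of week, and period indicator inside the same pass that drops event_date keeps the ordering constraint explicit: nothing downstream can depend on the raw date once a row has been emitted.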