Colorado Benzene Point Sources 2008-2020

Published: 22 May 2025 | Version 1 | DOI: 10.17632/xznpb436jh.1

Description

This shapefile dataset contains point sources of benzene emissions in the state of Colorado, as reported to the United States Environmental Protection Agency (EPA) National Emissions Inventory (NEI) for the report years 2008, 2011, 2014, 2017, and 2020. Benzene is a hazardous air pollutant (HAP) that is a constituent of petroleum products and is emitted during extraction, production, and combustion activities. Benzene is classified as a known human carcinogen. This dataset can be used to estimate benzene exposure in Colorado based on location, and it can serve as a template for creating similar datasets of benzene point sources in other states.

The EPA NEI compiles estimated emissions of Criteria Air Pollutants (CAP) and HAPs from point, nonpoint, on-road, nonroad, and fire sources every three years, starting with 2008. The point emissions data are derived from annual emissions estimates that each facility provides to its state, which then reports these amounts to the EPA. Benzene point source emissions data were downloaded from publicly available US EPA NEI datasets [https://www.epa.gov/air-emissions-inventories/national-emissions-inventory-nei]. These comma-separated value (.csv) files contain reported benzene point emissions (in pounds) from Colorado facilities for the calendar years 2008, 2011, 2014, 2017, and 2020. Facilities on tribal lands were not included in these state reports and are not included in this dataset.

Steps to reproduce

Each NEI facility is assigned a unique Emissions Inventory System identifier (EIS_ID) by the EPA. All sites included the following variables: FACILITY, ADDRESS, CITY, COUNTY, ZIP_CODE, and NAICS_CODE (the standardized North American Industry Classification System number indicating the economic purpose of the site). Some sites also included the following variables: FACILITY_T (facility type), NAICS_DESC (NAICS description), LAT (latitude), and LON (longitude).

Using R v4.4.1 (RStudio) and the tidyverse package v2.0.0, the five NEI benzene datasets were joined on EIS_ID, yielding 11281 sites that reported benzene emissions in at least one of the five report years. The variable BNZ_AVG_LB is the mean of each site’s reported annual emissions, excluding years without reported emissions. The variable BNZ_MAX_LB is each site’s maximum reported annual emissions.

Only 8836 sites had latitude/longitude coordinates provided by the EPA. Geoapify’s online geocoding tool [www.geoapify.com/tools/geocoding-online] was used to geocode the street addresses of 82 additional sites. Many of the remaining site addresses were given as Public Land Survey System (PLSS) descriptions, which use ordinal directions to indicate the sixteenth of a numbered Section, followed by the Township number and direction, the Range number and direction, and the reference Principal Meridian. The various text formats of the PLSS information were parsed into section, township, and range integers and directions. Using these standardized PLSS data, 776 oil and gas facilities were matched to latitude and longitude coordinates provided by the Colorado Energy and Carbon Management Commission (ECMC) [https://ecmc.state.co.us/data.html]. ArcGIS Pro v3.3 was used to join an additional 1171 sites to polygons in a Colorado PLSS Intersected Survey Grid GIS shapefile from the US Bureau of Land Management (BLM) [https://gbp-blm-egis.hub.arcgis.com/datasets/f038096492b9490082f9ddfe6a8889f9_4/explore].
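As a rough illustration of the EIS_ID join and the BNZ_AVG_LB and BNZ_MAX_LB calculations described above, the following Python sketch full-joins per-year records on EIS_ID and summarizes each site over its reported years only. It is not the R tidyverse code used to build the dataset. The `reports` structure, the `summarize` function, the N_YEARS field, the site identifiers, and all emission values are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical per-year records: {report_year: {EIS_ID: benzene emissions in lb}}.
# Identifiers and values are made-up placeholders, not NEI data.
reports = {
    2008: {"A1": 100.0, "B2": 40.0},
    2011: {"A1": 300.0},
    2014: {"B2": 60.0, "C3": 5.0},
    2017: {},
    2020: {"A1": 200.0},
}


def summarize(reports):
    """Full-join yearly reports on EIS_ID, then summarize each site.

    A site appears in the result if it reported in at least one year.
    Years in which a site did not report are skipped, not counted as zero.
    """
    by_site = {}
    for year in sorted(reports):
        for eis_id, pounds in reports[year].items():
            by_site.setdefault(eis_id, {})[year] = pounds

    summary = {}
    for eis_id, yearly in by_site.items():
        values = list(yearly.values())
        summary[eis_id] = {
            "BNZ_AVG_LB": mean(values),  # mean over reported years only
            "BNZ_MAX_LB": max(values),   # largest single reported year
            "N_YEARS": len(values),      # hypothetical helper field
        }
    return summary


summary = summarize(reports)
for eis_id, row in sorted(summary.items()):
    print(eis_id, row)
```

Skipping unreported years rather than treating them as zero keeps the average from being pulled down for sites that only appear in some inventories, which matches how BNZ_AVG_LB is described.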
The centroids of the matched BLM grid polygons were calculated to provide specific coordinates for these sites. The coordinates for the remaining 341 sites with PLSS addresses were identified online [www.randymajors.org/township-range-on-google-maps]. An additional 26 sites with street addresses were located using Google Earth Pro v7.3.6.9796. There were 22 sites with either incomplete addresses or non-existent PLSS coordinates. The variable LL_DATA_SO indicates how the coordinates of each site were obtained.

Because a PLSS description at this resolution refers to a plot of land of approximately 40 acres, or a square about 402 m on a side, a site located at a corner of the plot could be up to about 285 m from the plot’s central coordinates. Google Earth was used to check a random sample of 100 sites against the site coordinates provided by the EPA. Although many site coordinates were at or near the apparent site, some of the listed coordinates were off by up to 300 m.
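The positional uncertainty quoted above for PLSS-derived coordinates follows from simple geometry, and the short Python sketch below reproduces it. It assumes an idealized, perfectly square 40-acre plot and uses the international acre; real PLSS parcels can deviate from both. The function and constant names are illustrative only.

```python
import math

SQ_M_PER_ACRE = 4046.8564224  # square metres in one international acre


def plot_offsets(acres=40.0):
    """Return (side, centre-to-edge, centre-to-corner) distances in metres
    for an idealized square plot of the given area."""
    side = math.sqrt(acres * SQ_M_PER_ACRE)
    to_edge = side / 2                   # centre to the midpoint of a side
    to_corner = side * math.sqrt(2) / 2  # centre to a corner (half diagonal)
    return side, to_edge, to_corner


side, to_edge, to_corner = plot_offsets()
print(f"side {side:.1f} m, edge midpoint {to_edge:.1f} m, corner {to_corner:.1f} m")
```

Under these assumptions the side is about 402.3 m, the centre-to-edge distance about 201.2 m, and the centre-to-corner distance about 284.5 m, so the worst case is a site at a corner of the plot.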

Institutions

  • Colorado School of Public Health

Categories

Environmental Health, Air Pollution, Leukemia, Hazardous Pollutant, Environmental Geography, Benzene, Energy Development, Childhood Acute Lymphocytic Leukemia
