Utility-University Collaboration Publication Data
Description
This dataset is a collection of metadata describing the authors, their organizational affiliations, and the locations associated with academic publications that resulted from collaborations between academic researchers and electric utilities. It was queried from the Scopus database by searching for publications in which at least one author is affiliated with one of the 20 largest U.S. electric utilities. We used this dataset to better understand the nature of utility-university collaborations and the factors that shape their formation. In addition to examining the role that geography and proximity play, we conducted a limited network analysis to identify high-frequency collaborators at both the author and the organization level. We identified some time-series trends, such as an increasing number of publications and increasing distances between collaborators over time, but we did not test their significance by controlling for external factors such as funding, regulation, and technological change. Future work could use the included classification of each publication to study how the mix of research topics changes over time. The interviews we conducted for the accompanying research suggest that several types of collaboration are not represented in the publication dataset, including unsuccessful collaborations, many types of student-driven, practicum-style work, and for-hire work that may support regulatory filings, internal documents, or other non-academic publications.

We include four versions of this dataset at different stages of refinement to support reproductions, expansions, or refinements of the dataset. The first file (Initial Publication Queries By Utility.zip) is our raw output from the Scopus queries. The second file (Author-Parsed Publication Queries By Utilities.zip) is the parsed output of the queries, in which each author and affiliation is separated.
The third file (Publication Dataset with Duplicates and Erroneous Entries.csv) combines all utilities into a single file and includes many manual corrections to parsed or missing information, as well as additional fields that classify the data and flag duplicates and erroneously included records. The fourth file (Final Utility-University Publication Dataset.csv) then removes some of those additional fields, along with all duplicates and erroneously included records. This is the file we used for our final analyses.
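The organization-level network analysis mentioned above amounts to counting how often pairs of organizations appear together on the same publication. The sketch below shows one way to do that with the Python standard library. It assumes a hypothetical in-memory layout (one list of organization names per publication) rather than the actual CSV columns, and it is not the analysis code used for this dataset.

```python
from collections import Counter
from itertools import combinations


def count_org_pairs(publications):
    """Count co-occurrences of organization pairs across publications.

    `publications` is assumed (hypothetically) to be an iterable of lists of
    organization names, one list per publication.
    """
    pairs = Counter()
    for orgs in publications:
        # Deduplicate within a publication and sort, so that (A, B) and
        # (B, A) are counted as the same pair.
        unique = sorted(set(orgs))
        for a, b in combinations(unique, 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    # Made-up organizations for illustration only, not dataset records.
    pubs = [
        ["Utility A", "University X"],
        ["Utility A", "University X", "University Y"],
        ["Utility B", "University Z"],
    ]
    for (a, b), n in count_org_pairs(pubs).most_common(1):
        print(f"{n}  {a} <-> {b}")  # prints "2  University X <-> Utility A"
```

The same function works at the author level if each list holds standardized author names instead of organizations. Note that it also counts university-university and utility-utility pairs. To focus only on utility-university links, filter the pairs afterwards.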
Steps to reproduce
To generate evidence of university/utility (U/U) collaboration, we searched the Scopus database for published articles that include at least one author affiliated with a large U.S. utility. We selected the 20 largest utilities, in terms of revenue, from EIA's “2016 Utility Bundled Retail Sales - Total” data. We excluded several utilities that are listed in the EIA data but not in the list of Scopus affiliations (Puget Sound Energy Inc, Long Island Power Authority, and Reliant Energy Retail Services). We also combined several utilities that are listed separately in the EIA data but are owned by larger holding companies (e.g., Duke Energy) or have merged (e.g., Public Service Company Colorado and Northern States Power Company). The utilities that remained were:
Alabama Power Co
Union Electric Co - (MO) (Ameren)
Arizona Public Service Co
Commonwealth Edison Co (Commonwealth Edison Co; ComEd)
Consolidated Edison Co-NY Inc
Consumers Energy Co
DTE Electric Company
Duke Energy (Duke Energy Carolinas; Duke Energy Progress; Duke Energy Florida; Duke Energy Indiana)
Entergy Louisiana LLC (Entergy Corporation)
Georgia Power Co
Pacific Gas & Electric Co
PacifiCorp
Xcel Energy (Public Service Co of Colorado)
Public Service Elec & Gas Co (Public Service Enterprise Group Inc)
Salt River Project
South Carolina Electric & Gas Company (SCANA)
Southern California Edison Co
TXU Energy Retail Co LP
Virginia Electric & Power Co (Dominion Virginia Power Co)
Wisconsin Electric Power Co
Northern States Power Co - Minnesota
We did not deliberately include independent system operators in our searches, but when they appeared in query results, we classified them as utilities. There were very few such instances. We then filtered the Scopus query results to include only publications with at least one author whose affiliation contains “University”, “College”, “School”, or “Institute”. The metadata was then downloaded as a series of descriptive strings.
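The affiliation filter and the descriptive strings described above can be illustrated offline with a short sketch. It assumes, hypothetically, that each author-with-affiliation string separates authors with "; " and that each entry begins with "Surname, Initials." followed by comma-separated affiliation parts. This is an illustration under that assumption, not the parsing scripts used for this dataset. The keyword list is the one given in the text, matched here as whole words.

```python
import re

# Affiliation keywords from the filtering step, matched as whole words.
ACADEMIC_PATTERN = re.compile(r"\b(University|College|School|Institute)\b")


def parse_authors(field):
    """Split one author-with-affiliation string into (name, affiliation) rows.

    Hypothetical format: entries separated by "; ", each starting with
    "Surname, Initials." followed by comma-separated affiliation parts.
    Real strings often break this pattern (dual affiliations, commas inside
    names), so the output still needs manual review.
    """
    rows = []
    for entry in field.split("; "):
        parts = [p.strip() for p in entry.split(",")]
        name = ", ".join(parts[:2])         # e.g. "Doe, J."
        affiliation = ", ".join(parts[2:])  # remainder of the entry
        rows.append((name, affiliation))
    return rows


def has_academic_author(rows):
    """Return True if any parsed affiliation contains an academic keyword."""
    return any(ACADEMIC_PATTERN.search(aff) for _, aff in rows)


if __name__ == "__main__":
    # Made-up example string, not a record from the dataset.
    field = ("Doe, J., Utility A, City A, United States; "
             "Roe, R., School of Engineering, University X, City B, United States")
    rows = parse_authors(field)
    for name, aff in rows:
        print(name, "|", aff)
    print(has_academic_author(rows))  # True
```

Whole-word matching avoids false positives where a keyword appears inside a longer word. It will still miss non-English forms such as "Universidad" or "Institut", which is one reason to review filtered results by hand.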
Scripts were written to parse the strings, as well as we were able, into separate rows for each author, with columns for name, affiliation, location, and so on. A substantial amount of manual effort then went into verifying these parsed strings. This included verifying organizational affiliations (dual affiliations were often listed where students had graduated between the research and its publication), verifying locations, classifying organizations, and mapping distances (in retrospect, we recommend using a Google Distance API). It also included identifying erroneously included publications (after dual affiliations had been corrected) and duplicate publications (which were often similarly named conference presentations), and, in some cases, standardizing author names across different naming conventions.
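Distances in this dataset were mapped manually, and the authors recommend a Google distance API for future work. As a lighter-weight alternative, the sketch below computes the straight-line great-circle distance with the haversine formula. This is a swapped-in technique: it measures distance "as the crow flies," not travel distance. It also assumes that latitude and longitude are already known for each affiliation, which the dataset's location fields do not necessarily provide.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius in kilometres


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Sanity check: half of the equator is pi * R.
    print(round(haversine_km(0.0, 0.0, 0.0, 180.0), 1))  # 20015.1
```

Straight-line distance is easy to reproduce and needs no API key. A routing API is the better choice if travel time or road distance is the quantity of interest.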