OA Green / Gold counts

Published: 10-10-2018 | Version 2 | DOI: 10.17632/kxx3cfsmc2.2
Jeroen Baas


Derived from Scopus data as of September 2018, combined with SciVal institutional profiles. For each SciVal region, the 20 most prolific institutions in 2016 are selected. This fixed set of institutions is then used to derive the data for each year 2008-2017. The OA Green / Gold classification is derived from oaDOI/Unpaywall data.


Steps to reproduce

Green vs Gold tagging:

    .withColumn('oa_type', func.expr('IF(oa_locations.host_type="repository","GREEN",IF(oa_locations.host_type="publisher","GOLD","SUBSCRIPTION"))'))

All publication counts cover the Scopus document types Article, Review and Conference Paper.

Years: 2007-2017

Scopus data are joined with SciVal institution mappings. All counts are de-duplicated (name variants, DOI overlaps, etc.) before aggregating into annual counts.

Regions derived from SciVal are rewritten and further filtered based on the following R code:

    oadata <- transform(oadata,
        region2 = ifelse(country_code == "gbr", "GBR",
                  ifelse(country_code == "aus", "AUS", as.character(region_code))))

The "top20" selection is based on the following R code:

    library(dplyr)  # provides %>% and select()
    df_t20dataselect = subset(oadata,
        sort_year == 2016 & regionRank <= 20 &
        (region2 %in% c("APAC", "NAM", "SAM", "EUR", "GBR", "AUS"))) %>%
        select(institution_id, regionRank)

This selection is joined with oadata to produce the counts for this stable set of 20 institutions per region.

The current version only shows data for Scopus records that include a publication month, allowing a breakdown by publishing month.
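The tagging, region rewrite and top-20 selection described above can be sketched in plain Python for readers without a Spark or R setup. This is only an illustrative sketch, not the code used to build the dataset: the function names (oa_type, region2, select_top20, restrict_to_top20) and the dictionary-per-row layout are assumptions, and the final join is assumed to keep only rows whose institution_id appears in the 2016 selection.

```python
# Illustrative sketch only; the original pipeline used PySpark and R.
# Function names and the dict-per-row layout are assumptions.

KEPT_REGIONS = {"APAC", "NAM", "SAM", "EUR", "GBR", "AUS"}


def oa_type(host_type):
    """Mirror the nested IF expression: repository -> GREEN, publisher -> GOLD."""
    if host_type == "repository":
        return "GREEN"
    if host_type == "publisher":
        return "GOLD"
    return "SUBSCRIPTION"


def region2(country_code, region_code):
    """Treat GBR and AUS as their own regions; otherwise keep the SciVal region."""
    if country_code == "gbr":
        return "GBR"
    if country_code == "aus":
        return "AUS"
    return str(region_code)


def select_top20(rows, year=2016, max_rank=20):
    """Return {institution_id: regionRank} for the top institutions in `year`."""
    selected = {}
    for row in rows:
        region = region2(row["country_code"], row["region_code"])
        if (row["sort_year"] == year
                and row["regionRank"] <= max_rank
                and region in KEPT_REGIONS):
            selected[row["institution_id"]] = row["regionRank"]
    return selected


def restrict_to_top20(rows, selected):
    """Assumed join semantics: keep all years, but only for selected institutions."""
    return [row for row in rows if row["institution_id"] in selected]
```

Because the selection is made once on the 2016 rows and then applied to every year, the same institutions are followed across the whole period even if their rank differs in other years.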