Contributors: Forget, Gaël
... Documentation of https://dataverse.harvard.edu/dataverse/ECCOv4r2inputs
Contributors: Caracuzzo, Alex
... Infogroup’s Historical Business Backfile consists of geo-coded records of US businesses and other organizations. Each record contains basic information on the entity, such as contact information, industry description, annual revenues, number of employees, year established, and other data. Each annual file is a “snapshot” of Infogroup’s data as of the last day of that year, and together the files form a time series covering 1997-2014. The 2014 data file covers approximately 20 million business records representing all industries. Access is restricted to current Harvard University community members. Use of Infogroup US Historical Business Data is subject to the terms and conditions of a license agreement (effective March 16, 2016) between Harvard and Infogroup Inc. and subject to applicable laws. Each data file is available in either .csv or .sas format. All data files are compressed into archives in .gz (GZIP) format. Extraction software such as 7-Zip is required to unzip these archives.
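As an alternative to unpacking an archive by hand, a gzip-compressed .csv file can usually be read directly as a stream. The following is a minimal sketch in Python using only the standard library. The file name `business_sample.csv.gz`, the column names, and the UTF-8 encoding are assumptions made for illustration and are not taken from the Infogroup documentation; the sketch writes its own tiny stand-in file so that it runs without access to the restricted data.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_csv(path, limit=None):
    """Return rows (as dicts) from a gzip-compressed CSV without unpacking it to disk."""
    rows = []
    # Text mode ("rt") decompresses and decodes on the fly.
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle)):
            if limit is not None and index >= limit:
                break
            rows.append(row)
    return rows


# Build a small stand-in archive. The file name and columns are hypothetical.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "business_sample.csv.gz")
with gzip.open(demo_path, mode="wt", newline="", encoding="utf-8") as handle:
    writer = csv.writer(handle)
    writer.writerow(["company", "year_established", "employees"])
    writer.writerow(["Example Co", "1998", "12"])
    writer.writerow(["Sample LLC", "2005", "40"])

# Read only the first row, as one might do to preview a very large file.
sample_rows = read_gzipped_csv(demo_path, limit=1)
print(sample_rows)
```

The `limit` argument is there because the annual files are described as holding millions of records, so previewing a few rows before loading everything is often the safer first step. If an archive turns out to bundle several files or to use a different encoding, the sketch would need adjusting.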
Replication Data for: Google Politics: The Political Determinants of Internet Censorship in Democracies
Contributors: Meserve, Stephen A., Pemstein, Daniel
... The expansion of digital interconnectivity has simultaneously increased individuals' access to media and presented governments with new opportunities to regulate information flows. As a result, even highly democratic countries now issue frequent censorship and user data requests to digital content providers. We argue that government internet censorship occurs, in part, for political reasons, and seek to identify the conditions under which states censor. We leverage new, cross-nationally comparable, censorship request data, provided by Google, to examine how country characteristics co-vary with governments’ digital censorship activity. Within democracies, we show that governments engage in more digital censorship when internal dissent is present and when their economies produce substantial intellectual property. But these demand mechanisms are modulated by the relative influence that democratic institutions provide to narrow and diffuse interests; in particular, states with proportional electoral institutions censor less.
Contributors: Mullinix, Kevin J., Leeper, Thomas J., Druckman, James N., Freese, Jeremy
... This archive contains datasets, R analysis files, and supplemental materials for: Mullinix, Kevin J., Thomas J. Leeper, James N. Druckman, and Jeremy Freese. "The Generalizability of Survey Experiments." Journal of Experimental Political Science, Forthcoming. See README for details on the contents of this archive.
Image data for project "Detecting triple-vessel disease with cadmium zinc telluride-based single photon emission computed tomography using the intensity signal-to-noise ratio between rest and stress studies"
Contributors: Fang, Yu-Hua
... Image data for the project
Contributors: Kwon, Sejeong, Cha, Meeyoung, Jung, Kyomin
... This study determines the major differences between rumors and non-rumors and explores rumor classification performance over varying time windows, from the first three days to nearly two months. A comprehensive set of user, structural, linguistic, and temporal features was examined, and their relative strength as key rumor traits was compared using near-complete data from Twitter. The first contribution of this study is to provide insight into the cumulative spreading patterns of rumors and non-rumors over time through statistical analysis. We find that structural and temporal features distinguish rumors from non-rumors over a long-term window, yet they are not available during the initial phase of rumors. In contrast, user and linguistic features remain strong indicators throughout the rumor propagation phases. In addition to cumulative spreading patterns, changes in predictive power over time are estimated for each set of features. Based on these findings, we suggest a new rumor classification algorithm that achieves competitive accuracy even over short time windows. These findings provide new insights for explaining rumor mechanism theories and for identifying features for early rumor detection.
Contributors: Jarrad, Maya
... Johnson Creek watershed restoration project locations, areas, and attributes by primary watershed restoration agents from 1990-2014 in the Johnson Creek Watershed, Willamette River Basin, Oregon. This spatial data package combines the collective efforts of most jurisdictional and NGO watershed partners in watershed restoration efforts for flood mitigation, wetland enhancement, fish passage and habitat, stormwater mitigation paired with habitat creation or enhancement, and water quality improvements. This layer includes all projects with a statement of intent, a site location, and project start year. Projects must be explicitly stream, riparian, or wetland enhancement projects, or located within 0.5 km of at least a class 4 stream. Projects that only consist of study, monitoring, or education, or were primarily to protect infrastructure, were not included. Individual, private landowner restoration in the eastern third of the watershed and in Multnomah County was not included for privacy reasons.
Archival dataset: A longitudinal dataset of five years of public activity in the Scratch online community
Contributors: Hill, Benjamin Mako, Monroy-Hernández, Andrés
... Scratch is a programming environment and an online community where young people can create, share, learn, and communicate. In collaboration with the Scratch Team at MIT, we created a longitudinal dataset of public activity in the Scratch online community during its first five years (2007-2012). The dataset comprises 32 tables with information on more than 1 million Scratch users, nearly 2 million Scratch projects, more than 10 million comments, more than 30 million visits to Scratch projects, and more. To help researchers understand this dataset, and to establish the validity of the data, we also include the source code of every version of the software that operated the website, as well as the software used to generate this dataset. We believe this is the largest and most comprehensive downloadable dataset of youth programming artifacts and communication. This is an archival version of this dataset and all data tables are access restricted. Individuals should request access to these data by filling out a form and agreeing to the Scratch Research Data Sharing Agreement at the following URL: http://llk.media.mit.edu/scratch-data/
Contributors: Hopkins, Daniel
... Retrospective voting is a central explanation for voters’ support of incumbents. Yet despite the variety of conditions facing American cities, past research has devoted little attention to retrospective voting for mayors. This paper first develops hypotheses about how local retrospective voting might differ from its national analog, due to both differing information and the presence of national benchmarks. It then analyzes retrospective voting using the largest data set on big-city mayoral elections between 1990 and 2011 to date. Neither crime rates nor property values consistently influence incumbent mayors’ vote shares, nor do changes in local conditions. However, low city-level unemployment relative to national unemployment correlates with higher incumbent support. The urban voter is a particular type of retrospective voter, one who compares local economic performance to conditions elsewhere. Moreover, these effects appear to be present only in cities that dominate their media markets, suggesting media outlets’ role in facilitating retrospective voting.
Contributors: Li, Yifan
... Stations and Lines of China's High Speed Railway System, circa 2016. Revised from a previous version (circa 2014) based on publicly available system schedules, news items, and locations found using Google Earth and OpenStreetMap. Compiled by Yifan Li, revised by Lex Berman, edited with revisions by Xuan Zhang. Browse webmap: http://worldmap.harvard.edu/maps/chinamap/WDv