Cold/Cozy Mice - Finding the needles in the haystack of biomedical literature

Published: 15-06-2017 | Version 2 | DOI: 10.17632/tbk3km9xfz.2
Helena Deus


Problem Statement: the task facing biomedical scientists who hope to find publications that corroborate or refute a hypothesis is akin to finding a needle in a haystack that keeps growing. Existing strategies for mining or summarizing the scientific literature have focused largely on recovering named entities (e.g. proteins, cells), on more sophisticated methods that use ontologies to also recover related terms and, more recently, on machine learning methods where sufficient training data exist.

Our Approach: we describe a use case faced by a biomedical scientist who needs to compare tumor volume/weight results across papers describing mouse experiments in which the mice were exposed to the same or similar compounds but housed at different temperatures. In our approach, we extracted annotations of units and measures (U&M) from the scientific literature and then combined them with contextual information (e.g. the section of the paper) and regular expressions to identify the specific entity being measured (e.g. Housing Temperature).

Results and Discussion: from a corpus of ~1.1M open access publications, we found 299 relevant papers using the U&M approach combined with its surrounding contextual information. This large drop in the number of papers can be explained by our restrictive search criteria, which required keywords, patterns and temperature annotations to appear in specific sections of the paper. We found a clear prevalence of papers reporting housing conditions in the range of 20-25°C, which is approximately the temperature range suggested by NIH guidelines. We also found a small increase in the number of papers describing thermoneutral housing conditions for mice in the period after the observation that this variable affects tumor growth in mice (2014-2016). This dataset contains those results.
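The idea of using context to decide what a U&M annotation measures can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the `UMAnnotation` record and the `label_entity` function are hypothetical, the two regular expressions are copied from the reproduction steps, and case-insensitive matching is an assumption.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns copied from the reproduction steps; IGNORECASE is an assumption.
SECTION_PATTERN = re.compile(
    r"experiment|procedure|method|animal|mice|in vivo|murine", re.IGNORECASE
)
HOUSING_PATTERN = re.compile(
    r"(animals|mice|rats).+were.+(housed|acclimated|caged|maintained|kept|bred).+at",
    re.IGNORECASE,
)


@dataclass
class UMAnnotation:
    """Hypothetical record for one units-and-measures annotation."""
    measure_type: str   # measure type assigned by the annotator, e.g. "temperature"
    text: str           # the annotated span, e.g. "22 °C"
    sentence: str       # sentence containing the span
    section_title: str  # title of the section containing the sentence


def label_entity(ann: UMAnnotation) -> Optional[str]:
    """Return the entity being measured, or None if the context rule does not apply."""
    if ann.measure_type != "temperature":
        return None
    # Context 1: the sentence must sit in an experimental/methods-like section.
    if not SECTION_PATTERN.search(ann.section_title):
        return None
    # Context 2: the sentence must describe animals being housed (or kept, bred...) "at" something.
    if not HOUSING_PATTERN.search(ann.sentence):
        return None
    return "Housing Temperature"


ann = UMAnnotation(
    "temperature",
    "22 °C",
    "All mice were housed in a pathogen-free facility at 22 °C.",
    "Materials and Methods",
)
print(label_entity(ann))  # Housing Temperature
```

The same temperature annotation in a "Results" section, or in a sentence such as "Cells were incubated at 37 °C.", is left unlabelled, which is how the context rules keep the search restrictive.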


Steps to reproduce

1. Search ScienceDirect for "mice", "temperature" and "tumor" and retrieve the list of papers.
2. Filter to papers in the open access corpus.
3. Filter to papers that contain sentences with temperature-type annotations matching patterns like "(animals|mice|rats).+were.+(housed|acclimated|caged|maintained|kept|bred).+at" in sections with titles like "experiment|procedure|method|animal|mice|in vivo|murine".
4. Extract only the temperature annotations from the sentences found in step 3.
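Steps 2 to 4 can be sketched as a small filter over papers that have already been annotated. Step 1 (the ScienceDirect search) is not modelled here. The `Paper`, `Section` and `Sentence` records, the `(start, end, measure_type)` span format and the `housing_temperatures` function are hypothetical stand-ins for whatever the U&M annotator actually produces; the regular expressions come from step 3, with case-insensitive matching assumed.

```python
import re
from dataclasses import dataclass, field

# Step 3 patterns; IGNORECASE is an assumption.
SECTION_PATTERN = re.compile(
    r"experiment|procedure|method|animal|mice|in vivo|murine", re.IGNORECASE
)
HOUSING_PATTERN = re.compile(
    r"(animals|mice|rats).+were.+(housed|acclimated|caged|maintained|kept|bred).+at",
    re.IGNORECASE,
)


@dataclass
class Sentence:
    text: str
    # Hypothetical annotator output: (start, end, measure_type) character spans.
    annotations: list[tuple[int, int, str]] = field(default_factory=list)


@dataclass
class Section:
    title: str
    sentences: list[Sentence]


@dataclass
class Paper:
    paper_id: str
    sections: list[Section]


def housing_temperatures(papers: list[Paper], open_access_ids: set[str]) -> dict[str, list[str]]:
    """Map each matching paper ID to the temperature annotations found in its housing sentences."""
    results: dict[str, list[str]] = {}
    for paper in papers:
        # Step 2: keep only papers in the open access corpus.
        if paper.paper_id not in open_access_ids:
            continue
        temps: list[str] = []
        for section in paper.sections:
            # Step 3a: section title must look like an experimental/methods section.
            if not SECTION_PATTERN.search(section.title):
                continue
            for sent in section.sentences:
                spans = [(s, e) for s, e, kind in sent.annotations if kind == "temperature"]
                # Step 3b: sentence must carry a temperature annotation and match the housing pattern.
                if spans and HOUSING_PATTERN.search(sent.text):
                    # Step 4: keep only the annotated temperature text.
                    temps.extend(sent.text[s:e] for s, e in spans)
        if temps:
            results[paper.paper_id] = temps
    return results
```

The section title is checked before the sentence pattern because it is the cheaper test and discards most of each paper early; papers with no surviving annotation are dropped, which mirrors the large reduction from the full corpus to the final set.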