Contributors: Bamini Jayabalasingham, Thomas Collins, Liliane Kuiper-Hoyng, Jin Zhang, Guillaume Roberge
Data underlying the analyses in chapters 1, 2, 3, and 5 of the report "The researcher journey through a gender lens" (www.elsevier.com/connect/gender-report), which examines the researcher journey through a gender lens. Data on authors, grantees and patent applicants pertain to researchers active during two periods, across 16 geographies, 26 subject areas and 11 sub-fields of medicine. These data are provided at an aggregated level.
In the selection task, participants had to select the region of interest (a specified vessel segment) as explained in the "Experiment_description.pdf" file. They were asked to manipulate (i.e., translate, rotate and scale) a 3D box widget, initially positioned such that the region of interest covered all vessel structures displayed on the screen.
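The translate/rotate/scale manipulation of the box widget can be sketched as the composition of 4x4 homogeneous transforms. The following is a minimal pure-Python illustration of that idea, not the actual widget code (the study used a 3D box widget, presumably VTK-based given the model files):

```python
import math

def translate(tx, ty, tz):
    # 4x4 homogeneous translation matrix
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]

def scale(s):
    # Uniform scaling matrix
    return [[s, 0, 0, 0], [0, s, 0, 0], [0, 0, s, 0], [0, 0, 0, 1]]

def rotate_z(deg):
    # Rotation about the z axis by `deg` degrees
    r = math.radians(deg)
    c, s = math.cos(r), math.sin(r)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply(m, p):
    # Apply a 4x4 transform to a 3D point
    v = [p[0], p[1], p[2], 1.0]
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))

# Compose: scale, then rotate, then translate (applied right-to-left)
m = matmul(translate(1, 0, 0), matmul(rotate_z(90), scale(2)))
corner = apply(m, (1, 0, 0))
# box corner (1,0,0) -> scaled (2,0,0) -> rotated (0,2,0) -> translated (1,2,0)
```

In practice a widget library maintains such a transform interactively as the user drags the box handles.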
Three 3D model files ("level1.vtk", "level2.vtk" and "level3.vtk") were used to define three complexity levels of the selection task: Level 1 (simple) — one vessel; Level 2 (average) — two closely located vessels; Level 3 (complex) — three vessels, where two vessels are located close to each other.
This project was funded by the University of Amsterdam and NWO.
Contributors: Bamini Jayabalasingham, Roy Boverhof, Kevin Agnew, Lisette Klein
In an effort to identify research that supports the UN SDGs, Elsevier has generated a set of Scopus queries related to each of the SDGs.
In this dataset, you will find documentation describing how each of the Scopus queries were created along with a collated list of the queries.
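For illustration, a clause of the general shape used by Scopus advanced-search queries can be assembled programmatically. The keywords below are hypothetical; the actual SDG queries in this dataset are far more elaborate:

```python
def scopus_sdg_query(keywords):
    """Combine keyword phrases into a Scopus advanced-search clause.

    Multi-word phrases are quoted; terms are OR-ed inside TITLE-ABS-KEY,
    the Scopus field code that searches titles, abstracts and keywords.
    """
    terms = " OR ".join(f'"{k}"' if " " in k else k for k in keywords)
    return f"TITLE-ABS-KEY({terms})"

# Hypothetical fragment in the spirit of an SDG 7 (clean energy) query
q = scopus_sdg_query(["renewable energy", "solar power", "photovoltaics"])
# 'TITLE-ABS-KEY("renewable energy" OR "solar power" OR photovoltaics)'
```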
Contributors: Joerg Hellwig, Sarah Huggett, Mark Siebert
These are the underlying data for our report "Artificial Intelligence: How knowledge is created, transferred, and used", published in 2018. The data can be used to reconstruct the graphs in the report.
Contributors: John P.A. Ioannidis, Jeroen Baas, Richard Klavans, Kevin Boyack
Citation metrics are widely used and misused. We have created a publicly available database of 100,000 top scientists that provides standardized information on citations, h-index, co-authorship-adjusted hm-index, citations to papers in different authorship positions, and a composite indicator. Separate data are shown for career-long and single-year impact. Metrics with and without self-citations and the ratio of citations to citing papers are given. Scientists are classified into 22 scientific fields and 176 sub-fields. Field- and subfield-specific percentiles are also provided for all scientists who have published at least 5 papers. Career-long data are updated to end-of-2017 and, for comparison, to end-of-2018.
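As a sketch of two of these indicators, the h-index and a Schreiber-style co-authorship-adjusted hm-index can be computed as follows. This is a simplified illustration, not the database's production code:

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
    return h

def hm_index(papers):
    """Co-authorship-adjusted hm-index (Schreiber-style fractional ranks).

    `papers` is a list of (citations, n_authors) pairs. Papers are ranked
    by citations; each paper contributes an effective rank of 1/n_authors,
    and hm is the largest effective rank r with citations >= r.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    r, hm = 0.0, 0.0
    for cites, n_authors in ranked:
        r += 1.0 / n_authors
        if cites >= r:
            hm = r
    return hm

papers = [(10, 1), (8, 2), (5, 1), (4, 4), (1, 1)]  # hypothetical profile
h_index([c for c, _ in papers])  # 4
hm_index(papers)                 # 2.75
```

The fractional ranking is what discounts heavily co-authored papers relative to the plain h-index.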
We worked with the Local Electrode Atom Probe (LEAP) group at the Materials Science Department of Oxford University to devise and carry out search tasks for information retrieval studies supporting development of the RDM Research Data Search product.
Our starting point was a framework of learning objectives and cognitive complexity, outlined and discussed in a paper by Diane Kelly.
This framework was then used to define objective task-complexity ratings, based on the learning objectives, required outcomes and mental activities involved in completing the tasks. This was done by questioning the participants; in effect, the participants helped us define the tasks shortly before carrying them out. Because the cognitive processes required are cumulative, more processes means greater complexity. We ended up with five levels of cognitive complexity.
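Since the processes are cumulative, the level assignment reduces to counting how far up the hierarchy a task reaches. A minimal sketch under that assumption (the process labels here are illustrative, loosely after taxonomies of learning objectives; the actual labels came from the Kelly framework):

```python
# Five cumulative cognitive processes, lowest to highest (illustrative labels)
PROCESSES = ["remember", "understand", "apply", "analyse", "evaluate"]

def complexity_level(required_processes):
    """Task complexity = highest cumulative cognitive process reached.

    Because the processes are cumulative, a task requiring a high-level
    process implicitly requires all lower ones, so the level is simply
    the 1-based index of the highest process involved.
    """
    return max(PROCESSES.index(p) for p in required_processes) + 1

complexity_level(["remember"])                           # level 1
complexity_level(["remember", "understand", "apply"])    # level 3
```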
For search task evaluation and study, participants were first asked to complete a pre-task questionnaire. This contained questions about participants' interest in and knowledge of the task, and subjective questions about perceived task complexity. There were also subjective questions about expected task difficulty, in relation to specific expectations of presumed challenges associated with completing the task, e.g. evaluating the results and determining when they had enough information to stop.
The post-task questionnaire contained similar items to the pre-task questionnaire, with the aim of comparing expectations with experience. Additionally there were questions about enjoyment and engagement, and some overall judgements around difficulty and satisfaction.
Participants' search behaviours were logged manually through real-time observation and repeated analysis of recordings. The values were averaged and are reported by cognitive complexity level.
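The per-level averaging can be sketched as a simple group-by over the manually logged values (the numbers below are hypothetical, not the study's data):

```python
from collections import defaultdict

def average_by_level(observations):
    """Average a logged search metric per cognitive complexity level.

    `observations` is a list of (level, value) pairs, e.g. the number of
    queries a participant issued for a task at that level.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, value in observations:
        sums[level] += value
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}

logs = [(1, 2), (1, 4), (2, 5), (2, 7), (3, 9)]  # hypothetical values
average_by_level(logs)  # {1: 3.0, 2: 6.0, 3: 9.0}
```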
The participants were 12 members of the LEAP group based in Oxford. We will revisit to test the same tasks on DataSearch when we have suitable and enriched matching data.
Pair-wise citation counts between countries, based on citations originating from AR/RE/CP papers (articles, reviews and conference papers) published in 2017 (Scopus) to AR/RE/CP papers published in 2013-2017. Self-citations are excluded. Ratios of references count only references to papers within the cited window (sum = 100%). Fractional counting is applied: fractions are assigned proportionally to countries based on the number of authors affiliated with each country on a paper.
Scopus dataset, extracted 2018-12-05.
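The fractional counting rule described above can be sketched as follows: each paper (or citation link) is split across countries in proportion to author affiliations, so the shares always sum to one.

```python
from collections import Counter

def country_fractions(author_countries):
    """Fractionally assign one paper (or citation link) to countries.

    Each country receives a share proportional to how many of the paper's
    authors are affiliated with it; the shares sum to 1.
    """
    counts = Counter(author_countries)
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}

# A paper with three authors: two from NL, one from US
country_fractions(["NL", "NL", "US"])  # NL gets 2/3, US gets 1/3
```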
Contributors: Jessica Cox, Corey Harper
Jupyter notebooks of our analysis of the data provided by A. de Waard in http://dx.doi.org/10.17632/4bh33fdx4v.3
Contributors: Paul Groth, Mike Lauruhn, Antony Scerri, Ronald Daniel
This dataset is the result of applying crowdsourcing to the extractions of two open information extraction tools (Open IE 4 and MinIE), linked below. Extractions were performed both on a set of random sentences from Wikipedia and on randomly selected sentences from the OA-STM corpus.
The aim is to evaluate the effectiveness of open information extraction tools on scientific and medical text.
The initial datasets, the code for applying the information extraction tools, the HITs, the labelling instructions, and the analysis code are all included above.
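One common way to turn per-HIT crowd labels into a tool-level score is majority voting followed by precision; the sketch below assumes that scheme and uses made-up labels (the dataset's own analysis code is in the included files):

```python
from collections import Counter

def majority_label(labels):
    """Resolve one extraction's crowd judgements by simple majority vote."""
    return Counter(labels).most_common(1)[0][0]

def tool_precision(judged_extractions):
    """Fraction of a tool's extractions the crowd majority marked correct.

    `judged_extractions` is a list of per-extraction label lists, e.g.
    ["correct", "correct", "incorrect"] from three workers.
    """
    verdicts = [majority_label(labels) for labels in judged_extractions]
    return verdicts.count("correct") / len(verdicts)

hits = [  # hypothetical judgements from three workers per extraction
    ["correct", "correct", "incorrect"],
    ["incorrect", "incorrect", "correct"],
    ["correct", "correct", "correct"],
    ["correct", "incorrect", "correct"],
]
tool_precision(hits)  # 0.75
```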
The Elsevier DataSearch (https://datasearch.elsevier.com) team participated in the bioCADDIE 2016 Dataset Retrieval Challenge. The results of the Challenge, along with the example and test queries, can be found here: https://biocaddie.org/biocaddie-2016-dataset-retrieval-challenge
We have submitted a paper to DATABASE: The Journal of Biological Databases and Curation that details our work in the Challenge (to be published in the latter half of 2017). The attached file, elsevier-submission.zip, contains elsevier[1-5].txt, which correspond to the five-run submissions as described in the paper.
The following describes the code that we developed for the Challenge:
Aspire Content Processing by Search Technologies (https://www.searchtechnologies.com/en-gb/aspire):
Dictionary.xml - Loads dictionaries (MeSH, Genes, Solr fields) into Aspire so that they can be used to identify concepts in text (document or query).
QueryAnalyzer.xml - Receives a query, identifies concepts using the dictionaries and returns a response containing information about the concepts in the query.
ProcessJSON.xml - Processes the JSON documents (Flattens the metadata; Identifies MeSH and Gene concepts and embeds them in the text; Prepares the document to be indexed by Solr).
ProcessJSONSimple.xml - Sends JSON documents previously created by ProcessJSON.xml to Solr without any further processing, which is much quicker than re-running ProcessJSON.xml.
All other aspects of Aspire (Aspire framework, content source to process a folder of JSON files, submission to Solr) are standard Aspire features with no customisation.
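The dictionary-based concept identification that Dictionary.xml and ProcessJSON.xml perform can be approximated by a naive matcher that embeds concept IDs next to their surface forms. This is a rough stand-in for Aspire's matching, and the MeSH-style IDs shown are illustrative:

```python
def tag_concepts(text, dictionary):
    """Embed dictionary concept IDs next to their surface forms in text.

    `dictionary` maps lower-cased terms to concept IDs (e.g. MeSH IDs).
    Longest terms are tried first so multi-word concepts win; only the
    first occurrence of each term is tagged in this simplified version.
    """
    for term in sorted(dictionary, key=len, reverse=True):
        idx = text.lower().find(term)
        if idx >= 0:
            original = text[idx:idx + len(term)]
            text = (text[:idx] + f"{original} [{dictionary[term]}]"
                    + text[idx + len(term):])
    return text

mesh = {"breast neoplasms": "D001943", "brca1": "D019398"}  # illustrative IDs
tag_concepts("BRCA1 mutations in breast neoplasms", mesh)
# 'BRCA1 [D019398] mutations in breast neoplasms [D001943]'
```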
Biocaddie.qpl - QPL file for processing a search query by sending a request to QueryAnalyzer.xml in Aspire, parsing the response and constructing a Lucene query.
Elsevier-solr.zip - Java project for a custom Solr Token Filter to index concept IDs in the same position as the words to which they relate.
All other aspects of Solr are standard Solr or QPL.
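The effect of the custom token filter can be illustrated with a toy posting generator: a concept ID is emitted at the same position as the word it relates to (a position increment of zero in Lucene terms), so phrase queries match whether they use the word or the ID. The IDs here are illustrative, and this is a simulation rather than the Java filter itself:

```python
def index_with_concepts(tokens, concept_ids):
    """Emit (position, term) pairs; concept IDs share their word's position.

    `concept_ids` maps a token index to a concept ID. The ID is emitted at
    the same position as the word, mimicking a Lucene token filter that
    sets positionIncrement to 0 for injected concept terms.
    """
    postings = []
    for pos, term in enumerate(tokens):
        postings.append((pos, term))
        if pos in concept_ids:
            postings.append((pos, concept_ids[pos]))  # same position as word
    return postings

tokens = ["aspirin", "reduces", "fever"]
index_with_concepts(tokens, {0: "D001241"})
# [(0, 'aspirin'), (0, 'D001241'), (1, 'reduces'), (2, 'fever')]
```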
MeSH.groovy - Groovy script to convert a MeSH dictionary in ASCII format into a dictionary which can be used in Aspire.
Genes.groovy - Groovy script to convert a Gene dictionary into a dictionary which can be used in Aspire.
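The ASCII MeSH files these scripts consume use records opened by a *NEWRECORD line, with "FIELD = value" lines for the heading (MH) and unique identifier (UI). A minimal Python equivalent of that conversion step, assuming this layout (the Groovy scripts produce Aspire's own dictionary format, which differs):

```python
def parse_mesh_ascii(text):
    """Parse MeSH descriptors from the ASCII (*NEWRECORD) format.

    Returns {heading: UI}. Only the MH and UI fields are kept; records
    are opened by a '*NEWRECORD' line, and fields appear as
    'FIELD = value' lines, as in the NLM ASCII MeSH files.
    """
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if line == "*NEWRECORD":
            current = {}
            records.append(current)
        elif " = " in line:
            field, value = line.split(" = ", 1)
            current[field] = value
    return {r["MH"]: r["UI"] for r in records if "MH" in r and "UI" in r}

sample = """*NEWRECORD
MH = Aspirin
UI = D001241

*NEWRECORD
MH = Fever
UI = D005334
"""
parse_mesh_ascii(sample)  # {'Aspirin': 'D001241', 'Fever': 'D005334'}
```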
The file biocaddie-infosys-master_files.zip contains the following:
SolrQueryGen - Generates Solr queries from text. It supports unigram, gazetteer lookup, lemmatisation and word embedding expansion.
JudgementUI - UI for bioCADDIE manual judgments.
NLP4J - Natural language parsing (tokenisation, lemmatisation, part of speech tagging, etc.).
PseudoRelevanceFeedback - An alternative approach that was not integrated.
BioCaddieSpark - Apache Spark jobs to load and process the data and index it into Solr.
BioCaddieServices - Backend services for Judgment UI.
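A minimal sketch of the kind of query SolrQueryGen produces from text, assuming unigram clauses plus boosted gazetteer phrases (field name and boost value are illustrative; lemmatisation and word-embedding expansion would add further OR-ed terms):

```python
def build_solr_query(text, gazetteer, field="text"):
    """Build a Solr query from free text.

    Unigrams become individual field clauses; any gazetteer phrase found
    in the text is added as a boosted quoted phrase clause.
    """
    lowered = text.lower()
    clauses = [f"{field}:{w}" for w in lowered.split()]
    for phrase in gazetteer:
        if phrase in lowered:
            clauses.append(f'{field}:"{phrase}"^2')  # boost exact phrase
    return " OR ".join(clauses)

build_solr_query("gene expression data", ["gene expression"])
# 'text:gene OR text:expression OR text:data OR text:"gene expression"^2'
```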
Any questions about the code should be directed to firstname.lastname@example.org.