SC-CoMIcs (Superconductivity Corpus for Materials Informatics)

Published: 29 June 2021 | Version 3 | DOI: 10.17632/xc9fjz2p3h.3
Contributors:
Kyosuke Yamaguchi, Ryoji Asahi, Yutaka Sasaki

Description

SC-CoMIcs is a corpus of 1,000 Materials Informatics (MI) abstracts related to superconductivity. Named entities and relations in the text files are annotated separately in *.ann files using the stand-off annotation format. MI needs textual datasets to accelerate research in the area, but no sizable dataset was suitable for our superconducting material search, so we created a new corpus from scratch for MI information extraction. SC-CoMIcs can contribute to the advancement of MI studies, especially in superconductivity. Tools for running experiments on the dataset are available in the GitHub repository linked below.

Note that you need to agree to the license displayed here. Elsevier has granted special permission, under a written agreement (#200221-005626), to share the set of 1,000 MI abstracts (0001.txt-1000.txt) within the research community under Creative Commons BY-NC 3.0. This is a strict condition that you MUST obey. NB: The dataset itself is unchanged from version 1.

Files

Steps to reproduce

Unzip the ZIP archives, then follow the steps in the GitHub repository linked below.
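The unzip step can also be scripted. The sketch below uses only the Python standard library; the function name `extract_archives` and the directory names in the usage note are hypothetical, since the archive file names are not listed here.

```python
import zipfile
from pathlib import Path


def extract_archives(src_dir, dest_dir):
    """Extract every .zip file found directly in src_dir into dest_dir.

    Returns the sorted list of archive names that were extracted.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(src.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_archives("downloads", "sc-comics")` would unpack all archives from a `downloads` folder into `sc-comics`; both folder names are placeholders.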

Categories

Superconductivity, Information Extraction, Informatics, Materials Application

Licence