Datasets: Molecular Entities as Structured Data on the Web

Published: 21 April 2021 | Version 1 | DOI: 10.17632/n9xwfs5fcj.1
Contributors:

Description

Internet search engines have remodeled the use of the internet, making it easy to find the content we are interested in. The Web was originally designed to exchange natural language documents, which are difficult for machines to interpret. Structured data placed on websites addresses this problem by allowing search engines to "understand" the content better. The same approach can be applied to chemical data. We have developed three tools that convert chemical data into structured data: SDFEater converts SDF files, Molstruct converts CSV files, and MEgen is a web application for entering data in a form. Using our tools, we generated 10 datasets: 5 main datasets (DS1, DS2, DS3, DS4, and DS5) and 5 small datasets (DS1s, DS2s, DS3s, DS4s, and DS5s) consisting of 10 files with one molecule each. They are based on well-known chemical databases (ChEBI, DrugBank, PubChem) as well as other data (WikiData). We make them available in the JSON-LD HTML, JSON-LD, RDFa, and Microdata structured data formats. More details about the inputs and outputs, as well as how the data is generated, can be found in README.txt.
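
As an illustration of what "structured data" for a molecule can look like, the following minimal Python sketch builds a JSON-LD record and wraps it in an HTML script tag, roughly in the spirit of the JSON-LD and JSON-LD HTML formats mentioned above. It is not the output of SDFEater, Molstruct, or MEgen. The helper names (molecule_to_jsonld, wrap_in_html) are hypothetical, and the use of the schema.org MolecularEntity type with these particular properties is an assumption; the datasets themselves and README.txt are the authority on the actual structure.

```python
import json


def molecule_to_jsonld(name, inchikey, smiles, formula):
    """Build a JSON-LD dictionary for one molecule.

    Hypothetical helper: assumes the schema.org MolecularEntity type,
    which may differ from what the published datasets contain.
    """
    return {
        "@context": "https://schema.org",
        "@type": "MolecularEntity",
        "name": name,
        "inChIKey": inchikey,
        "smiles": smiles,
        "molecularFormula": formula,
    }


def wrap_in_html(doc):
    """Embed a JSON-LD dictionary in an HTML script element."""
    payload = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


# Example molecule: water.
water = molecule_to_jsonld(
    name="water",
    inchikey="XLYOFNOQVPJJNP-UHFFFAOYSA-N",
    smiles="O",
    formula="H2O",
)
html_snippet = wrap_in_html(water)
print(html_snippet)
```

A search engine crawler that supports JSON-LD can read the script element directly, without parsing the surrounding natural-language text of the page.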

Files

Steps to reproduce

The details needed to reproduce our datasets are included in README.txt. It lists, for example, the versions of the input datasets and software, along with the commands used to generate the data.

Institutions

Uniwersytet w Bialymstoku

Categories

Applied Sciences, Natural Sciences

Licence