Contributors: Verspoor Karin, Nguyen Dat Quoc, Akhondi Saber A., Druckenbrodt Christian, Thorne Camilo, Hoessel Ralph, He Jiayuan, Zhai Zenan
The discovery of new chemical compounds and their synthesis process is of great importance to the chemical industry. Patent documents contain critical and timely information about newly discovered chemical compounds, providing a rich resource for chemical research in both academia and industry. Chemical patents are often the initial venues where a new chemical compound is disclosed. Only a small proportion of chemical compounds are ever published in journals and these publications can be delayed by up to 3 years after the patent disclosure. In addition, chemical patent documents usually contain unique information, such as reaction steps and experimental conditions for compound synthesis and mode of action. These details are crucial for the understanding of compound prior art, and provide a means for novelty checking and validation. Due to the high volume of chemical patents, approaches that enable automatic information extraction from these patents are in demand. To develop natural language processing methods for large-scale mining of chemical information from patent texts, a corpus is created providing chemical patent snippets and annotated entities and reaction steps.
The IAG-TNKU Dataset is a large collection of Turkish news articles that can be used for Turkish text classification tasks, such as identifying author gender in Turkish news. The texts, written by 32 female and 38 male authors, were extracted from the archive of a newspaper (www.hurriyet.com.tr) and cover the period from 08.11.1997 to 24.04.2019. The dataset contains a total of 43,292 articles, divided between male and female authors in a balanced way.
How to use the IAG-TNKU Dataset:
1. Unzip the compressed resources.
2. There are two folders (Females and Males).
3. Each folder contains a set of article files in .txt format corresponding to its category.
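The steps above can be sketched in Python as follows. This is a minimal sketch, not part of the dataset: the function name `load_articles`, the UTF-8 encoding, and the idea of using the folder name as the class label are assumptions made for illustration. Only the folder names (Females, Males) and the .txt extension come from the description.

```python
from pathlib import Path

# Folder names as given in the dataset description.
CATEGORIES = ("Females", "Males")


def load_articles(root, encoding="utf-8"):
    """Return (text, label) pairs for every .txt file in each category folder.

    `encoding` is an assumption; adjust it if the files use another encoding.
    """
    samples = []
    for label in CATEGORIES:
        folder = Path(root) / label
        # sorted() keeps the order reproducible across runs and platforms.
        for path in sorted(folder.glob("*.txt")):
            # errors="replace" avoids a crash if the assumed encoding is wrong.
            text = path.read_text(encoding=encoding, errors="replace")
            samples.append((text, label))
    return samples
```

Calling `load_articles("path/to/unzipped/dataset")`, where the path is a placeholder for wherever the archive was extracted, would return a list that can be passed to a text classification pipeline.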
This dataset contains the results obtained as part of the evaluation in the paper titled "Radial Intersection Count Image: a Clutter Resistant 3D Shape Descriptor".
The data is made available here, in conjunction with a source code repository that can be found here: https://github.com/bartvbl/Radial-Intersection-Count-Image-reproduction
Together, the data and the source code allow all results presented in the paper to be reproduced.
This dataset includes the raster dataset of Chl-a retrievals for hundreds of lakes larger than 1 km² in eastern China. Related meteorological and fertilizer data are also included as a Microsoft Excel file.
16S metagenomic soil analysis was conducted at three different estates (Kam Cheong Plantation, Ladang Sahabat, Warisan Gagah) in Sabah, Malaysia, to understand the soil microbial community under disease-free, low, and high basal stem rot disease conditions, respectively.
This dataset contains all of the non-perturbed 3D CAD objects from the 2017 SHREC 3D object retrieval competition, which were initially made available here:
Unfortunately, the original files have since been removed, rendering the dataset no longer publicly available.
For the purposes of reproducibility of multiple papers, it has been uploaded here as a mirror copy.
Cell formation, cell scheduling, and group layout are three important problems in designing and configuring a Cellular Manufacturing System (CMS). In our paper, "Joint cell formation, cell scheduling, and group layout problem in virtual and classical cellular manufacturing systems: Metaheuristic approaches embedded in a computer software", we address the integration of these problems in virtual and classical CMSs considering alternative processing routes. The objective is to minimize the total handling costs and cycle time. Due to the computational complexity of the problem, hybrid metaheuristic algorithms are proposed to solve it. Depending on the type of cells, which is either classical or virtual, an encoding scheme is proposed to effectively represent candidate solutions. Placement algorithms are developed to obtain the layout from an encoded solution; these algorithms are based on running either a heuristic or a linear program. A software tool called ICFLSD (Integrated Cell Formation, Layout, and Scheduling Designer) is developed to simplify the problem-solving process, from data entry to obtaining results. Numerical examples adopted from the literature are solved using ICFLSD and the CPLEX solver to assess the performance of the metaheuristic algorithms. The comparison results demonstrate the superiority of simulated annealing over the other solution approaches considered in this study.
This dataset contains the ICFLSD software and the numerical examples solved in our paper.
The supplementary data for the article:
I.V. Oshchapovsky, I.Yu. Zavaliy and V.V. Pavlyuk
The investigation of hydrogen sublattice in Mg2NiHx (x=0.3) hydride by first-principle calculations
Materials Today Communications - 2020
The detailed description is in the file Supplementary.pdf.
The output of the JDFTx program, including the commands needed to reproduce the calculations, is in the file Supplementary.7z.