Natural Language Processing and Information Systems

ISSN: 1611-3349

Datasets associated with articles published in Natural Language Processing and Information Systems

341 results
  • PMC clinical trial disentangled tables data set
    The database was created by processing 6558 clinical trial articles from the PubMed Central public sample 2014. The articles were obtained by matching PMC and Medline documents; the selected documents are those whose Medline publication type contains the word "Clinical". The documents were processed using the TableDisentangler tool, which creates the majority of the database. The documents were then annotated using UMLS/MetaMap, via a script that is part of the TableDisentangler tool and handles communication with MetaMap. Three information extraction case studies were performed on these data: extraction of patients' age, extraction of gender distribution, and extraction of FEV1 measures (performed for COPD studies only). The case studies used the TabInOut tool for generating table information extraction rules. The database schema is available at: https://github.com/nikolamilosevic86/TableDisentangler/wiki/Database-schema Files included in the dataset: Clinicaldata.zip, which contains the raw XML clinical documents from PMC, and Database.zip, which contains the database with data processed using TableDisentangler and TabInOut.
    • Dataset
  • Robot at the mirror: learning to imitate via associating self-supervised models
    We introduce an approach to building a custom model from off-the-shelf self-supervised models by associating them instead of training and fine-tuning. We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives. To build our model, we first obtain features from the visual input and the postures of the robot’s body via existing state-of-the-art models. Then we map their corresponding latent spaces through the robot’s sample-efficient self-exploration at the mirror. In this way, the robot builds the desired 3D pose detector all at once, instead of acquiring it gradually. The mapping, which associates pairs of feature vectors, is implemented in the same way as the key–value mechanism of transformer models. Finally, deploying our model for imitation on a simulated robot allows us to study, tune and systematically evaluate its hyperparameters without the involvement of a human counterpart, advancing our previous research.
    • Image
  • Artefact of: JavaBIP meets VerCors: Towards the Safety of Concurrent Software Systems in Java
    This artefact contains an implementation of the Verified JavaBIP toolset as presented in the paper "JavaBIP meets VerCors: Towards the Safety of Concurrent Software Systems in Java", as well as the Casino case study discussed in the paper. The artefact contains all binaries needed to evaluate the toolset, ready to be installed and run in the FASE'23 VM; instructions for this are in the file README.pdf in the artefact. The artefact also contains scripts and instructions to rebuild it, which requires an active internet connection; these instructions are in the file AUTHORS_README.md. For the purpose of running the artefact, this author readme can be ignored. To use, load the zip into the FASE'23 VM: https://doi.org/10.5281/zenodo.7446277
    • Software/Code
  • Basking shark head skeletons and software for 3D shape estimation from 2D landmarks
    This is the initial release of the code and basking shark data used in the research paper 'A Kendall Shape Space Approach to 3D Shape Estimation from 2D Landmarks' by Paskin et al. presented at ECCV 2022.
    • Dataset
  • Robot at the mirror: learning to imitate via associating self-supervised models
    We introduce an approach to building a custom model from off-the-shelf self-supervised models by associating them instead of training and fine-tuning. We demonstrate it with an example of a humanoid robot looking at the mirror and learning to detect the 3D pose of its own body from the image it perceives. To build our model, we first obtain features from the visual input and the postures of the robot’s body via existing state-of-the-art models. Then we map their corresponding latent spaces through the robot’s sample-efficient self-exploration at the mirror. In this way, the robot builds the desired 3D pose detector all at once, instead of acquiring it gradually. The mapping, which associates pairs of feature vectors, is implemented in the same way as the key–value mechanism of transformer models. Finally, deploying our model for imitation on a simulated robot allows us to study, tune and systematically evaluate its hyperparameters without the involvement of a human counterpart, advancing our previous research.
    • Software/Code
  • morphomatics/ShapePrediction: v1
    This code was used in the publication Predicting Shape Development: A Riemannian Method to predict future shape developments in Riemannian shape spaces. This work was carried out as part of the Math+ project AA5-3 "Manifold-Valued Graph Neural Networks." The underlying methodological building blocks are part of the Morphomatics library.
    • Software/Code
  • Mudestreda Multimodal Device State Recognition Dataset
    Mudestreda Multimodal Device State Recognition Dataset, obtained from a real industrial milling device, with Time Series and Image Data for Classification, Regression, Anomaly Detection, Remaining Useful Life (RUL) estimation, Signal Drift measurement, Zero Shot Flank Tool Wear, and Feature Engineering purposes. The official dataset used in the paper "Multimodal Isotropic Neural Architecture with Patch Embedding", ICONIP23. Official repository: https://github.com/hubtru/Minape Conference paper: https://link.springer.com/chapter/10.1007/978-981-99-8079-6_14 Mudestreda (MD): Size 512 Samples (Instances, Observations); Modalities 4; Classes 3. Future research: Regression, Remaining Useful Life (RUL) estimation, Signal Drift detection, Anomaly Detection, Multivariate Time Series Prediction, and Feature Engineering. Notice: Tables and images do not render properly. Recommended: `README.md` includes the Mudestreda description and images `Mudestreda.png` and `Mudestreda_Stage.png`. Data Overview. Task: Uni/Multi-Modal Classification. Domain: Industrial Flank Tool Wear of the Milling Machine. Input (sample): 4 Images: 1 Tool Image, 3 Spectrograms (X, Y, Z axis). Output: Machine state classes: `Sharp`, `Used`, `Dulled`. Evaluation: Accuracies, Precision, Recall, F1-score, ROC curve. Each tool's wear is categorized sequentially: Sharp → Used → Dulled. The dataset includes measurements from ten tools: T1 to T10. Data splitting options include random or chronological distribution, without shuffling. Options: Original data or Augmented data; Random distribution or Tool Distribution (see Dataset Splitting).
    • Dataset
  • A Public Ground-Truth Dataset for Handwritten Circuit Diagram Images
    CGHD: This dataset contains images of hand-drawn electrical circuit diagrams as well as accompanying annotation and segmentation ground-truth files. It is intended to train (e.g. ANN) models for extracting electrical graphs from raster graphics. Content: 2.549 Raw Images (Annotated); 25 Drafters (plus Images provided by TU Dresden); 12 Circuits per Drafter; 2 Drawings per Circuit; 4 Photos per Drawing; 212.280 Bounding Box Annotations; 29.790 Rotation Annotations; 71.307 Text String Annotations; 264 Binary Segmentation Maps (Annotated); Strokes vs. Background; Accompanying Polygon Annotation Files; 20.060 Polygon Annotations; 59 Object Classes; Scripts for Data Loading, Statistics, Consistency Check and Training Preparation.
    • Software/Code