Source data underlying the main figures in "A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning"

Published: 12 May 2021 | Version 1 | DOI: 10.17632/5td2kp4cbf.1
Contributor:
Youqing Mu

Description

Source data underlying the main figures in the manuscript are available here. "active_learning_result.csv" contains the data used to compare the effectiveness of active learning with that of random sampling and is the source of Fig. 2b. "unlabel_tsne.csv" and "label_tsne.csv" contain the 2-D projections of the cases' 768-D embeddings and are the source of Fig. 3. "dev_result.csv" contains the model performance metrics recorded during development and is the source of Fig. 4a. "review_result.csv" contains the results of the expert review and is the source of Fig. 4b. "mat_abbr.csv" contains the label co-occurrence counts and is the source of Fig. 5. "kws_influence.csv" contains the per-word influence scores for each label and is the source of Fig. 6.
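The CSVs can be opened with any tabular tool; below is a minimal Python sketch for inspecting them, assuming the files have been downloaded to the working directory and that pandas is installed. The column schemas are not documented on this page, so the sketch only reports each file's shape and header rather than assuming any particular columns.

```python
# Minimal inspection sketch (assumption: files are in the current directory).
import pandas as pd

# File names and the figures they underlie, as listed in the description above.
files = {
    "active_learning_result.csv": "Fig. 2b: active learning vs. random sampling",
    "unlabel_tsne.csv": "Fig. 3: 2-D projection of case embeddings (unlabeled)",
    "label_tsne.csv": "Fig. 3: 2-D projection of case embeddings (labeled)",
    "dev_result.csv": "Fig. 4a: model performance during development",
    "review_result.csv": "Fig. 4b: expert review results",
    "mat_abbr.csv": "Fig. 5: label co-occurrence counts",
    "kws_influence.csv": "Fig. 6: per-label word influence scores",
}

for name, figure in files.items():
    df = pd.read_csv(name)
    # Report shape and column headers without assuming a schema.
    print(f"{name} ({figure}): {df.shape[0]} rows x {df.shape[1]} columns")
    print(f"  columns: {list(df.columns)}")
```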


Categories

Pathology, Natural Language Processing, Machine Learning
