Entropy sorting of single cell RNA sequencing data reveals the inner cell mass in the human pre-implantation embryo

Published: 30 August 2022 | Version 1 | DOI: 10.17632/689pm8s7jc.1
This repository stores the data and code used to generate the human pre-implantation embryo results in our publication, "Entropy sorting of single cell RNA sequencing data reveals the inner cell mass in the human pre-implantation embryo". In this work we present a mathematical framework, which we term Entropy Sorting, that allows us to confidently isolate the previously elusive inner cell mass population in single cell RNA sequencing (scRNA-seq) data.

The data are divided into four subfolders, whose recommended order of use is as follows:

1. Pre_Implantation_Embryo_Analysis_Code - Contains the code used to generate all the results in our publication. See the README.txt file within this folder for a summary of each workflow provided.
2. Mesitermann_Data - Contains an exact copy of the human pre-implantation embryo scRNA-seq data used in the Meistermann et al. 2021 publication, which the authors kindly provided to us directly. Meistermann et al. 2021 compiled this dataset from 4 separate scRNA-seq datasets. See their paper, cited in our publication, for a description of how they compiled the 4 datasets. This folder also contains key outputs from the software we present in our work (FFAVES and ESFW), which allow others to re-create the results presented in our publication.
3. Yanagida_Data - An independent human pre-implantation embryo dataset from the Yanagida et al. 2021 paper (GEO ID = GSE171820), which was kindly provided to us by the authors, alongside a tSNE embedding used in their paper.
4. Nakamura_Data - An independent Macaca embryo dataset from the Nakamura et al. 2016 paper (GEO ID = GSE74767).

We would like to highlight that we have provided exact copies of the human pre-implantation embryo datasets used in the paper to make our work as accessible as possible. However, we recommend that researchers who wish to use these datasets in their own work start from the source data (e.g. the Gene Expression Omnibus repositories) to maintain data integrity.

Details of each dataset used for this work are provided in the experimental procedures section of our manuscript.



