Published: 8 September 2017 | Version 2 | DOI: 10.17632/pp282j4h29.2
Shaista Hussain


The data consist of a Golgi image dataset and the pipeline for unsupervised phenotypic analysis of these images. The data are provided as a zipped file, ‘Golgi_HCA_workflow.zip’, which contains:
1) Data folder ‘snare_2’, containing vignettes of Golgi images (.jpg) acquired from multiple fields of multiple wells, and numerical data (.sta) holding the image features extracted for each Golgi image.
2) Plate map folder ‘plate_maps’, containing the .csv plate map file for the ‘snare_2’ dataset with the well locations of all the siRNA treatments.
3) Repository folder ‘repository’, containing ‘nqc.h5’. A labeled set of good and bad nuclei was used to train the nuclei quality control (NQC) classifier, and the results of this pre-trained classifier are included in ‘nqc.h5’ for the convenience of users.
4) Two Python scripts: ‘control_model_utils.py’, which implements the control modeling module of the pipeline, and ‘HCA_workflow.py’, the main script for running the entire pipeline.
5) README file describing how to download and install this package and the Python software needed to run it.
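As a quick sanity check after unzipping, the layout described above can be verified with a short script. This is only a sketch: it assumes the listed folders and scripts sit directly under the extraction directory (the archive may nest them differently), the function name `missing_entries` is hypothetical, and the README is left out because its exact filename is not stated here.

```python
from pathlib import Path

# Entries named in the dataset description. Their position relative to the
# extraction directory is an assumption; adjust `root` if the archive nests them.
EXPECTED_ENTRIES = (
    "snare_2",
    "plate_maps",
    "repository/nqc.h5",
    "control_model_utils.py",
    "HCA_workflow.py",
)


def missing_entries(root):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_entries("path/to/extracted/package")` returns an empty list when every expected folder and file is present.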


Steps to reproduce

Run ‘HCA_workflow.py’ with the command “python HCA_workflow.py data_folder_name plate_map_filename plate_name”. For the example ‘snare_2’ dataset, run “python HCA_workflow.py snare_2/ A-MINISNRX-series.csv snare_2”. After the analysis completes successfully, all intermediate results are saved in .h5 format in the ‘repository’ folder, and all generated plots and figures are saved in a new ‘Figures’ folder.
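The command above can also be launched from Python and its outputs listed afterwards. The following is a minimal sketch, not part of the package: the helper names `build_command`, `run_workflow` and `list_outputs` are hypothetical, and it assumes the script is started from the directory that contains ‘HCA_workflow.py’ and that the ‘repository’ and ‘Figures’ folders are located in that same directory.

```python
import subprocess
from pathlib import Path


def build_command(data_folder, plate_map, plate_name):
    """Assemble the documented command line as an argument list."""
    return ["python", "HCA_workflow.py", data_folder, plate_map, plate_name]


def run_workflow(package_dir, data_folder, plate_map, plate_name):
    """Run the pipeline from `package_dir`; raises CalledProcessError on failure."""
    cmd = build_command(data_folder, plate_map, plate_name)
    subprocess.run(cmd, cwd=package_dir, check=True)


def list_outputs(package_dir):
    """Return (.h5 file names in 'repository', file names in 'Figures')."""
    root = Path(package_dir)
    repo, figs = root / "repository", root / "Figures"
    h5_files = sorted(p.name for p in repo.glob("*.h5")) if repo.is_dir() else []
    figures = sorted(p.name for p in figs.iterdir() if p.is_file()) if figs.is_dir() else []
    return h5_files, figures
```

For the example dataset, `run_workflow(".", "snare_2/", "A-MINISNRX-series.csv", "snare_2")` followed by `list_outputs(".")` would show which intermediate .h5 files and figures were written.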


Institute of High Performance Computing


Golgi Apparatus, Machine Learning Algorithm, Unsupervised Learning, Phenotyping, Data Quality Control, High Throughput Analysis