Processed Data for: Top-Down Mass Spectrometry of Histone Modifications in Sorghum Reveals Potential Epigenetic Markers for Drought Acclimation

Published: 24-10-2019 | Version 1 | DOI: 10.17632/9j232f653t.1
Mowei Zhou,
Neha Malhan,
Amirhossein Ahkami,
Kristin Engbrecht,
Gabriel Myers,
Jeffery Dahlberg,
Joy Hollingsworth,
Julie Sievert,
Robert Hutmacher,
Mary Madera,
Peggy Lemaux,
Kim Hixson,
Christer Jansson,
Ljiljana Pasa-Tolic


This study explores changes in histone modifications of Sorghum bicolor (L.) Moench across developmental stages and in response to drought stress in two genotypes. We analyzed the leaves of 48 plants using top-down mass spectrometry and identified 26 unique histone proteins and 677 unique histone proteoforms. This dataset collects the processed data and the scripts used for data processing.


Steps to reproduce

1. The raw instrument data collected from reversed-phase liquid chromatography-mass spectrometry will be available at the PRIDE repository with the dataset identifier PXD014660.
2. The raw data were processed with several software packages to generate the files in the folders shown here.
Scan (scan information for scripts): MASIC
MSPF (proteoform identification): Informed-Proteomics, MSPathFinder module (using "UniProt_sorghum_focus.fasta", search parameters defined in *.param files)
ProMex (proteoform quantitation): Informed-Proteomics, ProMex module
TopPIC (proteoform identification): TopPIC (using "uniprot-Sorbi20171004_filter.fasta", search parameters logged in the first few lines of the results)
3. Run the batch cmd files (batch_step1~5.bat) to process the data from the folders. Run steps 1~3 sequentially to generate the "unsorted proteoform" list (combine.txt). The list is then manually checked and consolidated by the R script (step 4) as combine_fill.txt. Finally, the consolidated list is loaded and abundance values are filled in at step 5 to obtain the output table, which includes the abundances across all samples for all the histone proteoforms.
* Links for software used are in the references.
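The idea behind the final step (filling abundance values for each consolidated proteoform across all samples) can be illustrated with a minimal Python sketch. This is not the script shipped with the dataset. The function name, the proteoform labels, the sample names, and the choice to mark missing values as None are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional


def fill_abundances(
    proteoforms: List[str],
    per_sample: Dict[str, Dict[str, float]],
    missing: Optional[float] = None,
) -> List[Dict[str, object]]:
    """Build one row per proteoform with one abundance column per sample.

    proteoforms: consolidated proteoform identifiers (hypothetical labels).
    per_sample:  sample name -> {proteoform identifier -> abundance}.
    missing:     value used when a proteoform was not quantified in a sample.
    """
    samples = sorted(per_sample)  # fixed column order
    table = []
    for proteoform in proteoforms:
        row: Dict[str, object] = {"proteoform": proteoform}
        for sample in samples:
            row[sample] = per_sample[sample].get(proteoform, missing)
        table.append(row)
    return table


# Toy input: all names and numbers below are made up for illustration.
consolidated = ["H4_ac1", "H3_me2"]
quant = {
    "sample_A": {"H4_ac1": 1200.0, "H3_me2": 300.0},
    "sample_B": {"H4_ac1": 950.0},  # H3_me2 not quantified in this sample
}

table = fill_abundances(consolidated, quant)
for row in table:
    print(row)
```

Keeping an explicit missing-value marker, rather than zero, preserves the distinction between "not detected" and "detected at zero abundance" in the resulting table. Whether the original scripts make the same choice is not stated in the description.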