Homology and the evolution of vocal folds in the novel avian voice box
Description
Data for "Homology and the evolution of vocal folds in the novel avian voice box", including RNA-sequencing and ATAC-sequencing of developing chick and mouse vocal folds.
ATAC-sequencing peaks are provided for dissected HH39 chick vocal folds (VF) and for dissected epithelium and mesenchyme from HH34 chick trachea (Tr) and tracheobronchial junction (TBJ). Peaks were called for each replicate, and a consensus peak set with a fixed center and a standardized width of 200 bp was created using Genrich for input into chromVAR. The chromVAR analysis R dataset is also included. For the chromVAR analysis, we followed the default walkthrough (https://greenleaflab.github.io/chromVAR/articles/Introduction.html) using the galGal6 BSgenome package and the human_pwms_v1 position weight matrices, identifying transcription factor binding sites with the matchMotifs command in the motifmatchr R package (v1.2.0). Peaks from the scaffold “chrUn_NW_020109859v1” were removed before correcting for GC bias and computing motif variability, because this scaffold does not appear in the galGal6 BSgenome package.
Also included are transcript frequencies for Light-seq data from mouse and chick vocal fold, tracheal mesenchyme, and cartilage. Data analysis was performed using published Light-seq analysis code (Kishi et al., 2022) on the Harvard Medical School O2 cluster (Kernel 2.10.0) with Python (v3.7.5), PyTables (v3.6.1), samtools (v1.12), pysam (v0.17.0), numpy (v1.21.4), pandas (v1.3.4), Biopython (v1.79), and scikit-bio (v0.5.6). Briefly, barcode, UMI, and cDNA sequences were extracted from Read 1 using UMI-tools (v1.1.1). The cDNA sequences were then mapped to either the chick (GG6a) or the mouse (M27) genome. Reads were assigned to genes with featureCounts, using fractional read counting and the GTF annotation ‘gene’ (-M --fraction -g gene_id -t gene), and were deduplicated per gene with UMI-tools dedup. Reads were then separated by barcode sequence using a custom Python script adapted from Kishi et al., 2022.
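As a rough illustration of the final step, separating gene-assigned reads by barcode and summing fractional counts per sample, the following minimal sketch may help. It is not the published Light-seq code or the custom script used here. The function name, the (barcode, gene_id, weight) tuple layout, and the barcode, sample, and gene names are all hypothetical and chosen only for the example.

```python
from collections import defaultdict


def counts_by_sample(assignments, barcode_to_sample):
    """Sum fractional gene counts per sample.

    assignments: iterable of (barcode, gene_id, weight) tuples, one per
        deduplicated read assignment. The weight is the fractional count
        (for example 0.5 for a read split across two genes). This layout
        is an assumption made for illustration.
    barcode_to_sample: dict mapping barcode sequence to sample name.
    """
    table = defaultdict(lambda: defaultdict(float))
    for barcode, gene_id, weight in assignments:
        sample = barcode_to_sample.get(barcode)
        if sample is None:
            # Skip reads whose barcode is not in the lookup table.
            continue
        table[sample][gene_id] += weight
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in table.items()}


# Hypothetical barcodes, samples, and genes.
example_barcodes = {"ACGTACGT": "sample_A", "TTGGCCAA": "sample_B"}
example_assignments = [
    ("ACGTACGT", "GENE1", 1.0),
    ("ACGTACGT", "GENE1", 0.5),
    ("ACGTACGT", "GENE2", 0.5),
    ("TTGGCCAA", "GENE1", 1.0),
    ("GGGGGGGG", "GENE1", 1.0),  # unknown barcode, ignored
]
example_counts = counts_by_sample(example_assignments, example_barcodes)
```

In practice the barcode-to-sample lookup would be read from the accompanying sample information rather than typed in by hand.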
Barcode-to-sample information is included in the sample_info spreadsheet.
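For the ATAC-seq peaks, the two preprocessing ideas described above (standardizing peaks to a fixed 200 bp width and dropping peaks on the scaffold missing from the genome package) can be sketched as follows. This is only an illustration under stated assumptions, not the workflow used to produce the deposited files: it treats peaks as 0-based, half-open (chrom, start, end) intervals, centers each window on the interval midpoint (the actual center definition may differ, for example a peak summit), and uses a hypothetical function name.

```python
def standardize_peaks(peaks, width=200,
                      excluded_chroms=("chrUn_NW_020109859v1",)):
    """Return fixed-width peaks, skipping excluded sequences.

    peaks: iterable of (chrom, start, end) tuples with 0-based,
        half-open coordinates (BED-style; an assumption for this sketch).
    width: target peak width in base pairs.
    excluded_chroms: sequence names to drop entirely.
    """
    excluded = set(excluded_chroms)
    half = width // 2
    result = []
    for chrom, start, end in peaks:
        if chrom in excluded:
            continue
        # Assumption: the interval midpoint serves as the peak center.
        center = (start + end) // 2
        # Clamp at zero so windows near a sequence start stay valid.
        new_start = max(0, center - half)
        result.append((chrom, new_start, new_start + width))
    return result


# Hypothetical input peaks.
example_peaks = [
    ("chr1", 100, 500),
    ("chrUn_NW_020109859v1", 10, 300),
    ("chr2", 0, 50),
]
fixed_peaks = standardize_peaks(example_peaks)
```

The sketch does not clamp windows at the end of a sequence, which would require the sequence lengths of the genome build.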