ExAgBov: a public database of annotated variations from hundreds of bovine whole-exome sequencing samples

Published: 2 August 2022 | Version 4 | DOI: 10.17632/m3p9m9vc4g.4
Contributor:
Moran Gershoni

Description

ExAgBov: A public database of annotated variations from hundreds of bovine whole-exome sequencing samples

Rotem Raz1,2, Zvi Roth2 and Moran Gershoni1*

1. Department of Ruminant Science, Institute of Animal Sciences, Agricultural Research Organization, The Volcani Center, Rishon LeZion 7505101, Israel
2. Department of Animal Sciences, Robert H. Smith Faculty of Agriculture, Food and Environment, the Hebrew University, Rehovot 76100, Israel

*Corresponding author: Moran Gershoni (gmoran@volcani.agri.gov.il)

Cite this database: Raz, R., Roth, Z. & Gershoni, M. ExAgBov: A public database of annotated variations from hundreds of bovine whole-exome sequencing samples. Sci Data 9, 469 (2022). https://doi.org/10.1038/s41597-022-01597-8

Abstract

Large reference datasets of annotated genetic variations from genome-scale sequencing are essential for interpreting identified variants, their functional impact, and their possible contribution to diseases and traits. However, to date, no such database of annotated variation from broad cattle populations is publicly available. To fill this gap and advance bovine NGS-driven variant discovery and interpretation, we obtained and analyzed raw data deposited in the SRA public repository. Short reads from 262 whole-exome sequencing (WES) samples of Bos taurus were mapped to the Bos taurus ARS-UCD1.2 reference genome. The GATK best practice workflow was applied for variant calling. All recorded variants were comprehensively annotated using the Ensembl Variant Effect Predictor (VEP). An in-depth analysis of the population structure revealed the breeds comprising the database. The Exomes Aggregate of Bovine (ExAgBov) is a comprehensively annotated dataset of more than 20 million short variants, of which ~2% are located within open reading frames, splice regions, and UTRs, and more than 60,000 variants are predicted to be deleterious.

The following files are available:

1. BovExAg.DB.TSV+DP.filtered = The filtered database file. It includes the variants that pass the quality filtration for read quality and depth of coverage (see Raz R. et al.)
2. ExAgBov.DB.TSV.gz = The database file with all the annotated variants and the supporting information
3. ExAgBov.DP.all-var.csv.gz = The depth of coverage of all variants in all samples
4. SRA.ID+breed.csv = The SRA ID of the ExAgBov WES samples, including the sample breed whenever it is available in the metadata
5. ExAgBov.SRA.metadata.txt = The complete metadata file of all the WES samples included in the ExAgBov database
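The file names above suggest gzip-compressed, delimited text, so a file such as ExAgBov.DB.TSV.gz can probably be read row by row without loading more than 20 million variants into memory. The following is a minimal sketch under that assumption, not a documented access method for this dataset. It assumes a tab-delimited file whose first line is a header; the function names iter_rows and tally_column are hypothetical helpers, and the column names are not described here, so inspect the header before relying on any of them.

```python
import csv
import gzip
from collections import Counter
from typing import Dict, Iterator


def iter_rows(path: str, delimiter: str = "\t") -> Iterator[Dict[str, str]]:
    """Stream a gzip-compressed delimited file, yielding one dict per data row.

    Assumption: the first line of the file is a header naming the columns.
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter=delimiter)


def tally_column(path: str, column: str, delimiter: str = "\t") -> Counter:
    """Count how often each value occurs in one column.

    Raises KeyError if the column name is not present in the header.
    """
    counts: Counter = Counter()
    for row in iter_rows(path, delimiter):
        counts[row[column]] += 1
    return counts
```

For example, next(iter_rows("ExAgBov.DB.TSV.gz")).keys() would list the actual column names, after which tally_column can summarize any one of them. For the comma-separated file ExAgBov.DP.all-var.csv.gz, passing delimiter="," should work under the same header assumption.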

Files

Institutions

Agricultural Research Organization Volcani Center

Categories

Genetic Variation

Licence