Mitochondrial DNA integrity and metabolome profile are preserved in the human induced pluripotent stem cell reference line KOLF2.1J

Published: 20 December 2023 | Version 2 | DOI: 10.17632/n4mccs2zfg.2
Contributors:

Description

This dataset corresponds to a manuscript of the same title submitted to Stem Cell Reports. It includes the files necessary to reproduce the results described in the manuscript:

- GT19-38445_with_p_MT.bam (short-read whole genome sequencing data containing only reads mapped to the mitochondrial DNA)
- KOLF2.1J targeted ONT mtDNA.fastq (long-read targeted mitochondrial DNA sequencing of nine PCR amplicons spanning the entire mitochondrial genome)
- KOLF2.1J WGS ONT.bam (long-read whole genome sequencing data containing only reads mapped to the mitochondrial DNA)
- Jupyter Notebooks containing the code used to reproduce the results: Short-read WGS data processing.ipynb and Short-read WGS data visualization.ipynb. The latter is also used to visualize ONT data (file names need to be changed accordingly).
- The reference used for alignment and a shifted version to account for the artificial breakage point: rCRS.fasta and rCRS_shifted.fasta
- Jupyter Notebook containing code for targeted ONT mitochondrial DNA sequencing variant calling using Mutserve: Targeted ONT seq variant calling.ipynb
  -- For analysis of WGS data, the BAM file KOLF2.1J WGS ONT.bam is used and variants are called according to the Targeted ONT seq variant calling.ipynb notebook, with the calling threshold adjusted to 0.01.
- Sanger sequencing results for the conspicuous position m625 described in the manuscript: kolf_mt2_rev_mt2_internal_rev.fasta, kolf_mt2_rev_mt2_internal_rev.ab1, kolf_mt2_fwd_mt2_internal_fwd.fasta, kolf_mt2_fwd_mt2_internal_fwd.ab1

Additional information can be found below. In case of any problems, please contact the authors.
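The shifted reference exists because the mitochondrial genome is circular while a FASTA record is linear, so reads spanning the linearization point align poorly against the unshifted sequence. The idea can be sketched as follows. The rCRS length of 16,569 bp is a known property of the reference; the function names and any offset passed to them are hypothetical, and the offset actually used to build rCRS_shifted.fasta is not stated here.

```python
RCRS_LENGTH = 16569  # length of the rCRS mitochondrial reference in bp


def shift_circular(seq: str, offset: int) -> str:
    """Rotate a circular sequence so that 0-based index `offset` becomes the new start."""
    offset %= len(seq)
    return seq[offset:] + seq[:offset]


def shifted_to_original(pos: int, offset: int, length: int = RCRS_LENGTH) -> int:
    """Map a 1-based position on the shifted reference back to the original reference."""
    return (pos - 1 + offset) % length + 1


def original_to_shifted(pos: int, offset: int, length: int = RCRS_LENGTH) -> int:
    """Map a 1-based position on the original reference onto the shifted reference."""
    return (pos - 1 - offset) % length + 1
```

Variants called against the shifted reference would be converted back with shifted_to_original before annotation, so that positions remain comparable with those called against rCRS.fasta.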

Steps to reproduce

The steps to reproduce the targeted mitochondrial DNA sequencing using ONT sequencing can be found in the files described above. The annotations are based on the following datasets from Mitomap, downloaded on 23 February 2023:

- MutationsCodingControl MITOMAP Foswiki
- MutationsRNA MITOMAP Foswiki
- Polymorphisms MITOMAP Foswiki

The ONT WGS data were generated on a PromethION flow cell on a P2 Solo device. Raw reads were basecalled using dorado within the EPI2ME environment with the following parameters:

{
  sv: true,
  snp: false,
  cnv: true,
  str: true,
  mod: false,
  mapula: false,
  sample_name: KOLF,
  basecaller_cfg: dna_r10.4.1_e8.2_400bps_sup@v4.2.0,
  bam_min_coverage: 10,
  depth_window_size: 25000,
  annotation: true,
  out_dir: /home/ag-rossi/epi2melabs/instances/wf-human-variation_01HFXJXJCHZWMD67VT6CZ9GSFT/output,
  cluster_merge_pos: 150,
  min_sv_length: 30,
  phase_vcf: true,
  include_all_ctgs: false,
  use_longphase: true,
  use_longphase_intermediate: true,
  ref_pct_full: 0.1,
  var_pct_full: 0.7,
  snp_min_af: 0.08,
  indel_min_af: 0.15,
  vcf_fn: EMPTY,
  min_cov: 2,
  min_mq: 5,
  min_qual: 2,
  min_contig_size: 0,
  ctg_name: EMPTY,
  refine_snp_with_sv: true,
  bin_size: 500,
  phase_mod: true,
  force_strand: false,
  sex: male,
  basecaller: dorado,
  dorado_ext: fast5,
  basecaller_basemod_threads: 2,
  sv_benchmark: true,
  depth_intervals: false,
  GVCF: true,
  joint_phasing: false,
  downsample_coverage: false,
  downsample_coverage_target: 60,
  downsample_coverage_margin: 1.1,
  qscore_filter: 10,
  basecaller_chunk_size: 25,
  cuda_device: cuda:auto,
  threads: 4,
  ubam_map_threads: 8,
  ubam_sort_threads: 3,
  ubam_bam2fq_threads: 1,
  merge_threads: 4,
  modkit_threads: 4,
  annotation_threads: 8,
  disable_ping: false,
  help: false,
  version: false,
  fast5_dir: /media/ag-rossi/Data/2023_10_13_wgs_kolf/fast5_skip,
  ref: /home/ag-rossi/ReferenceData/Homo_sapiens.GRCh38.dna.primary_assembly.fa,
  remora_cfg: dna_r10.4.1_e8.2_400bps_sup@v4.2.0_5mCG_5hmCG@v2,
  wf: {
    agent: epi2melabs/5.1.4,
    epi2me_instance: 01HFXJXJCHZWMD67VT6CZ9GSFT,
    epi2me_user: guest_01HFP8TPTJWD85XBT6ZNTE46VH
  }
}

Afterwards, the BAM file was processed according to the steps described in Short-read WGS data processing.ipynb, starting with #AddReadGroup Header and stopping after #Extract reads from mt Genome and discard reads marked with the duplicated read filter. Variants were then called using Mutserve at a threshold level of 0.01.
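To make the meaning of the 0.01 calling threshold concrete, the sketch below keeps a non-reference base at a position only when its fraction of the read depth reaches the threshold. This is an illustration of threshold-based heteroplasmy calling in general, not Mutserve's implementation, which applies further filters; the function name, the input layout (per-position base counts) and the example numbers are all assumptions.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.01  # calling threshold stated in the description


def call_variants(
    counts: Dict[int, Dict[str, int]],
    reference: Dict[int, str],
    threshold: float = THRESHOLD,
) -> List[Tuple[int, str, str, float]]:
    """Return (position, ref_base, alt_base, fraction) for alt bases at or above threshold."""
    calls = []
    for pos in sorted(counts):
        depth = sum(counts[pos].values())
        if depth == 0:
            continue  # no coverage, nothing to call
        ref_base = reference[pos]
        for base, n in sorted(counts[pos].items()):
            if base == ref_base:
                continue
            fraction = n / depth
            if fraction >= threshold:
                calls.append((pos, ref_base, base, fraction))
    return calls


# Hypothetical counts: position 100 carries a 2 % variant, position 200 a 0.5 % variant.
example_counts = {100: {"A": 980, "G": 20}, 200: {"C": 995, "T": 5}}
example_reference = {100: "A", 200: "C"}
example_calls = call_variants(example_counts, example_reference)
```

With these made-up counts only the variant at position 100 is retained, since 5 of 1,000 reads at position 200 falls below the 1 % cutoff.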

Institutions

Leibniz-Institut für umweltmedizinische Forschung an der Heinrich-Heine-Universität Düsseldorf gGmbH

Categories

Mitochondrial DNA, Induced Pluripotent Stem Cell, Genome Sequencing
