Integrated genome assembly of CHO-K1

Published: 18 April 2017 | Version 3 | DOI: 10.17632/5ccnnwhszm.3
Meiyappan Lakshmanan


To study genomic changes in the antibody-producing cell line, whole-genome shotgun sequences of both the CHO-K1 and SH-87 cell lines were generated on the Illumina HiSeq platform using both paired-read libraries (insert length ~300 bp; WGS) and large-fragment mate-pair libraries (insert length ~10 kbp; DNA-PET). The resulting reads (>0.8 billion) were assembled de novo with the SOAPdenovo assembler (Li et al., 2010) and the OPERA-LG scaffolder (Gao et al., 2016) to obtain improved draft genomes. We then combined our data with the previously published sequence by performing a merged assembly of both genomes. The resulting integrated assembly improved scaffold contiguity statistics more than 6-fold, with >90% of the genome assembled into ~500 scaffolds. The sequence data were also used to close gaps in silico, filling >130,000 gaps in the original assembly.
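For readers unfamiliar with scaffold contiguity statistics, the sketch below shows how such figures are commonly computed from a list of scaffold lengths. It assumes the statistics in question are of the Nx/Lx family (for example N50, or the number of scaffolds needed to cover 90% of the assembly); the description above does not name the exact metric. The function name `nx` and the toy lengths are hypothetical and are not taken from the dataset.

```python
def nx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a list of scaffold lengths.

    Nx is the length of the scaffold at which the cumulative length of the
    longest scaffolds first reaches `fraction` of the total assembly length.
    Lx is how many scaffolds were needed to get there.
    """
    if not lengths:
        raise ValueError("no scaffold lengths given")
    ordered = sorted(lengths, reverse=True)  # longest scaffolds first
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, count
    # Only reachable through floating-point rounding when fraction is ~1.0.
    return ordered[-1], len(ordered)


# Toy scaffold lengths (hypothetical, not from the CHO-K1 assembly).
toy_lengths = [100, 80, 60, 40, 20]
n50, l50 = nx(toy_lengths, 0.5)
n90, l90 = nx(toy_lengths, 0.9)
print(f"N50 = {n50} (L50 = {l50}), N90 = {n90} (L90 = {l90})")
```

Under this reading, a statement such as ">90% of the genome assembled into ~500 scaffolds" corresponds to an L90 of about 500, and a fold improvement in contiguity is the ratio of a statistic such as N50 between the integrated and the original assembly.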