PacBio amplicon sequencing of Cordyceps species
Data from high-throughput ITS-D1/D2 LSU amplicon sequencing using dual (double) barcodes.
Steps to reproduce
The whole nrITS region and the D1-D2 domains of nrLSU were amplified simultaneously, as a single amplicon, using specifically designed primers: the ITS5 (forward) and LR5 (reverse) primers were each tagged with a different barcode sequence, so that each barcode combination corresponded to a distinct PCR reaction. Amplifications were run on an Applied Biosystems® 2720 automated thermal cycler with DreamTaq DNA polymerase (Thermo Fisher). The program was a hot start of 4 min at 94 °C, followed by 30 cycles of 3 min at 94 °C, 1 min at 50 °C and 2 min at 72 °C, and a final elongation step of 3 min at 72 °C. A second set of PCRs for the same strains was carried out with Platinum SuperFi DNA polymerase (Invitrogen) using the same protocol. The latter polymerase has more than 300× the fidelity of DreamTaq. The objective was to compare amplification and phylogenetic identification obtained with a high-fidelity polymerase against those obtained with a standard Taq.

PCR products from both polymerases were purified with the AMPure XP DNA purification kit, and the DNA concentration of the purified products was quantified with a Qubit™ fluorometer. All purified PCR products were adjusted to the same concentration of approximately 15 ng/µl. The pooled amplicons were sent to OmicsDrive (Singapore) for sequencing on a PacBio Sequel I instrument.

Once the raw data were obtained, circular consensus sequences (CCS) were generated from the subreads with the CCS tool, requiring a minimum of five subreads. Reads were assigned to their samples by their barcodes (demultiplexed) using a custom Python script (Python 3.7, scikit-bio 0.5.5). All sequences were then split in silico into their ITS and D1/D2 LSU portions at the ITS4 priming site. Within each sample, sequences were clustered with CD-HIT-EST at 97% similarity, the sequences in each cluster were aligned with MUSCLE, and a consensus sequence was generated per cluster.
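The cycling program described above can be written down as structured data, which makes the protocol unambiguous and lets the total programmed hold time be checked. This is a minimal sketch: the `Step` class and the step names are illustrative labels, not part of the original protocol, and ramp times of the thermal cycler are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str       # illustrative label for the step
    temp_c: float   # hold temperature in °C
    minutes: float  # hold time in minutes


# Program as stated in the protocol (same for DreamTaq and Platinum SuperFi).
INITIAL = [Step("hot start", 94, 4)]
CYCLE = [
    Step("denaturation", 94, 3),
    Step("annealing", 50, 1),
    Step("extension", 72, 2),
]
N_CYCLES = 30
FINAL = [Step("final elongation", 72, 3)]


def total_hold_minutes() -> float:
    """Sum of programmed hold times; ramping between temperatures is excluded."""
    return (
        sum(s.minutes for s in INITIAL)
        + N_CYCLES * sum(s.minutes for s in CYCLE)
        + sum(s.minutes for s in FINAL)
    )


if __name__ == "__main__":
    print(total_hold_minutes())  # 187
```

That is 4 + 30 × (3 + 1 + 2) + 3 = 187 min of hold time per run, before ramping.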
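Adjusting every purified product to about 15 ng/µl before pooling is a C1·V1 = C2·V2 dilution. The source does not say how the adjustment was made; the sketch below assumes each product is diluted with water or elution buffer, and `diluent_volume_ul` is a hypothetical helper, not part of the original workflow.

```python
def diluent_volume_ul(stock_ng_per_ul: float, stock_volume_ul: float,
                      target_ng_per_ul: float = 15.0) -> float:
    """Volume (µl) of diluent to add so that the stock reaches the target concentration.

    From C1 * V1 = C2 * V2: final volume V2 = C1 * V1 / C2, diluent = V2 - V1.
    """
    if target_ng_per_ul <= 0:
        raise ValueError("target concentration must be positive")
    if stock_ng_per_ul < target_ng_per_ul:
        # A product below the target cannot be brought up by dilution.
        raise ValueError("stock concentration is below the target")
    final_volume_ul = stock_ng_per_ul * stock_volume_ul / target_ng_per_ul
    return final_volume_ul - stock_volume_ul
```

For example, 10 µl of a product measured at 42 ng/µl needs 18 µl of diluent to reach 15 ng/µl (final volume 28 µl).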
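Demultiplexing with dual barcodes means assigning each CCS read to the sample whose (forward, reverse) barcode pair it carries. The original custom script used scikit-bio; the sketch below uses only the standard library and is not that script. It assumes exact barcode matches at the read ends, a hypothetical sample sheet mapping barcode pairs to sample IDs, and that CCS reads may be in either orientation, so the reverse complement is also tried.

```python
from typing import Dict, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def assign_sample(read: str,
                  samples: Dict[Tuple[str, str], str]) -> Optional[Tuple[str, str]]:
    """Return (sample_id, trimmed forward-oriented read), or None if unassigned.

    `samples` (hypothetical sample sheet) maps (forward_barcode, reverse_barcode)
    to a sample ID; both barcodes are written 5'->3' as on the tagged primers.
    Only exact matches are accepted (an assumption; real data may need mismatches).
    """
    for oriented in (read, revcomp(read)):
        for (fwd_bc, rev_bc), sample_id in samples.items():
            # The forward barcode sits at the 5' end; the reverse barcode shows up
            # as its reverse complement at the 3' end of the forward-oriented read.
            if oriented.startswith(fwd_bc) and oriented.endswith(revcomp(rev_bc)):
                trimmed = oriented[len(fwd_bc):len(oriented) - len(rev_bc)]
                return sample_id, trimmed
    return None
```

Reads returned as `None` would be discarded or inspected separately; allowing one or two barcode mismatches is a common refinement but would need barcodes designed with enough distance between them.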
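Splitting each amplicon into its ITS and D1/D2 LSU portions at the ITS4 priming site can be sketched as a search for that site. ITS4 (5'-TCCTCCGCTTATTGATATGC-3') is a reverse primer, so in a read oriented from ITS5 towards LR5 it appears as its reverse complement. Where exactly the original script cut is not stated; this sketch assumes the ITS portion ends with the ITS4 site (as an ITS5/ITS4 amplicon would) and uses exact matching only.

```python
from typing import Optional, Tuple

ITS4 = "TCCTCCGCTTATTGATATGC"  # ITS4 primer, 5'->3'
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
# Site as it appears on a forward-oriented (ITS5 -> LR5) read.
ITS4_SITE = ITS4.translate(_COMPLEMENT)[::-1]


def split_at_its4(read: str) -> Optional[Tuple[str, str]]:
    """Split a forward-oriented read into (ITS part, LSU part) at the ITS4 site.

    Assumption: the ITS part includes the ITS4 site and the LSU part is the
    remainder of the read. Returns None if the site is not found exactly.
    """
    pos = read.upper().find(ITS4_SITE)
    if pos == -1:
        return None
    cut = pos + len(ITS4_SITE)
    return read[:cut], read[cut:]
```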
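CD-HIT-EST clusters greedily: sequences are sorted by decreasing length, the longest unassigned sequence becomes a cluster representative, and each further sequence joins a representative if its identity reaches the threshold (here 97%). The sketch below is a simplified stand-in, not CD-HIT itself: it skips CD-HIT's k-mer filtering and banded alignment, computes identity from a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -1), and divides identical positions by the length of the shorter sequence.

```python
from typing import List


def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -1) -> float:
    """Identical positions in a global alignment divided by the shorter length."""
    # Each cell holds (alignment score, identical positions). Tuples compare
    # lexicographically, so equal scores are broken in favour of more matches.
    prev = [(j * gap, 0) for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [(i * gap, 0)]
        for j in range(1, len(b) + 1):
            same = a[i - 1] == b[j - 1]
            diag = (prev[j - 1][0] + (match if same else mismatch),
                    prev[j - 1][1] + (1 if same else 0))
            up = (prev[j][0] + gap, prev[j][1])
            left = (cur[j - 1][0] + gap, cur[j - 1][1])
            cur.append(max(diag, up, left))
        prev = cur
    shorter = min(len(a), len(b))
    return prev[-1][1] / shorter if shorter else 0.0


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> List[List[str]]:
    """Greedy clustering; the first member of each cluster is its representative."""
    clusters: List[List[str]] = []
    for s in sorted(seqs, key=len, reverse=True):
        for cluster in clusters:
            # Join the first representative that meets the threshold.
            if global_identity(cluster[0], s) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters
```

The alignment is quadratic in sequence length, so this pure-Python version is only practical for small examples; it illustrates the clustering logic rather than replacing CD-HIT-EST.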
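Once the members of a cluster have been aligned (with MUSCLE in the original workflow), a per-cluster consensus can be read off column by column. The source does not state which consensus rule was used; this sketch assumes simple majority rule on an existing alignment, drops columns where the gap is the most frequent symbol, and resolves ties in favour of the symbol that appears first in the column.

```python
from collections import Counter
from typing import List


def majority_consensus(aligned: List[str], gap: str = "-") -> str:
    """Column-wise majority-rule consensus of equal-length aligned sequences."""
    if not aligned:
        raise ValueError("need at least one aligned sequence")
    width = len(aligned[0])
    if any(len(s) != width for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    consensus = []
    for column in zip(*aligned):
        # most_common(1) returns the first-encountered symbol among equal counts.
        symbol, _ = Counter(c.upper() for c in column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)
```

For example, the aligned sequences `ACG-T`, `ACGAT` and `ACC-T` give the consensus `ACGT`: the fourth column is mostly gaps and is dropped.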