Data on genetic diversity of circumsporozoite protein (csp) non-repeat regions from Plasmodium knowlesi clinical isolates of Sabah.

Published: 16 June 2022 | Version 1 | DOI: 10.17632/hykd454wwx.1
Zarina Amin,
Nur Marlessa Suzain Mustaffi,
Noor Ain Haron


This dataset presents an analysis of the genetic diversity of the circumsporozoite protein (csp) gene of Plasmodium knowlesi in Sabah. The circumsporozoite protein is one of the candidate targets for malaria vaccine development, and the analysis was conducted to evaluate the suitability of csp as a vaccine candidate in relation to its genetic diversity. The data were collected from 26 human blood spot samples that tested positive for malaria, obtained from Kudat and Kota Kinabalu hospitals in Sabah in 2012. Genomic DNA extraction, nested PCR, cloning and sequencing of the csp genes were carried out, and the phylogeny, sequence diversity and natural selection of the csp genes were analysed using bioinformatic tools such as MEGAX and DnaSP ver. 5.10.00 for phylogenetic tree construction, mutational analysis and tests of the neutral theory of evolution. Comparison against P. knowlesi csp strain H as a reference sequence (GenBank accession XM_002258966.1) showed point mutations at 52 positions among the 237 sequences from different geographical regions. The phylogenetic tree revealed that multiple haplotypes were scattered regardless of geographical location. The evolutionary history, inferred using the Neighbor-Joining method, revealed no geographical clustering by country, with a total of 76 non-repeat region Pkcsp haplotypes including one unique haplotype (haplotype H12). These data could serve as auxiliary information and/or research data for other researchers in Sabah. They could also serve as a guide or reference for researchers outside Sabah who may be interested in carrying out similar research in other states.
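The comparison against a reference sequence and the grouping of isolates into haplotypes can be illustrated with a minimal Python sketch. It is not the MEGAX or DnaSP procedure used for this dataset. The function names, the haplotype labelling order and the toy sequences are hypothetical, and the sketch assumes the sequences are already aligned to the reference.

```python
from collections import defaultdict


def variant_positions(reference, sequences):
    """Return sorted 1-based alignment columns where any isolate differs from the reference.

    Gaps ('-') and ambiguous calls ('N') are skipped, which is a simplifying assumption.
    """
    positions = set()
    for seq in sequences.values():
        if len(seq) != len(reference):
            raise ValueError("all sequences must be aligned to the reference length")
        for i, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            if base != ref_base and base not in "-N":
                positions.add(i)
    return sorted(positions)


def group_haplotypes(sequences):
    """Group isolate names by identical aligned sequence, labelled H1, H2, ... in first-seen order."""
    groups = defaultdict(list)
    for name, seq in sequences.items():
        groups[seq].append(name)
    return {f"H{n}": names for n, names in enumerate(groups.values(), start=1)}


# Toy data for illustration only (not taken from the dataset).
reference = "ATGCATGCAT"
isolates = {
    "iso1": "ATGCATGCAT",
    "iso2": "ATGTATGCAT",
    "iso3": "ATGTATGCAT",
    "iso4": "ATGCATGCAA",
}

mutated_columns = variant_positions(reference, isolates)
haplotypes = group_haplotypes(isolates)
```

In this toy example a haplotype carried by a single isolate corresponds to what the description calls a unique haplotype.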


Steps to reproduce

Experimental design, materials and methods

Specimens

Twenty-six human blood samples were dried and stored in individual plastic bags. Patient details such as name, patient reference number, age, race, gender, address and microscopy results were recorded. Two negative controls were obtained from healthy individuals with no history of malaria.

Genomic DNA extraction

DNA was extracted from the dried blood spots using the QIAamp DNA Mini Kit (QIAGEN, UK) following the manufacturer's instructions.

Gene amplification and DNA sequencing

The Plasmodium knowlesi csp and dbp genes are both polymorphic antigenic marker genes, and were amplified using nested PCR with modifications [11], [12], [13], [14], [15]. The csp gene was amplified using semi-nested PCR. The primer pair PkCSPF2 and PkCSP-R was used for Nest 1 amplification, and the products were further amplified in Nest 2 with the primer pair PkCSPF2 and PkCSPR2. Direct sequencing was outsourced to a commercial laboratory, First BASE Laboratories Sdn. Bhd. The protocols used Applied Biosystems' highest-capacity genetic analyzer platforms and BigDye® Terminator v3.1 cycle sequencing kit chemistry, which is optimized for most DNA sequencing applications.

Data analysis

The forward and reverse sequencing reads of the cloned csp genes were assembled. The Basic Local Alignment Search Tool (BLAST) from the National Centre for Biotechnology Information database [16] was used to confirm that the cloned partial csp gene sequences of P. knowlesi were the correct amplification targets. Completed nucleotide sequences of the csp genes were each aligned against a reference sequence and exported in multiple-sequence FASTA format for further analysis. Basic sequence statistics, including conserved sites and variable sites (parsimony-informative and singleton sites), were also analysed with MEGAX software [10].
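The site categories named above can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not a reimplementation of MEGAX. A column is treated as parsimony-informative when at least two nucleotide states each occur at least twice, and as a singleton site when it is variable but not informative. Gaps and ambiguous bases are simply ignored, which may differ from the gap-handling options chosen in MEGAX. The function name and toy alignment are hypothetical.

```python
from collections import Counter


def classify_sites(alignment):
    """Count conserved, variable, parsimony-informative and singleton columns.

    `alignment` is a list of equal-length aligned nucleotide strings.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")

    counts = {"conserved": 0, "variable": 0, "parsimony_informative": 0, "singleton": 0}
    for column in zip(*alignment):
        # Tally unambiguous bases only (assumption: gaps and N are ignored).
        bases = Counter(b for b in column if b in "ACGT")
        if len(bases) <= 1:
            counts["conserved"] += 1
            continue
        counts["variable"] += 1
        # Informative: at least two states, each seen in at least two sequences.
        if sum(1 for n in bases.values() if n >= 2) >= 2:
            counts["parsimony_informative"] += 1
        else:
            counts["singleton"] += 1
    return counts


# Toy alignment for illustration only (not taken from the dataset).
toy_alignment = ["ATGCA", "ATGTA", "ATCTA", "ATCCG"]
site_counts = classify_sites(toy_alignment)
```

With these definitions every variable site is either parsimony-informative or a singleton, so the two sub-counts always sum to the variable-site count.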
Meanwhile, to assess amino acid polymorphisms, the multiple sequence alignment (MSA) for each gene was translated into amino acid sequences using CLUSTAL-W in MEGAX. Selection analysis and genetic diversity statistics were calculated using DnaSP version 5.10.00 [17]. Phylogenetic trees for both genes were constructed using the neighbor-joining method in MEGAX.
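Two genetic diversity statistics of the kind that DnaSP reports can be sketched in a few lines of Python. Which statistics were actually computed for this dataset is not stated in the text, so nucleotide diversity (π, the average number of pairwise differences per site) and Nei's haplotype diversity (Hd) are shown here as assumed examples. The function names and toy alignment are hypothetical, and the sketch assumes a gap-free alignment rather than reproducing DnaSP's handling of gaps.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(alignment):
    """Average pairwise nucleotide differences per site (pi) for aligned sequences."""
    length = len(alignment[0])
    pairs = list(combinations(alignment, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(seq1, seq2)) for seq1, seq2 in pairs
    )
    return total_differences / (len(pairs) * length)


def haplotype_diversity(alignment):
    """Nei's haplotype diversity: n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(alignment)
    frequencies = [count / n for count in Counter(alignment).values()]
    return n / (n - 1) * (1 - sum(f * f for f in frequencies))


# Toy alignment for illustration only (not taken from the dataset).
toy_alignment = ["ATGCA", "ATGTA", "ATGTA", "ATCCG"]
pi = nucleotide_diversity(toy_alignment)
hd = haplotype_diversity(toy_alignment)
```

Both functions need at least two sequences, since the pairwise average and the n / (n - 1) correction are undefined for a single sequence.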


Universiti Malaysia Sabah


Disease Epidemiology