raw data

Published: 24 April 2023 | Version 1 | DOI: 10.17632/4w8n8n96tc.1
Yue Haitao


For the 16S rRNA high-throughput sequencing results, Uparse (v.7.1) was used to cluster sequences into operational taxonomic units (OTUs) at 97% similarity and to obtain a representative sequence for each OTU. Alpha diversity analysis was then carried out with Mothur (v.1.30.2) to reflect species richness and diversity in the samples. Beta diversity analysis was carried out with Qiime (v.1.9.1) to compare the community composition of the tested samples, and R (version 3.3.1) was used to analyze their species composition.

For the metagenome sequencing results, the raw sequencing data were quality-controlled with fastp, and the short reads that passed quality control were assembled with Multiple_Megahit. The predicted genes from the assemblies were clustered with CD-HIT to construct a non-redundant gene set. The high-quality reads of each sample were aligned to the non-redundant gene set with SOAPaligner (default parameter: 95% identity), and the abundance of each gene in the corresponding sample was calculated. The sequences of the non-redundant gene set were then aligned against the KEGG gene database (GENES) with DIAMOND (parameters: blastp; E-value ≤ 1e-5). The abundance of each functional category was calculated as the sum of the abundances of the genes assigned to the corresponding KO, Pathway, EC, or Module, and the functional composition of the microbes in the tested samples was analyzed in R from the resulting abundance tables.

The raw serum metabolomics data were converted to mzXML format with ProteoWizard, and peak alignment, retention time correction, and peak area extraction were then performed with XCMS. Metabolite structures were identified by accurate mass matching (< 25 ppm) and secondary (MS/MS) spectrum matching against the database. For the data extracted by XCMS, ion peaks with more than 50% missing values within a group were removed. SIMCA-P 14.1 (Umetrics, Umeå, Sweden) was used for pattern recognition.
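The two metabolomics filters described above (the < 25 ppm accurate-mass tolerance and the removal of peaks with more than 50% missing values in a group) can be illustrated with a minimal Python sketch. This is not the pipeline used to produce the dataset: the function names, the use of `None` for a missing intensity, and the rule that a peak is dropped when any single group exceeds the threshold are assumptions made for illustration only.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Absolute mass error, in parts per million, relative to the reference m/z."""
    return abs(observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz, theoretical_mz, tol_ppm=25.0):
    """True if the observed m/z matches the reference within tol_ppm (strictly less)."""
    return ppm_error(observed_mz, theoretical_mz) < tol_ppm


def missing_fraction(intensities):
    """Fraction of samples in one group whose intensity is missing (None)."""
    return sum(1 for v in intensities if v is None) / len(intensities)


def keep_peak(intensities_by_group, max_missing=0.5):
    """Keep a peak only if no group has more than max_missing missing values.

    Assumption: the source does not say whether the 50% rule applies to any
    group or to all groups; this sketch drops the peak if ANY group exceeds it.
    """
    return all(missing_fraction(v) <= max_missing
               for v in intensities_by_group.values())


# Hypothetical peak measured in two groups of four samples each.
peak = {"control": [1200.0, None, 1350.0, 1100.0],
        "treated": [None, None, None, 900.0]}
print(within_tolerance(100.001, 100.0))  # a 10 ppm error passes the 25 ppm limit
print(keep_peak(peak))                   # "treated" has 75% missing values
```

If the intended rule is instead to drop a peak only when every group exceeds the threshold, replacing `all` with `any` in `keep_peak` gives that behavior.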
After the data were preprocessed by Pareto scaling, the dimensionality of the original multivariate data was reduced by principal component analysis (PCA). The grouping trend (within-group and between-group similarities and differences) and outliers (abnormal samples) among the observed variables in the data set were then examined. Differential metabolites were screened by univariate analysis, using the fold change and Student's t-test.
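As a rough illustration of the preprocessing and univariate statistics named above, the following Python sketch computes Pareto scaling, a fold change, and a Student's t statistic for a single metabolite. It is not the SIMCA-P workflow used for the dataset: the function names and the example intensities are hypothetical, the sample standard deviation is assumed for scaling, and the equal-variance (pooled) form of the t-test is assumed. Converting the t statistic to a p-value would require a t-distribution routine that is not part of the Python standard library.

```python
import math
import statistics


def pareto_scale(values):
    """Mean-center, then divide by the square root of the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / math.sqrt(sd) for v in values]


def fold_change(case, control):
    """Ratio of the mean intensity in the case group to that in the control group."""
    return statistics.fmean(case) / statistics.fmean(control)


def student_t(case, control):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(case), len(control)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(case)
                  + (n2 - 1) * statistics.variance(control)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.fmean(case) - statistics.fmean(control)) / se, df


# Hypothetical peak areas of one metabolite in two groups of three samples.
case = [4.0, 6.0, 8.0]
control = [1.0, 2.0, 3.0]
t_stat, dof = student_t(case, control)
print(pareto_scale(control))
print(fold_change(case, control), t_stat, dof)
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, so high-variance peaks are damped less aggressively than under unit-variance scaling.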



Xinjiang University


Metabolomics, Microorganism