Dataset 4 - Membrane Protein Types
Description
To establish a quality benchmark dataset for developing a predictor to identify the functional types of membrane proteins, the sequences were collected from UniProtKB/ Swiss-Prot release on 2018_04 at http://www.uniprot.org/according to the following steps (Lin et al. 2013). Proteins belonging to all eight types were collected. Those proteins annotated with ‘‘fragment’’ were removed; meanwhile, those proteins with the length of sequence less than 50 residues were also excluded, in case of the influence of the fragment. Sequences annotated with ambiguous or uncertain terms, such as ‘‘potential,’’ ‘‘probable,’’‘‘probably,’’ ‘‘maybe,’’ or ‘‘by similarity,’’ were removed for further consideration. The Dataset 4 is divided as training dataset and testing dataset with 1332 and 1033 respectively.
Files
Steps to reproduce
To establish a quality benchmark dataset for developing a predictor to identify the functional types of membrane proteins, the sequences were collected from UniProtKB/ Swiss-Prot release on 2018_04 at http://www.uniprot.org/according to the following steps (Lin et al. 2013). Proteins belonging to all eight types were collected. Those proteins annotated with ‘‘fragment’’ were removed; meanwhile, those proteins with the length of sequence less than 50 residues were also excluded, in case of the influence of the fragment. Sequences annotated with ambiguous or uncertain terms, such as ‘‘potential,’’ ‘‘probable,’’‘‘probably,’’ ‘‘maybe,’’ or ‘‘by similarity,’’ were removed for further consideration. The Dataset 4 is divided as training dataset and testing dataset with 1332 and 1033 respectively.