Dataset 4 - Membrane Protein Types

Published: 27-06-2018 | Version 1 | DOI: 10.17632/dbzdybks82.1
Contributors:
Elangovan Siva Sankari,
Mani Megalai

Description

To establish a quality benchmark dataset for developing a predictor that identifies the functional types of membrane proteins, the sequences were collected from UniProtKB/Swiss-Prot release 2018_04 at http://www.uniprot.org/ according to the following steps (Lin et al. 2013). Proteins belonging to all eight types were collected. Proteins annotated with "fragment" were removed, and proteins with a sequence length of less than 50 residues were also excluded to avoid the influence of fragments. Sequences annotated with ambiguous or uncertain terms, such as "potential," "probable," "probably," "maybe," or "by similarity," were removed from further consideration. Dataset 4 is divided into a training dataset and a testing dataset with 1332 and 1033 sequences, respectively.
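The filtering steps described above can be sketched in Python as follows. This is a minimal illustration, not the contributors' actual pipeline: the `ProteinRecord` layout, the `keep_record` and `filter_records` names, and the use of case-insensitive substring matching on a free-text annotation field are all assumptions made for the example. Real UniProtKB entries carry these annotations in structured fields, and plain substring matching can over-match (for example, "potential" inside an unrelated phrase).

```python
from dataclasses import dataclass

# Terms the description lists as ambiguous or uncertain.
AMBIGUOUS_TERMS = ("potential", "probable", "probably", "maybe", "by similarity")

# Sequences shorter than this many residues are excluded.
MIN_LENGTH = 50


@dataclass
class ProteinRecord:
    """Hypothetical record layout for illustration; not the UniProtKB schema."""
    accession: str
    sequence: str
    annotation: str  # free-text annotation attached to the entry


def keep_record(record: ProteinRecord) -> bool:
    """Return True if the record passes all three filters from the description."""
    text = record.annotation.lower()
    # Step 1: drop entries annotated as fragments.
    if "fragment" in text:
        return False
    # Step 2: drop sequences shorter than 50 residues.
    if len(record.sequence) < MIN_LENGTH:
        return False
    # Step 3: drop entries annotated with ambiguous or uncertain terms.
    if any(term in text for term in AMBIGUOUS_TERMS):
        return False
    return True


def filter_records(records):
    """Apply keep_record to every record, preserving input order."""
    return [r for r in records if keep_record(r)]
```

For example, given made-up records with placeholder accessions, an entry annotated "Fragment", a 49-residue sequence, and an entry annotated "Lipid-anchor (By similarity)" would all be dropped, while a 50-residue entry annotated "Peripheral membrane protein" would be kept.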

