Dataset 4 - Membrane Protein Types

Published: 27 June 2018 | Version 1 | DOI: 10.17632/dbzdybks82.1
Contributors:
Elangovan Siva Sankari,

Description

To establish a high-quality benchmark dataset for developing a predictor that identifies the functional types of membrane proteins, sequences were collected from UniProtKB/Swiss-Prot release 2018_04 at http://www.uniprot.org/ according to the following steps (Lin et al. 2013). Proteins belonging to all eight types were collected. Proteins annotated with "fragment" were removed, and proteins with sequences shorter than 50 residues were also excluded to avoid the influence of fragments. Sequences annotated with ambiguous or uncertain terms, such as "potential," "probable," "probably," "maybe," or "by similarity," were removed from further consideration. Dataset 4 is divided into a training dataset and a testing dataset containing 1332 and 1033 sequences, respectively.
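
The filtering steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the contributors' actual pipeline: the `ProteinRecord` layout, the `keep` and `filter_records` helpers, and the use of case-insensitive substring matching on a free-text annotation field are all assumptions made for the example. Only the three exclusion criteria (fragment annotation, length below 50 residues, ambiguous terms) come from the description.

```python
from dataclasses import dataclass

# Ambiguous or uncertain terms listed in the dataset description.
AMBIGUOUS_TERMS = ("potential", "probable", "probably", "maybe", "by similarity")

# Sequences shorter than this many residues are excluded.
MIN_LENGTH = 50


@dataclass
class ProteinRecord:
    # Hypothetical record layout for illustration only;
    # this is not the UniProtKB flat-file format.
    accession: str
    sequence: str
    annotation: str  # free-text annotation attached to the entry


def keep(record: ProteinRecord) -> bool:
    """Return True if the record passes all three exclusion criteria."""
    text = record.annotation.lower()
    if "fragment" in text:
        return False  # annotated as a fragment
    if len(record.sequence) < MIN_LENGTH:
        return False  # shorter than 50 residues
    if any(term in text for term in AMBIGUOUS_TERMS):
        return False  # ambiguous or uncertain annotation
    return True


def filter_records(records):
    """Keep only the records that pass every criterion, preserving order."""
    return [r for r in records if keep(r)]
```

Plain substring matching is a simplification: it would also discard an entry whose annotation uses one of these words in a different sense, so a real pipeline would likely match against the specific annotation qualifiers instead.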

Institutions

Government College of Engineering

Categories

Membrane Proteins

Licence