Abstract
Recognizing the function of a protein is one of the major steps in the drug discovery process, and it is known that the functions of polypeptides are dictated by their structures. This study aims to predict the secondary structures of proteins using a two-stage probabilistic algorithmic approach. The first stage determines the structural class (all-α, all-β, α/β or α+β) of an unknown protein by means of a Mixed Integer Linear Programming formulation. The second stage searches for 3- to 7-residue-long segments of the unknown protein's residue sequence in a database of experimentally identified protein structures that belong to the same structural class as the one determined for the unknown protein. The source of structural data on known proteins is the Protein Data Bank (http://www.pdb.org). Three-state per-residue predictions (α-helix, β-sheet or loop) are obtained through a probabilistic approach that uses the outcomes of the database search. We achieved 100% accuracy in the structural-class (folding-type) determination phase. The weights placed on the probabilities obtained from 3- to 7-residue-long segments are determined optimally for each structural class via Non-Linear Programming formulations, again using structural information on experimentally identified proteins. The prediction method was tested on 419, 579, 707 and 601 known proteins from the all-α, all-β, α/β and α+β classes, respectively, treating them as if they were unknown. The three-state per-residue accuracy levels obtained for the all-α, all-β, α/β and α+β classes are 80.5%, 72.4%, 71.9% and 75.5%, respectively.
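The second-stage idea of segment search followed by weighted probabilistic voting can be illustrated with a minimal Python sketch. It makes several assumptions that are not taken from the method itself:

- The database is a list of (sequence, secondary-structure string) pairs that has already been filtered to the predicted structural class. The Mixed Integer Linear Programming class-assignment step is not shown.
- States are encoded as the hypothetical labels `H` (α-helix), `E` (β-sheet) and `L` (loop).
- Only exact segment matches are counted.
- The per-length weights are uniform placeholders standing in for the optimized Non-Linear Programming weights.
- The helper names `segment_votes`, `predict` and `q3_accuracy` are illustrative only.

```python
from collections import Counter

# Hypothetical state labels: helix, sheet (strand), loop.
STATES = ("H", "E", "L")


def segment_votes(query, database, i, k):
    """Count states observed at the residue aligned with query[i], over all
    exact database matches of length-k windows of `query` that cover i."""
    counts = Counter()
    n = len(query)
    # Window start s must satisfy s <= i <= s + k - 1 and 0 <= s <= n - k.
    for s in range(max(0, i - k + 1), min(i, n - k) + 1):
        segment = query[s:s + k]
        offset = i - s  # position of residue i inside the window
        for seq, ss in database:
            start = seq.find(segment)
            while start != -1:  # visit every (possibly overlapping) match
                counts[ss[start + offset]] += 1
                start = seq.find(segment, start + 1)
    return counts


def predict(query, database, weights):
    """Combine per-length state frequencies with weights {k: w_k} and
    return the highest-scoring state for each residue."""
    prediction = []
    for i in range(len(query)):
        score = dict.fromkeys(STATES, 0.0)
        for k, w in weights.items():
            counts = segment_votes(query, database, i, k)
            total = sum(counts.values())
            if total == 0:
                continue  # no match of this length: contributes nothing
            for state in STATES:
                score[state] += w * counts[state] / total
        # Assumed fallback: predict loop when no segment of any length matched.
        if any(score.values()):
            prediction.append(max(STATES, key=lambda st: score[st]))
        else:
            prediction.append("L")
    return "".join(prediction)


def q3_accuracy(predicted, observed):
    """Three-state per-residue accuracy: fraction of residues predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings differ in length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return correct / len(observed)


if __name__ == "__main__":
    # Toy, made-up data purely for illustration (not taken from the PDB).
    toy_db = [("ACDEFGHIK", "HHHHLLEEE")]
    uniform = {k: 1.0 for k in range(3, 8)}  # placeholder weights for k = 3..7
    pred = predict("ACDEF", toy_db, uniform)
    print(pred, q3_accuracy(pred, "HHHLL"))
```

In this sketch, normalizing each segment length's counts before weighting keeps a length with many matches from dominating by frequency alone. How those per-length contributions are balanced is exactly what the class-specific weights control.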