Computational representation of protein sequences for homology detection and classification

Oğul, Hasan

dc.contributor.advisor	Mumcuoğlu, Erkan
dc.contributor.author	Oğul, Hasan
dc.date.accessioned	2020-12-10T09:16:11Z
dc.date.available	2020-12-10T09:16:11Z
dc.date.submitted	2006
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/225956
dc.description.abstract	ÖZ. PROTEİN DİZİLİMLERİNİN HOMOLOJİ SEZİMİ VE SINIFLANDIRMA AMAÇLI BİLİŞİMSEL GÖSTERİMİ. Oğul, Hasan. Ph.D., Department of Information Systems. Supervisor: Assist. Prof. Dr. Erkan Ü. MUMCUOĞLU. January 2006, 102 pages. Machine learning methods are frequently used for classification problems in computational biology. The inputs to these methods must consist of fixed-length feature vectors. Because proteins can have different lengths, methods are needed that represent protein sequences with a fixed number of features. This thesis presents three different methods for this purpose: the first is n-peptide composition with reduced alphabets, the second is pairwise similarity scores based on maximal unique matches, and the third is pairwise similarity scores based on probabilistic suffix trees. The new sequence representation methods described in the thesis are applied, with problem-specific modifications, to three important problems of computational biology: remote homology detection, subcellular localization prediction, and solvent accessibility prediction. For each problem, comparative analyses between the existing methods and the new methods are presented, based on experiments on common benchmark datasets. In the remote homology detection tests, all three new methods achieve accuracies comparable to the best existing methods while running much more efficiently. A combination of the new methods is used to develop a system named PredLOC, which predicts the subcellular localization of proteins, and this system is tested on two different eukaryotic protein sets. On both datasets, PredLOC reaches the best accuracy obtained so far. The use of maximal unique matches provides a small improvement in solvent accessibility prediction. Keywords: n-peptide composition, maximal unique match, probabilistic suffix tree, remote homology, subcellular localization.
dc.description.abstract	ABSTRACT. COMPUTATIONAL REPRESENTATION OF PROTEIN SEQUENCES FOR HOMOLOGY DETECTION AND CLASSIFICATION. Oğul, Hasan. Ph.D., Department of Information Systems. Supervisor: Assist. Prof. Dr. Erkan Ü. MUMCUOĞLU. January 2006, 102 pages. Machine learning techniques have been widely used for classification problems in computational biology. They require the input to be a collection of fixed-length feature vectors. Since proteins are of varying lengths, there is a need for a means of representing protein sequences by a fixed number of features. This thesis introduces three novel methods for this purpose: n-peptide compositions with reduced alphabets, pairwise similarity scores by maximal unique matches, and pairwise similarity scores by probabilistic suffix trees. The new sequence representations described in the thesis are applied, with some problem-specific modifications, to three challenging problems of computational biology: remote homology detection, subcellular localization prediction, and solvent accessibility prediction. Rigorous experiments are conducted on common benchmarking datasets, and a comparative analysis is performed between the new methods and the existing ones for each problem. On remote homology detection tests, all three methods achieve accuracies competitive with the state-of-the-art methods, while being much more efficient. A combination of the new representations is used to devise a hybrid system, called PredLOC, for predicting the subcellular localization of proteins, and it is tested on two distinct eukaryotic datasets. To the best of the author's knowledge, the accuracy achieved by PredLOC is the highest reported on those datasets. The maximal unique match method yields only a slight improvement in solvent accessibility prediction. Keywords: n-peptide composition, maximal unique match, probabilistic suffix tree, remote homology, subcellular localization.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Computational representation of protein sequences for homology detection and classification
dc.title.alternative	Protein dizilimlerinin homoloji sezimi ve sınıflandırma amaçlı bilişimsel gösterimi
dc.type	doctoralThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilişim Sistemleri Anabilim Dalı
dc.identifier.yokid	151516
dc.publisher.institute	Enformatik Enstitüsü
dc.publisher.university	ORTA DOĞU TEKNİK ÜNİVERSİTESİ
dc.identifier.thesisid	180820
dc.description.pages	112
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_151516.pdf
Size:: 1.016Mb
Format:: PDF
Description:: File_151516

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess