Critical assessment of protein-protein interaction databases and features towards prediction of interactions

Ulubaş, Mehmet Cengiz

dc.contributor.advisor	Gürsoy, Attila
dc.contributor.author	Ulubaş, Mehmet Cengiz
dc.date.accessioned	2020-12-08T08:07:26Z
dc.date.available	2020-12-08T08:07:26Z
dc.date.submitted	2009
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/170540
dc.description.abstract	Protein-protein etkileşimleri (PPE) biyolojik süreçlerin her seviyesinde çok önemlidir. Deneysel olarak kanıtlanmış PPE farklı veritabanlarına koyulmaktadır. Bu veritabanları PPE hakkında çeşitli bilgiler içermektedir, fakat hücrelerdeki tüm süreçler göz önüne alındığında, kapsamları düşüktür. Bu yüzden, PPE kapsamını genişletmek için güvenilir, daha doğru hesaplamalı metotlar gerekmektedir. Birçok araştırma grubu farklı bilgi ve metotlara dayanan çeşitli doğrulukta PPE tahmin algoritmaları geliştirmiştir. Ancak, yüksek doğrulukta bir PPE tahmin etme metodu geliştirmek ilgi çekicidir. Bu çalışma, var olan dizilim tabanlı PPE tahmin etme metotlarını değerlendirmeyi ve doğruluk oranları geliştirilmiş yeni bir metot önermeyi hedeflemektedir. Tahminler bir makine öğrenimi algoritması olan Destek Vektör Makineleri (DVM) ile yapılmaktadır. DVM, öğrenim etkileşim veri kümelerine göre kalıplar oluşturur ve etkileşimleri bu kalıplar ile tahmin eder. Bu çalışmada, pozitif öğrenim veri kümeleri deneysel PPE'leri, negatif öğrenim veri kümeleri hesaplanmış etkileşmeyen proteinleri içermektedir. Etkileşim bilgisini DVM'de betimlemek için, proteinlerin amino asit dizilim sıralarına göre n-gram frekansları hesaplanmıştır. DVM performansının, öğrenim veri kümelerindeki etkileşimlerden, farklı amino asit sınıflandırması tekniklerinden, n-gram frekanslarından ve ? değerlerinden fazlaca etkilendiği gösterilmiştir. Sekiz öğrenim veri kümesi için DVM kalıpları oluşturulmuştur ve DVM skorları ile detaylı karşılaştırmaları yapılmıştır. Bu skorlara göre, her veri kümesindeki etkileşimleri iyi tahmin eden birleştirilmiş öğrenim veri kümeleri oluşturulur. Daha sonra, en yüksek DVM skorunu elde etmeyi sağlayan en belirleyici nitelikler kümesi bulunur. Son olarak, en iyi DVM kalıpları, YUPE (Yapısal Uyumlu Protein Etkileşimleri) algoritması tarafından tahmin edilen PPE içindeki yanlış pozitiflerin elenmesi için kullanılır.
dc.description.abstract	Protein-protein interactions (PPI) are of crucial importance at all levels of biological processes. Experimentally identified PPI are deposited in several databases. These databases contain diverse information about PPI, but their coverage is low relative to the full range of processes in cells. Thus, reliable and accurate computational methods are needed to improve coverage. Many research groups have developed PPI prediction algorithms with varying accuracies based on different data and methods. However, developing a new PPI prediction method with high accuracy remains challenging. This study aims to assess existing sequence-based PPI prediction methods and to propose a new algorithm with improved accuracy. Predictions are made with Support Vector Machines (SVM), a machine learning algorithm. An SVM builds models from training sets and predicts interactions with those models. In this study, positive training sets contain experimentally determined PPI, and negative training sets contain computationally derived non-interacting proteins. To represent interaction data for the SVM, n-gram frequencies of proteins are calculated from their amino acid sequences. SVM performance is shown to be strongly affected by the interactions in the training datasets, the amino acid categorization technique, the n-gram frequencies, and the ? values used. SVM models are created for eight datasets, and those datasets are critically assessed via their SVM scores. Based on those scores, combined training datasets are created that yield accurate predictions of interactions in every dataset. Then, the feature set that leads to the highest SVM scores is identified. Finally, the best SVM models are used to eliminate false positives among putative protein interactions predicted by the PRISM (Protein Interactions by Structural Matching) algorithm.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Biyoloji	tr_TR
dc.subject	Biology	en_US
dc.title	Critical assessment of protein-protein interaction databases and features towards prediction of interactions
dc.title.alternative	Etkileşim tahmini için protein-protein etkileşim kümelerinin ve niteliklerin detaylı karşılaştırılması
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Elektrik ve Bilgisayar Mühendisliği Anabilim Dalı
dc.subject.ytm	Protein analysis
dc.subject.ytm	Protein expression
dc.identifier.yokid	340781
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	KOÇ ÜNİVERSİTESİ
dc.identifier.thesisid	246830
dc.description.pages	87
dc.publisher.discipline	Diğer
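
The abstract describes representing each protein by n-gram frequencies computed over categorized amino acids before passing the result to an SVM. A minimal sketch of that feature step follows. The 7-class grouping, the function names, the default n = 3, and the choice to concatenate the two proteins' vectors are all assumptions made for illustration; this record does not state which categorizations, n values, or pair encoding the thesis actually uses.

```python
from collections import Counter
from itertools import product

# Hypothetical 7-class grouping of the 20 standard amino acids.
# Illustrative only: the thesis compares several categorization
# techniques that are not listed in this record.
AA_CLASSES = {
    "A": "AGV",
    "B": "ILFP",
    "C": "YMTS",
    "D": "HNQW",
    "E": "RK",
    "F": "DE",
    "G": "C",
}
RESIDUE_TO_CLASS = {
    aa: label for label, members in AA_CLASSES.items() for aa in members
}


def categorize(sequence):
    """Map a protein sequence onto the reduced class alphabet.

    Residues outside the 20 standard amino acids (e.g. 'X') are skipped,
    so their neighbours become adjacent in the reduced string.
    """
    return "".join(
        RESIDUE_TO_CLASS[aa] for aa in sequence.upper() if aa in RESIDUE_TO_CLASS
    )


def ngram_frequencies(sequence, n=3):
    """Return a fixed-length vector of relative n-gram frequencies."""
    reduced = categorize(sequence)
    labels = sorted(AA_CLASSES)
    # Enumerate every possible n-gram so all proteins share one layout.
    keys = ["".join(p) for p in product(labels, repeat=n)]
    total = len(reduced) - n + 1
    if total <= 0:
        return [0.0] * len(keys)
    counts = Counter(reduced[i:i + n] for i in range(total))
    return [counts[k] / total for k in keys]


def pair_features(seq_a, seq_b, n=3):
    """Assumed pair encoding: concatenate the two proteins' vectors."""
    return ngram_frequencies(seq_a, n) + ngram_frequencies(seq_b, n)
```

With 7 classes and n = 3, each protein yields 343 features and each candidate pair 686. Vectors of this kind, labeled as interacting or non-interacting, could then be passed to any SVM implementation for training.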

Files in this item

Name: yokAcikBilim_340781.pdf
Size: 1.024Mb
Format: PDF
Description: File_340781

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess