Biyolojik veritabanlarında etkin benzerlik hesaplama

Söylev, Arda

dc.contributor.advisor	Abul, Osman
dc.contributor.author	Söylev, Arda
dc.date.accessioned	2021-05-08T11:22:14Z
dc.date.available	2021-05-08T11:22:14Z
dc.date.submitted	2013
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/683376
dc.description.abstract	The events that take place inside the cell, the smallest unit carrying the basic properties of a living organism, can be explained by examining biological networks. One of the techniques used for this examination is similarity-based analysis. In this setting, a query network is compared against a biological database made up of biological networks, and the networks whose similarity to the query network is above a given threshold are separated from those below it. Solving this problem requires finding the similarity of two networks. Because of the subgraph isomorphism problem, which is known in the literature to be NP-complete, the solution is computationally very expensive. Various methods have been developed in the literature to address this. One of them, the QNET method, was implemented in Java on the Hadoop framework as part of this thesis. For query networks with 7 nodes, the Hadoop implementation achieved a speedup of 11.42 on a cluster of 10 machines (18 cores). In addition, the `reference based indexing method` from the literature was studied and the ESBiD method was developed, with work focusing on the weaknesses of RINQ, a reference based indexing method. To this end, heuristics were used to reduce the number of networks in the uncertainty set (the twilight zone) by 29.85% with 93.22% accuracy, the selection method for reference networks was changed, and the `highest degree node` technique was developed to align the networks that accumulate in the uncertainty set faster. This technique reached 89.76% of the effectiveness of the exact alignment performed with QNET in 51.14% less time. Keywords: Biological network alignment, biological database alignment, graph alignment, reference based indexing, QNET, Hadoop, ESBiD, highest degree node
dc.description.abstract	The events occurring inside the cell, the smallest unit in living things, can be explained by observing biological networks. Similarity-based analysis is one of the techniques for biological network analysis. In this context, a database consisting of biological networks is aligned with a query network, and the networks whose similarity score is above a predefined cut-off value are separated from those below it. Solving this problem requires knowing the exact similarity score of two networks. Unfortunately, because of the NP-complete subgraph isomorphism problem, this is computationally too expensive. Several methods have been proposed in the literature to solve the graph alignment problem. QNET, one of these methods, is implemented in Java using the Hadoop framework in the scope of this thesis. For query networks with 7 nodes, the Hadoop implementation on a 10-machine cluster (18 cores) achieved a speedup of 11.42. A new method called ESBiD, which takes the `reference based indexing method` approach, has been developed. In particular, ESBiD focuses on the weaknesses of RINQ, another reference based indexing method. To this end, by using heuristics, the number of networks in the twilight zone has been reduced by 29.85% with 93.22% accuracy, the reference network selection strategy has been changed, and a new technique called `highest degree node` has been proposed in order to align the networks in the twilight zone faster. This technique reached 89.74% effectiveness in 51.14% runtime with respect to QNET's exact alignment method. Keywords: Biological network alignment, biological database alignment, graph alignment, reference based indexing, QNET, Hadoop, ESBiD, highest degree node	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Biyoloji	tr_TR
dc.subject	Biology	en_US
dc.title	Biyolojik veritabanlarında etkin benzerlik hesaplama
dc.title.alternative	Effective similarity calculation in biological databases
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Graphics
dc.subject.ytm	Bioinformatics
dc.subject.ytm	Parallel computing
dc.identifier.yokid	10013034
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	TOBB EKONOMİ VE TEKNOLOJİ ÜNİVERSİTESİ
dc.identifier.thesisid	346549
dc.description.pages	72
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10013034.pdf
Size:: 7.547Mb
Format:: PDF
Description:: File_10013034

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess