Discovering discriminative and class-specific sequence and structural motifs in proteins

Meydan, Cem

dc.contributor.advisor	Sezerman, Osman Uğur
dc.contributor.author	Meydan, Cem
dc.date.accessioned	2020-12-10T07:34:54Z
dc.date.available	2020-12-10T07:34:54Z
dc.date.submitted	2013
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/216878
dc.description.abstract	The discovery of biological motifs is one of the important problems in bioinformatics. Such motifs can be used for purposes such as sequence classification, data mining, and rational protein engineering. This thesis aims to build a better foundation for the research and development of machine learning methods and for finding discriminative motifs from the sequence and structural features of proteins. This thesis proposes machine learning building blocks that are applicable to a variety of biological problems. Ideally, the input to the learning algorithms should consist only of biological data samples and the class labels to which those samples belong. The corresponding output should be the factors and motifs that separate these classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is to represent the biological samples with features that are significant for the research. Because macromolecules are complex three-dimensional structures, this complicated representation must be abstracted into numerical and symbolic representations better suited to machine learning and motif discovery. The second step is the development of motif discovery and machine learning algorithms suited to these representations. An algorithm should be able to discover class-specific and discriminative motifs using the descriptive representations extracted in the first step. This work develops a number of new protein representation methods for use in various machine learning methods, and two separate motif discovery methods that work with these representations (temporal motif mining and deep-learning-based motif discovery). These representations and learning algorithms have been applied to a variety of computational problems encountered in the life sciences.
dc.description.abstract	Finding recurring motifs is an important problem in bioinformatics. Such motifs can be used for a range of problems, including sequence classification, label prediction, knowledge discovery, and the biological engineering of proteins fit for a specific purpose. Our motivation is to create a better foundation for the research and development of novel motif mining and machine learning methods that can extract class-specific and discriminative motifs using both sequence and structural features. We propose the building blocks of a general machine learning framework that acts on biological input. This thesis presents a combination of elements intended to be applicable to a variety of biological problems. Ideally, the learner should require as input only a number of biological data instances, classified into different classes as defined by the researchers. The output should be the factors and motifs that discriminate between those classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is the representation of the biological input with features that contain the significant information the researcher is looking for. Due to the complexity of macromolecules, abstract representations are required to convert the real-world representation into quantifiable descriptors that are suitable for motif mining and machine learning. The second step of the proposed workflow is the motif mining and knowledge discovery step. Using these informative representations, an algorithm should be able to find discriminative, class-specific motifs that are over-represented in one class and under-represented in the other. This thesis presents novel procedures for representing proteins for use in a variety of machine learning algorithms, and two separate motif mining algorithms, one based on temporal motif mining and the other on deep learning, that can work with the given biological data. The descriptors and the learners are applied to a wide range of computational problems encountered in the life sciences.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Biyoistatistik	tr_TR
dc.subject	Biostatistics	en_US
dc.subject	Biyomühendislik	tr_TR
dc.subject	Bioengineering	en_US
dc.title	Discovering discriminative and class-specific sequence and structural motifs in proteins
dc.title.alternative	Proteinler içinde sınıflandırıcı dizisel ve yapısal motiflerin keşfedilmesi
dc.type	doctoralThesis
dc.date.updated	2018-08-06
dc.contributor.department	Other
dc.subject.ytm	Constraint based sequential pattern mining
dc.subject.ytm	Multimodal representation
dc.subject.ytm	Information display
dc.subject.ytm	Protein motifs
dc.subject.ytm	Data mining
dc.identifier.yokid	10011180
dc.publisher.institute	Mühendislik ve Fen Bilimleri Enstitüsü
dc.publisher.university	SABANCI ÜNİVERSİTESİ
dc.identifier.thesisid	389489
dc.description.pages	244
dc.publisher.discipline	Other

Files in this item

Name:: yokAcikBilim_10011180.pdf
Size:: 5.895Mb
Format:: PDF
Description:: File_10011180

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess