Gen ifade verileri ile işlemsel kanser sınıflandırılması

İdil, Namik Bariş

dc.contributor.advisor	Gasılov, Nızamı
dc.contributor.author	İdil, Namik Bariş
dc.date.accessioned	2020-12-04T08:42:35Z
dc.date.available	2020-12-04T08:42:35Z
dc.date.submitted	2009
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/67082
dc.description.abstract	Recent advances in computer technology, especially the increase in processor power, have made it possible to use nonlinear models, which reflect physical and real-world phenomena better but require more memory and time, in place of the simple linear models that were feasible earlier. This study covers A. Statnikov's work on multicategory cancer classification using microarray gene expression data, together with optimizations proposed on the basis of that work's results [1]. The aim is to speed up training and testing by applying linear and nonlinear reduction methods to gene expression data obtained by microarray analysis before the data are analyzed with a support vector machine. The reduction methods were implemented using a set of algorithms as well as new, problem-specific interpretations of those algorithms, and were then tested with respect to complexity, resource usage and reduction performance. The goal is thus to increase processing speed by reducing the number of features in the data sets, provided that the performance and success rates of training and testing stay above an acceptable level. The tests showed that the programs built on the Independent Component Analysis (ICA), Kernel Principal Component Analysis (KPCA) and Projection Pursuit Analysis (PPA) reduction algorithms, when run on the gene expression data set, either hung because of the extremely high number of features or terminated abnormally because of insufficient memory.
With feature reduction by the other algorithms, namely Principal Component Analysis (PCA), Nonlinear Principal Component Analysis (NLPCA), Self-Organizing Maps (SOM), Linear Discriminant Analysis (LDA) and Correlation Analysis (CA), the training times of the support vector machine decreased by varying amounts. On this basis, it was concluded that a large share of the features in the data set used in the study have very little effect on the training and test performance of the support vector machine and carry no discriminative properties, or that some features can combine to form a subgroup capable of representing the whole set. Removing ineffective features or feature subgroups from the original data set with reduction algorithms would therefore be beneficial in terms of cost and time.
dc.description.abstract	Recent improvements in computer technology, especially the significant increase in the processing power of central processing units, have enabled the use of non-linear models, which represent physical and abstract problems better but require more memory and time, in place of simple linear models. This study focuses on A. Statnikov's article on multicategory cancer classification using microarray gene expression data and on optimization suggestions based on it [1]. Before support vector machines are trained on gene expression data gathered by microarray analysis, linear and non-linear reduction methods are applied with the aim of accelerating training and testing. The reduction methods are implemented using a number of algorithms as well as new interpretations of these algorithms, and are then tested with respect to their complexity, resource allocation and reduction performance. The aim is thus to reduce the number of features in the data sets, and so increase the overall speed of the process, while keeping the performance and success ratios of training and testing above an acceptable threshold. The test results show that the Independent Component Analysis (ICA), Kernel Principal Component Analysis (KPCA) and Projection Pursuit Analysis (PPA) reduction algorithms failed to produce any results on the data set because of its excessive number of features: the programs either locked up or terminated. With the other algorithms, namely Principal Component Analysis (PCA), Non-Linear Principal Component Analysis (NLPCA), Self-Organizing Maps (SOM), Linear Discriminant Analysis (LDA) and Correlation Analysis (CA), the training and testing times of the support vector machine were reduced by varying amounts.
Taking this into consideration, most of the features of the data set used in this study have no discriminative property and therefore have little effect on the training and testing of the support vector machine. On the other hand, some features may become highly effective when combined into feature subgroups. Eliminating low-effect features and revealing highly effective feature subgroups through feature selection and feature reduction can therefore yield a significant improvement in both cost and time.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Biyoistatistik	tr_TR
dc.subject	Biostatistics	en_US
dc.title	Gen ifade verileri ile işlemsel kanser sınıflandırılması
dc.title.alternative	Operational cancer classification using gene expression data
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.subject.ytm	Bioinformatics
dc.identifier.yokid	340654
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	BAŞKENT ÜNİVERSİTESİ
dc.identifier.thesisid	237762
dc.description.pages	107
dc.publisher.discipline	Other

Files in this item

Name:: yokAcikBilim_340654.pdf
Size:: 1.883Mb
Format:: PDF
Description:: File_340654

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess