Determining the most effective method by comparing data obtained in the field of genetics using machine learning algorithms

Koç, Senem

dc.contributor.advisor	Tomak, Leman
dc.contributor.advisor	Karabulut, Erdem
dc.contributor.author	Koç, Senem
dc.date.accessioned	2021-05-08T10:58:02Z
dc.date.available	2021-05-08T10:58:02Z
dc.date.submitted	2019
dc.date.issued	2021-03-05
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/677528
dc.description.abstract	Aim: Machine Learning (ML) offers a range of methods for analysing complex data sets in health care. The aim of this study is to compare the performance of the ML algorithms used for classification with that of the Super Learner (SL) algorithm on genetic data sets with different characteristics. Material and Method: The classification algorithms used were K-Nearest Neighbours (KNN), Naive Bayes (NB), Support Vector Machines (SVM) and Random Forest (RF). Algorithm performance was assessed by the area under the curve (AUC). Resampling methods were applied to the imbalanced data sets. After pre-processing steps were applied to prepare the data for analysis, the data were split into training and test sets at different ratios. The study used an infertility data set (n = 587) and a periodontitis data set (n = 174), both containing genetic information, together with two simulated data sets of different sizes. Analyses were carried out in R. Results: For the infertility data set, the best AUC was 96% for SVM with an 80%-20% split; when the class imbalance was taken into account and the data were split 60%-40%, the AUC was 96% for the Synthetic Minority Over-sampling Technique (SMOTE) combined with KNN and 97% for SL. For the periodontitis data set split 60%-40%, RF reached an AUC of 85% and SL gave the same result. For the first simulated data set split 60%-40%, the AUC was 78% for NB and 81% for SL. For the second simulated data set, the AUC was 84% for NB and approximately 86% for SL across all splits. Conclusion: In this study, ML algorithms were evaluated on different data sets with different split ratios. The SL algorithm was found to perform as well as or better than the individual algorithms. Asymptotically, the SL algorithm performs at least as well as the best of its base learners.
dc.description.abstract	Aim: Machine Learning (ML) offers different methods for analysing complex data sets in the field of health. The aim of this study is to compare the performance of ML algorithms used for classification with that of the Super Learner (SL) algorithm on different genetic data sets. Material and Method: The K-Nearest Neighbours (KNN), Naive Bayes (NB), Support Vector Machines (SVM) and Random Forest (RF) classification algorithms were used in this study. The performance of the algorithms was assessed by the area under the curve (AUC). Resampling methods were used for imbalanced data. After pre-processing steps were applied, the data were split into training and test sets in different proportions. An infertility data set with a sample size of 587 and a periodontitis data set with a sample size of 174, both containing genetic information, and two simulated data sets of different sizes were analysed. R software was used for the analyses. Results: For the infertility data set, the best AUC was 96% for SVM when the data were split 80%-20%; when the class imbalance was taken into account and the data were split 60%-40%, the AUC was 96% for KNN with the Synthetic Minority Over-sampling Technique (SMOTE) and 97% for SL. When the periodontitis data set was split 60%-40%, the AUC was 85% for both RF and SL. For the first simulated data set split 60%-40%, the AUC was 78% for NB and 81% for SL. For the second simulated data set, the AUC was 84% for NB and 86% for SL across all splits. Conclusion: In this study, machine learning algorithms were assessed on different data sets with different split ratios. The SL algorithm was found to perform as well as or better than the individual algorithms. According to Super Learner theory, SL asymptotically performs at least as well as the best of the candidate learners.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Biyoistatistik	tr_TR
dc.subject	Biostatistics	en_US
dc.title	Determining the most effective method by comparing data obtained in the field of genetics using machine learning algorithms
dc.title.alternative	Assessing the most effective methods by comparing machine learning algorithms for data obtained in the field of genetics
dc.type	doctoralThesis
dc.date.updated	2021-03-05
dc.contributor.department	Biyoistatistik ve Tıp Bilişimi Ana Bilim Dalı (Department of Biostatistics and Medical Informatics)
dc.subject.ytm	Genetics
dc.subject.ytm	Machine learning
dc.subject.ytm	Machine learning methods
dc.subject.ytm	Health information systems
dc.subject.ytm	Data processing
dc.identifier.yokid	10319405
dc.publisher.institute	Sağlık Bilimleri Enstitüsü (Institute of Health Sciences)
dc.publisher.university	ONDOKUZ MAYIS ÜNİVERSİTESİ
dc.identifier.thesisid	618082
dc.description.pages	117
dc.publisher.discipline	Other
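
The abstract reports that resampling was used for the imbalanced data sets and that SMOTE combined with KNN reached an AUC of 96% on the infertility data. As a rough illustration of how SMOTE creates new minority-class observations, the Python sketch below places each synthetic point on the segment between a minority sample and one of its k nearest minority-class neighbours. It is a hypothetical, simplified variant and not the code used in the thesis (the analyses were run in R): the function name `smote`, the random choice of base samples, the default `k = 5`, the restriction to numeric features and the toy values are all assumptions.

```python
import math
import random
from typing import List, Sequence

Point = List[float]


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: List[Point], n_synthetic: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Create n_synthetic minority-class points by SMOTE-style interpolation.

    Each new point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    (Hypothetical helper; assumes all features are numeric.)
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)        # seeded for reproducibility
    k = min(k, len(minority) - 1)    # cannot use more neighbours than exist
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: _distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()           # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class with two numeric features (made-up values).
    minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
    for point in smote(minority, n_synthetic=3, k=2, seed=42):
        print([round(v, 3) for v in point])
```

Oversampling of this kind is normally applied to the training split only, so that the test split on which AUC is computed contains real observations only.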

Files in this item

Name: yokAcikBilim_10319405.pdf
Size: 6.539Mb
Format: PDF
Description: File_10319405

This item appears in the following Collection(s)

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess
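
The abstract compares individual classifiers with a Super Learner, which combines candidate learners (in this study KNN, NB, SVM and RF) by fitting a meta-learner to their cross-validated predictions, and scores everything by AUC. The Python sketch below illustrates only that combination step and the scoring, under stated assumptions: the out-of-fold predicted probabilities are taken as already computed; the meta-learner is a brute-force grid search for non-negative weights summing to 1 that minimise squared error, a simple stand-in for the constrained regression used by Super Learner software; and AUC is taken to be the ROC AUC, computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The names `auc`, `super_learner_weights` and `combine` and the toy numbers are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def super_learner_weights(
    y: Sequence[int], cv_preds: Dict[str, Sequence[float]], step: float = 0.05
) -> Dict[str, float]:
    """Grid-search convex weights (>= 0, summing to 1) that minimise the squared
    error of the weighted out-of-fold predictions (hypothetical meta-learner)."""
    names = list(cv_preds)
    if not names:
        raise ValueError("at least one candidate learner is required")
    n_steps = round(1 / step)
    best_w: List[float] = []
    best_loss = float("inf")
    # Enumerate integer compositions of n_steps; the last weight takes the remainder.
    for combo in product(range(n_steps + 1), repeat=len(names) - 1):
        if sum(combo) > n_steps:
            continue
        w = [c / n_steps for c in combo] + [(n_steps - sum(combo)) / n_steps]
        loss = sum(
            (target - sum(wi * cv_preds[m][i] for wi, m in zip(w, names))) ** 2
            for i, target in enumerate(y)
        )
        if loss < best_loss:
            best_w, best_loss = w, loss
    return dict(zip(names, best_w))


def combine(preds: Dict[str, Sequence[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of the candidates' predicted probabilities."""
    n = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][i] for m in weights) for i in range(n)]


if __name__ == "__main__":
    # Toy out-of-fold probabilities for three candidate learners (made-up numbers).
    y = [0, 0, 0, 1, 1, 1]
    cv_preds = {
        "knn": [0.2, 0.6, 0.3, 0.7, 0.4, 0.8],
        "nb": [0.1, 0.3, 0.5, 0.6, 0.9, 0.4],
        "rf": [0.3, 0.2, 0.4, 0.9, 0.6, 0.7],
    }
    weights = super_learner_weights(y, cv_preds)
    for name, p in cv_preds.items():
        print(name, "AUC =", round(auc(y, p), 3))
    print("weights:", weights)
    print("ensemble AUC =", round(auc(y, combine(cv_preds, weights)), 3))
```

In a full Super Learner, the weights estimated from the out-of-fold predictions are applied to the candidate learners refitted on the whole training set, and the resulting ensemble is then scored on the held-out test split. Because each single learner corresponds to a vertex of the weight grid, the fitted ensemble's cross-validated squared error can never exceed that of the best single candidate; this is a finite-sample counterpart of the property stated in the conclusion, although it says nothing directly about test-set AUC.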