Veri madenciliğinde farklı karar ağaçları ve k-en yakın komşuluk yöntemlerinin incelenmesi: kadın hastalıkları ve doğum verisinde bir uygulama

Elasan, Sadi

dc.contributor.advisor	Keskin, Sıddık
dc.contributor.author	Elasan, Sadi
dc.date.accessioned	2020-12-10T11:18:09Z
dc.date.available	2020-12-10T11:18:09Z
dc.date.submitted	2019
dc.date.issued	2020-01-30
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/258454
dc.description.abstract	Veri madenciliğinde, sınıflandırma amacıyla kullanılan algoritmalar genel olarak; `denetimsiz (unsupervised)` ve `denetimli (supervised)` olmak üzere iki başlık altında incelenebilir. Denetimli veri madenciliğinde `karar ağaçları (decision trees)` ve `k-en yakın komşu (k-nearest neighbor KNN)` algoritmaları; parametrik olmayan yöntemler arasında olup, tahmin edici özelliğe sahiptir. Sınıflandırma amacıyla uygulanan bu algoritmalarla, çalışmadaki cevap değişkeni (bebeklerin doğum ağırlığı) üzerine etkili olan açıklayıcı değişkenler belirlenmiştir. Karar ağaçlarından; `CART, CHAID, Ayrıntılı CHAID, QUEST, Rastgele Orman ve C4.5` algoritmaları kullanılmıştır. K-en yakın komşu algoritmasında; `Öklid` ve `Manhattan (City block)` uzaklık ölçüleri kullanılarak uygulama yapılmıştır. Sınıflandırma performansları göz önüne alınarak, en iyi tahmin değerini veren algoritmalar belirlenmeye çalışılmıştır. Bu sonuçlara göre; Duyarlık (Sensitivity) ölçütü bakımından en yüksek tahmin oranı %88.4 ile `CART` algoritmasında gözlenmiştir. Özgüllük (Specificity) ölçütü bakımından en yüksek tahmin oranı %98.2 ile `Rastgele Orman` algoritmasında görülmüştür. Genel doğruluk ölçütü bakımından ise en yüksek tahmin oranı %94.5 ile `C4.5` algoritmasında gözlenmiştir. Risk (hata) tahmin ölçütü bakımından en düşük algoritma, %5.6 ile `C4.5` algoritması olmuştur. Genel olarak sonuçlar incelendiğinde; tüm algoritmaların `iyi sınıflandırma, yüksek tahmin ve düşük hata oranı` ile çalıştığı söylenebilir. Ayrıca bu çalışma, yeni doğacak bebeklerin doğum ağırlığının, düşük doğum ağırlığında olup olmayacağına erken karar verme ve böylece koruyucu tedbirlerin alınması açısından araştırmacılara katkı sağlayabilir. Anahtar kelimeler: Çapraz Geçerlik, Denetimli Yöntemler, Öklid Uzaklığı, Risk Tahmini, Sınıflama
dc.description.abstract	In data mining, the algorithms used for classification can generally be grouped under two headings: `unsupervised` and `supervised`. In supervised data mining, `decision trees` and `k-nearest neighbor (KNN)` algorithms are nonparametric methods with predictive capability. Using these algorithms for classification, the explanatory variables that affect the response variable of the study (birth weight of babies) were identified. Among decision trees, the `CART, CHAID, Exhaustive CHAID, QUEST, Random Forest and C4.5` algorithms were used. The k-nearest neighbor algorithm was applied with the `Euclidean` and `Manhattan (City block)` distance measures. The algorithms giving the best predictions were then sought on the basis of classification performance. According to the results, the highest sensitivity, 88.4%, was observed with the `CART` algorithm. The highest specificity, 98.2%, was observed with the `Random Forest` algorithm. The highest overall accuracy, 94.5%, was observed with the `C4.5` algorithm. The lowest risk (error) estimate, 5.6%, was also obtained with the `C4.5` algorithm. Overall, it can be said that all algorithms performed with `good classification, high prediction and low error rate`. In addition, this study may help researchers decide early whether a newborn's birth weight will be low, so that preventive measures can be taken. Keywords: Cross Validation, Supervised Methods, Euclidean Distance, Risk Estimation, Classification	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Biyoistatistik	tr_TR
dc.subject	Biostatistics	en_US
dc.title	Veri madenciliğinde farklı karar ağaçları ve k-en yakın komşuluk yöntemlerinin incelenmesi: kadın hastalıkları ve doğum verisinde bir uygulama
dc.title.alternative	Investigation of different decision trees and k-nearest neighbor methods in data mining: An application on gynecology and birth data
dc.type	doctoralThesis
dc.date.updated	2020-01-30
dc.contributor.department	Biyoistatistik Anabilim Dalı
dc.subject.ytm	Biostatistics
dc.subject.ytm	Data mining
dc.subject.ytm	Decision tree
dc.subject.ytm	Statistical methods
dc.subject.ytm	Cross validity
dc.subject.ytm	Risk forecasting
dc.subject.ytm	Classification
dc.identifier.yokid	10247956
dc.publisher.institute	Sağlık Bilimleri Enstitüsü
dc.publisher.university	VAN YÜZÜNCÜ YIL ÜNİVERSİTESİ
dc.identifier.thesisid	549742
dc.description.pages	97
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10247956.pdf
Size:: 3.171Mb
Format:: PDF
Description:: File_10247956

This item appears in the following Collection(s)

TEZLER
