Diagnoses of coronary heart disease (CHD) using data mining techniques based on classification

Fayez, Mustafa Adil Fayez

dc.contributor.advisor	Ata, Oğuz
dc.contributor.author	Fayez, Mustafa Adil Fayez
dc.date.accessioned	2021-05-06T12:26:05Z
dc.date.available	2021-05-06T12:26:05Z
dc.date.submitted	2018
dc.date.issued	2018-11-23
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/588569
dc.description.abstract	... has attracted a great deal of attention worldwide. Today, data mining is used in many fields, particularly the commercial and medical fields. The medical field in particular, because it produces data continuously and because different feature extraction methods are available, offers solutions concerning the spread of the disease. Using data mining classification techniques and a programming language, we designed a system to assist the diagnosis of CHD while better reducing the cost and time the process requires. These algorithms achieved good results and high accuracy. We applied our study to several CHD datasets. On the Hungarian two-class dataset we obtained the best accuracy, 99%, using the Random Forest (RF) algorithm. On the Cleveland dataset we obtained 94% accuracy with the same algorithm, whereas another study we compared against reported 58% accuracy on the same dataset with the SVM algorithm. Likewise, on the Hungarian five-class dataset the previous study we compared against obtained 67% accuracy using the SVM algorithm, while we obtained 99% with the Random Forest (RF) algorithm. In addition, we obtained 88% accuracy with the AdaBoost algorithm on the Hungarian dataset and 87% with the Logistic Regression algorithm on the heart.csv dataset. We also obtained 95% accuracy on the Switzerland dataset using the Random Forest (RF) algorithm and 91% on the Long-Beach dataset with the same algorithm. Finally, we obtained 78% accuracy on the Switzerland dataset with the AdaBoost and Logistic Regression algorithms, 80% with AdaBoost and 76% with Logistic Regression on the Long-Beach dataset, and 87% with Logistic Regression and 86% with AdaBoost on the heart.csv dataset. In this study we used a common preprocessing and train-test split across the different CHD datasets.
This procedure differs significantly from the previous study and contributed to our obtaining better results than those reported for the same CHD datasets. Keywords: CHD, Classification techniques, Python, Data mining.
dc.description.abstract	Coronary heart disease (CHD) has attracted great attention around the world because it leads to death. These days, data mining is used in many fields, including commercial and medical fields. Medical fields continuously produce large amounts of data, and finding different ways to extract information from that data may be important in predicting the spread of this disease. We have designed a system that helps diagnose CHD while reducing the cost and time required for the process, using a programming language with data mining classification techniques. These algorithms produced good results and high accuracy. We applied our study to various CHD datasets. We obtained the best accuracy, 99%, using the Random Forest (RF) algorithm on the Hungarian two-class dataset. With Cleveland, we obtained 94% accuracy using the same algorithm, while the best accuracy on the same dataset in the previous study was 58% using the SVM algorithm. Moreover, with the Hungarian five-class dataset, we obtained 99% as the best accuracy using the Random Forest classifier, compared with close to 67% achieved on this dataset in previous work using the SVM algorithm. In addition, we obtained 88% accuracy using the AdaBoost classifier with the Hungarian dataset and 87% accuracy using the Logistic Regression classifier with the heart.csv dataset. With the Switzerland dataset, our best accuracy was 95% using Random Forest, and with the Long-Beach dataset it was 91% using the same classifier. Finally, with the Switzerland dataset, we achieved 78% accuracy using the AdaBoost and Logistic Regression classifiers. With Long-Beach, we obtained 80% using the AdaBoost classifier and 76% using the Logistic Regression classifier.
Also, with the heart.csv dataset, we achieved 87% accuracy using the Logistic Regression classifier and 86% using the AdaBoost classifier. We used a train-test split and preprocessing for the CHD datasets in this study and handled the missing attribute values with a less complicated system. This process differs significantly from the previous study and contributed to the results and accuracy obtained on the same CHD datasets.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Diagnoses of coronary heart disease (CHD) using data mining techniques based on classification
dc.title.alternative	Sınıflandırma temelli veri madenciliği teknikleri kullanılarak koroner kalp hastalığı (KKH) tanısı
dc.type	masterThesis
dc.date.updated	2018-11-23
dc.contributor.department	Bilişim Teknolojileri Ana Bilim Dalı
dc.identifier.yokid	10208408
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ALTINBAŞ ÜNİVERSİTESİ
dc.identifier.thesisid	520268
dc.description.pages	82
dc.publisher.discipline	Bilişim Teknolojileri Bilim Dalı
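
The abstract mentions handling missing values, a train-test split, and accuracy as the evaluation measure. A minimal Python sketch of that evaluation flow follows, using only the standard library. The thesis's own preprocessing and code are not reproduced in this record, so everything here is an assumption: the "?" missing-value marker, median imputation, the 25% test share, the invented toy values, and the 1-nearest-neighbour stand-in, which replaces the Random Forest, AdaBoost and Logistic Regression classifiers named in the abstract.

```python
import random
from statistics import median


def impute_missing(rows, missing="?"):
    """Replace missing markers in each column with that column's median.

    Assumption: missing values are marked with "?" and all features are numeric.
    """
    n_cols = len(rows[0])
    medians = []
    for c in range(n_cols):
        present = [float(r[c]) for r in rows if r[c] != missing]
        medians.append(median(present))
    return [
        [medians[c] if r[c] == missing else float(r[c]) for c in range(n_cols)]
        for r in rows
    ]


def train_test_split(features, labels, test_ratio=0.25, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test share."""
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_ratio)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    x_train = [features[i] for i in train_idx]
    x_test = [features[i] for i in test_idx]
    y_train = [labels[i] for i in train_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, x_test, y_train, y_test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def predict_1nn(train_x, train_y, x):
    """Hypothetical stand-in classifier: label of the nearest training row."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]


# Invented toy rows (two numeric features, one missing value each in two rows);
# these are NOT taken from any of the datasets named in the abstract.
features_raw = [
    ["63", "233"], ["41", "?"], ["67", "286"], ["37", "250"],
    ["?", "204"], ["56", "236"], ["62", "268"], ["44", "263"],
]
labels = [1, 0, 1, 0, 0, 1, 1, 0]

features = impute_missing(features_raw)
x_train, x_test, y_train, y_test = train_test_split(features, labels)
predictions = [predict_1nn(x_train, y_train, x) for x in x_test]
score = accuracy(y_test, predictions)
print(f"held-out accuracy on toy data: {score:.2f}")
```

Fitting the imputation before the split, as done here for brevity, lets test rows influence the medians; a stricter variant would compute the medians on the training rows only.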

Files in this item

Name: yokAcikBilim_10208408.pdf
Size: 1.389Mb
Format: PDF
Description: File_10208408

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess