Comparison of the performance of estimation methods used in data mining

Gültürk, Esra

Date

2016

Author

Gültürk, Esra

Abstract

This study aimed to compare the estimation performance of Support Vector Regression, Random Forest and Regression Tree, a comparison noted to be lacking in the literature. The dependent variable was taken both as a categorical and as a continuous variable, so that the performance of both classification and regression methods could be examined. For this purpose, data on all patients hospitalized and treated with a diagnosis of Crimean-Congo haemorrhagic fever between 2009 and 2011 in the Infectious Diseases and the Child Health and Diseases wards of the Cumhuriyet University Faculty of Medicine were obtained from the ward records. A total of 6125 data entries were made for the 245 patients treated in these three years. Three groups of patient data were used: adults, children and all patients. Regression models were compared using mean squared error and explanatory percentage. Classification models were compared using sensitivity, precision, accuracy and the F-measure. On the real data set, support vector regression had the highest explanatory percentage and the lowest mean squared error in all three groups. In the simulation study, each scenario was repeated 1000 times and the regression methods in question were applied in each repetition. Across the scenario structures, support vector regression was found to be the best regression method.
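
The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code. The function names are hypothetical, the sample values are invented for illustration only, and the explanatory percentage is assumed here to mean 100 * (1 - MSE / variance of the observed values), which the abstract does not state.

```python
from statistics import fmean, pvariance


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def explained_percentage(y_true, y_pred):
    """Assumed definition: 100 * (1 - MSE / population variance of y_true)."""
    return 100.0 * (1.0 - mean_squared_error(y_true, y_pred) / pvariance(y_true))


def classification_measures(y_true, y_pred, positive=1):
    """Sensitivity, precision, accuracy and F-measure for a binary outcome."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard against empty denominators so the sketch never divides by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }


# Invented illustrative values, not data from the thesis.
reg_true, reg_pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0]
cls_true, cls_pred = [1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]

mse = mean_squared_error(reg_true, reg_pred)
explained = explained_percentage(reg_true, reg_pred)
measures = classification_measures(cls_true, cls_pred)
print(mse, explained, measures)
```

Under this assumed definition, a model with lower mean squared error on the same data always has a higher explanatory percentage, which is consistent with the abstract reporting both measures as favouring the same method.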

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/608913

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess