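Logistic regression and neural network approaches in prognosis and diagnosis: Comparative applications on epidemiological data sets (original title: Prognozda ve tanıda lojistik regresyon ve neural network yaklaşımı: Epidemiyolojik veri setlerinde karşılaştırmalı uygulamalar)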

Ünal, İlker

Date

2004

Abstract

ÖZET (Turkish abstract, translated)

LOGISTIC REGRESSION AND NEURAL NETWORK APPROACHES IN PROGNOSIS AND DIAGNOSIS: COMPARATIVE APPLICATIONS ON EPIDEMIOLOGICAL DATA SETS

Diagnosing disease, or identifying the factors that affect its prognosis, has been the aim of many medical studies. A wide variety of statistical methods have been applied in this field. When the response variable is binary, logistic regression models and recursive partitioning methods are among the standard methods; when the response has more than two categories, multiple logistic regression models are. Recently the Neural Networks approach has been added to these methods. It is generally used for prediction, classification, or control problems. In this thesis the two methods were compared in terms of performance on samples of different sizes and characteristics, in an attempt to decide which method should be used in which situation. The comparisons cover the results of Logistic Regression, Neural Networks, and Neural Networks with Genetic Algorithm, a different application of Neural Networks. So that performance could be measured properly, the samples constructed with different characteristics were split into training and test subsamples. There were 8 samples in total: large, medium and small by size, and 0.1, 0.2, 0.3, 0.4 and 0.5 by proportion of students with asthma. The methods were compared using the areas under the Receiver Operating Characteristic (ROC) curves and the sensitivity-specificity values.

By sample size, the highest area values belonged to Logistic Regression in the training and test samples of the large data set (0.848 and 0.825), to Neural Networks in the training and test samples of the medium-sized data set (0.882 and 0.735), and, for the small data set, to Neural Networks with Genetic Algorithm in the training sample (1.000) and to Logistic Regression in the test sample (0.902). However, because the confidence intervals of these values overlap, no difference was found between the methods. By asthma proportion, the Genetic Algorithm had the highest value in all training samples. In the 3 samples with an asthma proportion of 0.3 or above, the Genetic Algorithm training results were better than the Logistic Regression results when the confidence intervals were examined. In all test samples Logistic Regression had higher area values, but the confidence intervals showed no difference between the methods. This thesis showed that the model best suited to the data at hand cannot be chosen without looking at the results of all analyses that could be used for the same purpose.

Keywords: Sensitivity - Specificity, Genetic Algorithm, Logistic Regression, Neural Networks, ROC Curves.
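
The sensitivity and specificity values used alongside the ROC areas come from a classification contingency table at a chosen cutoff. The following is a minimal Python sketch of that calculation. The labels, scores, the 0.5 cutoff, and the function names are all hypothetical illustrations, not data or code from the thesis.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Count TP, FP, TN, FN when a score >= threshold is called positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).

    Assumes at least one positive and one negative case are present.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical test set: 1 = asthma, 0 = no asthma (illustrative values only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.5, 0.3, 0.2, 0.1]

tp, fp, tn, fn = confusion_counts(y_true, scores, threshold=0.5)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
print(tp, fp, tn, fn)  # 3 2 3 1
print(sens, spec)      # 0.75 0.6
```

Moving the cutoff trades sensitivity against specificity, which is why the ROC curve, traced over all cutoffs, is the primary basis for comparison.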

ABSTRACT

COMPARATIVE STUDIES OF LOGISTIC REGRESSION AND NEURAL NETWORK IN THE DIAGNOSIS AND THE PROGNOSIS: APPLICATIONS WITH EPIDEMIOLOGIC DATA

Many medical studies aim at the diagnosis or prognosis of disease. These studies use several statistical approaches: logistic regression models for a binary response, recursive partitioning, and multiple logistic regression models when the response has more than two categories. Neural Networks have recently been added to these approaches; they are used mainly for prediction (forecasting), classification, or control. In this thesis, two methodologies, Logistic Regression and Neural Networks, were compared on randomly chosen samples that differ in size and in properties. The comparisons covered ordinary Logistic Regression, Neural Networks, and Neural Networks with Genetic Algorithm. Each sample was divided into a training set and a test set. The samples differed in size, large (n = 3300), medium (n = 1000) and small (n = 100), and in the proportion of students with asthma (p = 0.1, 0.2, 0.3, 0.4 or 0.5). The comparisons among the three methods were based primarily on the receiver operating characteristic (ROC) curves, together with a number of scalar performance measures derived from the classification contingency tables. Across the samples, no method outperformed the others when the confidence intervals of the areas under the ROC curves were used for comparison. For the large sample, Logistic Regression gave the largest ROC curve areas, 0.848 in the training set and 0.825 in the test set. For the medium sample, Neural Networks had the largest areas, 0.882 and 0.735 in the training and test sets respectively.

For the small sample, Neural Networks with Genetic Algorithm reached an area of 1.000 in the training set, but in the test set Logistic Regression had the largest area, 0.902. For the samples with different asthma proportions, Neural Networks with Genetic Algorithm had the largest area in all training sets. Moreover, in the samples where the asthma prevalence was greater than or equal to 0.3, Neural Networks with Genetic Algorithm was superior to the others according to the confidence intervals of the areas. On the other hand, although Logistic Regression gave better results in the test sets, the three methods were similar. The thesis concludes that the choice of the model that best fits the data cannot be finalized until all analyses with the possible competing methods have been performed.

Keywords: Genetic Algorithm, Logistic Regression, Neural Networks, ROC Curves, Sensitivity and Specificity
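
The comparison described in the abstract rests on the area under the ROC curve and on whether the confidence intervals of two areas overlap. The abstract does not say how the intervals were computed, so the Python sketch below assumes the Hanley-McNeil standard error as one common choice. The scores, the two "models", and all function names are hypothetical illustrations, not thesis data.

```python
import math


def roc_auc(scores_pos, scores_neg):
    """AUC as the probability that a positive case outscores a negative one
    (Mann-Whitney form); ties count as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for an AUC using the Hanley-McNeil standard error
    (an assumed method; the thesis may have used a different one)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def intervals_overlap(a, b):
    """True when two closed intervals (low, high) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical test-set scores from two models on the same 4 cases with
# asthma (pos) and 5 cases without (neg).
pos_a, neg_a = [0.9, 0.8, 0.7, 0.4], [0.6, 0.5, 0.3, 0.2, 0.1]
pos_b, neg_b = [0.9, 0.6, 0.5, 0.4], [0.7, 0.5, 0.3, 0.2, 0.1]

auc_a = roc_auc(pos_a, neg_a)
auc_b = roc_auc(pos_b, neg_b)
ci_a = hanley_mcneil_ci(auc_a, len(pos_a), len(neg_a))
ci_b = hanley_mcneil_ci(auc_b, len(pos_b), len(neg_b))

print(auc_a, ci_a)
print(auc_b, ci_b)
print("intervals overlap:", intervals_overlap(ci_a, ci_b))
```

With samples this small the intervals are wide and overlap even though the point estimates differ, which matches the pattern the abstract reports: a higher area alone does not establish that one method is better. Checking for overlap is a conservative, informal comparison; a paired test on the two areas would be a stricter alternative.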

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/127569

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess