Comparison of penalized logistic regression methods in rare events (Nadir olaylarda cezalandırılmış lojistik regresyon yöntemlerinin karşılaştırılması)

Nazman, Ezgi

dc.contributor.advisor	Erbaş, Semra
dc.contributor.author	Nazman, Ezgi
dc.date.accessioned	2020-12-10T12:45:09Z
dc.date.available	2020-12-10T12:45:09Z
dc.date.submitted	2019
dc.date.issued	2020-03-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/293681
dc.description.abstract	Binary logistic regression (LR) is a multivariate statistical method widely used when the response variable has two possible outcomes. When the sample size is small and the event of interest is rare, maximum likelihood estimates for LR cannot be obtained exactly. Firth (1993) proposed Firth logistic regression (FLR), a penalized method that removes both this estimation problem and the first-order asymptotic bias. Later, because FLR overestimates the predicted probabilities, Puhr et al. (2017) proposed Firth logistic regression with intercept correction (FLIC). Separately, for the case where the event of interest is rare and the explanatory variables are multicollinear, Shen and Gao (2008) proposed double penalized logistic regression (DPLR). However, this method still overestimates the predicted probabilities. In this study, building on FLIC and DPLR, double penalized logistic regression with intercept correction (MDPLR) is proposed as a new approach. MDPLR is compared with LR, Ridge logistic regression (RLR), FLR, DPLR, weakened Firth logistic regression (WFLR), FLIC, and Firth logistic regression with added covariate (FLAC) in terms of mean estimated parameter bias, mean predicted probability bias, standard errors, and mean RMSE. A detailed Monte Carlo simulation study was carried out for models with different numbers of explanatory variables, covering different sample sizes and rare event rates, with and without multicollinearity. The data generation approach, which uses an inverse conditional distribution based on observational data, is applied to penalized LR methods for the first time in the literature. The simulation results were also evaluated on a real data set. Based on the results, FLAC, DPLR, and MDPLR are recommended for statistical inference on the parameters, and FLIC, FLAC, RLR, and MDPLR are recommended for studies of predicted probabilities.
dc.description.abstract	The binary logistic regression (LR) method is a widely used multivariate statistical method when the response variable has two possible outcomes. Maximum likelihood estimates cannot be obtained exactly for the LR method when the sample size is small and the event of interest is rare. Firth (1993) suggested Firth's logistic regression (FLR) as a method that eliminates both this estimation problem and the first-order term of the asymptotic bias. Then, Puhr et al. (2017) suggested Firth's logistic regression with intercept correction (FLIC) because the FLR method causes overestimation in predicted probability. On the other hand, Shen and Gao (2008) suggested the double penalized logistic regression (DPLR) method for the case where rare events and multicollinearity occur simultaneously. However, this method still causes overestimation in predicted probability. In this study, the DPLR with intercept correction (MDPLR) method was suggested as a new approach based on the FLIC and DPLR methods. The MDPLR method was compared with LR, Ridge logistic regression (RLR), FLR, DPLR, weakened FLR (WFLR), FLIC, and Firth's logistic regression with added covariate (FLAC) in terms of parameter estimation bias, average predicted probability bias, standard errors, and average root mean squared error (RMSE). A detailed Monte Carlo simulation study was conducted with different numbers of explanatory variables in the model, for multicollinearity and non-multicollinearity cases, in addition to different sample sizes and rare event rates. A data generation approach using an inverse conditional distribution based on observational data was used for the first time in the literature for penalized LR methods. In addition, the simulation results were evaluated with a real data set. According to the results, it is recommended to use the FLAC, DPLR, and MDPLR methods for statistical inference on the parameters and to use the FLIC, FLAC, RLR, and MDPLR methods for studies on predicted probability.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	İstatistik	tr_TR
dc.subject	Statistics	en_US
dc.title	Nadir olaylarda cezalandırılmış lojistik regresyon yöntemlerinin karşılaştırılması
dc.title.alternative	Comparison of penalized logistic regression methods in rare events
dc.type	doctoralThesis
dc.date.updated	2020-03-06
dc.contributor.department	İstatistik Anabilim Dalı
dc.identifier.yokid	10306578
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	GAZİ ÜNİVERSİTESİ
dc.identifier.thesisid	614134
dc.description.pages	131
dc.publisher.discipline	Diğer
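
The abstract describes FLR and FLIC only in words. As a rough illustration, the sketch below fits Firth's penalized logistic regression with the standard modified score, sum_i (y_i - pi_i + h_i (0.5 - pi_i)) x_i = 0, where h_i are the diagonal elements of the hat matrix. It then applies a FLIC-style step that keeps the Firth slopes and re-estimates only the intercept by maximum likelihood, so that the mean predicted probability matches the observed event rate. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `firth_logistic` and `flic_intercept`, the step cap, and the toy data are hypothetical, and DPLR and MDPLR are not shown because the abstract does not give their penalty terms.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8, max_step=5.0):
    """Firth-penalized logistic regression via Newton steps on the modified score.

    X must already contain a leading column of ones for the intercept.
    (Hypothetical helper written for illustration only.)
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = pi * (1.0 - pi)
        # Inverse Fisher information (X^T W X)^{-1}
        info_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        # Hat-matrix diagonals: h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        # Cap the step size to keep the iteration stable (an assumption of this sketch)
        largest = np.max(np.abs(step))
        if largest > max_step:
            step = step * (max_step / largest)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def flic_intercept(X, y, beta, max_iter=100, tol=1e-10):
    """FLIC-style correction: keep the Firth slopes, refit only the intercept by ML.

    The slope part of the linear predictor is treated as a fixed offset.
    (Hypothetical helper written for illustration only.)
    """
    offset = X[:, 1:] @ beta[1:]
    b0 = beta[0]
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-(b0 + offset)))
        # One-dimensional Newton step on the intercept score sum(y - pi)
        delta = np.sum(y - pi) / np.sum(pi * (1.0 - pi))
        b0 += delta
        if abs(delta) < tol:
            break
    corrected = beta.copy()
    corrected[0] = b0
    return corrected


# Toy data with a single event and complete separation, where plain ML estimates diverge
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y_toy = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
X_toy = np.column_stack([np.ones_like(x), x])

beta_firth = firth_logistic(X_toy, y_toy)
beta_flic = flic_intercept(X_toy, y_toy, beta_firth)
print("Firth coefficients:", beta_firth)
print("FLIC coefficients: ", beta_flic)
```

The intercept-only refit is what ties the mean predicted probability to the observed proportion of events, because the maximum likelihood score equation for the intercept is sum_i (y_i - pi_i) = 0.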

Files in this item

Name: yokAcikBilim_10306578.pdf
Size: 8.376Mb
Format: PDF
Description: File_10306578

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess