Yeni bir çapraz geçerleme yöntemi ve biyoinformatik alanında öznitelik seçimi üzerinde uygulanması

Alptekin, Ahmet

dc.contributor.advisor	Kurşun, Olcay
dc.contributor.author	Alptekin, Ahmet
dc.date.accessioned	2020-12-07T13:26:34Z
dc.date.available	2020-12-07T13:26:34Z
dc.date.submitted	2012
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/152351
dc.description.abstract	Birini Dışarıda Bırak ve K-Kat çapraz geçerleme yöntemleri model ve öznitelik seçimi için en sık kullanılan yöntemlerdendir. Bu yöntemlerde veri seti rastgele birçok gruba ayrılır ve bu gruplar içinden bir grup, sırasıyla, test için ayrılırken diğer gruplar ise eğitim için kullanılırlar. Bu tezde, bu fikri geliştirerek "Birini Bilerek Iskala" (kısaca, BBI) isminde yeni bir çapraz geçerleme yaklaşımı sunulmaktadır. Bu yöntem, her parçayı (bir örnek veya bir grup) sırasıyla dışarıda bırakmak ve test sırasında doğru bilinip bilinemediğini ölçmek yerine; onu ıskalayıp, diğer bir deyişle, yanlış sınıf bilgisi ile eğitim kümesinde tutarak, buna rağmen, test sırasında doğru bilinip bilinemediğini ölçmektedir. Prensip olarak, K − 1 tane iyi parça ve bu kötü parçayı kullanarak, sınıflandırıcı üzerinde küçük bir "deneme" etkisi oluşturabilir ve eğitimden sonraki test aşamasında yanlış etiketli verdiğimiz örnekler için doğru sınıf bilgilerinin kestirilebilirliğini, bir genelleme ölçütü olarak, değerlendirebiliriz. İdeal bir çapraz geçerleme yönteminin, önceden verilmeyen örneklerin doğru sınıflandırılıp sınıflandırılmadığını sınamak yanında, veriye empoze edilen "öğretici gürültüsü"nün model tarafından ne kadar tolere edebildiğini de ölçmesini isteriz. Önerilen yöntem, UCI yapay öğrenme veri ambarında yer alan beş farklı veri kümesi üzerinde model değerlendirmesi ve öznitelik seçimi problemlerine uygulanmıştır. Son olarak İstanbul Üniversitesi DETAE tarafından, Epilepsi ve Behçet hastalıkları için oluşturulan veri kümelerinde TNP (tek nükleotid polimorfizmi) seçimi için de etkili bir yöntem olduğu gösterilmiştir.
dc.description.abstract	Leave-one-out (LOO) and K-fold cross-validation are among the most frequently used methods for model evaluation and feature selection. These well-known methods divide the sample into several parts, each of which is, in turn, left out for testing while the other parts are used for training. In this thesis, building on this idea, we propose a new cross-validation approach that we call miss-one-out (MOO). Rather than leaving a fold out, the new method mislabels the example(s) in each fold and keeps that fold in the training set, so that it can test whether the trained classifier corrects the erroneous labels of those training samples. In principle, having only one fold deliberately labeled incorrectly should have only a small effect on a classifier trained on this bad fold along with the K − 1 good folds, and as a measure of generalization we can use how well it corrects these wrongly provided labels. Ideally, a cross-validation method should not only test the model for correct classification of new test examples, but also test the model's capacity to tolerate the "teacher noise" imposed on the dataset. The proposed method is applied to model evaluation and feature selection on five distinct benchmark datasets from the UCI Machine Learning Repository. Finally, MOO cross-validation is shown to be an effective method for SNP (single-nucleotide polymorphism) selection on original Epilepsy and Behçet datasets collected by DETAE at Istanbul University.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Yeni bir çapraz geçerleme yöntemi ve biyoinformatik alanında öznitelik seçimi üzerinde uygulanması
dc.title.alternative	A novel cross validation method and an application to feature selection in bioinformatics
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.subject.ytm	Pattern recognition
dc.subject.ytm	Bioinformatics
dc.identifier.yokid	423959
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	İSTANBUL ÜNİVERSİTESİ
dc.identifier.thesisid	305457
dc.description.pages	52
dc.publisher.discipline	Diğer
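
The miss-one-out (MOO) procedure described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the nearest-centroid classifier, the restriction to binary 0/1 labels (so that "mislabeling" means flipping the label), the function names, and the toy data are all assumptions introduced here.

```python
import random


def fit_centroids(X, y):
    """Return {label: mean vector}; a stand-in classifier (assumption)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, row):
    """Predict the label whose centroid is closest to `row`."""
    def sqdist(label):
        return sum((a - b) ** 2 for a, b in zip(row, centroids[label]))
    return min(centroids, key=sqdist)


def miss_one_out_accuracy(X, y, n_folds=5, seed=0):
    """Fraction of deliberately mislabeled training samples whose
    true label is recovered by the trained classifier.

    Assumes binary labels in {0, 1}; how wrong labels are chosen for
    multi-class data is not stated in the abstract.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]

    corrected = 0
    for fold in folds:
        noisy = list(y)
        for i in fold:
            noisy[i] = 1 - y[i]  # flip the label instead of leaving it out
        # Train on ALL samples: K - 1 good folds plus this one bad fold.
        centroids = fit_centroids(X, noisy)
        # Score the bad fold against its true labels.
        corrected += sum(predict(centroids, X[i]) == y[i] for i in fold)
    return corrected / len(X)


# Toy data (hypothetical): two well-separated clusters.
X_demo = [[0.1 * i, 0.1 * i] for i in range(10)] + \
         [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(10)]
y_demo = [0] * 10 + [1] * 10
print(miss_one_out_accuracy(X_demo, y_demo))
```

For feature selection, one would presumably compute this score for each candidate feature subset and prefer subsets with a higher score, in the same way a conventional K-fold accuracy would be used.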

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess