Evaluation of the relationship between the stability of feature selection techniques and classification performance in data mining

Büyükkeçeci, Mustafa

dc.contributor.advisor	Okur, Mehmet Cudi
dc.contributor.author	Büyükkeçeci, Mustafa
dc.date.accessioned	2021-05-08T12:06:57Z
dc.date.available	2021-05-08T12:06:57Z
dc.date.submitted	2019
dc.date.issued	2020-01-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/698118
dc.description.abstract	Her yıl üretilen ve depolanan veri miktarı üstel olarak artmaktadır. Hem veri kümeleri hem de veri kümesi boyutlarındaki yaşanan bu önemli artış, veri analizi tekniklerini ve algoritmalarını olumsuz yönde etkileyerek karmaşık modellerin üretilmesine, performans kayıplarına ve artan hesaplama maliyetlerine neden olmuştur. Bu problemlerin önlenmesi ve üstesinden gelinmesi için, öznitelik seçimi gibi, çeşitli veri önişleme teknikleri geliştirilmiştir. Boyut küçültme (indirgeme) tekniği olan öznitelik seçimi, sınıflandırıcıların analiz kalitesini, verimliliğini ve genelleme kapasitesini geliştirmek, hesaplama maliyetlerini azaltmak ve yüksek sınıflandırma veya kümeleme doğruluğuna sahip basit ve anlaşılabilir modeller oluşturmak için kullanılır. Öznitelik seçim algoritmaları tarafından elde edilen öznitelik altkümelerinin sınıflandırma veya kümelenme performanslarının yanı sıra, öznitelik seçim algoritmasının kararlılığı veya sağlamlığı da test edilmelidir. Kararlılık, öznitelik seçim algoritmasının eğitim setinde yapılan değişikliklere karşı hassasiyetinin ölçüsüdür. Düşük hassasiyete sahip algoritma, yani kararlı bir algoritma, eğitim kümesinde yapılan her değişiklikten sonra aynı veya çok benzer sonuçlar (öznitelik altkümeleri veya sıraları) verirken, yüksek hassasiyete sahip algoritma, yani kararsız bir algoritma, her değişiklikten sonra farklı sonuçlar verir. Kararsız bir algoritma tarafından üretilen sonuçlar değişken olacağından, sınıflandırma modellerinin oluşturulmasında kullanılacak sonuçların (öznitelik kümesinin) seçilmesini ve girdi ve çıktılar arasındaki ilişkinin kurulmasını zorlaştırır. Öznitelik seçim algoritmasına olan güveni sarsar. Bu nedenle, algoritma kararlılığı öznitelik seçim algoritmaları için önemli bir başarı kriteridir.
Bu tezde kararlılık ile sınıflandırma performansı arasındaki ilişkiyi belirlemek ve yorumlamak için toplam yedi filtreleyen (T-Testi, Bhattacharyya, Wilcoxon, ROC, Entropi, ReliefF ve Karar Ağacı Topluluğu) ve iki ardışık seçim (Ardışık İleri Öznitelik Seçimi (SFS) ve Ardışık Geri Öznitelik Seçimi (SBS)), veya sarmalayan, öznitelik seçimi algoritması, on iki kararlılık ölçüsü, üç sınıflandırıcı ve yedi gerçek dünya veri kümesi kullanılmıştır.
dc.description.abstract	Each year the amount of data produced and stored increases exponentially. This significant increase in both datasets and dataset sizes adversely affects data analysis techniques and algorithms, resulting in complex models, performance losses and increased computational costs. Various data preprocessing techniques, such as feature selection, have been developed to prevent and overcome these problems. Feature selection, which is a data size (dimension) reduction technique, is used to improve the analysis quality, efficiency and generalization capacity of classifiers, to reduce computational costs and to create simple and understandable models that have high classification or clustering accuracy. Besides the classification or clustering performance of the feature subsets obtained by feature selection algorithms, the stability, i.e., robustness, of the feature selection algorithm should also be tested. Stability is a measure of the sensitivity of the feature selection algorithm to changes (perturbations) made to the training set. An algorithm with low sensitivity, i.e., a stable algorithm, produces the same or very similar results (feature subsets or rankings) after each change to the training set, whereas an algorithm with high sensitivity, i.e., an unstable algorithm, produces different results after each change. Because the results produced by an unstable algorithm vary, it is difficult to select the result set (feature set) to be used in building classification models and to establish the relationship between inputs and outputs. This undermines trust in the feature selection algorithm. Therefore, algorithm stability is an important success criterion for feature selection algorithms.
In this thesis, seven filter algorithms (T-Test, Bhattacharyya, Wilcoxon, ROC, Entropy, ReliefF and Decision Tree Ensemble) and two sequential, or wrapper, feature selection algorithms (Sequential Forward Feature Selection (SFS) and Sequential Backward Feature Selection (SBS)), together with twelve stability measures, three classifiers and seven real-world datasets, were used to determine and interpret the relationship between feature selection algorithm stability and classification performance.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Evaluation of the relationship between the stability of feature selection techniques and classification performance in data mining
dc.title.alternative	Veri madenciliğinde öznitelik seçim tekniklerinin kararlılıkları ve sınıflandırma performansları arasındaki ilişkinin değerlendirilmesi
dc.type	doctoralThesis
dc.date.updated	2020-01-06
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.identifier.yokid	10295760
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	YAŞAR ÜNİVERSİTESİ
dc.identifier.thesisid	599735
dc.description.pages	146
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_10295760.pdf
Size: 2.496Mb
Format: PDF
Description: File_10295760

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess