Feature subset generation for ensemble learning using feature clustering and mutual information

Amar, Hana

Date

2016

Abstract

Ensemble learning (EL) is one of the most effective techniques for solving supervised machine learning problems. In this thesis, we propose a method that uses clustering and feature selection algorithms to generate multiple feature subsets from a single feature set, so that EL can be applied. First, we cluster the features and obtain many feature subsets. We then feed each feature subset to a support vector machine (SVM) classifier to obtain individual class predictions, and combine those predictions by majority voting. Next, we pass the class predictions to the minimum Redundancy-Maximum Relevance (mRMR) feature selection algorithm and rank the feature subsets by their mRMR scores, in order to produce subsets that are both diverse and accurate, two factors that are vital for EL. Experimental results on various biomedical datasets show that our method improves on the single-set accuracies.

Keywords: Ensemble Learning (EL), Feature Clustering, Feature Subset Generation, minimum Redundancy-Maximum Relevance (mRMR) Algorithm, Support Vector Machine (SVM).
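
The voting and ranking steps described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the per-subset class predictions are already available (the feature clustering and SVM training steps are omitted), it uses the common "relevance minus mean redundancy" form of mRMR with empirical mutual information, and the function names `mutual_information`, `majority_vote`, and `mrmr_rank` are hypothetical, chosen only for this example.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def majority_vote(predictions):
    """Combine per-subset prediction vectors sample by sample.

    Ties are resolved in favour of the label that appears first among the votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def mrmr_rank(predictions, labels):
    """Greedy mRMR ranking of subsets, treating each prediction vector as a feature.

    Assumed scoring (difference form): relevance to the labels minus the mean
    mutual information with the subsets that have already been selected.
    """
    relevance = [mutual_information(p, labels) for p in predictions]
    remaining = list(range(len(predictions)))
    selected = []

    def score(i):
        if not selected:
            return relevance[i]
        redundancy = sum(
            mutual_information(predictions[i], predictions[j]) for j in selected
        ) / len(selected)
        return relevance[i] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): three subsets' predictions on 8 samples.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
subset_predictions = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # subset 0: one error
    [0, 0, 0, 1, 1, 1, 1, 1],  # subset 1: identical to subset 0 (redundant)
    [0, 0, 0, 0, 0, 0, 1, 1],  # subset 2: two errors, but different ones
]

ranking = mrmr_rank(subset_predictions, labels)
ensemble = majority_vote(subset_predictions)
print("mRMR ranking of subsets:", ranking)
print("Majority-vote prediction:", ensemble)
```

On this toy data the ranking places subset 2 ahead of subset 1: subset 1 is as accurate as subset 0 but adds no new information, whereas subset 2 is less accurate but disagrees with subset 0 on different samples. This is the trade-off between accuracy and diversity that the abstract names as vital for EL.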

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/60019

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess