SVM classification for imbalanced datasets with multi objective optimization framework

Öztürk, Ayşegül

Date

2009

Abstract

Classifying imbalanced datasets, in which the negative instances (the majority class) outnumber the positive instances (the minority class), is a significant challenge. Such datasets are commonly encountered in real-life problems, yet the performance of well-known classifiers degrades when the dataset is imbalanced. The literature proposes various approaches to the problems that arise from imbalance, applied at either the data level or the algorithm level. Data-level approaches mainly aim to balance the distribution of the dataset, either by eliminating some instances of the majority class or by replicating some instances of the minority class. Algorithm-level approaches, on the other hand, either bias the proposed algorithm directly or adjust some parameters in order to bias the underlying model. Support Vector Machines (SVMs), despite their solid theoretical background, also suffer a dramatic decrease in performance when the distribution of the dataset is imbalanced.

The objective of this study is to improve the classification performance of SVMs on imbalanced datasets. The proposed method modifies the L1 Norm SVM formulation so that the error sums of the two classes enter the formulation separately, which yields an optimization problem with three objectives. Motivated by the multi-objective nature of SVMs, the solution approach builds on the fundamentals of Multi Objective Optimization. The method reduces the three-objective formulation to alternative two-objective formulations and investigates the efficient frontier systematically. A systematic investigation of the efficient frontier lets the method evaluate the problem over a large set of parameter values, rather than adjusting a few parameters empirically as existing approaches do. The proposed method therefore improves SVM performance while requiring less computational effort to evaluate the problem over a given number of parameter values. The results are reported in terms of three widely used metrics, and the computational experiments are discussed in detail.
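To illustrate what an efficient frontier means when the error sums of the two classes are treated as separate objectives, the following is a minimal Python sketch. It is not the procedure used in the thesis: the function name `efficient_set`, the pairing of objectives, and the candidate values are all hypothetical and chosen only for illustration. Each candidate stands for one parameter setting, summarized by its positive-class and negative-class error sums, and the function keeps only the settings that no other setting beats on both objectives.

```python
from typing import List, Tuple

# Hypothetical representation: (positive-class error sum, negative-class error sum).
Point = Tuple[float, float]


def efficient_set(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both coordinates are minimized."""
    frontier: List[Point] = []
    best_second = float("inf")
    # Sort by the first objective, breaking ties on the second objective.
    for p in sorted(set(points)):
        # Every point already visited has a first objective that is no worse,
        # so p is efficient only if it strictly improves the second objective.
        if p[1] < best_second:
            frontier.append(p)
            best_second = p[1]
    return frontier


# Made-up error sums for six hypothetical parameter settings.
candidates = [
    (4.0, 1.0),
    (2.0, 3.0),
    (3.0, 3.5),
    (1.0, 6.0),
    (2.5, 2.0),
    (4.0, 0.5),
]

frontier = efficient_set(candidates)
for positive_error, negative_error in frontier:
    print(f"positive error sum = {positive_error}, negative error sum = {negative_error}")
```

In this made-up example, (3.0, 3.5) is dropped because (2.5, 2.0) is better on both objectives, and (4.0, 1.0) is dropped because (4.0, 0.5) matches it on the first objective and improves the second. The remaining points trade minority-class error against majority-class error, which is the kind of trade-off a decision maker would inspect along the frontier.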

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/170396

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess