Spectral methods for outlier detection in machine learning

Erdoğan, Göker

Date

2012

Abstract

Outliers are instances that differ significantly from the bulk of the data. In many real-world applications, finding outlying instances is important because they carry information that is valuable both conceptually and for action. Spectral methods are unsupervised learning approaches that can uncover low-dimensional structure in high-dimensional data. Among these methods, Principal Components Analysis (PCA), Laplacian Eigenmaps (LEM), and Multidimensional Scaling (MDS) are examined and presented under a common framework. This work argues that the dimensionality reduction properties of spectral methods are valuable for outlier detection, and proposes a spectral outlier detection method that transforms the data with a spectral approach before outlier detection. Active-Outlier, Local Outlier Factor, One-Class Support Vector Machines, and Parzen Windows are used as outlier detection methods; they are combined with PCA, LEM, and MDS, and their outlier detection performance is tested on different data sets. The experimental results show that the LEM spectral method in particular improves performance. The distinctive properties that make LEM valuable for outlier detection are then discussed. The proposed approach is also applied to the face recognition problem, confirming the validity of the proposed method. In addition, a MATLAB library containing implementations of the outlier detection and spectral methods is shared with this thesis for use in research in this area.

Outliers are instances in a sample that deviate significantly from the others. Identifying them is important because they carry valuable and actionable information in many real-life scenarios. Spectral methods are unsupervised learning techniques that reveal low-dimensional structure in high-dimensional data. We analyze spectral methods such as Principal Components Analysis (PCA), Laplacian Eigenmaps (LEM), Kernel PCA (KPCA), and Multidimensional Scaling (MDS), and present a unified view of them. We argue that the ability of such methods to reduce dimensionality is valuable for outlier detection. Hence, we propose spectral outlier detection algorithms in which spectral decomposition precedes outlier detection. The four outlier detection methods we use are Active-Outlier, Local Outlier Factor, One-Class Support Vector Machine, and Parzen Windows. We combine these methods with the spectral methods LEM and MDS to form our algorithm. We evaluate our approach on various data sets and compare it with outlier detection without spectral transformation and with PCA. We observe that combining outlier detection methods with LEM increases outlier detection accuracy. We discuss how the unique characteristics of LEM make it a valuable spectral method for outlier detection. We also confirm the merits of our approach on a face detection problem. Additionally, we provide a MATLAB outlier detection toolbox containing implementations of the outlier detection algorithms and spectral methods discussed in this thesis, which should be useful for researchers in this field.
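
The two-stage idea in the abstract, a spectral transformation followed by an outlier detector, can be illustrated with a minimal sketch. This is not the thesis's MATLAB toolbox. It assumes classical MDS (computed here by power iteration) as the spectral step and a leave-one-out Gaussian Parzen window as the detector. The function names, the bandwidth, and the toy data are all hypothetical choices made for illustration.

```python
import math


def squared_distances(points):
    """Pairwise squared Euclidean distances between all points."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def classical_mds(points, n_components=2, n_iter=500):
    """Classical MDS: embed points using the leading eigenvectors of the
    double-centered squared-distance matrix (found by power iteration)."""
    n = len(points)
    d2 = squared_distances(points)
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    # B = -1/2 * J * D^2 * J, where J is the centering matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    embedding = [[] for _ in range(n)]
    for _ in range(n_components):
        v = [float(i + 1) for i in range(n)]  # deterministic start vector
        for _ in range(n_iter):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the unit vector v.
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            embedding[i].append(scale * v[i])
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return embedding


def parzen_scores(points, bandwidth=1.0):
    """Leave-one-out Gaussian Parzen window density per point, without the
    normalizing constant (it does not change the ranking). Low = outlying."""
    n = len(points)
    d2 = squared_distances(points)
    return [sum(math.exp(-d2[i][j] / (2.0 * bandwidth ** 2))
                for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


# Hypothetical toy data: five nearby 3-D points and one distant point (last).
data = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (0.0, 1.0, -0.1),
        (1.0, 1.0, 0.0), (0.5, 0.5, 0.05), (10.0, 10.0, 0.0)]

embedded = classical_mds(data, n_components=2)      # spectral step
density = parzen_scores(embedded, bandwidth=1.0)    # detection step
most_outlying = min(range(len(data)), key=density.__getitem__)
print(most_outlying)
```

Swapping in a different spectral step (for example Laplacian Eigenmaps) or a different detector (for example Local Outlier Factor) would keep the same structure: transform first, then score the transformed points.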

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/74560

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess