A new outlier detection method based on probabilistic outputs of support vector machines in binary classification

Ceesay, Habib

dc.contributor.advisor	Kardiyen, Filiz
dc.contributor.author	Ceesay, Habib
dc.date.accessioned	2020-12-10T12:49:33Z
dc.date.available	2020-12-10T12:49:33Z
dc.date.submitted	2019
dc.date.issued	2019-10-18
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/295430
dc.description.abstract	Hızla büyüyen veri teknolojisi ile, belirli özelliklere sahip bir gözlemin doğru sınıfa atanması bağlamında sınıflandırma Makine Öğrenmesi ve Uygulamalı istatistik alanlarında en önemli ve etkin araçlardan biri haline gelmiştir. Sınıflandırma, biyomedikal çalışmalar, genetik, sosyal bilimler, pazarlama gibi pek çok alanda kullanılmaktadır. Her bir gözlemin sağ-ölü, pozitif-negatif gibi iki kategoriden birine ait olduğu veriye ikili veri denir. Destek Vektör Makineleri ilk olarak 1960'ların ortasında Vladimir Vapnik tarafından geliştirilen doğrusal olarak ayrılamayan veriyi sınıflandırmaya yardımcı Kernel fonksiyonlarının da kullanımı ile oldukça esnek bir istatistiksel modeller sınıfıdır. Ancak SVM verinin aykırı gözlem veya yanlış veri gibi kirlenmiş gözlem içermesinden olumsuz yönde etkilenebilir. Bu tez çalışmasında amaç, SVM'nin temiz ve kirli veri için sınıflandırma kesinliğini karşılaştırmak olup, çalışmada Destek Vektör Makinelerinin olasılıksal çıktılarına dayanan (PoC) yeni bir aykırı değer tespit yöntemi önerilmiştir. Önerilen yöntem ile Sağlam Mahalanobis uzaklığı (MCD) yönteminin aykırı gözlem tespit oranları karşılaştırılmıştır. Sonuçlar, önerilen yöntemin daha iyi performans gösterdiğini göstermiştir.
dc.description.abstract	With data volumes growing rapidly, classification, the task of assigning an observation to the correct class based on its features, has become one of the most important and effective tools in Machine Learning and Applied Statistics. Classification is used in many fields, such as biomedical studies, genetics, social science, and marketing. Data are said to be binary when each observation falls into one of two categories, such as alive or dead, or positive or negative. Support Vector Machines (SVMs) are a class of statistical models first developed in the mid-1960s by Vladimir Vapnik; they are very flexible because they incorporate kernel functions, which help separate and classify data that are not linearly separable. However, SVMs can be adversely affected by contaminated data containing, for example, outliers or mislabeled observations. The goal of this thesis is to compare the classification accuracy of the SVM on clean and contaminated data, and to propose a new outlier detection method (PoC) based on the probabilistic outputs of the SVM. The outlier detection rates of the proposed method and of the Robust Mahalanobis distance (MCD) method are compared. The results show that PoC performs better than MCD at detecting outliers.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	İstatistik	tr_TR
dc.subject	Statistics	en_US
dc.title	A new outlier detection method based on probabilistic outputs of support vector machines in binary classification
dc.title.alternative	İkili sınıflama probleminde aykırı gözlem tespiti için destek vektör makineleri olasılıksal çıktılarına dayalı yeni bir yöntem
dc.type	masterThesis
dc.date.updated	2019-10-18
dc.contributor.department	İstatistik Anabilim Dalı
dc.identifier.yokid	10270181
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	GAZİ ÜNİVERSİTESİ
dc.identifier.thesisid	569846
dc.description.pages	91
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_10270181.pdf
Size: 1.971Mb
Format: PDF
Description: File_10270181

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess