Analysis of machine learning-based spam filtering techniques

Nazli, Nazli

dc.contributor.advisor	Doğdu, Erdoğan
dc.contributor.advisor	Choupanı, Roya
dc.contributor.author	Nazli, Nazli
dc.date.accessioned	2020-12-04T11:16:47Z
dc.date.available	2020-12-04T11:16:47Z
dc.date.submitted	2018
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/77672
dc.description.abstract	Bu tezde, otomatik spam eposta filtreleme problemi çalışıldı. Bazı varolan makina öğrenme algoritmaları açık bir veri seti üzerinde test edildi ve sonuçlar analiz edildi. Geliştirilen metotlar makina öğrenme ve yazı sınıflandırma teknikleri kullanılarak geliştirildi. Değişik veri setleri ve test metotları karşılaştırıldı. Ağırlıklı TF-IDF, SciKit Learn tabanlı ve Word2Vec vektörizasyonu kullanarak problemin çözümü için metotlar geliştirildi. Eposta yazıları için farklı vektör gösterim metotları geliştirildi ve denetimli makina öğrenme algoritmaları ile epostalar spam veya ham olarak sınıflandırıldı. WEKA yazılım aracı kullanılarak epostaların vektör gösterimleri üzerinde makina öğrenme sınıflandırma metotları uygulandı. Sınıflandırma için Destek Vektör Makinesi SVM (POLY), SVM (RBF), Naive Bayes, Bayesian Ağları, J48 ve Rastgele Orman algoritmaları kullanıldı. Sınıflandırma yöntemlerinden elde ettiğimiz sonuçları karşılaştırdık ve analiz ettik. Sonuçlarımız Word2Vec vektörü ile SVM (Poly) algoritmasının 300 e-posta veri kümesi için 98.33% spam algılama hassasiyeti ile en iyi performansı göstermektedir.
dc.description.abstract	This thesis examines the problem of automatic spam e-mail detection. Several existing machine learning algorithms are tested on an open data set and the results are analyzed. The methods we developed are implemented using machine learning and text classification techniques, and we used different data sets to develop and test them. The proposed methods are based on weighted TF-IDF, scikit-learn-based, and Word2Vec vectorization. We developed vector representation methods for e-mail text and then used supervised machine learning algorithms to classify e-mails as spam or ham. We used the WEKA software tool to apply machine learning classification methods to the vector representations of the e-mails. For classification, we used Support Vector Machine (SVM) with polynomial and RBF kernels, Naive Bayes, Bayesian Networks, J48, and Random Forest. We compared and analyzed the results obtained from these classification methods. Our results show that Word2Vec vectors combined with the SVM (poly) algorithm perform best, with 98.33% spam detection accuracy on a data set of 300 e-mails.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 International	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Analysis of machine learning-based spam filtering techniques
dc.title.alternative	Makine öğrenme tabanlı spam filtreleme teknikleri analizi
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid	10185425
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ÇANKAYA ÜNİVERSİTESİ
dc.identifier.thesisid	495963
dc.description.pages	79
dc.publisher.discipline	Diğer
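
The abstract above mentions TF-IDF vectorization of e-mail text as one input to the classifiers. As a rough illustration only, the sketch below computes plain TF-IDF vectors for a toy corpus using the Python standard library. It is not the thesis's weighted TF-IDF scheme, whose weighting is not described in this record; the function names (tokenize, tfidf_vectors), the smoothed IDF variant, and the three sample e-mails are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; a real pipeline would also
    # strip punctuation, headers, and stop words.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors), one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF (one common variant, assumed here for illustration).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical toy e-mails, not taken from the thesis data sets.
emails = [
    "win a free prize now",
    "meeting agenda for monday",
    "free prize inside claim now",
]
vocab, vectors = tfidf_vectors(emails)
```

Each row of vectors could then be passed, together with a spam/ham label, to any supervised classifier such as the SVM variants named in the abstract. Terms that occur in fewer e-mails (for example "win") receive a higher weight than terms shared across e-mails (for example "free").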

Files in this item

Name: yokAcikBilim_10185425.pdf
Size: 2.983Mb
Format: PDF
Description: File_10185425

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess