Adaptive anti-spam filtering based on Turkish morphological analysis, artificial neural networks and Bayes filtering

Özgür, Levent

Date

2003

Abstract

Özet (Turkish abstract, in English translation): In this study, we present an algorithm that filters Turkish spam messages, based on Artificial Neural Network and Bayes Filter implementations. Because the result is an anti-spam filter that can work in conjunction with Microsoft Outlook, the product is aimed at the end user and is therefore user-specific; the program adapts itself to each user by learning that user's spam and normal messages. Our algorithm has two main parts: the first part analyzes the morphology of Turkish words, and the second part filters spam messages using the word roots produced by the morphological analysis. The input vectors of the learning algorithms are determined in two ways: a binary model and a probabilistic model. Two Artificial Neural Network structures are used in this study: single-layer and multi-layer perceptrons. The Bayes Filter is also implemented with three different approaches: a binary model, a probabilistic model, and an advanced probabilistic model. A total of 750 messages (410 spam, 340 normal) were used for this study. A filtering success rate above 90 percent was achieved.

We propose an anti-spam filtering algorithm for the Turkish language, based on Artificial Neural Networks (ANN) and a Bayes Filter. The final product is an anti-spam filtering program that works with Microsoft Outlook; it is user-specific and adapts itself to the characteristics of each user's incoming e-mails. The algorithm has two parts. The first part deals with the morphology of Turkish words. The second part classifies the e-mails using the word roots extracted by the morphology part. The input vectors to the learning algorithms are built with two models: a binary model and a probabilistic model. Two ANN structures are employed in this study: a single-layer perceptron and a multi-layer perceptron. The Bayes Filter is implemented with three different approaches: a binary model, a probabilistic model, and an advanced probabilistic model. Including non-Turkish words improves the spam detection performance of the proposed system. A total of 750 e-mails (410 spam and 340 normal) are used in the experiments. A success rate of over 90 percent is achieved.
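
As a rough illustration of the pipeline the abstract outlines, the sketch below builds the two kinds of input vector over a small vocabulary of word roots and feeds the binary one to a simple Bayes classifier. Everything specific here is an assumption, not the thesis's method: the abstract does not define the probabilistic model (taken here to be each root's relative frequency within the message), the classifier is a plain Bernoulli naive Bayes with add-one smoothing, and the vocabulary, messages, and function names are hypothetical. Morphological analysis is not implemented; roots are assumed to be already extracted.

```python
import math
from collections import Counter

# Hypothetical vocabulary of word roots (not taken from the thesis).
VOCAB = ["kazan", "para", "toplantı", "rapor"]

def binary_vector(roots, vocab=VOCAB):
    """Binary model: 1 if the root occurs in the message at least once, else 0."""
    present = set(roots)
    return [1 if w in present else 0 for w in vocab]

def frequency_vector(roots, vocab=VOCAB):
    """Assumed 'probabilistic' model: share of the message's roots equal to each vocabulary root."""
    counts = Counter(roots)
    total = len(roots)
    if total == 0:
        return [0.0] * len(vocab)
    return [counts[w] / total for w in vocab]

def train_bernoulli_nb(vectors, labels):
    """Estimate log priors and P(root present | class) with add-one smoothing."""
    model = {}
    for c in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == c]
        n = len(rows)
        probs = [(sum(col) + 1) / (n + 2) for col in zip(*rows)]
        model[c] = (math.log(n / len(labels)), probs)
    return model

def classify(model, vector):
    """Return the class with the highest log posterior score."""
    def score(c):
        log_prior, probs = model[c]
        return log_prior + sum(
            math.log(p if x else 1 - p) for x, p in zip(vector, probs)
        )
    return max(model, key=score)

# Tiny hypothetical training set: lists of roots with their labels.
train = [
    (["para", "kazan"], "spam"),
    (["kazan", "hemen"], "spam"),
    (["toplantı", "rapor"], "normal"),
    (["rapor"], "normal"),
]
model = train_bernoulli_nb(
    [binary_vector(roots) for roots, _ in train],
    [label for _, label in train],
)

message_roots = ["para", "kazan", "para", "hemen"]
print(binary_vector(message_roots))                    # [1, 1, 0, 0]
print(frequency_vector(message_roots))                 # [0.25, 0.5, 0.0, 0.0]
print(classify(model, binary_vector(message_roots)))   # spam
```

Retraining the model on one user's own spam and normal messages is one simple way to obtain the per-user adaptation the abstract mentions; how the thesis does this is not stated here.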

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/78366

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess