A lexicon based method for subjectivity and sentiment analysis using an Arabic twitter corpus

Al-Buhruzi, Naseer Mohammed Jasim

dc.contributor.advisor	Görür, Abdül Kadir
dc.contributor.author	Al-Buhruzi, Naseer Mohammed Jasim
dc.date.accessioned	2020-12-04T11:19:07Z
dc.date.available	2020-12-04T11:19:07Z
dc.date.submitted	2017
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/77929
dc.description.abstract	Sosyal medya için duyarlılık analizi, her alanda veri madenciliği yapmak için çok ilginç bir alandır. Bu nedenle, kullanıcılar tarafından her gün itilen büyük miktarda veriyi kapsayacak şekilde bu alanda sürekli araştırma yapılmaktadır. Arapça, sosyal medyada kullanılan on önemli dillerden biridir; bu sebeple karar verme konusundaki ilginin her yere ihtiyacı vardır. Twitter, kullanıcılar arasındaki görüş ve fikir alışverişi için bir platform sağlar; gelecekteki kararların gelişimine ve planlanmasına yönelik bir bilgi tabanı oluşturmak için önde gelen kararlar verir. Çalışmamızda lexicon temelli yaklaşımı kullanarak sınıflamanın yüksek doğruluk derecesine sahip modellerin nasıl elde edileceğini sunuyoruz ve gösteriyoruz. Yaklaşımımız Arapça kelimeler için önişleme adımlarından başlayarak üç aşamada uygulanmaktadır. İkinci aşamada, istatistiksel ve semantik yönelimlerle ilgili daha fazla özellik çıkarılması tartışılmaktadır. Ayıklanan özelliklerin (ağırlık, puan ve olumsuzlama) açıkça yararlı olabilecek iki Arapça sözlüğün türüne nasıl bağlı olduğunu gösteriyoruz. Son olarak, üçüncü aşama, performans ölçümleri üzerinde daha fazla etkiye sahip özellikleri bulmak için Bilgi Kazanım Özellik Değerlendirme ve Sıralama yöntemi ile bir özellik seçme yöntemi uygular. Yüksek sıralamaya sahip özellikleri korur ve veri kümesinden düşük sıralamaya sahip olanları kaldırırız. Son iki aşamada, değerlendirmelerimizi, K-Nearest Neighbor ve Naive Bayes olmak üzere iki makine-öğrenme algoritması kullanarak tüm görevler için yerine getiriyoruz. Sınıflandırma doğruluğunun, Naïve Bayes sınıflandırıcısı ile skor özellikli 93.56'ya ulaştığı tespit edildi ve bu görev, seçilen iki makine öğrenme modelinden hangisinin Arapça tweetler için daha uygun olduğunu belirledi. Anahtar kelimeler: Arapça duygu analizi, sözlüğe dayalı, özellik çıkarma, Özellik seçimi, KNN, Naïve Bayes, sıralaması, bilgi kazanma özelliği.
dc.description.abstract	Sentiment analysis for social media is an interesting area of data mining that supports decision making in various domains. Research in this area is therefore ongoing, to keep pace with the huge amount of data that users post. Arabic is one of the ten most important languages used in social media, so decision making that draws on social media needs to take it into account. Twitter provides a platform for users to exchange opinions and ideas, which supports decision making by building a knowledge base for the development and planning of future outcomes. We present and illustrate how to obtain models with high classification accuracy by using the lexicon-based approach. Our approach is implemented in three phases, beginning with preprocessing steps for Arabic words. The second phase covers the extraction of further features relating to statistical and semantic orientation. We demonstrate how the extracted features (weight, score and negation) depend on two types of Arabic lexicon, which are clearly useful. Finally, the third phase applies feature selection with the Information Gain attribute evaluation and the Ranker search method to find the features that have the greatest impact on the performance measures. We keep the features that have high rankings and remove those that have low rankings from the dataset. In the last two phases, we carry out our evaluations for all tasks using two machine-learning algorithms, namely K-Nearest Neighbor and Naïve Bayes. Classification accuracy reached 93.56 with the Naïve Bayes classifier using the score feature, and this task determined which of the two selected machine-learning models is more suitable for classifying the sentiment of Arabic tweets. Keywords: Arabic sentiment analysis, lexicon-based, feature extraction, feature selection, KNN, Naïve Bayes, Ranker, information gain attribute.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	A lexicon based method for subjectivity and sentiment analysis using an Arabic twitter corpus
dc.title.alternative	Arapça twitter korpusu ile öznellik ve sentıment analizi için sözlük tabanlı yöntem
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Matematik Anabilim Dalı
dc.identifier.yokid	10160090
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ÇANKAYA ÜNİVERSİTESİ
dc.identifier.thesisid	495905
dc.description.pages	81
dc.publisher.discipline	Bilgi Teknolojileri Bilim Dalı
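
The feature selection phase described in the abstract ranks features by information gain and keeps the highest-ranked ones. The following is a minimal sketch of that general technique for discrete features, not the thesis's own implementation: the function names, the feature names (score_sign, has_negation) and the toy data are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (feature name, information gain) pairs, most informative first."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: four tweets with two discrete features each.
labels = ["pos", "pos", "neg", "neg"]
columns = {
    "score_sign": ["+", "+", "-", "-"],  # separates the two classes perfectly
    "has_negation": [0, 1, 0, 1],        # independent of the class in this toy set
}
ranking = rank_features(columns, labels)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

Features at the bottom of such a ranking would be the candidates for removal before training a classifier; where to place the cutoff is a design choice that this sketch leaves open.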

Files in this item

Name: yokAcikBilim_10160090.pdf
Size: 2.337Mb
Format: PDF
Description: File_10160090

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess