Gizli dirichlet ayrımı ve Word2vec yöntemlerinin birleşimi ile özgün bir metin temsil modeli geliştirilmesi

Çelenli, Halil İbrahim

dc.contributor.advisor	İlhan Omurca, Sevinç
dc.contributor.advisor	Ganiz, Murat Can
dc.contributor.author	Çelenli, Halil İbrahim
dc.date.accessioned	2020-12-29T12:17:52Z
dc.date.available	2020-12-29T12:17:52Z
dc.date.submitted	2020
dc.date.issued	2020-08-11
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/410902
dc.description.abstract	Son zamanlarda veri miktarındaki artış ile derin öğrenme, makine öğrenmesinin en popüler alanı olmaya başlamıştır. Bu artış ile Doğal Dil İşleme alanında da yeni yöntemlerin geliştirilmesini sağlamıştır. Metinsel verilerin temsil edilmesi, geleneksel yöntemler üzerinde Kelime Çantası Modeli gibi kelime temsil yöntemleri kullanılarak temsil edilir. Fakat yeni yöntemler üzerinde hızlı ve verimli olabilmesi için kelime kalıplama yöntemleri kullanılmaya başlanmıştır. Kelime kalıplama yöntemlerinin en popüler olanı Word2vec yöntemidir. Word2vec yöntemi kelimelerin bağlamlarındaki istatistiklere bakarak, yapay sinir ağlarını kullanarak her kelime için bir vektör gösterimini öğrenmektedir. Dokümanların temsil edilmesi için ise Doc2vec olarak bilinen kelime kalıplama yöntemi temelli yöntem kullanılmaktadır. Konu modelleme teknikleri ise kelimelerin konu olasılık dağılımları üzerinde rastgele bir araya gelerek dokümanları oluşturmaktadır. En sık kullanılan modeli Gizli Dirichlet Ayırımı (LDA) modelidir. LDA modeli konuların dokümanlar üzerindeki dağılımı ile kelimelerin konular üzerindeki dağılımı olmak üzere 2 farklı dağılım üretmektedir. Tez çalışması içerisinde Word2vec yöntemi, LDA model dağılımları ile birleştirip yeni bir kelime kalıplama vektörü geliştirilmiştir. Bu sayede dokümanlar daha iyi temsil edilmiştir. Geliştirilen yöntem ile doküman temsilinde kullanılan Doc2vec yöntemleri sınıflandırma algoritmaları kullanılarak karşılaştırılmıştır. Sınıflandırma sonucunda geliştirilen yöntemin sonuçları iyileştirdiği ve model karmaşıklığını azalttığı gösterilmiştir.
dc.description.abstract	Recently, as the amount of available data has grown, deep learning has become the most popular field of machine learning. This growth has also driven the development of new methods in Natural Language Processing. In traditional approaches, textual data is represented with methods such as the Bag of Words model. Newer approaches instead use word embedding methods for speed and efficiency. The most popular word embedding method is Word2vec, which uses artificial neural networks to learn a vector representation for each word from the statistics of the contexts in which it appears. Documents are represented with Doc2vec, a method based on word embeddings. Topic modeling techniques, in turn, treat documents as generated by randomly drawing words from topic probability distributions. The most commonly used topic model is Latent Dirichlet Allocation (LDA). LDA produces two distributions: the distribution of topics over documents and the distribution of words over topics. In this thesis, a new word embedding vector was developed by combining the Word2vec method with the LDA model distributions, which allows documents to be represented better. The developed method was compared with Doc2vec document representations using classification algorithms. The classification results show that the developed method improves results and reduces model complexity.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Gizli dirichlet ayrımı ve Word2vec yöntemlerinin birleşimi ile özgün bir metin temsil modeli geliştirilmesi
dc.title.alternative	Combining latent dirichlet allocation and Word2vec for a novel document representation model
dc.type	masterThesis
dc.date.updated	2020-08-11
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid	10329980
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	KOCAELİ ÜNİVERSİTESİ
dc.identifier.thesisid	629631
dc.description.pages	58
dc.publisher.discipline	Diğer
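
The abstract states that Word2vec vectors are combined with the LDA distributions, but this record does not say how. The sketch below is only one plausible reading, not the thesis's method: it assumes the combination is a concatenation of a word's embedding with its per-topic weights, and that a document vector is the average of its combined word vectors. All names and numeric values (`word_vectors`, `word_topics`, `combined_word_vector`, `document_vector`) are hypothetical toy data, not output of trained models.

```python
from typing import Dict, List

# Hypothetical toy embeddings; in practice these would come from a
# trained Word2vec model.
word_vectors: Dict[str, List[float]] = {
    "goal": [0.9, 0.1],
    "match": [0.8, 0.2],
    "vote": [0.1, 0.9],
}

# Hypothetical per-word topic weights for two topics; in practice these
# would be derived from a fitted LDA model's word-topic distribution.
word_topics: Dict[str, List[float]] = {
    "goal": [0.95, 0.05],
    "match": [0.85, 0.15],
    "vote": [0.10, 0.90],
}


def combined_word_vector(word: str) -> List[float]:
    """Concatenate the embedding and the topic weights (an assumed scheme)."""
    return word_vectors[word] + word_topics[word]


def document_vector(tokens: List[str]) -> List[float]:
    """Average the combined vectors of all known tokens in a document."""
    known = [t for t in tokens if t in word_vectors and t in word_topics]
    if not known:
        return []  # no known words: no representation in this sketch
    dim = len(combined_word_vector(known[0]))
    sums = [0.0] * dim
    for token in known:
        for i, value in enumerate(combined_word_vector(token)):
            sums[i] += value
    return [s / len(known) for s in sums]


if __name__ == "__main__":
    print(document_vector(["goal", "match", "unknown"]))
```

Under these assumptions the resulting document vector could be passed to any standard classifier, which is how the abstract describes the comparison with Doc2vec.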

Files in this item

Name: yokAcikBilim_10329980.pdf
Size: 1.158Mb
Format: PDF
Description: File_10329980

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess