Learning word representations with deep neural networks for Turkish

Dündar, Enes Burak

dc.contributor.advisor	Alpaydın, Ahmet İbrahim Ethem
dc.contributor.author	Dündar, Enes Burak
dc.date.accessioned	2020-12-04T10:08:58Z
dc.date.available	2020-12-04T10:08:58Z
dc.date.submitted	2019
dc.date.issued	2019-04-30
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/72387
dc.description.abstract	This study analyzes the word representation methods word2vec, fastText, and ELMo as applied to Turkish texts. Word representations place words in a high-dimensional vector space, with the aim that words of similar meaning lie close to each other in that space. Word vectors can be used in areas such as text classification and translation. Experiments were conducted with word2vec, fastText, and ELMo on Turkish corpora of different sizes, and the methods were compared with the bag-of-words approach. Word2vec works at the word level, while fastText can represent words using character-level representations. ELMo builds word vectors using the context of the sentence; word2vec and fastText cannot use context information. The learned word vectors were compared on syntactic and semantic test sets and on topic classification. Our experiments show that the fastText model is more successful at topic classification, while the word2vec model is more successful at semantic analogies.
dc.description.abstract	In this study, we analyze how well three word embedding methods, word2vec, fastText, and ELMo, represent Turkish texts. Word embeddings represent words in a high-dimensional vector space in which similar words are placed close to each other, which helps in tasks such as document classification and machine translation. We conduct experiments on Turkish corpora of different sizes using word2vec, fastText, and ELMo, and compare them with bag-of-words (BOW). Word2vec works at the word level, whereas fastText works at the character (subword) level and computes the representation of a word by combining the representations of its subwords. ELMo is context-dependent, that is, the representation of a word depends on the other words in the sentence, whereas word2vec and fastText are context-independent. The learned word embeddings are evaluated on noun and verb inflections, on semantic analogy tests, and on topic classification of news documents. Our experiments indicate that fastText vectors perform better on classification tasks, whereas word2vec vectors are more useful for semantic analogies.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Computer Engineering Sciences-Computer and Control	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Learning word representations with deep neural networks for Turkish
dc.title.alternative	Türkçe için derin sinir ağları ile sözcük gösteriminin öğrenilmesi (Learning word representation with deep neural networks for Turkish)
dc.type	masterThesis
dc.date.updated	2019-04-30
dc.contributor.department	Department of Computer Engineering
dc.identifier.yokid	10231043
dc.publisher.institute	Institute of Science and Engineering (Fen Bilimleri Enstitüsü)
dc.publisher.university	Boğaziçi University
dc.identifier.thesisid	539284
dc.description.pages	77
dc.publisher.discipline	Other
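
The abstract contrasts word-level vectors (word2vec) with subword vectors (fastText) and evaluates them on analogies. The sketch below illustrates both mechanics; it is not code from the thesis. It assumes fastText-style character n-grams of length 3 to 6 with `<` and `>` word-boundary markers (the fastText defaults), averages the word and n-gram vectors (as the fastText library does when it builds a word vector), and substitutes random toy vectors for trained ones. Analogies are answered with the common 3CosAdd rule (nearest word by cosine to b - a + c, excluding the query words); the thesis's exact analogy procedure is not stated here, so treat that as an assumption. The names `char_ngrams`, `ngram_vector`, `word_vector`, and `analogy`, and the toy dimension `DIM`, are hypothetical.

```python
import numpy as np

DIM = 50  # toy embedding size (assumption, not the thesis setting)
rng = np.random.default_rng(0)
ngram_table: dict[str, np.ndarray] = {}  # hypothetical stand-in for trained n-gram vectors


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Return the boundary-marked word plus its character n-grams (fastText style)."""
    padded = f"<{word}>"
    grams = [padded]  # the whole word also gets its own vector
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            grams.append(padded[i:i + n])
    return grams


def ngram_vector(gram: str) -> np.ndarray:
    # Unseen n-grams get a random vector; a trained model would look them up instead.
    if gram not in ngram_table:
        ngram_table[gram] = rng.standard_normal(DIM)
    return ngram_table[gram]


def word_vector(word: str) -> np.ndarray:
    # Average the word and n-gram vectors, so inflected forms that share
    # n-grams (e.g. "evler", "evlerde") share part of their representation.
    return np.mean([ngram_vector(g) for g in char_ngrams(word)], axis=0)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a: str, b: str, c: str, vocab: list[str]) -> str:
    """Solve a : b :: c : ? with 3CosAdd, excluding the three query words."""
    target = word_vector(b) - word_vector(a) + word_vector(c)
    candidates = [w for w in vocab if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(word_vector(w), target))


if __name__ == "__main__":
    # "evler" (houses) and "evlerde" (in the houses) share the stem and plural n-grams.
    shared = set(char_ngrams("evler")) & set(char_ngrams("evlerde"))
    print(sorted(shared))
    vocab = ["ev", "evler", "araba", "arabalar", "kitap", "kitaplar"]
    print(analogy("ev", "evler", "araba", vocab))
```

With trained vectors, a query such as `analogy("ev", "evler", "araba", vocab)` should ideally return the plural "arabalar" (cars); with the random toy table above the answer is arbitrary, and the example only shows the mechanics. A purely word-level model like word2vec would have no vector at all for an inflected form unseen in training, whereas the subword average still yields one, which is one reason subword models suit an agglutinative language such as Turkish.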

Files in this item

Name:: yokAcikBilim_10231043.pdf
Size:: 512.3Kb
Format:: PDF
Description:: File_10231043

This item appears in the following Collection(s)

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess