Compositional representations of language structures in multilingual joint-vector space

Dalaman, Şaban

dc.contributor.advisor	Arslan, Barış
dc.contributor.author	Dalaman, Şaban
dc.date.accessioned	2021-05-08T07:33:17Z
dc.date.available	2021-05-08T07:33:17Z
dc.date.submitted	2018
dc.date.issued	2018-12-10
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/631442
dc.description.abstract	With recent developments in artificial neural networks and deep learning techniques, representation learning has become the focus of much research. In the field of natural language processing (NLP), progress has been made in applying representation learning techniques and, compared with other methods, in solving NLP problems. One of the main research topics in this area is constructing compositional representations of language structures in a joint multilingual space. The aim of this study is to combine several techniques used in deep learning and NLP and to investigate the effect of the representations on NLP applications. For this purpose, four different composition vector models were studied. To construct the representation spaces of language structures such as tokens or morphemes, a parallel corpus was first prepared by tokenization or morphological parsing, and then different hierarchical composition methods were applied through bilingual models. The bilingual models were trained on sentence-aligned corpora prepared for four languages. In this way, the model learns to construct representations of sentence constituents using the compositional vector model. Two test scenarios were used to evaluate the different compositional vector methods. The first is the paraphrase test. In this scenario, the bilingual model is trained using the compositional vector model. Performance is then computed by comparing corresponding sentence pairs selected for the two languages from the parallel corpus. The other test scenario is the supervised document classification test. A classifier trained on documents selected from one language is tested on test documents selected from another language. The documents are labelled as positive or negative for different topics. The classifier learns to separate positive from negative examples.
dc.description.abstract	Following recent developments in artificial neural networks and deep learning techniques, representation learning has become the focus of much research. In the field of Natural Language Processing (NLP), representation learning techniques have seen many advances in implementation and have improved results on a range of tasks compared with other methods. One of the primary research topics in this area is the construction of compositional representations of discrete language structures in a multilingual joint-vector space. In this thesis, several techniques from deep learning and NLP are combined to investigate their potential impact on NLP tasks. For this purpose, four different composition vector models (CVMs), using tokens and morphemes as the basic language structures, are studied. To construct the embedding space of language structures such as tokens and morphemes, a parallel corpus is first prepared by segmenting it into discrete objects via tokenization and morphological analysis. Several hierarchical composition methods are then employed, via bilingual models, to construct the embeddings of these structures. The bilingual models are trained on sentence-aligned corpora for four languages. The models learn how to employ compositional vector models and to construct embeddings of sentence constituents as well. Two test scenarios are used to evaluate the different CVMs. The first is the paraphrase test. In this case, the bilingual models using CVMs are trained on the parallel corpus of each language pair L1-L2 (drawn from English, Turkish, German and French). The models are then tested on how accurately they find the corresponding pairs among 100 randomly selected sentences from each L1-L2 pair. The second test scenario is cross-lingual document classification. In this case, the trained models are used by a document classifier, which is trained on L1 documents and tested on L2 documents, to evaluate their performance on the classification task.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Compositional representations of language structures in multilingual joint-vector space
dc.title.alternative	Çok dilli eklem-vektör uzayda dil yapılarının bileşim temsili
dc.type	masterThesis
dc.date.updated	2018-12-10
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Natural language processing
dc.identifier.yokid	10199389
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	İSTANBUL ŞEHİR ÜNİVERSİTESİ
dc.identifier.thesisid	522506
dc.description.pages	50
dc.publisher.discipline	Bilgisayar Mühendisliği Bilim Dalı
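
The paraphrase test described in the abstract can be illustrated with a minimal sketch. The record does not name the four composition models or the similarity measure used in the thesis, so the code below assumes simple additive composition and cosine similarity purely for illustration. The function names and the toy two-dimensional embeddings are hypothetical and are not taken from the thesis.

```python
import math

def compose_add(token_vectors):
    """Sum token (or morpheme) vectors element-wise into one sentence vector.

    Additive composition is an assumption made for this sketch.
    """
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def paraphrase_accuracy(l1_sentences, l2_sentences):
    """Fraction of L1 sentences whose nearest L2 sentence is the aligned one.

    Both arguments are lists of sentence vectors; index i in one list is
    assumed to be the translation of index i in the other.
    """
    hits = 0
    for i, src in enumerate(l1_sentences):
        best = max(range(len(l2_sentences)),
                   key=lambda j: cosine(src, l2_sentences[j]))
        hits += best == i
    return hits / len(l1_sentences)

# Toy embeddings with made-up numbers (hypothetical, not from the thesis).
en = {"red": [1.0, 0.0], "car": [0.0, 1.0], "sky": [-1.0, 0.5]}
tr = {"kırmızı": [0.9, 0.1], "araba": [0.1, 0.9], "gökyüzü": [-0.9, 0.6]}

l1 = [compose_add([en["red"], en["car"]]), compose_add([en["sky"]])]
l2 = [compose_add([tr["kırmızı"], tr["araba"]]), compose_add([tr["gökyüzü"]])]
accuracy = paraphrase_accuracy(l1, l2)
```

In the setting the abstract describes, the two lists would each hold 100 sentence vectors drawn from one L1-L2 pair, and the embeddings would come from the trained bilingual model rather than from hand-written values.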

Files in this item

Name: yokAcikBilim_10199389.pdf
Size: 1.292Mb
Format: PDF
Description: File_10199389

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess