Identification of multiword expressions in Turkish based on web data

Aka Uymaz, Hande

dc.contributor.advisor	Kumova Metin, Senem
dc.contributor.author	Aka Uymaz, Hande
dc.date.accessioned	2021-05-08T07:52:36Z
dc.date.available	2021-05-08T07:52:36Z
dc.date.submitted	2016
dc.date.issued	2019-01-10
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/635781
dc.description.abstract	Multiword expressions are recurrent combinations of words in natural languages that together form a unit of meaning. Identifying multiword expressions in texts is very important for many natural language processing applications (natural language generation, computational lexicography, machine translation, etc.). Occurrence frequency based methods (joint probability, pointwise mutual information, mutual dependency, etc.) are frequently used to identify multiword expressions. The biggest disadvantage of these methods is that identification performance depends on the size of the data source in which the frequency is measured. The aim of this thesis is to obtain occurrence frequencies from the web, the largest known data source, in order to avoid the problems caused by small data sets. In this thesis, the performance of frequency based multiword expression identification methods for Turkish is investigated using two different candidate data sets. The occurrence frequencies of the candidates in the data sets are obtained using the popular search engine Google. The number of pages returned (page count) when a candidate multiword expression is sent to the search engine as a query is taken as the candidate's occurrence frequency. The success of the 20 methods used is evaluated with recall, precision and F-measure. The performance of web-based frequency information in identifying multiword expressions is compared with traditional corpus-based frequency, and the use of web data in identifying multiword expressions shows promising results. Keywords: multiword expression, frequency based methods, web data.
dc.description.abstract	Multiword expressions (MWEs) are recurrent combinations of words in natural languages. The extraction of MWEs in a text is significant for a number of natural language processing applications (e.g. natural language generation, computational lexicography, machine translation etc.). There are various occurrence frequency based methods (e.g. joint probability, pointwise mutual information and mutual dependency) that are used frequently for MWE extraction ([12],[13]). The major disadvantage of these methods is that extraction performance depends mainly on the size of the data set in which the occurrence frequency is measured. The main goal of this thesis is to obtain the frequency from a massive data source, the World Wide Web, in order to bypass the negative effect of small data sets. In this thesis, we applied frequency based MWE extraction methods on two Turkish MWE data sets. The occurrence frequencies of MWE candidates in the data sets are obtained from the popular search engine Google. The page counts retrieved when the candidates are sent as queries to Google are employed as the occurrence frequencies. The evaluation of the 20 frequency based methods is performed by precision, recall and F-measures. The performance of web-based frequencies in identification of MWEs is compared to the traditional corpus based frequencies, and it is shown that the use of web data in identification of MWEs reveals promising results. Keywords: Multiword expression, frequency based methods, web data.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Identification of multiword expressions in Turkish based on web data
dc.title.alternative	Web verisi kullanılarak Türkçe çok sözcüklü ifadelerin belirlenmesi
dc.type	masterThesis
dc.date.updated	2019-01-10
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Computerized linguistics
dc.subject.ytm	Text linguistics
dc.subject.ytm	Corpus linguistics
dc.subject.ytm	Natural language
dc.identifier.yokid	10119360
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	İZMİR EKONOMİ ÜNİVERSİTESİ
dc.identifier.thesisid	434360
dc.description.pages	55
dc.publisher.discipline	Diğer
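
The abstract describes scoring MWE candidates with frequency based measures computed from search engine page counts, and evaluating the result with precision, recall and F-measure. The following is a minimal sketch of one such measure (pointwise mutual information) and of the evaluation step. It is not the thesis's implementation: the function names, the counts passed in, and the `total` parameter (a stand-in for the unknown number of indexed pages) are all assumptions made for illustration.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information of a two-word candidate.

    count_xy, count_x, count_y are occurrence counts (for example page
    counts) of the phrase and of each word; total is an ASSUMED size of
    the data source, since a search engine does not report it exactly.
    """
    if min(count_xy, count_x, count_y) <= 0:
        # Undefined for zero counts; treat as the lowest possible score.
        return float("-inf")
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def precision_recall_f1(predicted, gold):
    """Precision, recall and F-measure of predicted MWEs against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

In use, candidates would be ranked by their score and the top-ranked ones compared with the annotated MWEs of a data set, for example `precision_recall_f1(top_candidates, annotated_mwes)`, where both argument names are hypothetical.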

Files in this item

Name:: yokAcikBilim_10119360.pdf
Size:: 3.472Mb
Format:: PDF
Description:: File_10119360

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess