Word sense disambiguation based on sense similarity and syntactic context

Mutlum, Başak

Date

2005

Abstract

M.S. Thesis Abstract Form (translated from the Turkish form)

Name of the Student: Başak Mutlum
Department: Computer Engineering
Thesis Title: Sözdizim ve Anlam Benzerliğine Dayalı Sözcük Anlamı Belirleme (Word Sense Disambiguation Based on Syntax and Sense Similarity)

Abstract

Word sense disambiguation is the task of assigning a sense to an ambiguous word according to the context in which it appears. It remains an unsolved problem, and an effective solution is needed to meet the requirements of other natural language processing methods. Word sense disambiguation studies to date have tried both supervised and unsupervised learning algorithms; supervised methods have produced the more successful results. However, because the first sense heuristic and supervised methods have reached their natural limits, unsupervised methods should be examined in more detail.

In this thesis, an unsupervised learning algorithm based on sense similarity and syntax is used. The algorithm relies on the intuition that two different words have similar meanings if they are used in similar local contexts. In the training phase, a training corpus of 100 million words was parsed and local-context features were extracted according to a set of rules. Similarity values were computed between the ambiguous words and the words that occur in similar contexts, and the senses of the words were determined with the help of a maximization algorithm. The performance of the system was tested on SENSEVAL-2 and SENSEVAL-3 data, and an accuracy of 59% was obtained.

Advisor: Deniz Yüret
Date: 22.09.2005
Institute Director:
Date:

M.S. Thesis Abstract Form

Name of the Student: Basak Mutlum
Program of Study: Computer Engineering
Thesis Title: Word Sense Disambiguation Based on Sense Similarity and Syntactic Context

Abstract

Word Sense Disambiguation (WSD) is the task of determining the meaning of an ambiguous word within a given context. It is an open problem that has to be solved effectively in order to meet the needs of other natural language processing tasks. Supervised and unsupervised algorithms have been tried throughout the WSD research history. Up to now, supervised systems achieved the best accuracies. However, these systems with the first sense heuristic have come to a natural limit. In order to make improvement in WSD, benefits of unsupervised systems should be examined.

In this thesis, an unsupervised algorithm based on sense similarity and syntactic context is presented. The algorithm relies on the intuition that two different words are likely to have similar meanings if they occur in similar local contexts. With the help of a principle-based broad coverage parser, a 100-million-word training corpus is parsed and local context features are extracted based on some rules. Similarity values between the ambiguous word and the words that occurred in a similar local context as the ambiguous word are evaluated. Based on a similarity maximization algorithm, polysemous words are disambiguated. The performance of the algorithm is tested on SENSEVAL-2 and SENSEVAL-3 English all-words task data and an accuracy of 59% is obtained.

Advisor: Deniz Yuret
Date: 22.09.2005
Director:
Date:
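
The pipeline outlined in the abstract (extract local-context features from parsed text, find words that share contexts with the ambiguous word, then pick the sense that maximizes similarity to those words) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the abstract does not specify the parser output format, the feature rules, or the similarity measure, so the toy dependency triples, the cosine measure, the `SENSE_SIM` table, and all function names below are assumptions chosen for illustration only.

```python
from collections import defaultdict
from math import sqrt

# Toy (head, relation, dependent) triples standing in for parser output.
# All data here is invented for illustration, not taken from the thesis.
TRIPLES = [
    ("drink", "obj", "water"),
    ("pour", "obj", "water"),
    ("sit", "on", "bank"),
    ("sit", "on", "shore"),
    ("walk", "along", "bank"),
    ("walk", "along", "shore"),
    ("rob", "obj", "bank"),
]

# Hypothetical similarity between each candidate sense and a neighbor word.
# A real system would derive this from a sense inventory.
SENSE_SIM = {
    "bank#river": {"shore": 0.9},
    "bank#finance": {"shore": 0.1},
}


def context_features(triples):
    """Map each dependent word to counts of its (head, relation) contexts."""
    feats = defaultdict(lambda: defaultdict(int))
    for head, rel, dep in triples:
        feats[dep][(head, rel)] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed measure)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_words(target, feats):
    """Return words that share at least one local context with the target."""
    target_vec = feats.get(target, {})
    scores = {}
    for word, vec in feats.items():
        if word == target:
            continue
        score = cosine(target_vec, vec)
        if score > 0:
            scores[word] = score
    return scores


def disambiguate(target, feats, sense_sim):
    """Pick the sense with the highest context-weighted similarity sum."""
    neighbors = similar_words(target, feats)

    def total(sense):
        return sum(w * sense_sim[sense].get(n, 0.0) for n, w in neighbors.items())

    return max(sense_sim, key=total)


feats = context_features(TRIPLES)
neighbors = similar_words("bank", feats)
best = disambiguate("bank", feats, SENSE_SIM)
print(neighbors, best)
```

In this toy data, "shore" is the only word that shares local contexts with "bank", so the river sense receives the higher score. The weighting of sense similarity by distributional similarity is one plausible reading of "similarity maximization", not a claim about the method actually used in the thesis.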

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/171434

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess