Disambiguation of named entities using linked data

Hakimov, Şerzod

Date

2013

Author

Hakimov, Şerzod

Abstract

Identifying the names of persons, organizations, places, and similar entities in natural-language text is called named entity recognition. Ambiguity arises when entities of different types share the same or similar names. To resolve it, the correct entity must be selected from among the candidates; that is, the name must be disambiguated. Humans solve this problem by drawing on the content of the text they are reading. For software to do the same, it must understand the content of the text. Named entity disambiguation has been, and continues to be, the subject of much research. Recent projects that use encyclopedic data such as Wikipedia have achieved high success rates in this area, and research is under way on how semantic databases such as DBpedia, YAGO, and Freebase can contribute. This thesis presents a technique we developed for named entity disambiguation using linked data and a graph centrality algorithm. The system we developed is called NERSO. It was tested and evaluated on open data sets and compared with other named entity recognition and disambiguation projects; the results are presented in this thesis. According to our results, NERSO gives results comparable to or better than those of the other tools.

Named entity recognition is the task of identifying named entities such as persons, organizations, and locations in natural language texts. One problem in named entity recognition is ambiguity: different entities can have the same or similar names. Resolving it requires choosing the right entity among a number of possible entities with the same or similar names. Human readers do this naturally when reading a text because they are aware of the context. For software to do this, a "named entity disambiguation" task must be executed to decide on the right entities. Many techniques have been developed for named entity recognition and disambiguation in the literature. A recent approach to these tasks uses open knowledge bases such as Wikipedia and, more recently, structured linked data counterparts derived from Wikipedia and similar knowledge bases, such as DBpedia, YAGO, and Freebase. In this thesis, we present a named entity disambiguation technique based on Linked Open Data and a graph centrality algorithm. The system we developed is called NERSO. We evaluated our system using publicly available data sets, compared its performance with other named entity recognition and disambiguation (or annotation) tools, and present the results in this thesis. Our results show that NERSO performs comparably to or better than the other tools.
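
The abstract does not say which centrality measure NERSO uses or how its graph is built, so the following is only a minimal sketch of the general idea, not the thesis implementation. It assumes degree centrality as a stand-in measure, and the function name `disambiguate`, the `ex:` identifiers, and the toy links are all hypothetical. Each candidate entity is a node, a link between candidates of different mentions is an edge, and the best-connected candidate is chosen for each mention.

```python
from collections import defaultdict


def disambiguate(candidates, links):
    """Pick one entity per mention by degree centrality (an assumed measure).

    candidates: dict mapping a mention string to a list of candidate entity IDs.
    links: iterable of (entity_a, entity_b) pairs standing in for relations
           that would come from a linked-data source.
    """
    # Map each candidate entity to the mention it belongs to.
    # Simplification: an entity is assumed to be a candidate for one mention only.
    owner = {e: m for m, ents in candidates.items() for e in ents}

    # Count edges that connect candidates of *different* mentions.
    degree = defaultdict(int)
    for a, b in links:
        if a in owner and b in owner and owner[a] != owner[b]:
            degree[a] += 1
            degree[b] += 1

    # For each mention, keep the candidate with the highest degree.
    # On ties, max() keeps the first candidate in the list.
    return {
        m: max(ents, key=lambda e: degree[e])
        for m, ents in candidates.items()
        if ents
    }


# Toy data (hypothetical identifiers, not real DBpedia URIs).
toy_candidates = {
    "Paris": ["ex:Paris_Hilton", "ex:Paris_city"],
    "France": ["ex:France"],
    "Seine": ["ex:Seine"],
}
toy_links = [
    ("ex:Paris_city", "ex:France"),
    ("ex:Paris_city", "ex:Seine"),
    ("ex:Seine", "ex:France"),
]

result = disambiguate(toy_candidates, toy_links)
print(result["Paris"])
```

In this toy graph the city candidate is linked to both other mentions while the person candidate has no links, so the city is selected. A real system would draw its links from a source such as DBpedia and could substitute a different centrality measure.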

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/683343

Collections

Theses

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess