A semi-automated text classification and code organization system for academic papers

Öztürk, Alican

View/Open

File_10096887 (1.409Mb)

Date

2015

Author

Öztürk, Alican

Metadata

Show full item record

Abstract

Bu tezde, yerel olarak girilmiş 'kodlar'ı (belgedeki anahtar kelimeler) kullanarak, belgeye, kullanıcıya ait bir başlığın atanması için WordNet'in bağlantılarının (synsetlerini ve hypernymlerini) kullanılması amaçlanmıştır. WordNet veritabanı; kelimelerin anlamlarını içermesinin yanı sıra, bu kelime ile alakalı olan alt kelimeleri, kapsayıcı kelimeleri, eş anlamlı sözleri, eşsesli sözleri ve meronimleri içeren zekice bir araya getirilmiş bir sözlüktür. Bütün bu kelimeler birbirine bir ağ yapısı aracılığı ile, aralarında yukarıda belirtilmiş ilişkiler ile bağlıdır. Bir 'kod' kümesinin içindeki kelimelerin, ikililer halinde WordNet üzerinde aralarındaki mesafeyi ölçerek ve buradan yüksek değer olarak sınıflandırılanların da kapsayıcı kelimelerini zenginleştirme amaçlı kullanarak, sonuçta bütün dokümanın konusunu kapsayabilecek potansiyel başlık olabilen anahtar kelimeler elde edilebilmektedir. Sisteme girilen kodlar kişinin tercihleri ve belgeye bakış açısına göre değişmektedir, bu nedenle aynı belgeden elde edilen iki sonucun birbirinden tamamen farklı olması mümkündür. Bunun amacı, genel bir başlık sunmak yerine, başlığı kullanıcının ilgilendiği konuya göre kişiselleştirmektir.Bu projede kelimeler arası benzerliği bulmak için JWS ve kelimelerin anlamlarının seçimi, hypernymlerin elde edilmesi için RitaCore'dan Rita WordNet Java kütüphaneleri kullanılmıştır.

In this thesis, the aim is to use the locally entered `codes` (keywords in the document) to determine what the users' associated topic with that document corresponds to via WordNet's connections, synsets and hypernyms. WordNet has a neatly arranged structure that not only includes meaning for each sense of the word but also all the other words associated with it, in forms of hyponyms, hypernyms, synonyms, holonyms and meronyms. All of these words are connected in a network structure with appropriate links in between. By using the distance between the words to calculate the similarities between each pair of words inside a code cluster and enriching them with the hypernyms of high value nodes, it is possible to obtain a list of possible words that can be associated as topic keywords for the document itself. Since the codes entered into the system differ by the users' preferences and point of view on the document, it is highly possible for two instances to have completely different topics derived from the same document. The purpose of this is to personalize the topic according to the users' interest in the document instead of the presenting a generic topic about it.The project uses the Java library JWS to find the similarity between words and RitaWordNet from RitaCore to extract meanings and hypernyms of the words to select proper senses.

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/698648

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess