Automatic knowledge extraction for filling in biography forms from Turkish texts

Demirci, İlknur

Date

2009

Abstract

This study describes how a system for automatic knowledge extraction was built to fill in biography forms from Turkish texts. To improve the efficiency of the work and the quality of the results, six biography categories were chosen for analysis: Presidents, Politicians, Authors, Poets, Actors, and Singers, which are the biography types most frequently read by users. Analysis of these biographies showed that six fields receive the most emphasis: Date of Birth, Date of Death, Education, Experience, Contributions, and Awards.

Information for these fields is extracted with rules built from regular expressions. Each rule is tailored to the structure of the data expected for its field, and the rules are then run on Turkish texts to extract the information for each field. A separate testing platform was designed to measure the accuracy of the extracted data. Its results suggest that automatic biography form filling is a promising approach that can be developed further, especially for forms in Turkish.
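The rule-based extraction described in the abstract can be illustrated with a minimal sketch. The patterns, field names, and helper functions below (`RULES`, `fill_form`, `accuracy`) are hypothetical and written only for illustration; they are not the rules or the testing platform developed in the thesis, and they cover just two of the six fields.

```python
import re

# Hypothetical rules: illustrative assumptions, not the thesis's actual rule set.
MONTHS = "Ocak|Şubat|Mart|Nisan|Mayıs|Haziran|Temmuz|Ağustos|Eylül|Ekim|Kasım|Aralık"
# A four-digit year, optionally preceded by a day and a Turkish month name.
DATE = rf"(?:\d{{1,2}}\s+(?:{MONTHS})\s+)?\d{{4}}"
# Locative suffix ('de, 'da, 'te, 'ta) or the phrase "yılında" ("in the year").
SUFFIX = r"(?:'[dt][ae]|\s+yılında)"

RULES = {
    # Date followed, within the same sentence, by a verb meaning "was born".
    "date_of_birth": re.compile(rf"({DATE}){SUFFIX}[^.]*?\bdoğ(?:du|muştur)"),
    # Date followed, within the same sentence, by a verb meaning "died".
    "date_of_death": re.compile(rf"({DATE}){SUFFIX}[^.]*?\b(?:öldü|vefat etti)"),
}


def fill_form(text):
    """Run every rule on the text; fields with no match stay None."""
    form = {}
    for field, rule in RULES.items():
        match = rule.search(text)
        form[field] = match.group(1) if match else None
    return form


def accuracy(extracted, expected):
    """Fraction of expected fields whose extracted value matches exactly."""
    correct = sum(extracted.get(field) == value for field, value in expected.items())
    return correct / len(expected)


sample = (
    "Mustafa Kemal Atatürk 1881 yılında Selanik'te doğdu. "
    "10 Kasım 1938'de İstanbul'da öldü."
)
print(fill_form(sample))
```

Comparing the filled form against hand-labelled values, as `accuracy` does here, is one simple way to score such rules. Fields such as Education or Awards would likely need richer patterns than a single date expression.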

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/616053

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess