An infrastructure model for collecting electronic data to develop large scale corpus

Kizilay, Fatma

dc.contributor.advisor	Çebi, Yalçın
dc.contributor.author	Kizilay, Fatma
dc.date.accessioned	2021-05-01T14:20:32Z
dc.date.available	2021-05-01T14:20:32Z
dc.date.submitted	2009
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/560123
dc.description.abstract	Dokuz Eylül Üniversitesi Bilgisayar Mühendisliği Bölümünde, Doğal Dil İşleme alanında farklı çalışmalar yürütülmektedir. Doğal Dil İşleme çalışmalarında dilin dilbilgisi kuralları belirlenmeli ve derlem olarak adlandırılan metin örnekleri hazırlanmalıdır. Bu örnekler dilin dilbilgisi kurallarını karşılamak zorundadır. Bu çalışmada, büyük ölçekli derlem için altyapı tasarlanmış ve gerçekleştirilmiştir. Gazete, rapor, dergi, kitap, meclis tutanağı ve resmi gazete gibi 6 farklı doküman tipini destekleyen bir veri tabanı modeli tasarlanmıştır. Veri tabanı modeline bağlı olarak gerçekleştirilen uygulama ile 5 gazeteden 195256 makale indirilmiştir ve bu dokümanların üst verileri daha sonra yapılacak çalışmalar için depolanmıştır.
dc.description.abstract	In the Computer Engineering Department of Dokuz Eylül University, various studies on Natural Language Processing (NLP) have been carried out. NLP research requires that the grammatical rules of a language be determined and that a sample of texts in that language, called a corpus, be prepared. These sample texts should satisfy the grammar rules of the language. In this study, an infrastructure for a large-scale corpus is designed and implemented. A database model is designed that supports 6 document types: newspaper, report, magazine, book, parliamentary report, and official gazette. Using an application built on this database model, 195256 articles were downloaded from 5 newspapers, and their metadata was stored for future use.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	Attribution 4.0 International	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	An infrastructure model for collecting electronic data to develop large scale corpus
dc.title.alternative	Büyük ölçekli derlem geliştirmek amacıyla elektronik veri toplamak için bir altyapı modeli
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Natural language processing
dc.identifier.yokid	354403
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	DOKUZ EYLÜL ÜNİVERSİTESİ
dc.identifier.thesisid	276572
dc.description.pages	94
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_354403.pdf
Size: 3.750Mb
Format: PDF
Description: File_354403

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess