Information extraction and manipulation system for the web sources

Tatar, Serhan

dc.contributor.advisor	Eyler, M. Akif
dc.contributor.author	Tatar, Serhan
dc.date.accessioned	2020-12-10T09:25:22Z
dc.date.available	2020-12-10T09:25:22Z
dc.date.submitted	2002
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/230083
dc.description.abstract	ÖZET: WEB KAYNAKLARINDAN BİLGİ SAĞLANMASI VE MANİPÜLASYONU. The main navigation techniques on the Internet are reaching information by following links and searching with keywords. However, given the enormous size of the Internet, these techniques are not sufficient. Another problem is presenting information that resides in heterogeneous structures in a specific data model and format. In this thesis, a system named WebXtractor is described and developed. The system enables information to be obtained from Web sources and the obtained information to be refined. In particular, it can be used quite effectively for migrating data from the Internet. The main features of WebXtractor are: automatic retrieval and parsing of sources from the Internet; automatic extraction of user-specified information from the sources; source integration; data model design; and fast, easy application development through visual tools. Within the WebXtractor system, three tools were developed so that the user can configure the system easily. The thesis also explains in detail how these tools are used and how applications are developed with WebXtractor. In addition, sample applications illustrating the use of the system were implemented and demonstrated. In the first application, information obtained from a multiple-instance data source on the Web was integrated and presented to the user in the data model and format the user requested. In the second application, information obtained from a source consisting of a single document was presented to the user with only a change of format. Keywords: World Wide Web, Web sources, information extraction. August 2002, Serhan TATAR
dc.description.abstract	ABSTRACT: INFORMATION EXTRACTION AND MANIPULATION SYSTEM FOR THE WEB SOURCES. Following links and running keyword searches are the main navigation techniques on the Internet. However, these techniques are inadequate given the enormous size of the Internet. Another important problem is presenting information that is stored in heterogeneous structures in a specified data model and format. In this thesis, the WebXtractor system is described and developed. The system extracts information from Web sources and refines the extracted information. It can be used efficiently, especially when migrating data from the Web. The main features of WebXtractor are: automatic retrieval and parsing of Web sources; automatic information extraction; source integration; data model design; and easy, rapid application development with the help of visual tools. Three tools were developed within WebXtractor so that the user can configure the system easily. The thesis analyzes this toolkit in detail and explains application development in WebXtractor. Sample applications that show the usability of the system were also implemented and demonstrated. In the first example, data stored on a multiple-instance Web source was integrated, and the integrated information was presented to the user in a user-specified data model and format. In the second example, data stored on a single-instance Web source was presented to the user in a user-specified format. Keywords: World Wide Web, Web sources, information extraction. August 2002, Serhan TATAR	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Information extraction and manipulation system for the web sources
dc.title.alternative	Web kaynaklarından bilgi sağlanması ve manipülasyonu
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Diğer
dc.subject.ytm	WWW
dc.subject.ytm	Information sources
dc.subject.ytm	Internet
dc.identifier.yokid	129745
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	MARMARA ÜNİVERSİTESİ
dc.identifier.thesisid	126562
dc.description.pages	61
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_129745.pdf
Size: 6.302Mb
Format: PDF
Description: File_129745

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess