Bağlı veri üzerinde dağıtık sorgulama optimizasyonu

Özkan, Ethem Cem

dc.contributor.advisor	Doğdu, Erdoğan
dc.contributor.author	Özkan, Ethem Cem
dc.date.accessioned	2021-05-08T11:21:49Z
dc.date.available	2021-05-08T11:21:49Z
dc.date.submitted	2015
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/683166
dc.description.abstract	SPARQL anlamsal ağın (semantik web) standart sorgulama dilidir ve büyük anlamsal ağ veri kaynakları olan `bağlı veri` kaynaklarını sorgulamada kullanılmaktadır. SPARQL dağıtık sorgular yazılarak, dağıtık bağlı veri kaynaklarını sorgulamak içinde kullanılır. Bu işlemde sorgu veya alt sorguları farklı veri kaynaklarında çalıştırılır ve sonuçlar sorgunun sonucu olarak birleştirilir.Bu tezde, `biricik yüklem veri kaynağı eleme` (unique predicate source pruning) (UPSP) adlı dağıtık SPARQL sorgusunda veri kaynağı seçen bir algoritma önerisi öneriyoruz. Algoritmanın amacı dağıtık SPARQL sorgusu çalıştırılmadan önce ilgili bağlı veri kaynaklarını bulmaktır. Bu sayede sorgu tüm veri kaynaklarına gönderilmek yerine, sorgu ile alakalı veri bulunduran dolayısı ile sorguya katkı sağlayabilecek veri kaynaklarına gönderilebilecektir. Önerdiğimiz algoritma, öncelikle sorgudaki yıldız, yol, alıcı ve hibrit adı verilen alt sorgu tiplerini eşleştirmektedir. Daha sonra sorgudaki tüm düğümler için özne-özne, özne-nesne, nesne-özne, nesne-nesne adı verilen uygun biricik yüklem tiplerini kontrol etmektedir. Eğer algoritma uygun biricik yüklem tipi ve alt sorgu tiplerini bulursa harici veri kaynaklarını elemektedir.UPSP algoritması, önceden çevrim dışı oluşturulmuş dizin yapısı kullanmaktadır. Bu dizin yapısı bu alanda daha önce yapılmış olan Hibiscus çalışması ile uyumlu olacak şekilde tasarlanmıştır. Hibiscus dizin yapısına her biricik yüklem tipi için bir tane olmak üzere dört adet isteğe bağlı alan eklenmiştir.UPSP algoritması, açık kaynak dağıtık sorgulama motoru olan Hibiscus üzerine gerçekleştirilmiştir. Algoritma, Hibiscus veri kaynağı eleme algoritmasından hemen önce çalışmaktadır.Algoritmanın performansı, FedBench test aracı kullanılarak orijinal Hibiscus veri kaynağı eleme yöntemi ile karşılaştırıldı. Sonuçlar algoritmanın veri kaynağı seçimini bazı durumlarda %20'ye kadar iyileştirdiğini göstermektedir.
dc.description.abstract	SPARQL is the standard query language of the semantic Web and it is used to query linked data sources which are big semantic Web data sources. SPARQL can also be used to query `distributed` linked data sources by writing federated SPARQL queries in which case query or its sub queries are executed in separate sites and the results are combined and returned as the result of the query. In this thesis, we propose a new algorithm called `unique predicate source pruning` (UPSP) that reduces the federated SPARQL query execution time. The idea behind the algorithm is to find all relevant distributed linked data sources before executing federated SPARQL queries. This way the query is not sent to all data sources but only to the linked data sources that have data relevant to the query and therefore might return results. UPSP algorithm checks the sub query patterns in the query being processed first, looks for `star`, `path`, `hybrid`, `sink` patterns. For each node UPSS algorithm checks appropriate unique predicate types which are subject-subject, subject-object, object-subject and object-object. If UPSP algorithm finds appropriate unique predicate type for query pattern it prunes all external sources.UPSP algorithm uses an index structure that is built offline before the algorithm executes. UPSP algorithm index structure is designed to be compatible with Hibiscus index that was proposed in the literature before. UPSP algorithm index has four more optional fields which are for each unique predicate types.We implemented UPSP algorithm on Hibiscus federated query engine which is an open source federated SPARQL query engine. UPSS algorithm executes just before Hibiscus pruning algorithm. We evaluated UPSP using FedBench benchmark. We compared the performance of the algorithm against standard Hibiscus source selection. The results show that algorithm improves source pruning up to 20% in some cases.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Bağlı veri üzerinde dağıtık sorgulama optimizasyonu
dc.title.alternative	Federated query optimization on linked data
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.identifier.yokid	10073578
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	TOBB EKONOMİ VE TEKNOLOJİ ÜNİVERSİTESİ
dc.identifier.thesisid	387568
dc.description.pages	108
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10073578.pdf
Size:: 2.902Mb
Format:: PDF
Description:: File_10073578

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess