Belge derlemlerinde sorgu terimlerinin frekans dağılımlarının analizi ve sorguya göre en uygun terim ağırlıklandırma modelinin seçimi

Arslan, Ahmet

dc.contributor.advisor	Dinçer, Bekir Taner
dc.contributor.author	Arslan, Ahmet
dc.date.accessioned	2021-05-06T12:42:34Z
dc.date.available	2021-05-06T12:42:34Z
dc.date.submitted	2016
dc.date.issued	2019-10-05
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/589163
dc.description.abstract	Many term-weighting models have been developed for information retrieval. However, the effectiveness of every term-weighting model is high for some queries and low for others: the problem of robustness in effectiveness. On the other hand, when one term-weighting model performs poorly on a query, the other term-weighting models need not perform poorly on it as well: for any given query, it may be possible to find, among existing technologies, a term-weighting model that delivers a satisfactory level of effectiveness. In other words, answering every query that reaches the system with a single term-weighting model may not be appropriate for satisfying users' information needs as fully as possible. It is an empirical fact that using a suitable term-weighting model for each individual query, instead of a single term-weighting model for all queries, increases information retrieval effectiveness by an order of magnitude. However, automatically selecting, from among today's most advanced known models, the model that will provide the best effectiveness for a given query is still an unsolved and difficult research problem. In the field of selective information retrieval, this task is generally referred to as selective term weighting or selective weighting function. In this PhD dissertation, a novel statistical/probabilistic approach to the selective term weighting task, based on the frequency distributions of query terms over collections, is investigated. A term-weighting model that works well for one query may not work well for another. We cannot determine in advance the term-weighting model with which a given query will work best. We know very little about the query and collection characteristics that affect the effectiveness of term-weighting models. This PhD dissertation aims to shed at least some light on this mystery. The data and code needed to repeat and reproduce all the experimental results presented in this dissertation are available online.
dc.description.abstract	Many term-weighting models have been proposed for information retrieval, but the effectiveness of each term-weighting model varies across queries (i.e., information needs of users). Thus, using a single term-weighting model to process all kinds of queries may not be appropriate for fulfilling every information need of users. It is an empirical fact that using different term-weighting models for different queries, instead of a single term-weighting model, could increase information retrieval effectiveness by an order of magnitude. However, for any given query, automatically selecting the term-weighting model that could provide the highest achievable retrieval effectiveness in the current state of the art of information retrieval technology is still an open and challenging research problem. In the field of selective information retrieval, this issue is generally referred to as selective term weighting, selective weighting function, or selective retrieval model. In this PhD dissertation, we investigate a novel statistical/probabilistic approach to the selective term weighting problem, based on the frequency distributions of query terms on document collections. A term-weighting model that works well for one query may not work well for another. We are not capable of determining or justifying in advance the best term-weighting model to use with a given query. We know little of the characteristics of queries and document collections that affect the effectiveness of term-weighting models. This PhD dissertation aims to shed some light on this mystery by analyzing the frequency distributions of query terms on document collections. All the results presented in this dissertation are fully repeatable and reproducible with data and code available online.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Belge derlemlerinde sorgu terimlerinin frekans dağılımlarının analizi ve sorguya göre en uygun terim ağırlıklandırma modelinin seçimi
dc.title.alternative	Analysis of the frequency distributions of query terms on document collections & per-query selection of best term weighting model
dc.type	doctoralThesis
dc.date.updated	2019-10-05
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Information access
dc.subject.ytm	Text retrieval
dc.identifier.yokid	10127006
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ANADOLU ÜNİVERSİTESİ
dc.identifier.thesisid	446758
dc.description.pages	137
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_10127006.pdf
Size: 2.129Mb
Format: PDF
Description: File_10127006

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess