Discovery of frequent item set in peer-to-peer networks

Çem, Emrah

View/Open

File_383607 (814.4Kb)

Date

2010

Author

Çem, Emrah

Metadata

Show full item record

Abstract

Bir çok görevdeş ağ uygulaması sistemdeki düğümlerde kısmi olarak bulunan veri erişim sıklığı, öğe sıklığı, sorgu ve olay sayısı gibi sistem genelindeki bilgiye ihtiyaç duyarlar. Dağıtık bir sistemde sık bulunan öğelerin belirlenmesi problemi sistem genelindeki bir bilginin hesaplanmasını gerektiren yaygın bir problemdir. Sistem genelindeki sıklığı belirli bir eşik değerinin üstünde bulunan öğelere sık bulunan veya yaygın öğeler denir. Günümüz P2P ağlarında, sistem genelindeki sık öğelerin bilgisine ihtiyaç duyan uygulamaların sayısı hızla artmaktadır. Bu yüzden, sık bulunan öğelerin verimli bir şekilde tespit edilmesi, bir çok görevdeş ağ uygulamalarında, düğümler için değerli bir servistir. Ayrıca, bu servis dağıtık veri tabanı uygulamalarında, önbellek yönetiminde, veri yineleme yöntemlerinde, algılayıcı ağlarda ve güvenlik mekanizmalarının önemli olduğu sistemlerde de kullanılabilir.Bu tezde, ProFID olarak adlandırdığımız, yapılandırılmamış görevdeş ağlarda sık bulu-nan öğelerin belirlenmesi için epidemik tabanlı yöntemi kullanan dağıtık bir yöntem sunulmuştur. Ortalama yönteminin epidemik tabanlı yöntem ile birlikte sık bulunan öğelerin tespit edilmesinde kullanılması ve pratik yakınsama yöntemi önerdiğimiz yöntemin yenilikçi ve yararlı özellikleridir. Bu konudaki çalışmalara katkılarımızı şu şekilde sıralayabiliriz. İlk olarak, sık bulunan öğe setinin bütün düğümlerde hesaplandığı tam dağıtık bir yöntem önerisi sunulmuştur. Bu yöntem yapılandırılmamış ağlarda ikili gruplar halinde ortalama yöntemi ile ağ boyutu tahminini kullanarak sık öğeleri tespit eden ilk yöntemdir. İkinci olarak, algoritmanın sonlandırılması için pratik yakınsama yöntemi sunulmuştur. Bu yöntem ile düğümler yerel durumlarındaki değişimleri değerlendirerek, diğer düğümlerden bağımsız olarak yakınsama kararı alırlar. Ayrıca, PeerSim simülatörü ile elde edilen kapsamlı sonuçlar kullanılarak önerilen yöntemin verimliliği, ölçeklenebilirliği ve uygulanabilirliği ölçülmüştür. Son olarak, ProFID ile uyarlanmış Push-Sum yöntemleri karşılaştırılmıştır. Bu karşılaştırma sonuçlarında, ProFID hem doğruluk hem de ölçeklenebilirlik açısındanuyarlanmış Push-Sum yöntemine göre daha iyi sonuçlar vermiştir.

Several P2P applications require a global view of system information such as data access frequencies, item frequencies, query and event counts, that are available locally and partially at peers. Frequent item set discovery (FID) in a distributed environment is a common problem requiring global information computation. Items that globally occur more than a threshold value are referred as frequent or popular and the number of diverse applications that need globally frequent items is increasing expeditiously in today's P2P networks. Therefore, efficiently discovering frequent items would be a valuable service for peers. Being significant for P2P systems, FID problem is also applicable to distributed database applications, cache management, data replication, sensor networks, and security mechanisms in which identifying frequently occurring items in the entire system is useful.In this thesis, we propose and develop a gossip-based distributed approach, namely ProFID, for discovering frequent items in unstructured P2P networks. In contrast to the prior studies, our solution progresses in a fully distributed manner using an atomic averaging function to discover frequent items. Utilizing averaging function with gossip-based aggregation in frequent item set discovery problem and a practical convergence rule are novel and beneficial features of our approach. We make the following contributions to the current state of the art. First, we propose a fully distributed Protocol for Frequent Item Discovery (ProFID) where the result is produced at every peer. ProFID uses a novel pairwise averaging function and network size estimation together to discover frequent items in an unstructured P2P network. We also propose a practical rule for convergence of the algorithm. In contrast to previous works, each peer gives local decision for convergence based on the change of updated local state. Moreover, we developed a model of ProFID in PeerSim and performed various experiments to compare and evaluate its efficiency, scalability, applicability. Finally, we compared the accuracy and scalability of ProFID with adaptive Push-Sum algorithm. The comparison results show that ProFID outperforms adaptive Push-sum in terms of accuracy, convergence speed and message overhead.

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/170060

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess