Yoğunluk tabanlı kümeleme metodları kullanılarak paralel veri madenciliği gerçekleştirilmesi

Sever, Süleyman Zafer

dc.contributor.advisor	Bilgin, Turgay Tugay
dc.contributor.author	Sever, Süleyman Zafer
dc.date.accessioned	2021-05-08T09:05:10Z
dc.date.available	2021-05-08T09:05:10Z
dc.date.submitted	2010
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/662475
dc.description.abstract	Bu tez çalışmasında DBSCAN ve OPTICS algoritmaları, yazarlarının makalelerinde ortaya koyduğu sözde kodlar temel alınarak kodlanmıştır. Tek bilgisayar üzerinde gerçekleştirilen çalışmalarda komşuluk sorgularının çok zaman aldığı görülmüş ve bu sürenin kısaltılması için R*-Tree veri yapısı kullanılmıştır. DBSCAN algoritmasının paralelleştirilmesi için LAM/MPI kütüphanesi kullanılmıştır. DBSCAN uygulamasının en çok zaman harcayan kısmı olan komşuluk sorguları, LAM/MPI yardımı ile tüm bilgisayarlara eşit şekilde paylaştırılarak yapılmıştır. 3 farklı veri seti ile gerçekleştirilen testlerde DBSCAN algoritmasının paralelleştirmeye elverişli olduğu ve paralel çalışan DBSCAN'in Amdahl Kanunu'na uygun olarak çalışma süresinin kısaldığı, bununla birlikte küme oluşturma performansının ve kalitesinin etkilenmediği görülmüştür.Toplam 6 bölümden oluşan tezin birinci bölümünde genel kavramlardan bahsedilmiştir. İkinci bölümde veri madenciliğinin genel tanımı, uygulama alanları, veri madenciliği süreci ve veri madenciliği tekniklerinden bahsedilmiştir. Üçüncü bölümde kümeleme analizinin türlerinden, yoğunluk tabanlı kümeleme yöntemleri ağırlıklı olmak üzere bahsedilmiştir. Dördüncü bölümde paralel hesaplamanın amacı, paralel bilgisayar bellek mimarileri, paralel programlama modelleri ve paralel program tasarımından bahsedilmiştir. Beşinci bölümde paralel DBSCAN uygulamasının geliştirilme amacı, geliştirme ortamı, kullanılan araçlar, kullanılan veri setleri ve uygulamanın geliştirme adımlarından bahsedilmiştir. Altıncı ve son bölümde deneysel sonuçlar tablolar ve grafiklerle verilmiş ve elde edilen sonuçlar irdelenmiştir. Ayrıca bu konuda çalışma yapacak araştırmacılar için öneriler sunulmuştur.
dc.description.abstract	In this master thesis, DBSCAN algorithm and OPTICS algorithm have been coded by taking the pseudo-codes; that the writers set forth in their articles, as the basis. It has been noticed that neighborhood queries take too long time on the works carried out on a single computer and R*-Tree data structure is used in order to shorten this period. LAM/MPI library has been used to parallelize DBSCAN algorithm. Neighborhood queries are the part that spends most of the runtime of the DBSCAN application, and this has been performed by equally distributing to all the computers by the help of LAM/MPI. It has been evaluated in the tests; which had been implemented by 3 different data sets, that DBSCAN algorithm is suitable for parallelization and the runtime period of DBSCAN that works parallelly is shortened in accordance with Amdahl Principle.The general concept has been mentioned in the first section of the thesis that consists of 6 chapters. In the second chapter, general definition of data mining, its application areas, data mining process and techniques of data mining have been explained. Third chapter mainly encloses density based clustering methods that is one of the types of clustering analysis. Fourth chapter includes the objective of parallel programming, parallel computer memory architecture, parallel programming models and parallel program design. Fifth chapter consist of the development objective of parallel DBSCAN application, development environment, the tools used, the data sets and the development steps of the application. In the sixth and the last chapter, experimental results have been given with tables and graphics and attained results have been examined. Besides, suggestions have been presented for those who want to make research on this subject.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Yoğunluk tabanlı kümeleme metodları kullanılarak paralel veri madenciliği gerçekleştirilmesi
dc.title.alternative	Parallel data mining by using density based clustering methods
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Cluster analysis
dc.subject.ytm	Data mining
dc.subject.ytm	Parallel computing
dc.identifier.yokid	369357
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	MALTEPE ÜNİVERSİTESİ
dc.identifier.thesisid	266135
dc.description.pages	100
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_369357.pdf
Size:: 1.303Mb
Format:: PDF
Description:: File_369357

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess