New cluster ensemble algorithm with automatic cluster number and new pruning technique for fast detection of neighbors on binary data

Akşehirli, Mehmet Emin

dc.contributor.advisor	Mimaroğlu, Selim Necdet
dc.contributor.author	Akşehirli, Mehmet Emin
dc.date.accessioned	2021-05-01T07:15:21Z
dc.date.available	2021-05-01T07:15:21Z
dc.date.submitted	2011
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/550629
dc.description.abstract	Cluster analysis is the unsupervised grouping of similar real or abstract data objects. Cluster analysis, or clustering, is a very important data analysis tool and is widely used in almost every scientific field, including data mining, machine learning, bioinformatics, and social network analysis. The unsupervised nature of clustering brings unique opportunities and challenges. Applying the optimal clustering algorithm with the correct parameters is not straightforward. Moreover, unlike classification algorithms, which use provided labels, clustering algorithms extract information from the data itself; as a result, most clustering algorithms suffer from long execution times. Methods that combine multiple clusterings have emerged as a promising solution that not only eases algorithm and parameter selection for cluster analysis but also solves some problems unique to clustering. In this thesis we discuss methods that combine multiple clusterings to obtain a better overall clustering of the data, including a recent method, DiCLENS. DiCLENS takes no input arguments and finds the number of clusters automatically using objective measures. Although computing the co-associations between objects is computationally expensive, co-association is one of the strongest similarity measures in the field. DiCLENS uses a recent method to compute these similarities efficiently. Our experiments show that DiCLENS produces a better final clustering in almost all scenarios. Moreover, the execution time of DiCLENS is very good compared with other methods. We also discuss DBSCAN_BV, a novel method that improves the execution time of the DBSCAN clustering algorithm by applying a pruning method to binary data under Hamming distance. DBSCAN is a well-known density-based clustering algorithm. Although spatial indexing techniques are widely used with DBSCAN, they perform poorly on categorical and binary data sets.
Extensive tests show that DBSCAN_BV runs up to 40 times faster than DBSCAN while keeping the same clustering accuracy. Tests also show that the new pruning method makes DBSCAN applicable in resource-limited environments.	en_US
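The co-association similarity mentioned in the abstract can be illustrated with a minimal sketch, assuming each base clustering is given as a list of integer labels over the same n objects. This is the textbook O(m n^2) computation over m base clusterings, not the more efficient method the abstract attributes to DiCLENS, and `co_association` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Hypothetical helper: entry (i, j) is the fraction of base clusterings
    that place objects i and j in the same cluster."""
    m = len(labelings)
    if m == 0:
        raise ValueError("need at least one base clustering")
    n = len(labelings[0])
    # Integer co-occurrence counts; dividing once at the end avoids
    # accumulating floating-point rounding error.
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        if len(labels) != n:
            raise ValueError("all base clusterings must label the same objects")
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    # An object always shares a cluster with itself, so the diagonal is 1.0.
    return [[1.0 if i == j else counts[i][j] / m for j in range(n)]
            for i in range(n)]
```

For example, the three base clusterings [0, 0, 1, 1], [0, 0, 0, 1] and [1, 1, 0, 0] give objects 0 and 1 a co-association of 1.0 (together in every clustering) and objects 2 and 3 a co-association of 2/3. A consensus method would then derive the final clustering from this matrix; how DiCLENS does that, and how it selects the number of clusters, is not reproduced here.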
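The abstract does not state which pruning rule DBSCAN_BV uses. As a minimal sketch of the general idea, assuming binary vectors stored as Python int bitmasks, the code below prunes DBSCAN's Hamming-distance region queries with one well-known bound: |popcount(a) - popcount(b)| <= hamming(a, b), so objects whose bit counts differ from the query's by more than eps can be skipped without computing the distance. `PopcountIndex` and `dbscan_binary` are hypothetical names, and this bound is a stand-in, not the thesis's method.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional

NOISE = -1


def popcount(x: int) -> int:
    """Number of set bits in a non-negative int bitmask."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bitmasks of equal length."""
    return popcount(a ^ b)


class PopcountIndex:
    """Hypothetical eps-range index for binary vectors under Hamming distance.

    Assumed pruning rule (standard bound, not necessarily DBSCAN_BV's):
    |popcount(a) - popcount(b)| <= hamming(a, b).
    """

    def __init__(self, vectors: List[int]):
        self.vectors = vectors
        # Object indices sorted by popcount, plus the matching sorted popcounts.
        self.order = sorted(range(len(vectors)), key=lambda i: popcount(vectors[i]))
        self.weights = [popcount(vectors[i]) for i in self.order]

    def neighbors(self, idx: int, eps: int) -> List[int]:
        """Indices within Hamming distance eps of vectors[idx], including idx."""
        q = self.vectors[idx]
        w = popcount(q)
        lo = bisect_left(self.weights, w - eps)
        hi = bisect_right(self.weights, w + eps)
        # Only objects in the popcount window [w - eps, w + eps] are checked exactly.
        return [j for j in self.order[lo:hi] if hamming(q, self.vectors[j]) <= eps]


def dbscan_binary(vectors: List[int], eps: int, min_pts: int) -> List[int]:
    """Plain DBSCAN whose region queries go through PopcountIndex.

    Returns one label per object: a cluster id >= 0, or NOISE (-1).
    """
    index = PopcountIndex(vectors)
    labels: List[Optional[int]] = [None] * len(vectors)
    cluster = 0
    for p in range(len(vectors)):
        if labels[p] is not None:
            continue
        seeds = index.neighbors(p, eps)
        if len(seeds) < min_pts:
            labels[p] = NOISE  # may later be relabeled as a border point
            continue
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins the cluster, not expanded
                continue
            if labels[q] is not None:
                continue  # already assigned
            labels[q] = cluster
            q_neighbors = index.neighbors(q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)  # q is a core point: keep expanding
        cluster += 1
    return labels
```

For example, with eps=1 and min_pts=2, the bitmasks [0b000000, 0b000001, 0b000011, 0b111100, 0b111110, 0b111101, 0b101010] are labeled [0, 0, 0, 1, 1, 1, -1]. Because the bound never exceeds the true Hamming distance, the pruned queries return exactly the same neighbors as a full scan, so the clustering is unchanged; only the number of distance computations drops.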
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	New cluster ensemble algorithm with automatic cluster number and new pruning technique for fast detection of neighbors on binary data
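dc.title.alternative	Küme sayısını otomatik bulan bir kümelenme birleştirme algoritması ve ikili veride komşuların hızlı bulunması için yeni budama yöntemi (English: A cluster ensemble algorithm that automatically finds the number of clusters, and a new pruning method for fast detection of neighbors in binary data)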
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Department of Computer Engineering
dc.subject.ytm	Cluster method
dc.subject.ytm	Cluster analysis
dc.subject.ytm	Cluster technics
dc.subject.ytm	Clustering
dc.identifier.yokid	409691
dc.publisher.institute	Graduate School of Natural and Applied Sciences
dc.publisher.university	BAHÇEŞEHİR ÜNİVERSİTESİ
dc.identifier.thesisid	292798
dc.description.pages	88
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_409691.pdf
Size: 39.68Mb
Format: PDF
Description: File_409691


This item appears in the following Collection(s)

Theses


Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess