Application of mean gain ratio (MGR) model for the clustering of electrical generator failures

Al-Saadi, Saddam Raheem Salih

Date

2017

Abstract

Categorical data clustering is becoming an increasingly important part of data mining. In this study, we compare four data clustering methods, VPRS, MTMDP, ITDR and MGR, on four real-life databases. The VPRS, MTMDP and ITDR algorithms are based on rough set theory, while the MGR algorithm is based on information theory. Three of the databases were taken from the UCI repository; the fourth, covering electrical generator failures, was collected from a mobile company in Iraq. Three performance measures are used to evaluate each method: the purity and the F-measure of the resulting clusters with respect to the database classes, and the time each algorithm takes to process the databases. The comparison shows that MGR outperforms the other algorithms. The MGR results were therefore chosen to be presented to the decision makers, and they may help in designing interventions that improve the efficiency of the maintenance team and reduce electrical generator failures. In addition, we propose a new technique called Minimum Information Gain Roughness (MIGR) that selects the clustering attribute based on information entropy in rough set theory. To evaluate this technique, three real-life sample data sets from UCI are clustered using MIGR, and the resulting clusters are compared with those produced by the Min-Min-Rough (MMR) and Information-Theoretic Dependency Roughness (ITDR) techniques, which have themselves been compared with many other clustering techniques such as k-modes, fuzzy centroids and fuzzy k-modes. Accuracy and F-measure are used to compare the quality of the resulting clusters.
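Purity and the F-measure can both be computed from a count of how many objects of each class fall into each cluster. The following is a minimal sketch assuming the common definitions: purity sums the majority-class count of every cluster and divides by the number of objects, and the F-measure matches each class to the cluster with the highest harmonic mean of precision and recall, weighted by class size. In much of the rough-set clustering literature "accuracy" is computed the same way as purity, but whether the thesis uses exactly these definitions is an assumption, and the function names are hypothetical.

```python
from collections import Counter


def contingency(clusters, classes):
    """Count the objects in every (cluster label, class label) pair."""
    return Counter(zip(clusters, classes))


def purity(clusters, classes):
    """Fraction of objects that belong to the majority class of their cluster."""
    table = contingency(clusters, classes)
    majority = {}
    for (cluster, _cls), count in table.items():
        majority[cluster] = max(majority.get(cluster, 0), count)
    return sum(majority.values()) / len(classes)


def f_measure(clusters, classes):
    """Class-size-weighted F-measure; each class is matched to its best cluster."""
    n = len(classes)
    table = contingency(clusters, classes)
    cluster_sizes = Counter(clusters)
    class_sizes = Counter(classes)
    total = 0.0
    for cls, cls_size in class_sizes.items():
        best_f = 0.0
        for cluster, cluster_size in cluster_sizes.items():
            overlap = table.get((cluster, cls), 0)
            if overlap == 0:
                continue  # precision and recall are both zero for this pair
            precision = overlap / cluster_size
            recall = overlap / cls_size
            best_f = max(best_f, 2 * precision * recall / (precision + recall))
        total += (cls_size / n) * best_f
    return total


if __name__ == "__main__":
    # Toy labels, not data from the thesis.
    clusters = [0, 0, 0, 1, 1, 1]
    classes = ["a", "a", "b", "b", "b", "b"]
    print(round(purity(clusters, classes), 3))     # 0.833
    print(round(f_measure(clusters, classes), 3))  # 0.838
```

Both functions take plain label sequences, so they work with the output of any of the compared algorithms. The third measure, processing time, can be recorded separately, for example by calling `time.perf_counter()` before and after each clustering run.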
The experimental results show that the MIGR algorithm outperforms the MMR and ITDR algorithms and can therefore be used for clustering categorical data.

Keywords: Categorical data (maintenance), Rough set theory, Clustering, Information system, Information theory, Decision makers.
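The abstract describes MGR as information-theoretic and MIGR as selecting the clustering attribute from information entropy. Their exact formulas are not given here, so the sketch below is only a generic illustration of entropy-based attribute selection, not the thesis's method. It assumes, as suggested by the name "mean gain ratio", that an attribute's gain ratio toward another attribute is the information gain it provides about that attribute divided by its own entropy, that these ratios are averaged over all other attributes, and that the attribute with the largest mean is chosen. It leaves out the rough-set roughness component of MIGR, and all function names and the toy table are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(target, given):
    """H(target | given): entropy of target inside each block of `given`, weighted by block size."""
    n = len(target)
    blocks = defaultdict(list)
    for g, t in zip(given, target):
        blocks[g].append(t)
    return sum(len(block) / n * entropy(block) for block in blocks.values())


def gain_ratio(target, given):
    """Information gain of `given` about `target`, normalised by H(given) (assumed definition)."""
    h_given = entropy(given)
    if h_given == 0:
        return 0.0  # a constant attribute cannot split the data
    return (entropy(target) - conditional_entropy(target, given)) / h_given


def select_clustering_attribute(table):
    """Return the column whose mean gain ratio over all other columns is largest, plus all scores."""
    names = list(table)
    if len(names) < 2:
        raise ValueError("need at least two attributes")
    scores = {}
    for a in names:
        others = [b for b in names if b != a]
        scores[a] = sum(gain_ratio(table[b], table[a]) for b in others) / len(others)
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical toy records, not the thesis's generator-failure data.
    table = {
        "fault": ["fuel", "fuel", "oil", "oil"],
        "site": ["A", "A", "B", "B"],
        "shift": ["day", "night", "day", "night"],
    }
    best, scores = select_clustering_attribute(table)
    print(best, scores)
```

In this toy table "fault" and "site" induce the same partition and each fully determines the other, so they share the highest score, while "shift" carries no information about either and scores zero.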

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/131209

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess