Scalable streaming profile clustering for TELCO analytics

Abbasoğlu, Mehmet Ali

Date

2013

Abstract

Many telecom analyses require call profiles based on past calling patterns. These call profiles are formed by aggregating customer interactions over different time periods. Analyses that improve telecom companies' operations, such as marketing and sales, are performed on customer call profiles. Example applications include tariff optimization, customer segmentation, and usage forecasting. This thesis presents a method for clustering customer profiles that are built from streams of updates. Because of the large number of customers, maintaining profile clusters requires substantial memory and processing power. To meet these requirements, our solution uses distributed stream processing techniques. However, in a distributed system it is a major challenge to distribute profiles across machines so that each machine stores and processes an equal share of profiles while clustering quality remains high. In addition, because customers may change their calling patterns, the distribution of profiles over machines must be updated regularly. Each update should move as few profiles as possible so as not to disrupt online processing. This thesis presents a re-partitioning technique that meets all of these requirements. Each machine forms micro-clusters from its own profiles and sends their summaries to a central machine. The central machine updates the profile distribution by running the micro-cluster summaries through a greedy algorithm that uses novel affinity heuristics. The thesis also presents a demo application for customer segmentation in telecom companies, which showcases a Storm- and HBase-based implementation of the proposed solution.

Many telco analytics require maintaining call profiles based on recent customer call patterns. Such profiles are typically organized as aggregations computed at different time scales over the recent customer interactions. Clustering these profiles is needed to group customers with similar calling patterns and to build aggregate models for them. Example applications include optimizing tariffs, segmentation, and usage forecasting. In this thesis, we present an approach for clustering profiles that are incrementally maintained over a stream of updates. Due to the large number of customers, maintaining profile clusters has high processing and memory resource requirements. To tackle this problem, we apply distributed stream processing. However, in the presence of distributed state, it is a major challenge to partition the profiles over machines (nodes) such that memory and computation balance is maintained while clustering accuracy is kept high. Furthermore, to adapt to potentially changing customer calling patterns, the partitioning of profiles to machines should be continuously revised, yet the migration of profiles should be minimized so as not to disturb the online processing of updates. We provide a re-partitioning technique that achieves all these goals. We keep micro-cluster summaries at each node, collect these summaries at a centralized node, and use a greedy algorithm with novel affinity heuristics to revise the partitioning. We present a demo application that showcases our Storm- and HBase-based implementation in the context of a customer segmentation application.
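The abstract does not specify the re-partitioning algorithm, so the following Python sketch is only an illustration under stated assumptions, not the thesis's method. Each node is assumed to summarize its profiles as micro-clusters holding a count and a linear sum of profile vectors (a reduced form of CluStream-style cluster features), and a central coordinator greedily reassigns micro-clusters to nodes under a per-node capacity bound. The names `MicroCluster`, `repartition`, `slack`, and `migration_penalty`, and the scoring rule (Euclidean distance to a node's current centroid plus a fixed penalty for moving a micro-cluster off its current node), are hypothetical stand-ins for the thesis's affinity heuristics.

```python
import math
from dataclasses import dataclass, field


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: count and linear sum of profiles."""
    node: int                                   # node currently holding these profiles
    n: int = 0                                  # number of profiles summarized
    ls: list = field(default_factory=list)      # per-dimension linear sum

    def add(self, x):
        # Incrementally fold one profile vector into the summary.
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v

    def centroid(self):
        return [s / self.n for s in self.ls]


def repartition(mcs, num_nodes, slack=0.1, migration_penalty=1.0):
    """Greedy reassignment of micro-clusters to nodes (illustrative heuristic).

    Assumptions: capacity is the balanced share of profiles plus `slack`;
    affinity is the distance to the centroid of what a node has received so
    far (0 for an empty node); moving a micro-cluster costs a fixed penalty.
    """
    total = sum(mc.n for mc in mcs)
    capacity = math.ceil(total / num_nodes * (1 + slack))
    load = [0] * num_nodes
    sums = [None] * num_nodes   # linear sum of micro-clusters assigned per node
    assignment = {}
    # Place large micro-clusters first; skip empty summaries.
    order = sorted((i for i in range(len(mcs)) if mcs[i].n > 0),
                   key=lambda i: -mcs[i].n)
    for i in order:
        mc = mcs[i]
        c = mc.centroid()
        best, best_score = None, math.inf
        for node in range(num_nodes):
            if load[node] + mc.n > capacity:
                continue                        # would break the balance bound
            if sums[node] is None:
                dist = 0.0
            else:
                node_centroid = [s / load[node] for s in sums[node]]
                dist = math.dist(c, node_centroid)
            score = dist + (migration_penalty if node != mc.node else 0.0)
            if score < best_score:
                best, best_score = node, score
        if best is None:                        # nothing fits: use least-loaded node
            best = min(range(num_nodes), key=lambda k: load[k])
        assignment[i] = best
        load[best] += mc.n
        sums[best] = list(mc.ls) if sums[best] is None else [
            a + b for a, b in zip(sums[best], mc.ls)]
    return assignment, load
```

For example, with two nodes where node 0 initially holds two nearby micro-clusters and one distant one, the sketch keeps the nearby pair on node 0 and migrates the distant micro-cluster to node 1 next to similar profiles, leaving both nodes with equal load. The migration penalty trades clustering affinity against the number of moved profiles; a real system would tune it, and the thesis's own heuristics may differ.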

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/34728

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess