Application of privacy-preserving clustering methods using homomorphic encryption algorithms

Aydin, İsmail

Date

2018

Abstract

Today there is a need to process and interpret large data collections that are obtained from finance, healthcare, military systems or social platforms and whose privacy must be protected. Using classical encryption methods to protect privacy requires at least two of the operations of encryption, decryption and data interpretation to be carried out in the same place within the system where the data will be used. As data size grows, carrying out these operations in the same place will require a large amount of processing power.

When classical cryptographic methods are used in big data systems with many connections, suitable key distribution mechanisms must also run for each of the many users, in addition to the processing load. This study proposes the use of homomorphic encryption methods because, in a big data system that brings many users together, they allow both the key distribution mechanisms and the computationally intensive operations on the big data to run on a common platform. By combining homomorphic encryption with suitable machine learning methods applied to encrypted data, big data can be distributed to stakeholders and processed without compromising privacy.

In this way, system stakeholders can take part in big data processing mechanisms and perform computations without needing high processing capacity themselves. The Paillier cryptosystem was used for privacy protection because it is an asymmetric encryption algorithm suited to the designed system and exhibits homomorphic properties. To apply machine learning methods, four different machine learning methods were run on the designed privacy-preserving system using different data lengths and different key lengths. The performance of each algorithm for different key and data lengths was determined by evaluating the machine learning algorithms, run on the plain and encrypted forms of the same data, against six different metrics.
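The homomorphic property that motivates the choice of Paillier can be illustrated with a toy implementation. The sketch below is a minimal illustration, not the thesis's code: it assumes the common simplification g = n + 1, uses two fixed Mersenne primes (2^31 - 1 and 2^61 - 1) that are far too small and too predictable for real use, and the function names `encrypt`, `decrypt`, `add` and `mul_const` are hypothetical. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts, and raising a ciphertext to a plaintext constant k yields an encryption of k times the plaintext. Two ciphertexts cannot be multiplied to obtain the product of their plaintexts, which is why Paillier is only partially (additively) homomorphic.

```python
import math
import secrets

# Toy Paillier cryptosystem, for illustration only (NOT secure: fixed primes,
# far too small for real use).
P = 2**31 - 1          # Mersenne prime
Q = 2**61 - 1          # Mersenne prime; gcd(P*Q, (P-1)*(Q-1)) == 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lambda = lcm(p-1, q-1)
MU = pow(LAM, -1, N)   # mu = lambda^-1 mod N, valid because g = N + 1


def encrypt(m: int) -> int:
    """Enc(m) = g^m * r^N mod N^2 with g = N + 1, so g^m = 1 + m*N (mod N^2)."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random r in [1, N-1]
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    """Dec(c) = L(c^lambda mod N^2) * mu mod N, with L(u) = (u - 1) / N."""
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m      # map back to a signed integer


def add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def mul_const(c: int, k: int) -> int:
    """Raising a ciphertext to a plaintext constant k multiplies the plaintext by k."""
    return pow(c, k % N, N2)


if __name__ == "__main__":
    a, b = 17, 25
    ca, cb = encrypt(a), encrypt(b)
    print(decrypt(add(ca, cb)))        # 42
    print(decrypt(mul_const(ca, -3)))  # -51
    print(ca == encrypt(a))            # False: encryption is randomized
```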

The need to protect and process sensitive data in large-scale data systems (for example, data derived from financial systems, military systems or social media platforms) is a common problem. Using traditional cryptographic methods for data protection generally requires at least two of the three tasks of encryption, decryption and data processing to be performed on the same side. Because of this, as the data size increases, more processing power is needed to work on the data.

Using traditional encryption algorithms to protect sensitive data on large-scale systems also brings the need to exchange the keys required for protecting and processing the data. Homomorphic encryption schemes are flexible enough to be used on data systems that contain data from multiple parties, because they allow encrypted data to be processed much like its unencrypted form.

With homomorphic encryption schemes and suitable machine learning methods applied to encrypted data, sensitive data can be distributed to different parties without violating its privacy. In this thesis, we propose a method for running computations that require high processing power on a common platform that provides that power, rather than on the parties to which the sensitive data is distributed. As a result, the partners of the system do not need high processing power of their own to work on the data, because the computationally demanding tasks are performed on the common platform.

In this research, the Paillier cryptosystem was used to protect data privacy. The most prominent features of the Paillier cryptographic algorithm are that it is asymmetric and partially (additively) homomorphic. We propose a system that uses privacy-preserving distance matrix calculation as the input for several clustering algorithms commonly used in machine learning systems. Our system is evaluated with different data lengths and different key lengths.
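One way to compute a distance matrix under Paillier encryption is to expand the squared Euclidean distance as ||x - y||^2 = ||x||^2 - 2<x, y> + ||y||^2. Every term is either an encrypted value supplied by the key holder or a plaintext multiple of one, so the whole expression can be evaluated with the additive operations alone. The sketch below assumes a hypothetical setting: the key holder sends Enc(x_j) and Enc(||x||^2) for each of its points, another party that holds plaintext points y evaluates the encrypted distances, and only the key holder decrypts the resulting matrix. It illustrates the general technique and is not necessarily the protocol used in the thesis. Features are assumed to be integers (real-valued data would first be scaled to fixed point), the toy key is insecure, and all function names are hypothetical.

```python
import math
import secrets

# Compact toy Paillier with g = N + 1; illustrative only, NOT secure.
P, Q = 2**31 - 1, 2**61 - 1          # fixed Mersenne primes (hypothetical toy key)
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return (1 + (m % N) * N) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m - N if m > N // 2 else m   # map back to a signed integer


def encrypt_point(x: list[int]) -> tuple[list[int], int]:
    """Key holder's side: Enc(x_j) for every feature plus Enc(||x||^2)."""
    return [encrypt(v) for v in x], encrypt(sum(v * v for v in x))


def encrypted_sq_distance(enc_x: list[int], enc_norm_x: int, y: list[int]) -> int:
    """Enc(||x - y||^2) = Enc(||x||^2) * Enc(||y||^2) * prod_j Enc(x_j)^(-2 y_j)."""
    c = enc_norm_x * encrypt(sum(v * v for v in y)) % N2   # ||x||^2 + ||y||^2
    for cx_j, y_j in zip(enc_x, y):
        c = c * pow(cx_j, (-2 * y_j) % N, N2) % N2        # adds -2 * x_j * y_j
    return c


def encrypted_distance_matrix(enc_points, plain_points):
    """Row i, column k holds Enc(squared distance between x_i and y_k)."""
    return [[encrypted_sq_distance(ex, en, y) for y in plain_points]
            for ex, en in enc_points]


if __name__ == "__main__":
    owner_points = [[1, 2], [4, 6]]   # hypothetical integer features (key holder)
    other_points = [[0, 0], [3, 4]]   # hypothetical plaintext points (other party)
    enc_d = encrypted_distance_matrix([encrypt_point(x) for x in owner_points],
                                      other_points)
    print([[decrypt(c) for c in row] for row in enc_d])   # [[5, 8], [52, 5]]
```

The party evaluating the distances only ever sees ciphertexts of the key holder's features. Once decrypted, the matrix can be passed to any clustering algorithm that accepts precomputed distances.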
Four different data clustering methods have been tested. By applying the clustering algorithms to both the encrypted and plain forms of the same data for different key and data lengths, we obtained performance results using six different metrics.
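The source does not name the six metrics. As one assumed example of how clustering results on plain and encrypted data can be compared, the sketch below computes the Rand index: the fraction of point pairs on which two labelings agree about whether the pair shares a cluster. Because it only looks at pairs, it is unaffected by the two runs numbering their clusters differently. The function name `rand_index` and the sample labels are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a: list[int], labels_b: list[int]) -> float:
    """Fraction of point pairs on which two clusterings agree (same vs. different cluster)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have equal length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0   # fewer than two points: trivially identical
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    plain = [0, 0, 1, 1, 2]        # hypothetical labels from the plaintext run
    encrypted = [1, 1, 0, 0, 2]    # same partition, different cluster IDs
    print(rand_index(plain, encrypted))          # 1.0
    print(rand_index(plain, [0, 0, 1, 1, 1]))    # 0.8
```

A value of 1.0 means the encrypted pipeline reproduced the plaintext clustering exactly; lower values quantify how far the two partitions diverge.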

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/631435

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess