Handling categorical data in artificial neural networks

Anwar, Andam Omar Anwar

dc.contributor.advisor	Kurnaz, Sefer
dc.contributor.author	Anwar, Andam Omar Anwar
dc.date.accessioned	2021-05-06T12:25:50Z
dc.date.available	2021-05-06T12:25:50Z
dc.date.submitted	2019
dc.date.issued	2020-02-21
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/588501
dc.description.abstract	Machine learning techniques have become a broad topic in real-life applications. Artificial Neural Networks (ANNs), which show superior performance compared to other ML techniques, are among the machine learning techniques most widely applied to datasets in order to interact with the environments from which the data are collected. As with other methods, an ANN is trained using the instances in the given dataset so that the task required of the ANN is accomplished. Real-life datasets consist of several kinds of values, mainly quantitative and qualitative. Handling quantitative data is easy in an ANN because the inputs of these networks are numerical. However, qualitative data may contain nominal values that have no meaningful order. Encoding such values in a numerical format can cause a significant loss of information. Therefore, choosing appropriate numerical values can significantly improve the performance of neural networks, and vice versa. In this research, a new method is proposed that allows ANNs to produce suitable values for the qualitative values in the dataset. Because the ANN uses backpropagation to update the weights that connect its neurons, the proposed method produces a vector for each qualitative attribute using one-hot encoding (OHE). During training, the ANN updates the weight corresponding to each nominal value to reduce the error between its predictions and the actual values required of it. Each vector is connected to a single neuron in the next layer, so that, when OHE is used, the value appearing at that neuron equals the weight corresponding to the position where the vector is set to one. To evaluate the performance of ANNs that use the proposed encoding method, different real-life datasets are used to train and evaluate the networks. 
For each dataset, prediction accuracy and loss over the training epochs are monitored for standard ANNs with label-encoded inputs and for those using the proposed encoding method. The evaluation shows that the proposed method significantly improves the ANN's performance, as shown by faster learning, i.e. a faster rise in accuracy and a faster fall in loss, and by better predictions once training is complete. Therefore, the proposed encoding method can improve the performance of the same ANN or produce similar performance with less complex networks. Keywords: Artificial Neural Network; Label Encoding; One-Hot Encoding; Backpropagation.
dc.description.abstract	Machine Learning (ML) techniques are widely used to extract knowledge from real-life datasets in order to interact with the environments from which these data are collected. One widely used machine learning technique that has shown outstanding performance compared to other ML techniques is the Artificial Neural Network (ANN). Like other ML methods, ANNs are trained using the instances in the dataset so that the task required of the ANN can be achieved. Real-life datasets consist of different types of values, mainly quantitative and qualitative. Handling quantitative data is an easy task in an ANN, as the inputs of these networks are numerical. However, as qualitative data may contain nominal values that do not have a meaningful order, encoding such values into a numerical format can cause the loss of important knowledge. Thus, the selection of appropriate numerical values can significantly improve the neural network's performance, and a poor selection can degrade it. In this study, a novel method is proposed to allow ANNs to produce suitable values for the qualitative values in the dataset. As the ANN uses backpropagation to update the weights that connect the neurons in the network to each other, the proposed method produces a vector for each qualitative attribute using One-Hot Encoding (OHE). During training, the ANN updates the weight corresponding to each of the nominal values to reduce the error between the predicted values and the actual values required of the ANN. Each vector is connected to a single neuron in the next layer, so that, with OHE, the value that appears on that neuron is equal to the weight corresponding to the position where the value is set to one in the vector. To evaluate the ANN's performance using the proposed encoding method, different real-life datasets are used to train and evaluate the networks. 
For each dataset, prediction accuracy and loss versus training epochs are monitored for standard ANNs with label-encoded inputs and for those using the proposed encoding method. The evaluation shows that the proposed method improves the performance of the ANN significantly, as illustrated by faster learning, i.e. a faster rise in accuracy and a faster reduction in loss, as well as better predictions when training is complete. Thus, the proposed encoding method can improve the performance of the same ANN, or produce similar performance using less complex networks. Keywords: Artificial Neural Network; Label Encoding; One-Hot Encoding; Backpropagation.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Handling categorical data in artificial neural networks
dc.type	masterThesis
dc.date.updated	2020-02-21
dc.contributor.department	Elektrik ve Bilgisayar Mühendisliği Ana Bilim Dalı
dc.identifier.yokid	10268731
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ALTINBAŞ ÜNİVERSİTESİ
dc.identifier.thesisid	611280
dc.description.pages	64
dc.publisher.discipline	Other
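
The encoding idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a single linear neuron with no bias and no activation, and the attribute, category names, starting weights, and function names are hypothetical. It shows that feeding a one-hot vector into one neuron yields the weight at the active position, and that a gradient step changes only that weight.

```python
import numpy as np

# Hypothetical nominal attribute; the category names are illustrative only.
CATEGORIES = ["red", "green", "blue"]
# Label encoding assigns an arbitrary integer to each category.
LABEL_CODES = {name: code for code, name in enumerate(CATEGORIES)}


def one_hot(category):
    """Return the one-hot vector for a category."""
    vector = np.zeros(len(CATEGORIES))
    vector[LABEL_CODES[category]] = 1.0
    return vector


def encoding_neuron(category, weights):
    """Output of a single linear neuron (assumed: no bias, no activation)."""
    return float(one_hot(category) @ weights)


def update_weights(category, weights, grad_output, learning_rate=0.1):
    """One gradient-descent step for the encoding neuron.

    d(output)/d(weights) is the one-hot vector itself, so only the weight
    of the active category changes.
    """
    return weights - learning_rate * grad_output * one_hot(category)


weights = np.array([0.5, -1.2, 2.0])  # arbitrary starting weights
learned_value = encoding_neuron("green", weights)
new_weights = update_weights("green", weights, grad_output=1.0)
```

With label encoding, "green" would enter the network as the fixed code 1, which implies an order among the categories. In the sketch, the value passed on for "green" is instead `weights[1]`, a number that training is free to adjust.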

Files in this item

Name: yokAcikBilim_10268731.pdf
Size: 1.012Mb
Format: PDF
Description: File_10268731

This item appears in the following Collection(s)

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess