Veri madenciliği sürecinde veri ayrıklaştırma yöntemlerinin karşılaştırılması ve bir uygulama

Koçoğlu, Fatma Önay

dc.contributor.advisor	Özkan, Yalçın
dc.contributor.author	Koçoğlu, Fatma Önay
dc.date.accessioned	2020-12-07T13:20:06Z
dc.date.available	2020-12-07T13:20:06Z
dc.date.submitted	2012
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/151778
dc.description.abstract	Societies have gone through various transformation processes in line with different needs, and today this process has become information-centered. The goal, however, is to possess accurate and valuable information rather than a heap of information. At this point, data mining gains considerable importance. Data mining is the process of uncovering existing hidden information by using certain methods (Özkan, 2008). Today, all kinds of data are kept in databases or data warehouses. However, it is impossible to say that all of the stored data is correct. Data that is entered incompletely or incorrectly, redundant copies of data with the same meaning, and inconsistent data may cause the information obtained from the data mining process to be wrong and far from the truth. For data to be processed and interpreted effectively and efficiently, it must meet certain quality criteria (Müller and Freytag, 2003). Data mining consists of several steps, one of which is data preprocessing. Quality information can be obtained from quality data. This step is therefore very important for the results to be obtained. One of the steps in data preprocessing is data discretization. Different methods are used for data discretization, and which of them is more effective is an open question. On this basis, this thesis aims to apply different discretization methods to data sets and to examine which method is more effective. The study uses the Wisconsin data set, which contains samples taken from patients who underwent surgery following a breast cancer diagnosis at the University of Wisconsin Hospitals.
Eight different discretization methods, namely 1RD, CADD, CAIM, Chi2, ChiMerge, ID3, Equal Width, and Equal Frequency, were applied to this data set using the KEEL data mining software tool. The Chi2, ChiMerge, and CAIM algorithms were found to work consistently to a noticeable degree, while the 1RD algorithm generally yielded one categorical variable and the ID3 algorithm a large number of them. It was also found that the different discretization methods that assign many categorical variables in this way assign very close, though not identical, categorical variables for the same attribute value. These results offer guidance on how to proceed when a data mining study requires working with categorical attributes and on how to select an appropriate method. Thesis studies carried out in this field in Turkey were reviewed and a gap in the literature was observed, so the results of the study are expected to contribute to the literature as well.
dc.description.abstract	Societies have passed through various transformation processes in line with different needs, and today this process has become information-centered. The goal, however, is to have accurate and valuable information rather than a heap of information. At this point, data mining is very important. Data mining is the process of revealing hidden information by applying certain methods (Özkan, 2008). Today, all kinds of data are kept in databases or data warehouses. However, it is impossible to say that all of this data is correct. Missing or incorrectly entered data, redundant data with the same meaning, and inconsistent data may cause the information obtained from the data mining process to be incorrect. For data to be processed and interpreted effectively and efficiently, it has to meet certain quality criteria (Müller & Freytag, 2003). The first step of data mining is preparing the data. Quality information can be obtained from quality data. Therefore, this step is very important for the results obtained. One of the steps in data preprocessing is data discretization. Different methods are used for data discretization, and which of these methods is more effective is an open question. Thus, this thesis aims to apply different discretization methods to data sets and to investigate which method is more effective. For this study, the Wisconsin data set, which contains cases from a study conducted at the University of Wisconsin Hospitals on patients who had undergone surgery for breast cancer, was selected for applying the discretization methods.
Eight different discretization methods, namely 1RD, CADD, CAIM, Chi2, ChiMerge, ID3, Equal Width, and Equal Frequency, were applied to this data set with the help of the KEEL data mining software tool. The Chi2, ChiMerge, and CAIM algorithms were found to work consistently to a noticeable degree, while the 1RD algorithm generally produced one categorical variable and the ID3 algorithm a large number of them. The different discretization methods that assign many categorical variables were found to assign very close, though not identical, categorical variables for the same attribute value. These results offer guidance on how to proceed when a data mining study requires categorical attributes and on how to select an appropriate method. In addition, a review of thesis studies in this field in Turkey revealed a gap, and the results of the study are expected to contribute to the literature.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Bilim ve Teknoloji	tr_TR
dc.subject	Science and Technology	en_US
dc.title	Veri madenciliği sürecinde veri ayrıklaştırma yöntemlerinin karşılaştırılması ve bir uygulama
dc.title.alternative	Comparison of data discretization methods in data mining process and an application
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Enformatik Anabilim Dalı
dc.subject.ytm	Data warehouse
dc.subject.ytm	Data quality
dc.subject.ytm	Data mining
dc.subject.ytm	Data cleaning
dc.subject.ytm	Data
dc.subject.ytm	Data processing
dc.subject.ytm	Database
dc.identifier.yokid	435579
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	İSTANBUL ÜNİVERSİTESİ
dc.identifier.thesisid	316394
dc.description.pages	90
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_435579.pdf
Size: 1.710Mb
Format: PDF
Description: File_435579

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess