Derin öğrenme ve büyük veri yaklaşımları ile metin analizi (Text analysis with deep learning and big data approaches)

Ay Karakuş, Betül

dc.contributor.advisor	Aydın, Galip
dc.contributor.author	Ay Karakuş, Betül
dc.date.accessioned	2020-12-29T12:06:49Z
dc.date.available	2020-12-29T12:06:49Z
dc.date.submitted	2018
dc.date.issued	2018-12-04
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/408610
dc.description.abstract	Büyük veri analitiği ve derin öğrenme, gelişen dijital dünyada veri biliminin son yıllarda odaklandığı iki önemli araştırma ve çalışma alanıdır. Büyük miktarda ve farklı çeşitlikteki metin verilerini geleneksel yazılım araçları ve teknolojileri kullanarak analiz etmek ve yönetmek zor bir problemdir. Bu tez çalışmasında büyük veri teknolojileri ve derin öğrenme mimarileri detaylı bir şekilde analiz edilmiş olup dört temel uygulama sunularak akademik katkı sunulması hedeflenmiştir. İlk olarak çağrı merkezleri için bulut tabanlı dağıtık performans analizi ve değerlendirme sistemi geliştirilmiştir. Bu sistemin, iç ve dış çağrı kayıtlarını dağıtık bir şekilde işleyen bulut tabanlı bir performans ölçüm sistemi sunarak müşteri memnuniyeti, satış ve pazarlama, hizmet kalitesi ve performans yönetiminde yüksek bir performans ile önemli bir katkı sağlaması hedeflenmiştir. İkinci uygulamada büyük veri teknolojileri kullanarak Türk dili için dağıtık okunabilirlik analiz sistemi geliştirilmiştir. Türkiye'de eğitim kurumları tarafından kullanılan herhangi bir okunabilirlik uygulaması yoktur ve bu ihtiyaçtan dolayı Türkçe okuma kitaplarını kısa bir sürede analiz edecek okunabilirlik sistemi geliştirilmiştir. Üçüncü uygulamada, farklı mimariler, yöntemler, katmanlar ve hiper parametre optimizasyonları ile oluşturulan derin öğrenme modelleri ile duygu analizi ve haberler veri setinde çok kategorili metin sınıflandırma çalışmaları gerçekleştirilmiştir. Dördüncü uygulamada ise dil bağımsız metin sınıflandırma problemlerinde kullanılabilecek yeni bir Ortalama Doküman Vektörü (ADE) yöntemi sunulmuştur. Önerilen yöntem Türkçe ve İngilizce film yorumlarında duygu sınıflandırması için test edilmiş ve başarılı bir performans göstermiştir. Türkçe metin sınıflandırma çalışmalarında kullanılabilecek büyük ölçekli bir kıyaslama veri seti yoktur. 
Bu tez çalışmasının diğer temel katkısı ise bu ihtiyacı karşılamaya yönelik yaklaşık 1 milyon benzersiz kelime içeren Türkçe haberler veri setinin ve 150 bin adet etiketli Türkçe film yorumları veri setinin oluşturulması ve akademik kullanıma açık olarak sunulmasıdır.
dc.description.abstract	Big data analytics and deep learning are two major research areas on which data science has focused in recent years in the developing digital world. Analyzing and managing large volumes and varieties of text data with traditional software tools and technologies is a difficult problem. In this thesis, big data technologies and deep learning architectures are analyzed in detail, and four main applications are presented as academic contributions. First, a cloud-based distributed performance analysis and evaluation system was developed for call centers. By offering a cloud-based performance measurement system that processes both internal and external call records in a distributed manner, the proposed system aims to contribute significantly to customer satisfaction, sales and marketing, service quality, and performance management. Second, a distributed readability analysis system for the Turkish language was developed using big data technologies. No readability application is used by educational institutions in Turkey; to meet this need, a readability system was developed that analyzes Turkish reading books in a short time. Third, sentiment analysis and multi-category text classification on a news dataset were carried out using deep learning models built with different architectures, methods, layers, and hyperparameter optimizations. Lastly, a novel Average Document Embeddings (ADE) approach is presented that can be used for multi-category, language-independent text classification. The proposed method was tested on sentiment classification of Turkish and English movie reviews and performed well. There is no large-scale benchmark dataset that can be used in Turkish text classification studies. 
The other main contribution of this thesis is to meet this need by creating a Turkish news dataset containing about 1 million unique words and a dataset of 150,000 labeled Turkish movie reviews, both of which are made available for academic use.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 International	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Derin öğrenme ve büyük veri yaklaşımları ile metin analizi
dc.title.alternative	Text analysis with deep learning and big data approaches
dc.type	doctoralThesis
dc.date.updated	2018-12-04
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid	10197582
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	FIRAT ÜNİVERSİTESİ
dc.identifier.thesisid	521781
dc.description.pages	246
dc.publisher.discipline	Bilgisayar Mühendisliği Bilim Dalı

Files in this item

Name: yokAcikBilim_10197582.pdf
Size: 7.714Mb
Format: PDF
Description: File_10197582

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess