Multi-label classification of text document using deep learning

Mohammed, Hamza Haruna

dc.contributor.advisor	Görür, Abdül Kadir
dc.contributor.advisor	Choupanı, Roya
dc.contributor.author	Mohammed, Hamza Haruna
dc.date.accessioned	2020-12-04T11:14:28Z
dc.date.available	2020-12-04T11:14:28Z
dc.date.submitted	2019
dc.date.issued	2019-11-28
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/77445
dc.description.abstract	Son zamanlarda, Doğal Dil İşleme alanında çalışmalar ve bununla ilgili bazı önemli problemler ve makine öğrenmesi alanındaki uygulamalar artmaya devam ediyor. Makina öğreniminin genel amaçlı modellerin uygulama alanina özel veri ile eğitilerek kullanılması ile veriye dayalı olduğu kanıtlanmıştır. Bu yöntemin pratikte sıkça karşılaştığımız karmaşık veri bağımlılıklarının modellenmesinde, çok az varsayımda bulunulduğunda ve bilgilerin kendileri için konuşması açısından çok etkili bir yaklaşım olduğu kanıtlanmıştır. Kimyasal proses mühendisliği, iklim bilimi, sistemler, sağlık hizmetleri ve doğal dilin dilbilimsel işlenmesinde bazılarına örnekler verilebilir. Ayrıca, metin sınıflandırma Doğal Dil İşlemenin önemli yönlerinden biridir. Metin sınıflandırma, metin veya metin belgelerini belirli bir etiket grubuna kategorize etme eylemidir. Öte yandan, çok etiketli metin sınıflandırma, metin veya belgelerin aynı anda bir başka etikete sınıflandırılması ile ilgilidir. Yıllar içinde kelime çantası modelleri, denetimli makina öğrenmesi, ağaç azaltma ve etiket-vektör gömmeleri gibi metodlar önerilmiştir. Bu tür araçlar, belge filtreleme, arama motorları, doküman yönetim sistemleri gibi gerçek dünyadaki birçok uygulamada kullanılabilir. Son zamanlarda derin öğrenmeye dayalı modeller, bunların içinde de aşırı çoklu etiketli metin sınıflandırma modeli, ilgi çekmeye başlamıştır. Derin öğrenme, resim ve metin belgeleri gibi yüksek boyutlu ve yapılandırılmamış verileri içeren birçok makine öğrenimi uygulamasının ana çözümlerinden biridir. Bununla birlikte, bu uygulamaların birçoğunda, bu modellerin öngörüleriyle ilgili belirsizlikleri doğru bir şekilde aktarabilmek çok önemlidir. Bu sebeple, bu çalışmada çoklu etiketli metin sınıflandırma problemini evrişimsel sinir ağları, yinelemeli sinir ağları, uzun kısa zamanlı hafıza modelleri ve geçitli tekrarlayan birimler modelleriyle araştırdık. Bu çalışmada iki senaryo kulandık. Birincisi, gömme katmanıyla ve ikincisi Word2vec, Glove ve FastText gibi önceden eğitilmiş bir gömme bütüncesi ile çok etiketli sınıflandırma. Bu farklı sinir ağı modeli performanslarını, bu iki yaklaşıma göre çok etiketli değerlendirme ölçütleri açısından değerlendirdik ve karşılaştırdık.
dc.description.abstract	Recently, studies in the field of Natural Language Processing and some of its related important problem and Applications in the machine learning field continue to mount up. Machine Learning is prove to be predominantly data-driven in the sense that generic model buildings are used and then tailored to a specific application data. Needless to say, this has proven to be a very effective approach to modeling the complicated data dependencies we frequently experience in practice, making very few assumptions and allowing the information to talk for themselves. Examples can be found in chemical process engineering, climate science, systems, healthcare, and linguistic processing of natural language, to name a few. Moreover, text classification is one of the important aspect of Natural Language Processing. Text classification is the act of categorizing text or text documents into a given set of labels. While on the other hand, multi-label text classification deals with classifying text or documents into one more labels at the same time. Over the years, some methods for classifying text and documents have been proposed, including popularly known Bag of Words (BoW) method, Supervised Machine Learning, tree induction and label-vector embedding, to mention a few. These kind of tools can be used in many digital applications, such as document filtering, search engines, document management systems, etc. Lately, Deep Learning based methods is getting more attention, especially in an Extreme Multi-Label text classification. Deep learning is one of the major solutions to many machine learning applications that involve high-dimensional and unstructured data, such as pictures and text documents. However, it is of paramount importance in many of these applications to be able to reason accurately about the uncertainties associated with the predictions of these models. Therefore in this studies, we explore multi-label classification of text documents using deep learning methods such as CNN, RNN, LSTM, and even GRU. We investigate two scenarios in the studies. Firstly, multi-label classification models with plane embedding layer, and secondly with a Glove, Word2vec, and FastText as pre-trained embedding corpus for our models. We evaluate and compare these different neural network models performances in terms of multi-label evaluation metrics with respect to the two approaches.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Multi-label classification of text document using deep learning
dc.title.alternative	Derin öğrenme kullanan metin belgelerinin çoklu etiket sınıflandırılması
dc.type	masterThesis
dc.date.updated	2019-11-28
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.subject.ytm	Machine learning
dc.identifier.yokid	10299590
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ÇANKAYA ÜNİVERSİTESİ
dc.identifier.thesisid	584852
dc.description.pages	82
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10299590.pdf
Size:: 1.923Mb
Format:: PDF
Description:: File_10299590

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess