Music emotion recognition: A multimodal machine learning approach

Gökalp, Cemre

dc.contributor.advisor	Durahim, Ahmet Onur
dc.contributor.advisor	Daşcı, Abdullah
dc.contributor.author	Gökalp, Cemre
dc.date.accessioned	2020-12-10T06:51:13Z
dc.date.available	2020-12-10T06:51:13Z
dc.date.submitted	2019
dc.date.issued	2019-12-30
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/213583
dc.description.abstract	Müzik duygusu tanıma, müzik bigisi çıkarım bilimsel topluluğunun yeni gelişmekte olan bir alanıdır ve aslında, duygular üzerinden yapılan müzik aramaları, web kullanıcıları tarafından kullanılan en önemli tercihlerden biridir.Dünya dijitale giderken, Last.fm gibi çevrimiçi veritabanlarındaki müzik içerikleri katlanarak genişlemesi, içeriklerin yönetilmesi ve güncel tutulması için önemli bir manuel çaba gerektiriyor. Bu nedenle, kullanıcıların duygusal durumuna göre kişiselleştirilebilecek ileri ve esnek arama mekanizmalarına olan talep son yıllarda artan ilgi görmektedir.Bu tezde, metinsel bazlı özelliklerin yanısıra müzikten türetilen sessel niteliklerle beslenen çeşitli sınıflandırılma modelleri sunarak, müzik duygu tanıma problemini ele almaya odaklanan bir çerçeve tasarlamıştır. Bu çalışmada, tempo, akustiklik ve enerji gibi ses özelliklerinin duygusal rolünü ve, iki farklı yaklaşımla, TF-IDF ve Word2Vec, elde edilen metinsel özelliklerin etkisini, hem denetimli hem de yarı denetimli tasarımlarla, dört araştırma deneyi altında ele aldık. Ayrıca, müzikten türetilen sessel özellikleri, içeriğe duyarlı verilerden gelen özelliklerle birleştirerek, çok modlu bir yaklaşım önerdik. Yüksek performanslı, otomatik bir duygu sınıflandırma sistemi oluşturmayı başarmak adına, 1500'den fazla etiketli şarkı sözü ve 2.5 milyondan fazla Türkçe belgenin bulunduğu etiketlenmemiş büyük veriyi içeren temel bir gerçek veri seti oluşturduk. Analitik modeller Python kullanılarak çapraz doğrulanmış veriler üzerinde birkaç farklı algoritma benimseyerek gerçekleştirildi. Deneylerin bir sonucu olarak, sadece ses özellikleri kullanılırken elde edilen en iyi performans %44,2 iken, metinsel özelliklerin kullanılmasıyla, sırasıyla denetimli ve yarı denetimli öğrenme paradigmaları dikkate alındığında, % 46,3 ve % 51,3 doğruluk puanları ile gelişmiş bir performans gözlenmiştir. Son olarak, sessel ve metinsel özelliklerin birleşimiyle oluşturulan bütünsel bir özellik seti yaratmış olsak da, bu yaklaşımın sınıflandırma performansı için önemli bir gelişme göstermediği gözlemlendi.
dc.description.abstract	Music emotion recognition (MER) is an emerging domain of the Music Information Retrieval (MIR) scientific community, and besides, music searches through emotions are one of the major preferences utilized by web users.As the world goes to digital, the musical contents in online databases, such as Last.fm have expanded exponentially, which require substantial manual efforts for managing them and also keeping them updated. Therefore, the demand for advanced andflexible search mechanisms, which can be personalized according to the emotional state of users, has received increasing attention in recent years.This thesis concentrates on addressing music emotion recognition problem by presenting several classification models, which were fed by textual features, as well as audio attributes extracted from the music. In this study, we build both supervised and semi-supervised classification designs under four research experiments, that addresses the emotional role of audio features, such as tempo, acousticness, and energy, and also the impact of textual features extracted by two different approaches, which are TF-IDF and Word2Vec. Furthermore, we proposed a multi-modal approach by using a combined feature-set consisting of the features from the audio content, as well as from context-aware data. For this purpose, we generated a ground truth dataset containing over 1500 labeled song lyrics and also unlabeled big data, which stands for more than 2.5 million Turkish documents, for achieving to generate an accurate automatic emotion classification system. The analytical models were conducted by adopting several algorithms on the cross-validated data by using Python. As a conclusion of the experiments, the best-attained performance was 44.2% when employing only audio features, whereas, with the usage of textual features, better performances were observed with 46.3% and 51.3% accuracy scores considering supervised and semi-supervised learning paradigms, respectively. As of last, even though we created a comprehensive feature set with the combination of audio and textual features, this approach did not display any significant improvement for classification performance.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilim ve Teknoloji	tr_TR
dc.subject	Science and Technology	en_US
dc.subject	Müzik	tr_TR
dc.subject	Music	en_US
dc.subject	İstatistik	tr_TR
dc.subject	Statistics	en_US
dc.title	Music emotion recognition: A multimodal machine learning approach
dc.title.alternative	Müzik duygusu tanıma: Çok-modlu makine öğrenmesi yaklaşımı
dc.type	masterThesis
dc.date.updated	2019-12-30
dc.contributor.department	Yönetim Bilimleri Anabilim Dalı
dc.identifier.yokid	10234904
dc.publisher.institute	Yönetim Bilimleri Enstitüsü
dc.publisher.university	SABANCI ÜNİVERSİTESİ
dc.identifier.thesisid	598403
dc.description.pages	114
dc.publisher.discipline	Endüstri Mühendisliği ve İşletme Yönetimi Bilim Dalı

Files in this item

Name:: yokAcikBilim_10234904.pdf
Size:: 1.713Mb
Format:: PDF
Description:: File_10234904

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess