Speaker adaptation with minimal data in statistical speech synthesis systems

Mohammadi, Amir

dc.contributor.advisor	Demiroğlu, Cenk
dc.contributor.author	Mohammadi, Amir
dc.date.accessioned	2020-12-06T14:17:03Z
dc.date.available	2020-12-06T14:17:03Z
dc.date.submitted	2014
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/103628
dc.description.abstract	İstatistiksel ses sentezi (İSS) sistemleri birkaç dakikalık uyarlama verisi kullanarak hedef konuşmacının sesine uyarlama yapabilme yeteneğine sahiptir. Uyarlama için gereken konuşma sürelerini daha da aşağıya, birkaç saniyeye, düşürmek için geliştirilen uyarlama algoritmaları, teknolojinin tüketici elektroniği gibi gerçek hayattaki uygulamalarda yaygınlaşmasında önemli etkiye sahip olabilir. Bu tarz hızlı uyarlamayı başarmanın geleneksel yöntemi özses tekniğidir ki konuşma tanımada iyi çalışmaktadır fakat istatistiksel ses sentezinde algısal artifeksler ürettiği bilinmektedir. Burada, hem temel özses uyarlama algoritmasının kalite problemini giderebilecek hem de asgari veri kullanarak konuşmacı uyarlamayı sağlayacak üç yöntem önerdik. Birinci yöntemimiz uyarlama algoritmasını, artifeksleri azaltmak için konuşmacı uzayında realistik doğrultularda hareket ettirmek amacıyla sınırlamak için önerdiğimiz Bayes özses yaklaşımının kullanımına dayanan yöntemdir. İkinci metodumuz ise hedef konuşmacıya yakın, önceden eğitilmiş referans konuşmacıları bulmaya ve o referans konuşmacı modellerini ikinci bir özses uyarlama iterasyonunda kullanmaya dayanır. Her iki teknik de nesnel testlerde temel özses metodundan önemli ölçüde daha iyi sonuçlar verdi. Benzer şekilde, her ikisi de temel özses metoduyla kıyaslandığında öznel testlerde ses kalitesini arttırdı. Üçüncü metodda, önerilen özses metodu ile son teknoloji doğrusal regresyon tekniğinin ardışık kullanımının uyarım özniteliklerinin uyarlanmasını geliştirdiği görüldü.
dc.description.abstract	Statistical speech synthesis (SSS) systems can adapt to a target speaker with a couple of minutes of adaptation data. Developing adaptation algorithms that further reduce the required adaptation data to a few seconds can have a substantial effect on the deployment of the technology in real-life applications such as consumer electronics devices. The traditional way to achieve such rapid adaptation is the eigenvoice technique, which works well in speech recognition but is known to generate perceptual artifacts in statistical speech synthesis. Here, we propose three methods that alleviate the quality problems of the baseline eigenvoice adaptation algorithm while allowing speaker adaptation with minimal data. Our first method uses a Bayesian eigenvoice approach to constrain the adaptation algorithm to move in realistic directions in the speaker space to reduce artifacts. Our second method finds pre-trained reference speakers that are close to the target speaker and uses only those reference speaker models in a second eigenvoice adaptation iteration. Both techniques performed significantly better than the baseline eigenvoice method in objective tests. Similarly, both improved speech quality in subjective tests compared to the baseline eigenvoice method. In the third method, tandem use of the proposed eigenvoice method with a state-of-the-art linear-regression-based adaptation technique is found to improve adaptation of excitation features.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	Speaker adaptation with minimal data in statistical speech synthesis systems
dc.title.alternative	İstatistiksel ses sentezi sistemlerinde çok az veri ile konuşmacıya uyarlanma yöntemleri
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Other
dc.subject.ytm	Text to speech
dc.identifier.yokid	10049657
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ÖZYEĞİN ÜNİVERSİTESİ
dc.identifier.thesisid	371123
dc.description.pages	60
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_10049657.pdf
Size: 883.5Kb
Format: PDF
Description: File_10049657

This item appears in the following Collection(s)

THESES

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess