Show simple item record

dc.contributor.advisor: Tekalp, Murat
dc.contributor.author: Sargin, Mehmet Emre
dc.date.accessioned: 2020-12-08T08:16:15Z
dc.date.available: 2020-12-08T08:16:15Z
dc.date.submitted: 2006
dc.date.issued: 2018-08-06
dc.identifier.uri: https://acikbilim.yok.gov.tr/handle/20.500.12812/171245
dc.description.abstract: This thesis addresses two major problems of multimodal signal processing using audio-visual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audiovisual speaker recognition problem, within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late integration techniques. We first perform a canonical correlation analysis (CCA) on the audio and lip modalities so as to extract the correlated part of the information, and then employ an optimal combination of early and late integration techniques to fuse the extracted features. The results of the experiments indicate that the proposed multimodal fusion scheme improves the identification performance over the early and late integration of original modalities. We also demonstrate the importance of modality synchronization for the performance of early integration techniques and propose a CCA-based method to synchronize audio and lip modalities. We address the second problem, i.e., the speaker synthesis problem, within the context of a speech-driven speaker animation application. More specifically, we present a Hidden Markov Model (HMM) based two-stage method for joint analysis of head gesture and speech prosody patterns of a speaker towards automatic realistic synthesis of head gestures from speech prosody. The analysis method is used to learn correlations between head gestures and prosody for a particular speaker from a training video sequence. The resulting audio-visual mapping model is then employed to synthesize natural head gestures on a given 3D head model for the speaker from arbitrary input test speech. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech. [en_US]
dc.language: English
dc.language.iso: en
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: Attribution 4.0 United States [tr_TR]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.subject: Elektrik ve Elektronik Mühendisliği [tr_TR]
dc.subject: Electrical and Electronics Engineering [en_US]
dc.title: Audio-visual correlation modeling for speaker identification and synthesis
dc.title.alternative: Konuşmacı tanıma ve sentezi için görsel işitsel ilinti modellenmesi
dc.type: masterThesis
dc.date.updated: 2018-08-06
dc.contributor.department: Elektrik ve Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid: 156093
dc.publisher.institute: Fen Bilimleri Enstitüsü
dc.publisher.university: KOÇ ÜNİVERSİTESİ
dc.identifier.thesisid: 182064
dc.description.pages: 83
dc.publisher.discipline: Diğer (Other)
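
The abstract's first contribution rests on canonical correlation analysis (CCA) to extract the correlated part of the audio and lip modalities before fusing them. As a purely illustrative sketch of that idea (not the thesis's actual pipeline), the following Python snippet applies scikit-learn's CCA to synthetic data; the feature dimensions, component count, and concatenation-based early fusion are all assumptions.

```python
# Illustrative sketch only: CCA-based extraction of the correlated part of
# two modalities, followed by early fusion (frame-wise concatenation).
# Feature dimensions and n_components are assumptions, not thesis settings.
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_frames = 500
audio = rng.standard_normal((n_frames, 39))  # stand-in for audio (MFCC-like) features
lip = rng.standard_normal((n_frames, 64))    # stand-in for lip-texture intensity features

# Project both modalities onto their maximally correlated subspaces.
cca = CCA(n_components=10)
audio_c, lip_c = cca.fit_transform(audio, lip)

# Early integration: concatenate the correlated components frame by frame;
# the fused features would then feed an identification back end.
fused = np.hstack([audio_c, lip_c])
print(fused.shape)  # (500, 20)
```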
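
The second contribution learns a mapping from speech prosody to head gestures with a two-stage HMM analysis. The sketch below, again only an assumed realization, fits a Gaussian HMM (via the hmmlearn library) to joint prosody-gesture observations so that each hidden state captures a recurring pattern, then emits each decoded state's gesture prototype for new prosody input; the feature layout, state count, and the crude decoding shortcut are all assumptions, not the thesis's model.

```python
# Illustrative sketch only: joint prosody/head-gesture analysis with an HMM.
# Library choice (hmmlearn), feature layout, and state count are assumptions.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
n_frames = 1000
prosody = rng.standard_normal((n_frames, 2))  # stand-in for pitch/intensity per frame
gesture = rng.standard_normal((n_frames, 3))  # stand-in for head-rotation angles per frame

# Analysis stage: fit an HMM on joint observations so each hidden state
# captures a co-occurring prosody/gesture pattern.
joint = np.hstack([prosody, gesture])
model = GaussianHMM(n_components=8, covariance_type="diag", n_iter=50, random_state=0)
model.fit(joint)

# Per-state gesture prototype: the gesture block of each state's mean vector.
state_gesture = model.means_[:, prosody.shape[1]:]

# Synthesis stage (crude stand-in for the thesis's method): pad test prosody
# with the training-set mean gesture, decode a state sequence, and emit each
# decoded state's gesture prototype.
test_prosody = rng.standard_normal((200, 2))
pad = np.tile(joint[:, prosody.shape[1]:].mean(axis=0), (200, 1))
states = model.predict(np.hstack([test_prosody, pad]))
synthesized = state_gesture[states]
print(synthesized.shape)  # (200, 3)
```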

