Audio-visual correlation modeling for speaker identification and synthesis
dc.contributor.advisor | Tekalp, Murat | |
dc.contributor.author | Sargin, Mehmet Emre | |
dc.date.accessioned | 2020-12-08T08:16:15Z | |
dc.date.available | 2020-12-08T08:16:15Z | |
dc.date.submitted | 2006 | |
dc.date.issued | 2018-08-06 | |
dc.identifier.uri | https://acikbilim.yok.gov.tr/handle/20.500.12812/171245 | |
dc.description.abstract | This thesis addresses two major problems of multimodal signal processing using audio-visual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audio-visual speaker recognition problem, within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late integration techniques. We first perform a canonical correlation analysis (CCA) on the audio and lip modalities so as to extract the correlated part of the information, and then employ an optimal combination of early and late integration techniques to fuse the extracted features. The results of the experiments indicate that the proposed multimodal fusion scheme improves the identification performance over the early and late integration of the original modalities. We also demonstrate the importance of modality synchronization for the performance of early integration techniques and propose a CCA-based method to synchronize the audio and lip modalities. We address the second problem, i.e., the speaker synthesis problem, within the context of a speech-driven speaker animation application. More specifically, we present a Hidden Markov Model (HMM) based two-stage method for joint analysis of the head gesture and speech prosody patterns of a speaker, towards automatic realistic synthesis of head gestures from speech prosody. The analysis method is used to learn correlations between head gestures and prosody for a particular speaker from a training video sequence. The resulting audio-visual mapping model is then employed to synthesize natural head gestures on a given 3D head model for the speaker from arbitrary input test speech. Objective and subjective evaluations indicate that the proposed synthesis-by-analysis scheme provides natural-looking head gestures for the speaker with any input test speech. | en_US |
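The CCA step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes hypothetical, already-synchronized per-frame feature matrices (e.g., MFCCs for audio and lip intensities for video) and uses scikit-learn's CCA to project both modalities onto their most correlated directions before early-integration fusion; all array names and dimensions are placeholders.

```python
# Minimal sketch of CCA-based correlated feature extraction for
# audio-visual fusion (illustrative only; not the thesis code).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n_frames, d_audio, d_lip = 500, 13, 32
audio_feats = rng.normal(size=(n_frames, d_audio))  # placeholder MFCC frames
lip_feats = rng.normal(size=(n_frames, d_lip))      # placeholder lip-intensity frames

# Project both modalities onto their 5 most correlated directions.
cca = CCA(n_components=5)
cca.fit(audio_feats, lip_feats)
audio_c, lip_c = cca.transform(audio_feats, lip_feats)

# Early integration: concatenate the correlated projections into a single
# per-frame feature vector; a classifier would then score these jointly,
# while late integration would combine per-modality classifier scores.
fused = np.concatenate([audio_c, lip_c], axis=1)
print(fused.shape)  # (500, 10)
```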
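The HMM-based gesture-prosody analysis might loosely be sketched as follows, under the simplifying (and hypothetical) assumption that elementary head-gesture patterns correspond to hidden states of an HMM driven by per-frame prosody features such as pitch and energy; the thesis's actual two-stage method is more elaborate, and the hmmlearn library and all variable names here are assumptions for illustration.

```python
# Loose sketch of HMM-based joint analysis of speech prosody and head
# gesture (illustrative only; the thesis's two-stage method differs).
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
prosody = rng.normal(size=(2000, 2))  # placeholder (pitch, energy) frames

# Analysis stage: learn recurring prosody patterns as HMM states; each
# state is taken to index an elementary head-gesture class of the speaker.
model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(prosody)

# Synthesis stage: decode new input speech prosody into the most likely
# state sequence, then drive the 3D head model with the gesture pattern
# associated with each state.
test_prosody = rng.normal(size=(300, 2))  # placeholder test utterance
gesture_states = model.predict(test_prosody)
print(gesture_states[:20])
```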
dc.language | English | |
dc.language.iso | en | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights | Attribution 4.0 International | tr_TR
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ | |
dc.subject | Elektrik ve Elektronik Mühendisliği | tr_TR |
dc.subject | Electrical and Electronics Engineering | en_US |
dc.title | Audio-visual correlation modeling for speaker identification and synthesis | |
dc.title.alternative | Konuşmacı tanıma ve sentezi için görsel işitsel ilinti modellenmesi | |
dc.type | masterThesis | |
dc.date.updated | 2018-08-06 | |
dc.contributor.department | Elektrik ve Bilgisayar Mühendisliği Anabilim Dalı | |
dc.identifier.yokid | 156093 | |
dc.publisher.institute | Fen Bilimleri Enstitüsü | |
dc.publisher.university | KOÇ ÜNİVERSİTESİ | |
dc.identifier.thesisid | 182064 | |
dc.description.pages | 83 | |
dc.publisher.discipline | Other