Multimodal speaker identification with audio-video processing

Kanak, Alper

dc.contributor.advisor	Tekalp, Ahmet Murat
dc.contributor.advisor	Erzin, Engin
dc.contributor.advisor	Yemez, Yücel
dc.contributor.author	Kanak, Alper
dc.date.accessioned	2020-12-08T08:21:53Z
dc.date.available	2020-12-08T08:21:53Z
dc.date.submitted	2003
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/171684
dc.description.abstract	ABSTRACT (from the Turkish ÖZETÇE) This thesis introduces a text-dependent multimodal speaker identification system. The aim is to improve on the performance of conventional unimodal and bimodal recognition systems. The proposed system combines three basic modalities found in a video stream: voice, face texture and lip motion. Lip motion between the frames of the video stream is computed in terms of eigenlip coefficients, and these coefficients are then converted into a feature vector. The resulting feature vectors are linearly interpolated along the whole stream to match the rate of the speech signal and then combined with mel-frequency cepstral coefficients (MFCC). The resulting joint feature vectors are used for training and testing in a Hidden Markov Model based recognition system. Face texture is processed separately in an eigenface domain and enters the system at the decision-fusion stage. Experimental results are included in the thesis to demonstrate the system performance.
dc.description.abstract	ABSTRACT In this thesis we present a multimodal text-dependent speaker identification system. The objective is to improve the recognition performance over conventional unimodal or bimodal schemes. The proposed system decomposes the information existing in a video stream into three modalities: voice, face texture and lip motion. Lip motion between successive frames is first computed in terms of eigenlip coefficients and then encoded as a feature vector. The feature vectors obtained along the whole stream are linearly interpolated to match the rate of the speech signal and then fused with mel-frequency cepstral coefficients (MFCC) of the corresponding speech signal. The resulting joint feature vectors are used to train and test a Hidden Markov Model (HMM) based identification system. Face texture images are treated separately in the eigenface domain and integrated into the system through decision fusion. Experimental results are also included to demonstrate the system performance.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Biyoloji	tr_TR
dc.subject	Biology	en_US
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	Multimodal speaker identification with audio-video processing
dc.title.alternative	Çoklu-ortam ses-görüntü işleme ile biometrik konuşmacı tanıma
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Elektrik ve Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid	144095
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	KOÇ ÜNİVERSİTESİ
dc.identifier.thesisid	136750
dc.description.pages	70
dc.publisher.discipline	Diğer
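
Illustrative note on the abstract: the rate-matching and feature-fusion step described there (lip-motion feature vectors linearly interpolated to the rate of the speech features, then joined with the MFCC vectors) can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis implementation: the function names `interpolate_features` and `fuse_features`, the toy vector sizes, and the choice of plain concatenation as the fusion operation are all hypothetical, and the record does not give the actual frame rates or dimensions.

```python
from typing import List, Sequence

Vector = List[float]


def interpolate_features(frames: Sequence[Sequence[float]], target_len: int) -> List[Vector]:
    """Linearly resample a sequence of feature vectors to target_len vectors.

    Hypothetical helper: the first and last input frames map to the first
    and last output frames, and everything in between is a weighted average
    of the two nearest input frames.
    """
    if not frames:
        raise ValueError("need at least one input frame")
    if target_len < 1:
        raise ValueError("target_len must be positive")
    n = len(frames)
    if n == 1 or target_len == 1:
        # Nothing to interpolate between: repeat the first frame.
        return [list(frames[0]) for _ in range(target_len)]
    out: List[Vector] = []
    for t in range(target_len):
        pos = t * (n - 1) / (target_len - 1)  # fractional index into frames
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo  # weight of the later frame
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def fuse_features(audio: Sequence[Sequence[float]], lip: Sequence[Sequence[float]]) -> List[Vector]:
    """Assumed fusion rule: upsample lip features to the audio frame count,
    then concatenate each audio vector with its time-aligned lip vector."""
    lip_up = interpolate_features(lip, len(audio))
    return [list(a) + l for a, l in zip(audio, lip_up)]


if __name__ == "__main__":
    # Toy data: 5 audio frames (2 values each) and 2 video frames (1 value each).
    audio_feats = [[1.0, 2.0] for _ in range(5)]
    lip_feats = [[0.0], [1.0]]
    for row in fuse_features(audio_feats, lip_feats):
        print(row)
```

In this sketch each joint vector has the audio dimension plus the lip dimension, and a sequence of such vectors is what an HMM-based classifier would be trained and tested on.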

Files in this item

Name:: yokAcikBilim_144095.pdf
Size:: 5.130Mb
Format:: PDF
Description:: File_144095

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess