Discrimination analysis of lip motion features for multimodal speaker identification and speech-reading

Çetingül, Hasan Ertan

dc.contributor.advisor	Tekalp, Ahmet Murat
dc.contributor.advisor	Erzin, Engin
dc.contributor.author	Çetingül, Hasan Ertan
dc.date.accessioned	2020-12-08T08:20:18Z
dc.date.available	2020-12-08T08:20:18Z
dc.date.submitted	2005
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/171563
dc.description.abstract	This thesis presents a new multimodal speaker/speech recognition system that integrates audio, lip texture, lip geometry, and lip motion modalities. Several previous studies jointly use audio, lip intensity and/or lip geometry information for speaker identification and speech recognition. This work proposes using explicit lip motion information, instead of or in addition to audio, lip intensity and/or geometry information, for speaker identification and speech-reading within a unified feature selection and discrimination analysis framework, and addresses two questions: i) Is explicit lip motion information useful? and ii) if so, what are the best lip motion features for these two applications? The best lip motion features for speaker identification are taken to be those that yield the highest discrimination of individual speakers in a population, whereas for speech-reading the best features are those that give the highest phoneme/word/phrase recognition rate. The audio modality is represented by mel-frequency cepstral coefficients (MFCC) together with their first and second derivatives, whereas the lip texture modality is represented by the 2D-DCT coefficients of the luminance component within a bounding box around the lip region. Several lip motion feature candidates are considered, including dense motion features within a bounding box around the lip, lip contour motion features, lip shape features, and combinations of them. Furthermore, a novel two-stage discriminant analysis is introduced to select the best lip motion features for speaker identification and speech-reading. The audio, lip texture and lip motion modalities are fused by the Reliability Weighted Summation (RWS) decision rule. Experimental results show that the proposed discriminative analysis significantly improves the unimodal performance of the lip motion modality. Moreover, using explicit lip motion information in addition to audio and lip texture yields further performance gains in bimodal speaker/speech recognition systems.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	Discrimination analysis of lip motion features for multimodal speaker identification and speech-reading
dc.title.alternative	Çok-kipli konuşmacı ve konuşma tanıma uygulamaları için dudak devinim öz niteliklerinde ayırıcı analiz
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Other
dc.identifier.yokid	198295
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	KOÇ ÜNİVERSİTESİ
dc.identifier.thesisid	198577
dc.description.pages	85
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_198295.pdf
Size: 931.2Kb
Format: PDF
Description: File_198295

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess