İstatiksel modelleme ile konuşmacı tanıma

Eskidere, Ömer

dc.contributor.advisor	Ertaş, Figen
dc.contributor.author	Eskidere, Ömer
dc.date.accessioned	2021-05-08T11:49:27Z
dc.date.available	2021-05-08T11:49:27Z
dc.date.submitted	2007
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/690750
dc.description.abstract	Determining who people are from their speech has become an area of steadily growing interest. In recent years, voice has been added to the person-specific biometric traits that identify an individual, such as fingerprint and retina, which have been in use for many years. Determining a person's identity from a speech sample now has important application areas, particularly security, entry and/or access control, and telephone banking. The biggest problem in such real-time systems is that the environment in which the speech is recorded is noisy, or that the channels over which the speech is transmitted (particularly the telephone line) distort it. Consequently, the aim in recent years has been to minimize the effects that degrade system performance and/or to develop robust systems that work under these conditions. In this thesis, a speaker recognition system based on the Gaussian Mixture Model (GMM) and robust to telephone line effects has been built. The system has two stages, training and testing. MFCC are used as the features that best represent a person's identity from their voice, and the model parameters are estimated with the expectation maximization algorithm. In the test stage, the features of the candidate speaker are applied to each speaker model created in the training stage, and the model giving the maximum likelihood determines the speaker. The speaker recognition system was tested with two databases, one containing clean speech (TIMIT) and one containing telephone speech (NTIMIT). For both databases, all parameters that affect the speaker recognition system in the training and test stages were examined and their optimum values were determined. In addition, prosodic features of speech such as formant frequencies, pitch frequency, and energy were used on their own and together with MFCC features to measure speaker recognition performance, and pitch frequency was found to provide an average recognition gain of 8.34 points in the telephone environment. For feature extraction, two methods were proposed: weighting the cepstral coefficients by clustering them, and splitting the speaker frequency band into sub-bands and placing filters in these sub-bands according to the F-ratio. With these two methods, an increase of up to 10 points in the speaker recognition rate was obtained. KEYWORDS: Speaker recognition, Gaussian Mixture Model, MFCC, Feature vectors, TIMIT/NTIMIT verita
dc.description.abstract	Identifying speakers from their voices has become an area of ever-increasing interest. In recent years, voice has been added to the individual-specific biometric features that represent a person's identity, such as the commonly employed fingerprint and retina, and the identification of speakers from their voice samples has found a place particularly in security, access control, and telephone banking applications. The problem in such real-time systems is the noise induced by the environments where the speech samples are taken and the distortion induced by the media (particularly telephone lines) through which the speech samples are transmitted. In recent years, efforts have been made to minimize the impact of such factors, which severely damage identification performance, or to develop systems that are robust to such disturbances. In this thesis, a speaker identification system based on the Gaussian Mixture Model (GMM) has been developed that is robust to telephone line distortion. It employs mel frequency cepstrum coefficients (MFCC) as speaker-specific features, which are known to best represent speakers' identity, along with the Expectation Maximization algorithm for the estimation of speaker model parameters. The system consists of two stages, namely training and testing. In the training session, a model is produced for each speaker to represent their identity, and in the test session the input speaker is identified by deciding on the model that provides the highest probability. The system has been tested on both clean speech (TIMIT) and telephone speech (NTIMIT) databases. From feature extraction to model training and testing, various parameters that affect the system performance have been investigated and optimized using both speech databases. Identification performance of the system has been determined for cases where prosodic features of speech such as formant frequency, pitch frequency, and energy are employed on their own and in combination with MFCC. It has been found that pitch frequency provides an 8.34-point increase in identification performance on telephone speech when used in combination with MFCC. Weighted clustering of cepstral coefficients and adaptive filtering have been introduced for extracting discriminatory features. An increase of up to 10 points in identification performance has been obtained with each technique. Keywords: Speaker Identification, Gaussian Mixture Models, MFCC, Feature vectors, TIMIT/NTIMIT databases	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	İstatiksel modelleme ile konuşmacı tanıma
dc.title.alternative	Speaker identification with statistics modeling
dc.type	doctoralThesis
dc.date.updated	2018-08-06
dc.contributor.department	Elektronik Mühendisliği Ana Bilim Dalı
dc.subject.ytm	Speaker recognition
dc.identifier.yokid	9010412
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	ULUDAĞ ÜNİVERSİTESİ
dc.identifier.thesisid	202288
dc.description.pages	216
dc.publisher.discipline	Diğer
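
The test stage described in the abstract, where the candidate speaker's features are scored against every trained speaker model and the model with the highest likelihood decides the identity, can be sketched roughly as follows. This is only an illustrative sketch, not the thesis implementation: it assumes diagonal-covariance Gaussian mixtures (the record does not state the covariance type), it omits MFCC extraction and Expectation Maximization training, and the function names, speaker labels, and toy parameters are all hypothetical.

```python
import math


def diag_gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of feature frames under a diagonal-covariance GMM.

    Diagonal covariance is an assumption made for this sketch.
    """
    total = 0.0
    for x in frames:
        # Log of each weighted component density for this frame.
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # Log-sum-exp over components for numerical stability.
        peak = max(comp_logs)
        total += peak + math.log(sum(math.exp(c - peak) for c in comp_logs))
    return total


def identify_speaker(frames, speaker_models):
    """Return the label of the best-scoring model and all per-speaker scores."""
    scores = {
        name: diag_gmm_log_likelihood(frames, *params)
        for name, params in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical models: (weights, means, variances) with 2-dimensional toy
# features standing in for MFCC vectors.
speaker_models = {
    "speaker_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "speaker_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
best, scores = identify_speaker(test_frames, speaker_models)
```

Summing per-frame log-likelihoods treats the frames as independent, a common simplification for this kind of scoring that the record itself does not confirm for the thesis.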

Files in this item

Name:: yokAcikBilim_9010412.pdf
Size:: 11.10Mb
Format:: PDF
Description:: File_9010412

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess