Variable sized input multi layer perceptrons for speech recognition

Kurşun, Olcay

Date

2000

Author

Kurşun, Olcay

Abstract

Turkish abstract (ÖZET), in English translation: SPEECH RECOGNITION WITH VARIABLE SIZED INPUT MULTI LAYER PERCEPTRONS. First, the popular statistical and neural methods, namely Hidden Markov Models (HMM), k-Nearest Neighbor, Single and Multi-Layer Perceptrons (MLP), Radial-Basis Functions, and Mixtures of Experts, were compared on the speech classification problem. Statistical methods are likelihood-based, whereas neural networks are trained discriminatively. HMMs model speech as a temporal signal, whereas the other methods project time onto space using time delays and represent the whole signal as a single vector. The inputs to these methods must therefore be of fixed length. To make use of the discriminative power of MLPs in classifying variable-length phonemes, the Variable Sized Input Multi Layer Perceptrons (VSIMLP) method, which consists of specially trained MLPs, was proposed. A dataset containing six closely related Japanese phonemes, /b,d,g,m,n,N/, was used to compare the methods named above. The VSIMLP method was tested on the TIMIT 39-class phoneme recognition problem. It was concluded that, because of the high dimensionality of the input space and the physiology of speech synthesis, local methods have an important place in speech recognition. It was also found that hybrid VSIMLP-HMM methods are useful for word or sentence recognition, and that the ideas underlying VSIMLP can be applied to local methods as well as to MLPs.

First, we review popular statistical and neural methods for classification, which are Hidden Markov Models (HMM), k-Nearest Neighbor, Single and Multi-Layer Perceptrons (MLP), Radial-Basis Functions, and Mixture of Experts. Then, we apply them to the classification of speech phonemes. The statistical methods are likelihood-based, whereas neural network methods are trained discriminatively. HMMs model the speech as a temporal signal, whereas the other methods map time to space using time-delay and represent the whole signal as one vector. Therefore, the input to the latter systems should be of a fixed length. To make use of the discriminative power of MLPs for classification of variable length phonemes, we propose Variable Sized Input Multi Layer Perceptrons (VSIMLP), which is composed of a set of special-type MLPs. The database used for the review part contains instances from six closely pronounced Japanese phonemes, /b,d,g,m,n,N/. We test VSIMLP on the TIMIT 39-class phoneme problem. We conclude that focusing on the localities is an important issue in phoneme recognition because of the high dimensionality of input space and the nature of speech synthesis. We also conclude that VSIMLP is a promising and extendable technique for phoneme classification. The idea of VSIMLP can be applied to local models with better feature vectors, and it can be used with HMM as a hybrid method to classify words or sentences.
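The fixed-length constraint mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `flatten_frames`, the three-feature frames, and the sample values are all hypothetical, and the sketch shows only why flattening a signal over time yields vectors of different sizes, not how VSIMLP itself handles them.

```python
from typing import List

# Hypothetical representation: one feature vector per time step.
Frame = List[float]


def flatten_frames(frames: List[Frame]) -> List[float]:
    """Map time to space by concatenating all frames into one vector."""
    return [value for frame in frames for value in frame]


# Two made-up phoneme instances with 3 features per frame.
short_phoneme = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]                   # 2 frames
long_phoneme = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]   # 3 frames

short_vector = flatten_frames(short_phoneme)
long_vector = flatten_frames(long_phoneme)

# The flattened sizes differ, so a single network with a fixed-size
# input layer cannot accept both instances as they are.
print(len(short_vector), len(long_vector))
```

A classifier whose input layer expects one particular vector size would accept only one of these two instances, which is the limitation the abstract attributes to the time-delay methods.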

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/79563

Collections

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess