Uçtan-uca konuşma tanıma modeli: Türkçe'deki deneyler

Asefisaray, Behnam

dc.contributor.advisor	Sever, Hayri
dc.contributor.advisor	Mengüşoğlu, Erhan
dc.contributor.author	Asefisaray, Behnam
dc.date.accessioned	2020-12-30T06:31:49Z
dc.date.available	2020-12-30T06:31:49Z
dc.date.submitted	2018
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/474180
dc.description.abstract	A pronunciation dictionary and the Hidden Markov Model (HMM) have for years been regarded as the two most important parts of speech recognition systems. HMMs assume independence between the phonemes they output, and building the pronunciations of dictionary words by hand is a very time-consuming process. In addition, these models are trained independently of one another, so an improvement in one model does not always reduce the error rate of the speech recognition system. In recent years, the Connectionist Temporal Classification (CTC) method has partially solved this problem by allowing the acoustic model and the pronunciation model to be trained together. However, both HMM and CTC solutions assume independence between character/word outputs and cannot model long dependencies, whether acoustic or pronunciation-related. For this reason, HMM- and CTC-based systems always need a strong language model, and without one the word error rate of these systems is quite high. This thesis examines the structure of HMM-based systems and describes the limitations these models impose. A Recurrent Neural Network (RNN) with an attention mechanism is trained to convert speech directly to text, and the structure of a Turkish speech recognition system free of the above limitations and independence assumptions is presented. The model is trained end to end, so that the pronunciation dictionary, language model, and acoustic model that a speech recognition system must contain are trained within a single model. As a result, separate models no longer need to be trained independently, and a model that improves the final result and can take all dependencies into account is designed and trained. Using transfer learning, an end-to-end speech recognition model is trained with less data and a sufficiently good model is obtained.
dc.description.abstract	For decades, the main components of Automatic Speech Recognition (ASR) systems have been the pronunciation dictionary and Hidden Markov Models (HMMs). HMMs assume conditional independence between their outputs, and creating the pronunciation dictionary is a tedious and time-consuming process. Additionally, these models are trained independently of each other, and there is a notable disconnect between acoustic model accuracy and the word error rate (WER) of automatic speech recognition. Connectionist Temporal Classification (CTC) character models attempt to solve some of these issues by jointly learning the pronunciation and acoustic models as a single model. However, both HMM and CTC models suffer from the conditional independence assumption and rely heavily on a sufficiently large language model during decoding. In this thesis, we investigate the traditional paradigm of ASR and focus on the limitations of HMM- and CTC-based speech recognition models. We propose an approach to ASR based on neural attention models, and we directly optimize the speech transcription error rate in Turkish. The end-to-end recurrent neural network model jointly learns all the main components of a speech recognition system: the pronunciation dictionary, language model, and acoustic model. We used transfer learning in our end-to-end architecture in order to train a good enough acoustic model using a limited amount of transcribed speech data.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Uçtan-uca konuşma tanıma modeli: Türkçe'deki deneyler
dc.title.alternative	End-to-end speech recognition model: Experiments in Turkish
dc.type	doctoralThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Mühendisliği Anabilim Dalı
dc.identifier.yokid	10178143
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	HACETTEPE ÜNİVERSİTESİ
dc.identifier.thesisid	493886
dc.description.pages	126
dc.publisher.discipline	Diğer

Files in this item

Name: yokAcikBilim_10178143.pdf
Size: 5.335Mb
Format: PDF
Description: File_10178143

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess