Real-time high-quality speech recognition (Gerçek zamanlı yüksek kalitede ses tanıma)

Çakir, Mert Yilmaz

Date

2017

Author

Çakir, Mert Yilmaz

Abstract

With advancing technology, many interfaces (modes of interaction) have emerged in human-computer interaction. One of these interfaces is speech recognition. Speech recognition converts the human voice, without intermediaries, into a form that a computer can read. This makes it possible to control devices by speaking. Speech recognition technology has many application areas, and the conveniences it provides vary with how it is used. One of these areas, converting speech to text, has been the subject of many studies from past to present. Traditional studies aimed to transcribe the speech of specific individuals. Applications built for this purpose are speaker-dependent systems. However, speaker-dependent systems cannot succeed on other speakers' speech unless that speech is first introduced to the system. Today, most systems being developed, smart devices above all, are designed to be speaker-independent. This thesis proposes a system that aims to develop, independently of language and speaker, by labeling speech with word sequences. Using the labeled speech, it proposes a speech recognition system based on a text corpus that grows independently of language, which can be regarded as an innovative perspective for research in this field. In terms of the subjects it covers, this thesis lies at the intersection of different areas of computer science, such as signal processing and pattern recognition. The ultimate goal of the proposed work is to offer a highly accurate real-time speech recognition system, built with efficient techniques, so that people can communicate effectively with smart devices. In addition, within the scope of this thesis, the techniques used in the field of speech recognition are compared, and the proposed system is experimentally studied and evaluated.

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/630788

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess