A Prosodic Turkish text-to-speech synthesizer

Vural, Esra

Date

2003

Abstract

TÜRKÇE İÇİN VURGULU METİNDEN SES SENTEZLEYİCİSİ (A Prosodic Text-to-Speech Synthesizer for Turkish)

Özet (abstract, translated from Turkish): In text-to-speech systems, naturalness plays a very important role in obtaining a high-quality speech waveform. The naturalness of the waveform is related to phonetic coverage and to the prosodic features of pitch (F0) contour and duration. Duration determines the timing of the synthesized phoneme, while the pitch contour covers the fundamental frequency characteristics of the waveform. In this thesis, a prosodic text-to-speech synthesizer for Turkish was developed using the Festival speech synthesis system [31]. A new male voice was built, covering the allophones of Turkish and using fundamental frequency and duration information. The duration of the allophones and word stress were studied extensively. Sentence stress and phrasal stress were studied in less detail. Carrier words were created for all allophone combinations. 1680 carrier words were recorded in a soundproof recording studio. LPC and RES parameters were computed. A text normalization module was developed for abbreviations and numbers. Duration information was entered for the allophones. F0 generation modules were developed at the sentence and word levels. By increasing the number of phonemes and creating prosody, a more natural text-to-speech system for Turkish was obtained.

A PROSODIC TURKISH TEXT-TO-SPEECH SYNTHESIZER

Abstract: Naturalness in Text-to-Speech systems is very important in achieving a high-quality waveform. The naturalness of the waveform is highly correlated with phonetic coverage and with prosodic features such as duration and F0 contour. Duration determines the timing of the synthesized phoneme, whereas the F0 contour determines the fundamental frequency component of the waveform. This thesis presents the development of a prosodic Text-to-Speech system for the Turkish language using the Festival tool [31]. We describe a complete realization of a new male voice, covering the allophones of Turkish and using duration and F0 parameters. The duration of the allophones and the word stress have been studied extensively. Sentence stress and phrasal stress are also discussed, but in less detail. Carrier words are designed for approximately all allophone-allophone combinations. 1680 carrier words are recorded in a soundproof recording studio. LPC (linear predictive coding) and RES (residual) parameters are computed. A text normalisation module is implemented for abbreviations and numbers. Durations for the allophones are entered. Sentence-level and word-level F0 generation modules are implemented. By increasing the number of phonemes and adding prosody, we obtained a more natural-sounding Text-to-Speech system for the Turkish language.

URI

https://acikbilim.yok.gov.tr/handle/20.500.12812/218057

Collections

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess