Co-training using prosodic, lexical and morphological information for automatic sentence segmentation of Turkish spoken language

Dalva, Doğan

dc.contributor.advisor	Güz, Ümit
dc.contributor.advisor	Gürkan, Hakan
dc.contributor.author	Dalva, Doğan
dc.date.accessioned	2020-12-04T17:17:24Z
dc.date.available	2020-12-04T17:17:24Z
dc.date.submitted	2018
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/93803
dc.description.abstract	Sentence segmentation is a process that aims to enrich the raw word sequence output by standard Automatic Speech Recognition (ASR) systems by dividing it into sentences. It is performed as a preliminary step for more advanced speech processing applications, such as parsing, machine translation, and information extraction, which assume that sentence segmentation has already been done. Standard sentence segmentation methods require a large amount of labeled data during model training. Manual data labeling demands effort, attention, and time. This work aims to achieve higher performance than standard methods with less manually labeled data by developing multi-view semi-supervised methods. New three-view co-training and committee-based methods were developed that use lexical, morphological, and prosodic features and operate together with agreement, disagreement, and self-combined strategies. The performance of the new methods was compared with two-view co-training methods, the self-training method, and standard methods. The experimental results show that the proposed methods considerably improve sentence segmentation performance, because the data sets can be represented using sufficient and disjoint feature groups.
dc.description.abstract	Sentence segmentation of speech aims to detect sentence boundaries in a stream of words output by the speech recognizer. Sentence segmentation is a preliminary step toward speech understanding. It is of particular importance for speech-related applications, as most further processing steps, such as parsing, machine translation, and information extraction, assume the presence of sentence boundaries. Typically, statistical methods require a huge amount of manually labeled data, which is a time- and labor-consuming process to prepare. In this work, novel multiview semi-supervised learning strategies for the sentence segmentation problem are proposed. The aim of this work is to find effective semi-supervised machine learning strategies when only a small set of sentence-boundary-labeled data is available. This work proposes three-view co-training and committee-based strategies that incorporate the agreement, disagreement, and self-combined strategies using lexical, morphological, and prosodic information, and investigates the performance of the proposed learning strategies against the baseline, self-training, and co-training. The experimental results show that the proposed learning strategies substantially improve sentence segmentation performance, since the data sets can be represented by three redundantly sufficient and disjoint feature sets.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	Co-training using prosodic, lexical and morphological information for automatic sentence segmentation of Turkish spoken language
dc.title.alternative	Bürünsel, sözcüksel ve biçimbilgisel bilgiyi kullanan eş-eğitim ile Türkçe konuşma dilinin otomatik cümle bölütlemesi
dc.type	doctoralThesis
dc.date.updated	2018-08-06
dc.contributor.department	Elektronik Mühendisliği Anabilim Dalı
dc.subject.ytm	Morphology
dc.subject.ytm	Artificial intelligence
dc.identifier.yokid	10180990
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	IŞIK ÜNİVERSİTESİ
dc.identifier.thesisid	489395
dc.description.pages	148
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_10180990.pdf
Size: 3.668Mb
Format: PDF
Description: File_10180990

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess