Development of a software tool for optical text recognition for Turkish

Resko, Moiz

dc.contributor.advisor	Akın, Hüseyin Levent
dc.contributor.author	Resko, Moiz
dc.date.accessioned	2020-12-04T11:53:32Z
dc.date.available	2020-12-04T11:53:32Z
dc.date.submitted	1994
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/81430
dc.description.abstract	ÖZET Bu çalışma, Türkçe dokümanlara yönelik bir Doküman Analizi Sisteminin gereklerini ve öğelerini tanıtmaktadır. Literatürde böyle bir sistemin bazı kritik problemlerine ve yapılması gereken bölümlerine çözümler getirilmiştir. Bazı problemler için de, bu çalışmada yeni yaklaşımlar geliştirilmiştir. İkili görüntüler elde etmek amacıyla, günümüz okuyucularında kullanılan geleneksel görüntü dosya tipleri incelenmiştir. Görüntülerin herhangi bir tipte saklanmasından sonra, gri-tonlu görüntülerden ikili görüntüler elde etmek, işlemlerin ileriki bölümlerinde rahat çalışabilmek için önemlidir. Bir Doküman Analizi Sistemindeki önemli problemlerin arasında her görüntüde olabilen, gürültü veya eğim açısı gibi hataların anlaşılması ve düzeltilmesi gelir. Açıyı anlamak için çizgi yerleştirme metodu güzel bir yaklaşımdır; açı düzeltmek için ise basit matematiksel denklemler kullanılmıştır. Gürültü giderimi için birkaç yaklaşım tanıtıldıktan sonra Uzamsal Pürüz Giderme metodu en uygun bulunmuştur. Diğer bir problem, yazı içeren kısımların, grafik veya çevre çizgileri gibi diğer bölümlerden ayrılmasıdır. Siyah bölgelerin ayrılıp, bazı özelliklerinin analiz edilmesi bu problemi çözmekte yardımcı olacaktır. Görüntüyü içindeki ayrı bölümlere dilimlendirmek için üç yöntem tanıtılmıştır: sınır takip etme, noktalara bağlı dilimlendirme ve satırlara bağlı dilimlendirme. Sınır takip etme yöntemi literatürden alınmış, diğer iki yöntem ise bu çalışmada geliştirilmiştir. Bölümlenmiş kısımlar daha sonra uygun bir şekilde tanımlanıp, bu tanımlardan gösterdikleri şekle göre sınıflandırılmalıdırlar (harf, sayı, v.b.). Tanımlama her bir bölümün tipik bazı özelliklerine göre yapılır. Bu özellikler, asıl karakter tanımlama bölümü olan sınıflandırmada kullanılırlar. Yazı tipine, harf boyuna veya duruş şekline bağlı kalmayan bir sınıflama yapmak için yapay sinir ağlarında hata geri yayma yöntemi kullanılmıştır.
dc.description.abstract	ABSTRACT This study outlines the requirements and components of a Document Analysis System for Turkish texts. The literature offers solutions to some of the main problems of such a system; here we present new approaches to others. In order to obtain binary images, the image file formats typically used by conventional scanners are investigated. Once the image is stored in any format, binarization of the gray-scale image is crucial for processing documents easily in later stages. Among the most common tasks of a Document Analysis System are the detection and elimination of noise and skew angles, which occur in almost every scanning environment. For angle detection, line fitting is a good approach; to correct angles, simple mathematical equations may be used. After popular noise elimination approaches are discussed, spatial smoothing is found to fit this problem best. The separation of the text part from non-text shapes, such as graphical parts or gridlines, is another problem. Separating black regions and analysing them as lines of text or graphics helps to overcome this problem. For segmentation of the image into distinct symbol images, three methods are discussed: boundary following, pixel-based segmentation, and line-based segmentation. The boundary following algorithm is taken from the literature, whereas the other two methods are developed in this study. Each segmented portion must then be well represented and, according to this representation, classified (letters, digits, etc.). The representation is based on certain features of each segment. These features are then used in the classification process, which is the main character recognition part. In order to have a character recognition system not constrained by font, size, and orientation, artificial neural networks with the Back Propagation learning algorithm are used.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/embargoedAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Development of a software tool for optical text recognition for Turkish
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Diğer
dc.subject.ytm	Text recognition
dc.subject.ytm	Computer programs
dc.subject.ytm	Computer softwares
dc.identifier.yokid	35314
dc.publisher.institute	Fen Bilimleri Enstitüsü
dc.publisher.university	BOĞAZİÇİ ÜNİVERSİTESİ
dc.identifier.thesisid	35314
dc.description.pages	143
dc.publisher.discipline	Diğer
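
The abstract mentions skew detection by line fitting and skew correction with simple mathematical equations, but this record does not give the thesis's actual formulas. The sketch below is one common way to read that description, not the author's implementation: it fits a least-squares line through points sampled along a text baseline, takes the arctangent of the slope as the skew angle, and rotates points by the negative of that angle. The function names `estimate_skew_angle` and `rotate_point`, and the choice of baseline points as input, are assumptions made for illustration.

```python
import math


def estimate_skew_angle(points):
    """Fit y = a*x + b by least squares and return atan(a) in radians.

    `points` is a list of (x, y) pairs, assumed here to lie along one
    text baseline (a hypothetical input; the thesis may sample differently).
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0:
        raise ValueError("points share one x value; slope is undefined")
    # sxx > 0, so atan2(sxy, sxx) equals atan(sxy / sxx), the slope angle.
    return math.atan2(sxy, sxx)


def rotate_point(x, y, angle, cx=0.0, cy=0.0):
    """Rotate (x, y) by -angle about (cx, cy), undoing a skew of `angle`."""
    c, s = math.cos(-angle), math.sin(-angle)
    dx, dy = x - cx, y - cy
    return (cx + c * dx - s * dy, cy + s * dx + c * dy)


# Usage: a synthetic baseline skewed by 5 degrees, passing through (0, 3).
baseline = [(x, 3 + x * math.tan(math.radians(5))) for x in range(0, 100, 10)]
skew = estimate_skew_angle(baseline)
deskewed = [rotate_point(x, y, skew, 0.0, 3.0) for x, y in baseline]
```

The same rotation applies whether y grows upward or, as in most image formats, downward; only the visual sense of a positive angle changes. A full deskew would apply `rotate_point` (or an equivalent inverse mapping) to every pixel of the binary image.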

Files in this item

Name:: yokAcikBilim_35314.pdf
Size:: 5.026Mb
Format:: PDF
Description:: File_35314

This item appears in the following Collection(s)

TEZLER

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/embargoedAccess