Farklı dillerdeki belgelerin benzerliğinin tespiti (Detecting the similarity of documents in different languages)

Yilmazer, Hakan

dc.contributor.advisor	Yetgin, Zeki
dc.contributor.author	Yilmazer, Hakan
dc.date.accessioned	2020-12-29T06:28:16Z
dc.date.available	2020-12-29T06:28:16Z
dc.date.submitted	2013
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/335771
dc.description.abstract	With the development of technology, Internet use and the volume of web documents are growing in parallel. Documents shared digitally on the Internet in different languages, from different parts of the world, are multiplying. The documents in this vast store of information are, of course, not unique. Some of these digital documents in different languages may have similar content, while some are quotations from or summaries of another, and some are exact translations of an original document. Many documents have copies, quotations, and similar versions in their original language as well as translations into other languages. In an age when information is this important, the availability of sought texts or documents in different languages will make information easier to access. Being able to quickly find accurate translations and quotations, in other languages, of a document written in one language will also benefit researchers. In addition, the fact that the same text can be found in different languages will help in detecting plagiarism of an academic work in languages other than its original. This thesis aims to develop new algorithms for finding documents in different languages that are similar to a given document in meaning and content. In these algorithms, documents are represented as numerical vectors instead of being compared by text, word, or letter similarity. Distance and similarity measures were also tested with different methods used in the literature.
dc.description.abstract	Internet use and the volume of web documents have been growing in parallel with the development of technology. Documents shared digitally on the Internet, in different languages and from different parts of the world, are multiplying. The documents held in the Internet's enormous store of information are, of course, not unique. Some of these digital documents in different languages have similar content, some quote or summarize another document, and some are literal translations of an original. For many documents, copies, quotations, and near-duplicates exist in the original language, along with translations into other languages. In an era when information is so important, the availability of a sought text or document in several languages makes information easier to access. Finding translations and quotations of a document written in a single language will be useful for monolingual readers. Researchers will likewise benefit from being able to find accurate translations and quotations of such a document in other languages quickly. In addition, the fact that the same text can be found in different languages will help detect plagiarism of an academic study in languages other than its original. This thesis aims to develop novel algorithms for finding documents in other languages that are similar to a given document in meaning and content. In these algorithms, documents are represented as numerical feature vectors rather than compared by text, word, or letter similarity. Distance metrics and similarity measures were also tested with different state-of-the-art methods used in the literature.	en_US
dc.language	Turkish
dc.language.iso	tr
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Farklı dillerdeki belgelerin benzerliğinin tespiti
dc.title.alternative	Diagnosis of similarity of texts in different documents
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Department of Computer Engineering
dc.subject.ytm	Hierarchical clustering
dc.subject.ytm	Plagiarism
dc.subject.ytm	Feature vector
dc.identifier.yokid	10014077
dc.publisher.institute	Fen Bilimleri Enstitüsü (Institute of Science)
dc.publisher.university	MERSİN ÜNİVERSİTESİ
dc.identifier.thesisid	348472
dc.description.pages	150
dc.publisher.discipline	Other

Files in this item

Name: yokAcikBilim_10014077.pdf
Size: 4.908Mb
Format: PDF
Description: File_10014077

This item appears in the following Collection(s)

TEZLER (Theses)

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess