Unsupervised active learning for video annotation

Demir, Emre

dc.contributor.advisor	Çataltepe, Zehra
dc.contributor.author	Demir, Emre
dc.date.accessioned	2020-12-07T10:06:58Z
dc.date.available	2020-12-07T10:06:58Z
dc.date.submitted	2015
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/128871
dc.description.abstract	Active learning is one of the semi-supervised machine learning methods. It is used especially when there is a large amount of unlabeled data or very little labeled data, since labeling such data is expensive. Video recording has moved from analog to digital systems, and video recording devices are now widely used by many kinds of users. Annotating and classifying large or networked video collections is also among the areas of interest of active learning. Video annotation is used to index large video collections and to search within them. There are two main video annotation techniques: manual annotation and automatic annotation. In manual annotation, people watch the videos and label them one by one. In automatic annotation, the videos are labeled by computational methods. Manually annotating such an enormous number of videos is very costly in both labor and time. This thesis is a subproject of the CAMOMILE project, which aims to design a framework that provides an annotation structure for 3M data. The three M's stand for the English words multimodal, multimedia and multilingual. In this study, we propose a selection method, based on a clustering-based unsupervised active learning approach, that operates on the REPERE video database, which was created to identify the people appearing in videos.
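As a rough illustration of why a clustering-based approach reduces manual effort, the Python sketch below labels each cluster of tracks by asking a simulated annotator about one member and copying the answer to the rest. The cluster lists, the `oracle` dictionary and the functions `annotate_by_clusters` and `propagation_accuracy` are hypothetical. The largest-cluster-first ordering is a simplification chosen for illustration (in the spirit of the Big Cluster First method mentioned in the English abstract), not the selection method proposed in the thesis.

```python
from typing import Dict, List


def annotate_by_clusters(
    clusters: List[List[str]],
    oracle: Dict[str, str],
    budget: int,
) -> Dict[str, str]:
    """Label whole clusters with at most `budget` annotator queries.

    Hypothetical sketch: clusters are visited from largest to smallest, the
    annotator (simulated by `oracle`) is asked about the first member only,
    and the answer is copied to every member of that cluster.
    """
    labels: Dict[str, str] = {}
    # Largest clusters first, so each query labels as many tracks as possible.
    order = sorted(range(len(clusters)), key=lambda i: len(clusters[i]), reverse=True)
    for idx in order[:max(budget, 0)]:
        members = clusters[idx]
        answer = oracle[members[0]]  # the single manual query for this cluster
        for track in members:
            labels[track] = answer  # propagate the answer to the whole cluster
    return labels


def propagation_accuracy(labels: Dict[str, str], oracle: Dict[str, str]) -> float:
    """Fraction of propagated labels that match the simulated ground truth."""
    if not labels:
        return 0.0
    correct = sum(1 for track, label in labels.items() if oracle[track] == label)
    return correct / len(labels)


clusters = [["t1", "t2", "t3"], ["t4"], ["t5", "t6"]]
oracle = {"t1": "A", "t2": "A", "t3": "B", "t4": "C", "t5": "D", "t6": "D"}
labels = annotate_by_clusters(clusters, oracle, budget=2)
print(labels)  # {'t1': 'A', 't2': 'A', 't3': 'A', 't5': 'D', 't6': 'D'}
print(propagation_accuracy(labels, oracle))  # 0.8
```

Two queries label five of the six tracks, but the propagated label for `t3` is wrong: fewer manual queries are bought at some cost in accuracy, which is the trade-off that motivates choosing carefully which clusters to show the annotator.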
dc.description.abstract	When annotating complex multimedia data such as videos, a human expert usually annotates them manually. Even though manual annotation achieves accurate results, it is a labor-intensive and time-consuming process. On the other hand, computational methods can annotate large volumes of video for indexing and searching faster and with little or no help from human experts, but they are more error-prone. The trade-off between labor, time and accuracy makes active learning a natural choice. Active learning is a semi-supervised machine learning approach that combines the strengths of manual and computational methods. In an active learning cycle, a learner algorithm discovers the underlying patterns in the data and interactively queries human experts about selected informative decision points. It is used when labeled instances are scarce and new labels are expensive to acquire, especially when unlabeled instances are abundant. In this study, we introduce an unsupervised active learning cycle that consists of clustering, stable matching between the resulting clusters, several unsupervised strategies for selecting the most uncertain and the most certain instances, and querying the human annotators. We propose two new cluster selection methods, Most Disagreement Selection (MDS) and Hybrid Set Selection (HS), the latter being a hybrid of MDS and the Big Cluster First method [2]. For MDS and HS, we adopt the solution to the Stable Marriage Problem, recasting cluster matching as a stable marriage problem. We work on the REPERE [1] video dataset, which was created for person identification in videos. Our study aims to identify who is speaking and who is on screen by using multimodal data.
We evaluated the performance of the selection strategies over active learning cycles, using multimodal data, on 28 videos from 7 different TV programs. Each video has three similarity matrices: face-to-face, speech-to-speech and face-to-speech. We ran four experiments based on these matrices, in the following order: face score for face track annotation, face score for speaker track annotation, speaker score for speaker track annotation and speaker score for speaker annotation.	en_US
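The abstract recasts cluster matching as a stable marriage problem. The Python sketch below shows one way this could look under our own assumptions: the two clusterings (for example, face-track clusters and speech-segment clusters) are sets of shared segment ids, both sides rank partners by overlap size with ties broken by index, and the Gale-Shapley algorithm produces the matching. The function `most_disagreement_pair`, which picks the matched pair with the lowest Jaccard agreement as the next query, is a hypothetical reading of the MDS idea, not the definition used in the thesis.

```python
from typing import Dict, List, Set, Tuple


def overlap(a: Set[str], b: Set[str]) -> int:
    return len(a & b)


def stable_match(left: List[Set[str]], right: List[Set[str]]) -> Dict[int, int]:
    """Gale-Shapley matching of left clusters to right clusters.

    Both sides prefer partners with larger overlap; ties are broken by index.
    Assumes len(left) <= len(right) so every left cluster gets a partner.
    """
    prefs = {
        i: sorted(range(len(right)), key=lambda j: (-overlap(left[i], right[j]), j))
        for i in range(len(left))
    }
    next_choice = {i: 0 for i in range(len(left))}
    partner_of_right: Dict[int, int] = {}
    free = list(range(len(left)))
    while free:
        i = free.pop()
        j = prefs[i][next_choice[i]]  # best right cluster not yet proposed to
        next_choice[i] += 1
        rival = partner_of_right.get(j)
        if rival is None:
            partner_of_right[j] = i
        elif (-overlap(left[i], right[j]), i) < (-overlap(left[rival], right[j]), rival):
            partner_of_right[j] = i  # j prefers the new proposer
            free.append(rival)
        else:
            free.append(i)  # j rejects the proposer
    return dict(sorted((i, j) for j, i in partner_of_right.items()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Assumes clusters are non-empty.
    return len(a & b) / len(a | b)


def most_disagreement_pair(
    left: List[Set[str]], right: List[Set[str]], matching: Dict[int, int]
) -> Tuple[int, int]:
    """Matched pair whose clusters agree least (hypothetical MDS-style criterion)."""
    return min(matching.items(), key=lambda p: (jaccard(left[p[0]], right[p[1]]), p[0]))


faces = [{"s1", "s2", "s3"}, {"s4", "s5", "s6"}]
speech = [{"s1", "s2", "s3"}, {"s4", "s5"}, {"s6"}]
matching = stable_match(faces, speech)
print(matching)  # {0: 0, 1: 1}
print(most_disagreement_pair(faces, speech, matching))  # (1, 1)
```

Here segment `s6` is grouped with `s4` and `s5` by face but separated from them by speech, so under this criterion the second matched pair is the one an annotator would be asked about first.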
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	tr_TR
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Unsupervised active learning for video annotation
dc.title.alternative	Video etiketleme için denetimsiz aktif öğrenme
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Bilgisayar Bilimleri Anabilim Dalı
dc.identifier.yokid	10079179
dc.publisher.institute	Bilişim Enstitüsü
dc.publisher.university	İSTANBUL TEKNİK ÜNİVERSİTESİ
dc.identifier.thesisid	392928
dc.description.pages	132
dc.publisher.discipline	Other

Files in this item

Name:: yokAcikBilim_10079179.pdf
Size:: 21.19Mb
Format:: PDF
Description:: File_10079179


This item appears in the following Collection(s)

TEZLER

