An asymptotically optimal solution for contextual bandit problem in adversarial setting

Mohaghegh Neyshabouri, Mohammadreza

dc.contributor.advisor	Kozat, Süleyman Serdar
dc.contributor.author	Mohaghegh Neyshabouri, Mohammadreza
dc.date.accessioned	2020-12-02T12:28:12Z
dc.date.available	2020-12-02T12:28:12Z
dc.date.submitted	2018
dc.date.issued	2018-08-06
dc.identifier.uri	https://acikbilim.yok.gov.tr/handle/20.500.12812/33537
dc.description.abstract	Bağlamsal çok silahlı haydut algoritması çerçevesinde sıralı öğrenme için çevrimiçi algoritmalar öneriyoruz. Yaklaşımımız, bağlam uzayını bölmek ve daha sonra, bölünen kısımları ve haydut kolları arasındaki olası tüm eşleştirmeleri değerlendirerek, bunları veri odaklı bir şekilde en uygun şekilde birleştirmektir. Bizim yaklaşımımızda, en iyi haritalamanın, en iyi kol seçim politikasını, rahat Lipschitz koşullarında istenen herhangi bir dereceye kadar tahmin edebileceğini gösteriyoruz. Bu nedenle algoritmalarımızı en uygun uyarlanır kombinasyona göre tasarlıyoruz ve en iyi haritalama performansının yanı sıra en iyi kol seçim politikasını asimptotik olarak gerçekleştiriyoruz. Bu en iyilemenin, aynı zamanda, çekişmeli ortamlarda bile sağlanması garanti altına alınmaktadır çünkü bağlamlar veya haydut kollarının hatası ile ilgili herhangi bir istatistiksel varsayıma dayanmıyoruz. Ayrıca, algoritmalarımız için, sözlüksel veya rasgele bir şekilde bölme ve ikili ağaçlar (ve diğer birkaç bölümleme örnekleri) gibi çeşitli hiyerarşik bölümleme yapılarında verimli uygulamalar tasarlıyoruz. Örneğin, ikili ağaç bölümlemesi durumunda, hesaplama karmaşıklığı, en iyi bölümdeki bölgelerin sayısında logaritmik olarak doğrusaldır. Sonuç olarak, son teknoloji ile kıyaslandığında, her tur başına ortalama kayıpta matematiksel olarak kanıtlanmış olan üst sınırları (en iyi kol seçim politikası) tanıtarak önemli performans iyileştirmeleri sağlamaktayız. Deneysel çalışmalarımız, haydut düzeninden gerçek ve sentetik verilere sahip çok sınıflı sınıflamaya kadar çeşitli senaryoları kapsamaktadır. Bu deneylerde, sunulan matematiksel garantileri ve hesaplanabilir ölçeklenebilirliği korurken, algoritmalarımızın en son teknolojilerden oldukça üstün olduğunu göstermekteyiz.
dc.description.abstract	We propose online algorithms for sequential learning in the contextual multi-armed bandit setting. Our approach is to partition the context space and then optimally combine all of the possible mappings between the partition regions and the set of bandit arms in a data driven manner. We show that in our approach, the best mapping is able to approximate the best arm selection policy to any desired degree under mild Lipschitz conditions. Therefore, we design our algorithms based on the optimal adaptive combination and asymptotically achieve the performance of the best mapping as well as the best arm selection policy. This optimality is also guaranteed to hold even in adversarial environments since we do not rely on any statistical assumptions regarding the contexts or the loss of the bandit arms. Moreover, we design efficient implementations for our algorithms in various hierarchical partitioning structures such as lexicographical or arbitrary position splitting and binary trees (and several other partitioning examples). For instance, in the case of binary tree partitioning, the computational complexity is only log-linear in the number of regions in the finest partition. In conclusion, we provide significant performance improvements by introducing upper bounds (w.r.t. the best arm selection policy) that are mathematically proven to vanish in the average loss per round sense at a faster rate compared to the state-of-the-art. Our experimental work extensively covers various scenarios ranging from bandit settings to multi-class classification with real and synthetic data. In these experiments, we show that our algorithms are highly superior over the state-of-the-art techniques while maintaining the introduced mathematical guarantees and a computationally decent scalability.	en_US
dc.language	English
dc.language.iso	en
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	Attribution 4.0 United States	tr_TR
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/
dc.subject	Elektrik ve Elektronik Mühendisliği	tr_TR
dc.subject	Electrical and Electronics Engineering	en_US
dc.title	An asymptotically optimal solution for contextual bandit problem in adversarial setting
dc.title.alternative	Çekişmeli ortamlarda bağlamsal haydut problemi için asimptotik olarak en uygun çözüm
dc.type	masterThesis
dc.date.updated	2018-08-06
dc.contributor.department	Elektrik-Elektronik Mühendisliği Anabilim Dalı
dc.identifier.yokid	10192096
dc.publisher.institute	Mühendislik ve Fen Bilimleri Enstitüsü
dc.publisher.university	İHSAN DOĞRAMACI BİLKENT ÜNİVERSİTESİ
dc.identifier.thesisid	498467
dc.description.pages	56
dc.publisher.discipline	Diğer

Files in this item

Name:: yokAcikBilim_10192096.pdf
Size:: 2.781Mb
Format:: PDF
Description:: File_10192096

View/Open

This item appears in the following Collection(s)

TEZLER

Show simple item record

Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess