Exploiting Easiness and Overcoming Delays in Online Learning

Research output: Book/Report › Ph.D. thesis › Research

Standard

Exploiting Easiness and Overcoming Delays in Online Learning. / Thune, Tobias Sommer.

Department of Computer Science, Faculty of Science, University of Copenhagen, 2020.


Harvard

Thune, TS 2020, Exploiting Easiness and Overcoming Delays in Online Learning. Department of Computer Science, Faculty of Science, University of Copenhagen.

APA

Thune, T. S. (2020). Exploiting Easiness and Overcoming Delays in Online Learning. Department of Computer Science, Faculty of Science, University of Copenhagen.

Vancouver

Thune TS. Exploiting Easiness and Overcoming Delays in Online Learning. Department of Computer Science, Faculty of Science, University of Copenhagen, 2020.

Author

Thune, Tobias Sommer. / Exploiting Easiness and Overcoming Delays in Online Learning. Department of Computer Science, Faculty of Science, University of Copenhagen, 2020.

Bibtex

@phdthesis{efb08abf82be456a95d7405001a26eef,
title = "Exploiting Easiness and Overcoming Delays in Online Learning",
abstract = "In machine learning we work towards building algorithms that can solve complex tasks by learning how to solve them, rather than knowing how to solve them by design. Online learning is the subfield focusing on simultaneous execution and learning, that is, learning while a task is “live” or online. Imagine a medical trial where we want to identify the best drug for some illness. Instead of setting aside a portion of patients for testing, we might be able to cure more people by treating all patients as an online task and optimising the total number we cure. An algorithm in this scenario must balance, on the one hand, being adventurous and exploring the options in order to gather sufficient knowledge of the task, and, on the other, choosing what currently seems to be the best option in order to perform well. Using the theoretical framework of “multi-armed bandits”, we explore two variations of online learning scenarios. First, we construct an algorithm capable of performing better if the task has a certain structure that makes it easier. This is possible for two kinds of structure simultaneously, without prior knowledge of the setting, and while remaining robust to harder settings. Secondly, we explore how to deal with the feedback from the algorithm{\textquoteright}s actions being delayed. We extend prior approaches to the case where the delay may vary over time. Here we develop a new technique of skipping feedback that is excessively delayed, and prove a conjecture about the potential performance of this algorithm. In addition, we show that in such problems our algorithms perform much better than was previously thought possible, and design examples of tasks where this is the case.",
author = "Thune, {Tobias Sommer}",
year = "2020",
language = "English",
publisher = "Department of Computer Science, Faculty of Science, University of Copenhagen",

}

RIS

TY - BOOK

T1 - Exploiting Easiness and Overcoming Delays in Online Learning

AU - Thune, Tobias Sommer

PY - 2020

Y1 - 2020

N2 - In machine learning we work towards building algorithms that can solve complex tasks by learning how to solve them, rather than knowing how to solve them by design. Online learning is the subfield focusing on simultaneous execution and learning, that is, learning while a task is “live” or online. Imagine a medical trial where we want to identify the best drug for some illness. Instead of setting aside a portion of patients for testing, we might be able to cure more people by treating all patients as an online task and optimising the total number we cure. An algorithm in this scenario must balance, on the one hand, being adventurous and exploring the options in order to gather sufficient knowledge of the task, and, on the other, choosing what currently seems to be the best option in order to perform well. Using the theoretical framework of “multi-armed bandits”, we explore two variations of online learning scenarios. First, we construct an algorithm capable of performing better if the task has a certain structure that makes it easier. This is possible for two kinds of structure simultaneously, without prior knowledge of the setting, and while remaining robust to harder settings. Secondly, we explore how to deal with the feedback from the algorithm’s actions being delayed. We extend prior approaches to the case where the delay may vary over time. Here we develop a new technique of skipping feedback that is excessively delayed, and prove a conjecture about the potential performance of this algorithm. In addition, we show that in such problems our algorithms perform much better than was previously thought possible, and design examples of tasks where this is the case.

AB - In machine learning we work towards building algorithms that can solve complex tasks by learning how to solve them, rather than knowing how to solve them by design. Online learning is the subfield focusing on simultaneous execution and learning, that is, learning while a task is “live” or online. Imagine a medical trial where we want to identify the best drug for some illness. Instead of setting aside a portion of patients for testing, we might be able to cure more people by treating all patients as an online task and optimising the total number we cure. An algorithm in this scenario must balance, on the one hand, being adventurous and exploring the options in order to gather sufficient knowledge of the task, and, on the other, choosing what currently seems to be the best option in order to perform well. Using the theoretical framework of “multi-armed bandits”, we explore two variations of online learning scenarios. First, we construct an algorithm capable of performing better if the task has a certain structure that makes it easier. This is possible for two kinds of structure simultaneously, without prior knowledge of the setting, and while remaining robust to harder settings. Secondly, we explore how to deal with the feedback from the algorithm’s actions being delayed. We extend prior approaches to the case where the delay may vary over time. Here we develop a new technique of skipping feedback that is excessively delayed, and prove a conjecture about the potential performance of this algorithm. In addition, we show that in such problems our algorithms perform much better than was previously thought possible, and design examples of tasks where this is the case.

M3 - Ph.D. thesis

BT - Exploiting Easiness and Overcoming Delays in Online Learning

PB - Department of Computer Science, Faculty of Science, University of Copenhagen

ER -
