Title: Contextual Bandits with Episodic Feedback
Abstract: We consider the episodic bandit setting, a variant of the contextual bandit setting that operates over an episodic model. An episodic bandit may not directly observe the rewards for individual actions taken; instead, it receives an aggregate reward for all the actions taken during an episode. Episodic bandits naturally describe problems in which individual rewards for actions are difficult or impossible to observe or approximate directly, yet the total reward for an episode is readily available. We consider one such application in detail: the selective switching problem. In this paper we formally define the episodic bandit setting and describe episodic-reward variants of classic multi-armed bandit algorithms, including epsilon-greedy, Boltzmann exploration, and UCB. We show experimentally the properties and performance characteristics of these algorithmic variants and suggest possible avenues for exploring this setting further.
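The setting described above can be sketched with a minimal epsilon-greedy variant under episodic feedback. This is an illustrative assumption, not the paper's actual method: per-action rewards are hidden, and the single aggregate episode reward is credited equally to every pull in the episode (a simple credit-assignment heuristic chosen here for clarity; the class name and reward-splitting rule are hypothetical).

```python
import random


class EpisodicEpsilonGreedy:
    """Epsilon-greedy with episodic (aggregate) feedback.

    Per-action rewards are unobserved; the single episode reward is
    credited equally to every pull in the episode -- a hypothetical
    credit-assignment heuristic assumed for illustration only.
    """

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # estimated per-pull reward share

    def select_arm(self):
        # Explore uniformly with probability epsilon, else exploit.
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update_episode(self, arms_pulled, episode_reward):
        # Only the aggregate reward is observed; split it evenly
        # across the episode's pulls, then update each arm's
        # incremental mean with its share.
        share = episode_reward / len(arms_pulled)
        for arm in arms_pulled:
            self.counts[arm] += 1
            n = self.counts[arm]
            self.values[arm] += (share - self.values[arm]) / n


# Usage: one episode of 5 pulls against hidden per-arm rewards.
random.seed(0)
bandit = EpisodicEpsilonGreedy(n_arms=3, epsilon=0.1)
hidden = [0.2, 0.8, 0.5]                # true means, never shown per-pull
arms = [bandit.select_arm() for _ in range(5)]
reward = sum(hidden[a] for a in arms)   # only this aggregate is observed
bandit.update_episode(arms, reward)
```

The same episodic update (replacing per-pull rewards with a per-pull share of the aggregate) would apply analogously to Boltzmann exploration or UCB by substituting the arm-selection rule.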