Bandit classes

Bandit agents that implement various strategies.

class bandit.bandit.BaseBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Base class for all bandit agents.

action(i: int = None) → float[source]

Take an action.

Parameters:i (int) – action to take

Returns:(float) reward of the taken action
update_history_and_values(choice: int, reward: Union[float, int]) → None[source]

Update the histories and the value estimates. This base class assumes a sample mean estimate for the values. Different strategies require overwriting this function.

Parameters:
  • choice (int) – choice of action taken
  • reward (Union[float, int]) – reward received
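The sample-mean update can be sketched as follows; the attribute names used here (values, counts) are illustrative assumptions, not the class's actual internals:

```python
from typing import List, Union

def update_history_and_values(values: List[float], counts: List[int],
                              choice: int, reward: Union[float, int]) -> None:
    """Sketch of an incremental sample-mean value update."""
    counts[choice] += 1
    # Incremental mean: Q_new = Q_old + (reward - Q_old) / n
    values[choice] += (reward - values[choice]) / counts[choice]
```

Strategies that estimate values differently (e.g. with a constant step size) would override this update.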
class bandit.bandit.CustomBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Wrapper around the BaseBandit for creating custom bandit subclasses.

class bandit.bandit.EpsGreedyBandit(environment: bandit.environment.Environment, eps: float, values: List[float] = None)[source]

Epsilon-Greedy bandit that makes a random choice 100*epsilon percent of the time for exploration and acts greedily the rest of the time.

Parameters:eps (float) – fraction of time taking exploratory actions
choose_action(*args, **kwargs) → int[source]

Choose a random action 100*self.eps percent of the time and otherwise take greedy actions.

Returns:(int) action choice
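A minimal sketch of the epsilon-greedy choice rule. The class reads self.eps and self.values; here they are plain arguments for illustration:

```python
import random

def choose_action(values, eps, rng=random):
    # Explore: pick a uniformly random action with probability eps.
    if rng.random() < eps:
        return rng.randrange(len(values))
    # Exploit: otherwise pick the action with the highest value estimate.
    return max(range(len(values)), key=values.__getitem__)
```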
class bandit.bandit.GreedyBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Greedy bandit that always selects the action with the highest estimated value.

choose_action(*args, **kwargs) → int[source]

Choose the action with the highest value. In case of any ties, return a random selection.

Returns:(int) action choice
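The greedy rule with random tie-breaking can be sketched as a standalone function (not the class's actual implementation):

```python
import random

def choose_action(values):
    best = max(values)
    # Collect every index tied for the maximum and pick one at random.
    ties = [i for i, v in enumerate(values) if v == best]
    return random.choice(ties)
```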
class bandit.bandit.RandomBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

A bandit with no strategy: actions are selected uniformly at random. Useful as a baseline for comparing other agents.

choose_action(*args, **kwargs) → int[source]

Choose a random action.

Returns:(int) action choice
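For illustration, a self-contained simulation comparing these strategies on Bernoulli arms. It does not use the library's Environment class; the arm setup and loop are assumptions made to keep the sketch runnable:

```python
import random

def pull(probs, i, rng):
    # Hypothetical Bernoulli arm: reward 1 with probability probs[i], else 0.
    return 1.0 if rng.random() < probs[i] else 0.0

def run(strategy, probs, steps=5000, eps=0.1, seed=0):
    """Return the mean reward of a strategy ('random', 'greedy', or 'eps')."""
    rng = random.Random(seed)
    values = [0.0] * len(probs)   # sample-mean value estimates
    counts = [0] * len(probs)     # pulls per arm
    total = 0.0
    for _ in range(steps):
        if strategy == "random" or (strategy == "eps" and rng.random() < eps):
            i = rng.randrange(len(probs))
        else:
            # Greedy choice with random tie-breaking.
            best = max(values)
            i = rng.choice([j for j, v in enumerate(values) if v == best])
        r = pull(probs, i, rng)
        counts[i] += 1
        values[i] += (r - values[i]) / counts[i]
        total += r
    return total / steps
```

With two arms of payout probability 0.2 and 0.8, the epsilon-greedy agent should earn a clearly higher mean reward than the random baseline over a few thousand steps.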