Bandit classes

Bandit agents that implement various strategies.

class bandit.bandit.BaseBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Base class for all bandit agents.

action(i: int = None) → float[source]

Take an action.

Parameters:i (int) – action to take

Returns:(float) reward of the taken action
update_history_and_values(choice: int, reward: Union[float, int]) → None[source]

Update the histories and the value estimates. This base class assumes a sample mean estimate for the values. Different strategies require overwriting this function.

Parameters:
  • choice (int) – choice of action taken
  • reward (Union[float, int]) – reward received
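The sample-mean update can be sketched as follows; the attribute names used here (values, counts) are illustrative assumptions, not the class's actual internals:

```python
from typing import List, Union

def update_history_and_values(values: List[float], counts: List[int],
                              choice: int, reward: Union[float, int]) -> None:
    """Sketch of an incremental sample-mean value update."""
    counts[choice] += 1
    # Incremental mean: Q_new = Q_old + (reward - Q_old) / n
    values[choice] += (reward - values[choice]) / counts[choice]
```

Strategies that estimate values differently (e.g. with a constant step size) would override this update.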
class bandit.bandit.CustomBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Wrapper around the BaseBandit for creating custom bandit subclasses.

class bandit.bandit.EpsGreedyBandit(environment: bandit.environment.Environment, eps: float, values: List[float] = None)[source]

Epsilon-Greedy bandit that makes a random choice 100*epsilon percent of the time for exploration and acts greedily the rest of the time.

Parameters:eps (float) – fraction of time taking exploratory actions
choose_action(*args, **kwargs) → int[source]

Choose a random action 100*self.eps percent of the time and otherwise take greedy actions.

Returns:(int) action choice
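A minimal sketch of the epsilon-greedy choice rule. The class reads self.eps and self.values; here they are plain arguments for illustration:

```python
import random

def choose_action(values, eps, rng=random):
    # Explore: pick a uniformly random action with probability eps.
    if rng.random() < eps:
        return rng.randrange(len(values))
    # Exploit: otherwise pick the action with the highest value estimate.
    return max(range(len(values)), key=values.__getitem__)
```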
class bandit.bandit.GreedyBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

Greedy bandit that always selects the action with the highest estimated value.

choose_action(*args, **kwargs) → int[source]

Choose the action with the highest value. In case of any ties, return a random selection.

Returns:(int) action choice
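The greedy rule with random tie-breaking can be sketched as a standalone function (not the class's actual implementation):

```python
import random

def choose_action(values):
    best = max(values)
    # Collect every index tied for the maximum and pick one at random.
    ties = [i for i, v in enumerate(values) if v == best]
    return random.choice(ties)
```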
class bandit.bandit.RandomBandit(environment: bandit.environment.Environment, values: List[float] = None)[source]

A bandit with no strategy: actions are selected uniformly at random. Useful as a baseline for comparing other agents.

choose_action(*args, **kwargs) → int[source]

Choose a random action.

Returns:(int) action choice
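For illustration, a self-contained simulation comparing these strategies on Bernoulli arms. It does not use the library's Environment class; the arm setup and loop are assumptions made to keep the sketch runnable:

```python
import random

def pull(probs, i, rng):
    # Hypothetical Bernoulli arm: reward 1 with probability probs[i], else 0.
    return 1.0 if rng.random() < probs[i] else 0.0

def run(strategy, probs, steps=5000, eps=0.1, seed=0):
    """Return the mean reward of a strategy ('random', 'greedy', or 'eps')."""
    rng = random.Random(seed)
    values = [0.0] * len(probs)   # sample-mean value estimates
    counts = [0] * len(probs)     # pulls per arm
    total = 0.0
    for _ in range(steps):
        if strategy == "random" or (strategy == "eps" and rng.random() < eps):
            i = rng.randrange(len(probs))
        else:
            # Greedy choice with random tie-breaking.
            best = max(values)
            i = rng.choice([j for j, v in enumerate(values) if v == best])
        r = pull(probs, i, rng)
        counts[i] += 1
        values[i] += (r - values[i]) / counts[i]
        total += r
    return total / steps
```

With two arms of payout probability 0.2 and 0.8, the epsilon-greedy agent should earn a clearly higher mean reward than the random baseline over a few thousand steps.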