Bandit classes
Bandit agents that implement various strategies.
class bandit.bandit.BaseBandit(environment: bandit.environment.Environment, values: List[float] = None)
Base class for all bandit agents.
action(i: int = None) → float
Take an action.
Parameters: i (int) – action to take
Returns: (float) reward of the taken action
update_history_and_values(choice: int, reward: Union[float, int]) → None
Update the histories and the value estimates. This base class assumes a sample-mean estimate for the values; strategies that estimate values differently need to override this method.
Parameters: - choice (int) – choice of action taken
- reward (Union[float, int]) – reward received
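The sample-mean estimate mentioned above can be maintained incrementally, without storing every past reward. The following is a minimal, self-contained sketch of that idea, not the library's actual implementation; the function name update_sample_mean and the counts list are hypothetical and introduced only for illustration.

```python
from typing import List, Union


def update_sample_mean(
    values: List[float],
    counts: List[int],
    choice: int,
    reward: Union[float, int],
) -> None:
    """Update the running sample-mean estimate for one action, in place.

    Hypothetical helper: `values` holds the current estimates and `counts`
    the number of times each action has been taken so far.
    """
    counts[choice] += 1
    # Incremental form of the mean: Q_new = Q_old + (reward - Q_old) / n
    values[choice] += (reward - values[choice]) / counts[choice]


# Usage: three rewards observed for action 0, none for action 1.
values = [0.0, 0.0]
counts = [0, 0]
for r in (1.0, 0.0, 1.0):
    update_sample_mean(values, counts, choice=0, reward=r)
```

In this sketch the first update for an action sets its estimate to the first observed reward, so any initial value passed in is overwritten after one pull. A strategy that should keep a lasting influence from its initial values would need a different update rule.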
class bandit.bandit.CustomBandit(environment: bandit.environment.Environment, values: List[float] = None)
Wrapper around the BaseBandit for creating custom bandit subclasses.
class bandit.bandit.EpsGreedyBandit(environment: bandit.environment.Environment, eps: float, values: List[float] = None)
Epsilon-greedy bandit that makes a random choice 100*epsilon percent of the time for exploration and acts greedily the rest of the time.
Parameters: eps (float) – fraction of the time spent taking exploratory actions
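The epsilon-greedy rule described above can be sketched independently of the library. The function eps_greedy_choice below is a hypothetical stand-in, not part of bandit.bandit, and the random tie-breaking among equally valued actions is an assumption made for this example.

```python
import random
from typing import List, Optional


def eps_greedy_choice(
    values: List[float],
    eps: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Pick an action index with the epsilon-greedy rule (illustrative only)."""
    rng = rng if rng is not None else random.Random()
    # Explore: with probability eps, pick any action uniformly at random.
    if rng.random() < eps:
        return rng.randrange(len(values))
    # Exploit: pick the action with the highest estimated value.
    # Assumption: ties are broken uniformly at random.
    best = max(values)
    return rng.choice([i for i, v in enumerate(values) if v == best])


# Usage: with eps=0.1, roughly 10% of the choices are exploratory.
choice = eps_greedy_choice([0.1, 0.9, 0.3], eps=0.1, rng=random.Random(0))
```

Note that an exploratory draw can still land on the greedy action, so the greedy action is selected somewhat more often than a fraction 1 - eps of the time.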