Posse class

A gang of bandit agents for running the same test across many bandits at once.

class bandit.posse.Posse(environment: bandit.environment.Environment, bandit_class: Type[bandit.bandit.BaseBandit], n_bandits: int, **bandit_kwargs)

A posse of bandits that all sample the same environment for the same number of steps.

Parameters:
  • environment (Environment) – the environment that the bandits sample
  • bandit_class (Type[BaseBandit]) – the kind of bandit to create
  • n_bandits (int) – the number of bandits to create
  • bandit_kwargs (dict) – dictionary of arguments to pass to the bandits
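
The constructor signature suggests that a posse builds n_bandits instances of bandit_class and forwards bandit_kwargs to each of them. The pattern can be sketched with toy classes as follows. ToyEnvironment, ToyBandit, ToyPosse, and the epsilon and seed arguments are hypothetical stand-ins, not part of the bandit package, and the assumption that each bandit receives the environment as its first argument is not confirmed by this page.

```python
import random
from typing import List, Type


class ToyEnvironment:
    """Hypothetical stand-in: each arm pays its mean reward plus Gaussian noise."""

    def __init__(self, means: List[float], seed: int = 0) -> None:
        self.means = means
        self._rng = random.Random(seed)

    def sample(self, arm: int) -> float:
        return self._rng.gauss(self.means[arm], 1.0)


class ToyBandit:
    """Hypothetical stand-in: a simple epsilon-greedy agent."""

    def __init__(self, environment: ToyEnvironment, epsilon: float = 0.1, seed: int = 0) -> None:
        self.environment = environment
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        n_arms = len(environment.means)
        self.counts = [0] * n_arms
        self.estimates = [0.0] * n_arms
        self.choices: List[int] = []
        self.rewards: List[float] = []

    def take_action(self) -> None:
        # Explore with probability epsilon, otherwise pick the best estimate so far.
        if self._rng.random() < self.epsilon:
            arm = self._rng.randrange(len(self.counts))
        else:
            arm = max(range(len(self.counts)), key=lambda a: self.estimates[a])
        reward = self.environment.sample(arm)
        # Incremental update of the running mean reward for the chosen arm.
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]
        self.choices.append(arm)
        self.rewards.append(reward)


class ToyPosse:
    """Hypothetical stand-in mirroring the Posse constructor arguments."""

    def __init__(self, environment: ToyEnvironment, bandit_class: Type[ToyBandit],
                 n_bandits: int, **bandit_kwargs) -> None:
        # Every bandit shares one environment; the extra kwargs go to each constructor.
        self.bandits = [
            bandit_class(environment, seed=i, **bandit_kwargs) for i in range(n_bandits)
        ]

    def take_actions(self, n_actions: int) -> None:
        # Each bandit takes the same number of steps.
        for bandit in self.bandits:
            for _ in range(n_actions):
                bandit.take_action()


posse = ToyPosse(ToyEnvironment([0.0, 1.0]), ToyBandit, n_bandits=5, epsilon=0.2)
posse.take_actions(10)
```

After the call, every toy bandit holds ten recorded choices and rewards, which is the per-bandit history that the aggregate methods on this page would summarise.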

mean_best_choice(best_choice: Union[int, List[T], numpy.ndarray]) → numpy.ndarray

Average of the best choice at each time computed over all bandits.

Parameters: best_choice (Union[int, List[int], np.ndarray]) – if an int, the best choice at every time step; if a list or np.ndarray, the best choice at each time step.
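
One plausible reading of this method is that it returns, for each time step, the fraction of bandits whose choice matched the best choice. The sketch below illustrates that reading with plain lists. The function name mean_best_choice_sketch and the bandit-major layout of choices are assumptions made for illustration, not the library's implementation.

```python
from typing import List, Sequence, Union


def mean_best_choice_sketch(
    choices: Sequence[Sequence[int]],
    best_choice: Union[int, Sequence[int]],
) -> List[float]:
    """choices[b][t] is the arm bandit b picked at step t (hypothetical layout)."""
    n_bandits = len(choices)
    n_steps = len(choices[0])
    # A single int means the same arm is best at every time step.
    if isinstance(best_choice, int):
        best = [best_choice] * n_steps
    else:
        best = list(best_choice)
    if len(best) != n_steps:
        raise ValueError("best_choice needs one entry per time step")
    # Fraction of bandits that picked the best arm at each step.
    return [
        sum(1 for row in choices if row[t] == best[t]) / n_bandits
        for t in range(n_steps)
    ]


choices = [
    [0, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
]
fixed = mean_best_choice_sketch(choices, 1)            # arm 1 is always best
varying = mean_best_choice_sketch(choices, [0, 1, 0])  # best arm changes over time
```

With four bandits and three steps, fixed evaluates to [0.5, 0.75, 0.75] and varying to [0.5, 0.75, 0.25].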

mean_reward() → numpy.ndarray

Average reward at each time computed over all bandits.
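
The averaging described here runs across bandits, not across time, so the result has one entry per time step. A minimal sketch of that reduction, assuming rewards are stored as one row per bandit (the name mean_reward_sketch and this layout are hypothetical):

```python
from statistics import fmean
from typing import List, Sequence


def mean_reward_sketch(rewards: Sequence[Sequence[float]]) -> List[float]:
    """rewards[b][t] is the reward bandit b received at step t (hypothetical layout)."""
    # zip(*rewards) regroups the bandit-major rows into one tuple per time step.
    return [fmean(step) for step in zip(*rewards)]


rewards = [
    [1.0, 0.0, 2.0],  # bandit 0
    [3.0, 2.0, 2.0],  # bandit 1
]
averaged = mean_reward_sketch(rewards)
```

For these two bandits, averaged is [2.0, 1.0, 2.0].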

take_actions(n_actions: int) → None

Take n_actions actions for each bandit in the posse.

Parameters: n_actions (int) – number of actions to take

var_best_choice(best_choice: Union[int, List[T], numpy.ndarray]) → numpy.ndarray

Variance of the best choice at each time computed over all bandits.

Parameters: best_choice (Union[int, List[int], np.ndarray]) – if an int, the best choice at every time step; if a list or np.ndarray, the best choice at each time step.

var_reward() → numpy.ndarray

Variance at each time of the reward computed over all bandits.
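
As with the mean, the variance is taken across bandits at a fixed time step. The sketch below uses the population variance (dividing by the number of bandits); whether the library uses the population or the sample variance is not stated on this page, so that choice is an assumption, as are the name var_reward_sketch and the one-row-per-bandit layout.

```python
from statistics import pvariance
from typing import List, Sequence


def var_reward_sketch(rewards: Sequence[Sequence[float]]) -> List[float]:
    """rewards[b][t] is the reward bandit b received at step t (hypothetical layout)."""
    # One population variance per time step, computed over the bandits.
    return [pvariance(step) for step in zip(*rewards)]


rewards = [
    [1.0, 0.0, 2.0],  # bandit 0
    [3.0, 2.0, 2.0],  # bandit 1
]
spread = var_reward_sketch(rewards)
```

Here spread is [1.0, 1.0, 0.0]: the two bandits disagree by the same amount at the first two steps and agree exactly at the third.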