Posse class

A gang of bandit agents for easily performing testing en masse.

class bandit.posse.Posse(environment: bandit.environment.Environment, bandit_class: Type[bandit.bandit.BaseBandit], n_bandits: int, **bandit_kwargs)

A posse of bandits that all sample the same environment for the same number of steps.

Parameters:
	environment (Environment) – the environment that the bandits sample
	bandit_class (Type[BaseBandit]) – the kind of bandit to create
	n_bandits (int) – the number of bandits to create
	bandit_kwargs (dict) – dictionary of arguments to pass to the bandits

mean_best_choice(best_choice: Union[int, List[T], numpy.ndarray]) → numpy.ndarray

Average of the best choice at each time computed over all bandits.

Parameters:	best_choice (Union[int, List[int], np.ndarray]) – if int, the best choice for all times. If list or np.ndarray, the best choice at each time step.
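The two accepted forms of best_choice can be illustrated with a small NumPy sketch. It is not the library's implementation: it assumes the bandits' choices are available as an array of shape (n_bandits, n_steps), and it reads "average of the best choice" as the fraction of bandits that picked the best choice at each step. The function name fraction_choosing_best is hypothetical.

```python
import numpy as np


def fraction_choosing_best(choices, best_choice):
    """Per-step fraction of bandits whose choice equals the best choice.

    choices: array of shape (n_bandits, n_steps); this layout is an assumption.
    best_choice: an int, or a sequence with one entry per time step.
    """
    choices = np.asarray(choices)
    # A scalar or an array of shape (n_steps,) both broadcast against choices.
    best = np.asarray(best_choice)
    return (choices == best).mean(axis=0)


# Four bandits, three time steps.
choices = np.array([[0, 1, 2],
                    [0, 2, 2],
                    [1, 1, 2],
                    [0, 1, 0]])

fixed = fraction_choosing_best(choices, 0)            # same best choice at every step
varying = fraction_choosing_best(choices, [0, 1, 2])  # best choice changes per step
print(fixed)    # [0.75 0.   0.25]
print(varying)  # [0.75 0.75 0.75]
```

Passing an int suits a stationary environment; a per-step sequence suits one whose best arm changes over time.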

mean_reward() → numpy.ndarray

Average reward at each time computed over all bandits.

take_actions(n_actions: int) → None

Take n_actions actions for each bandit in the posse.

Parameters:	n_actions (int) – number of actions to take
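The overall pattern (many agents of one class, one shared environment, the same number of steps each) can be sketched with toy stand-ins. Everything below is hypothetical: ToyEnvironment, ToyGreedyBandit and ToyPosse are illustrative classes, not the bandit package's own Environment, BaseBandit or Posse, whose internals are not shown here. Only the constructor shape and the method names take_actions and mean_reward mirror this page.

```python
import numpy as np


class ToyEnvironment:
    """Hypothetical stand-in: arms with fixed mean rewards and unit-variance noise."""

    def __init__(self, means, seed=0):
        self.means = np.asarray(means, dtype=float)
        self.rng = np.random.default_rng(seed)

    def sample(self, arm):
        return self.rng.normal(self.means[arm], 1.0)


class ToyGreedyBandit:
    """Hypothetical epsilon-greedy agent that records its choices and rewards."""

    def __init__(self, environment, epsilon=0.1, seed=0):
        self.environment = environment
        self.epsilon = epsilon
        self.rng = np.random.default_rng(seed)
        n_arms = len(environment.means)
        self.counts = np.zeros(n_arms)
        self.estimates = np.zeros(n_arms)
        self.choices = []
        self.rewards = []

    def take_action(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            arm = int(self.rng.integers(len(self.estimates)))
        else:
            arm = int(np.argmax(self.estimates))
        reward = self.environment.sample(arm)
        # Incremental update of the sample-average estimate for this arm.
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]
        self.choices.append(arm)
        self.rewards.append(reward)


class ToyPosse:
    """Hypothetical posse: n_bandits agents sharing one environment."""

    def __init__(self, environment, bandit_class, n_bandits, **bandit_kwargs):
        # Each bandit gets its own seed so the agents do not act identically.
        self.bandits = [
            bandit_class(environment, seed=i, **bandit_kwargs)
            for i in range(n_bandits)
        ]

    def take_actions(self, n_actions):
        for bandit in self.bandits:
            for _ in range(n_actions):
                bandit.take_action()

    def mean_reward(self):
        # Rows are bandits, columns are time steps; average over bandits.
        return np.array([b.rewards for b in self.bandits]).mean(axis=0)


posse = ToyPosse(ToyEnvironment([0.0, 1.0, 0.5]), ToyGreedyBandit,
                 n_bandits=50, epsilon=0.1)
posse.take_actions(200)
print(posse.mean_reward().shape)  # (200,)
```

Because every bandit takes the same number of actions, the per-bandit histories stack into a rectangular array, which is what makes per-step statistics across the posse straightforward.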

var_best_choice(best_choice: Union[int, List[T], numpy.ndarray]) → numpy.ndarray

Variance of the best choice at each time computed over all bandits.

Parameters:	best_choice (Union[int, List[int], np.ndarray]) – if int, the best choice for all times. If list or np.ndarray, the best choice at each time step.

var_reward() → numpy.ndarray

Variance at each time of the reward computed over all bandits.
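As a rough illustration of what "computed over all bandits" means for mean_reward and var_reward, the sketch below reduces a reward array along the bandit axis. It assumes rewards are laid out as (n_bandits, n_steps), which this page does not confirm. It also uses NumPy's default population variance (ddof=0); whether the library uses that or the sample variance is not stated here.

```python
import numpy as np

# Two bandits, two time steps; rows are bandits (an assumed layout).
rewards = np.array([[1.0, 2.0],
                    [3.0, 6.0]])

mean_per_step = rewards.mean(axis=0)  # average over bandits at each step
var_per_step = rewards.var(axis=0)    # population variance (ddof=0) at each step

print(mean_per_step)  # [2. 4.]
print(var_per_step)   # [1. 4.]
```

Both results have one entry per time step, matching the numpy.ndarray return type documented above.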