Base Iteration
Shared model-based agent base for tabular RL (discrete MDPs).
- class qrl.algorithms._base.BaseIteration(env, gamma=0.9, num_test_episodes=20, device=None, dtype=torch.float32)
Bases: object
Shared base class for tabular model-based RL agents (Value Iteration, QValueIteration).
Maintains empirical estimates of the transition probability P(s’|s,a) and mean reward R(s,a,s’) from environment interaction. Subclasses implement the specific Bellman update and action-selection strategy.
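The empirical estimates described above can be illustrated with plain visit counts. The following is a minimal sketch, not the library's implementation: the `EmpiricalModel` class and its method names are hypothetical, and it uses only the standard library, whereas the real class stores its tensors with the configured `device` and `dtype`.

```python
from collections import defaultdict


class EmpiricalModel:
    """Hypothetical helper (not part of qrl): count-based model of a discrete MDP."""

    def __init__(self):
        # (s, a) -> {s': number of observed transitions into s'}
        self.counts = defaultdict(lambda: defaultdict(int))
        # (s, a, s') -> sum of rewards observed on that transition
        self.reward_sums = defaultdict(float)

    def update(self, s, a, r, s_next):
        """Record one observed transition (s, a, r, s')."""
        self.counts[(s, a)][s_next] += 1
        self.reward_sums[(s, a, s_next)] += r

    def transition_prob(self, s, a, s_next):
        """Estimate P(s'|s,a) as visits(s, a, s') / visits(s, a)."""
        successors = self.counts.get((s, a), {})
        total = sum(successors.values())
        return successors.get(s_next, 0) / total if total else 0.0

    def mean_reward(self, s, a, s_next):
        """Estimate R(s,a,s') as the average reward seen on that transition."""
        n = self.counts.get((s, a), {}).get(s_next, 0)
        return self.reward_sums[(s, a, s_next)] / n if n else 0.0


# Three observed transitions from state 0 under action 1.
model = EmpiricalModel()
model.update(0, 1, 1.0, 2)
model.update(0, 1, 0.0, 2)
model.update(0, 1, 5.0, 3)
p_02 = model.transition_prob(0, 1, 2)  # 2 of the 3 visits ended in state 2
r_02 = model.mean_reward(0, 1, 2)      # average of 1.0 and 0.0
```

Unvisited state-action pairs return 0.0 here; how the actual class treats them is not stated in this reference.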
- Parameters:
env (gym.Env) – A Gymnasium or qrl-qai environment with discrete observation and action spaces.
gamma (float) – Discount factor in [0, 1).
num_test_episodes (int) – Number of episodes used for evaluation. The value is informational for the agent itself; training loops read it to decide how many test episodes to run.
device (torch.device, optional) – Compute device. Defaults to CUDA if available, else CPU.
dtype (torch.dtype, optional) – Floating-point dtype for all tensors. Defaults to float32.
- play_episode(env)
Run one full episode with the current policy, updating the model on-the-fly.
- Parameters:
env (gym.Env) – A separate environment instance to avoid interfering with self.env.
- Returns:
Total undiscounted reward accumulated over the episode.
- Return type:
float
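The control flow of such an episode can be sketched as follows. This is a sketch under stated assumptions rather than the method's actual body: `ChainEnv` is a hypothetical stub that mimics the Gymnasium `reset`/`step` return signatures, and the `select_action` and `update_model` callables stand in for whatever policy and model update a subclass provides.

```python
class ChainEnv:
    """Hypothetical 3-step chain mimicking the Gymnasium reset/step signatures."""

    def reset(self, seed=None):
        self.state = 0
        return self.state, {}  # (observation, info)

    def step(self, action):
        self.state += 1
        terminated = self.state == 3
        # (observation, reward, terminated, truncated, info)
        return self.state, 1.0, terminated, False, {}


def play_episode(env, select_action, update_model):
    """Run one episode, update the model after every step, return the total reward."""
    state, _ = env.reset()
    total_reward = 0.0
    done = False
    while not done:
        action = select_action(state)
        next_state, reward, terminated, truncated, _ = env.step(action)
        update_model(state, action, reward, next_state)  # on-the-fly model update
        total_reward += reward  # undiscounted sum
        state = next_state
        done = terminated or truncated
    return total_reward


# Record every transition instead of updating a real model.
transitions = []
total = play_episode(
    ChainEnv(),
    select_action=lambda s: 0,  # placeholder policy: always action 0
    update_model=lambda *t: transitions.append(t),
)
```

Passing a separate environment instance, as the parameter description requires, keeps the episode from disturbing the state of `self.env`.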