BlochSphereV1
Implementation of the BlochSphereV1 environment
Author: Jay Shah (@Jayshah25)
Contact: jay.shah@qrlqai.com
License: Apache-2.0
- class qrl.env.core.bloch_sphere.BlochSphereV1(target_state=2, max_steps=10, reward_tolerance=0.99, ffmpeg=False)[source]
Bases:
QuantumEnv
Single-qubit Bloch sphere environment as a graph problem for reinforcement learning.
BlochSphereV1 is a gymnasium.Env compatible environment where an agent controls a single qubit via a discrete set of quantum gates. The qubit state is exposed to the agent as an integer index corresponding to the discrete states |0⟩, |1⟩, |+⟩, |-⟩, |+i⟩, |-i⟩.
The objective is to steer the qubit from the fixed initial state |0⟩ to a user-defined target pure state (default |+⟩) within a limited number of steps by applying unitary gate actions.
The environment is fully compatible with ValueIteration and QValueIteration from qrl.algorithms.
Key details
Action space: Discrete set of single-qubit gates (H,X,Z,S).
Observation space: Integer index corresponding to the discrete states |0⟩, |1⟩, |+⟩, |-⟩, |+i⟩, |-i⟩.
Reward: Based on the fidelity |⟨target | state⟩|² in [0, 1]; get_reward() returns 1.0 when the fidelity meets or exceeds reward_tolerance and 0.0 otherwise.
Termination: Success when the fidelity meets or exceeds reward_tolerance, or truncation at max_steps.
- Parameters:
target_state (int, optional) – Target state index in [0, 5]. Defaults to 2 (|+⟩). The mapping is: 0 → |0⟩, 1 → |1⟩, 2 → |+⟩, 3 → |-⟩, 4 → |+i⟩, 5 → |-i⟩.
max_steps (int, optional) – Maximum number of steps per episode. Default is 10.
reward_tolerance (float, optional) – Fidelity threshold for successful termination. Must be in (0, 1]. Default is 0.99.
ffmpeg (bool, optional) – If True, animations are saved as MP4 via ffmpeg, else as GIFs. Default is False.
- Raises:
ValueError – If target_state is not in [0, 5].
ValueError – If reward_tolerance is not in (0, 1].
ValueError – If ffmpeg=True but ffmpeg is not installed on the system.
- property bloch_vector: pennylane.numpy.ndarray
Current Bloch vector (x, y, z).
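As an illustration of what a Bloch vector (x, y, z) represents, the sketch below computes it from a single-qubit statevector using the Pauli expectation values. It uses plain NumPy and does not call into qrl; the helper name bloch_vector_from_state is hypothetical, and the environment's own property may be implemented differently.

```python
import numpy as np

def bloch_vector_from_state(state):
    """Return (x, y, z) for a single-qubit statevector [a, b].

    Hypothetical helper for illustration only; not part of qrl.
    """
    a, b = np.asarray(state, dtype=complex)
    norm = abs(a) ** 2 + abs(b) ** 2          # guard against unnormalized input
    overlap = np.conj(a) * b
    x = 2 * overlap.real / norm               # <sigma_x>
    y = 2 * overlap.imag / norm               # <sigma_y>
    z = (abs(a) ** 2 - abs(b) ** 2) / norm    # <sigma_z>
    return np.array([x, y, z])

plus = np.array([1, 1]) / np.sqrt(2)      # |+>
plus_i = np.array([1, 1j]) / np.sqrt(2)   # |+i>
print(bloch_vector_from_state(plus))
print(bloch_vector_from_state(plus_i))
```

Under this convention |+⟩ lies on the +x axis and |+i⟩ on the +y axis, which is why the six discrete states of the environment sit at the six axis poles of the sphere.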
- get_reward()[source]
Compute the reward for the current state and update termination flags.
Evaluates the fidelity between the current statevector and the target statevector. Sets self.terminated if fidelity meets or exceeds reward_tolerance, and self.truncated if the step limit is reached.
- Returns:
1.0 if the current state matches the target within reward_tolerance, 0.0 otherwise.
- Return type:
float
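The fidelity check described above can be sketched in a few lines of NumPy. This is a minimal stand-in under the assumption that the reward is a thresholded fidelity, as the return description states; the function names fidelity and binary_reward are hypothetical and are not taken from qrl.

```python
import numpy as np

def fidelity(target, state):
    """|<target|state>|^2 for two normalized statevectors."""
    target = np.asarray(target, dtype=complex)
    state = np.asarray(state, dtype=complex)
    # np.vdot conjugates its first argument, giving the inner product <target|state>.
    return float(abs(np.vdot(target, state)) ** 2)

def binary_reward(target, state, reward_tolerance=0.99):
    """1.0 if the fidelity meets or exceeds the tolerance, else 0.0 (assumed rule)."""
    return 1.0 if fidelity(target, state) >= reward_tolerance else 0.0

zero = np.array([1, 0])
plus = np.array([1, 1]) / np.sqrt(2)

print(fidelity(plus, zero))       # overlap between |+> and |0>
print(binary_reward(plus, plus))  # state equals target
print(binary_reward(plus, zero))  # state differs from target
```

Because the six discrete states have pairwise fidelities of 0, 0.5, or 1, any reward_tolerance above 0.5 makes the thresholded reward equivalent to an exact-match test on the state index.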
- render(save_path_without_extension, interval=600, ffmpeg=False)[source]
Render accumulated graph frames as an animation and save to disk.
Assembles the list of graph snapshots captured by _render_graph() into a single animation. Each frame corresponds to one call to _render_graph(), producing a visual record of the agent’s learning progression over episodes.
- Parameters:
save_path_without_extension (str) – File path (without extension) where the animation will be saved. The appropriate extension (.mp4 or .gif) is appended automatically based on the ffmpeg argument.
interval (int, optional) – Delay between frames in milliseconds. Default is 600.
ffmpeg (bool, optional) – If True, saves the animation as an MP4 using ffmpeg. If False, saves as a GIF using Pillow. Default is False.
- Raises:
ValueError – If _render_graph() has not been called and no frames are available.
- Returns:
This method produces an animation file but does not return a value.
- Return type:
None
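The extension rule for save_path_without_extension can be pictured with the small sketch below. It is only an assumption about how such a path might be resolved: the helper resolve_animation_path is hypothetical, and the ffmpeg check via shutil.which mirrors the documented ValueError for a missing ffmpeg rather than the library's actual code.

```python
import shutil

def resolve_animation_path(save_path_without_extension, ffmpeg=False):
    """Append .mp4 or .gif depending on the ffmpeg flag (illustrative helper)."""
    if ffmpeg:
        # Assumed check: look for an ffmpeg executable on PATH.
        if shutil.which("ffmpeg") is None:
            raise ValueError("ffmpeg=True but no ffmpeg executable was found on PATH")
        return save_path_without_extension + ".mp4"
    return save_path_without_extension + ".gif"

print(resolve_animation_path("runs/bloch_training"))
```

Passing the path without an extension keeps the caller's code unchanged when switching between GIF and MP4 output.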
- reset(*, seed=None, options=None)[source]
Reset the environment to the initial state.
The qubit is placed at state index 0 (|0⟩). Episode step count, history, and termination flags are cleared.
- Parameters:
seed (int or None, optional) – Random seed passed to the base gymnasium.Env reset. Default is None.
options (dict or None, optional) – Additional options passed to the base reset. Default is None.
- Returns:
observation (int) – Initial state index (always 0, corresponding to |0⟩).
info (dict) – Dictionary containing fidelity, gate ("reset"), and bloch_vector of the initial state.
- property state_index: int
Current state index (0-5).
- step(action)[source]
Apply a gate action and advance the episode by one step.
Applies the unitary gate corresponding to action to the current statevector, updates the discrete state index via the transition table, increments the step counter, and appends the new state to history.
- Parameters:
action (int) – Index into ACTION_NAMES selecting the gate to apply. 0 → H, 1 → X, 2 → Z, 3 → S.
- Returns:
observation (int) – New discrete state index after applying the gate.
reward (float) – 1.0 if the target is reached within tolerance, 0.0 otherwise.
terminated (bool) – True if fidelity ≥ reward_tolerance.
truncated (bool) – True if steps ≥ max_steps.
info (dict) – Dictionary containing fidelity, gate (name of the applied gate), and bloch_vector of the resulting state.
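To make the reset/step contract concrete without depending on an installed copy of qrl, the sketch below uses a small stand-in class, ToyBlochEnv, that mimics the documented five-tuple return of step(). The class, its hand-written TABLE, and the STATES array are assumptions made for this example; the real BlochSphereV1 also tracks history and a Bloch vector, which are omitted here.

```python
import numpy as np

ACTION_NAMES = ("H", "X", "Z", "S")  # action order documented for step()
R = 1 / np.sqrt(2)
# Statevectors for indices 0..5: |0>, |1>, |+>, |->, |+i>, |-i>
STATES = np.array([[1, 0], [0, 1], [R, R], [R, -R], [R, 1j * R], [R, -1j * R]],
                  dtype=complex)
# Hand-derived next-state table (assumption of this sketch): TABLE[s, a] = s'
TABLE = np.array([[2, 1, 0, 0], [3, 0, 1, 1], [0, 2, 3, 4],
                  [1, 3, 2, 5], [5, 5, 5, 3], [4, 4, 4, 2]])

class ToyBlochEnv:
    """Stand-in that follows the documented step/reset return shapes."""

    def __init__(self, target_state=2, max_steps=10, reward_tolerance=0.99):
        self.target, self.max_steps = target_state, max_steps
        self.reward_tolerance = reward_tolerance
        self.state, self.steps = 0, 0

    def _fidelity(self):
        return float(abs(np.vdot(STATES[self.target], STATES[self.state])) ** 2)

    def reset(self):
        self.state, self.steps = 0, 0  # always start in |0>
        return self.state, {"fidelity": self._fidelity(), "gate": "reset"}

    def step(self, action):
        self.state = int(TABLE[self.state, action])
        self.steps += 1
        fid = self._fidelity()
        terminated = fid >= self.reward_tolerance
        truncated = self.steps >= self.max_steps
        reward = 1.0 if terminated else 0.0
        info = {"fidelity": fid, "gate": ACTION_NAMES[action]}
        return self.state, reward, terminated, truncated, info

env = ToyBlochEnv(target_state=4)  # target |+i>
obs, info = env.reset()
trajectory = []
for action in (0, 3):              # H, then S
    obs, reward, terminated, truncated, info = env.step(action)
    trajectory.append(obs)
    if terminated or truncated:
        break
print(trajectory, reward, terminated)
```

The loop applies H and then S, which moves the stand-in from |0⟩ through |+⟩ to |+i⟩; the same unpacking pattern applies to any environment that follows the gymnasium step signature.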
- static transition_table()[source]
Return the deterministic state-transition table for the environment.
Each entry T[s, a] gives the next state index when action a is taken from state s. Rows correspond to the 6 Bloch sphere states and columns to the 4 gate actions (H, X, Z, S).
- Returns:
Integer array of shape (6, 4) where T[s, a] = s'.
- Return type:
np.ndarray
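A table of this shape can be derived independently by applying each gate matrix to each of the six statevectors and finding the state it lands on, ignoring global phase. The sketch below does that with NumPy; it assumes the state and action orderings documented above, and build_table is a hypothetical helper rather than the library's implementation, so its output should be treated as a cross-check, not as the definitive contents of transition_table().

```python
import numpy as np

R = 1 / np.sqrt(2)
# Index order assumed from the docs: |0>, |1>, |+>, |->, |+i>, |-i>
STATES = np.array([[1, 0], [0, 1], [R, R], [R, -R], [R, 1j * R], [R, -1j * R]],
                  dtype=complex)
# Column order assumed from the docs: H, X, Z, S
GATES = [
    np.array([[1, 1], [1, -1]], dtype=complex) * R,  # H
    np.array([[0, 1], [1, 0]], dtype=complex),       # X
    np.array([[1, 0], [0, -1]], dtype=complex),      # Z
    np.array([[1, 0], [0, 1j]], dtype=complex),      # S
]

def build_table():
    """Return T with T[s, a] = index of the state reached from s under gate a."""
    table = np.zeros((len(STATES), len(GATES)), dtype=int)
    for s, psi in enumerate(STATES):
        for a, gate in enumerate(GATES):
            new_state = gate @ psi
            # Fidelity with every candidate; global phase drops out of |<k|new>|^2.
            overlaps = np.abs(STATES.conj() @ new_state) ** 2
            table[s, a] = int(np.argmax(overlaps))
    return table

T = build_table()
print(T)
```

For example, T[0, 0] is 2 because H maps |0⟩ to |+⟩, and T[2, 3] is 4 because S maps |+⟩ to |+i⟩. Since every transition is deterministic, the table fully specifies the graph that planning methods such as value iteration operate on.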