BlochSphereV0
Implementation of BlochSphereV0 environment
Author: Jay Shah (@Jayshah25)
Contact: jay.shah@qrlqai.com
License: Apache-2.0
- class qrl.env.core.bloch_sphere.BlochSphereV0(*args: Any, **kwargs: Any)[source]
Bases: QuantumEnv
Single-qubit Bloch sphere environment for reinforcement learning.
BlochSphereV0 is a gymnasium.Env-compatible environment where an agent controls a single qubit via a discrete set of quantum gates. The qubit state is represented internally as a statevector and exposed to the agent as a 3D Bloch vector (x, y, z).
The objective is to steer the qubit from the fixed initial state |0⟩ to a target pure state (default |+⟩) within a limited number of steps by applying unitary gate actions.
Key details
Action space: Discrete set of single-qubit gates (Clifford + common rotations).
Observation space: Bloch vector (x, y, z), each component in [-1, 1].
Reward: Fidelity |⟨target | state⟩|² in [0, 1].
Termination: The episode ends successfully when the reward exceeds reward_tolerance, or is truncated at max_steps.
Rendering
The render() method visualizes the Bloch sphere and the agent’s trajectory, showing the current state and target state as arrows in 3D.
Input Parameters
target_state: Target pure state as a NumPy complex 2-vector, defaults to |+⟩.
max_steps: Maximum number of steps per episode.
reward_tolerance: Fidelity threshold for successful termination.
ffmpeg: If True, animations are saved as MP4 videos; otherwise they are saved as GIFs. Default is False.
See also
tutorials/bloch_sphere
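The mapping from a statevector to the Bloch vector observation can be sketched as follows. This is a minimal NumPy illustration using the standard Pauli expectation values, not the library's own code; the helper name bloch_vector is hypothetical.

```python
import numpy as np


def bloch_vector(state):
    """Map a normalized single-qubit statevector (a, b) to (x, y, z).

    Hypothetical helper: x, y, z are the expectation values of the
    Pauli X, Y, Z operators in the given state.
    """
    a, b = np.asarray(state, dtype=complex)
    x = 2.0 * np.real(np.conj(a) * b)
    y = 2.0 * np.imag(np.conj(a) * b)
    z = np.abs(a) ** 2 - np.abs(b) ** 2
    return np.array([x, y, z], dtype=float)


ket_zero = np.array([1.0, 0.0], dtype=complex)                 # |0>
ket_plus = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2.0)  # |+>

print(bloch_vector(ket_zero))  # north pole of the sphere, approximately (0, 0, 1)
print(bloch_vector(ket_plus))  # positive x axis, approximately (1, 0, 0)
```

Under this convention the fixed initial state |0⟩ sits at the north pole and the default target |+⟩ on the positive x axis, so each component stays in [-1, 1] for any normalized state.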
- get_reward(action)[source]
Apply a quantum gate action and compute the resulting reward.
This method evolves the internal qubit state by applying the unitary corresponding to the selected action and evaluates the fidelity with respect to the target state.
- Parameters:
action (int) – Index of the selected action in self.actions.
- Returns:
Fidelity between the current state and the target state, defined as |⟨target | state⟩|² and bounded in [0, 1].
- Return type:
float
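To make the reward concrete, the sketch below applies one gate to |0⟩ and evaluates the fidelity against |+⟩ with plain NumPy. It assumes the Hadamard gate is among the available actions, which the source does not confirm; the function name fidelity is hypothetical and this is not the library's implementation.

```python
import numpy as np

# Hadamard gate; assumed (not confirmed by the docs) to be one of the actions.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2.0)


def fidelity(target, state):
    """Return |<target|state>|^2 for two normalized pure states."""
    # np.vdot conjugates its first argument, giving the inner product <target|state>.
    return float(np.abs(np.vdot(target, state)) ** 2)


state = np.array([1, 0], dtype=complex)                    # |0>
target = np.array([1, 1], dtype=complex) / np.sqrt(2.0)    # |+>

reward_before = fidelity(target, state)   # overlap of |0> with |+>
state = H @ state                         # apply the gate: H|0> = |+>
reward_after = fidelity(target, state)    # overlap of |+> with itself

print(reward_before, reward_after)
```

Because both vectors are normalized, the result always lies in [0, 1], matching the documented bound.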
- render(save_path_without_extension=None, interval=800)[source]
Render the Bloch sphere trajectory as a 3D animation.
The visualization shows:
- A translucent Bloch sphere with labeled basis states,
- The target Bloch vector (green, static),
- The evolving qubit state trajectory (red, dynamic).
- Parameters:
save_path_without_extension (str or None, optional) – Path (without file extension) to save the animation. If provided, the animation is saved using the configured writer (MP4 for FFmpeg or GIF for Pillow). If None, the animation is displayed interactively.
interval (int, optional) – Delay between animation frames in milliseconds. Default is 800.
- Returns:
This method produces a visualization but does not return a value.
- Return type:
None
- reset()[source]
Reset the environment to the initial state.
The qubit is initialized to the computational basis state |0⟩. Episode step count and history are cleared.
- Returns:
observation (np.ndarray) – Initial Bloch vector corresponding to |0⟩, shape (3,).
info (dict) – Empty dictionary provided for compatibility with Gymnasium API.
- step(action)[source]
Execute one environment step.
Applies the selected quantum gate, updates the internal state and history, computes the reward, and checks termination conditions.
- Parameters:
action (int) – Index of the selected action in self.actions.
- Returns:
observation (np.ndarray) – Updated Bloch vector of the qubit state, shape (3,).
reward (float) – Fidelity-based reward after applying the action.
done (bool) – True if the episode has terminated due to success or truncation.
info (dict) – Empty dictionary provided for compatibility with Gymnasium API.
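The reset()/step() contract described above can be exercised with a small stand-in environment. Everything in this sketch is an assumption made for illustration: ToyBlochEnv is a hypothetical class, not BlochSphereV0, and its three-gate action list and its default values for max_steps and reward_tolerance are invented. Only the return shapes follow the documentation: reset() returns (observation, info) and step() returns (observation, reward, done, info).

```python
import numpy as np

# Hypothetical gate set for illustration; the real action list may differ.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2.0)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


class ToyBlochEnv:
    """Stand-in that mimics the documented reset()/step() return shapes."""

    def __init__(self, target_state=None, max_steps=10, reward_tolerance=0.99):
        plus = np.array([1, 1], dtype=complex) / np.sqrt(2.0)
        self.target = plus if target_state is None else np.asarray(target_state, dtype=complex)
        self.max_steps = max_steps                # assumed default
        self.reward_tolerance = reward_tolerance  # assumed default
        self.actions = [H, X, Z]
        self.reset()

    def _observation(self):
        # Bloch vector (x, y, z) of the current statevector (a, b).
        a, b = self.state
        return np.array([2 * np.real(np.conj(a) * b),
                         2 * np.imag(np.conj(a) * b),
                         np.abs(a) ** 2 - np.abs(b) ** 2], dtype=float)

    def reset(self):
        self.state = np.array([1, 0], dtype=complex)  # |0>
        self.steps = 0
        return self._observation(), {}

    def step(self, action):
        self.state = self.actions[action] @ self.state
        self.steps += 1
        reward = float(np.abs(np.vdot(self.target, self.state)) ** 2)
        # Success when the fidelity exceeds the tolerance, else truncation at max_steps.
        done = reward > self.reward_tolerance or self.steps >= self.max_steps
        return self._observation(), reward, bool(done), {}


env = ToyBlochEnv()
obs, info = env.reset()
trajectory = [obs]
done = False
for action in [1, 1, 0]:  # X, X, then H
    obs, reward, done, info = env.step(action)
    trajectory.append(obs)
    if done:
        break

print(len(trajectory), reward, done)
```

In this run the two X gates flip the qubit to |1⟩ and back to |0⟩, and the final H reaches |+⟩, so the loop ends on the third step through the success condition and not through truncation.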