qrl.env.core.bloch_sphere module

Implementation of BlochSphereV0 environment

Author: Jay Shah (@Jayshah25)

Contact: jay.shah@qrlqai.com

License: Apache-2.0

class qrl.env.core.bloch_sphere.BlochSphereV0(*args: Any, **kwargs: Any)[source]

Bases: QuantumEnv

Single-qubit Bloch sphere environment for reinforcement learning.

BlochSphereV0 is a gymnasium.Env-compatible environment where an agent controls a single qubit via a discrete set of quantum gates. The qubit state is represented internally as a statevector and exposed to the agent as a 3D Bloch vector (x, y, z).
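The mapping from statevector to Bloch vector can be sketched in a few lines of plain Python. This is a minimal illustration using the standard Pauli expectation values, not the environment's internal code; the helper name bloch_vector and the sample kets are assumptions made for this example.

```python
import math


def bloch_vector(state):
    """Map a normalized single-qubit statevector (a, b) to (x, y, z)."""
    a, b = complex(state[0]), complex(state[1])
    cross = a.conjugate() * b
    x = 2 * cross.real                 # <X> = 2 Re(a* b)
    y = 2 * cross.imag                 # <Y> = 2 Im(a* b)
    z = abs(a) ** 2 - abs(b) ** 2      # <Z> = |a|^2 - |b|^2
    return (x, y, z)


s = 1 / math.sqrt(2)
ket_0 = (1, 0)                # north pole of the sphere
ket_plus = (s, s)             # positive x axis
ket_plus_i = (s, 1j * s)      # positive y axis

for name, ket in [("|0>", ket_0), ("|+>", ket_plus), ("|+i>", ket_plus_i)]:
    print(name, bloch_vector(ket))
```

For a normalized pure state the resulting vector has unit length, which is why each component stays within [-1, 1].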

The objective is to steer the qubit from the fixed initial state |0⟩ to a target pure state (default |+⟩) within a limited number of steps by applying unitary gate actions.

Key details

  • Action space: Discrete set of single-qubit gates (Clifford + common rotations).

  • Observation space: Bloch vector (x, y, z), each component in [-1, 1].

  • Reward: Fidelity |⟨target | state⟩|² in [0, 1].

  • Termination: Success when reward exceeds reward_tolerance, or truncation at max_steps.

Rendering

The render() method visualizes the Bloch sphere and the agent’s trajectory, showing the current state and target state as arrows in 3D.

Input Parameters

  • target_state: Target pure state as a NumPy complex 2-vector, defaults to |+⟩.

  • max_steps: Maximum number of steps per episode.

  • reward_tolerance: Fidelity threshold for successful termination.

  • ffmpeg: If True, animations are saved as MP4 videos; otherwise they are saved as GIFs. Default is False.

See also

tutorials/bloch_sphere

get_reward(action)[source]

Apply a quantum gate action and compute the resulting reward.

This method evolves the internal qubit state by applying the unitary corresponding to the selected action and evaluates the fidelity with respect to the target state.

Parameters:

action (int) – Index of the selected action in self.actions.

Returns:

Fidelity between the current state and the target state, defined as |⟨target | state⟩|² and bounded in [0, 1].

Return type:

float
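As a rough illustration of the fidelity reward, the sketch below applies a Hadamard gate to |0⟩ and compares the result with the default target |+⟩. The helpers apply_gate and fidelity are hypothetical names written for this example and are not part of the documented API.

```python
import math


def apply_gate(gate, state):
    """Multiply a 2x2 gate (nested tuples) into a 2-component statevector."""
    (g00, g01), (g10, g11) = gate
    a, b = state
    return (g00 * a + g01 * b, g10 * a + g11 * b)


def fidelity(target, state):
    """Return |<target|state>|^2 for normalized pure states."""
    overlap = sum(complex(t).conjugate() * s for t, s in zip(target, state))
    return abs(overlap) ** 2


s = 1 / math.sqrt(2)
HADAMARD = ((s, s), (s, -s))
ket_0 = (1 + 0j, 0j)
ket_plus = (s + 0j, s + 0j)

fidelity_before = fidelity(ket_plus, ket_0)
fidelity_after = fidelity(ket_plus, apply_gate(HADAMARD, ket_0))
print(fidelity_before, fidelity_after)
```

The fidelity rises from one half to (numerically) one, which is the reward signal an agent would see for choosing H as its first action under these assumptions.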

render(save_path_without_extension=None, interval=800)[source]

Render the Bloch sphere trajectory as a 3D animation.

The visualization shows:

  • A translucent Bloch sphere with labeled basis states.

  • The target Bloch vector (green, static).

  • The evolving qubit state trajectory (red, dynamic).

Parameters:
  • save_path_without_extension (str or None, optional) – Path (without file extension) to save the animation. If provided, the animation is saved using the configured writer (MP4 for FFmpeg or GIF for Pillow). If None, the animation is displayed interactively.

  • interval (int, optional) – Delay between animation frames in milliseconds. Default is 800.

Returns:

This method produces a visualization but does not return a value.

Return type:

None

reset()[source]

Reset the environment to the initial state.

The qubit is initialized to the computational basis state |0⟩. Episode step count and history are cleared.

Returns:

  • observation (np.ndarray) – Initial Bloch vector corresponding to |0⟩, shape (3,).

  • info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.

step(action)[source]

Execute one environment step.

Applies the selected quantum gate, updates the internal state and history, computes the reward, and checks termination conditions.

Parameters:

action (int) – Index of the selected action in self.actions.

Returns:

  • observation (np.ndarray) – Updated Bloch vector of the qubit state, shape (3,).

  • reward (float) – Fidelity-based reward after applying the action.

  • done (bool) – True if the episode has terminated due to success or truncation.

  • info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.
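The step logic described above can be imitated with a small stand-alone rollout. This is a sketch only: the three-gate action list (H, S, T), the function run_episode, and the default values for max_steps and reward_tolerance are assumptions chosen for illustration, since the actual action set and defaults of BlochSphereV0 are not listed here.

```python
import cmath
import math

S2 = 1 / math.sqrt(2)

# Hypothetical action set for illustration; the real environment may differ.
GATES = [
    ((S2, S2), (S2, -S2)),                         # 0: H
    ((1, 0), (0, 1j)),                             # 1: S
    ((1, 0), (0, cmath.exp(1j * math.pi / 4))),    # 2: T
]


def apply_gate(gate, state):
    (g00, g01), (g10, g11) = gate
    a, b = state
    return (g00 * a + g01 * b, g10 * a + g11 * b)


def fidelity(target, state):
    overlap = sum(complex(t).conjugate() * s for t, s in zip(target, state))
    return abs(overlap) ** 2


def run_episode(actions, target, max_steps=10, reward_tolerance=0.99):
    """Apply actions from |0>, stopping on success or at the step limit."""
    state = (1 + 0j, 0j)
    history = []
    for step_count, action in enumerate(actions, start=1):
        state = apply_gate(GATES[action], state)
        reward = fidelity(target, state)
        # done combines success and truncation, as in the documented return value.
        done = reward > reward_tolerance or step_count >= max_steps
        history.append((action, reward, done))
        if done:
            break
    return history


target_plus_i = (S2 + 0j, 1j * S2)
history = run_episode([0, 1, 2], target_plus_i)
for action, reward, done in history:
    print(action, round(reward, 3), done)
```

With this hypothetical gate list, H followed by S reaches |+i⟩, so the rollout stops after two steps and the third action is never applied.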

class qrl.env.core.bloch_sphere.BlochSphereV1(target_state=2, max_steps=10, reward_tolerance=0.99, ffmpeg=False)[source]

Bases: QuantumEnv

Single-qubit Bloch sphere environment as a graph problem for reinforcement learning.

BlochSphereV1 is a gymnasium.Env-compatible environment where an agent controls a single qubit via a discrete set of quantum gates. The qubit state is exposed to the agent as an integer index corresponding to the discrete states |0⟩, |1⟩, |+⟩, |-⟩, |+i⟩, |-i⟩.

The objective is to steer the qubit from the fixed initial state |0⟩ to a user-defined target pure state (default |+⟩) within a limited number of steps by applying unitary gate actions.

The environment is fully compatible with ValueIteration and QValueIteration from qrl.algorithms.
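Because the state space is a small deterministic graph, tabular planning applies directly. The sketch below is a generic value iteration in plain Python, not the ValueIteration class from qrl.algorithms, whose interface is not shown here. The table T is an assumption worked out by hand from the standard H, X, Z, S gates acting on the six states up to global phase, and the discount factor and reward convention (1.0 on entering the target, 0.0 otherwise) are illustrative choices.

```python
# Assumed transition table: rows are states 0-5, columns are actions H, X, Z, S.
T = [
    [2, 1, 0, 0],   # |0>
    [3, 0, 1, 1],   # |1>
    [0, 2, 3, 4],   # |+>
    [1, 3, 2, 5],   # |->
    [5, 5, 5, 3],   # |+i>
    [4, 4, 4, 2],   # |-i>
]
TARGET = 2      # |+>
GAMMA = 0.9     # illustrative discount factor


def action_scores(table, values, state, target, gamma):
    """One-step lookahead score for every action from a given state."""
    return [
        (1.0 if nxt == target else 0.0) + gamma * values[nxt]
        for nxt in table[state]
    ]


def value_iteration(table, target, gamma=GAMMA, sweeps=100):
    """Synchronous value iteration; the target state is treated as terminal."""
    values = [0.0] * len(table)
    for _ in range(sweeps):
        values = [
            0.0 if s == target
            else max(action_scores(table, values, s, target, gamma))
            for s in range(len(table))
        ]
    return values


values = value_iteration(T, TARGET)
scores_from_zero = action_scores(T, values, 0, TARGET, GAMMA)
best_action = scores_from_zero.index(max(scores_from_zero))
print(values, best_action)
```

Under these assumptions the greedy action from state 0 is index 0 (H), which matches the one-gate route from |0⟩ to |+⟩.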

Key details

  • Action space: Discrete set of single-qubit gates (H, X, Z, S).

  • Observation space: Integer index corresponding to the discrete states |0⟩, |1⟩, |+⟩, |-⟩, |+i⟩, |-i⟩.

  • Reward: Fidelity |⟨target | state⟩|² in [0, 1].

  • Termination: Success when reward exceeds reward_tolerance, or truncation at max_steps.

Parameters:
  • target_state (int, optional) – Target state index in [0, 5]. Defaults to 2 (|+⟩). The mapping is: 0 → |0⟩, 1 → |1⟩, 2 → |+⟩, 3 → |-⟩, 4 → |+i⟩, 5 → |-i⟩.

  • max_steps (int, optional) – Maximum number of steps per episode. Default is 10.

  • reward_tolerance (float, optional) – Fidelity threshold for successful termination. Must be in (0, 1]. Default is 0.99.

  • ffmpeg (bool, optional) – If True, animations are saved as MP4 via ffmpeg, else as GIFs. Default is False.

Raises:

  • ValueError – If target_state is not in [0, 5].

  • ValueError – If reward_tolerance is not in (0, 1].

  • ValueError – If ffmpeg=True but ffmpeg is not installed on the system.

property bloch_vector: pennylane.numpy.ndarray

Current Bloch vector (x, y, z).

get_reward()[source]

Compute the reward for the current state and update termination flags.

Evaluates the fidelity between the current statevector and the target statevector. Sets self.terminated if fidelity meets or exceeds reward_tolerance, and self.truncated if the step limit is reached.

Returns:

1.0 if the current state matches the target within reward_tolerance, 0.0 otherwise.

Return type:

float
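The thresholding described here can be written as a small pure function. This is a sketch only: the documented method returns a float and sets self.terminated and self.truncated, whereas the hypothetical helper binary_reward below returns all three values as a tuple so that it can run on its own.

```python
def binary_reward(fidelity, step_count, max_steps=10, reward_tolerance=0.99):
    """Return (reward, terminated, truncated) for one step.

    Hypothetical stand-alone version of the logic; the environment itself
    keeps the two flags as attributes instead of returning them.
    """
    terminated = fidelity >= reward_tolerance   # target reached within tolerance
    truncated = step_count >= max_steps         # step limit reached
    reward = 1.0 if terminated else 0.0
    return reward, terminated, truncated


print(binary_reward(1.0, step_count=1))
print(binary_reward(0.5, step_count=3))
print(binary_reward(0.5, step_count=10))
```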

render(save_path_without_extension, interval=600, ffmpeg=False)[source]

Render accumulated graph frames as an animation and save to disk.

Assembles the list of graph snapshots captured by _render_graph() into a single animation. Each frame corresponds to one call to _render_graph(), producing a visual record of the agent’s learning progression over episodes.

Parameters:
  • save_path_without_extension (str) – File path (without extension) where the animation will be saved. The appropriate extension (.mp4 or .gif) is appended automatically based on the ffmpeg argument.

  • interval (int, optional) – Delay between frames in milliseconds. Default is 600.

  • ffmpeg (bool, optional) – If True, saves the animation as an MP4 using ffmpeg. If False, saves as a GIF using Pillow. Default is False.

Raises:

ValueError – If _render_graph() has not been called and no frames are available.

Returns:

This method produces an animation file but does not return a value.

Return type:

None

reset(*, seed=None, options=None)[source]

Reset the environment to the initial state.

The qubit is placed at state index 0 (|0⟩). Episode step count, history, and termination flags are cleared.

Parameters:
  • seed (int or None, optional) – Random seed passed to the base gymnasium.Env reset. Default is None.

  • options (dict or None, optional) – Additional options passed to the base reset. Default is None.

Returns:

  • observation (int) – Initial state index (always 0, corresponding to |0⟩).

  • info (dict) – Dictionary containing fidelity, gate ("reset"), and bloch_vector of the initial state.

property state_index: int

Current state index (0-5).

step(action)[source]

Apply a gate action and advance the episode by one step.

Applies the unitary gate corresponding to action to the current statevector, updates the discrete state index via the transition table, increments the step counter, and appends the new state to history.

Parameters:

action (int) – Index into ACTION_NAMES selecting the gate to apply. 0 → H, 1 → X, 2 → Z, 3 → S.

Returns:

  • observation (int) – New discrete state index after applying the gate.

  • reward (float) – 1.0 if the target is reached within tolerance, 0.0 otherwise.

  • terminated (bool) – True if fidelity ≥ reward_tolerance.

  • truncated (bool) – True if the step count has reached max_steps.

  • info (dict) – Dictionary containing fidelity, gate name applied, and bloch_vector of the resulting state.

static transition_table()[source]

Return the deterministic state-transition table for the environment.

Each entry T[s, a] gives the next state index when action a is taken from state s. Rows correspond to the 6 Bloch sphere states and columns to the 4 gate actions (H, X, Z, S).

Returns:

Integer array of shape (6, 4) where T[s, a] = s'.

Return type:

np.ndarray
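A table of this shape can be derived from first principles by applying each gate to each of the six states and matching the result against the same six states up to a global phase. The sketch below does this in plain Python with standard gate definitions. It is an independent derivation, not the library's implementation, and the names STATES, GATES and build_table are assumptions made for this example.

```python
import math

S2 = 1 / math.sqrt(2)

STATES = [
    (1, 0),             # 0: |0>
    (0, 1),             # 1: |1>
    (S2, S2),           # 2: |+>
    (S2, -S2),          # 3: |->
    (S2, 1j * S2),      # 4: |+i>
    (S2, -1j * S2),     # 5: |-i>
]

GATES = [
    ((S2, S2), (S2, -S2)),   # 0: H
    ((0, 1), (1, 0)),        # 1: X
    ((1, 0), (0, -1)),       # 2: Z
    ((1, 0), (0, 1j)),       # 3: S
]


def apply_gate(gate, state):
    (g00, g01), (g10, g11) = gate
    a, b = state
    return (g00 * a + g01 * b, g10 * a + g11 * b)


def fidelity(ref, state):
    overlap = sum(complex(r).conjugate() * s for r, s in zip(ref, state))
    return abs(overlap) ** 2


def build_table():
    """Return a 6x4 nested list with table[s][a] = index of the next state."""
    table = []
    for state in STATES:
        row = []
        for gate in GATES:
            new_state = apply_gate(gate, state)
            # Fidelity close to 1 means equality up to a global phase.
            row.append(next(
                i for i, ref in enumerate(STATES)
                if fidelity(ref, new_state) > 1 - 1e-9
            ))
        table.append(row)
    return table


table = build_table()
for row in table:
    print(row)
```

Matching by fidelity rather than by exact amplitudes matters here: for example, S applied to |1⟩ gives i|1⟩, which is the same physical state as |1⟩.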