CompilerV0

Implementation of the CompilerV0 environment.

Author: Jay Shah (@Jayshah25)

License: Apache-2.0

class qrl.env.core.compiler.CompilerV0(*args: Any, **kwargs: Any)

Bases: QuantumEnv

Single-qubit quantum gate compilation environment.

CompilerV0 is a gymnasium.Env-compatible environment that models the problem of compiling a target single-qubit unitary using a fixed, discrete gate set. The agent incrementally applies quantum gates to build a circuit whose resulting unitary approximates a given target operation in SU(2).

At each step, the agent selects a gate action that left-multiplies the current circuit unitary. The reward returned at each step is the average gate fidelity between the current unitary and the target unitary, which encourages the agent to discover short, high-fidelity gate sequences.

Key properties

Action space: Discrete set of single-qubit gates (Clifford + rotations).
Observation space: Flattened real and imaginary parts of the current 2×2 unitary (shape (8,)).
Reward: Average gate fidelity with respect to the target unitary.
Termination: The episode ends successfully when the fidelity exceeds reward_tolerance, and is truncated once max_steps gate applications have been made.
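The left-multiplication update and the flattened observation described above can be sketched as follows. This is a minimal illustration, not the environment's code: the two gates stand in for the real gate set, the helper name flatten_unitary is hypothetical, and the ordering (all real parts first, then all imaginary parts) is an assumption.

```python
import numpy as np

# Illustrative gates only; the environment defines its own gate set.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)


def flatten_unitary(u):
    """Concatenate real and imaginary parts of a 2x2 matrix into shape (8,).

    Hypothetical helper; the real/imaginary ordering is an assumption.
    """
    return np.concatenate([u.real.ravel(), u.imag.ravel()])


u = np.eye(2, dtype=complex)  # the circuit starts as the identity
for gate in (H, T):
    u = gate @ u  # left-multiplication: the newest gate acts last

obs = flatten_unitary(u)
```

After the loop, u equals T @ H, so the gate chosen first sits rightmost in the product.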

Rendering

The render() method visualizes the compilation process by displaying a heatmap of the magnitude of the difference matrix |U_target − U| over time, annotated with the current step, last applied gate, and reward.

Input Parameters

target (np.ndarray) – Target 2×2 unitary matrix in SU(2) to compile towards.
max_steps (int) – Maximum number of gate applications per episode.
reward_tolerance (float) – Fidelity threshold for early termination.
ffmpeg (bool) – Whether to use FFmpeg when saving animations.
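Since target is expected to be a 2×2 matrix in SU(2), it can help to check a candidate before building the environment. The sketch below is an assumption-laden illustration: rz and is_su2 are hypothetical helpers, and the values chosen for max_steps and reward_tolerance are arbitrary examples, not documented defaults.

```python
import numpy as np


def rz(theta):
    """Z-rotation exp(-i * theta / 2 * Z); its determinant is exactly 1."""
    return np.array(
        [[np.exp(-1j * theta / 2), 0], [0, np.exp(1j * theta / 2)]],
        dtype=complex,
    )


def is_su2(u, atol=1e-8):
    """Check that u is 2x2, unitary, and has determinant 1 (hypothetical helper)."""
    u = np.asarray(u, dtype=complex)
    if u.shape != (2, 2):
        return False
    unitary = np.allclose(u.conj().T @ u, np.eye(2), atol=atol)
    unit_det = np.isclose(np.linalg.det(u), 1.0, atol=atol)
    return bool(unitary and unit_det)


target = rz(np.pi / 3)

# Keyword names follow the parameter list above; the values are illustrative.
config = {"target": target, "max_steps": 30, "reward_tolerance": 0.99}
```

A matrix such as diag(1, i) is unitary but has determinant i, so is_su2 rejects it; multiplying by a suitable global phase brings it into SU(2).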

See also

tutorials/compiler: Step-by-step tutorial on compiling SU(2) unitaries using CompilerV0.

get_reward(action)

Apply a quantum gate action and compute the compilation reward.

This method left-multiplies the current circuit unitary by the unitary corresponding to the selected action and evaluates the average gate fidelity with respect to the target unitary.

Parameters: action (int) – Index of the selected action in self.actions.
Returns: Average gate fidelity between the current unitary and the target unitary, defined as 0.5 * |Tr(U_target† · U)| for a single-qubit system.
Return type: float
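The formula stated for the return value can be evaluated directly with NumPy. This is a sketch of the documented expression only; the function name fidelity is hypothetical and the code is not taken from the library.

```python
import numpy as np


def fidelity(u_target, u):
    """Evaluate 0.5 * |Tr(U_target^dagger @ U)| for 2x2 unitaries."""
    return 0.5 * abs(np.trace(u_target.conj().T @ u))


X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

f_same = fidelity(X, X)        # identical unitaries
f_phase = fidelity(X, 1j * X)  # same gate up to a global phase
f_orth = fidelity(X, I2)       # Tr(X) = 0, so the overlap vanishes
```

Because of the absolute value, f_same and f_phase are both 1: the measure ignores global phase. f_orth is 0.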

render(save_path_without_extension=None, interval=800)

Render the compilation process as an animation of the difference matrix.

The visualization shows the magnitude of the element-wise difference |U_target - U| as a heatmap that evolves over time, along with annotations indicating the current step, applied action, and reward.

Parameters:

save_path_without_extension (str or None, optional) – Path (without file extension) to save the animation. If provided, the animation is saved using the configured writer (MP4 for FFmpeg or GIF for Pillow). If None, the animation is displayed interactively.
interval (int, optional) – Delay between animation frames in milliseconds. Default is 800.

Returns:

This method produces a visualization but does not return a value.

Return type:

None
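The quantity shown in the heatmap, the element-wise magnitude |U_target - U|, can be computed per frame as in the sketch below. The helper name difference_magnitude and the two-entry history list are hypothetical; how the environment stores its history is not specified here.

```python
import numpy as np


def difference_magnitude(u_target, u):
    """Element-wise magnitude of the complex difference, shape (2, 2)."""
    return np.abs(u_target - u)


X = np.array([[0, 1], [1, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Hypothetical history: the identity start state, then the state after an X gate.
history = [I2, X]
frames = [difference_magnitude(X, u) for u in history]
```

Each entry of frames is a real, non-negative 2×2 array, and it becomes all zeros once the circuit matches the target exactly. Note that, unlike the fidelity, this difference is sensitive to global phase.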

reset()

Reset the environment to the initial compilation state.

The circuit unitary is reset to the identity matrix, the step counter is cleared, and the history buffer is reinitialized.

Returns:

observation (np.ndarray) – Flattened observation corresponding to the identity unitary, shape (8,).
info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.

step(action)

Execute one compilation step.

Applies the selected gate, updates the internal circuit unitary and history, computes the reward, and checks termination conditions.

Parameters:

action (int) – Index of the selected action in self.actions.

Returns:

observation (np.ndarray) – Updated flattened unitary observation, shape (8,).
reward (float) – Average gate fidelity after applying the action.
done (bool) – True if the episode has terminated due to reaching the fidelity threshold or the maximum number of steps.
info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.
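The reset/step contract documented above can be exercised with a short rollout. To keep the example self-contained, it uses ToyCompilerEnv, a hypothetical stand-in that mimics the documented return values; it is not the qrl implementation, and its two-gate action list, defaults, and observation ordering are assumptions.

```python
import numpy as np


class ToyCompilerEnv:
    """Hypothetical stand-in with the documented reset/step interface."""

    def __init__(self, target, max_steps=20, reward_tolerance=0.99):
        h = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
        t = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
        self.actions = [h, t]  # illustrative gate set: 0 = H, 1 = T
        self.target = np.asarray(target, dtype=complex)
        self.max_steps = max_steps
        self.reward_tolerance = reward_tolerance
        self.reset()

    def _obs(self):
        # Real parts followed by imaginary parts, shape (8,).
        return np.concatenate([self.u.real.ravel(), self.u.imag.ravel()])

    def reset(self):
        self.u = np.eye(2, dtype=complex)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.u = self.actions[action] @ self.u  # left-multiply the chosen gate
        self.steps += 1
        reward = 0.5 * abs(np.trace(self.target.conj().T @ self.u))
        done = reward > self.reward_tolerance or self.steps >= self.max_steps
        return self._obs(), float(reward), bool(done), {}


# Target: the S gate, which equals two T gates applied in sequence.
s_gate = np.diag([1, 1j])
env = ToyCompilerEnv(s_gate)

obs, info = env.reset()
rewards = []
for action in (1, 1):  # apply T twice
    obs, reward, done, info = env.step(action)
    rewards.append(reward)
    if done:
        break
```

In this toy run the first T gate gives a reward of cos(π/8), roughly 0.924, which is below the threshold; the second brings the fidelity to 1 and sets done to True.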