CompilerV0
Implementation of CompilerV0 environment
Author: Jay Shah (@Jayshah25)
Contact: jay.shah@qrlqai.com
License: Apache-2.0
- class qrl.env.core.compiler.CompilerV0(*args: Any, **kwargs: Any)
Bases: QuantumEnv
Single-qubit quantum gate compilation environment.
CompilerV0 is a gymnasium.Env-compatible environment that models the problem of compiling a target single-qubit unitary using a fixed, discrete gate set. The agent incrementally applies quantum gates to build a circuit whose resulting unitary approximates a given target operation in SU(2).
At each step, the agent selects a gate action that left-multiplies the current circuit unitary. The episode reward is based on the average gate fidelity between the current unitary and the target unitary, encouraging the agent to discover short, high-fidelity gate sequences.
Key properties
- Action space: Discrete set of single-qubit gates (Clifford + rotations).
- Observation space: Flattened real and imaginary parts of the current 2×2 unitary (shape (8,)).
- Reward: Average gate fidelity with respect to the target unitary.
- Termination: Success when fidelity exceeds reward_tolerance, or truncation at max_steps.
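The observation layout described above can be sketched with NumPy. The ordering used here (row-major real parts first, then row-major imaginary parts) is an assumption for illustration; the text only states that the observation holds the flattened real and imaginary parts with shape (8,). The helper name flatten_unitary is hypothetical.

```python
import numpy as np

def flatten_unitary(u: np.ndarray) -> np.ndarray:
    """Hypothetical helper: encode a 2x2 complex matrix as 8 real numbers.

    Assumed layout: row-major real parts followed by row-major imaginary parts.
    """
    u = np.asarray(u, dtype=np.complex128)
    return np.concatenate([u.real.ravel(), u.imag.ravel()])

# The identity is the circuit unitary at the start of an episode.
obs = flatten_unitary(np.eye(2))
print(obs.shape)
```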
Rendering
The render() method visualizes the compilation process by displaying a heatmap of the magnitude of the difference matrix |U_target − U| over time, annotated with the current step, last applied gate, and reward.
Input Parameters
- target (np.ndarray) – Target 2×2 unitary matrix in SU(2) to compile towards.
- max_steps (int) – Maximum number of gate applications per episode.
- reward_tolerance (float) – Fidelity threshold for early termination.
- ffmpeg (bool) – Whether to use FFmpeg when saving animations.
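Because the target is expected to be a 2×2 matrix in SU(2), it can help to check a candidate before passing it in. The following is a minimal sketch assuming NumPy; rz and is_su2 are hypothetical helpers for illustration and are not part of the library.

```python
import numpy as np

def rz(theta: float) -> np.ndarray:
    """Z-rotation by angle theta, written in its SU(2) form (determinant 1)."""
    return np.array([[np.exp(-1j * theta / 2), 0],
                     [0, np.exp(1j * theta / 2)]], dtype=np.complex128)

def is_su2(u: np.ndarray, atol: float = 1e-9) -> bool:
    """Return True if u is a 2x2 unitary matrix with unit determinant."""
    u = np.asarray(u, dtype=np.complex128)
    if u.shape != (2, 2):
        return False
    unitary = np.allclose(u.conj().T @ u, np.eye(2), atol=atol)
    unit_det = np.isclose(np.linalg.det(u), 1.0, atol=atol)
    return bool(unitary and unit_det)

target = rz(np.pi / 4)
# The textbook Hadamard is unitary, but its determinant is -1.
hadamard = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
```

The textbook Hadamard matrix is unitary but lies outside SU(2); multiplying it by the global phase i gives a representative with determinant 1.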
See also
- tutorials/compiler
Step-by-step tutorial on compiling SU(2) unitaries using CompilerV0.
- get_reward(action)
Apply a quantum gate action and compute the compilation reward.
This method left-multiplies the current circuit unitary by the unitary corresponding to the selected action and evaluates the average gate fidelity with respect to the target unitary.
- Parameters:
action (int) – Index of the selected action in self.actions.
- Returns:
Average gate fidelity between the current unitary and the target unitary, defined as 0.5 * |Tr(U_target† · U)| for a single-qubit system.
- Return type:
float
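The reward formula stated above, together with the left-multiplication update, can be sketched in a few lines of NumPy. This follows the formula as documented and is not the library's code; trace_fidelity is a hypothetical name, and the T gate is used only as an example action.

```python
import numpy as np

def trace_fidelity(u_target: np.ndarray, u: np.ndarray) -> float:
    """Evaluate 0.5 * |Tr(U_target^dagger @ U)| for 2x2 matrices."""
    return float(0.5 * abs(np.trace(u_target.conj().T @ u)))

# Example gate: T = diag(1, exp(i*pi/4)).
T = np.diag([1.0, np.exp(1j * np.pi / 4)])
S = T @ T  # example target: the phase gate S = diag(1, i)

u = np.eye(2, dtype=np.complex128)  # the circuit starts at the identity
u = T @ u                           # a new gate left-multiplies the circuit
after_one = trace_fidelity(S, u)    # equals cos(pi/8), about 0.924
u = T @ u
after_two = trace_fidelity(S, u)    # circuit now equals the target
```

Because of the absolute value, this quantity is unchanged when either unitary is multiplied by a global phase.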
- render(save_path_without_extension=None, interval=800)
Render the compilation process as an animation of the difference matrix.
The visualization shows the magnitude of the element-wise difference |U_target − U| as a heatmap that evolves over time, along with annotations indicating the current step, applied action, and reward.
- Parameters:
save_path_without_extension (str or None, optional) – Path (without file extension) to save the animation. If provided, the animation is saved using the configured writer (MP4 for FFmpeg or GIF for Pillow). If None, the animation is displayed interactively.
interval (int, optional) – Delay between animation frames in milliseconds. Default is 800.
- Returns:
This method produces a visualization but does not return a value.
- Return type:
None
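The quantity shown in the heatmap can be computed directly. The following is a minimal sketch assuming NumPy; it only builds the 2×2 array of magnitudes and leaves plotting and animation to the library. The function name difference_magnitude is hypothetical, and the X-gate target is an arbitrary example.

```python
import numpy as np

def difference_magnitude(u_target: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Element-wise magnitude |U_target - U| as a real 2x2 array."""
    return np.abs(np.asarray(u_target) - np.asarray(u))

x_gate = np.array([[0, 1], [1, 0]], dtype=np.complex128)
diff_start = difference_magnitude(x_gate, np.eye(2))  # before any gate is applied
diff_done = difference_magnitude(x_gate, x_gate)      # circuit equals the target
```

Unlike the trace-based fidelity, this element-wise difference is sensitive to global phase, so a circuit can reach fidelity 1 while the heatmap is still non-zero.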
- reset()
Reset the environment to the initial compilation state.
The circuit unitary is reset to the identity matrix, the step counter is cleared, and the history buffer is reinitialized.
- Returns:
observation (np.ndarray) – Flattened observation corresponding to the identity unitary, shape (8,).
info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.
- step(action)
Execute one compilation step.
Applies the selected gate, updates the internal circuit unitary and history, computes the reward, and checks termination conditions.
- Parameters:
action (int) – Index of the selected action in self.actions.
- Returns:
observation (np.ndarray) – Updated flattened unitary observation, shape (8,).
reward (float) – Average gate fidelity after applying the action.
done (bool) – True if the episode has terminated due to reaching the fidelity threshold or the maximum number of steps.
info (dict) – Empty dictionary provided for compatibility with the Gymnasium API.
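The reset and step behaviour described above can be illustrated with a small self-contained stand-in. This is a toy sketch under stated assumptions, not the library's implementation: the class name ToyCompiler, the two-gate action set (H and T), the default parameter values, and the observation layout are all hypothetical, and the reward uses the formula given for get_reward.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=np.complex128) / np.sqrt(2)
T = np.diag([1.0, np.exp(1j * np.pi / 4)])

class ToyCompiler:
    """Toy stand-in mirroring the documented reset/step behaviour (hypothetical)."""

    def __init__(self, target, max_steps=10, reward_tolerance=0.99):
        self.target = np.asarray(target, dtype=np.complex128)
        self.max_steps = max_steps
        self.reward_tolerance = reward_tolerance
        self.actions = [H, T]  # assumed gate set, for illustration only
        self.reset()

    def _obs(self):
        # Assumed layout: real parts first, then imaginary parts.
        return np.concatenate([self.u.real.ravel(), self.u.imag.ravel()])

    def reset(self):
        self.u = np.eye(2, dtype=np.complex128)  # circuit starts at the identity
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.u = self.actions[action] @ self.u  # left-multiply the circuit
        self.steps += 1
        reward = float(0.5 * abs(np.trace(self.target.conj().T @ self.u)))
        done = reward > self.reward_tolerance or self.steps >= self.max_steps
        return self._obs(), reward, done, {}

env = ToyCompiler(target=T @ T)  # example target: S = T @ T
obs, info = env.reset()
rewards = []
done = False
while not done:
    obs, reward, done, info = env.step(1)  # always apply T (index 1)
    rewards.append(reward)
```

With this target, the loop stops after the second T gate because the reward then exceeds reward_tolerance; with an unreachable target it would stop at max_steps instead.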