ExpressibilityV0

Implementation of ExpressibilityV0 environment

Author: Jay Shah (@Jayshah25)

Contact: jay.shah@qrlqai.com

License: Apache-2.0

class qrl.env.core.expressibility.ExpressibilityV0(n_qubits=4, max_blocks=12, max_steps=20, n_pairs_eval=120, bins=50, lambda_depth=0.002, lambda_2q=0.002, terminate_bonus=0.1, device_name='default.qubit', seed=None, allow_all_to_all=False, ffmpeg=False)

Bases: QuantumEnv

Parameterized circuit expressibility optimization environment.

ExpressibilityV0 is a gymnasium.Env-compatible environment that models the construction of parameterized quantum circuits with high expressibility. In the context of variational quantum algorithms, expressibility measures how uniformly the states an ansatz can prepare cover the Hilbert space, judged against the Haar-random distribution of states.

The agent incrementally builds a circuit by adding or removing predefined rotation and entangling blocks, or by explicitly terminating construction. Rewards encourage circuits whose fidelity distribution closely matches the Haar distribution, while penalizing excessive circuit depth and two-qubit gate usage.
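The comparison against the Haar distribution can be illustrated with a minimal NumPy sketch. It relies on the standard result that the fidelity F between two Haar-random states of dimension N has density (N - 1)(1 - F)^(N - 2). The function names, the binning, and the eps smoothing are assumptions made for illustration; the source does not state that ExpressibilityV0 uses this exact estimator.

```python
import numpy as np

def haar_bin_probs(bins, dim):
    """Exact Haar probability mass per bin on [0, 1].

    Uses the CDF 1 - (1 - F)**(dim - 1) of the density (dim - 1) * (1 - F)**(dim - 2).
    """
    edges = np.linspace(0.0, 1.0, bins + 1)
    cdf = 1.0 - (1.0 - edges) ** (dim - 1)
    return np.diff(cdf)

def kl_to_haar(fidelities, bins, dim, eps=1e-12):
    """KL divergence from the empirical fidelity histogram to the Haar histogram."""
    counts, _ = np.histogram(fidelities, bins=bins, range=(0.0, 1.0))
    p = counts / counts.sum()
    q = haar_bin_probs(bins, dim)
    mask = p > 0  # empty bins contribute nothing to the sum
    return float(np.sum(p[mask] * np.log(p[mask] / (q[mask] + eps))))

def random_states(n, dim, rng):
    """Haar-random pure states, obtained by normalizing complex Gaussian vectors."""
    z = rng.normal(size=(n, dim)) + 1j * rng.normal(size=(n, dim))
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
dim = 2 ** 4  # four qubits, matching the default n_qubits=4
a = random_states(2000, dim, rng)
b = random_states(2000, dim, rng)
fid = np.abs(np.sum(a.conj() * b, axis=1)) ** 2  # pairwise fidelities |<a|b>|^2

kl_haar = kl_to_haar(fid, bins=50, dim=dim)               # maximally expressive reference
kl_trivial = kl_to_haar(np.ones(2000), bins=50, dim=dim)  # circuit that always yields one state
```

In this sketch a highly expressive ansatz gives a divergence near zero, while a circuit that always prepares the same state gives a large divergence.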

Key properties

  • Action space: Discrete set of architectural edits (add/remove blocks or terminate construction).

  • Observation space: Vector of circuit statistics summarizing depth, parameter count, entanglement, and recent expressibility estimates (shape (7,)).

  • Reward: Negative KL divergence to the Haar distribution with regularization penalties for depth and two-qubit gates.

  • Termination: Explicit termination by the agent or truncation at max_steps.

Rendering

The render() method visualizes expressibility optimization via a two-panel animation showing the circuit’s fidelity distribution compared to the Haar-random distribution alongside a block-level diagram of the evolving circuit architecture.

Input Parameters

n_qubits (int)

Number of qubits in the circuit.

max_blocks (int)

Maximum number of blocks allowed in the circuit.

max_steps (int)

Maximum number of construction steps per episode.

n_pairs_eval (int)

Number of random state pairs used to estimate expressibility.

bins (int)

Number of histogram bins for fidelity distributions.

lambda_depth (float)

Penalty weight for circuit depth.

lambda_2q (float)

Penalty weight for two-qubit gate usage.

terminate_bonus (float)

Bonus reward for explicit termination.

device_name (str)

PennyLane device backend used for simulation.

seed (int or None)

Random seed for reproducibility.

allow_all_to_all (bool)

Whether to allow all-to-all entangling blocks.

ffmpeg (bool)

Whether to use FFmpeg when saving animations.

See also

tutorials/expressibility

Tutorial on optimizing ansatz expressibility with block-based circuits.

action_meanings()

Return a mapping from action indices to action names.

Returns:

Dictionary mapping integer action indices to human-readable architectural action names.

Return type:

dict

get_reward(action)

Apply the selected action and compute the resulting reward.

The selected action modifies the circuit architecture by adding a block, removing a block, or terminating construction. Expressibility is evaluated after the update, and the reward is computed from the circuit’s deviation from the Haar distribution together with the architectural penalties.

Parameters:

action (int) – Index of the selected architectural action.

Returns:

reward – Reward value combining expressibility and architectural penalties.

Return type:

float
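One plausible way to combine these terms is sketched below. The linear form, the function name sketch_reward, and the arguments kl, depth, and n_two_qubit are assumptions for illustration. The source states only that the reward is the negative KL divergence with depth and two-qubit penalties plus a termination bonus, and the actual formula may differ.

```python
def sketch_reward(kl, depth, n_two_qubit, terminated=False,
                  lambda_depth=0.002, lambda_2q=0.002, terminate_bonus=0.1):
    """Hypothetical reward: negative KL minus linear architectural penalties."""
    # Assumed linear penalties, weighted by the constructor defaults.
    reward = -kl - lambda_depth * depth - lambda_2q * n_two_qubit
    if terminated:
        # Assumed: the bonus is added once, when the agent explicitly terminates.
        reward += terminate_bonus
    return reward

r_mid = sketch_reward(kl=0.5, depth=10, n_two_qubit=5)
r_end = sketch_reward(kl=0.5, depth=10, n_two_qubit=5, terminated=True)
```

With the default weights, ten layers of depth cost 0.02 in reward, so under this assumed form the penalties matter only once the KL term is already small.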

render(save_path_without_extension=None, interval=800)

Render the expressibility optimization process as an animation.

The animation shows:

  1. A histogram of the circuit’s fidelity distribution compared to the Haar-random distribution.

  2. A block-diagram visualization of the evolving circuit architecture.

Parameters:
  • save_path_without_extension (str or None, optional) – Path (without file extension) to save the animation. If provided, the animation is saved using the configured writer (MP4 for FFmpeg or GIF for Pillow). If None, the animation is displayed interactively.

  • interval (int, optional) – Delay between animation frames in milliseconds. Default is 800.

Returns:

This method produces a visualization but does not return a value.

Return type:

None
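Because the path is given without an extension, the saved file's name depends on the writer. The helper below is a hypothetical illustration of that mapping (MP4 with FFmpeg, GIF with Pillow, as described above). It is not part of the library, and the exact naming used by render() is an assumption.

```python
def animation_output_path(save_path_without_extension, ffmpeg=False):
    """Hypothetical helper: expected output file for a given render() call."""
    if save_path_without_extension is None:
        return None  # nothing is saved; the animation is shown interactively
    # Assumed: the extension is appended directly to the supplied path.
    return save_path_without_extension + (".mp4" if ffmpeg else ".gif")

gif_path = animation_output_path("runs/expressibility")
mp4_path = animation_output_path("runs/expressibility", ffmpeg=True)
```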

reset(*, seed=None, options=None)

Reset the environment to an empty circuit.

Clears the current circuit architecture, resets internal counters, and initializes the observation vector corresponding to an empty ansatz.

Parameters:
  • seed (int or None, optional) – Random seed for reproducibility. If provided, reinitializes the internal random number generator.

  • options (dict or None, optional) – Additional reset options (currently unused, included for Gymnasium compatibility).

Returns:

  • observation (np.ndarray) – Initial observation vector describing an empty circuit, shape (7,).

  • info (dict) – Empty dictionary provided for Gymnasium API compatibility.

step(action)

Execute one architecture-modification step by calling the get_reward method.

Parameters:

action (int) – Index of the selected architectural action.

Returns:

  • observation (np.ndarray) – Updated observation vector summarizing circuit statistics, shape (7,).

  • reward (float) – Reward value combining expressibility and architectural penalties.

  • done (bool) – True if the episode has ended, either through explicit termination by the agent or through truncation at max_steps; False otherwise.

  • info (dict) – Diagnostic information including expressibility, KL divergence, depth, parameter count, current block sequence, and terminated (True if the agent explicitly terminated, False if the episode ended by reaching max_steps).
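The documented step() returns four values (observation, reward, done, info), so a rollout loop unpacks a 4-tuple and reads info for the terminated flag. The sketch below runs against StandInEnv, a hypothetical stand-in that only mimics the documented signatures so the loop works without the library installed. Its rewards, observations, and the choice of the last action index as "terminate" are invented for illustration.

```python
import random

class StandInEnv:
    """Hypothetical stand-in with the documented reset/step signatures (not the real env)."""

    def __init__(self, max_steps=20, n_actions=5, seed=None):
        self.max_steps = max_steps
        self.n_actions = n_actions
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            self.rng = random.Random(seed)
        self.t = 0
        return [0.0] * 7, {}  # 7-entry observation and empty info dict

    def step(self, action):
        self.t += 1
        terminated = action == self.n_actions - 1  # assumed: last index means "terminate"
        done = terminated or self.t >= self.max_steps
        reward = -self.rng.random()  # placeholder reward, not a real expressibility score
        return [float(self.t)] * 7, reward, done, {"terminated": terminated}

def run_episode(env, policy, seed=None):
    """Roll out one episode; returns total reward, step count, and the final info dict."""
    obs, info = env.reset(seed=seed)
    total, steps, done = 0.0, 0, False
    while not done:
        obs, reward, done, info = env.step(policy(obs))
        total += reward
        steps += 1
    return total, steps, info

env = StandInEnv(max_steps=20, seed=0)
total, steps, info = run_episode(env, policy=lambda obs: 0, seed=0)          # never terminates
total_t, steps_t, info_t = run_episode(env, policy=lambda obs: 4, seed=0)    # terminates at once
```

Checking info["terminated"] after the loop distinguishes an agent-initiated stop from truncation at max_steps, since done alone does not.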