ProbabilityV0

Implementation of the ProbabilityV0 environment.

Author: Jay Shah (@Jayshah25)

License: Apache-2.0

class qrl.env.core.probability.ProbabilityV0(n_qubits, target_distribution, ansatz=None, **kwargs)

Bases: QuantumEnv

Probability distribution matching environment for variational quantum circuits.

ProbabilityV0 is a gymnasium.Env-compatible environment that trains a parameterized quantum circuit to approximate a target probability distribution over computational basis states. The agent optimizes continuous circuit parameters so that the measurement statistics of the circuit match a specified target distribution.

This environment is suitable for distribution learning, quantum generative modeling, and variational circuit optimization tasks.

Key properties

Action space: Continuous parameter updates applied to the circuit ansatz.
Observation space: Probability distribution over the 2**n_qubits basis states produced by the current circuit.
Reward: Negative weighted cost combining KL divergence and L2 distance to the target distribution, with an additional step penalty.
Termination: The episode ends successfully when the reward exceeds the specified tolerance, or is truncated at max_steps.
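The reward described above can be illustrated with a minimal NumPy sketch. The exact formula used by ProbabilityV0 is not given here, so everything specific is an assumption: the function name sketch_reward is hypothetical, and so are the convex combination alpha * KL + (1 - alpha) * L2, the KL direction (target relative to the circuit output), the linear step penalty beta * step, and the eps smoothing term.

```python
import numpy as np


def sketch_reward(probs, target, step, alpha=0.5, beta=0.01, eps=1e-12):
    """Illustrative reward: negative weighted KL + L2 cost, minus a step penalty.

    This is NOT the library's implementation; the weighting scheme, the KL
    direction, and the default values are assumptions made for illustration.
    """
    probs = np.asarray(probs, dtype=float)
    target = np.asarray(target, dtype=float)

    # KL(target || probs); eps guards against log(0) and division by zero.
    kl = float(np.sum(target * np.log((target + eps) / (probs + eps))))

    # Euclidean (L2) distance between the two distributions.
    l2 = float(np.linalg.norm(probs - target))

    # Assumed form: alpha trades off KL against L2, beta scales the step count.
    cost = alpha * kl + (1.0 - alpha) * l2
    return -(cost + beta * step)


uniform = np.full(4, 0.25)
skewed = np.array([0.7, 0.1, 0.1, 0.1])

print(sketch_reward(uniform, uniform, step=0))  # perfect match, no penalty
print(sketch_reward(skewed, uniform, step=3))   # mismatch plus step penalty
```

Under these assumptions the reward is at most zero and approaches zero as the circuit output approaches the target, which is consistent with terminating once the reward exceeds a tolerance threshold.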

Visualization

The render() method animates the evolution of the learned probability distribution relative to the target distribution, along with the reward trajectory over training steps.

Input Parameters

n_qubits (int): Number of qubits in the circuit.
target_distribution (np.ndarray): Target probability distribution over computational basis states.
ansatz (callable or None): Custom parameterized circuit ansatz. If None, a default RY-based ansatz is used.
max_steps (int): Maximum number of optimization steps per episode.
tolerance (float): Reward threshold for early termination.
alpha (float): Weight balancing KL divergence and L2 distance.
beta (float): Penalty weight for step count.
ffmpeg (bool): Whether to use FFmpeg when saving animations.
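To make the shapes concrete, the following sketch builds a valid target_distribution for a given n_qubits and computes the basis-state probabilities of a simple RY-only circuit in plain NumPy. It does not call the library. The helper names ry_product_probs and check_distribution are hypothetical, the circuit has one RY rotation per qubit with no entangling gates (the library's default ansatz may differ), and the first qubit is assumed to be the most significant bit of the basis-state index.

```python
import numpy as np


def ry_product_probs(thetas):
    """Basis-state probabilities after applying RY(theta_i) to each qubit of |0...0>.

    Assumes no entangling gates, so the result is a product distribution.
    The first angle corresponds to the most significant bit (an assumption).
    """
    probs = np.array([1.0])
    for theta in thetas:
        p1 = np.sin(theta / 2.0) ** 2  # probability of measuring this qubit as 1
        probs = np.kron(probs, np.array([1.0 - p1, p1]))
    return probs


def check_distribution(dist, n_qubits):
    """Return True if dist is a valid distribution over 2**n_qubits basis states."""
    dist = np.asarray(dist, dtype=float)
    return (
        dist.shape == (2 ** n_qubits,)
        and bool(np.all(dist >= 0.0))
        and bool(np.isclose(dist.sum(), 1.0))
    )


n_qubits = 2
target_distribution = np.array([0.1, 0.2, 0.3, 0.4])  # one entry per basis state
print(check_distribution(target_distribution, n_qubits))

# RY(pi/2) on every qubit yields the uniform distribution over basis states.
print(ry_product_probs([np.pi / 2] * n_qubits))
```

A target_distribution of the wrong length, or one that does not sum to 1, fails check_distribution and is best corrected before it is passed to the environment.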