# Using `qdk-chemistry` for multi-reference quantum chemistry state preparation and energy estimation

This notebook demonstrates an end-to-end multi-configurational quantum chemistry workflow using `qdk-chemistry`.
It covers molecule loading and visualization, self-consistent-field (SCF) calculation, active-space selection, multi-configurational wavefunction generation, quantum state-preparation circuit construction, and measurement circuits for energy estimation.

In many molecular systems—such as bond dissociation or transition-metal complexes—a single electronic configuration cannot describe the true electronic structure.
These multi-configurational systems exhibit strong electron correlation that challenges mean-field and single-determinant methods like [Hartree–Fock](https://en.wikipedia.org/wiki/Hartree%E2%80%93Fock_method) or standard [coupled cluster theory](https://en.wikipedia.org/wiki/Coupled_cluster).

While classical multi-configurational approaches can capture these effects, their computational cost grows exponentially with system size.
Quantum computers offer a complementary route: they can represent superpositions of many configurations natively and are expected to solve many of these problems with polynomial scaling.

However, near-term fault-tolerant quantum hardware is still in the early stages of growth and scaling.
To use it effectively, we must compress and optimize chemistry problems before they reach the quantum device.
Classical methods enable this by identifying essential orbitals through active-space selection, generating approximate wavefunctions for state preparation, and supplying data to optimize quantum circuits for energy estimation.

This notebook focuses on state preparation, where a multi-configurational wavefunction from classical computation is transformed into a quantum circuit.
State preparation is central to quantum chemistry algorithms such as [Quantum Phase Estimation (QPE)](https://en.wikipedia.org/wiki/Quantum_phase_estimation_algorithm) and also serves as a practical hardware benchmark: preparing complex multi-configurational states tests the fidelity and coherence of quantum hardware.

In the example below, we show how to generate and optimize state preparation circuits, from active-space selection to energy measurement, demonstrating how chemical insight can reduce quantum resource requirements for near-term devices.

## Loading and visualizing the molecular structure

For this example, we will use the benzene diradical molecule.
The benzene diradical has two unpaired electrons, making it a good candidate for multi-reference quantum chemistry methods.
This molecule is also an important intermediate in the [Bergman cyclization reaction](https://en.wikipedia.org/wiki/Bergman_cyclization), a popular reaction in synthetic organic chemistry.

The molecular structure is provided in the [XYZ file format](https://en.wikipedia.org/wiki/XYZ_file_format).
This cell demonstrates how to load the molecule and visualize its structure.

In [None]:
from pathlib import Path

from qdk.widgets import MoleculeViewer

from qdk_chemistry.data import Structure

# Read molecular structure from XYZ file
structure = Structure.from_xyz_file(
    Path(".") / "data/benzene_diradical.structure.xyz"
)

# Visualize the molecular structure
display(MoleculeViewer(molecule_data=structure.to_xyz()))
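
For readers unfamiliar with the XYZ layout, the sketch below parses it by hand: the first line gives the atom count, the second is a free-form comment, and each remaining line holds an element symbol followed by Cartesian coordinates. The `parse_xyz` helper and the H2 geometry are illustrative assumptions and are not part of `qdk-chemistry`; `Structure.from_xyz_file` above remains the way to load the real file.

```python
def parse_xyz(text):
    """Parse XYZ-formatted text into (comment, [(symbol, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])          # line 1: number of atoms
    comment = lines[1]               # line 2: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the header line")
    return comment, atoms


# Hypothetical two-atom example (illustrative geometry only)
example_xyz = """2
H2 molecule, illustrative geometry
H 0.0 0.0 0.0
H 0.0 0.0 0.74
"""

comment, atoms = parse_xyz(example_xyz)
for symbol, x, y, z in atoms:
    print(f"{symbol}: ({x:.2f}, {y:.2f}, {z:.2f})")
```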

## Generating the molecular orbitals

This step performs a [Hartree-Fock](https://en.wikipedia.org/wiki/Hartree%E2%80%93Fock_method) (HF) SCF calculation to generate an approximate initial wavefunction and ground-state energy guess.
The wavefunction and energy returned by this initial calculation do not provide an accurate description of the system's electronic structure; however, they are useful for constructing molecular orbitals.
The resulting molecular orbitals will be used in subsequent steps for active space selection and multi-configuration calculations.

In [None]:
from qdk_chemistry.algorithms import create

# Perform an SCF calculation, returning the energy and wavefunction
scf_solver = create("scf_solver")
E_hf, wfn_hf = scf_solver.run(structure, charge=0, spin_multiplicity=1, basis_or_guess="cc-pvdz")
print(f"SCF energy is {E_hf:.3f} Hartree")

# Display a summary of the molecular orbitals obtained from the SCF calculation
print("SCF Orbitals:\n", wfn_hf.get_orbitals().get_summary())

## Selecting an active space and calculating the multi-configuration wavefunction

### Active space selection

Most chemistry applications on quantum computers will require the use of [active spaces](https://en.wikipedia.org/wiki/Complete_active_space) to focus the quantum calculation on a subset of the electrons and orbitals in the system.
For example, the benzene diradical with the cc-pVDZ basis set specified above results in ~100 molecular orbitals, requiring ~200 qubits to represent the full electronic structure problem.
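
The orbital and qubit counts quoted above can be reproduced with simple bookkeeping. This sketch assumes the benzene diradical has the formula C6H4, that cc-pVDZ contributes 14 contracted functions per carbon and 5 per hydrogen (spherical-harmonic d functions), and that each spin orbital maps to one qubit. All names are illustrative.

```python
# Contracted cc-pVDZ basis functions per atom (assumption: spherical d functions)
CC_PVDZ_FUNCTIONS = {"C": 14, "H": 5}

# Assumed molecular formula of the benzene diradical: C6H4
formula = {"C": 6, "H": 4}

# One spatial molecular orbital per basis function
n_spatial_orbitals = sum(CC_PVDZ_FUNCTIONS[el] * count for el, count in formula.items())

# Two spin orbitals per spatial orbital, one qubit per spin orbital
n_qubits = 2 * n_spatial_orbitals

# The same counting for a 6-orbital active space
n_active_qubits = 2 * 6

print(f"Full problem: {n_spatial_orbitals} orbitals, {n_qubits} qubits")
print(f"Active space: 6 orbitals, {n_active_qubits} qubits")
```

Under these assumptions the active space reduces the qubit requirement from roughly 200 to 12.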

This cell shows how to optimize this calculation by selecting an active space from the valence molecular orbitals calculated in the previous SCF step, focusing on the [frontier orbitals](https://en.wikipedia.org/wiki/Frontier_molecular_orbital_theory) that are most relevant to molecular reactivity.

In [None]:
# Select active space (6 electrons in 6 orbitals for benzene diradical) to choose most chemically relevant orbitals
active_space_selector = create("active_space_selector", algorithm_name="qdk_valence",
                               num_active_electrons=6, num_active_orbitals=6)
active_wfn = active_space_selector.run(wfn_hf)
active_orbitals = active_wfn.get_orbitals()

# Print a summary of the active space orbitals
print("Active Space Orbitals:\n", active_orbitals.get_summary())

The next cell shows how to visualize the selected active orbitals.
The drop-down menu provides the ability to select different occupied and virtual orbitals in the active space to visualize their shapes, while the isovalue slider adjusts the surface representation of the orbitals for different electron density levels.


In [None]:
from qdk_chemistry.utils.cubegen import generate_cubefiles_from_orbitals

# Generate cube files for the active orbitals
cube_data = generate_cubefiles_from_orbitals(
    orbitals=active_orbitals,
    grid_size=(40, 40, 40),
    margin=10.0,
    indices=active_orbitals.get_active_space_indices()[0],
    label_maker=lambda p: f"{'occupied' if p < 20 else 'virtual'}_{p + 1:04d}"
)

# Visualize the molecular orbitals together with the structure
MoleculeViewer(molecule_data=structure.to_xyz(), cube_data=cube_data)

### Calculate the multi-configuration wavefunction for the active space

Once the active space has been selected, we are ready to solve the electronic structure problem (i.e., the time-independent [Schrödinger equation](https://en.wikipedia.org/wiki/Schr%C3%B6dinger_equation#Time-independent_equation)) more accurately than our initial SCF guess.
This requires two steps, both illustrated in the cell below:

First, we need to construct the [Hamiltonian](https://en.wikipedia.org/wiki/Hamiltonian_(quantum_mechanics)), which provides the mathematical description of the energy and interactions of the electrons in the active space.

Second, we need to construct an improved estimate of the multi-configuration wavefunction and ground-state energy for this Hamiltonian.
The benzene diradical system in this demonstration is small enough that we can use a Complete Active Space [Configuration Interaction](https://en.wikipedia.org/wiki/Configuration_interaction) (CAS-CI) calculation to obtain the exact quantum mechanical energy and wavefunction for the active space.
However, for larger systems, the exact solution will not be feasible classically, and approximate methods such as [selected configuration interaction](https://arxiv.org/abs/2303.05688) or [density matrix renormalization group (DMRG)](https://en.wikipedia.org/wiki/Density_matrix_renormalization_group) are required.
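
To see why exact solutions become infeasible, it helps to count determinants. The sketch below counts every way of placing the alpha and beta electrons in the active orbitals, ignoring any reduction from spatial symmetry, so the number a particular solver reports may be smaller. The helper name `cas_determinants` is illustrative.

```python
from math import comb


def cas_determinants(n_orbitals, n_alpha, n_beta):
    """Number of Slater determinants in a complete active space."""
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


# The (6 electrons, 6 orbitals) space used in this notebook
n_small = cas_determinants(6, 3, 3)
print(f"CAS(6,6): {n_small} determinants")

# Growth for larger half-filled active spaces
for n in (8, 12, 16, 20):
    print(f"CAS({n},{n}): {cas_determinants(n, n // 2, n // 2):,} determinants")
```

The count grows combinatorially with the size of the active space, which is what motivates approximate classical methods and, eventually, quantum algorithms.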

Unlike the mean-field Hartree-Fock method, which approximates the wavefunction as a single [Slater determinant](https://en.wikipedia.org/wiki/Slater_determinant), multi-configuration methods describe the wavefunction as a superposition of many electron configurations within the active space (CAS-CI includes all of them), capturing electron correlation effects.
By subtracting the mean-field Hartree-Fock energy from the correlated multi-configuration energy, we obtain the [correlation energy](https://en.wikipedia.org/wiki/Electronic_correlation) for this active space.

In [None]:
# Construct Hamiltonian in the active space and print its summary
hamiltonian_constructor = create("hamiltonian_constructor")
hamiltonian = hamiltonian_constructor.run(active_orbitals)
print("Active Space Hamiltonian:\n", hamiltonian.get_summary())

# Perform CASCI calculation to get the wavefunction and exact energy for the active space
mc = create("multi_configuration_calculator")
E_cas, wfn_cas = mc.run(
    hamiltonian, n_active_alpha_electrons=3, n_active_beta_electrons=3
)
print(f"CASCI energy is {E_cas:.3f} Hartree, and the electron correlation energy is {E_cas - E_hf:.3f} Hartree")

## Loading the wavefunction onto a quantum computer

Now that we have calculated the multi-configuration wavefunction for the active space, we can generate a quantum circuit to prepare this state on a quantum computer.
However, not all parts of the multi-configuration wavefunction contribute equally to the overall state, creating an opportunity for optimization.

### Identifying the dominant configurations in the wavefunction

The first task is to understand the sparsity of the wavefunction: how many configurations contribute significantly to the overall state?

This cell demonstrates how to analyze the wavefunction and identify the dominant configurations based on their amplitudes.

In [None]:
import numpy as np
from qdk.widgets import Histogram

# Plot top determinant weights from the CASCI wavefunction
NUM_DETERMINANTS = 10
print(f"Total determinants in the CASCI wavefunction:  {len(wfn_cas.get_active_determinants())}")
print(f"Plotting the top {NUM_DETERMINANTS} determinants by weight.")
top_configurations = wfn_cas.get_top_determinants(max_determinants=NUM_DETERMINANTS)
display(Histogram(bar_values={k.to_string(): np.abs(v)**2 for k, v in top_configurations.items()}))
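
The weight of a determinant is the squared magnitude of its coefficient, and the summed weight of the retained determinants determines how well a truncated state overlaps with the full one. The sketch below works through this with hypothetical coefficients; the labels and values are illustrative and do not come from the benzene diradical calculation.

```python
import math

# Hypothetical CI coefficients (illustrative values, normalized to 1)
amplitudes = {"det_1": 0.8, "det_2": -0.5, "det_3": 0.3, "det_4": 0.1, "det_5": -0.1}


def top_weights(amplitudes, k):
    """Return the k largest (label, weight) pairs, where weight = |c|^2."""
    weights = {label: abs(c) ** 2 for label, c in amplitudes.items()}
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def retained_weight(amplitudes, k):
    """Total weight carried by the k dominant determinants."""
    return sum(weight for _, weight in top_weights(amplitudes, k))


total = sum(abs(c) ** 2 for c in amplitudes.values())
for k in range(1, len(amplitudes) + 1):
    kept = retained_weight(amplitudes, k)
    # Overlap between the renormalized truncated state and the full state
    overlap = math.sqrt(kept / total)
    print(f"top {k}: retained weight = {kept:.3f}, overlap = {overlap:.3f}")
```

A wavefunction is sparse in this sense when a handful of determinants already bring the overlap close to 1.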

Reducing the wavefunction to these determinants allows us to optimize the computational requirements for loading the quantum computer with a state that has high overlap with the true wavefunction—an important metric for quantum algorithms like QPE.
However, this reduction of the wavefunction also changes our description of the quantum system, particularly its energy.
Therefore, for the purposes of benchmarking, we need to recalculate the energy of the truncated wavefunction classically to provide a reference for evaluating the accuracy of the quantum calculation.
This cell shows how to recalculate this energy.

In [None]:
# Get top 2 determinants from the CASCI wavefunction to form a sparse wavefunction
top_configurations = wfn_cas.get_top_determinants(max_determinants=2)

# Compute the reference energy of the sparse wavefunction
pmc_calculator = create("projected_multi_configuration_calculator")
E_sparse, wfn_sparse = pmc_calculator.run(hamiltonian, list(top_configurations.keys()))

print(f"Reference energy for top 2 determinants is {E_sparse:.6f} Hartree")
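
One common way to obtain such a reference, assumed here for illustration, is to diagonalize the Hamiltonian restricted to the retained determinants. For two determinants this is a 2x2 symmetric eigenproblem with a closed-form solution. The matrix elements below are hypothetical toy numbers, and the sketch is not a description of how `projected_multi_configuration_calculator` is implemented.

```python
import math


def lowest_state_2x2(h00, h01, h11):
    """Lowest eigenvalue and normalized eigenvector of [[h00, h01], [h01, h11]]."""
    mean = 0.5 * (h00 + h11)
    half_gap = 0.5 * (h00 - h11)
    energy = mean - math.hypot(half_gap, h01)
    if h01 == 0.0:
        # No coupling: the lower diagonal element is already the answer
        vec = (1.0, 0.0) if h00 <= h11 else (0.0, 1.0)
    else:
        # Solve (h00 - E) v0 + h01 v1 = 0
        v0, v1 = h01, energy - h00
        norm = math.hypot(v0, v1)
        vec = (v0 / norm, v1 / norm)
    return energy, vec


# Hypothetical matrix elements between two determinants (toy values)
h00, h01, h11 = -1.0, 0.2, -0.5
energy, (c0, c1) = lowest_state_2x2(h00, h01, h11)

# Rayleigh quotient of the eigenvector reproduces the eigenvalue
rayleigh = c0 * (h00 * c0 + h01 * c1) + c1 * (h01 * c0 + h11 * c1)
print(f"projected energy = {energy:.6f}, coefficients = ({c0:.4f}, {c1:.4f})")
print(f"Rayleigh quotient = {rayleigh:.6f}")
```

Coupling between the two determinants lowers the energy below either diagonal element, yet by the variational principle the result cannot fall below the exact ground-state energy of the full space.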

### Loading the wavefunction using general state preparation methods

One possibility for loading the multi-configuration wavefunction onto a quantum computer is to use general state preparation approaches such as the [isometry method](https://arxiv.org/abs/1501.06911), as offered in software such as [Qiskit](https://qiskit.org/documentation/stubs/qiskit.circuit.library.Isometry.html).
While this is a very powerful general-purpose approach, it can be resource intensive, requiring very deep circuits even for modest-sized wavefunctions due to its exponential scaling in the number of qubits.
This approach also requires numerous fine rotations—operations that can be challenging for near-term fault-tolerant quantum hardware.
This cell demonstrates how to use the isometry method to generate a quantum circuit for preparing the multi-configuration wavefunction on a quantum computer.

**Note**: the generated circuits are so deep that you will need to adjust the "zoom" selection in the visualization window to see the detailed operations.

In [None]:
import pandas as pd
from qdk.openqasm import estimate
from qdk.widgets import Circuit

# Generate state preparation circuit for the sparse state using the regular isometry method (Qiskit)
state_prep = create("state_prep", "regular_isometry")
regular_isometry_circuit = state_prep.run(wfn_sparse)

# Visualize the regular isometry circuit
display(Circuit(regular_isometry_circuit.get_qsharp()))

# Print logical resource counts (qubits, rotations, etc.) estimated from the circuit
df = pd.DataFrame(estimate(regular_isometry_circuit.get_qasm()).logical_counts.items(), columns=['Logical Estimate', 'Counts'])
display(df)

### Loading the wavefunction using optimized state preparation methods

As the cell above illustrates, the general isometry method for state preparation can be very resource intensive—requiring thousands of fine rotations for this benzene diradical example.
However, we can optimize this process by taking advantage of the sparse multi-configuration wavefunction structure, generating much more efficient quantum circuits for state preparation.
The cell below demonstrates how the `qdk-chemistry` library can be used for optimized wavefunction loading, producing a circuit that is orders of magnitude more efficient than the general isometry method.

The underlying approach is based on a variation of the [sparse isometry method](https://quantum-journal.org/papers/q-2021-03-15-412/pdf/), with new updates specific to `qdk-chemistry` that avoid the use of multi-controlled gates (also challenging for near-term fault-tolerant quantum computers).

In [None]:
# Generate state preparation circuit for the sparse state via sparse isometry (GF2 + X)
state_prep = create("state_prep", "sparse_isometry_gf2x")
sparse_isometry_circuit = state_prep.run(wfn_sparse)

# Visualize the sparse isometry circuit
display(Circuit(sparse_isometry_circuit.get_qsharp()))

# Print logical resource counts (qubits, rotations, etc.) estimated from the circuit
df = pd.DataFrame(estimate(sparse_isometry_circuit.get_qasm()).logical_counts.items(), columns=['Logical Estimate', 'Counts'])
display(df)

Rather than requiring thousands of fine rotations, this optimized approach requires only a single fine rotation for the two-determinant benzene diradical wavefunction—demonstrating the power of chemistry-informed optimizations for quantum state preparation.
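
To build intuition for why one rotation can suffice, the sketch below prepares an arbitrary real superposition of two bitstrings with a single Ry gate followed only by X and CNOT gates, and checks the result with a tiny sparse statevector simulator. This is a textbook-style construction written for illustration; it is not the `sparse_isometry_gf2x` algorithm, and the bitstrings and amplitudes are hypothetical.

```python
import math


def flip(bits, q):
    return bits[:q] + (1 - bits[q],) + bits[q + 1:]


def apply_ry(state, q, theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    new = {}
    for bits, amp in state.items():
        zero = bits if bits[q] == 0 else flip(bits, q)
        one = flip(zero, q)
        # Ry|0> = c|0> + s|1>,  Ry|1> = -s|0> + c|1>
        pairs = ((zero, c * amp), (one, s * amp)) if bits[q] == 0 else ((zero, -s * amp), (one, c * amp))
        for key, val in pairs:
            new[key] = new.get(key, 0.0) + val
    return {k: v for k, v in new.items() if abs(v) > 1e-12}


def apply_x(state, q):
    return {flip(bits, q): amp for bits, amp in state.items()}


def apply_cnot(state, control, target):
    return {(flip(bits, target) if bits[control] else bits): amp for bits, amp in state.items()}


def prepare_two_determinants(det_a, det_b, amp_a, amp_b):
    """Prepare amp_a|det_a> + amp_b|det_b> from |0...0>; returns (state, gate list)."""
    n = len(det_a)
    pivot = next(q for q in range(n) if det_a[q] != det_b[q])
    # d0 is the determinant whose pivot bit is 0, d1 the one whose pivot bit is 1
    if det_a[pivot] == 0:
        d0, d1, a0, a1 = det_a, det_b, amp_a, amp_b
    else:
        d0, d1, a0, a1 = det_b, det_a, amp_b, amp_a
    theta = 2 * math.atan2(a1, a0)  # the only non-Clifford rotation
    state = apply_ry({(0,) * n: 1.0}, pivot, theta)
    gates = [("ry", pivot, theta)]
    for q in range(n):
        if q == pivot:
            continue
        if d0[q] == 1:  # set the bit for the pivot=0 branch
            state = apply_x(state, q)
            gates.append(("x", q))
        if d0[q] != d1[q]:  # make the bit differ in the pivot=1 branch
            state = apply_cnot(state, pivot, q)
            gates.append(("cx", pivot, q))
    return state, gates


# Hypothetical two-determinant state on 6 qubits
state, gates = prepare_two_determinants((1, 1, 1, 0, 0, 0), (1, 1, 0, 1, 0, 0), 0.8, -0.6)
print(state)
print("rotations used:", sum(g[0] == "ry" for g in gates))
```

All gates other than the single Ry are Clifford operations, which are comparatively cheap on fault-tolerant hardware.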

Close inspection of the generated circuit shows that it has also reduced our qubit count: several of the qubits have been converted to classical bits, which can be post-processed after measurement.
We will revisit these classical bits in the next section on energy measurement.

## Estimating the energy on a quantum computer

For the final stage of this state preparation benchmark workflow, we estimate the energy of the optimized multi-configuration wavefunction prepared on a quantum computer.
The first step in this process is mapping the classical Hamiltonian for the active space to a qubit Hamiltonian that can be measured on a quantum computer.
For this example, we use the [Jordan-Wigner transformation](https://en.wikipedia.org/wiki/Jordan%E2%80%93Wigner_transformation) to perform this mapping.
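
As a brief illustration of what the Jordan-Wigner mapping does, the sketch below builds the annihilation operator for mode j as a string of Z operators on modes 0..j-1, a lowering operator on mode j, and identities elsewhere, then verifies the fermionic anticommutation relations numerically. The conventions are assumptions made for this sketch (mode 0 is the leftmost tensor factor, and the qubit state |1> means "occupied"); the library's qubit ordering may differ.

```python
I2 = [[1, 0], [0, 1]]
Z = [[1, 0], [0, -1]]
LOWER = [[0, 1], [0, 0]]  # |0><1|: empties an occupied (|1>) qubit


def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(ab, ba)]


def jw_annihilation(j, n):
    """Jordan-Wigner image of the annihilation operator a_j on n modes."""
    factors = [Z] * j + [LOWER] + [I2] * (n - j - 1)
    op = factors[0]
    for factor in factors[1:]:
        op = kron(op, factor)
    return op


def check_anticommutation(n):
    """Verify {a_i, a_j^dagger} = delta_ij and {a_i, a_j} = 0 for all mode pairs."""
    dim = 2 ** n
    identity = [[int(r == c) for c in range(dim)] for r in range(dim)]
    zero = [[0] * dim for _ in range(dim)]
    ops = [jw_annihilation(j, n) for j in range(n)]
    for i in range(n):
        for j in range(n):
            # The matrices are real, so the adjoint is just the transpose
            expected = identity if i == j else zero
            if anticommutator(ops[i], transpose(ops[j])) != expected:
                return False
            if anticommutator(ops[i], ops[j]) != zero:
                return False
    return True


print("Anticommutation relations hold for 3 modes:", check_anticommutation(3))
```

The Z strings are what enforce fermionic antisymmetry between modes, and they are why a single fermionic term can expand into several multi-qubit Pauli strings.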

In [None]:
# Prepare qubit Hamiltonian
qubit_mapper = create("qubit_mapper", algorithm_name="qiskit", encoding="jordan-wigner")
qubit_hamiltonian = qubit_mapper.run(hamiltonian)

### Optimizing the Hamiltonian

Recall from earlier in the notebook that the active-space Hamiltonian included 1000+ one- and two-body integrals.
Despite optimizing the wavefunction, our Hamiltonian still contains a large number of terms that would need to be measured on a quantum computer to estimate the energy.
The cell below illustrates this point.

In [None]:
# Print the number of Pauli strings in the full Hamiltonian
print(f"Number of Pauli strings in the Hamiltonian: {len(qubit_hamiltonian.pauli_strings)}")

However, we can optimize this measurement process in two ways:

1. Applying the classical wavefunction information to pre-screen the qubit Hamiltonian, identifying which terms actually need quantum measurements.  The remaining terms are converted into precomputed classical coefficients, allowing us to slash the number of quantum measurements required for the sparse state.
2. Grouping commuting terms in the Hamiltonian to reduce the number of measurements.
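
The grouping idea in the second item can be sketched with a simple greedy pass. The example below uses qubit-wise commutation (two Pauli strings are compatible if, on every qubit, they match or one of them is the identity), which is one common and conservative criterion. It is an assumption made for illustration and not necessarily the strategy used inside `qdk-chemistry`; the Pauli strings are hypothetical.

```python
def qubitwise_commute(p, q):
    """True if two Pauli strings agree, or one is 'I', on every qubit."""
    return all(a == b or a == "I" or b == "I" for a, b in zip(p, q))


def greedy_group(paulis):
    """Place each string into the first group whose members all commute with it."""
    groups = []
    for pauli in paulis:
        for group in groups:
            if all(qubitwise_commute(pauli, member) for member in group):
                group.append(pauli)
                break
        else:
            groups.append([pauli])
    return groups


# Hypothetical three-qubit Pauli strings
terms = ["ZZI", "ZIZ", "XXI", "IZZ", "IXX", "XIX"]
groups = greedy_group(terms)
for index, group in enumerate(groups, start=1):
    print(f"Group {index}: {group}")
```

Each group can be measured with a single circuit, so here six terms need only two measurement settings.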

The following cell shows how to perform these optimizations using the `qdk-chemistry` library, resulting in a much more efficient measurement process for estimating the energy on a quantum computer.

In [None]:
from qdk_chemistry.data.qubit_hamiltonian import filter_and_group_pauli_ops_from_wavefunction

# Filter and group Pauli operators based on the wavefunction
filtered_hamiltonian_ops, classical_coeffs = (
    filter_and_group_pauli_ops_from_wavefunction(
        qubit_hamiltonian,
        wfn_sparse
    )
)
print(f"Filtered and grouped qubit Hamiltonian contains {len(filtered_hamiltonian_ops)} groups:")
for igroup, group in enumerate(filtered_hamiltonian_ops):
    print(f"Group {igroup+1}: {group.pauli_strings}")
print(f"Number of classical coefficients: {len(classical_coeffs)}")

### Estimating the energy on a quantum computer (simulator)

Finally, we need to generate the measurement circuits required to estimate the energy of the prepared multi-configuration wavefunction on a quantum computer.
Since the optimized benzene diradical Hamiltonian contains only two measurement groups, we only need two measurement circuits to estimate the energy.
The cell below demonstrates how to generate these measurement circuits using the `qdk-chemistry` library and how to use the QDK simulator to execute them.

This cell provides a set budget of measurements ("shots") to be evenly divided between the measurement circuits.
The measurement process is probabilistic, so we obtain a distribution of results from each circuit.
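These distributions are then combined to produce a final energy estimate, along with an uncertainty (the standard deviation, computed from the reported variance) as shown below.
The uncertainty is directly related to the number of shots used in the measurement process: more shots lead to lower uncertainty, with the statistical error shrinking roughly as the inverse square root of the shot count.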
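
The link between bitstring counts, expectation values, and shot noise can be sketched as follows. The example assumes the circuit has already been rotated so that the measured Pauli term is diagonal, meaning each outcome contributes +1 or -1 according to the parity of its bitstring. The counts are hypothetical, and `pauli_z_expectation` is an illustrative helper, not a `qdk-chemistry` function.

```python
import math


def pauli_z_expectation(counts):
    """Mean and standard error of a Z-type Pauli term from bitstring counts."""
    shots = sum(counts.values())
    # Even parity contributes +1, odd parity contributes -1
    signed = sum((-1) ** bits.count("1") * n for bits, n in counts.items())
    mean = signed / shots
    # Each shot is +/-1, so the single-shot variance is 1 - mean^2
    stderr = math.sqrt((1.0 - mean ** 2) / shots)
    return mean, stderr


# Hypothetical measurement results for a two-qubit term
counts = {"00": 600, "11": 300, "01": 100}
mean, stderr = pauli_z_expectation(counts)
print(f"<ZZ> = {mean:.3f} +/- {stderr:.4f} with {sum(counts.values())} shots")

# Four times as many shots with the same distribution halves the standard error
more_counts = {bits: 4 * n for bits, n in counts.items()}
mean_4x, stderr_4x = pauli_z_expectation(more_counts)
print(f"<ZZ> = {mean_4x:.3f} +/- {stderr_4x:.4f} with {sum(more_counts.values())} shots")
```

The energy estimate combines such expectation values, weighted by their Hamiltonian coefficients, so its uncertainty inherits the same inverse-square-root dependence on the number of shots.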

In [None]:
# Estimate energy using the optimized circuit and filtered Hamiltonian operators
estimator = create("energy_estimator", algorithm_name="qdk_base_simulator")
energy_results, simulation_data = estimator.run(
    circuit=sparse_isometry_circuit,
    qubit_hamiltonians=filtered_hamiltonian_ops,
    total_shots=250000,
    classical_coeffs=classical_coeffs,
)


for i, results in enumerate(simulation_data.bitstring_counts):
    print(f"Measurement Results for Hamiltonian Group {i+1}: {simulation_data.hamiltonians[i].pauli_strings}")
    display(Histogram(bar_values=results))

# Print statistics for the measured energy
energy_mean = energy_results.energy_expectation_value + hamiltonian.get_core_energy()
energy_stddev = np.sqrt(energy_results.energy_variance)
print(
    f"Estimated energy from quantum circuit: {energy_mean:.3f} ± {energy_stddev:.3f} Hartree"
)

# Print comparison with reference energy
print(f"Difference from reference energy: {energy_mean - E_sparse:.6f} Hartree")