<h1 style="color: 	#80B1D3;"><strong>QUBIT ERROR PREDICTION</strong></h1>


<h2 style="color: 	#365F93;"><strong>Overview</strong></h2>

**Author:** Xavi

**Date:** June 2025 

**Environment:** Jupyter Notebook · Python 

---

<h3 style="color: 	#365F93;">🎯 Objective</h3>

This project aims to develop a machine learning model capable of **predicting the readout assignment error of a physical qubit** based on its calibration parameters and quantum hardware conditions.  
The model is trained on real data from **IBM QPUs**, starting with the `ibm_sherbrooke` processor and later extended to others for generalization and transfer learning analysis.

Throughout the project, we explore and compare multiple regression algorithms **(e.g., KNN, linear regression, and decision trees)** and evaluate which one best predicts the assignment error using metrics such as MAE and R².
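
As a reference for the two metrics named above, here is a minimal NumPy sketch of MAE and R². The helper names `mae` and `r2` are illustrative and not part of the notebook (which uses the scikit-learn implementations); the `y_true` values are the first five readout assignment errors shown later in the DataFrame head, and `y_pred` is a made-up constant prediction.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# First five target values from the dataset head
y_true = np.array([0.0040, 0.0077, 0.0355, 0.0054, 0.0093])
# Illustrative prediction: always predict the mean of the targets
y_pred = np.full_like(y_true, y_true.mean())

print("MAE:", mae(y_true, y_pred))
print("R2 :", r2(y_true, y_pred))
```

A model that always predicts the mean scores an R² of zero, which is why R² is read as the improvement over that trivial predictor.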

---

<h3 style="color: 	#365F93;">📦 Steps Covered in This Notebook</h3>

1. **Data Acquisition** from IBM Quantum dashboard  
2. **Data Cleaning & Preprocessing**  
3. **Exploratory Data Analysis (EDA)**  
4. **Feature Engineering**  
5. **First Baseline Model** 

---

<h3 style="color: 	#365F93;">📁 Dataset</h3>

- Source: IBM Quantum backend calibration data (`ibm_sherbrooke`)
- Format: `.csv`
- Records: daily calibration snapshots of the backend's ~127 qubits from 2024-01-01 to 2025-06-01 (about 65,800 rows)
- Target Variable:  
  - `Readout assignment error` 

- Feature Variables (examples):  
  - `T1 (us)`, `T2 (us)`, `Frequency (GHz)`, `Anharmonicity (GHz)`, `Readout length (ns)`

---

<h2 style="color: 	#365F93;"><strong>Libraries/Functions/API</strong></h2>

In [None]:
# Run in the terminal: pip install -r requirements.txt

In [None]:
# Install required packages if not already installed
!pip install pandas --quiet
!pip install numpy --quiet
!pip install matplotlib --quiet
!pip install seaborn --quiet
!pip install scikit-learn --quiet
!pip install xgboost --quiet
!pip install shap --quiet
!pip install qiskit==0.45.1 --quiet
!pip install qiskit-ibm-provider==0.8.0 --quiet
!pip install python-dotenv --quiet
# Legacy provider package, superseded by qiskit-ibm-provider and not imported in this notebook
!pip install qiskit-ibmq-provider --quiet

In [None]:
# Importing necessary libraries for data preprocessing, model training, and evaluation

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_absolute_error, r2_score

In [None]:
# Importing custom utility functions from the src/functions.py script

import os
import sys
sys.path.append('../src')
from functions import *

In [2]:
from qiskit_ibm_provider import IBMProvider
from dotenv import load_dotenv

load_dotenv('../ibm_quantum_key.env')

ibm_token = os.getenv("IBM_QUANTUM_TOKEN")

provider = IBMProvider(token=ibm_token)
provider.backends()

[<IBMBackend('ibm_brisbane')>, <IBMBackend('ibm_sherbrooke')>]

<h2 style="color: 	#365F93;"><strong>1. Data Collection</strong></h2>

- Load from CSV / API
- Initial description of the columns

In [17]:
# Optional: download the full calibration history from the API.
# This step can be skipped by loading the saved dataset from the data/raw folder instead.
#
# df_sherbrooke = load_calibration_history(
#     token=ibm_token,
#     backend_name="ibm_sherbrooke",
#     start_date="2024-01-01",
#     end_date="2025-06-01",
#     step_days=1
# )
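
The implementation of `load_calibration_history` lives in `src/functions.py` and is not shown here. As a rough sketch of what the `start_date`, `end_date`, and `step_days` arguments imply, the snippet below only enumerates the snapshot dates that such a download would visit. The helper `calibration_dates` is hypothetical and makes no API calls.

```python
import pandas as pd

def calibration_dates(start_date, end_date, step_days=1):
    """Hypothetical helper: list the dates a stepped download would request."""
    return list(pd.date_range(start=start_date, end=end_date, freq=f"{step_days}D"))

dates = calibration_dates("2024-01-01", "2025-06-01", step_days=1)
print(len(dates), "snapshot dates from", dates[0].date(), "to", dates[-1].date())
```

With one snapshot per day and roughly 127 qubits per snapshot, this date range is consistent with the row count of the DataFrame displayed further down.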

In [None]:
# To save what has been downloaded from the API in the raw folder inside the data folder
#df_sherbrooke.to_csv('../data/raw/df_sherbrooke.csv', index=False)

In [None]:
# To load the dataset saved in the previous cell
# df_sherbrooke = pd.read_csv('../data/raw/df_sherbrooke.csv')

In [4]:
data_overview(df_sherbrooke)

DataFrame Head
         date  qubit     T1 (us)   T2 (us)  Frequency (GHz)  \
0  2024-01-01      0  214.317737  4.635668       362.553985   
1  2024-01-01      1  250.813950  4.736282       325.227421   
2  2024-01-01      2  156.053953  4.819172       210.204496   
3  2024-01-01      3  164.308282  4.747176       357.164540   
4  2024-01-01      4  245.261393  4.787861       393.719263   

   Anharmonicity (GHz)  Readout assignment error  Prob meas0 prep1  \
0            -0.313276                    0.0040            0.0054   
1            -0.312918                    0.0077            0.0102   
2            -0.311295                    0.0355            0.0262   
3            -0.311153                    0.0054            0.0074   
4            -0.310945                    0.0093            0.0134   

   Prob meas1 prep0  Readout length (ns)  ID error  \
0            0.0026          1244.444444  0.000092   
1            0.0052          1244.444444  0.000339   
2            0.0448    

In [5]:
df_sherbrooke

| | date | qubit | T1 (us) | T2 (us) | Frequency (GHz) | Anharmonicity (GHz) | Readout assignment error | Prob meas0 prep1 | Prob meas1 prep0 | Readout length (ns) | ID error | Z-axis rotation (rz) error | √x (sx) error | Pauli-X error | ECR error | Gate time (ns) | Operational |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | 0 | 214.317737 | 4.635668 | 362.553985 | -0.313276 | 0.004000 | 0.005400 | 0.002600 | 1244.444444 | 0.000092 | 0.0 | 0.000092 | | 0.005403 | 1272.888889 | |
| 1 | 2024-01-01 | 1 | 250.813950 | 4.736282 | 325.227421 | -0.312918 | 0.007700 | 0.010200 | 0.005200 | 1244.444444 | 0.000339 | 0.0 | 0.000339 | | 0.019672 | 1272.888889 | |
| 2 | 2024-01-01 | 2 | 156.053953 | 4.819172 | 210.204496 | -0.311295 | 0.035500 | 0.026200 | 0.044800 | 1244.444444 | 0.000489 | 0.0 | 0.000489 | | 0.004720 | 1272.888889 | |
| 3 | 2024-01-01 | 3 | 164.308282 | 4.747176 | 357.164540 | -0.311153 | 0.005400 | 0.007400 | 0.003400 | 1244.444444 | 0.000142 | 0.0 | 0.000142 | | 0.002732 | 1287.111111 | |
| 4 | 2024-01-01 | 4 | 245.261393 | 4.787861 | 393.719263 | -0.310945 | 0.009300 | 0.013400 | 0.005200 | 1244.444444 | 0.000212 | 0.0 | 0.000212 | | 0.002732 | 1272.888889 | |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 65781 | 2025-06-01 | 122 | 275.361807 | 4.731538 | 269.565777 | -0.311598 | 0.005615 | 0.007812 | 0.003418 | 1216.000000 | | | | | | | |
| 65782 | 2025-06-01 | 123 | 226.075511 | 4.820581 | 233.044197 | -0.310312 | 0.005127 | 0.003906 | 0.006348 | 1216.000000 | | | | | | | |
| 65783 | 2025-06-01 | 124 | 217.443199 | 4.881324 | 295.996382 | -0.309208 | 0.003906 | 0.004883 | 0.002930 | 1216.000000 | | | | | | | |
| 65784 | 2025-06-01 | 125 | 37.624372 | 4.977387 | 230.713102 | -0.307758 | 0.006348 | 0.009277 | 0.003418 | 1216.000000 | | | | | | | |


<details>
  <summary><strong>📌Column Glossary</strong></summary>

<h3 style="color: 	#365F93;">Qubit</h3>

- Index that identifies each qubit within the quantum processor. It is an integer that is unique within a backend; each value corresponds to a distinct physical qubit. On a 127-qubit backend such as `ibm_sherbrooke`, the indices run from 0 to 126. In this dataset each index appears once per calibration date.

<h3 style="color: 	#365F93;">T1 (us)</h3>

- Energy relaxation time of the qubit, measured in microseconds. It indicates how long the qubit takes to lose the information stored in its excited level and return to the ground state due to interactions with the environment. A higher value means the qubit can hold its excited state longer before decaying.
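
To give a feel for the scale of T1 relative to a readout pulse, the sketch below assumes a simple exponential decay model, in which a qubit prepared in the excited state has relaxed after a time t with probability 1 - exp(-t / T1). The function name `decay_probability` is illustrative, and the input values are taken at face value from the first row of the dataset.

```python
import math

def decay_probability(duration_ns, t1_us):
    """Probability of relaxing to the ground state within duration_ns,
    assuming pure exponential T1 decay (a simplifying assumption)."""
    duration_us = duration_ns / 1000.0  # convert ns to us so units match T1
    return 1.0 - math.exp(-duration_us / t1_us)

# Readout length and T1 of qubit 0 on 2024-01-01, as listed in the dataset
p = decay_probability(duration_ns=1244.444444, t1_us=214.317737)
print(f"Probability of decay during readout: {p:.4%}")
```

This is only an order-of-magnitude estimate; real readout errors also depend on factors that this model ignores.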

<h3 style="color: 	#365F93;">T2 (us)</h3>

- Coherence or “dephasing” time of the qubit, in microseconds. It measures how long the phase of a quantum superposition survives before it is lost to noise or internal instabilities. A larger T2 means the qubit maintains coherence (the relative phase between its states) for a longer time.

<h3 style="color: 	#365F93;">Frequency (GHz)</h3>

- Resonance frequency of the qubit, in gigahertz. It is the frequency at which the qubit responds when a microwave signal is applied to manipulate its state. Each qubit is designed to operate in a specific frequency range; this value indicates its central resonance frequency.

<h3 style="color: 	#365F93;">Anharmonicity (GHz)</h3>

- Anharmonicity of the qubit, in gigahertz. It is the difference between the 1→2 and 0→1 transition frequencies, that is, how far the qubit’s level spacing departs from the evenly spaced levels of an ideal harmonic oscillator. A negative value (e.g., -0.31 GHz) means the 1→2 transition lies below the 0→1 qubit frequency, which allows pulses to target the qubit transition without exciting other levels.

<h3 style="color: 	#365F93;">Readout assignment error</h3>

- Readout assignment error rate of the qubit. It represents the probability of assigning the wrong state (for example, measuring “1” when the qubit was in “0,” or vice versa). A value of 0.005127 corresponds to a 0.5127 % chance of error during readout.

<h3 style="color: 	#365F93;">Prob meas0 prep1</h3>

- Probability of measuring “0” when the qubit was prepared in state “1.” It is a one-directional error metric: if you prepare the qubit in |1⟩, a value of 0.00488, for example, means the system returns “0” instead of “1” in about 0.488 % of readouts.

<h3 style="color: 	#365F93;">Prob meas1 prep0</h3>

- Probability of measuring “1” when the qubit was prepared in state “0.” Similar to the previous one but in the opposite direction: if you prepare the qubit in |0⟩, a value of 0.00537, for example, means it is mistakenly read as “1” instead of “0” in about 0.537 % of readouts.
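
The three readout columns appear to be linked: in the rows displayed above, `Readout assignment error` equals the mean of `Prob meas0 prep1` and `Prob meas1 prep0`. The check below uses only the first three rows of the dataset head, so it is an observation about those rows rather than a guarantee for the whole dataset.

```python
import pandas as pd

# First three rows of the dataset head (values copied from the output above)
rows = pd.DataFrame({
    "Readout assignment error": [0.0040, 0.0077, 0.0355],
    "Prob meas0 prep1": [0.0054, 0.0102, 0.0262],
    "Prob meas1 prep0": [0.0026, 0.0052, 0.0448],
})

# Average of the two one-directional misassignment probabilities
rows["mean_of_probs"] = (rows["Prob meas0 prep1"] + rows["Prob meas1 prep0"]) / 2

# Compare with the reported assignment error, allowing for rounding
rows["matches"] = (rows["mean_of_probs"] - rows["Readout assignment error"]).abs() < 1e-6
print(rows)
```

If this relation holds throughout, using either probability column as a feature would leak the target, which may be one reason to keep them out of the feature list.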

<h3 style="color: 	#365F93;">Readout length (ns)</h3>

- Duration of the readout (measure) pulse in nanoseconds. It is the time the system takes to apply the microwave signal and gather the information needed to determine the qubit’s state. A value of 1216 ns means the measurement lasts about 1.216 μs.

<h3 style="color: 	#365F93;">ID error</h3>

- Error rate for the “identity” gate (ID). The ID gate does not change the qubit’s state but is used to keep it idle for one gate cycle. A value of 0.0008298, for example, indicates a 0.08298 % probability of error when applying that gate.

<h3 style="color: 	#365F93;">Z-axis rotation (rz) error</h3>

- Error rate for the Z-axis rotation gate (“rz”). In many superconducting systems, the “rz” rotation is implemented virtually (without applying a physical pulse), so its error is effectively zero.

<h3 style="color: 	#365F93;">√x (sx) error</h3>

- Error rate for the √X gate (also known as “sx”). This gate corresponds to rotating the qubit 90° around the X axis. A value of 0.0008298, for example, indicates a 0.08298 % probability of failure when applying this operation.

<h3 style="color: 	#365F93;">Pauli-X error</h3>

- Error rate for the X gate (Pauli-X). This corresponds to a 180° rotation around the X axis (also called a bit-flip). A value of 0.0008298, for example, indicates a 0.08298 % probability that the “flip” operation fails.

<h3 style="color: 	#365F93;">ECR error</h3>

- Error rate for the “ECR” (Echoed Cross-Resonance) gate. ECR is a two-qubit entangling gate used on physically connected pairs. The suffix “0” usually indicates the qubit index or another internal identifier in the API. A value of 0.00852 (≈ 0.852 %) represents the probability of error when applying that two-qubit gate on that index.

<h3 style="color: 	#365F93;">Gate time (ns)</h3>

- Execution time of the gate in nanoseconds for index “0” (or the identifier given by the API). For example, 533.333 ns corresponds to the time it takes to complete the ECR operation (or the gate associated with index “0”). A gate that lasts half a microsecond appears as roughly 500 ns.

<h3 style="color: 	#365F93;">Operational</h3>

- Indicator of whether the qubit is operational (active). “Yes” means the qubit is enabled and can be used in quantum circuits. “No” would mean the qubit is out of service or disabled at that time.

</details>

<h2 style="color: 	#365F93;"><strong>2. Data Cleaning & Formatting</strong></h2>

- Type conversion
- Column renaming
- Handling of NaNs and extreme values
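
The three steps above could look roughly like the sketch below. The toy DataFrame `raw`, the snake_case column names, and the 99th-percentile outlier flag are illustrative choices, not decisions taken in this notebook.

```python
import pandas as pd

# Toy frame mimicking a few columns of the raw CSV (values from the dataset head)
raw = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-01"],
    "qubit": ["0", "1", "2"],
    "T1 (us)": [214.317737, 250.813950, 156.053953],
    "Readout assignment error": [0.0040, 0.0077, 0.0355],
})

clean = (
    raw.assign(
        date=pd.to_datetime(raw["date"]),      # type conversion: string -> datetime
        qubit=raw["qubit"].astype("int64"),    # type conversion: string -> integer
    )
    .rename(columns={                          # renaming: code-friendly names
        "T1 (us)": "t1_us",
        "Readout assignment error": "readout_error",
    })
    .dropna(subset=["readout_error"])          # NaN handling: the target must exist
)

# Extreme values: flag rather than delete, so the decision stays visible
threshold = clean["readout_error"].quantile(0.99)
clean["is_extreme"] = clean["readout_error"] > threshold
print(clean.dtypes)
```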

In [16]:
print(df_sherbrooke.isnull().sum())

date                              0
qubit                             0
T1 (us)                           0
T2 (us)                           0
Frequency (GHz)                   0
Anharmonicity (GHz)               0
Readout assignment error          0
Prob meas0 prep1                  0
Prob meas1 prep0                  0
Readout length (ns)               0
ID error                      65652
Z-axis rotation (rz) error    65652
√x (sx) error                 65652
ECR error                     65652
Gate time (ns)                65652
dtype: int64


In [12]:
df_sherbrooke.dropna(subset=['Prob meas1 prep0', 'Readout length (ns)'], inplace=True)

In [14]:
df_sherbrooke.duplicated().sum()

0

In [15]:
# Delete columns without data
df_sherbrooke.drop(columns=["Pauli-X error", "Operational"], inplace=True)
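
The null counts above show that the gate-error columns are also almost entirely empty (65,652 missing values each). One way to handle such columns systematically is to drop every column whose missing fraction exceeds a threshold. The sketch below uses a toy DataFrame, and the 50 % threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Toy frame: one complete column, one mostly empty, one fully empty
df = pd.DataFrame({
    "T1 (us)": [214.3, 250.8, 156.1, 164.3],
    "ID error": [0.000092, np.nan, np.nan, np.nan],
    "Pauli-X error": [np.nan, np.nan, np.nan, np.nan],
})

# Fraction of missing values per column
null_fraction = df.isna().mean()

# Columns above the (assumed) 50 % threshold are removed
to_drop = null_fraction[null_fraction > 0.5].index.tolist()
df_reduced = df.drop(columns=to_drop)

print("Dropped:", to_drop)
print("Kept:   ", list(df_reduced.columns))
```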

<h2 style="color: 	#365F93;"><strong>3. EDA</strong></h2>

- Histograms and boxplots
- Correlations
- Distributions per qubit
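
This section is still a placeholder. As a starting point, the sketch below computes the numeric counterparts of the planned plots (summary statistics, correlations with the target, and per-qubit aggregates) on synthetic data. The `demo` DataFrame and its coefficients are invented for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in for the calibration data (not real measurements)
demo = pd.DataFrame({
    "qubit": rng.integers(0, 5, size=n),
    "T1 (us)": rng.uniform(50, 300, size=n),
    "Readout length (ns)": rng.choice([1216.0, 1244.444444], size=n),
})
demo["Readout assignment error"] = (
    0.02 - 0.00005 * demo["T1 (us)"] + rng.normal(0, 0.002, size=n)
)

# Distribution of the target (what a histogram or boxplot would display)
summary = demo["Readout assignment error"].describe()

# Linear correlation of each numeric feature with the target
corr = demo.drop(columns="qubit").corr()["Readout assignment error"]

# Per-qubit distribution of the target
per_qubit = demo.groupby("qubit")["Readout assignment error"].agg(["mean", "std", "count"])

print(summary, corr, per_qubit, sep="\n\n")
```

On the real data, `demo` would be replaced by `df_sherbrooke`, and the same objects could feed `seaborn` or `matplotlib` plots.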

<h2 style="color: 	#365F93;"><strong>4. Feature Engineering</strong></h2>

- New variables
- Normalization / scaling
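
No engineered features are defined yet. The sketch below shows two candidate ratios that could be derived from the existing columns. The feature names `t2_over_t1` and `readout_over_t1` are suggestions, not variables used elsewhere in this notebook, and whether they help the model is untested.

```python
import pandas as pd

# Two rows from the dataset head
df = pd.DataFrame({
    "T1 (us)": [214.317737, 250.813950],
    "T2 (us)": [4.635668, 4.736282],
    "Readout length (ns)": [1244.444444, 1244.444444],
})

# Ratio of dephasing time to relaxation time
df["t2_over_t1"] = df["T2 (us)"] / df["T1 (us)"]

# Readout duration expressed as a fraction of T1 (ns converted to us first)
df["readout_over_t1"] = (df["Readout length (ns)"] / 1000.0) / df["T1 (us)"]

print(df[["t2_over_t1", "readout_over_t1"]])
```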

<h2 style="color: 	#365F93;"><strong>5. Baseline Models</strong></h2>

- Linear regression
- Decision tree
- Basic XGBoost
- Metrics (RMSE, MAE, R²)
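
Only the KNN baseline is implemented below. A comparison loop over the other planned models could be sketched as follows, here on synthetic data and with XGBoost left out to keep the example free of extra dependencies. The data, the `max_depth` value, and the `results` dictionary are all illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the calibration data
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(500, 5))
y = 0.01 + 0.02 * X[:, 0] + rng.normal(0, 0.002, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=4, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "R2": r2_score(y_test, pred),
    }

for name, metrics in results.items():
    print(name, metrics)
```

Adding another estimator, such as `xgboost.XGBRegressor`, only requires one more entry in `models`.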

In [19]:
features = ["T1 (us)", "T2 (us)", "Frequency (GHz)", "Anharmonicity (GHz)", "Readout length (ns)"]
target = "Readout assignment error"

df_model = df_sherbrooke[features + [target]].copy()

In [None]:
# Build X and y directly from df_model
X = df_model[features]
y = df_model[target]

print(f"Shape of X: {X.shape}")
print(f"Shape of y: {y.shape}")

Shape of X: (65779, 5)
Shape of y: (65779,)


In [None]:
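# Split the dataset into training (80 %) and test (20 %) sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

print(f"X_train shape: {X_train.shape}")
print(f"X_test shape:  {X_test.shape}")
print(f"y_train shape: {y_train.shape}")
print(f"y_test shape:  {y_test.shape}")

X_train shape: (52623, 5)
X_test shape:  (13156, 5)
y_train shape: (52623,)
y_test shape:  (13156,)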
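
Because each qubit appears on many calibration dates, a purely random split places rows from the same qubit in both the training and test sets. If the goal is to predict errors for qubits the model has never seen, a grouped split may give a more honest estimate. The sketch below uses scikit-learn's `GroupShuffleSplit` on synthetic data; it is an alternative to consider, not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n_qubits, n_days = 20, 10

# One row per (qubit, day); the qubit index acts as the group label
qubit = np.repeat(np.arange(n_qubits), n_days)
X = rng.normal(size=(n_qubits * n_days, 5))
y = rng.normal(size=n_qubits * n_days)

# Hold out about 20 % of the qubits (not 20 % of the rows)
splitter = GroupShuffleSplit(n_splits=1, test_size=0.20, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=qubit))

train_qubits = set(qubit[train_idx])
test_qubits = set(qubit[test_idx])
print("Qubits in test set:", sorted(int(q) for q in test_qubits))
print("Overlap with train:", train_qubits & test_qubits)
```

On the real data, `groups` would be `df_sherbrooke["qubit"]`; a split by date is another option if forecasting future calibrations is the aim.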


In [None]:
# Create the scaler and fit it on X_train only
scaler = MinMaxScaler()
scaler.fit(X_train)

# Transform X_train and X_test
X_train_norm = scaler.transform(X_train)
X_test_norm  = scaler.transform(X_test)

# Convert back to DataFrames (to keep column names and indices)
X_train_norm = pd.DataFrame(X_train_norm, columns=X_train.columns, index=X_train.index)
X_test_norm  = pd.DataFrame(X_test_norm,  columns=X_test.columns,  index=X_test.index)

# Show the first rows of X_train_norm
X_train_norm.head()

| | T1 (us) | T2 (us) | Frequency (GHz) | Anharmonicity (GHz) | Readout length (ns) |
|---|---|---|---|---|---|
| 39822 | 0.232423 | 0.649286 | 0.483695 | 0.270094 | 0.977143 |
| 28322 | 0.528109 | 0.466623 | 0.548154 | 0.216507 | 1.0 |
| 19532 | 0.110034 | 0.64559 | 0.360265 | 0.260297 | 1.0 |
| 29096 | 0.341985 | 0.168895 | 0.334563 | 0.200082 | 1.0 |
| 13905 | 0.214192 | 0.524633 | 0.155088 | 0.253242 | 1.0 |
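
The scaler above is fitted on the training set only, which avoids leaking test-set statistics into the model. The same guarantee can be obtained with less bookkeeping by chaining the scaler and the estimator in a scikit-learn pipeline. The sketch below shows this on synthetic data; it is an optional restructuring, not what the notebook currently does.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic features on very different scales from the target
rng = np.random.default_rng(42)
X = rng.uniform(0, 300, size=(400, 5))
y = 0.02 - 0.00005 * X[:, 0] + rng.normal(0, 0.001, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# fit() scales with training statistics only; predict() reuses them on new data
model = make_pipeline(MinMaxScaler(), KNeighborsRegressor(n_neighbors=10))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Predictions:", y_pred[:3])
```

A pipeline also plugs directly into cross-validation utilities, where the scaler is refitted on each training fold.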


In [None]:
# Create the KNN model with n_neighbors = 10
knn = KNeighborsRegressor(n_neighbors=10)

# Train on the normalized data
knn.fit(X_train_norm, y_train)

# Predict on the test set
y_pred = knn.predict(X_test_norm)

# Compute regression metrics: MAE and R²
mae = mean_absolute_error(y_test, y_pred)
r2  = r2_score(y_test, y_pred)

print(f"MAE (Mean Absolute Error): {mae}")
print(f"R² score: {r2}")

MAE (Mean Absolute Error): 0.014960245643028846
R² score: 0.5501271275156308


<h2 style="color: 	#365F93;"><strong>6. Initial Conclusions</strong></h2>

- Most relevant variables
- Possible future improvements
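
KNN does not expose feature importances, so ranking the most relevant variables needs a model-agnostic method. One option is permutation importance from scikit-learn, sketched below on synthetic data in which only the first feature carries signal. The feature names are borrowed from the dataset for readability; the ranking itself says nothing about the real data.

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

features = ["T1 (us)", "T2 (us)", "Readout length (ns)"]

# Synthetic data: the target depends on the first column only
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(1000, len(features)))
y = 0.01 + 0.02 * X[:, 0] + rng.normal(0, 0.001, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)
model = KNeighborsRegressor(n_neighbors=10).fit(X_train, y_train)

# Shuffle one feature at a time and measure how much the MAE degrades
result = permutation_importance(
    model, X_test, y_test,
    scoring="neg_mean_absolute_error",
    n_repeats=10,
    random_state=42,
)

ranking = sorted(
    zip(features, result.importances_mean), key=lambda item: item[1], reverse=True
)
for name, importance in ranking:
    print(f"{name:>22}: {importance:.5f}")
```

The `shap` package installed at the top of the notebook is another route to the same question.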