# Chunking Experiments

### What is chunking?

Quantum control instruments are specialized computing devices that contain various electronic components such as processing units, memories, registers, data buses, etc.
On every control instrument, these components have limited resources.
For example, instruments have limited storage for pulses and for instructions needed to execute an experiment.
Large experiments might not fit into the available resources.

Experiment chunking is a technique that enables running large experiments in the presence of these resource limitations.

Let's look at an example to understand how this works.
Waveform memory is where digitized waveforms are stored.
We will build a large experiment that needs to use a lot of waveform memory, and then demonstrate how chunking helps run this experiment seamlessly.

First things first, let's construct a minimal device setup for our example.
It is a setup containing a single SHFSG instrument:

```python
from laboneq.simple import DeviceSetup, SHFSG, create_connection

# Minimal setup: one data server and a single SHFSG instrument.
setup = DeviceSetup("my_setup")
setup.add_dataserver(host="localhost", port="8004")
setup.add_instruments(SHFSG(uid="shfsg", address="dev12021"))
# Connect the first signal generator channel to the drive line of qubit q0.
setup.add_connections(
    "shfsg", create_connection(to_signal="q0/drive_line", ports="SGCHANNELS/0/OUTPUT")
)
```

And create a session instance in emulation mode (for this exercise we don't need to run anything, just compile, so emulation mode is perfectly fine):

```python
from laboneq.simple import Session

# Emulation mode: no hardware is contacted, which is enough for compiling.
session = Session(device_setup=setup)
session.connect(do_emulation=True)
```

Now, let's build a simple experiment where we sweep the length of a Gaussian pulse.

```python
from laboneq.simple import (
    LinearSweepParameter,
    Oscillator,
    SignalCalibration,
    pulse_library,
)
from laboneq.dsl.experiment.builtins import *


@experiment(signals=["drive"])
def exp():
    pulse = pulse_library.gaussian(length=100e-9)
    # 200 pulse lengths from 100 ns to 299 ns, i.e. 1 ns steps.
    length_sweep_param = LinearSweepParameter(start=100e-9, stop=299e-9, count=200)
    with acquire_loop_rt(1):
        with sweep(parameter=length_sweep_param) as swp:
            # Override the pulse length with the current sweep value.
            play("drive", pulse=pulse, length=swp)

    map_signal("drive", setup.logical_signal_groups["q0"].logical_signals["drive_line"])

    experiment_calibration()["drive"] = SignalCalibration(
        local_oscillator=Oscillator(frequency=1.0e9),
        oscillator=Oscillator(frequency=100e6),
    )


length_sweep_exp = exp()
```

This example is artificially built to hit the waveform memory limitation of the instrument.
In practice, you may hit this limit when running a large, meaningful experiment, such as randomized benchmarking.
Since the length of the pulse is critical for the timing of the entire experiment, LabOneQ will separately sample the swept pulse for each length.
This creates a lot of samples which will hit the waveform memory limit and demonstrate the main point of this tutorial.
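
To get a feel for the numbers, the sketch below estimates how many samples the swept pulse needs if each of the 200 lengths is sampled separately. It is plain Python, not LabOneQ code. It assumes a sampling rate of 2 GSa/s (an assumption; check your instrument's specification) and ignores any padding, granularity rounding, or optimizations the compiler may apply, so treat the result as a rough order of magnitude rather than the compiler's actual memory usage.

```python
# Rough estimate of the samples needed when every sweep step gets its own waveform.
# ASSUMPTION: 2 samples per nanosecond (2 GSa/s); padding, rounding, and
# compiler optimizations are ignored.
SAMPLES_PER_NS = 2

# Pulse lengths of the sweep in integer nanoseconds: 100 ns, 101 ns, ..., 299 ns.
lengths_ns = list(range(100, 300))

# One separately sampled waveform per sweep step.
samples_per_step = [SAMPLES_PER_NS * length for length in lengths_ns]
total_samples = sum(samples_per_step)

print(f"{len(lengths_ns)} sweep steps, {total_samples} samples in total")
# prints "200 sweep steps, 79800 samples in total"
```

Under these assumptions, the longest pulses dominate the total, so the second half of the sweep needs noticeably more samples than the first half.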

Let's now attempt to compile the experiment:

```python
compiled = session.compile(length_sweep_exp)
```

It fails with an error that says:

```
LabOneQException: Compilation error - resource limitation exceeded.
To circumvent this, try one or more of the following:
- Double check the integrity of your experiment (look for unexpectedly long pulses, large number of sweep steps, etc.)
- Reduce the number of sweep steps
- Reduce the number of variations in the pulses that are being played
- Enable chunking for a sweep
- If chunking is already enabled, increase the chunk count or switch to automatic chunking
```

If you scroll up in the Python traceback, you should also see the underlying cause of the problem, in this case `waveforms are not fitting into wave memory`.



To circumvent this, we could split the sweep range into two halves and execute two separate experiments.
However, this is inconvenient for three reasons: we would be running multiple experiments where conceptually only one is needed, we would have to deal with multiple result objects, and halving may not be enough, so we might need to split further.
Fortunately, LabOneQ provides a much more convenient way to do this split without doing it yourself.
In your experiment definition, you can tell LabOneQ how many pieces you want the experiment to be executed in.
We call these pieces "chunks", and to use them you only need to update the sweep definition:

```python
with sweep(parameter=length_sweep_param, chunk_count=2) as swp:
```

After this change, you can verify that the compilation succeeds.
We've told the compiler to compile the experiment as two separate real-time chunks.
This is the only change you need to make in your program.
If you have measurements in your experiment, then the results that you get back will look exactly as if they come from a single experiment execution.
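
Conceptually, the bookkeeping that `chunk_count=2` takes off your hands looks like the sketch below. It is plain Python, not LabOneQ code: `split_into_chunks` and `fake_measure` are hypothetical helpers, and dividing the sweep into consecutive, equally sized blocks is an assumption made for illustration.

```python
def split_into_chunks(points, chunk_count):
    """Split sweep points into chunk_count consecutive, equally sized chunks."""
    if len(points) % chunk_count != 0:
        raise ValueError("chunk_count must evenly divide the number of sweep points")
    size = len(points) // chunk_count
    return [points[i * size:(i + 1) * size] for i in range(chunk_count)]


def fake_measure(length_ns):
    """HYPOTHETICAL stand-in for a measurement; returns a dummy value."""
    return 0.5 * length_ns


# The sweep from the example: 100 ns to 299 ns in 1 ns steps.
sweep_points_ns = list(range(100, 300))
chunks = split_into_chunks(sweep_points_ns, chunk_count=2)

# Run each chunk on its own, then stitch the results together in sweep order.
results = []
for chunk in chunks:
    results.extend(fake_measure(point) for point in chunk)

print([len(chunk) for chunk in chunks])  # [100, 100]
# The stitched results match what a single, unchunked run would return.
print(results == [fake_measure(point) for point in sweep_points_ns])  # True
```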

Besides saving us from manually splitting the experiment, the chunking option also utilizes hardware resources more efficiently compared to executing separate experiments.
A chunked experiment will be executed faster, which becomes a noticeable benefit when the number of chunks is much larger than in our example above.
The compilation time remains roughly the same as compiling separate half-experiments.

### But wait, how do I know what is a good value for chunk_count?

You're probably thinking that you don't know the size of the waveform memory on the instrument, and moreover, you don't know what optimizations the LabOneQ compiler performs, so you can't predict how much waveform memory the generated program needs.

You are of course welcome to study instrument manuals, learn the limitations, and find an optimal `chunk_count` value for your experiment by trial and error.
In many cases, finding a good `chunk_count` is easy, and you only need to do it once per experiment and can reuse the value from then on.
Nevertheless, if this is not a route that you want to take, you can just use the `auto_chunking` option instead, in which case the LabOneQ compiler will determine a good `chunk_count` on its own:

```python
with sweep(parameter=length_sweep_param, auto_chunking=True) as swp:
```

Compiling the experiment with this change is again successful, and in the compiler log you should see a message like `Auto-chunked sweep divided into 2 chunks`.

### Why doesn't LabOneQ always chunk the sweep automatically?

If automatic determination of `chunk_count` is possible, why does LabOneQ even need chunking options to be specified?

Consider a more complicated experiment with multiple nested sweeps. There are two key points:

1. Only one sweep can be chunked.
2. The LabOneQ compiler shouldn't decide on your behalf which one to chunk. There are use cases where you need to chunk a specific sweep only, since some parts of the program should always be executed together in one go. Think about state tomography experiments.

As a user, you have control over which sweep to chunk.

Note that sometimes the experiment structure effectively dictates which sweep to chunk. For example, imagine the experiment we used above with another sweep on top of it.
Each iteration of that outer sweep is basically one instance of our original experiment, which we already know doesn't fit into waveform memory.
In this case, chunking the outer sweep will not help, so the inner one should be chunked.

### Why shouldn't I always use auto_chunking?

Automatically determining a chunk count requires running parts of the compilation process multiple times, which leads to increased compilation times.
Additionally, the compiler may make assumptions about which chunking is best.
If you as the author of the experiment know how the experiment should be chunked, it is recommended to supply the specific `chunk_count`.
This will reduce compilation time and be guaranteed to give you exactly the result you want.
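
One way to picture why an automatic search costs extra compilation time is a loop that tries candidate chunk counts until the largest chunk fits. The sketch below is a hypothetical illustration in plain Python, not LabOneQ's algorithm: `WAVEFORM_CAPACITY`, `samples_for_chunk`, `find_chunk_count`, and the 2 samples per nanosecond are all invented for this example, and each "attempt" merely stands in for one trial compilation.

```python
SAMPLES_PER_NS = 2          # ASSUMPTION: 2 GSa/s sampling rate
WAVEFORM_CAPACITY = 50_000  # HYPOTHETICAL capacity in samples, not an instrument spec


def samples_for_chunk(chunk_ns):
    """Samples needed if every pulse length in the chunk is sampled separately."""
    return sum(SAMPLES_PER_NS * length for length in chunk_ns)


def find_chunk_count(points_ns, capacity):
    """Return (chunk_count, attempts) for the smallest chunk count that fits."""
    n = len(points_ns)
    attempts = 0
    for chunk_count in range(1, n + 1):
        if n % chunk_count != 0:
            continue  # only chunk counts that evenly divide the sweep are tried
        attempts += 1  # stands in for one trial compilation
        size = n // chunk_count
        chunks = [points_ns[i * size:(i + 1) * size] for i in range(chunk_count)]
        if max(samples_for_chunk(chunk) for chunk in chunks) <= capacity:
            return chunk_count, attempts
    raise RuntimeError("even a single sweep point does not fit")


chunk_count, attempts = find_chunk_count(list(range(100, 300)), WAVEFORM_CAPACITY)
print(chunk_count, attempts)  # 2 2
```

With a known `chunk_count`, only the final attempt would be needed, which is the saving the paragraph above refers to.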

The exact underlying mechanism/algorithm of auto chunking is not important and may change from one LabOneQ version to another.

### What are the resource limitations that are covered by chunking?

Strictly speaking, you do not need to know or remember the full list of hardware resources whose limitations can be circumvented by chunking.
However, there are a few important points to note:

1. `auto_chunking` can be used for any resource limitation that results in the error message we saw earlier in this tutorial. If you encounter a situation where the error message looks like some sort of resource limitation was hit, but it is not the error we saw above, please get in touch with us.
2. On the other hand, `chunk_count` can be used anytime, particularly for situations not covered by `auto_chunking`, and even when you are not dealing with a resource limitation. Even though it is mainly designed to circumvent hardware resource limitations, you can use it for whatever other reason you have to split a sweep execution into parts.

Nevertheless, for the curious reader, here is a non-exhaustive list of the main resources that are currently covered by the auto chunking algorithm:

- Waveform memory
- Command table memory
- Program/Instruction memory
- Various cache memories

### Chunking rules

1. In each experiment, only one real-time sweep can be chunked.
2. Near-time sweeps cannot be chunked (chunking is not meaningful for near-time sweeps).
3. The `chunk_count` must evenly divide the number of corresponding sweep points.
4. An experiment should be chunked so that all chunks have the same measurement operations. They can be at different times though.
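
Rule 3 makes it easy to list the chunk counts that are allowed for a given sweep. The helper below is a plain-Python sketch (the name `valid_chunk_counts` is hypothetical and not part of LabOneQ) that enumerates the divisors of the number of sweep points:

```python
def valid_chunk_counts(num_points):
    """Return every chunk count that evenly divides num_points."""
    return [n for n in range(1, num_points + 1) if num_points % n == 0]


# The sweep in this tutorial has 200 points.
print(valid_chunk_counts(200))
# [1, 2, 4, 5, 8, 10, 20, 25, 40, 50, 100, 200]
```

For the 200-point sweep used here, `chunk_count=3` would therefore violate rule 3, while `chunk_count=4` would satisfy it.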


### Known shortcomings

Chunking rules 3 and 4 exist because of a shortcoming in the measurement result processing and should go away in a future release.

Additionally, there are some resource limitations that we know should be covered by `auto_chunking`, but currently are not:

- Measurement result buffer size. It is the part of the hardware where measurement results are collected. If you need to record too many measurement results (e.g. running an experiment with a lot of measurement operations, or an experiment in single-shot readout mode), you may hit this limitation.

NOTE: "Manual" chunking with `chunk_count` can still be used to cover these cases.