
Explore alternate parallelization #255

Closed
MichaelBroughton opened this issue Jun 6, 2020 · 5 comments
@MichaelBroughton
Collaborator

MichaelBroughton commented Jun 6, 2020

Currently our C++ op implementation for circuit simulation does this:

  1. Parse all the circuits in parallel
  2. Simulate each circuit on its own thread with its own independent memory.

This works fine for smaller systems, but if you want to simulate larger systems and have limited memory, it breaks down. Doing something like this:

  1. Parse all circuits in parallel
  2. Simulate each circuit one at a time (parallelizing over a single wavefunction), allocating just one pool of memory. (Keep in mind we would still need to use TensorFlow parallelization tools and not OpenMP.)

would be friendlier on memory for larger numbers of qubits. While this may not be as fast as the first method, it would certainly be more memory efficient. What do you think, @jaeyoo and @zaqqwerty?
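To make the memory trade-off between the two schemes concrete, here is a rough back-of-the-envelope sketch in plain Python. This is not TFQ internals; the complex64 state representation (8 bytes per amplitude) and the thread count are assumptions for illustration.

```python
def wavefunction_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory for one full state vector: 2**n_qubits amplitudes.

    bytes_per_amplitude=8 assumes complex64 (two float32s); this is an
    assumption about the simulator's state representation, not a TFQ fact.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


def scheme_1_bytes(n_qubits: int, n_threads: int) -> int:
    # Original method: one independent wavefunction per concurrently
    # simulated circuit, so memory scales with the number of threads.
    return n_threads * wavefunction_bytes(n_qubits)


def scheme_2_bytes(n_qubits: int) -> int:
    # Proposed method: a single shared pool holding one wavefunction;
    # parallelism happens inside each state update instead.
    return wavefunction_bytes(n_qubits)


if __name__ == "__main__":
    mib = 1024 ** 2
    for n in (6, 25, 30):
        print(f"{n} qubits: scheme 1 (16 threads) = "
              f"{scheme_1_bytes(n, 16) / mib:.1f} MiB, "
              f"scheme 2 = {scheme_2_bytes(n) / mib:.1f} MiB")
```

At 6 qubits the per-thread copies are negligible; at 30 qubits each copy is 8 GiB, which is why the per-circuit-thread approach stops scaling.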

@MichaelBroughton
Collaborator Author

#221 is related. The Rotosolve optimizer would benefit greatly from the second parallelization scheme.

@MichaelBroughton
Collaborator Author

This is resolved. We opted to go with our original method for circuits with < 25 qubits, and with the new method for circuits with more than 25 qubits.
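The resolution above amounts to a qubit-count dispatch between the two schemes. A minimal sketch of that decision rule (the function name, return labels, and exact boundary handling at 25 qubits are illustrative, not the actual TFQ C++ code):

```python
QUBIT_THRESHOLD = 25  # illustrative constant; the real cutoff lives in the C++ ops


def pick_parallelization(n_qubits: int) -> str:
    """Dispatch between the two schemes discussed in this issue.

    Below the threshold, many small wavefunctions fit in memory, so each
    circuit gets its own thread (original method). At or above it, a single
    wavefunction is already large, so circuits are simulated one at a time
    with parallelism inside each state update (new method).
    """
    if n_qubits < QUBIT_THRESHOLD:
        return "per-circuit-threads"
    return "single-pool"
```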

@we-taper
Contributor

Hi @MichaelBroughton, I am trying to test this behaviour by running 500 slightly different circuits (each with 6 qubits) in a batch, but it seems that only one thread is actively running. I have attached the test code below.

tensorflow (CPU version) 2.3.0
tensorflow_quantum nightly 20200917

Test code:

# %% imports
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import numpy as np

# %% create a batch of circuits to test if TFQ runs circuits in parallel on all available threads

# %% params
nqb = 6
ncirc = 500
nrepeat = 50  # number of repeated structures in each circuit

# %% create it
qbs = cirq.GridQubit.rect(nqb, 1)
part_a = [cirq.H.on(_) for _ in qbs]
part_b = [cirq.CNOT.on(qbs[i], qbs[i + 1]) for i in range(len(qbs) - 1)]
circuit = []
for _ in range(nrepeat):
    if np.random.rand() < 0.5:
        circuit.extend(part_a)
    else:
        circuit.extend(part_b)
circuit = cirq.Circuit(*circuit)
print('The circuit typically looks like:\n', circuit.to_text_diagram())
circuit_list = [circuit] * ncirc

# %% run it
# layer = tfq.layers.Unitary()
layer = tfq.layers.State()


@tf.function
def cost():
    """Simulate a fake hybrid-cost function."""
    output = layer(circuit_list)
    output = tf.abs(output)
    output = tf.reduce_sum(output)
    return output


print(cost())

@MichaelBroughton
Collaborator Author

Hmmm, you are compiling a @tf.function with no input. Chances are autograph is just compiling this down into a constant expression. I ran this on my end with a few changes and definitely saw full use of all threads. Could you try running this version and see if you still only see one thread being active?

# %% imports
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import numpy as np

# %% create a batch of circuits to test if TFQ runs circuits in parallel on all available threads

# %% params
nqb = 15
ncirc = 500
nrepeat = 50  # number of repeated structures in each circuit

# %% create it
qbs = cirq.GridQubit.rect(nqb, 1)
part_a = [cirq.H.on(_) for _ in qbs]
part_b = [cirq.CNOT.on(qbs[i], qbs[i + 1]) for i in range(len(qbs) - 1)]
circuit = []
for _ in range(nrepeat):
    if np.random.rand() < 0.5:
        circuit.extend(part_a)
    else:
        circuit.extend(part_b)
circuit = cirq.Circuit(*circuit)
print('The circuit typically looks like:\n', circuit.to_text_diagram())
circuit_list = [circuit] * ncirc

# %% run it
# layer = tfq.layers.Unitary()
layer = tfq.layers.State()


@tf.function
def cost(input2):
    """Simulate a fake hybrid-cost function."""
    output = layer(input2)
    # output = tf.abs(output)
    # output = tf.reduce_sum(output)
    return output


v = tfq.convert_to_tensor(circuit_list)
print('About to enter @tf.function')
res = cost(v)

@we-taper
Contributor

Thanks @MichaelBroughton. I think I understand it now. The bottleneck is actually in tfq.convert_to_tensor(circuit_list), which takes up a significant chunk of the time; the actual execution of the circuits in layer is fast and uses all the threads. Because the long conversion runs on only one thread, the later spike in multi-threaded usage went unnoticed.
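One way to confirm this kind of diagnosis is to time the two phases separately rather than watching a thread monitor. A stdlib-only sketch of the pattern (serialize and simulate here are hypothetical stand-ins for tfq.convert_to_tensor and the layer call, not real TFQ functions):

```python
import time


def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return its result."""
    t0 = time.perf_counter()
    out = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return out


# Hypothetical stand-ins for the two phases in the snippets above.
def serialize(circuits):
    # stands in for tfq.convert_to_tensor (single-threaded serialization)
    return list(circuits)


def simulate(tensor):
    # stands in for the layer call (multi-threaded simulation)
    return len(tensor)


if __name__ == "__main__":
    v = timed("convert_to_tensor", serialize, range(500))
    res = timed("simulate", simulate, v)
```

Wrapping each phase this way makes it obvious which one dominates wall-clock time, independent of how many threads it happens to use.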

jaeyoo pushed a commit to jaeyoo/quantum that referenced this issue Mar 30, 2023