
Explore alternate parallelization #255

Closed
MichaelBroughton opened this issue Jun 6, 2020 · 5 comments
@MichaelBroughton
Collaborator

MichaelBroughton commented Jun 6, 2020

Currently our C++ op implementation for circuit simulation does this:

  1. Parse all the circuits in parallel
  2. Simulate each circuit on its own thread with its own independent memory.

This works fine for smaller systems, but if you want to simulate larger systems and have limited memory, it breaks down. Doing something like this:

  1. Parse all circuits in parallel
  2. Simulate each circuit one at a time (parallelizing over a single wavefunction), allocating just one pool of memory. (Keep in mind we would still need to use TensorFlow parallelization tools and not OpenMP.)

would be friendlier on memory for larger numbers of qubits. While this may not be as fast as the first method, it would certainly be more memory efficient. What do you think, @jaeyoo and @zaqqwerty?
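To make the memory trade-off between the two schemes concrete, here is a rough back-of-the-envelope sketch in plain Python. This is not TFQ internals; the complex64 state representation (8 bytes per amplitude) and the thread count are assumptions for illustration.

```python
def wavefunction_bytes(n_qubits: int, bytes_per_amplitude: int = 8) -> int:
    """Memory for one full state vector: 2**n_qubits amplitudes.

    bytes_per_amplitude=8 assumes complex64 (two float32s); this is an
    assumption about the simulator's state representation, not a TFQ fact.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


def scheme_1_bytes(n_qubits: int, n_threads: int) -> int:
    # Original method: one independent wavefunction per concurrently
    # simulated circuit, so memory scales with the number of threads.
    return n_threads * wavefunction_bytes(n_qubits)


def scheme_2_bytes(n_qubits: int) -> int:
    # Proposed method: a single shared pool holding one wavefunction;
    # parallelism happens inside each state update instead.
    return wavefunction_bytes(n_qubits)


if __name__ == "__main__":
    mib = 1024 ** 2
    for n in (6, 25, 30):
        print(f"{n} qubits: scheme 1 (16 threads) = "
              f"{scheme_1_bytes(n, 16) / mib:.1f} MiB, "
              f"scheme 2 = {scheme_2_bytes(n) / mib:.1f} MiB")
```

At 6 qubits the per-thread copies are negligible; at 30 qubits each copy is 8 GiB, which is why the per-circuit-thread approach stops scaling.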

@MichaelBroughton
Collaborator Author

#221 is related. The Rotosolve optimizer would benefit greatly from the second parallelization scheme.

@MichaelBroughton
Collaborator Author

This is resolved. We opted to go with our original method for circuits with < 25 qubits, and with the new method for circuits with more than 25 qubits.
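The resolution above amounts to a qubit-count dispatch between the two schemes. A minimal sketch of that decision rule (the function name, return labels, and exact boundary handling at 25 qubits are illustrative, not the actual TFQ C++ code):

```python
QUBIT_THRESHOLD = 25  # illustrative constant; the real cutoff lives in the C++ ops


def pick_parallelization(n_qubits: int) -> str:
    """Dispatch between the two schemes discussed in this issue.

    Below the threshold, many small wavefunctions fit in memory, so each
    circuit gets its own thread (original method). At or above it, a single
    wavefunction is already large, so circuits are simulated one at a time
    with parallelism inside each state update (new method).
    """
    if n_qubits < QUBIT_THRESHOLD:
        return "per-circuit-threads"
    return "single-pool"
```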

@we-taper
Contributor

Hi @MichaelBroughton, I am trying to test this behaviour by running 500 slightly different circuits (each with 6 qubits) in a batch, but it seems that only one thread is actively running. I have attached the test code below.

tensorflow (CPU version) 2.3.0
tensorflow_quantum nightly 20200917

Test code:

# %% imports
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import numpy as np

# %% create a batch of circuits to test if TFQ runs circuits in parallel on all available threads

# %% params
nqb = 6
ncirc = 500
nrepeat = 50  # number of repeated structures in each circuit

# %% create it
qbs = cirq.GridQubit.rect(nqb, 1)
part_a = [cirq.H.on(_) for _ in qbs]
part_b = [cirq.CNOT.on(qbs[i], qbs[i + 1]) for i in range(len(qbs) - 1)]
circuit = []
for _ in range(nrepeat):
    if np.random.rand() < 0.5:
        circuit.extend(part_a)
    else:
        circuit.extend(part_b)
circuit = cirq.Circuit(*circuit)
print('The circuit typically looks like:\n', circuit.to_text_diagram())
circuit_list = [circuit] * ncirc

# %% run it
# layer = tfq.layers.Unitary()
layer = tfq.layers.State()


@tf.function
def cost():
    """Simulate a fake hybrid-cost function."""
    output = layer(circuit_list)
    output = tf.abs(output)
    output = tf.reduce_sum(output)
    return output


print(cost())

@MichaelBroughton
Collaborator Author

Hmmm, you are compiling a @tf.function with no input. Chances are autograph is just compiling this down into a constant expression. I ran this on my end with a few changes and definitely saw full use of all threads. Could you try running this version and see if you still only see one thread being active?

# %% imports
import tensorflow as tf
import tensorflow_quantum as tfq
import cirq
import numpy as np

# %% create a batch of circuits to test if TFQ runs circuits in parallel on all available threads

# %% params
nqb = 15
ncirc = 500
nrepeat = 50  # number of repeated structures in each circuit

# %% create it
qbs = cirq.GridQubit.rect(nqb, 1)
part_a = [cirq.H.on(_) for _ in qbs]
part_b = [cirq.CNOT.on(qbs[i], qbs[i + 1]) for i in range(len(qbs) - 1)]
circuit = []
for _ in range(nrepeat):
    if np.random.rand() < 0.5:
        circuit.extend(part_a)
    else:
        circuit.extend(part_b)
circuit = cirq.Circuit(*circuit)
print('The circuit typically looks like:\n', circuit.to_text_diagram())
circuit_list = [circuit] * ncirc

# %% run it
# layer = tfq.layers.Unitary()
layer = tfq.layers.State()


@tf.function
def cost(input2):
    """Simulate a fake hybrid-cost function."""
    output = layer(input2)
    # output = tf.abs(output)
    # output = tf.reduce_sum(output)
    return output


v = tfq.convert_to_tensor(circuit_list)
print('About to enter @tf.function')
res = cost(v)

@we-taper
Contributor

Thanks @MichaelBroughton. I think I understand it now. The bottleneck is actually in tfq.convert_to_tensor(circuit_list), which takes up a significant chunk of the time; the actual execution of the circuits in layer is fast and uses all the threads. Because the long conversion runs on only one thread, the later spike in multi-threaded usage went unnoticed.
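One way to confirm this kind of diagnosis is to time the two phases separately rather than watching a thread monitor. A stdlib-only sketch of the pattern (serialize and simulate here are hypothetical stand-ins for tfq.convert_to_tensor and the layer call, not real TFQ functions):

```python
import time


def timed(label, fn, *args):
    """Run fn(*args), print how long it took, and return its result."""
    t0 = time.perf_counter()
    out = fn(*args)
    print(f"{label}: {time.perf_counter() - t0:.3f}s")
    return out


# Hypothetical stand-ins for the two phases in the snippets above.
def serialize(circuits):
    # stands in for tfq.convert_to_tensor (single-threaded serialization)
    return list(circuits)


def simulate(tensor):
    # stands in for the layer call (multi-threaded simulation)
    return len(tensor)


if __name__ == "__main__":
    v = timed("convert_to_tensor", serialize, range(500))
    res = timed("simulate", simulate, v)
```

Wrapping each phase this way makes it obvious which one dominates wall-clock time, independent of how many threads it happens to use.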

jaeyoo pushed a commit to jaeyoo/quantum that referenced this issue Mar 30, 2023