How to use tf.distribute.Strategy in TFGNN #177
Comments
Hi @ly3106, The orchestration layer (https://github.com/tensorflow/gnn/tree/main/tensorflow_gnn/runner) is what we usually use for training models, and it will ideally work out of the box. However, some improvements are underway for the in_memory stuff in the near term. Could you provide:
On Tue, Nov 15, 2022 at 12:02 PM Bryan Perozzi wrote:
Hi @ly3106,
The orchestration layer (https://github.com/tensorflow/gnn/tree/main/tensorflow_gnn/runner) is what we usually use for training models, and it will ideally work out of the box.
You can also find some documentation here: https://github.com/tensorflow/gnn/blob/main/tensorflow_gnn/docs/guide/runner.md.
And examples here: https://github.com/tensorflow/gnn/tree/main/tensorflow_gnn/runner/examples/ogbn/mag.
… However, some improvements are underway for the in_memory stuff in the near term.
Could you provide:
1. What version of the library are you using? (head, or 0.3.0?, etc.)
2. Some details about whatever error you encountered (stacktrace, MWE, etc.)
Hi @phanein and @dzelle, thanks for your replies. Sorry it took so long to respond; I have been studying and testing the materials and demos you recommended. After testing them, I found them very helpful, and I eventually got training running in parallel on multiple GPUs. However, I think TFGNN beginners may run into the same questions I did, so I list the problems I encountered here along with their solutions.
I didn't actually run the recommended demo; I ran this demo instead (I think it is a previous version of the former). The reasons I used the latter demo are that it was written by
However, both demos use the same pre-processed dataset on Google Cloud Storage. Therefore, you need to download the dataset to your local computer; otherwise, you should run the latter demo on Google Colab. The former demo, train.py, doesn't give the dataset path, but the latter demo, ogbn_mag_e2e.ipynb, has the path written in the code:
input_file_pattern = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-?????-of-00100"
graph_schema_file = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/schema.pbtxt"
So, we should use
gsutil -m cp \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00000-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00001-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00002-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00003-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00004-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00005-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00006-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00007-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00008-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00009-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00010-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00011-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00012-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00013-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00014-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00015-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00016-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00017-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00018-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00019-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00020-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00021-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00022-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00023-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00024-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00025-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00026-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00027-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00028-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00029-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00030-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00031-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00032-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00033-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00034-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00035-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00036-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00037-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00038-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00039-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00040-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00041-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00042-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00043-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00044-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00045-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00046-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00047-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00048-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00049-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00050-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00051-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00052-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00053-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00054-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00055-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00056-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00057-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00058-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00059-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00060-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00061-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00062-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00063-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00064-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00065-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00066-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00067-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00068-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00069-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00070-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00071-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00072-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00073-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00074-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00075-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00076-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00077-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00078-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00079-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00080-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00081-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00082-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00083-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00084-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00085-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00086-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00087-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00088-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00089-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00090-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00091-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00092-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00093-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00094-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00095-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00096-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00097-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00098-of-00100" \
"gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00099-of-00100" \
.
and
gsutil -m cp "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/schema.pbtxt" .
Once the download is complete, change the dataset paths in the code to the local paths of the files you downloaded, e.g.,
input_file_pattern = "/home/bit2/programs/ogbn-mag/samples-\?\?\?\?\?-of-00100"
graph_schema_file = "/home/bit2/programs/ogbn-mag/schema.pbtxt"
Then you can run your code like this:
python /home/bit2/programs/tfgnn_in_memory/ogbn_mag_e2e.py --samples /home/bit2/programs/ogbn-mag/samples-\?\?\?\?\?-of-00100 --graph_schema /home/bit2/programs/ogbn-mag/schema.pbtxt --base_dir ./
Note: I rewrote the ogbn_mag_e2e using
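The shard names in the gsutil command above follow a regular pattern (a zero-padded five-digit index, 100 shards), so the list can be generated rather than typed out. The following is only a minimal sketch using the Python standard library; the helper name shard_uris is hypothetical, and the check at the end simply confirms that every generated name matches the samples-?????-of-00100 pattern quoted from the notebook.

```python
from fnmatch import fnmatchcase

# Base path and shard count as quoted in this thread.
BASE = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge"
NUM_SHARDS = 100


def shard_uris(base=BASE, num_shards=NUM_SHARDS):
    """Hypothetical helper: build the URI of every sample shard."""
    return [
        f"{base}/samples-{i:05d}-of-{num_shards:05d}"
        for i in range(num_shards)
    ]


uris = shard_uris()

# Each "?" in the file pattern matches exactly one character,
# so "?????" matches the five-digit shard index.
pattern = f"{BASE}/samples-?????-of-00100"
all_match = all(fnmatchcase(uri, pattern) for uri in uris)

print(uris[0])    # first shard
print(uris[-1])   # last shard
print(len(uris), all_match)
```

The resulting list could then be passed to whatever copy tool is being used; the local directory layout is up to the reader.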
Hi @ly3106 , thanks for reporting the twists you have noticed but overcome. We'll keep this in mind as we plan for future revisions of the OGBN-MAG demos. Regarding The OGBN-MAG example is meant to demonstrate how users can approach their own datasets, even if much larger than OGBN-MAG, by the combination of
A live demo from within Colab can only cover the second step, so we made a copy of the first step's output available for streaming into Colab and added a link to the Data Preparation guide that explains how the sampler was run. Nonetheless, in real applications, both steps belong together. (Recall from the Colab how the message passing pattern of the model defines the requirements for graph sampling.)
Copying out of GCS after the first step is probably not a good idea. It only seems doable for OGBN-MAG because it's small compared to other datasets for which this approach works. Based on this reasoning, please allow me to close this issue.
I'd like to add tf.distribute.MirroredStrategy to keras_trainer.py and int_arithmetic_sampler_test.py. But when I added it following the Distributed training with Keras guide, the two .py demos didn't work. So, what should I do? I remember the TF-GNN paper said that, for fast training and inference, deep graph models must be able to exploit parallel computations on specialized hardware, but no tutorials or demos for this have been provided.
Could you please tell me where I can find some tutorials or demos, or tell me some methods to modify the above two demos? Or, just tell me the principle of the modification.
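For reference, the general pattern from the Distributed training with Keras guide looks roughly like the sketch below. It uses a toy dense model on random data, not the TF-GNN demos named above, and the layer sizes, batch size, and data are all assumptions chosen only to keep the example small. The key points are that the model and its compile call are created inside strategy.scope(), and that the dataset is batched with the global batch size.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy replicates the model on all visible GPUs
# (it falls back to a single device if none are available).
strategy = tf.distribute.MirroredStrategy()

# The dataset is batched with the GLOBAL batch size; Keras splits
# each batch across the replicas. The value 8 is an arbitrary choice.
per_replica_batch_size = 8
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Toy random data standing in for a real input pipeline.
features = np.random.rand(64, 4).astype("float32")
labels = np.random.randint(0, 2, size=(64,)).astype("int32")
dataset = tf.data.Dataset.from_tensor_slices((features, labels))
dataset = dataset.batch(global_batch_size)

# Variables (model weights, optimizer state) must be created
# inside the strategy scope so that they are mirrored.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# fit() can be called outside the scope; it distributes the dataset.
history = model.fit(dataset, epochs=1, verbose=0)
print("replicas:", strategy.num_replicas_in_sync)
```

Whether this pattern carries over unchanged to keras_trainer.py depends on how that script builds its model and dataset, which is not shown in this thread.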