How to use tf.distribute.Strategy in TFGNN #177

Closed
ly3106 opened this issue Nov 15, 2022 · 4 comments

ly3106 commented Nov 15, 2022

I'd like to add tf.distribute.MirroredStrategy to keras_trainer.py and int_arithmetic_sampler_test.py. However, when I added it following the "Distributed training with Keras" guide, neither of the two .py demos worked. What should I do?

I remember the TF-GNN paper stating that, for fast training and inference, deep graph models must be able to exploit parallel computation on specialized hardware. However, no tutorials or demos for this have been provided.

Could you please tell me where I can find tutorials or demos, or how to modify the two demos above? Or at least explain the principle behind the modification?

phanein (Collaborator) commented Nov 15, 2022

Hi @ly3106,

The orchestration layer (https://github.com/tensorflow/gnn/tree/main/tensorflow_gnn/runner) is what we usually use for training models, and it will ideally work out of the box.

However, some improvements are underway for the in_memory stuff in the near term.

Could you provide:

  1. Which version of the library you are using (head, 0.3.0, etc.)
  2. Some details about the error you encountered (stack trace, MWE, etc.)

dzelle (Contributor) commented Nov 15, 2022 via email

ly3106 (Author) commented Jan 2, 2023

Hi @phanein and @dzelle, thanks for your replies. Sorry it took me so long to respond; I have been studying and testing the materials and demos you recommended. After testing them, I found them very helpful, and I eventually got training running in parallel on multiple GPUs. However, I think TF-GNN beginners may run into the same questions, so I am listing the problems I encountered and their solutions here.

Actually, I didn't run the recommended demo; I ran this demo instead (I think it is an earlier version of the former). I chose the latter because it is written as an .ipynb notebook that runs directly on Colab, and because it uses Keras directly rather than the runner, so I could read it like any other TensorFlow project.

However, both demos use the same pre-processed dataset on Google Cloud Storage. Therefore, you need to download the dataset to your local computer; otherwise, you have to run the latter demo on Google Colab. The former demo, train.py, does not give the dataset path, but the latter demo, ogbn_mag_e2e.ipynb, has the path written in the code:

input_file_pattern = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-?????-of-00100"
graph_schema_file = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/schema.pbtxt"

So, we should use gsutil to download the dataset. The download commands are as follows:

gsutil -m cp \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00000-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00001-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00002-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00003-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00004-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00005-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00006-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00007-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00008-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00009-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00010-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00011-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00012-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00013-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00014-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00015-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00016-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00017-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00018-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00019-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00020-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00021-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00022-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00023-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00024-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00025-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00026-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00027-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00028-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00029-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00030-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00031-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00032-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00033-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00034-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00035-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00036-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00037-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00038-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00039-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00040-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00041-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00042-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00043-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00044-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00045-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00046-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00047-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00048-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00049-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00050-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00051-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00052-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00053-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00054-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00055-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00056-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00057-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00058-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00059-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00060-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00061-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00062-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00063-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00064-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00065-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00066-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00067-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00068-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00069-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00070-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00071-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00072-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00073-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00074-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00075-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00076-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00077-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00078-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00079-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00080-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00081-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00082-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00083-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00084-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00085-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00086-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00087-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00088-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00089-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00090-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00091-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00092-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00093-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00094-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00095-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00096-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00097-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00098-of-00100" \
  "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-00099-of-00100" \
  .

and

gsutil -m cp "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/schema.pbtxt" .
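The hundred shard names above follow a fixed pattern, so the list can be generated instead of typed out. The following is only a sketch, assuming exactly 100 shards numbered 00000 to 00099 as in the commands above; `shard_uris` and `gsutil_cp_command` are hypothetical helper names, and nothing is downloaded unless you pass the resulting command to a process runner yourself.

```python
# Sketch: build the gsutil command for all shards programmatically.
# Assumes 100 shards named samples-00000-of-00100 ... samples-00099-of-00100.
GCS_PREFIX = "gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge"


def shard_uris(prefix=GCS_PREFIX, num_shards=100):
    """Return the URI of every shard, in order."""
    return [
        f"{prefix}/samples-{i:05d}-of-{num_shards:05d}"
        for i in range(num_shards)
    ]


def gsutil_cp_command(uris, dest="."):
    """Return the argument list for `gsutil -m cp <uris...> <dest>`."""
    return ["gsutil", "-m", "cp", *uris, dest]


if __name__ == "__main__":
    cmd = gsutil_cp_command(shard_uris())
    # Only print a summary here; run it with subprocess.run(cmd, check=True)
    # if gsutil is installed.
    print(f"{len(cmd) - 4} shard URIs, first: {cmd[3]}")
```

As far as I know, gsutil also accepts the quoted wildcard form of the URI (`samples-?????-of-00100`), which would make the explicit list unnecessary.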

Once the download has completed, change the dataset paths in the code to the local paths of the files you downloaded. For example:

input_file_pattern = "/home/bit2/programs/ogbn-mag/samples-?????-of-00100"
graph_schema_file = "/home/bit2/programs/ogbn-mag/schema.pbtxt"
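Before starting a training run, it can help to confirm that the file pattern actually matches the downloaded shards. Inside a Python string the `?` wildcards are written without backslashes; the backslashes are only needed on a shell command line, to keep the shell from expanding the pattern. The sketch below uses the standard-library `glob` module, which is an assumption on my part: the training script resolves the pattern through TensorFlow's own file matching, which may differ in details. `find_local_shards` is a hypothetical helper name.

```python
# Sketch: check that the local shard pattern matches the downloaded files.
import glob
import os


def find_local_shards(directory, pattern="samples-?????-of-00100"):
    """Return the sorted list of files in `directory` matching `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


if __name__ == "__main__":
    # Directory taken from the example paths above; adjust to your setup.
    shards = find_local_shards("/home/bit2/programs/ogbn-mag")
    print(f"found {len(shards)} shard files")
```

If the count is zero, the pattern or the directory is wrong and the trainer would have nothing to read.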

Then, you can run your code like this

python /home/bit2/programs/tfgnn_in_memory/ogbn_mag_e2e.py --samples /home/bit2/programs/ogbn-mag/samples-\?\?\?\?\?-of-00100 --graph_schema /home/bit2/programs/ogbn-mag/schema.pbtxt --base_dir ./

Note: I rewrote ogbn_mag_e2e as a .py script rather than using the original .ipynb notebook.
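For readers wondering what the multi-GPU part of such a script might look like, here is a minimal sketch of the general Keras pattern with tf.distribute.MirroredStrategy. It is not the TF-GNN runner and not the code of ogbn_mag_e2e; `make_compiled_model` and `make_dataset` are hypothetical callables you would supply, and whether a particular demo works under the strategy depends on the library version.

```python
# Sketch: generic Keras training under tf.distribute.MirroredStrategy.
# make_compiled_model and make_dataset are hypothetical user-supplied callables.


def global_batch_size(per_replica_batch_size, num_replicas):
    """Batch size for the dataset so that each replica gets its share."""
    return per_replica_batch_size * num_replicas


def train_on_all_gpus(make_compiled_model, make_dataset,
                      per_replica_batch_size=32, epochs=1):
    # Imported here so the helper above can be used without TensorFlow.
    import tensorflow as tf

    # MirroredStrategy replicates the model on every visible GPU.
    strategy = tf.distribute.MirroredStrategy()
    batch_size = global_batch_size(
        per_replica_batch_size, strategy.num_replicas_in_sync)

    # Variables (model, optimizer, metrics) must be created inside the scope.
    with strategy.scope():
        model = make_compiled_model()

    # Keras splits each global batch across the replicas during fit().
    train_ds = make_dataset(batch_size)
    return model.fit(train_ds, epochs=epochs)
```

The main design point is that the model is built and compiled inside `strategy.scope()`, while the dataset is batched with the global batch size rather than the per-GPU one.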

arnoegw (Collaborator) commented Mar 27, 2023

Hi @ly3106, thanks for reporting the twists you noticed and overcame. We'll keep this in mind as we plan future revisions of the OGBN-MAG demos.

Regarding gs://download.tensorflow.org/data/ogbn-mag/sampled/v1/edge/samples-?????-of-00100, I don't think we should ask users to download it from Google Cloud to their own machine or cluster, because it is the large temporary data passed between the graph sampler and the actual GNN model.

The OGBN-MAG example is meant to demonstrate how users can approach their own datasets, even if much larger than OGBN-MAG, by the combination of

  • subgraph sampling to a filesystem, followed by
  • streaming from these files into a trainer script.

A live demo from within Colab can only cover the second step, so we made a copy of the first step's output available for streaming into Colab and added a link to the Data Preparation guide that explains how the sampler was run.

Nonetheless, in real applications, both steps belong together. (Recall from the Colab how the message passing pattern of the model defines the requirements for graph sampling.) Copying out of GCS after the first step is probably not a good idea. It only seems doable for OGBN-MAG because it's small compared to other datasets for which this approach works.

Based on this reasoning, please allow me to close this issue.

arnoegw closed this as completed Mar 27, 2023
arnoegw self-assigned this Mar 27, 2023