Add data schema for the benchmark run in Bigquery. #3585
The current schema contains entity information about the model and training data metadata, as well as the machine config. A future change will add benchmark metrics. The JSON schema can be used to create a BigQuery table. A sample table can be found at https://bigquery.cloud.google.com/table/tf-benchmark-dashboard:test_benchmark.benchmark_run.
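As a rough illustration of the schema shape discussed in this review, the sketch below checks a BigQuery-style JSON schema for basic structural problems before it is used to create a table. It is not part of this PR: `check_schema` and `SAMPLE_SCHEMA` are hypothetical names, and the nested `name`/`value` fields under `attribute` are an assumption, since the quoted diff does not show them.

```python
import json

# Field modes accepted in BigQuery schema files.
VALID_MODES = {"NULLABLE", "REQUIRED", "REPEATED"}


def check_schema(fields, path=""):
    """Return a list of structural problems in a BigQuery-style JSON schema."""
    problems = []
    for field in fields:
        name = field.get("name")
        where = f"{path}{name}" if name else f"{path}<unnamed>"
        if not name:
            problems.append(f"{where}: missing 'name'")
        if "type" not in field:
            problems.append(f"{where}: missing 'type'")
        # "mode" is optional in schema files and defaults to NULLABLE.
        mode = field.get("mode", "NULLABLE")
        if mode not in VALID_MODES:
            problems.append(f"{where}: unknown mode {mode!r}")
        # RECORD fields must describe their nested columns under "fields".
        if field.get("type") == "RECORD":
            nested = field.get("fields")
            if not nested:
                problems.append(f"{where}: RECORD has no nested 'fields'")
            else:
                problems.extend(check_schema(nested, path=f"{where}."))
    return problems


# Hypothetical fragment shaped like the snippets quoted in this review.
SAMPLE_SCHEMA = json.loads("""
[
  {"mode": "NULLABLE", "name": "model", "type": "STRING"},
  {"mode": "REPEATED", "name": "attribute", "type": "RECORD",
   "fields": [
     {"mode": "NULLABLE", "name": "name", "type": "STRING"},
     {"mode": "NULLABLE", "name": "value", "type": "STRING"}
   ]}
]
""")

problems = check_schema(SAMPLE_SCHEMA)
```

An empty `problems` list means the fragment passed these checks; it says nothing about whether BigQuery accepts the column types themselves.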
Sample tables can be found at https://bigquery.cloud.google.com/dataset/tf-benchmark-dashboard:test_benchmark.
"mode": "REPEATED",
"name": "attribute",
"type": "RECORD"
},
Some other things that would be nice:
- Commit hash indicating exactly what code was run.
- Command line used to run the model
- Any env variables set outside of the code that are relevant to the model itself rather than the compute environment (ie, TF_ENABLE_WINOGRAD_NONFUSED could be set from outside the code and would change algorithm choice inside the code).
Good point. Adding TF version info and environment variables. The command line info should be captured by the attributes.
"type": "RECORD"
},
{
"description": "The list of hyper parameter of the model.",
nit: hyperparameters
Done.
}
],
"mode": "REPEATED",
"name": "hyper_parameter",
same nit: one word
Done.
{
"mode": "NULLABLE",
"name": "model",
"type": "STRING"
We probably want to indicate in some way how the GPUs are configured:
- CUDA version
- Topology params we care about (cf @tfboyd )?
- Number of hosts will eventually become relevant, if we know it (ie, number of separate boards that all these GPUs live on)
Not sure if we want to anticipate these upfront, or just add as they come.
Added cuda_version, which is standard. Not sure whether we can capture the other info easily.
"name": "version",
"type": "STRING"
}
]
Do we want to try to capture cloud info here? ie, running on k8s versus a VM versus metal?
Done. Added a section with minimal cloud info, and a free format key-value pair for the moment.
"mode": "NULLABLE",
"name": "memory_available",
"type": "STRING"
}
Capturing env variables relevant to the compute environment would be good-- ie, CUDA_VISIBLE_DEVICES, whether to share GPU memory (sorry, forgetting what that one is right now, but it should suffice to say, there are many).
Ack, for the moment, we will just dump them into env variables.
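To make the "dump them into env variables" idea concrete, here is a minimal sketch of collecting the relevant variables as name/value records that could be loaded into a repeated RECORD column. The function name `env_records`, the `name`/`value` keys, and the `TF_`/`CUDA_` prefix filter are all assumptions for illustration, not something this PR defines.

```python
import os


def env_records(environ, prefixes=("TF_", "CUDA_")):
    """Turn matching environment variables into name/value records.

    The prefix filter is a hypothetical choice; the PR does not say
    which variables are captured.
    """
    return [
        {"name": key, "value": value}
        for key, value in sorted(environ.items())
        if key.startswith(tuple(prefixes))
    ]


# Records for the current process, sorted by variable name.
records = env_records(os.environ)
```

Filtering by prefix keeps unrelated variables out of the table, at the cost of missing any relevant variable that does not follow the naming pattern.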
}
]
},
{
I see you went with JSON instead of Proto, which WFM if you find it preferable. But, to make the question more complicated-- what about YAML? We will have a bunch of those for k8s anyhow, and it's much more human-readable without all these brackets. Thoughts?
The schema file is used to create the BigQuery table, and BigQuery only accepts JSON as the schema format. I don't have another option here.
}
]
},
{
Where should information about parameter server configuration live? I guess that's mostly about how the model itself is run. Maybe we don't need to explicitly store that as long as we capture the command line and code commit that was run.
Ack. I am not worrying about that for the moment; we can update the data schema in the future if we want.
Oh, and another thought: TensorFlow build/version should be represented somewhere.
1. Added TensorFlow version information.
2. Added environment variables.
3. Fixed typo for hyperparameters.
4. Added cloud-related information.
Ping