Skip to content

Conversation

@qlzh727
Copy link
Member

@qlzh727 qlzh727 commented Mar 13, 2018

The current schema contains the entity information about model
and train data metadata, as well as machine config. Future change
will contain benchmark metric.

The json schema can be used to create bigquery table. A sample
table can be found in
https://bigquery.cloud.google.com/table/tf-benchmark-dashboard:test_benchmark.benchmark_run.

The current schema contains the entity information about model
and train data metadata, as well as machine config. Future change
will contain benchmark metric.

The json schema can be used to create bigquery table. A sample
table can be found in
https://bigquery.cloud.google.com/table/tf-benchmark-dashboard:test_benchmark.benchmark_run.
@qlzh727 qlzh727 requested a review from nealwu as a code owner March 13, 2018 22:19
@qlzh727 qlzh727 requested a review from tfboyd March 13, 2018 22:19
@qlzh727
Copy link
Member Author

qlzh727 commented Mar 14, 2018

@karmel karmel removed the request for review from nealwu March 14, 2018 16:39
"mode": "REPEATED",
"name": "attribute",
"type": "RECORD"
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other things that would be nice:

  • Commit hash indicating exactly what code was run.
  • Command line used to run the model
  • Any env variables set outside of the code that are relevant to the model itself rather than the compute environment (ie, TF_ENABLE_WINOGRAD_NONFUSED could be set from outside the code and would change algorithm choice inside the code).

Copy link
Member Author

@qlzh727 qlzh727 Mar 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Adding tf verison info and environment variables. The command line info should be captured by the attributes.

"type": "RECORD"
},
{
"description": "The list of hyper parameter of the model.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hyperparameters

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
],
"mode": "REPEATED",
"name": "hyper_parameter",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit: one word

Copy link
Member Author

@qlzh727 qlzh727 Mar 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
"mode": "NULLABLE",
"name": "model",
"type": "STRING"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to indicate in some way how the GPUs are configured:

  • Cuda version
  • Topology params we care about (cf @tfboyd )?
  • Number of hosts will eventually become relevant, if we know it (ie, number of separate boards that all these GPUs live on)
    Not sure if we want to anticipate these upfront, or just add as they come.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added cuda_version which is standard, not sure we could capture other info easily or not.

"name": "version",
"type": "STRING"
}
]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to try to capture cloud info here? ie, running on k8s versus a VM versus metal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added a section with minimal cloud info, and a free format key-value pair for the moment.

"mode": "NULLABLE",
"name": "memory_available",
"type": "STRING"
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capturing env variables relevant to the compute environment would be good-- ie, CUDA_VISIBLE_DEVICES, whether to share GPU memory (sorry, forgetting what that one is right now, but it should suffice to say, there are many).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, for the moment, we will just dump them into env variables.

}
]
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you went with JSON instead of Proto, which WFM if you find it preferable. But, to make the question more complicated-- what about YAML? We will have a bunch of those for k8s anyhow, and it's much more human-readable without all these brackets. Thoughts?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema file is used to create bigquery table, and bigquery only accept json as schema format. I don't have other option here.

}
]
},
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should information about parameter server configuration live? I guess that's mostly about how the model itself is run. Maybe we don't need to explicitly store that as long as we capture the command line and code commit that was run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack. I am not worrying that for the moment, we can update the data schema if we want in future.

@karmel
Copy link
Contributor

karmel commented Mar 14, 2018

Oh, and another thought: tensorflow build/version should be represented somewhere.

1. Added Tensorflow version information.
2. Added environment variables.
3. Fix typo for hyperparameters.
4. Added cloud related information.
@qlzh727
Copy link
Member Author

qlzh727 commented Mar 14, 2018

Ping

@qlzh727 qlzh727 merged commit bf5186b into tensorflow:master Mar 14, 2018
@qlzh727 qlzh727 deleted the benchmark-schema branch March 20, 2018 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants