aneilbaboo
Contributor

Adds two client methods: one imports data from Google Cloud Storage files into BigQuery, and the other waits until a job is complete:

  • client.import_data_from_uris
  • client.wait_for_job

These methods have been tested against live Google Cloud Storage and BigQuery instances, and are currently being used in production.

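# Assumes `client` is an already-authenticated BigQuery client and that
# JOB_SOURCE_FORMAT_JSON has been imported from this library.
schema = [{"name": "username", "type": "string", "mode": "nullable"}]
job = client.import_data_from_uris(['gs://mybucket/mydata.json'],
                                   'dataset',
                                   'table',
                                   schema,
                                   source_format=JOB_SOURCE_FORMAT_JSON)

# Wait for the load job to complete, subject to the timeout.
job = client.wait_for_job(job, timeout=60)
print(job)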

aneilbaboo and others added 14 commits August 9, 2014 00:01
The import_data_from_uris method imports data from Google Cloud
Storage. The wait_for_job method takes a job resource or jobId
and waits until the job completes, polling at a user-specified interval
and timing out after a separate user-specified period.
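
The poll-and-time-out behaviour described in this commit message can be sketched roughly as follows. This is not the code from the PR: `wait_until_complete` and its `check_job` callable are hypothetical names, and the only BigQuery detail assumed is that a job resource reports `status.state == "DONE"` once it has finished.

```python
import time


def wait_until_complete(check_job, interval=5, timeout=60,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll check_job() until the job is DONE or the timeout elapses.

    check_job is a hypothetical zero-argument callable that returns the
    latest job resource as a dict (for example, by calling jobs.get).
    sleep and clock are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        job = check_job()
        # A finished BigQuery job reports status.state == "DONE".
        if job.get("status", {}).get("state") == "DONE":
            return job
        if clock() >= deadline:
            raise TimeoutError("job did not complete within the timeout")
        # Wait for the polling interval before asking again.
        sleep(interval)
```

Keeping the polling interval and the timeout as separate arguments lets a caller poll frequently for short jobs while still bounding the total wait.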
Ensure that wait_for_job accepts a jobId.
Removed the default values in import_data_from_uris. Values that are not
provided are not inserted into the load configuration. Also fixed
the structure of the load configuration JSON. Still need to write tests.
Previous commits fixed the client after testing directly with BigQuery.
This commit fixes the tests to reflect the actual return values from
BigQuery, and adds tests for parameters to import_data_from_uris,
some of which are not permitted when source_format is not "CSV".
Owner

I realize the None values are being used in a falsey way, but the defaults should probably be False to be more explicit to users that these are bool values. Same goes for the other bool kwargs.

Optional parameters that the caller does not set are not injected into
the job configuration; BigQuery supplies the default values. The docstring
now explains this correctly. Also shortened some constants.
@tylertreat
Owner

Looks great! I will cut a new release shortly.

tylertreat added a commit that referenced this pull request Aug 26, 2014
Enable data import from Google Cloud Storage files
tylertreat merged commit 57ddd41 into tylertreat:master Aug 26, 2014
@aneilbaboo
Contributor Author

Awesome. TY.
