Conversation

vogelsgesang
Collaborator

@vogelsgesang vogelsgesang commented Sep 18, 2021

To test this proposed interface locally, you can install my branch using

 pip install git+https://github.com/vogelsgesang/server-client-python.git@update-jobs

You can find the documentation for this method at https://vogelsgesang.github.io/server-client-python/docs/api-ref#datasourceupdate_data

Commit Description

This commit adds support for the `datasources/<id>/data` endpoint
through which one can schedule jobs to update the data within a
published live-to-Hyper datasource on the server.

The new `datasources.update_data` method expects the following arguments:

  • a datasource or a connection: If the datasource only contains a
    single connection, the datasource is sufficient to identify which
    Hyper file should be updated. For datasources with multiple
    connections, the connection has to be provided instead. This
    distinction happens on the server, so the client library only needs
    to provide a way to specify either of the two.
  • a `request_id` which will be used to ensure idempotency on the server.
    This parameter is simply passed as an HTTP header.
  • an `actions` list, specifying how exactly the data on the server
    should be modified. We expect the caller to provide a list following
    the structure documented in the REST API documentation. TSC does
    not validate this object and simply passes it through to the server.
  • an optional `payload` file: For actions like `insert`, one can
    provide a Hyper file which contains the newly inserted tuples or
    other payload data. TSC will upload this file to the server and then
    hand it over to the update-API endpoint.

Besides the addition of `datasources.update_data` itself, this
commit also adds some infrastructure changes, e.g., to enable sending
PATCH requests and HTTP headers.

The documentation for the REST endpoint can be found at
https://help.tableau.com/current/api/rest_api/en-us/REST/rest_api_how_to_update_data_to_hyper.htm
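For illustration, the arguments described above can be combined into a short usage sketch. The `server` and `datasource` objects, the `new_rows.hyper` file name, and the concrete action values are illustrative assumptions, not part of this PR; the `actions` structure follows the REST API documentation linked above:

```python
import uuid

# An "actions" list following the structure from the REST API documentation;
# TSC passes it through to the server without validating it.
actions = [
    {
        "action": "insert",
        "source-schema": "Extract",
        "source-table": "Extract",
        "target-schema": "Extract",
        "target-table": "Extract",
    }
]

# A fresh UUID works well as the request_id: retrying the call with the
# same id stays idempotent on the server side.
request_id = str(uuid.uuid4())

# Assuming `server` is a signed-in TSC.Server and `datasource` was fetched,
# e.g., via server.datasources.get_by_id(...):
# job = server.datasources.update_data(
#     datasource, request_id=request_id, actions=actions, payload="new_rows.hyper"
# )
```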

@vogelsgesang vogelsgesang marked this pull request as draft September 18, 2021 16:03
vogelsgesang added a commit to vogelsgesang/server-client-python that referenced this pull request Sep 18, 2021
Adds the documentation for the `datasources.update_data` method
added in PR tableau#891.
@vogelsgesang
Collaborator Author

@jharris126, @jonas-eckhardt, @FrancoisZim
I would love to hear your feedback on the interfaces proposed in this commit. Currently, we can still change it without breaking compatibility... So now is the time to chime in with any feedback you might have ;)

@jharris126
Contributor

jharris126 commented Sep 18, 2021

This was really intuitive for me. It took me only 15 minutes to refactor an existing use case I had, including setting up a new venv and installing other deps, and the refactor cut the number of lines of code needed in half. See my only recommendation, about request_id.

@api(version="3.13")
def update_data(self, datasource_or_connection_item, *, request_id, actions, payload=None):


Can you add some documentation about the parameters and what this method does + links to the documentation of the API?

Collaborator Author


see #893


How does this work? Do we split the documentation from the code or is this somehow merged?

Collaborator Author


for TSC, documentation and code are separated.
Not sure why, but that's how it is...
(Personally, I would prefer if functions would be documented in the source code itself, and the docs were generated from the source code, but that's not how TSC is currently handling its documentation)


Should this use BackgroundJobItem instead of JobItem?

Collaborator Author


not sure what the difference between JobItem and BackgroundJobItem is.
@t8y8, @bcantoni which one should I use here?


I've been trying to work out whether the update_data method should wait for completion or return the JobItem, and it looks like returning the JobItem is more consistent, since this matches the other async method here (refresh).

To make this easier to use, would you be open to adding a convenience method to the Jobs endpoint that waits for job completion? For example:

@api(version="3.13")
def wait_for_async_job(self, async_job_id, timeout=None):
    ASYNC_JOB_POLL_INTERVAL = 30
    completed_at = None
    finish_code = None
    jobinfo = None

    wait_start_time = time.time()

    while completed_at is None:
        time.sleep(ASYNC_JOB_POLL_INTERVAL)
        jobinfo = self.get_by_id(async_job_id)
        completed_at = jobinfo.completed_at
        if completed_at is None:
            logger.debug("Job {} ... progress={}".format(async_job_id, jobinfo.progress))
            if timeout is not None:
                wait_elapsed_time = time.time() - wait_start_time  # seconds
                if wait_elapsed_time > timeout:
                    raise TimeoutError(f"Timeout after {wait_elapsed_time} seconds waiting for asynchronous job: {async_job_id}")
        else:
            finish_code = jobinfo.finish_code

    logger.info("Job {} Completed: Finish Code: {} - Notes: {}".format(async_job_id, finish_code, jobinfo.notes))
    return finish_code


Do you think that the need to write the JSON for the "actions" parameter makes this harder to use than it needs to be? I think we should keep this method so that callers retain the flexibility of writing their own 'actions' JSON, but also create some convenience methods that pass the options as parameters, e.g.:

def update_datasource(
    self, datasource_or_connection_item, *, 
    payload, 
    matching_columns=[['ROWID','ROWID']], 
    source_schema="Extract", 
    source_table="Extract", 
    target_schema="Extract", 
    target_table="Extract"
):
    match_conditions_args = []
    for match_pair in matching_columns:
        match_conditions_args.append(
            {
                "op": "eq",
                "source-col": match_pair[0],
                "target-col": match_pair[1],
            }
        )
    if len(match_conditions_args) > 1:
        match_conditions_json = {"op": "and", "args": match_conditions_args}
    else:
        match_conditions_json = match_conditions_args[0]

    json_request = [
        # UPDATE action
        {
            "action": "update",
            "source-schema": source_schema,
            "source-table": source_table,
            "target-schema": target_schema,
            "target-table": target_table,
            "condition": match_conditions_json,
        },
    ]

    request_id = str(uuid.uuid4())
    return self.update_data(datasource_or_connection_item, request_id=request_id, actions=json_request, payload=payload)

i.e. hide some of the internals and allow an update by just calling:
server.datasources.update_datasource(datasource_id, payload=path_to_database, matching_columns=[[keycol1, keycol1], [keycol2, keycol2]])
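The condition-building half of this suggestion is pure data and can be exercised without a server; here is a self-contained version of that logic (the function name and the column pairs are illustrative only):

```python
def build_match_conditions(matching_columns):
    """Translate [source_col, target_col] pairs into the REST API's
    condition JSON: a bare "eq" op for one pair, an "and" of "eq" ops
    for several."""
    args = [
        {"op": "eq", "source-col": source, "target-col": target}
        for source, target in matching_columns
    ]
    if len(args) > 1:
        return {"op": "and", "args": args}
    return args[0]

# A single key column collapses to a bare "eq" condition:
single = build_match_conditions([["ROWID", "ROWID"]])
# Several key columns are combined with "and":
combined = build_match_conditions([["id", "id"], ["region", "region"]])
```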

Collaborator Author


> adding a convenience method to the Jobs endpoint to wait for job completion

Implemented a proposal in #903. Please provide feedback on the proposed wait_for_job in that other PR.

> create some convenience methods to pass the options as parameters

I would prefer not to do so. We have a documented and powerful "actions" language, i.e. the JSON structures. We shouldn't invent yet another "wrapper mini-language" here in this Python library.

Common cases (such as a short-hand notation for matching_columns which doesn't need explicit op: eq elements) should rather be added on the server-side. That way, all clients (also JavaScript clients, etc.) can benefit from it.

(actually, there is a proposal for a matching_columns short-hand syntax. It's just that we never implemented it... @jonas-eckhardt do you think we should re-prioritize that convenience feature on the server-side?)

@vogelsgesang vogelsgesang changed the base branch from master to development September 24, 2021 00:23
@vogelsgesang vogelsgesang marked this pull request as ready for review September 24, 2021 00:24
@vogelsgesang
Collaborator Author

@t8y8 @bcantoni @jacalata I think this PR is ready for code review now.
The feedback received so far makes me confident that the proposed interface is reasonable and helpful.
Successful test case run: https://github.com/vogelsgesang/server-client-python/actions/runs/1268009840

Could I get a review from one of you before I merge this?

Also, @FrancoisZim brought up the question

> Should this use BackgroundJobItem instead of JobItem?

above, which I wasn't able to answer since I don't really understand the difference between JobItem and BackgroundJobItem. Maybe one of you is able to answer that.

Contributor

@jacalata jacalata left a comment


Mm, I'm not familiar with the job vs BackgroundJob items, but I think it's safe enough to use JobItem. BackgroundJobItem doesn't appear to be used anywhere.

@vogelsgesang vogelsgesang merged commit 7c03396 into tableau:development Sep 29, 2021