API to query Historical data #3874

Closed
slayerjain opened this issue Jan 6, 2021 · 4 comments

Comments

@slayerjain

Is your feature request related to a problem? Please describe.
We want to run some simple analytics on historical events, e.g. count the number of events where some key in the output JSON has the value ‘x’.

For Cadence, we tried querying Cassandra directly, but the history appears to be serialized into a Thrift binary object. We found the Thrift definitions but are not sure which data structure is the right one. We can query from S3, but since we’ve set up archival with a retention of 14 days, the events only arrive in S3 after 14 days.
Proposed Solution
Maybe have an API which exposes this data, or a way to export data to a data warehouse.

Additional context
https://community.temporal.io/t/whats-the-best-way-to-do-simple-analytics-on-historical-data/1264

@yycptt
Contributor

yycptt commented Jan 7, 2021

If you already have the workflowID and runID, you can call the GetWorkflowExecutionHistory API directly. It automatically queries archived history if it can't find the workflow history in Cassandra.

If you don't have the workflowID and runID, you can get them from Elasticsearch or visibility archival. Visibility archival happens when the workflow closes rather than after the retention period, so the data is available soon after the workflow closes.
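
To make the two steps above concrete, here is a minimal sketch of the count described in the issue. It assumes the history events have already been fetched (for example through GetWorkflowExecutionHistory) and decoded into plain dicts. The event shape (`eventType`, a `result` field holding a JSON string) and the `fetch_history` callable are hypothetical stand-ins, not the actual Cadence schema or client API.

```python
import json
from typing import Any, Callable, Iterable, Mapping, Tuple

# Assumption: each event is a dict whose optional "result" field holds a JSON
# string. This is a simplified shape for illustration, not the Cadence schema.
Event = Mapping[str, Any]


def count_matching_results(events: Iterable[Event], key: str, expected: Any) -> int:
    """Count events whose JSON result has `key` equal to `expected`."""
    count = 0
    for event in events:
        raw = event.get("result")
        if raw is None:
            continue  # event type carries no result payload
        try:
            payload = json.loads(raw)
        except (TypeError, ValueError):
            continue  # result is not valid JSON; skip it
        if isinstance(payload, dict) and payload.get(key) == expected:
            count += 1
    return count


def count_across_workflows(
    executions: Iterable[Tuple[str, str]],
    fetch_history: Callable[[str, str], Iterable[Event]],
    key: str,
    expected: Any,
) -> int:
    """Sum matches over (workflow_id, run_id) pairs.

    `executions` would come from a visibility lookup, and `fetch_history` is a
    caller-supplied (hypothetical) wrapper around the history API.
    """
    return sum(
        count_matching_results(fetch_history(workflow_id, run_id), key, expected)
        for workflow_id, run_id in executions
    )
```

Passing `fetch_history` in as a callable keeps the counting logic independent of whichever client is used to reach the server.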

@slayerjain
Author

slayerjain commented Jan 8, 2021

@yycptt Thanks for the response.

What if we want to build dashboards based on the results of events? Should we use the API to query the data, or read the DB directly (either batched or via CDC) and process the events to feed the dashboards?

Querying the transactional system for analytics purposes might also put unnecessary load on it.

@longquanzheng
Collaborator

what if we want to build dashboards based on the results of events?

Those metrics are already emitted by the Cadence server. See the bottom of "Client dashboard/monitor" in https://docs.google.com/document/d/1tQyLv2gEMDOjzFibKeuVYAA4fucjUFlxpojkOMAIwnA/edit#

It will be moved to cadence-docs soon.

@meiliang86
Contributor

@slayerjain We don't recommend looking into history events except for debugging purposes. They are implementation details of Cadence and can change in later releases.
You should be able to emit these metrics in your workflows.
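
As a rough illustration of emitting the metric from workflow code instead of parsing history afterwards, the sketch below increments a tagged counter at the point where the output is produced. `InMemoryMetrics`, `record_result`, and the metric name `workflow_result_total` are hypothetical names for illustration only. A real setup would use whatever metrics client the workers are already configured with.

```python
from collections import Counter
from typing import Any, Mapping


class InMemoryMetrics:
    """Hypothetical stand-in for a real metrics client; keeps counts in memory."""

    def __init__(self) -> None:
        self.counters: Counter = Counter()

    def increment(self, name: str, tags: Mapping[str, str]) -> None:
        # Sort the tags so the same tag set always maps to the same counter.
        self.counters[(name, tuple(sorted(tags.items())))] += 1


def record_result(metrics: InMemoryMetrics, output: Mapping[str, Any], key: str) -> None:
    """Emit one counter increment tagged with the value of `key` in the output."""
    value = str(output.get(key, "missing"))
    metrics.increment("workflow_result_total", {key: value})


# Usage: call record_result where the workflow or activity produces its output.
metrics = InMemoryMetrics()
record_result(metrics, {"status": "x"}, "status")
```

A dashboard can then be built on the counter itself, which avoids both the dependency on the history format and the extra read load on the transactional store.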


4 participants