Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using closed intervals to avoid gaps in partially cached timeseries #522

Merged

Conversation

samuelgmartinez
Copy link
Contributor

As explained in #466, when the date range specified is partially catched, the missRanges are not calculated properly, generating queries that return gaps in the series.

@CLAassistant
Copy link

CLAassistant commented Jan 23, 2021

CLA assistant check
All committers have signed the CLA.

@samuelgmartinez
Copy link
Contributor Author

samuelgmartinez commented Jan 23, 2021

The TestClockOffsetWarning is failing no matter which commit I use to build and run the tests: from 1.1.0-beta to this commit, it always fails :/

Also, worth mentioning that it looks like proxy_request.go is not using the proper upstream URL objects to format the access logs. I assume that for a regular HTTP Proxy values from proxyRequest.Request.URL are the same as proxyRequest.upstreamRequest.URL but for cases like influxdb input and upstream request are quite different (partial hits, date interval normalization, etc).

In the case the scenario mentioned above is right, does it make sense to open an issue or should I go ahead and fix it this same PR?

@jranson
Copy link
Member

jranson commented Jan 23, 2021

@samuelgmartinez it actually looks like the test action is failing in the locks package (which we just updated recently and have been waiting on more feedback from the reporter before cutting a release). It looks like it is timing out after 2 minutes when it should normally take about 6 seconds. I will review this package again and get it fixed up in a separate PR, and merge it back here once it's fixed up. Your changes look good. If you would not mind doing the URL changes in a separate PR, that would be great, since it would affect more than just influxdb (no need to file an issue). Thanks for your contributions, we really appreciate it!

=== RUN   TestLocksConcurrent
FAIL	github.com/tricksterproxy/trickster/pkg/locks	122.207s

@samuelgmartinez
Copy link
Contributor Author

samuelgmartinez commented Jan 27, 2021

I pushed a fix for the TestClockOffsetWarning test. The test won't pass in any local environment if the computer running the tests uses any UTC+1 timezone.

Creates a Now() in local TZ (CET in my case), serialized in local tz but marked as if it was GMT, then parsed as GMT. As an example:

  • Now() is 12:00:00 +1
  • modifies the virtual date with 1h offset.
  • serialized as 11:00:00 GMT.
  • When compared, the parsed from the header UTC date (11:00:00 UTC) and the Now() (12:00:00 CET) are the same.

Also, marked the PR as ready to review. Let me know if you want any changes.

For the Prometheus use cases, I rather open a new PR with a proper testing environment to validate the parameters sent to the origin work as expected.

@samuelgmartinez samuelgmartinez marked this pull request as ready for review January 27, 2021 12:31
@@ -310,7 +310,7 @@ checkCache:
if resp.StatusCode == http.StatusOK && len(body) > 0 {
nts, err := client.UnmarshalTimeseries(body)
if err != nil {
pr.Logger.Error("proxy object unmarshaling failed",
pr.Logger.Error("proxy object unmarshalling failed",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

@jranson
Copy link
Member

jranson commented Jan 28, 2021

Thanks for this! Separately, we took a hard look at the Locks package to make sure the sporadic timeouts was also testing issue and not a deficiency in the locks class itself. We determined that test in question was just overwhelming the github actions VM vs. running in a more beefy local dev environment. I will push a separate PR that modifies that test to not be as intense for the CI process. We will also do a quick check of anywhere else that time.Now() is being used, to ensure everything is UTC. The work you have put into all of this much appreciated.

@jranson jranson merged commit a22b01f into trickstercache:v1.1.x Jan 28, 2021
@samuelgmartinez samuelgmartinez deleted the bugfix/466-closed-date-intervals branch January 28, 2021 14:49
@samuelgmartinez
Copy link
Contributor Author

samuelgmartinez commented Jan 28, 2021

I'll try to open a new PR to backport it to the master branch. I just saw that the code was refactored and some packages changed.

Regarding this fix being rolled out, do you have a plan to release this as part of the 1.1.4?

Thanks for your feedback and for maintaining the project! 💪

@jranson
Copy link
Member

jranson commented Jan 28, 2021

we will publish 1.1.4-rc2 today as a pre-release here and on Docker Hub, and give it a good 24-hour soak, and then cut stable release. For main, I am happy to fix up the UTC issues you found on the Offset Warning Test as part of our larger sweep of that particular issue, and have made a note to verify that the core issue w/ InfluxDB is also corrected in main. If you are interested in checking or getting more familiar with the 2.0 codebase, this functionality is in /pkg/timeseries/dataset, which is the new Common Time Series Format. In 2.0, all backend providers' json/csv blobs are marshaled into a *DataSet, so that they use a common collection of data manipulation functions (merge, sort, crop, etc.), rather than re-implementing that functionality per-provider. That way, the provider packages are only concerned with url manipulation and data un/marshaling. Hopefully that provides an easier path for new contributors who just want to enable a new backend.

@jranson
Copy link
Member

jranson commented Jan 28, 2021

You are good to go with the pre-release here on github and at tricksterproxy/trickster:1.1.4-rc2 on Docker Hub. Please let me know if you find any issues.

@samuelgmartinez
Copy link
Contributor Author

We'll deploy it tomorrow (European timezone) on our test env and I'll file an issue if we run into any bug.

@jranson
Copy link
Member

jranson commented Jan 30, 2021

@samuelgmartinez our 24-hour soak looks good, and we're all set to cut the release, but i'll hold of for a thumbs up from you that it resolved your issues. thanks so much

trickster-1 1 4-rc2 loadtest 28h

@samuelgmartinez
Copy link
Contributor Author

@jranson We deployed it in our lab environment and it looks like the issues are now fixed and it's worked as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants