Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Daily summary for 'wind' has wrong values for dirsumtime #642

Closed
tkeffer opened this issue Jan 23, 2021 · 13 comments
Closed

Daily summary for 'wind' has wrong values for dirsumtime #642

tkeffer opened this issue Jan 23, 2021 · 13 comments

Comments

@tkeffer
Copy link
Contributor

tkeffer commented Jan 23, 2021

Using V4.2, we get these values for archive_day_wind:

sqlite> select dateTime, datetime(dateTime,'unixepoch','localtime'), xsum, ysum, dirsumtime from archive_day_wind where dateTime>1606780800;
1610524800|2021-01-13 00:00:00|-53.5641450403612|12.2458698356829|60
1610611200|2021-01-14 00:00:00|102.363033875865|-110.682615845738|288
1610697600|2021-01-15 00:00:00|-158.390950915973|-21.6570588513141|288
1610784000|2021-01-16 00:00:00|-50.3255480631354|-1.31546332640262|288
1610870400|2021-01-17 00:00:00|-785.954051501476|81.1908893497276|288
1610956800|2021-01-18 00:00:00|-48.9877413716206|-72.6094853973673|288
1611043200|2021-01-19 00:00:00|-45.7519937394591|-95.1474221069701|288
1611129600|2021-01-20 00:00:00|-3.8850944071945|20.3794476089489|288
1611216000|2021-01-21 00:00:00|111.064810206522|56.8100266596077|288
1611302400|2021-01-22 00:00:00|125.862310064998|26.0928379065027|196

After upgrading to V4.3 and running the reweighting patch, we get these values:

sqlite> select dateTime, datetime(dateTime,'unixepoch','localtime'), xsum, ysum, dirsumtime from archive_day_wind where dateTime>1606780800;
1610524800|2021-01-13 00:00:00|-16069.2435121084|3673.76095070486|60
1610611200|2021-01-14 00:00:00|30708.9101627597|-33204.7847537214|288
1610697600|2021-01-15 00:00:00|-47517.2852747918|-6497.11765539424|288
1610784000|2021-01-16 00:00:00|-15097.6644189406|-394.638997920752|288
1610870400|2021-01-17 00:00:00|-235786.215450443|24357.2668049182|288
1610956800|2021-01-18 00:00:00|-14696.3224114862|-21782.8456192102|288
1611043200|2021-01-19 00:00:00|-13725.5981218376|-28544.226632091|288
1611129600|2021-01-20 00:00:00|-1165.52832215836|6113.83428268467|288
1611216000|2021-01-21 00:00:00|33319.4430619566|17043.0079978823|288
1611302400|2021-01-22 00:00:00|37758.6930194993|7827.85137195082|196

The values for xsum and ysum got updated, but not dirsumtime.

@gjr80
Copy link
Contributor

gjr80 commented Jan 23, 2021

So if I understand this correctly the fix is simply to add dirsumtime to the list of accumulator properties at manager.DaySummaryManager._set_day_sums() that are weighted? Seems eerily simple.

That would fix any upgrades from 4.2 to the next release but what solution do we want to do for those on 4.3? Do we need to have daily summaries v4 so that we can reweight from the (erroneous) v3 to v4?

@tkeffer
Copy link
Contributor Author

tkeffer commented Jan 23, 2021

That does seem to be the problem. I guess we will need a V4.

Sigh.

Thanks so much. I was so steamed about the problem I was having a hard time seeing anything.

@gjr80
Copy link
Contributor

gjr80 commented Jan 23, 2021

OK. I’ve created branch dirsumtime. Done the easy bit, will work on the V4 daily summaries bit today.

@gjr80
Copy link
Contributor

gjr80 commented Jan 23, 2021

Been a while since I worked with patching daily summaries to v2.0 so thinking aloud to make sure v4.0 will cover all possibilities. Let's assume v4.0 daily summaries is release as WeeWX v4.4.0.

  1. v1.0 daily summaries. If v4.4.0 encounters v1.0 daily summaries the now updated manager.DaySummaryManager.recalculate_weights() kicks in and upgrades the daily summaries to v4.0. Nothing further required.

  2. v2.0 daily summaries. If v4.4.0 encounters v2.0 daily summaries the now updated ?manager.DaySummaryManager.recalculate_weights() kicks in and upgrades the daily summaries to v4.0. Nothing further required.

  3. v3.0 daily summaries. If v4.4.0 encounters v3.0 daily summaries it could find two distinct v3.0 states.

  • first possible state (let's call it 'v4.3.0 clean') is where a user has upgraded to v4.3.0 and the re-weighting was done incorrectly with incorrect dirsumtime
  • second possible state (let's call it v4.3.0 rebuilt) is where a user has upgraded to v4.3.0 and the daily summaries dirsumtime problem has been fixed (deliberately or unknowingly) by the user by rebuilding the daily summaries with wee_database after the upgrade to v4.3.0
  • we can fix the v4.3.0 clean state by either correctly re-weighting dirsumtime or by rebuilding the affected daily summaries. Since v4.3.0 only re-weighted daily summaries >= 1 June 2020 we only have to re-weight/rebuild daily summaries >= 1 June 2020.
  • there is no need to do anything to a database in the v4.3.0 clean state
  • unfortunately it will be impossible for WeeWX to tell these two states apart. So re-weighting dirsumtime is not a viable solution (v4.3.0 clean databases would end up with dirsumtime being overweight(?)). That leaves rebuilding the daily summaries >= 1 June 2020 as the only practical solution for v4.3.0 daily summaries.

So I think if we just rebuild the >= 1 June 2020 daily summaries on encountering v3.0 daily summaries we should be write. The other cases will be taken care of with the existing code.

One question springs to mind; how 'good' do we want to be when rebuilding the daily summaries, do we want to preserve highs and lows (will require more code) or not (simple code, just a call to wee_database)?

@gjr80
Copy link
Contributor

gjr80 commented Jan 24, 2021

OK, so the 3:30am start this morning clouded my brain. If we encounter v3.0 daily summaries the fix is not to rebuild the daily summaries but rather re-weight again from 1 June 2020 but this time with the updated manager.DaySummaryManager.recalculate_weights(). The hardest part here is going to be setting up the test cases to verify it works.

@gjr80
Copy link
Contributor

gjr80 commented Jan 24, 2021

Not my day, I've committed changes to branch dirsumtime that implements V4.0 database, fixes the dirsumtime omission of v4.3.0 and should correctly patch V1.0, V2.0 and V3.0 databases upon upgrade/install. Branch dirsumtime was made off master, realise that probably should have been off development but I saw changes to manager.py in master (and none in development) and did not want to risk losing those given the day I am having.

My apologies for the intermingled 40 odd changes to manager.py that remove whitespace, my PyCharm/GIT foo is lacking today.

I've yet to test the upgrades from various database versions, will get onto that next.

@tkeffer
Copy link
Contributor Author

tkeffer commented Jan 25, 2021

Having messed this up in the first place, I'm in no position to offer advice, but my instinct says to keep the V2 to V3 machinery, but this time get it right. Then invoke it for databases that are at either V2 or V3, ending up with a database that is unambiguously at V4.

That's slower than a specialized V3 to V4 update, but probably more bulletproof, with one less thing that we have to test.

Final step is to add a regression test to tests/test_manager.py.

@gjr80
Copy link
Contributor

gjr80 commented Jan 25, 2021

Having messed this up in the first place, I'm in no position to offer advice

I don't think you really mean that.

but my instinct says to keep the V2 to V3 machinery, but this time get it right. Then invoke it for databases that are at either V2 or V3, ending up with a database that is unambiguously at V4.

That is what I have done, the old patch_sums() method has been tweaked to process either v2.0 or v3.0 databases. Everything subordinate to patch_sums() is unchanged (except of course for fixing the cause by adding back dirsumtime).

That's slower than a specialized V3 to V4 update, but probably more bulletproof, with one less thing that we have to test.

Started down that path but was just going to duplicate too much code. Wasn't too worried about time, your method for updating from v2.0 to v3.0 was a big improvement over the method used patching V1.0 to v2.0.

Final step is to add a regression test to tests/test_manager.py.

Oh joy.

@gjr80
Copy link
Contributor

gjr80 commented Jan 26, 2021

Limited testing complete, I'm satisfied it's doing as it should.

Method and results

  • installed v3.5.0 and generated three months of v1.0 data that straddled 1 June 2020, confirmed db version and that daily summaries were unweighted

  • Run 1

    • used v3.5.0 with original three months of generated v1.0 data
    • upgraded to v3.7.0 and used wee_database --update to update to v2.0 db, confirmed v2.0 db and weighted daily summaries, ran simulator (generator) to add a few days to db
    • upgraded to v4.2.0 and ran simulator to add a few days to the db, confirmed v2.0 db but the added days were unweighted (confirms the 4.2.0 bug)
    • upgraded to v4.3.0 and ran simulator to add a few days to the db, confirmed v2.0 db was upgraded to v3.0, confirmed that dirsumtime for added days is not weighted but other fields were (confirms the 4.3.0 bug)
    • upgraded to dirsumtime branch and ran simulator to add a few days to the db, confirmed v3.0 db was upgraded to v4.0, confirmed that dirsumtime was correctly weighted, confirmed that added days were correctly weighted
  • Run 2

    • re-installed v3.5.0 with original three months of generated v1.0 data
    • upgraded to dirsumtime branch and ran simulator to add a few days to the db, confirmed v1.0 db, confirmed dirsumtime was unweighted, confirmed that added days were unweighted
  • Run 3

    • re-installed v3.5.0 with original three months of generated v1.0 data
    • upgraded to v3.7.0 and used wee_database --update to update to v2.0 db, confirmed v2.0 db and weighted daily summaries, ran simulator (generator) to add a few days to db
    • upgraded to dirsumtime branch and ran simulator to add a few days to the db, confirmed v2.0 db was upgraded to v4.0, confirmed that dirsumtime was correctly weighted, confirmed that added days were correctly weighted
  • Run 4

    • re-installed v3.5.0 with original three months of generated v1.0 data
    • upgraded to v3.7.0 and used wee_database --update to update to v2.0 database, confirmed v2.0 db and weighted daily summaries, ran simulator (generator) to add a few days to db
    • upgraded to v4.2.0 and ran simulator to add a few days to the db, confirmed v2.0 db but the added days were unweighted (confirms the 4.2.0 bug)
    • upgraded to dirsumtime branch and ran simulator to add a few days to the db, confirmed v2.0 db was upgraded to v4.0, confirmed that dirsumtime was correctly weighted, confirmed that added days were correctly weighted
  • Run 5

    • installed a clean dirsumtime branch ran simulator to generate a few days of data, confirmed v4.0 db and all fields correctly weighted

Made a few assumptions in testing:

  • only looked at xsum, ysum and dirsumtime in archive_day_wind in determining whether weighting was correctly applied or not
  • only tested SQLite; assumed MySQL performs identically
  • only tested key releases that have different database versions/known weighting bugs

Will move onto the regression test and documentation.

@gjr80
Copy link
Contributor

gjr80 commented Jan 26, 2021

Before I go down the wrong track re regression testing can I confirm your intent.

Current regression testing just checks sum, sumtime and wsum. To check sum, sum and dirsumtime is fairly simple, just a few simple queries. Checking xsum, yum and dirsumtime is a bit more complex with each archive wind vector needing to be decomposed into x and y components and summed. Happy to do the work but wanted to check you did not have some other simpler test in mind.

@tkeffer
Copy link
Contributor Author

tkeffer commented Jan 27, 2021

It's a regression test, so it's OK to hard wire in the correct values for xsum, ysum, and dirsumtime, then compare against that. How do you get the correct values? An easy way is to use the values being created by your dirsumtime branch, but hand check a few of them to convince yourself that you've got the right numbers.

@gjr80
Copy link
Contributor

gjr80 commented Jan 28, 2021

Patch/reweighting code finished and operation confirmed as of commit 5f972cb.

Test suite updated at commit 852903a.

Documentation updated at commit 69514e9.

I think we are done here. I am happy to close.

@tkeffer
Copy link
Contributor Author

tkeffer commented Jan 29, 2021

Merged into master.

Thanks so much for taking care of this, Gary. It was driving me crazy.

@tkeffer tkeffer closed this as completed Jan 29, 2021
tkeffer added a commit that referenced this issue Jan 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants