Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow selective daily summaries to be rebuilt #117

Closed
tkeffer opened this issue Apr 22, 2016 · 11 comments
Closed

Allow selective daily summaries to be rebuilt #117

tkeffer opened this issue Apr 22, 2016 · 11 comments
Assignees

Comments

@tkeffer
Copy link
Contributor

tkeffer commented Apr 22, 2016

Specifically, allow daily summaries only following a specific date to be deleted and rebuilt.

@gjr80
Copy link
Contributor

gjr80 commented May 30, 2016

Needing some time out from importing from WU so giving this some thought...

I was thinking a solution might be extending wee_database with a --date command line option that would accept a date or a date range for use with --drop-daily and --backfill-daily eg

$ wee_database --drop-daily --date=2016/5/25
$ wee_database --drop-daily --date=2016/5/25-
$ wee_database --drop-daily --date=2016/5/25-2016/5/28

would drop the daily summary entries for 25 May 2016, 25 May 2016 to present inclusive and 25-28 May 2016 inclusive respectively. Similiarly:

$ wee_database --backfill-daily --date=2016/5/25
$ wee_database --backfill-daily --date=2016/5/25-
$ wee_database --backfill-daily --date=2016/5/25-2016/5/28

would backfill the daily summary entries for 25 May 2016, 25 May 2016 to present inclusive and 25-28 May 2016 inclusive respectively.

Is this the sort of solution you had in mind? I'll qualify my suggestion with the fact that my choice of parameter names often leaves a lot to be desired.

@tkeffer
Copy link
Contributor Author

tkeffer commented May 30, 2016

Thinking out loud here...

Because there is no point in dropping a set of dates without rebuilding them, it could be done in an all-in-one step. This also eliminates the possibility of dropping the dates and then forgetting to backfill them. Finally, I'd just attach the date to the rebuild command, resulting in

$wee_database --rebuild-daily=2016/5/25-2016/5/28

@gjr80
Copy link
Contributor

gjr80 commented May 31, 2016

Makes sense. Are you thinking there should be another command --rebuild-daily or was that a typo and did you really mean the existing --backfill-daily?

Been thinking about the process of dropping/rebuilding the daily summaries. When I have had to rebuild I have always blindly dropped the daily summaries and then rebuilt them. First thoughts on the equivalent process for rebuilding, say a few days of summaries, was to delete each row for each day in each daily summary table then perform a rebuild over the period concerned.

Rebuilding is easy, DaySummaryManager.backfill_day_summary() accepts a start_ts and stop_ts parameter. On the other hand, DaySummaryManager.drop_daily() drops all the daily summary tables, not quite what I want. I wondered, if we are just updating a small number of records in each daily summary table do we need to delete the records first, or will the SQL REPLACE INTO statement take care of everything?

Are there any corner cases where it will fall down? If there is a record for a given day in the daily sumamries, but the rebuild discovers no data in the (maybe new) archive for that day, I would expect the daily summaries to be filled with None for that day, or have I read the code wrong. If there is no data in the daily sumamries for a given day, but the (maybe new) archive does have data for that day, then that will rebuild OK. Maybe it will fall down if there is a daily summary for, say a non-standard obs such as appTemp, but the (maybe new) archive now has no appTemp column, so the appTemp daily summary will remain untouched. But I guess in this case you are talking about schema changes to the archive so probably would not be selectively rebuilding (and in any case the old data my remain in the daily summary but would never be used because appTemp is not in the schema). Starting to wander now.

Guess I just want to get a process clear in my mind.

@tkeffer
Copy link
Contributor Author

tkeffer commented May 31, 2016

No typo.

The problem with --drop-daily followed by --backfill-daily is that the user would have to enter the exact same dates in both, lest s/he get an error. The option --rebuild-daily does both in a single step, so you can't get it wrong.

In fact, I don't see why we can't get rid of --drop-daily and --backfill-daily.

No reason to delete first. Just use INSERT and it will write over the old data.

If the archive has changed enough that it has a new schema, then the daily summaries must be totally rebuilt.

"Null" values in the daily summary are a bit subtle. Yes, it's None for the mins and maxes, but for sums and weighted counts, it's zero. There's code in DaySummaryManager that deals with this.

BTW, see issue #61. I'd appreciate a second set of eyes looking at that one.

@gjr80
Copy link
Contributor

gjr80 commented May 31, 2016

Ok, well I guess I hold these thoughts. Accumulators did my brain a couple of years back when we were first putting together weewx-wd under v2.x. I will have a look at #61, not promising anything though :)

@vinceskahan
Copy link
Contributor

vinceskahan commented Oct 13, 2016

Great feature to add !

Would it be worthwhile to also be able to drop/rebuild only 'selected' tables for the specified time period ? I don't have any rebuilding speed data to see if it would/wouldn't be significant. Typically folks just want to fix up their missing/bad data and rebuild the specified time period. If it's just a day or few days, maybe not worth it. Just a thought.

@tkeffer tkeffer reopened this Oct 13, 2016
@tkeffer
Copy link
Contributor Author

tkeffer commented Oct 13, 2016

Your suggestion would require a user to have a model of what data are affected, and which are not. That's a lot more subtle than just a range of data. It would also require a much more complicated command line (observation types would have to be specified). In the interest of keeping it simple, let's just stick with date ranges.

@vinceskahan
Copy link
Contributor

agree for sure....thanks....

@gjr80
Copy link
Contributor

gjr80 commented Jan 7, 2017

After mentioning this issue in #61 this morning I thought I would have another look as it seems a fairly simple fix...well sort of.

I have reworked the --backfill-daily action to be --rebuild-daily. In light of the date/time command line option work we did with wee_import I have added --date, --from and --to options that are identical in function to wee_import but they are date only, no time.

All works well without using --drop-daily first; however, since DaySummaryManager.backfill_day_summary() calls DaySummaryManager._get_day_summary() to get a pre-loaded accumulator the sum fields were increasing with each rebuild. I guess there were a number of ways aroudn this but I decided to change the DaySummaryManager.backfill_day_summary() signature to accept named parameter rebuild that defaults to False. If rebuild is True is gets an empty accumulator instead of a pre-loaded one. This way other code (backfill during startup) keeps working and I can continue to use DaySummaryManager.backfill_day_summary() for the rebuilds.

A bit pressed for time right now but I will drop the updated files in the weighted_summaries_(#61) branch later today.

@gjr80
Copy link
Contributor

gjr80 commented Jan 7, 2017

A bit pressed for time right now but I will drop the updated files in the weighted_summaries_(#61) branch later today.

commit ab5c7d3 refers

@tkeffer
Copy link
Contributor Author

tkeffer commented Feb 3, 2017

Fixed in V3.7.0

@tkeffer tkeffer closed this as completed Feb 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants