Initial wee_database --calc-missing option #394

poblabs · 2019-04-08T18:18:10Z

Updates #363

For review on adding a new option to wee_database called --calc-missing. This will calculate all missing observation types within the database that StdWXCalculate knows about.

In order to do this action the database fields are put into a temporary variable (_rec), the data is dropped from the database (new deleteRecord() function), and is then re-added to the database (addRecord(_rec)). I feel this level of risk needs thorough review.

My test scenario:

Create a new database
Run Simulator on generator mode to quickly produce records
Extend database to add appTemp and windrun as example schema extensions
Verify the database schema has been extended by opening the database and manually checking
Run wee_database --calc-missing and enter y to continue
Verify appTemp and windrun have their missing values updated by opening the database and manually checking

My initial tests show that the newly extended fields in the schema are populated with data with no data loss to other fields.

tkeffer · 2019-04-08T20:19:43Z

A few comments:

unit_constants is not defined. You probably mean weewx.units.unit_constants
The functions wxservices.do_calculations() and wxservices.do_calculations_with_return are so similar, the former should be refactored to take advantage of the latter.
The function wee_database.calcMissing() should use the helper function y_or_n() instead of doing its own read using raw_input() (which does not exist under Python 3). Something like:
```
ans = y_or_n("Proceed (y/n)? ")

if ans == 'n':
    print("Nothing done.")
    return

...
```
I don't really understand why it's necessary to delete the record before replacing it. Why can't we just use SQL's UPDATE to update the new fields?
No documentation. You'll have to modify docs/utilities.htm.

poblabs · 2019-04-08T21:27:36Z

@tkeffer Some follow-ups:

I can update this. Probably an oversight.
I couldn't get it to work with wxservices.do_calculations() since it's not a return. Pehaps this is my noviceness showing?
I can update this.
This is the obvious choice, but I couldn't get a reliable method to work which would loop through each record in the archive and only UPDATE the ones StdWXCalculate knows about. I couldn't find the rhythm for that, so instead I opted to treat it like a full new record by storing locally, deleting, then re-adding it. Then addRecord() takes care of that. Perhaps I need to add another function, but I'm not sure about this.
I'll look into this and see what I can come up with.

tkeffer · 2019-04-08T23:25:32Z

For 2, simply rewrite wxservices.do_calculations() using your new function wxservices.do_calculations_with_return()

For 4, you have two choices. Either rewrite wxservices.do_calculations_with_return() so it keeps track of which fields have been updated and returns that set; or, probably simpler, do a compare of the starting record and the ending record. Update only those fields that have changed.

Why does this matter? Because I suspect that an UPDATE will prove to be much faster than a DELETE followed by an INSERT because the indexes don't have to be touched. But, I have no proof of that.

Even so, adding these new fields could prove to be very slow --- it would be good to do a test or two. There are speed up techniques we can use if it's a problem.

poblabs · 2019-04-09T19:36:43Z

I think I'm still a little bit lost on item 4. I have figured out that within wxservices.do_calculations_with_return() the missing observations are updated through getattr(self, 'calc_' + obs)(data_us, data_type), but I'm not sure of the logic on how it's doing it. Seems it's updating the list without returning it.

Nonetheless if I move past that, where I'm really lost is how to determine how to UPDATE. By the time the data_us dict is done from the getattr() it's already fully populated with all the missing values, plus those which existed before.

So I did some quick research on how to compare 2 dictionaries to find only the changes. Seemed rather simple, you convert to a set, subtract the two lists, then convert back to dict. Except it doesn't appear to be working well with wxservices.do_calculations_with_return()

For example (long hand):

            if calc:
                data_us_before = data_us # Have a copy of the data before missing values are calculated
                getattr(self, 'calc_' + obs)(data_us, data_type)
                set_1 = set(data_us_before.items())
                set_2 = set(data_us.items())
                updated_values = dict(set_2 - set_1) # Find the differences between the two, and convert back to dict

This returns an empty dictionary all the time. However If I try the below example manually:

data_us_before = {'leafTemp1': None, 'UV': 0.0, 'outHumidity': 76.84181092424011, 'rxCheckPercent': 56.2706273087321, 'maxSolarRad': None, 'consBatteryVoltage': 12.059103882317542, 'extraTemp2': None, 'txBatteryStatus': 0.0, 'outTempBatteryStatus': 0.0, 'inTempBatteryStatus': 0.0, 'supplyVoltage': 11.980025499309653, 'rainBatteryStatus': 0.0, 'hailRate': None, 'soilMoist4': None, 'soilMoist3': None, 'heatingVoltage': 12.016759203622588, 'soilMoist2': None, 'altimeter': 30.985868030808, 'heatindex': 31.597379166717428, 'radiation': 0.0, 'soilMoist1': None, 'leafTemp2': None, 'inTemp': 64.99461562686746, 'referenceVoltage': 12.0226042145351, 'windrun': None, 'windGustDir': 340.6145112017522, 'barometer': 30.994727030808004, 'windchill': 31.597379166717428, 'dewpoint': 25.181226265078386, 'soilTemp1': None, 'soilTemp2': None, 'soilTemp3': None, 'soilTemp4': None, 'rain': 0.0, 'heatingTemp': None, 'pressure': 30.994727030808004, 'extraHumid2': None, 'hail': None, 'extraHumid1': None, 'ET': 3.030416579457161e-05, 'rainRate': 0.0, 'extraTemp3': None, 'usUnits': 1, 'extraTemp1': None, 'humidex': 31.597379166717428, 'leafWet1': None, 'leafWet2': None, 'appTemp': None, 'inDewpoint': 29.2865448691676, 'windBatteryStatus': 0.0, 'interval': 5, 'dateTime': 1554847500, 'windDir': 341.04748406403144, 'outTemp': 31.597379166717428, 'windSpeed': 0.5263648459599787, 'inHumidity': 26.01076874626511, 'windGust': 0.6461829599415951, 'cloudbase': 1458.215141122168}


data_us = {'leafTemp1': None, 'UV': 0.0, 'outHumidity': 76.84181092424011, 'rxCheckPercent': 56.2706273087321, 'maxSolarRad': None, 'consBatteryVoltage': 12.059103882317542, 'extraTemp2': None, 'txBatteryStatus': 0.0, 'outTempBatteryStatus': 0.0, 'inTempBatteryStatus': 0.0, 'supplyVoltage': 11.980025499309653, 'rainBatteryStatus': 0.0, 'hailRate': None, 'soilMoist4': None, 'soilMoist3': None, 'heatingVoltage': 12.016759203622588, 'soilMoist2': None, 'altimeter': 30.985868030808, 'heatindex': 31.597379166717428, 'radiation': 0.0, 'soilMoist1': None, 'leafTemp2': None, 'inTemp': 64.99461562686746, 'referenceVoltage': 12.0226042145351, 'windrun': 0.6910412321076117, 'windGustDir': 340.6145112017522, 'barometer': 30.994727030808004, 'windchill': 31.597379166717428, 'dewpoint': 25.181226265078386, 'soilTemp1': None, 'soilTemp2': None, 'soilTemp3': None, 'soilTemp4': None, 'rain': 0.0, 'heatingTemp': None, 'pressure': 30.994727030808004, 'extraHumid2': None, 'hail': None, 'extraHumid1': None, 'ET': 3.030416579457161e-05, 'rainRate': 0.0, 'extraTemp3': None, 'usUnits': 1, 'extraTemp1': None, 'humidex': 31.597379166717428, 'leafWet1': None, 'leafWet2': None, 'appTemp': 26.842501332479085, 'inDewpoint': 29.2865448691676, 'windBatteryStatus': 0.0, 'interval': 5, 'dateTime': 1554847500, 'windDir': 341.04748406403144, 'outTemp': 31.597379166717428, 'windSpeed': 0.5263648459599787, 'inHumidity': 26.01076874626511, 'windGust': 0.6461829599415951, 'cloudbase': 1458.215141122168}

set_1 = set(data_us_before.items())
set_2 = set(data_us.items())

updated_values = dict(set_2 - set_1)

I get:

>>> updated_values
{'appTemp': 26.842501332479085, 'windrun': 0.6910412321076117}

Which looks like it'll satisfy what you're looking for by returning only the changed values. Then it makes sense to loop this changed dict and run UPDATE on those values. But getting a dict of changes is not working.

Any insight?

tkeffer · 2019-04-11T16:04:04Z

You're overthinking this. It's simple enough:

def do_calculations_with_return(self, data_dict, data_type):
    # ... (as before)
    return data_x

def do_calculations(self, data_dict, data_type):
    data_x = self.do_calculations_with_return(data_dict, data_type)
    data_dict.update(data_x)

Your second question is trickier. After do_calculations_with_return is done, you have two dictionaries, the incoming data_dict and the return value. You have to go through both and look for what changed. That means additional keys, as well as changed values. A function to do the compare would look something like this (NOT TESTED):
```
 def find_diffs(start_dict, end_dict):
     diffs = {}
     for key in end_dict:
         if key not in start_dict or start_dict[key] != end_dict[key]:
             diffs[key] = end_dict[key]

     return diffs
```
Now you need to add a function manager.Manager.updateValues(), similar to the existing updateValue(timestamp, data_dict), except it updates all fields in the incoming dictionary. See the sqlite documentation for the syntax. Make sure this works for MySQL as well.

poblabs · 2019-04-11T17:39:24Z

Thanks for the insight on item 1. I wasn't lost on that one but good to know I was on the right track.

I was only lost on the dict diffs. My previous reply was only focused on the dict diffs. I'll give your suggestions a try and see how it fares. Thanks

poblabs · 2019-04-22T13:26:57Z

I have to admit I'm lost on this one. Your suggested diff code is not working within this function. Let me explain:

I have a small Simulator sqlite db extended for appTemp with all appTemp values None, so that they can be updated.

Your code example works when comparing dicts outside of weewx, but comparing the dicts inside of weewx returns an empty diff.

Below is the code I'm trying to validate. Within wxservices.py and def do_calculations_with_return(), under the if calc: you can see I save the record to start_dict before it processes through weewx.wxformulas by way of getattr(). You'll see the prints under if calc:, so I can try to follow what's going on here.

    def do_calculations_with_return(self, data_dict, data_type):
        if self.ignore_zero_wind:
            self.adjust_winddir(data_dict)
        data_us = weewx.units.to_US(data_dict)
        for obs in self._dispatch_list:
            calc = False
            if obs in self.calculations:
                if self.calculations[obs] == 'software':
                    calc = True
                elif (self.calculations[obs] == 'prefer_hardware' and
                      (obs not in data_us or data_us[obs] is None)):
                    calc = True
            elif obs not in data_us or data_us[obs] is None:
                calc = True
            if calc:
                start_dict = data_us # Have a copy of the data before missing values are calculated
                print(start_dict["dateTime"])
                print("start_dict['appTemp']: %s" % str(start_dict["appTemp"]))
                getattr(self, 'calc_' + obs)(data_us, data_type)
                print("data_us['appTemp'] after weewx.wxformulas: %s" % data_us["appTemp"])
                print("")
        data_x = weewx.units.to_std_system(start_dict, data_dict['usUnits'])
        return data_x

For the sake of brevity, here's the output of 1 record as it passes through this function:

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: 38.1799248347

1554858300
start_dict['appTemp']: 38.1799248347
data_us['appTemp'] after weewx.wxformulas: 38.1799248347

It's updated 6 times (as evidence by the dateTime record), and by the time it's done with getattr(self, 'calc_' + obs)(data_us, data_type), you'll see that start_dict is now identical to the data_us dict, so the dict diff compare isn't working because by the time that fires both dicts match.

I haven't followed why it's firing 6 times yet, so this is where I'm at - a little lost on where to go.

tkeffer · 2019-09-18T22:06:24Z

This is being addressed in #443. Closing.

poblabs · 2019-09-18T22:14:40Z

Ok, thanks

Initial wee_database --calc-missing option

ab56f80

poblabs mentioned this pull request Apr 8, 2019

calcMissing() from weeimport should be a core wee_database function #363

Closed

Updates items 1, 3 and 5 from weewx/weewx #394

7a8ebc6

tkeffer closed this Sep 18, 2019

poblabs deleted the development branch September 18, 2019 22:14

poblabs mentioned this pull request Oct 15, 2019

windrun calculation poblabs/weewx-belchertown#194

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Initial wee_database --calc-missing option #394

Initial wee_database --calc-missing option #394

poblabs commented Apr 8, 2019

tkeffer commented Apr 8, 2019

poblabs commented Apr 8, 2019

tkeffer commented Apr 8, 2019

poblabs commented Apr 9, 2019 •

edited

tkeffer commented Apr 11, 2019

poblabs commented Apr 11, 2019 •

edited

poblabs commented Apr 22, 2019 •

edited

tkeffer commented Sep 18, 2019

poblabs commented Sep 18, 2019

Initial wee_database --calc-missing option #394

Initial wee_database --calc-missing option #394

Conversation

poblabs commented Apr 8, 2019

tkeffer commented Apr 8, 2019

poblabs commented Apr 8, 2019

tkeffer commented Apr 8, 2019

poblabs commented Apr 9, 2019 • edited

tkeffer commented Apr 11, 2019

poblabs commented Apr 11, 2019 • edited

poblabs commented Apr 22, 2019 • edited

tkeffer commented Sep 18, 2019

poblabs commented Sep 18, 2019

poblabs commented Apr 9, 2019 •

edited

poblabs commented Apr 11, 2019 •

edited

poblabs commented Apr 22, 2019 •

edited