Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial wee_database --calc-missing option #394

Closed
wants to merge 2 commits into from
Closed

Initial wee_database --calc-missing option #394

wants to merge 2 commits into from

Conversation

poblabs
Copy link

@poblabs poblabs commented Apr 8, 2019

Updates #363

For review on adding a new option to wee_database called --calc-missing. This will calculate all missing observation types within the database that StdWXCalculate knows about.

In order to do this action the database fields are put into a temporary variable (_rec), the data is dropped from the database (new deleteRecord() function), and is then re-added to the database (addRecord(_rec)). I feel this level of risk needs thorough review.

My test scenario:

  1. Create a new database
  2. Run Simulator on generator mode to quickly produce records
  3. Extend database to add appTemp and windrun as example schema extensions
  4. Verify the database schema has been extended by opening the database and manually checking
  5. Run wee_database --calc-missing and enter y to continue
  6. Verify appTemp and windrun have their missing values updated by opening the database and manually checking

My initial tests show that the newly extended fields in the schema are populated with data with no data loss to other fields.

@tkeffer
Copy link
Contributor

tkeffer commented Apr 8, 2019

A few comments:

  1. unit_constants is not defined. You probably mean weewx.units.unit_constants
  2. The functions wxservices.do_calculations() and wxservices.do_calculations_with_return are so similar, the former should be refactored to take advantage of the latter.
  3. The function wee_database.calcMissing() should use the helper function y_or_n() instead of doing its own read using raw_input() (which does not exist under Python 3). Something like:
    ans = y_or_n("Proceed (y/n)? ")
    
    if ans == 'n':
        print("Nothing done.")
        return
    
    ...
  4. I don't really understand why it's necessary to delete the record before replacing it. Why can't we just use SQL's UPDATE to update the new fields?
  5. No documentation. You'll have to modify docs/utilities.htm.

@poblabs
Copy link
Author

poblabs commented Apr 8, 2019

@tkeffer Some follow-ups:

  1. I can update this. Probably an oversight.
  2. I couldn't get it to work with wxservices.do_calculations() since it's not a return. Pehaps this is my noviceness showing?
  3. I can update this.
  4. This is the obvious choice, but I couldn't get a reliable method to work which would loop through each record in the archive and only UPDATE the ones StdWXCalculate knows about. I couldn't find the rhythm for that, so instead I opted to treat it like a full new record by storing locally, deleting, then re-adding it. Then addRecord() takes care of that. Perhaps I need to add another function, but I'm not sure about this.
  5. I'll look into this and see what I can come up with.

@tkeffer
Copy link
Contributor

tkeffer commented Apr 8, 2019

For 2, simply rewrite wxservices.do_calculations() using your new function wxservices.do_calculations_with_return()

For 4, you have two choices. Either rewrite wxservices.do_calculations_with_return() so it keeps track of which fields have been updated and returns that set; or, probably simpler, do a compare of the starting record and the ending record. Update only those fields that have changed.

Why does this matter? Because I suspect that an UPDATE will prove to be much faster than a DELETE followed by an INSERT because the indexes don't have to be touched. But, I have no proof of that.

Even so, adding these new fields could prove to be very slow --- it would be good to do a test or two. There are speed up techniques we can use if it's a problem.

@poblabs
Copy link
Author

poblabs commented Apr 9, 2019

I think I'm still a little bit lost on item 4. I have figured out that within wxservices.do_calculations_with_return() the missing observations are updated through getattr(self, 'calc_' + obs)(data_us, data_type), but I'm not sure of the logic on how it's doing it. Seems it's updating the list without returning it.

Nonetheless if I move past that, where I'm really lost is how to determine how to UPDATE. By the time the data_us dict is done from the getattr() it's already fully populated with all the missing values, plus those which existed before.

So I did some quick research on how to compare 2 dictionaries to find only the changes. Seemed rather simple, you convert to a set, subtract the two lists, then convert back to dict. Except it doesn't appear to be working well with wxservices.do_calculations_with_return()

For example (long hand):

            if calc:
                data_us_before = data_us # Have a copy of the data before missing values are calculated
                getattr(self, 'calc_' + obs)(data_us, data_type)
                set_1 = set(data_us_before.items())
                set_2 = set(data_us.items())
                updated_values = dict(set_2 - set_1) # Find the differences between the two, and convert back to dict

This returns an empty dictionary all the time. However If I try the below example manually:

data_us_before = {'leafTemp1': None, 'UV': 0.0, 'outHumidity': 76.84181092424011, 'rxCheckPercent': 56.2706273087321, 'maxSolarRad': None, 'consBatteryVoltage': 12.059103882317542, 'extraTemp2': None, 'txBatteryStatus': 0.0, 'outTempBatteryStatus': 0.0, 'inTempBatteryStatus': 0.0, 'supplyVoltage': 11.980025499309653, 'rainBatteryStatus': 0.0, 'hailRate': None, 'soilMoist4': None, 'soilMoist3': None, 'heatingVoltage': 12.016759203622588, 'soilMoist2': None, 'altimeter': 30.985868030808, 'heatindex': 31.597379166717428, 'radiation': 0.0, 'soilMoist1': None, 'leafTemp2': None, 'inTemp': 64.99461562686746, 'referenceVoltage': 12.0226042145351, 'windrun': None, 'windGustDir': 340.6145112017522, 'barometer': 30.994727030808004, 'windchill': 31.597379166717428, 'dewpoint': 25.181226265078386, 'soilTemp1': None, 'soilTemp2': None, 'soilTemp3': None, 'soilTemp4': None, 'rain': 0.0, 'heatingTemp': None, 'pressure': 30.994727030808004, 'extraHumid2': None, 'hail': None, 'extraHumid1': None, 'ET': 3.030416579457161e-05, 'rainRate': 0.0, 'extraTemp3': None, 'usUnits': 1, 'extraTemp1': None, 'humidex': 31.597379166717428, 'leafWet1': None, 'leafWet2': None, 'appTemp': None, 'inDewpoint': 29.2865448691676, 'windBatteryStatus': 0.0, 'interval': 5, 'dateTime': 1554847500, 'windDir': 341.04748406403144, 'outTemp': 31.597379166717428, 'windSpeed': 0.5263648459599787, 'inHumidity': 26.01076874626511, 'windGust': 0.6461829599415951, 'cloudbase': 1458.215141122168}


data_us = {'leafTemp1': None, 'UV': 0.0, 'outHumidity': 76.84181092424011, 'rxCheckPercent': 56.2706273087321, 'maxSolarRad': None, 'consBatteryVoltage': 12.059103882317542, 'extraTemp2': None, 'txBatteryStatus': 0.0, 'outTempBatteryStatus': 0.0, 'inTempBatteryStatus': 0.0, 'supplyVoltage': 11.980025499309653, 'rainBatteryStatus': 0.0, 'hailRate': None, 'soilMoist4': None, 'soilMoist3': None, 'heatingVoltage': 12.016759203622588, 'soilMoist2': None, 'altimeter': 30.985868030808, 'heatindex': 31.597379166717428, 'radiation': 0.0, 'soilMoist1': None, 'leafTemp2': None, 'inTemp': 64.99461562686746, 'referenceVoltage': 12.0226042145351, 'windrun': 0.6910412321076117, 'windGustDir': 340.6145112017522, 'barometer': 30.994727030808004, 'windchill': 31.597379166717428, 'dewpoint': 25.181226265078386, 'soilTemp1': None, 'soilTemp2': None, 'soilTemp3': None, 'soilTemp4': None, 'rain': 0.0, 'heatingTemp': None, 'pressure': 30.994727030808004, 'extraHumid2': None, 'hail': None, 'extraHumid1': None, 'ET': 3.030416579457161e-05, 'rainRate': 0.0, 'extraTemp3': None, 'usUnits': 1, 'extraTemp1': None, 'humidex': 31.597379166717428, 'leafWet1': None, 'leafWet2': None, 'appTemp': 26.842501332479085, 'inDewpoint': 29.2865448691676, 'windBatteryStatus': 0.0, 'interval': 5, 'dateTime': 1554847500, 'windDir': 341.04748406403144, 'outTemp': 31.597379166717428, 'windSpeed': 0.5263648459599787, 'inHumidity': 26.01076874626511, 'windGust': 0.6461829599415951, 'cloudbase': 1458.215141122168}

set_1 = set(data_us_before.items())
set_2 = set(data_us.items())

updated_values = dict(set_2 - set_1)

I get:

>>> updated_values
{'appTemp': 26.842501332479085, 'windrun': 0.6910412321076117}

Which looks like it'll satisfy what you're looking for by returning only the changed values. Then it makes sense to loop this changed dict and run UPDATE on those values. But getting a dict of changes is not working.

Any insight?

@tkeffer
Copy link
Contributor

tkeffer commented Apr 11, 2019

  1. You're overthinking this. It's simple enough:

    def do_calculations_with_return(self, data_dict, data_type):
        # ... (as before)
        return data_x
    
    def do_calculations(self, data_dict, data_type):
        data_x = self.do_calculations_with_return(data_dict, data_type)
        data_dict.update(data_x)
  2. Your second question is trickier. After do_calculations_with_return is done, you have two dictionaries, the incoming data_dict and the return value. You have to go through both and look for what changed. That means additional keys, as well as changed values. A function to do the compare would look something like this (NOT TESTED):

     def find_diffs(start_dict, end_dict):
         diffs = {}
         for key in end_dict:
             if key not in start_dict or start_dict[key] != end_dict[key]:
                 diffs[key] = end_dict[key]
    
         return diffs

    Now you need to add a function manager.Manager.updateValues(), similar to the existing updateValue(timestamp, data_dict), except it updates all fields in the incoming dictionary. See the sqlite documentation for the syntax. Make sure this works for MySQL as well.

@poblabs
Copy link
Author

poblabs commented Apr 11, 2019

Thanks for the insight on item 1. I wasn't lost on that one but good to know I was on the right track.

I was only lost on the dict diffs. My previous reply was only focused on the dict diffs. I'll give your suggestions a try and see how it fares. Thanks

@poblabs
Copy link
Author

poblabs commented Apr 22, 2019

I have to admit I'm lost on this one. Your suggested diff code is not working within this function. Let me explain:

I have a small Simulator sqlite db extended for appTemp with all appTemp values None, so that they can be updated.

Your code example works when comparing dicts outside of weewx, but comparing the dicts inside of weewx returns an empty diff.

Below is the code I'm trying to validate. Within wxservices.py and def do_calculations_with_return(), under the if calc: you can see I save the record to start_dict before it processes through weewx.wxformulas by way of getattr(). You'll see the prints under if calc:, so I can try to follow what's going on here.

    def do_calculations_with_return(self, data_dict, data_type):
        if self.ignore_zero_wind:
            self.adjust_winddir(data_dict)
        data_us = weewx.units.to_US(data_dict)
        for obs in self._dispatch_list:
            calc = False
            if obs in self.calculations:
                if self.calculations[obs] == 'software':
                    calc = True
                elif (self.calculations[obs] == 'prefer_hardware' and
                      (obs not in data_us or data_us[obs] is None)):
                    calc = True
            elif obs not in data_us or data_us[obs] is None:
                calc = True
            if calc:
                start_dict = data_us # Have a copy of the data before missing values are calculated
                print(start_dict["dateTime"])
                print("start_dict['appTemp']: %s" % str(start_dict["appTemp"]))
                getattr(self, 'calc_' + obs)(data_us, data_type)
                print("data_us['appTemp'] after weewx.wxformulas: %s" % data_us["appTemp"])
                print("")
        data_x = weewx.units.to_std_system(start_dict, data_dict['usUnits'])
        return data_x

For the sake of brevity, here's the output of 1 record as it passes through this function:

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: None

1554858300
start_dict['appTemp']: None
data_us['appTemp'] after weewx.wxformulas: 38.1799248347

1554858300
start_dict['appTemp']: 38.1799248347
data_us['appTemp'] after weewx.wxformulas: 38.1799248347

It's updated 6 times (as evidence by the dateTime record), and by the time it's done with getattr(self, 'calc_' + obs)(data_us, data_type), you'll see that start_dict is now identical to the data_us dict, so the dict diff compare isn't working because by the time that fires both dicts match.

I haven't followed why it's firing 6 times yet, so this is where I'm at - a little lost on where to go.

@tkeffer
Copy link
Contributor

tkeffer commented Sep 18, 2019

This is being addressed in #443. Closing.

@tkeffer tkeffer closed this Sep 18, 2019
@poblabs poblabs deleted the development branch September 18, 2019 22:14
@poblabs
Copy link
Author

poblabs commented Sep 18, 2019

Ok, thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants