Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

weewx.xtypes.XTypeTable.get_aggregate() is very very slow #939

Closed
roe-dl opened this issue Feb 26, 2024 · 8 comments
Closed

weewx.xtypes.XTypeTable.get_aggregate() is very very slow #939

roe-dl opened this issue Feb 26, 2024 · 8 comments

Comments

@roe-dl
Copy link
Contributor

roe-dl commented Feb 26, 2024

Test environment:

  • a fresh DEB install of WeeWX 5.0.2 with simulator driver and a weewx.sdb with 4 years of data under Ubuntu 22.04
  • a very simple test XType defining the observation type test that has a constant value
    #!/usr/bin/python3
    
    import weewx.xtypes
    import weewx.units
    from weewx.engine import StdService
    
    class TestXType(weewx.xtypes.XType):
    
        def get_scalar(self, obs_type, record, db_manager, **option_dict):
            if obs_type!='test':
                raise weewx.UnknownType('not test observation type')
            return weewx.units.ValueTuple(1,'degree_C','group_temperature')
    
    
    class TestService(StdService):
    
        def __init__(self, engine, config_dict):
            super(TestService,self).__init__(engine,config_dict)
            self.xtype = TestXType()
            weewx.xtypes.xtypes.append(self.xtype)

Test:

Before installing the test XType, creating the HTML files of the Seasons skins requires 0.43 seconds in the test environment:

Feb 26 11:00:16 weewx weewxd[6503]: INFO weewx.cheetahgenerator: Generated 8 files for report SeasonsReport in 0.43 seconds

I then added user.test.TestService to xtype_services in weewx.conf and the observation type test to observations_current and observations_stats in skin.conf. See below for details of the configuration changes.

After that creating the HTML files of the Seasons skin requires 14.50 seconds:

Feb 26 11:05:30 weewx weewxd[6503]: INFO weewx.cheetahgenerator: Generated 8 files for report SeasonsReport in 14.50 seconds

That are 3372 percent longer than before.

Why did I perform this test?

With WeeWX 5.0 the weewx-GTS extension consumes a lot of time, forcing WeeWX to restart main loop continuously. With WeeWX 4.10 all works fine. I looked for possible reasons and found out, that there is a new function weewx.xtypes.XTypeTable.get_aggregate, that is not available in older versions of WeeWX. Inserting some logging messages revealed that it is indeed this function, that consumes the time.

Changes to the configuration:

weewx.conf:

Added user.test.TestService to xtype_services only. This is the whole section:

...
[Engine]
    # This section specifies which services should be run and in what order.
    [[Services]]
        prep_services = weewx.engine.StdTimeSynch
        data_services = ,
        process_services = weewx.engine.StdConvert, weewx.engine.StdCalibrate, weewx.engine.StdQC, weewx.wxservices.StdWXCalculate
        xtype_services = weewx.wxxtypes.StdWXXTypes, weewx.wxxtypes.StdPressureCooker, weewx.wxxtypes.StdRainRater, weewx.wxxtypes.StdDelta, user.test.TestService
        #user.GTS.GTSService
        archive_services = weewx.engine.StdArchive
        restful_services = weewx.restx.StdStationRegistry, weewx.restx.StdWunderground, weewx.restx.StdPWSweather, weewx.restx.StdCWOP, weewx.restx.StdWOW, weewx.restx.StdAWEKAS
        report_services = weewx.engine.StdPrint, weewx.engine.StdReport

skin.conf:

Added test to the end of the list:

...
    observations_current = ..., test
    observations_stats = ..., test
...
@tkeffer
Copy link
Contributor

tkeffer commented Feb 26, 2024

Thanks, Karen, for this report. It isolates the issue nicely.

I will be on the road for the next 5 days, so I will be unable to take a look at this until the end of the week.

@tkeffer
Copy link
Contributor

tkeffer commented Feb 29, 2024

I'm seeing only small increases in generation time.

Couple more questions:

  1. How large is the database?
  2. What hardware are you running it on?

@roe-dl
Copy link
Contributor Author

roe-dl commented Mar 1, 2024

  1. The database is 285601792 bytes long.
  2. I used a virtual machine in a vmWare server Fujitsu Primergy RX1330 M3 with Intel Xeon processor and 32 GB RAM. The virtual machine has 4 GB RAM. It was used for this test only.

I started this investigation only because people were complaining to me about the weewx-GTS extension becoming very slow under WeeWX 5.0 causing their report generation to crash. Under WeeWX 4.10 all works fine.

To help those people I finally ended up changing the extension. The code now inserts the extension before XTypeTable and handles all the aggregations that XTypeTable handles. So the extension works under 5.0 as well as it worked under 4.10

The weewx-GTS extension provides some derived observation types used in agriculture.

Additionally you may want to read Rich's thoughts about XTypeTable.

@tkeffer
Copy link
Contributor

tkeffer commented Mar 1, 2024

I am using a 364 MB database on a NUC 11th gen I7 and 32 GB RAM, so similar processing power, similar database size.

I tried to isolate things to just the test xtype. I set

observations_current = test,
observations_stats = test,

I ran the CheetahGenerator only (no image generation).

No NOAA reports.

Version Time (6 files)
v4.10.2 0.17 s
v5.0 (HEAD) 4.63 s

There is no question that V5.0 is much slower, BUT it is doing more. For v4.10.2, the statistics.html file has no values in it. That is, no values were calculated for the day, week, month, year, and rain year highs, lows, and times. They're just blank.

For v5.0.2 (HEAD), there are complete results.

Probably the right thing to do is write a wiki page explaining this. If faced with a performance problem, there are several choices:

  • Put the type in the database.
  • Provide specialized versions of get_aggregate and get_series.
  • Manage file generation to minimize the calculations. That is, don't just blindly put the type in observation_stats.
  • Or, live with the results.

That's all I can come up with.

@tkeffer
Copy link
Contributor

tkeffer commented Mar 2, 2024

Just tried the test again, this time adding observation type test to the database. Now the results look like:

Version Database? # of calls Time (6 files)
v4.10.2 No 0 0.17 s
v5.0 (HEAD) No 344428 4.63 s
v5.0 (HEAD) Yes 0 0.19 s

@roe-dl
Copy link
Contributor Author

roe-dl commented Mar 2, 2024

There is no question that V5.0 is much slower,

And in reality people do not only have one such observation type but many. And every one adds some time. And in the end people experienced skin creation durations of several minutes, not seconds. The problem is, that the get_aggregate() function is called a lot of times during skin creation.

As you say people need more information:

  • Users need to be encouraged to include observation types into the database in case of timing problems.
  • Users should know that they shall not blame the extension developer for that.
  • Developers need to be made aware of XTypeTable, DailySummaries and ArchiveTable classes and the interference with their XTypes.

And, may be, some time we find a faster algorithm. With Python sometimes it is like in the early days of computing, when people thought about optimizing some small snippet of assembler code for days.

What I made different in my implementation of get_aggregate() is to use genSql() instead of genBatchRecords(). I am not sure if that makes a difference in timing. But in another environment I found the builtin zip() function very slow compared to other solutions. It was a huge difference when I found a way avoiding zip() then (if I remember it right, that is). zip() is used in genBachRecords(), but not in genSql().

@tkeffer
Copy link
Contributor

tkeffer commented Mar 3, 2024

Here are my thoughts and comments on XType performance: https://github.com/weewx/weewx/wiki/XTypes-performance

@tkeffer tkeffer closed this as completed Mar 3, 2024
@roe-dl
Copy link
Contributor Author

roe-dl commented Mar 3, 2024

Thank you. I added a link to your wiki article to the README of my XTypes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants