
speedup create model #140

Merged (1 commit, Aug 3, 2017)
Conversation

tzipperle
Collaborator

Access to a pandas DataFrame via loc is much slower than a Python dictionary lookup.
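A minimal sketch of the difference (the commodity data, column, and index names here are illustrative, not the actual urbs tables):

```python
import pandas as pd

# Hypothetical commodity prices, indexed like a small urbs input table
df = pd.DataFrame(
    {"price": [10.0, 25.0, 40.0]},
    index=pd.Index(["Coal", "Gas", "Wind"], name="Commodity"),
)

# Slow in a tight loop: every .loc call re-resolves the index
total_loc = sum(df.loc[c, "price"] for c in df.index)

# Fast: convert once to a plain dict, then do plain hash lookups
price = df["price"].to_dict()
total_dict = sum(price[c] for c in df.index)

assert total_loc == total_dict  # identical result, far fewer index lookups
```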

@ojdo
Contributor

ojdo commented Aug 3, 2017

@tzipperle: can you roughly quantify the speed-up you experienced in model creation, e.g. for the mimo-example with 8760 time steps? 5%, 50%, 95%? I always thought that Pyomo itself was the larger bottleneck in the process, so I'm very curious about your findings in this case.

@kais-siala
Contributor

From what @tzipperle told me yesterday, about 90%.

@ojdo
Contributor

ojdo commented Aug 3, 2017

I took a quick glance at the pandas GitHub repo. Can you comment on the pandas version used for this observation? There seem to have been some regressions w/r/t indexing performance lately, e.g. pandas-dev/pandas#16644. If it's caused by such a regression, we might be fine just waiting for an upstream fix.

If it's actually inherent to using loc in this tight loop, then it's great you found that bottleneck. In retrospect, I should have really profiled the create_model function more 😐

@KSchoenleber
Contributor

This is highly interesting! I will also run some tests with it.

@ojdo
Contributor

ojdo commented Aug 3, 2017

Thank you for finally making me use the Python profiler on urbs. Here's my workflow to quantify the speedup on a basic run of runme.py. First, I do a reference run with a checked-out master branch:

python3 -m cProfile -o before.pstats runme.py

Then I check out the new version (git checkout tzipperle/master) and run the profiler again:

python3 -m cProfile -o after.pstats runme.py

To inspect the results, there is the handy pstats module:

python3 -m pstats before.pstats

A nice guided tour to this interactive browser is provided by this blog post. I did:

before.pstats% strip
before.pstats% sort cumulative
before.pstats% stats 10
Thu Aug  3 15:37:26 2017    before.pstats

         652021370 function calls (643517983 primitive calls) in 870.679 seconds

   Ordered by: cumulative time
   List reduced from 9594 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   1392/1    0.024    0.000  870.688  870.688 {built-in method exec}
        1    0.659    0.659  870.688  870.688 runme.py:1(<module>)
        7    0.002    0.000  868.552  124.079 runme.py:93(run_scenario)
        7    0.007    0.001  526.096   75.157 model.py:7(create_model)
     1015    0.003    0.000  525.771    0.518 block.py:466(__setattr__)
 1162/700    0.018    0.000  525.767    0.751 block.py:687(add_component)
      315    1.055    0.003  520.315    1.652 constraint.py:661(construct)
1626989/953869    5.670    0.000  473.610    0.000 indexing.py:1302(__getitem__)
   435708    0.441    0.000  470.440    0.001 misc.py:51(apply_indexed_rule)
    39984    5.276    0.000  414.908    0.010 modelhelper.py:26(commodity_balance)

There, commodity_balance takes 415 of the total 870 seconds. To show only this entry, I can enter stats commodity_balance instead. Then, I repeat the same steps with the other profiling results:

Before

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
39984    5.276    0.000  414.908    0.010 modelhelper.py:26(commodity_balance)

After

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
39984    0.333    0.000    2.791    0.000 modelhelper.py:26(commodity_balance)

Wow, 90% was a slight understatement. Please merge immediately. This is the biggest single speedup I seem to have missed in all of urbs... for years. Thanks @tzipperle for spotting this!
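For reference, the interactive pstats session above can also be driven programmatically; a small self-contained sketch (the profiled function is a stand-in, not urbs code):

```python
import cProfile
import pstats

def workload():
    # Stand-in for an expensive model-creation step
    return sum(i * i for i in range(100_000))

# Profile the call and dump stats to a file, as runme.py was profiled above
prof = cProfile.Profile()
result = prof.runcall(workload)
prof.dump_stats("demo.pstats")

# Same strip/sort/stats steps as the interactive pstats session
p = pstats.Stats("demo.pstats")
p.strip_dirs()
p.sort_stats("cumulative")
p.print_stats("workload")  # restrict output to matching function names
```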

Contributor

@maledo left a comment:


Seems to be correct

@maledo maledo merged commit f297c6a into tum-ens:master Aug 3, 2017
@KSchoenleber
Contributor

@tzipperle Thank you very much!

@tzipperle
Collaborator Author

With pleasure!

@lnksz

lnksz commented Aug 10, 2017

@tzipperle The true power of the profiler is starting to be unleashed :D
I added some remarks in the relevant rivus issue; have a look at it for extra info.

I found that most of the overhead is in the MultiIndexed DataFrames' loc indexer (@ojdo's link has similar hints). I also included some profiling of the indexer alternatives.
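A rough sketch of the kind of comparison meant here, with a hypothetical MultiIndexed table (names and sizes are made up; the exact timings will vary by pandas version):

```python
import time
import pandas as pd

# Hypothetical MultiIndexed table, similar in shape to rivus/urbs inputs
idx = pd.MultiIndex.from_product(
    [range(100), ["Elec", "Heat"]], names=["t", "commodity"]
)
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Alternative 1: .loc lookups inside a loop (index resolved every call)
t0 = time.perf_counter()
s1 = sum(df.loc[(t, "Elec"), "value"] for t in range(100))
t_loc = time.perf_counter() - t0

# Alternative 2: one-time conversion to a dict keyed by index tuples
value = df["value"].to_dict()
t0 = time.perf_counter()
s2 = sum(value[(t, "Elec")] for t in range(100))
t_dict = time.perf_counter() - t0

assert s1 == s2  # same result; the dict path skips repeated index resolution
```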
