Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor garbage collection with petsc4py 3.18.3 (was "Memory leak in term.justErrorVector()", but this isn't strictly a leak) #896

Closed
Maxwellfire opened this issue Dec 27, 2022 · 13 comments · Fixed by #899

Comments

@Maxwellfire
Copy link

I'm trying to use a slightly unorthodox solving method that requires me to get the error vector from my set of equations repeatedly. Each time term.justErrorVector() is called, it consumes some memory that is never freed.

Here is some simple code that reproduces the memory increase for me:

import fipy
mesh = fipy.Grid1D(nx=100, dx=1)
var1 = fipy.CellVariable(mesh=mesh, hasOld=True, value = 1.0)
eq1 = fipy.DiffusionTerm(var = var1, coeff=1.) == 0

for n in range(1,50000):
    eq1.justErrorVector()

This takes a couple of minutes to run and consumes about 5GiB on my system.

I was going to try the other solver implementations (pysparse and scipy), but pysparse won't run on python3 and scipy throws an error (which I can submit a bug report for separately.)

@Maxwellfire
Copy link
Author

Maxwellfire commented Dec 27, 2022

Interestingly, this does not appear to happen on my other machine, which has the same version of fipy but a different version of petsc4py/petsc. The working machine is using 3.17.4 real_he17b7f0_100 conda-forge and the failing version is using 3.18.2 real_hce8fbc3_100 conda-forge

@guyer
Copy link
Member

guyer commented Jan 3, 2023

Interestingly, this does not appear to happen on my other machine, which has the same version of fipy but a different version of petsc4py/petsc. The working machine is using 3.17.4 real_he17b7f0_100 conda-forge and the failing version is using 3.18.2 real_hce8fbc3_100 conda-forge

Given that, the leak is likely to be in that build of petsc and not something fipy has control over. What os are you running? Some variant of linux, I assume? What version of Python? FWIW, this does not leak on my MacBook with Python 3.10.

@guyer
Copy link
Member

guyer commented Jan 3, 2023

scipy throws an error (which I can submit a bug report for separately.)

If the error is RuntimeError: Factor is exactly singular, then this is because your equation admits an infinite number of solutions. Different solvers respond differently, but ultimately, the problem is not well-defined. Add a Dirichlet constraint or a TransientTerm.

With a Dirichlet condition, I do see a small monotonic increase in RAM usage with --scipy from ~54 MiB to ~57 MiB. I hesitate to call this a leak without further diagnosis; it could well be that the Python garbage collector just hasn't triggered, yet. With --petsc, I see a much smaller rise from ~54 MiB to ~54.5 MiB.

Either way, 5 GiB is an insane amount of RAM usage for this tiny problem. Do both petsc4py/petsc builds use this much RAM?

@Maxwellfire
Copy link
Author

Maxwellfire commented Jan 3, 2023

Given that, the leak is likely to be in that build of petsc and not something fipy has control over. What os are you running? Some variant of linux, I assume? What version of Python? FWIW, this does not leak on my MacBook with Python 3.10.

It does seem likely that the issue is in petsc. Part of the reason that I submitted a bug here is that I don't know the internals of how FiPy calls petsc4py well enough to submit a detailed bug report to them.

I've attached two files below with my environments that do and don't work:

environment_leak.txt
environment_no_leak.txt

Both are using python 3.10.8, and were generated from the following environment.yml using conda, with different versions of petsc4py:

name: ionic

channels:
  - conda-forge
  - defaults

dependencies:
  - python
  - matplotlib
  - pandas
  - jupyterlab
  - bokeh<3.0
  - fipy
  - pytables
  - spyder
  - dotmap
  - petsc4py=3.18.2
 # - petsc4py=3.17.4

The version that works only consumes a couple of KiB of ram as it runs. The version that does not work rapidly consumes memory at around 1GiB per 30 seconds

I am running this on a kubuntu 22.10 system.

@Maxwellfire
Copy link
Author

Maxwellfire commented Jan 3, 2023

For the scipy solver, the issue was that I was setting os.environ['FIPY_SOLVERS'] = 'scipy', but then trying to use a hardcoded term.sweep(dt=dt, solver = fipy.solvers.petsc.linearGMRESSolver.LinearGMRESSolver()) call in part of my code. I guess that once you import fipy with a particular solver, you can't then use different solver. I assume this is expected behavior and not a bug?

If I don't specify the solver explicitly when I import, then switching between solvers on the fly seems to work.

@guyer
Copy link
Member

guyer commented Jan 3, 2023

I guess that once you import fipy with a particular solver, you can't then use different solver. I assume this is expected behavior and not a bug?

I think under some circumstances, it could work, but mixing solver packages is not intended usage.

@guyer
Copy link
Member

guyer commented Jan 3, 2023

It does seem likely that the issue is in petsc. Part of the reason that I submitted a bug here is that I don't know the internals of how FiPy calls petsc4py well enough to submit a detailed bug report to them.

I've attached two files below with my environments that do and don't work:

OK, thanks for this. If I can reproduce, I'll submit a ticket against petsc(4py) or, if I can't, I'll try to tell you how to create a minimally reproducible example that you can submit.

@guyer
Copy link
Member

guyer commented Jan 4, 2023

I can, indeed, reproduce, on both debian and macOS. Something is very wrong with that petsc(4py) build. I'll work on filing a ticket.

@Maxwellfire
Copy link
Author

Thank you! I also just checked and the error reproduces with the latest build in conda-forge of 3.18.3

@guyer
Copy link
Member

guyer commented Jan 5, 2023

Running with mprof, shows
leak
which seemed very familiar.

I eventually remembered trilinos/Trilinos#2327 from five years ago. The same "fix" is not effective, however.

@tkphd
Copy link
Contributor

tkphd commented Jan 5, 2023

Thanks for filing this ticket, @Maxwellfire! I encountered a memory leak with petsc4py 3.18 as well, but got lost in other things instead of tracking it down. I see the same leak from your MWE with petsc4py 3.18.3 (conda-forge), but no leaks with 3.16 or 3.17. This helps a lot!

@guyer
Copy link
Member

guyer commented Jan 7, 2023

While not a proper fix, this issue can be mitigated by adding a call to PETSc.garbage_cleanup() after every iteration. E.g.,

import fipy
from petsc4py import PETSc
mesh = fipy.Grid1D(nx=100, dx=1)
var1 = fipy.CellVariable(mesh=mesh, hasOld=True, value = 1.0)
eq1 = fipy.DiffusionTerm(var = var1, coeff=1.) == 0

for n in range(1,50000):
    eq1.justErrorVector()
    PETSc.garbage_cleanup()

I have identified some things that we should be cleaning up, but they don't completely resolve the leak (without manual garbage_cleanup()). I think this is not a bug in PETSc 3.18, although if it's an intentional change in behavior, I don't see anything obvious in the change log.

@guyer
Copy link
Member

guyer commented Jan 12, 2023

I realized I wasn't cleaning up the PETSc AO application-ordering object. Once I destroy that, too, I can get memory usage back to where it was with petsc4py 3.17.4, without calling PETSc.garbage_cleanup().

I have filed https://gitlab.com/petsc/petsc/-/issues/1309 against petsc4py, although I suspect they'll tell us to clean up after ourselves.

@guyer guyer self-assigned this Jan 12, 2023
@guyer guyer changed the title Memory leak in term.justErrorVector() Poor garbage collection with petsc4py 3.18.3 (was "leak in term.justErrorVector()", but this isn't strictly a leak) Jan 12, 2023
@guyer guyer changed the title Poor garbage collection with petsc4py 3.18.3 (was "leak in term.justErrorVector()", but this isn't strictly a leak) Poor garbage collection with petsc4py 3.18.3 (was "Memory leak in term.justErrorVector()", but this isn't strictly a leak) Jan 12, 2023
guyer added a commit to guyer/fipy that referenced this issue Jan 12, 2023
Garbage collection works differently (worse) in petsc4py 3.18.3
than in 3.17.4.

Fixes usnistgov#896
guyer added a commit to guyer/fipy that referenced this issue Jan 12, 2023
Garbage collection works differently (worse) in petsc4py 3.18.3
than in 3.17.4.

Fixes usnistgov#896
guyer added a commit that referenced this issue Jan 12, 2023
Garbage collection works differently (worse) in petsc4py 3.18.3
than in 3.17.4.

Fixes #896
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants