[BUG] Big memory usage for density evaluation #121
Comments
The memory-doubling should be avoidable I think. A bigger issue is that we should be using some screening to not evaluate all the orbital overlaps, because we don't need them all. That is going to take a bit of refactoring, though perhaps @BradenDKelly has already looked at this?
@leila-pujal @PaulWAyers yes, there are two different PRs for screening. Either one would save some time/calculations. I will investigate once home.
@leila-pujal @marco-2023 this is the issue Leila was talking about. If there is a pull request open, it may be worth reviewing and merging it.
I opened PR #166, which fixes this bug. Below is my testing code that uses the new grid (iron_hexacarbonyl.fchk is provided at the top of the issue):
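A rough, illustrative stand-in for such a test script (assuming the standard `iodata`/`gbasis` entry points `load_one`, `from_iodata`, and `evaluate_density`, and using a simple cubic point grid in place of the molecular grid actually used here):

```python
import numpy as np
from iodata import load_one
from gbasis.wrappers import from_iodata
from gbasis.evals.density import evaluate_density

# Load the wavefunction and build the AO density matrix from the MO
# coefficients and occupations.
mol = load_one("iron_hexacarbonyl.fchk")
basis = from_iodata(mol)  # some gbasis versions return (basis, coord_types) here
dm = (mol.mo.coeffs * mol.mo.occs) @ mol.mo.coeffs.T

# Roughly 1.5 million points, comparable to the grid discussed in this issue,
# laid out on a simple cube instead of a proper molecular grid.
side = np.linspace(-10.0, 10.0, 116)
points = np.array(np.meshgrid(side, side, side)).reshape(3, -1).T

rho = evaluate_density(dm, basis, points)
```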
This gives a smaller grid than the one used when this issue was opened. With the old code, we get the doubling in memory at line 45. With the new code we avoid creating the intermediate `density` matrix and the memory stays the same:
I commented on PR #166. I talked with @Ali-Tehrani earlier and he pointed out that condensing the lines of code does not solve the problem. I confirmed this earlier this week because I actually had a similar problem: the profiler only tracks the memory increase/decrease before and after each line of code, but when the operations are done inside the `np.sum` call, the intermediate array is created and deleted within the same line. At this point, I am not sure if this doubling in memory can be avoided. I don't remember if we talked about some other alternative for this part of the code to avoid this happening.
There are external libraries to fix this. I think there is also a way to do this with NumPy "in-place" operations, but I am no expert.
I don't think "in-place" will work for this specific case. The other suggestions, alongside using "in-place", would be:
I think I was able to make some progress with this. I changed line 60 of gbasis/evals/density.py and got this:
Compare with the original:
So, this is my solution, which uses about 130% of the memory instead of 200%:
I looked at screening, but I'm really unsure of how it could be used here. This is my first contribution to the Python code proper and I've been a bit confused trying to understand it all. Let me know if this suffices and I will open a PR. Cheers.
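The exact change is not reproduced above; as one illustration of how the intermediate can be shrunk (not necessarily what the 130% figure corresponds to), the contraction can be accumulated over blocks of orbitals so that only a thin slice of the `one_density_matrix @ orb_eval` product exists at any time:

```python
import numpy as np

def evaluate_density_blocked(one_density_matrix, orb_eval, block=64):
    """Accumulate rho(r) = sum_ij P_ij phi_i(r) phi_j(r) block-by-block over i.

    Only a (block, n_points) temporary is alive at once, instead of the full
    (n_orbitals, n_points) intermediate that doubles the memory.
    """
    density = np.zeros(orb_eval.shape[1])
    for start in range(0, one_density_matrix.shape[0], block):
        rows = slice(start, start + block)
        # (block, n_points) partial contraction, multiplied in place and reduced
        partial = one_density_matrix[rows] @ orb_eval
        partial *= orb_eval[rows]
        density += partial.sum(axis=0)
    return density
```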
@leila-pujal I believe it's ok.
I did a quick check using the command
Yeah, that is what I saw with the
I have some more information here. I profiled this functionality while indicating the boundaries between functions. It seems the biggest spike occurs in
I think chunking has to be done; then screening gets us to having the optimal scaling. The issue is that the "chunk size" needs to be adapted to the memory available. Once the chunk size is large enough, I expect the performance hit to be minimal.
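A rough sketch of what memory-adaptive chunking could look like (illustrative only: the `evaluate_basis` callable, the 25% memory budget, and the use of `psutil` are assumptions, not the interface that ended up in gbasis):

```python
import numpy as np
import psutil

def evaluate_density_chunked(one_density_matrix, basis, points, evaluate_basis,
                             mem_fraction=0.25):
    """Evaluate the density in chunks of grid points sized to the available RAM."""
    n_ao = one_density_matrix.shape[0]
    budget = psutil.virtual_memory().available * mem_fraction
    # roughly three (n_AO, chunk) float64 temporaries coexist during the contraction
    chunk = max(1, int(budget // (3 * n_ao * 8)))
    density = np.empty(points.shape[0])
    for start in range(0, points.shape[0], chunk):
        pts = points[start:start + chunk]
        orb = evaluate_basis(basis, pts)            # (n_AO, n_chunk_points)
        density[start:start + chunk] = np.sum(one_density_matrix @ orb * orb, axis=0)
    return density
```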
I think my notes (within the pull request) really belong here, so I'm moving them over.

====

For a grid point that is a distance $d$ from the centre of a contracted shell, the magnitude of the slowest-decaying primitive is

$$ c \, N \, d^{\ell} \, e^{-\alpha d^2} $$

where $c$ is the smallest contraction coefficient, $N$ is its normalization constant, $\alpha$ is the smallest exponent, and $\ell$ is the angular momentum. Setting this equal to the screening tolerance $\epsilon$,

$$ c \, N \, d^{\ell} \, e^{-\alpha d^2} = \epsilon $$

This equation can be solved exactly for $d$. For $\ell = 0$,

$$ d_{\text{cutoff}} = \sqrt{-\frac{\ln\!\left(\epsilon / cN\right)}{\alpha}} $$

For $\ell > 0$ there are two roots, one near the nucleus (where $d^2$ is small) and one far out. The second root is the one we want, and has the value

$$ d_{\text{cutoff}} = \sqrt{-\frac{W\!\left(\epsilon / cN\right)}{\alpha}} $$

(the same formula, but with the Lambert W function replacing the logarithm). We could solve for other values of $\ell$ as well. The procedure then is to:
In the case where no screening is done, this amounts to a "chunked" version of the old algorithm I think.
I have come up with a mask function:

```python
import numpy as np
from scipy.special import factorial2, lambertw


def _evaluate_orb_mask(basis, points, tol_screen):
    """Build a boolean (n_shells, n_points) mask from the cutoff distance of each shell."""
    orb_mask = np.zeros((len(basis), points.shape[0]), dtype=bool)
    for mask, shell in zip(orb_mask, basis):
        # slowest-decaying primitive of the contraction
        cmin = shell.coeffs.min()
        amin = shell.exps.min()
        # normalization constant of that primitive
        nmin = (
            (2 * amin / np.pi) ** (3 / 4)
            * ((4 * amin) ** (shell.angmom / 2))
            / np.sqrt(factorial2(2 * shell.angmom + 1))
        )
        # cutoff distance where the primitive drops below tol_screen (see the notes
        # above); lambertw returns a complex value, so take the real part
        dist = np.sqrt(
            -(np.log if shell.angmom == 0 else lambertw)(tol_screen / (cmin * nmin)).real
            / amin
        )
        # True where the squared distance from the shell centre reaches the cutoff
        np.less_equal(dist ** 2, np.sum((points - shell.coord) ** 2, axis=1), out=mask)
    return orb_mask
```

I'm working on putting it together with the existing code to evaluate the basis and then the density with it, but it will take me some time. I'm not sure how it should be done, so I'm working through the code. I think I will have to edit the
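For what it's worth, here is a hypothetical way such a mask could be consumed, reading `True` as "the point lies at or beyond the cutoff" (which is what the `np.less_equal` call above computes); `evaluate_shell` and `num_shell_functions` are placeholder names, not gbasis functions:

```python
# Hypothetical sketch only: evaluate each shell just at the points that survive
# the screening, leaving the screened-out values as zero.
orb_mask = _evaluate_orb_mask(basis, points, tol_screen=1e-8)
for shell, beyond_cutoff in zip(basis, orb_mask):
    keep = ~beyond_cutoff                                  # points worth evaluating
    values = np.zeros((num_shell_functions(shell), points.shape[0]))
    values[:, keep] = evaluate_shell(shell, points[keep])  # placeholder routine
```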
This looks good @msricher. Feel free to post here your thoughts on what you are planning to modify in the modules you mentioned.
OK, so I have the mask function, and the chunking is implemented. My issue now is that, using the mask, I am not sure where I can save memory. I need to pass the mask into line 114 in 3e063ad and into `construct_array_contraction` (line 56 in 3e063ad).
I haven't been able to figure out yet how to apply the mask in a way that will save memory. The chunking is fine, but I have doubts about this.
@msricher, could you report how the memory performance improves with only the chunking? @PaulWAyers, I have a doubt about the algorithm: for the mask, is the goal to reduce the size of
See this comment. It's dynamic, though, based on the amount of memory available. I just set it manually to 4 chunks to show this.
This works very well now, and I think it will suffice to fix the memory issue. We can still work on the screening, though. Please see PR #191.
The idea is to reduce the CPU time and memory by not evaluating every orbital at every point. This gets density evaluation down to linear scaling in computation and memory, which is what it should be. (Right now it is quadratic scaling in computation and cubic in memory.)
Okay, I see... I think we are talking about the same thing, but just to double-check: when doing this in the code:
I had a chat with Michelle, and there are potentially lots of ways to do this. We could store the data in several forms; my guess is that it is simplest, fastest, and most direct to store the AO evaluations on the grid as a sparse matrix.
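A small illustrative sketch of the sparse-storage idea (in a real implementation the sparse matrix would be filled directly during a screened evaluation; thresholding a dense `orb_eval` here is only to keep the example short):

```python
import numpy as np
from scipy.sparse import csr_matrix

def density_from_sparse_orbs(one_density_matrix, orb_eval, tol=1e-12, chunk=10000):
    """Store screened AO values sparsely and contract with the density matrix.

    For a large molecule most AO values vanish at most grid points, so the sparse
    matrix is far smaller than the dense (n_AO, n_points) array.
    """
    orb_sparse = csr_matrix(np.where(np.abs(orb_eval) > tol, orb_eval, 0.0))
    n_points = orb_eval.shape[1]
    density = np.empty(n_points)
    for start in range(0, n_points, chunk):
        block = orb_sparse[:, start:start + chunk].toarray()   # small dense block
        density[start:start + chunk] = np.sum((one_density_matrix @ block) * block,
                                              axis=0)
    return density
```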
Describe the bug
Big memory usage for density evaluation
To Reproduce
I attach the system I used to test:
iron_hexacarbonyl.tar.gz
Expected behaviour
I don't know if this can be improved; I am just reporting the behaviour as an issue.
Screenshots
System Information:
Additional context
For computing the density, it starts with `evaluate_density()`, then evaluates each orbital by computing the values of the basis functions, and then it obtains the total density by using the density matrix. The evaluation of the basis functions is done in the `_deriv.py` module, inside `_eval_deriv_contractions()`. To monitor the memory I used several prints with `nbytes` for arrays and `@profile` (`from memory_profiler import profile`). To monitor the memory of each contraction I printed the associated memory of `matrix_contraction` in `base_one` (gbasis/gbasis/base_one.py, line 239 in c57de8f), and I profiled `_eval_deriv_contractions()`, `evaluate_density()` and `evaluate_density_using_evaluated_orbs`.
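A minimal sketch of that monitoring pattern (the wrapper function and its arguments are placeholders, not the actual script used here):

```python
from memory_profiler import profile
from gbasis.evals.density import evaluate_density


@profile  # prints the memory increment of every line when the function runs
def run_density(one_density_matrix, basis, points):
    density = evaluate_density(one_density_matrix, basis, points)
    print(density.nbytes / 1e6, "MB held by the final density array")  # nbytes check
    return density
```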
With the information I gathered, I think the memory dependency in the code is as follows. Through the evaluation of all the generalized contractions, the memory scales with the number of AOs generated for each angular momentum. S-type generalized contractions have the lowest memory cost, and to obtain the memory cost of higher-angular-momentum generalized contractions you multiply the memory cost of the S-type contractions by the number of generated atomic orbitals (e.g. for l = 1, three p orbitals: increase in memory = S-contraction memory × 3). The base memory corresponding to the S-type generalized contractions scales linearly with the number of points of the molecular grid. I fitted some values for not-so-big molecule sizes and I got the following regression: S-generalized-contraction memory (MB) = 0.000008150229943 × n_points − 0.24118548. I checked the prediction for some bigger ones and the predicted value was off by around 2 MB.
To show the agreement of the predicted memory with the monitored memory, I used iron_hexacarbonyl:
For a radial grid = 100 points
angular = 1030
Total = 100 × 1202 × 13 atoms = 1,562,600 points
Base memory for S-generalized contractions = 12.49 MB
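As a quick check, the fitted regression above reproduces the 12.49 MB figure for this grid:

```python
# Plugging the grid size into the fitted regression from above.
n_points = 100 * 1202 * 13                        # 1,562,600 grid points
mem_mb = 0.000008150229943 * n_points - 0.24118548
print(round(mem_mb, 2))                           # 12.49 (MB)
```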
The values I obtained profiling the code are shown in the following pictures:
This is the profiling for `evaluate_density()`:
You can see that evaluating the basis (line 78) increases the memory by almost the same amount as calculated above.
But also, there is a spike of memory, not shown here, that occurs in `evaluate_density_using_evaluated_orbs`. The profiling for this is shown in the following image. Here you can see, at line 63, where the memory is doubled.