
Operating memory research #1839

Closed
RaynaldM opened this issue Dec 6, 2023 · 11 comments

@RaynaldM
Collaborator

RaynaldM commented Dec 6, 2023

We regularly see a few Out Of Memory (OOM) errors, which are very difficult to track down.
We have found two probable causes.
It's not much, but in a high-traffic context, small streams make big rivers.

Take a look at our PR to see the little fixes we've made.

But the hunt is not finished... 😃

@RaynaldM RaynaldM self-assigned this Dec 6, 2023
@RaynaldM RaynaldM added the bug (Identified as a potential bug) and small effort (Likely less than a day of development effort) labels Dec 6, 2023
@RaynaldM RaynaldM added this to the December'23 milestone Dec 6, 2023
RaynaldM pushed a commit that referenced this issue Dec 6, 2023
@RaynaldM RaynaldM linked a pull request Dec 6, 2023 that will close this issue
@RaynaldM RaynaldM added the 2023 (Annual 2023 release) label Dec 7, 2023
@raman-m
Member

raman-m commented Dec 8, 2023

For a luckier hunt, I would say you could add more logging points...
I agree that OOM problems cannot easily be detected right now, because Ocelot has no memory-consumption monitor, indicators, or memory events...
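For illustration only, here is a rough sketch of what such a logging point could look like in an ASP.NET Core pipeline (not something Ocelot ships; `MemoryLoggingMiddleware` is a made-up name): log coarse memory figures after each request so OOM-prone routes at least leave a trace in the logs.

```csharp
// Hypothetical sketch of a "logging point" for memory; not Ocelot code.
using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.Logging;

public class MemoryLoggingMiddleware
{
    private readonly RequestDelegate _next;
    private readonly ILogger<MemoryLoggingMiddleware> _logger;

    public MemoryLoggingMiddleware(RequestDelegate next, ILogger<MemoryLoggingMiddleware> logger)
    {
        _next = next;
        _logger = logger;
    }

    public async Task Invoke(HttpContext context)
    {
        await _next(context);

        // Process-wide figures: they only show which routes correlate with growth,
        // not which allocation caused it.
        var heapMb = GC.GetTotalMemory(forceFullCollection: false) / 1_048_576;
        var workingSetMb = Environment.WorkingSet / 1_048_576;
        _logger.LogDebug("{Path}: GC heap {HeapMb} MB, working set {WorkingSetMb} MB",
            context.Request.Path, heapMb, workingSetMb);
    }
}
```

It would be registered with `app.UseMiddleware<MemoryLoggingMiddleware>()` ahead of the gateway pipeline, and in production the log level would probably need to be gated to avoid noise.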

@raman-m
Member

raman-m commented Dec 11, 2023

@ggnaegi and @RaynaldM
Any ideas on how to monitor such cases and handle memory management?
It seems we definitely need to design some lightweight memory controller/monitor...

@ggnaegi
Member

ggnaegi commented Dec 11, 2023

> @ggnaegi and @RaynaldM
> Any ideas on how to monitor such cases and handle memory management?
> It seems we definitely need to design some lightweight memory controller/monitor...

Metrics?
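As a sketch of what "metrics" could mean here (hypothetical names, not an existing Ocelot API): memory figures exposed as observable gauges via `System.Diagnostics.Metrics`, which `dotnet-counters`, Prometheus, or an OTLP collector can then pick up without touching the core.

```csharp
// Hypothetical sketch: publish memory figures as observable gauges.
using System;
using System.Diagnostics.Metrics;

public static class MemoryMetrics
{
    // The meter name is made up for this example.
    private static readonly Meter Meter = new("Ocelot.Memory.Sample", "1.0.0");

    public static void Register()
    {
        Meter.CreateObservableGauge(
            "gc_heap_bytes",
            () => GC.GetTotalMemory(forceFullCollection: false),
            unit: "bytes",
            description: "Managed heap size as reported by the GC.");

        Meter.CreateObservableGauge(
            "working_set_bytes",
            () => Environment.WorkingSet,
            unit: "bytes",
            description: "OS working set of the gateway process.");
    }
}
```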

@raman-m
Member

raman-m commented Dec 11, 2023

Does our lovely Bla-bla gateway have any metrics/indicators or monitoring features?
We could start with general metrics for memory consumption, but I'm afraid that will not be easy... It seems to require a refactoring of the whole Ocelot core... 🤔
I believe that, as a first step, we can read consumed memory at the app-domain level... That should be enough...
More precise metrics/indicators would require a big refactoring roadmap/milestone...
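A minimal sketch of that first step, assuming nothing beyond the BCL (the class name is invented): the runtime already exposes coarse, process-level numbers that could be logged or returned from a diagnostic endpoint without any core refactoring.

```csharp
// Hypothetical sketch: coarse, process-level memory readings from the BCL.
using System;

public static class MemorySnapshot
{
    public static string Take()
    {
        var info = GC.GetGCMemoryInfo();                            // state as of the last GC
        var heapMb = GC.GetTotalMemory(false) / 1_048_576;          // current managed heap estimate
        var committedMb = info.TotalCommittedBytes / 1_048_576;     // memory committed by the GC
        var allocatedMb = GC.GetTotalAllocatedBytes() / 1_048_576;  // cumulative allocations since start
        var workingSetMb = Environment.WorkingSet / 1_048_576;      // OS working set

        return $"heap={heapMb} MB, committed={committedMb} MB, " +
               $"allocated-total={allocatedMb} MB, working-set={workingSetMb} MB";
    }
}
```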

@RaynaldM
Collaborator Author

Our experience with metrics is currently limited to measuring response times, the number of active requests (requests received but not yet answered), the number of requests per second, and error counts.
As far as OOMs are concerned, we see them in the logs, but with a slight delay, and it's quite difficult to trace the chain of events that generated them. In many cases, an OOM is triggered in a method that is merely the victim of a leak elsewhere.
And to top it all off, OOMs happen quite randomly.
I think it's better to concentrate on improving Ocelot gradually and in targeted spots, whenever we recognize a problematic piece of code (and there really are lots of places where we can improve it).
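Purely as an illustration of the kind of metrics described above (names and wiring are invented, not how their setup actually works), the same four figures can be modelled with `System.Diagnostics.Metrics` instruments:

```csharp
// Hypothetical sketch: the four request metrics mentioned above as .NET instruments.
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

public static class RequestMetrics
{
    private static readonly Meter Meter = new("Gateway.Requests.Sample");
    private static readonly Counter<long> Requests = Meter.CreateCounter<long>("requests_total");
    private static readonly Counter<long> Errors = Meter.CreateCounter<long>("errors_total");
    private static readonly UpDownCounter<long> Active = Meter.CreateUpDownCounter<long>("requests_active");
    private static readonly Histogram<double> Duration = Meter.CreateHistogram<double>("request_duration_ms");

    public static async Task<T> MeasureAsync<T>(Func<Task<T>> handler)
    {
        Requests.Add(1);
        Active.Add(1);
        var stopwatch = Stopwatch.StartNew();
        try
        {
            return await handler();
        }
        catch
        {
            Errors.Add(1);
            throw;
        }
        finally
        {
            Active.Add(-1);
            Duration.Record(stopwatch.Elapsed.TotalMilliseconds);
        }
    }
}
```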

@RaynaldM
Collaborator Author

RaynaldM commented Dec 11, 2023

As you can see, memory consumption varies greatly from pod to pod, due to the diversity of operations to be carried out.

[image: per-pod memory consumption graph]

It varies from 150 MB to 1.2 GB.

@ks1990cn
Contributor

ks1990cn commented Dec 15, 2023

@RaynaldM Interesting. How are you producing these graphs? Is it something on production with Azure/AWS? Which tool is it?

What about just investigating with the Visual Studio profilers? Is that effective enough (since it won't be able to replicate high-traffic cases)?

I've just gone through the issue description and am trying to understand which approach you are following.

https://learn.microsoft.com/en-us/visualstudio/profiling/profiling-feature-tour?view=vs-2022

@ks1990cn
Contributor

ks1990cn commented Dec 16, 2023

[image: Visual Studio allocation/GC profiling graph from a 5-minute load test]

Here we can see that, between 50 s and 1:40 min, there are two allocation spikes, one before GC and one after GC. This was done with a load of 100 virtual users for 5 minutes.
Could we use VS to revisit the places in our code where it makes the most allocations?
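One complementary way, outside a profiler session, to put numbers on a suspected allocation hotspot is BenchmarkDotNet's `MemoryDiagnoser` (just a sketch; the benchmarked method is a made-up placeholder, not real Ocelot code):

```csharp
// Hypothetical sketch: MemoryDiagnoser reports Gen0/1/2 collections and allocated
// bytes per operation, which helps confirm whether a suspected code path allocates heavily.
using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class HeaderCopyBenchmark
{
    private readonly Dictionary<string, string[]> _headers = new()
    {
        ["Accept"] = new[] { "application/json" },
        ["X-Correlation-Id"] = new[] { Guid.NewGuid().ToString() },
    };

    // Placeholder for a code path under suspicion (e.g. request/header mapping).
    [Benchmark]
    public Dictionary<string, string[]> CopyHeaders()
        => _headers.ToDictionary(kv => kv.Key, kv => kv.Value.ToArray());
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<HeaderCopyBenchmark>();
}
```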

@raman-m raman-m added the high (High priority) label Dec 16, 2023
@ks1990cn
Contributor

.NET 8 has a new feature for real-time GC monitoring: .NET Aspire.

https://learn.microsoft.com/en-us/dotnet/aspire/get-started/aspire-overview
https://www.youtube.com/watch?v=DORZA_S7f9w
https://github.com/dotnet/eShop

Can we give users the option to integrate it?
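As a sketch of what such an opt-in integration could look like in the host application (not something Ocelot provides today; it assumes the OpenTelemetry.Extensions.Hosting, OpenTelemetry.Instrumentation.Runtime and OpenTelemetry.Exporter.OpenTelemetryProtocol packages), the built-in runtime/GC counters can be exported over OTLP, which is what the .NET Aspire dashboard consumes:

```csharp
// Hypothetical host-side sketch: export runtime/GC metrics over OTLP so a
// dashboard such as .NET Aspire's can show GC activity in near real time.
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddRuntimeInstrumentation()   // GC heap sizes, collection counts, pause durations...
        .AddOtlpExporter());           // target taken from OTEL_EXPORTER_OTLP_ENDPOINT

var app = builder.Build();
app.Run();
```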

@raman-m raman-m changed the title Hunting for memory leaks Operating memory research Dec 27, 2023
raman-m pushed a commit that referenced this issue Feb 17, 2024
@raman-m raman-m added Jan'24 January 2024 release and removed 2023 Annual 2023 release labels Feb 17, 2024
@raman-m raman-m modified the milestones: Annual 2023, January'24 Feb 17, 2024
@raman-m
Member

raman-m commented Feb 17, 2024

> But the hunt is not finished... 😃

@RaynaldM @ggnaegi Do we need to hunt more?

@ggnaegi
Member

ggnaegi commented Feb 17, 2024

@raman-m @RaynaldM We should try to improve the overall application performance, but the last changes already had a big impact on OOMs. I think this issue isn't specific enough; maybe we should convert it to a discussion?

@ThreeMammals ThreeMammals locked and limited conversation to collaborators Feb 19, 2024
@raman-m raman-m converted this issue into discussion #1966 Feb 19, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
