New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

removeRange not called on allocArray #591

Closed
etcimon opened this Issue Mar 21, 2014 · 20 comments

Comments

Projects
None yet
3 participants
@etcimon
Contributor

etcimon commented Mar 21, 2014

When creating the table in hashmap.d with:

m_table = allocArray!TableEntry(m_allocator, new_size);

The indirections cause it to add its entire memory range in memory.d with addRange

However it is freed with m_allocator.free(cast(void[])oldtable); at every resize without calling removeRange

This causes instability b/c eventually the app crashes from inside the GC during collection due to unreferenced pointer / access violation.

I'm not sure if this is the only place we're forgetting to call removeRange.

I suggest this be added to fix it:

        import core.memory;
        GC.removeRange(&oldTable);
@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 21, 2014

Contributor

When vibe.d crashes due to this, the program stays open without errors and blocks at 0.8mb memory usage.

Contributor

etcimon commented Mar 21, 2014

When vibe.d crashes due to this, the program stays open without errors and blocks at 0.8mb memory usage.

@s-ludwig s-ludwig closed this in b9a63ab Mar 21, 2014

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 21, 2014

Member

Good catch! Not sure, but that could also be a reason for growing memory use due to false pointers. A quick search through the code base didn't yield and other places where GC.removeRange is missing.

Would be good if you could briefly comment on #560.

Member

s-ludwig commented Mar 21, 2014

Good catch! Not sure, but that could also be a reason for growing memory use due to false pointers. A quick search through the code base didn't yield and other places where GC.removeRange is missing.

Would be good if you could briefly comment on #560.

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 21, 2014

Contributor

growing memory use due to false pointers

I'm working on that bug right now, through an http connection the allocators register about 90mb in the vibe.utils.memory yet it shows about 250mb of memory usage in the task manager, and it comes down to a realistic value if I throw an exception in the task. I'm wondering if the task's stack wouldn't get flushed properly when the task is freed.

Would be good if you could briefly comment on #560.

ok I'm on it :)

Contributor

etcimon commented Mar 21, 2014

growing memory use due to false pointers

I'm working on that bug right now, through an http connection the allocators register about 90mb in the vibe.utils.memory yet it shows about 250mb of memory usage in the task manager, and it comes down to a realistic value if I throw an exception in the task. I'm wondering if the task's stack wouldn't get flushed properly when the task is freed.

Would be good if you could briefly comment on #560.

ok I'm on it :)

@Extrawurst

This comment has been minimized.

Show comment
Hide comment
@Extrawurst

Extrawurst Mar 21, 2014

Contributor

I'm wondering if the task's stack wouldn't get flushed properly when the task is freed.

I had the same impression today when my tasks suddenly all were out of stack memory (never saw that exception b4)

Contributor

Extrawurst commented Mar 21, 2014

I'm wondering if the task's stack wouldn't get flushed properly when the task is freed.

I had the same impression today when my tasks suddenly all were out of stack memory (never saw that exception b4)

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 21, 2014

Contributor

I had the same impression today when my tasks suddenly all were out of stack memory (never saw that exception b4)

/**
 * Resets this fiber so that it may be re-used, optionally with a
 * new function/delegate.  This routine may only be called for
 * fibers that have terminated, as doing otherwise could result in
 * scope-dependent functionality that is not executed.
 * Stack-based classes, for example, may not be cleaned up
 * properly if a fiber is reset before it has terminated.
 *
 * In:
 *  This fiber must be in state TERM.
 */
final void reset();

I couldn't find the use of this method anywhere, it seems necessary for cleaning up the task though

Contributor

etcimon commented Mar 21, 2014

I had the same impression today when my tasks suddenly all were out of stack memory (never saw that exception b4)

/**
 * Resets this fiber so that it may be re-used, optionally with a
 * new function/delegate.  This routine may only be called for
 * fibers that have terminated, as doing otherwise could result in
 * scope-dependent functionality that is not executed.
 * Stack-based classes, for example, may not be cleaned up
 * properly if a fiber is reset before it has terminated.
 *
 * In:
 *  This fiber must be in state TERM.
 */
final void reset();

I couldn't find the use of this method anywhere, it seems necessary for cleaning up the task though

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

Fibers just keep running continuously and just grab the next task callback after the previous one has returned (or thrown). The stack at that point will have been rewinded as usual (if not, that would be a severe compiler issue). It's also just 64 kiB per stack by default, so it must be a lot of tasks to get >100 MB.

Regarding the stack overflow, I think I'll increase the default stack size to something much larger, so that this will not happen as easy. It will have the side effect that 32-bit application will manually need to tune down the value if thousands of tasks need to run in parallel, but that seems to be a lot more predictable than running out of stack memory.

Member

s-ludwig commented Mar 22, 2014

Fibers just keep running continuously and just grab the next task callback after the previous one has returned (or thrown). The stack at that point will have been rewinded as usual (if not, that would be a severe compiler issue). It's also just 64 kiB per stack by default, so it must be a lot of tasks to get >100 MB.

Regarding the stack overflow, I think I'll increase the default stack size to something much larger, so that this will not happen as easy. It will have the side effect that 32-bit application will manually need to tune down the value if thousands of tasks need to run in parallel, but that seems to be a lot more predictable than running out of stack memory.

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

It's also just 64 kiB per stack by default, so it must be a lot of tasks to get >100 MB.

That could still explain why the GC may have pointers stuck in there. I made a little test here:
https://github.com/globecsys/vibe.d/commit/e176416978b755e1055a0c17292d9b2ad793a046

The process leaves everything created on the stack of the handler, in the GC heap. I suppose this is because the fiber stack isn't flushed and keeps the array to those scope strings?

For example, the process takes 26 MB of space even after it returns from the handler, since its an array the size it can use in the gc is limitless I think. Refresh the page 15 times and you're Out Of Memory

Isn't there a way to run removeRange on the fiber stack until it's woken up?

Contributor

etcimon commented Mar 22, 2014

It's also just 64 kiB per stack by default, so it must be a lot of tasks to get >100 MB.

That could still explain why the GC may have pointers stuck in there. I made a little test here:
https://github.com/globecsys/vibe.d/commit/e176416978b755e1055a0c17292d9b2ad793a046

The process leaves everything created on the stack of the handler, in the GC heap. I suppose this is because the fiber stack isn't flushed and keeps the array to those scope strings?

For example, the process takes 26 MB of space even after it returns from the handler, since its an array the size it can use in the gc is limitless I think. Refresh the page 15 times and you're Out Of Memory

Isn't there a way to run removeRange on the fiber stack until it's woken up?

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

If I add GC.free(keys.ptr); the memory leaks are resolved within fibers. You can refresh the page and see the memory go up to ~60MB and max out there. I'd have to make it a habit for now until we find a more automatic workaround. :)

Contributor

etcimon commented Mar 22, 2014

If I add GC.free(keys.ptr); the memory leaks are resolved within fibers. You can refresh the page and see the memory go up to ~60MB and max out there. I'd have to make it a habit for now until we find a more automatic workaround. :)

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

You could try to use "buildOptions": ["stackStomping"] (-gx compile switch) and compare the results to make sure if it is because of stack based pointers or something else. Regarding memory usage, it's also important to note that the GC currently (AFAIK) never gives back memory to the OS, so the memory usage reported will never decline if only GC memory is used.

This one seems strange, though, I'll try to reproduce it later, too.

Isn't there a way to run removeRange on the fiber stack until it's woken up?

Only in an unsafe and non-portable way it seems. I think we'd also need to clear the whole stack to make sure that the pointers won't still affect things when the fiber is reused.

Member

s-ludwig commented Mar 22, 2014

You could try to use "buildOptions": ["stackStomping"] (-gx compile switch) and compare the results to make sure if it is because of stack based pointers or something else. Regarding memory usage, it's also important to note that the GC currently (AFAIK) never gives back memory to the OS, so the memory usage reported will never decline if only GC memory is used.

This one seems strange, though, I'll try to reproduce it later, too.

Isn't there a way to run removeRange on the fiber stack until it's woken up?

Only in an unsafe and non-portable way it seems. I think we'd also need to clear the whole stack to make sure that the pointers won't still affect things when the fiber is reused.

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

I understand the memory won't be returned to the OS, but the problem I pointed out is that it doesn't max out after every time the fibers return, even if the data pointers are out of scope. The data within fibers needs to be manually delete'd

You could try to use "buildOptions": ["stackStomping"](-gx compile switch)

That didn't work,, I just tried it.

I think we'd also need to clear the whole stack to make sure that the pointers won't still affect things when the fiber is reused.

That would be the best option on the long run I guess

Contributor

etcimon commented Mar 22, 2014

I understand the memory won't be returned to the OS, but the problem I pointed out is that it doesn't max out after every time the fibers return, even if the data pointers are out of scope. The data within fibers needs to be manually delete'd

You could try to use "buildOptions": ["stackStomping"](-gx compile switch)

That didn't work,, I just tried it.

I think we'd also need to clear the whole stack to make sure that the pointers won't still affect things when the fiber is reused.

That would be the best option on the long run I guess

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

You could try to use "buildOptions": "stackStomping"

That didn't work,, I just tried it.

So if I remember right what -gx does, there must be another source of false pointers apart from the stack - or the GC behaves weird for a different reason...

Member

s-ludwig commented Mar 22, 2014

You could try to use "buildOptions": "stackStomping"

That didn't work,, I just tried it.

So if I remember right what -gx does, there must be another source of false pointers apart from the stack - or the GC behaves weird for a different reason...

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

or the GC behaves weird for a different reason...

So it's not a known issue I guess. I'll try running druntime and phobos in a debug build to see where these ranges come from. VisualD shows the sources when debugging. A good strategy for GC debugging that I've tried was monitoring mark, addRange and removeRange using printf and redirecting stdout to a file, find a pattern that matches the calls. Or I can assert in addRange on the pointer of string[] keys and see the call stack to find where the range was added... Do you know of a better way?

Contributor

etcimon commented Mar 22, 2014

or the GC behaves weird for a different reason...

So it's not a known issue I guess. I'll try running druntime and phobos in a debug build to see where these ranges come from. VisualD shows the sources when debugging. A good strategy for GC debugging that I've tried was monitoring mark, addRange and removeRange using printf and redirecting stdout to a file, find a pattern that matches the calls. Or I can assert in addRange on the pointer of string[] keys and see the call stack to find where the range was added... Do you know of a better way?

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

Not really, I had once started to try and abuse the GC proxy interface to intercept calls to the GC without modifying druntime, but then got distracted by other things. But I really think that there should be a pluggable library for memory debugging (ideally in Phobos), so whenever I get back to that (and nothing exists, yet), I'd start making one.

Member

s-ludwig commented Mar 22, 2014

Not really, I had once started to try and abuse the GC proxy interface to intercept calls to the GC without modifying druntime, but then got distracted by other things. But I really think that there should be a pluggable library for memory debugging (ideally in Phobos), so whenever I get back to that (and nothing exists, yet), I'd start making one.

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

whenever I get back to that (and nothing exists, yet), I'd start making one.

I think the ideal would be to have it as a profiler library with different output devices such as a database (sqlite?) because it could end up capturing millions of entries every few seconds.

Also, A good GC library would capture the call stack, time frame, pointee data too.

Filtering in a DB would involve using WHERE on the pointee or LIKE %% something in the call stack

Contributor

etcimon commented Mar 22, 2014

whenever I get back to that (and nothing exists, yet), I'd start making one.

I think the ideal would be to have it as a profiler library with different output devices such as a database (sqlite?) because it could end up capturing millions of entries every few seconds.

Also, A good GC library would capture the call stack, time frame, pointee data too.

Filtering in a DB would involve using WHERE on the pointee or LIKE %% something in the call stack

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

I'd say it depends. The versions of such debugging tools that I did in the past always used an array backed hash map like the one in vibe.d and removed entries as soon as they would get unreferenced. Since each entry then only occupies a couple of bytes (possibly some more for a stack trace), they usually went well as pure in-memory solutions (because amount of actual memory >> amount of hash map entry memory).

But for very demanding applications (and of course sampling for CPU profiling) I'd say just appending new entries to a binary file should be pretty optimal (possibly with something like a "snappy" compressor in-between). Using a generic database sounds like it should be noticeably slower.

Member

s-ludwig commented Mar 22, 2014

I'd say it depends. The versions of such debugging tools that I did in the past always used an array backed hash map like the one in vibe.d and removed entries as soon as they would get unreferenced. Since each entry then only occupies a couple of bytes (possibly some more for a stack trace), they usually went well as pure in-memory solutions (because amount of actual memory >> amount of hash map entry memory).

But for very demanding applications (and of course sampling for CPU profiling) I'd say just appending new entries to a binary file should be pretty optimal (possibly with something like a "snappy" compressor in-between). Using a generic database sounds like it should be noticeably slower.

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

But for very demanding applications (and of course sampling for CPU profiling) I'd say just appending new entries to a binary file should be pretty optimal (possibly with something like a "snappy" compressor in-between). Using a generic database sounds like it should be noticeably slower.

That could be an option in some settings. Sometimes having accesses, sometimes having additions, sometimes having removals, sometimes only the remaining ranges when the profiler is stopped. The output could be chosen to be binary, database, hashmap, etc.

I'd say every situation is a little different, and it wouldn't be too much work to make it more flexible. I'm sure I could put some time on it because it would help solve this problem here within a few minutes to be able to search for the identifier and the ranges that refer to it, and then pull the call stack of those ranges.

Contributor

etcimon commented Mar 22, 2014

But for very demanding applications (and of course sampling for CPU profiling) I'd say just appending new entries to a binary file should be pretty optimal (possibly with something like a "snappy" compressor in-between). Using a generic database sounds like it should be noticeably slower.

That could be an option in some settings. Sometimes having accesses, sometimes having additions, sometimes having removals, sometimes only the remaining ranges when the profiler is stopped. The output could be chosen to be binary, database, hashmap, etc.

I'd say every situation is a little different, and it wouldn't be too much work to make it more flexible. I'm sure I could put some time on it because it would help solve this problem here within a few minutes to be able to search for the identifier and the ranges that refer to it, and then pull the call stack of those ranges.

@s-ludwig

This comment has been minimized.

Show comment
Hide comment
@s-ludwig

s-ludwig Mar 22, 2014

Member

I was rather thinking in the direction of implementing a single solution that is fast for the most demanding case, so that it should also automatically be fast for the simpler cases. But then of course that may be cases where it may be desirable/necessary to send the data over the network because of limited storage or similar.. so adding multiple ways to achieve the goal would certainly not hurt ;)

Member

s-ludwig commented Mar 22, 2014

I was rather thinking in the direction of implementing a single solution that is fast for the most demanding case, so that it should also automatically be fast for the simpler cases. But then of course that may be cases where it may be desirable/necessary to send the data over the network because of limited storage or similar.. so adding multiple ways to achieve the goal would certainly not hurt ;)

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

I'm thinking the GC proxy would have to be template to achieve the best introspection ability. Would you think that's possible?

Contributor

etcimon commented Mar 22, 2014

I'm thinking the GC proxy would have to be template to achieve the best introspection ability. Would you think that's possible?

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

http://forum.dlang.org/thread/lao9fn$1d70$1@digitalmars.com?page=1

Type info is a prerequisite to a GC debugging library. Without it, there would be no additional information to show other than a pointer address & size, and there wouldn't be a way of finding a call stack for the range that caused an invalid pointer to be marked, or a way to find the ranges that leaked data, or even a way to resolve the size/identifier/type of each variable that still exists in the GC.

However, I'm quite certain that it would work out if the library was hard-wired with GDB to keep track of where each pointer comes from, and then used dscanner to resolve the identifiers & type info and attempt a semantic call stack. It seems realizable that way you think?

Contributor

etcimon commented Mar 22, 2014

http://forum.dlang.org/thread/lao9fn$1d70$1@digitalmars.com?page=1

Type info is a prerequisite to a GC debugging library. Without it, there would be no additional information to show other than a pointer address & size, and there wouldn't be a way of finding a call stack for the range that caused an invalid pointer to be marked, or a way to find the ranges that leaked data, or even a way to resolve the size/identifier/type of each variable that still exists in the GC.

However, I'm quite certain that it would work out if the library was hard-wired with GDB to keep track of where each pointer comes from, and then used dscanner to resolve the identifiers & type info and attempt a semantic call stack. It seems realizable that way you think?

@etcimon

This comment has been minimized.

Show comment
Hide comment
@etcimon

etcimon Mar 22, 2014

Contributor

I thought it out pretty well and I think I'll make a GDB plugin that logs the backtrace, sizes, pointers at key points in druntime/gc/gc.d to feed it in a database and then spits out reports of what keeps the biggest data alive. That would help for 90% of memory leaks, including this one where I suppose the fiber stacks are keeping limitless string allocations alive everywhere...

Contributor

etcimon commented Mar 22, 2014

I thought it out pretty well and I think I'll make a GDB plugin that logs the backtrace, sizes, pointers at key points in druntime/gc/gc.d to feed it in a database and then spits out reports of what keeps the biggest data alive. That would help for 90% of memory leaks, including this one where I suppose the fiber stacks are keeping limitless string allocations alive everywhere...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment