What is the purpose of InterfaceProxy? #48

Boris-Barboris · 2018-01-11T01:42:55Z

Hi again.

I'm currently profiling vibe-d with Valgrind and after an obvious json hog my attention falls on quite a big number of calls to eventcore ChoppedVector opIndex function.

This is a trace from a web-service answering to 1000 subsequent (keepalive) HTTP requests from one persistent TCP connection to PSQL. add and releaseRef call numbers are monstrous, AlgebraicVector is not slacking. TCPConnection handle is destroyed around 50k times. And the criminal doesn't hide - it's an InterfaceProxy facility.

I never looked at it before, but now I did, and I have a series of questions:

1). Why was it written, what purpose was it serving? It looks like a thing to write in a language without both generics and polymorphism, yet you need them both just to implement such thing.
2). Why not just use uniquePtr-alike structs to pass stuff around. Why all this postblit\destruction on every function call... Basically, what was an original idea?
3). Is there any reason besides an obvious time investment requirements to not want to purge it completely from both vibe-core and the whole vibe-d?

Boris-Barboris · 2018-01-16T01:02:40Z

@s-ludwig

So, i did completely purge InterfaceProxy from vibe-core and vibe.d and made TCPConnection a class instead of a structure.
Vibe-core: Boris-Barboris@2229aec
Vibe-d: Boris-Barboris/vibe.d@ca28d20

Results (1 process, 1 thread, localhost):

Debug build:

echo server ab -n 100000 -k -c 1
old: Requests per second: 13630.73 [#/sec] (mean)
purged: Requests per second: 15522.40 [#/sec] (mean)

echo server ab -n 100000 -k -c 32
old: Requests per second: 14781.74 [#/sec] (mean)
purged: Requests per second: 18893.18 [#/sec] (mean)

Release build:

echo server ab -n 100000 -k -c 32
old: Requests per second: 31315.76 [#/sec] (mean)
purged: Requests per second: 38037.71 [#/sec] (mean)

s-ludwig · 2018-01-16T08:36:34Z

Okay, so the only reason why InterfaceProxy exists is because of the move away from GC allocated classes to reference counted and efficiently stored struct based primitives. The problem is simply that quite some parts, especially in the HTTP server, still rely on polymorphism. I've started to introduce more static typing in the http_speedup branch to eventually get rid of InterfaceProxy, but the worst offenders are still there.

Basically the plan is to rewrite the whole HTTP module anyway (see https://github.com/vibe-d/vibe-http/wiki for an rough initial sketch of the planned changes), so that I've pretty much accepted that there will be a performance regression in the meantime (especially since it has already been compensated by the performance tuning in other areas during the last releases).

If it weren't for the InterfaceProxy issue, the RC/struct based approach would be a good bit faster than the old class based approach, even if there is a (too) high number of opIndex calls (with -inline the cost should be pretty low anyway, at least if the CPU cache can cope).

Boris-Barboris · 2018-01-16T09:40:38Z

move away from GC allocated classes to reference counted and efficiently stored struct based primitives

Except that the reference counting is the least efficient way. You are providing inversion-of-control framework, there is nothing bad about calling destroy\close methods manually in library code. And if it's not fancy enough, unique_ptr is the preferred choice.

Also, structs are pain in the ass when you actually need reference semantics. Final classes (that don't implement an Interface btw, not like it's done in eventcore) that are statically asserted to look like an interface, IMO, are better.

s-ludwig · 2018-01-16T09:48:13Z

Well, unique_ptr is not a practical solution for objects of this kind, especially since the API currently allows to freely copy references around and such a change would simply break all code in existence. Calling destroy manually would either still rely on the GC, which is the main goal to avoid, mainly due to pause times and occasional unexplainable leaks (possibly caused by false-pointers), or it would leave dangling pointers around, making it non-@safe.

The structs already have reference semantics, so that shouldn't be the problem, they really only manage thereference counting using RAII. The reference counting is also thread-local, so it's actually pretty lightweight.

Another big benefit of this approach is that the payload can actually stored within the same single array/ChoppedVector as the bookkeeping data that is required for the basic low-level operations, making it very cache friendly and zero-allocation/deallocation-cost.

Boris-Barboris · 2018-01-16T10:10:20Z

is not a practical solution for objects of this kind, especially since the API currently allows to freely copy references around and such a change would simply break all code in existence

Why not. What kind of objects do you want to guard btw, just to make sure we are on common ground? HTTPrequest\responses perhaps, what else? Task-local memory? Also, I have a feeling you are trying to enforce unenforsible. Just saying "do not escape this reference to a socket and send data to it three hours later from another thread" is absolutely fine. Programmers are not baboons. Well, mostly.

Calling destroy manually would either still rely on the GC, which is the main goal to avoid, mainly due to pause times and occasional unexplainable leaks (possibly caused by false-pointers), or it would leave dangling pointers around, making it non-@safe.

Destroy relies on D runtime and calls class or struct destructor. GC is mostly uninvolved. Memory is still there. You can then insert a bunch of asserts around that raise when destroyed struct\class method is invoked.

The structs already have reference semantics, so that shouldn't be the problem, they really only manage thereference counting using RAII. The reference counting is also thread-local, so it's actually pretty lightweight.

Not really, they don't, they are copied and postblited tens of thousands of times around inside those InterfaceProxy shenanigans you can see in the graph in the first post. It is so expensive, that it took 15-20% of time on the echo server. The point of reference semantics is, that when you pass reference to a function, no code is run. There is a chain of 20+ calls from http layer to the eventcore, each involves passing structs. You simply cannot write postblit code for those structs and to expect no penalty.

Another big benefit of this approach is that the payload can actually stored within the same single array/ChoppedVector as the bookkeeping data that is required for the basic low-level operations, making it very cache friendly and zero-allocation/deallocation-cost.

I don't mind custom allocators, but judging by what i see, most of it (allocation) happens in user code, not the library one.

s-ludwig · 2018-01-16T18:02:29Z

is not a practical solution for objects of this kind, especially since the API currently allows to freely copy references around and such a change would simply break all code in existence

Why not. What kind of objects do you want to guard btw, just to make sure we are on common ground? HTTPrequest\responses perhaps, what else? Task-local memory? Also, I have a feeling you are trying to enforce unenforsible. Just saying "do not escape this reference to a socket and send data to it three hours later from another thread" is absolutely fine. Programmers are not baboons. Well, mostly.

I have to disagree there. Experience suggests that such errors (race conditions, use after free, etc.) are extremely common and having the compiler enforce correct use is worth a ton. This is especially true for a library that is most often used in the context of web facing applications.

Basically all low-level primitives would be affected, TCPConnection, Timer, File, DirectoryWatcher. HTTP requests/responses on the other hand could indeed be candidates for unique_ptr/scope, since it is presumably very unlikely that there is much code, if any, that passes those around.

Calling destroy manually would either still rely on the GC, which is the main goal to avoid, mainly due to pause times and occasional unexplainable leaks (possibly caused by false-pointers), or it would leave dangling pointers around, making it non-@safe.

Destroy relies on D runtime and calls class or struct destructor. GC is mostly uninvolved. Memory is still there. You can then insert a bunch of asserts around that raise when destroyed struct\class method is invoked.

That's what I meant with the "still rely on the GC" case. But that's just the status quo of the old vibe-d:core module, which is in a dead-end performance-wise.

The structs already have reference semantics, so that shouldn't be the problem, they really only manage thereference counting using RAII. The reference counting is also thread-local, so it's actually pretty lightweight.

Not really, they don't, they are copied and postblited tens of thousands of times around inside those InterfaceProxy shenanigans you can see in the graph in the first post. It is so expensive, that it took 15-20% of time on the echo server. The point of reference semantics is, that when you pass reference to a function, no code is run. There is a chain of 20+ calls from http layer to the eventcore, each involves passing structs. You simply cannot write postblit code for those structs and to expect no penalty.

"Reference semantics" as I know the term refers to the fact that when you copy the "entity", it will still point to the same data, so that changes in the data will be visible through all copies of that entity. shared_ptr/RefCounted, although running code, are still definitely references in that sense.

You can also get rid of most of those calls by simply passing the TCPConnection as ref or auto ref. And InterfaceProxy will be eliminated at some point, no matter if structs or classes are used, so that these calls will more or less vanish completely in the future.

Another big benefit of this approach is that the payload can actually stored within the same single array/ChoppedVector as the bookkeeping data that is required for the basic low-level operations, making it very cache friendly and zero-allocation/deallocation-cost.

I don't mind custom allocators, but judging by what i see, most of it (allocation) happens in user code, not the library one.

The problem is that every allocation increases the total cost. So while the user may still have to allocate more memory, that would mean just one allocation per connection (for example) and all of the library part is for free. If that is the only allocation on the user side, having another one for the connection itself doubles the costs, which can already be a significant contribution to the total processing time.

The most efficient solution would definitely be to let the user code allocate everything, including the memory for the library parts, but that currently can only be done in a @safe way using the GC. Especially the most efficient way to allocate, using the stack, relies on the programmer to avoid memory errors and is also quite error prone due to things like D's movable structs. DIP1000 will improve the situation quite a bit, but even then this approach would also mean that the library iterface is pretty cumbersome in cases where top performance is not needed. The ref counting approach provides a very attractive middleground there and, if excessive copies are avoided, really barely has any overhead.

Boris-Barboris · 2018-01-16T18:28:15Z

Experience suggests that such errors (race conditions, use after free, etc.) are extremely common and having the compiler enforce correct use is worth a ton. This is especially true for a library that is most often used in the context of web facing applications.

As long as it does not encumber an app's efficiency, I can agree. I just wouldn't like to end up in the situation, when 99% of people get "punished" by an architecture, designed to restrict the other 1% on very strange edge cases.

I am of an opinion, that speed, undiminished by expressiveness is the only thing that D has going for it on the web arena. If we don't endorce speed, we'd all be beter off on fucking Java.

You can also get rid of most of those calls by simply passing the TCPConnection as ref or auto ref. And InterfaceProxy will be eliminated at some point, no matter if structs or classes are used, so that these calls will more or less vanish completely in the future.

That would be nice.

The ref counting approach provides a very attractive middleground there and, if excessive copies are avoided, really barely has any overhead.

Yes. I guess I understand you. Thank you for your time.

Boris-Barboris closed this as completed Jan 16, 2018

Boris-Barboris mentioned this issue Jan 17, 2018

My two cents vibe-d/vibe-http#1

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What is the purpose of InterfaceProxy? #48

What is the purpose of InterfaceProxy? #48

Boris-Barboris commented Jan 11, 2018

Boris-Barboris commented Jan 16, 2018 •

edited

Loading

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018 •

edited

Loading

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018

What is the purpose of InterfaceProxy? #48

What is the purpose of InterfaceProxy? #48

Comments

Boris-Barboris commented Jan 11, 2018

Boris-Barboris commented Jan 16, 2018 • edited Loading

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018 • edited Loading

s-ludwig commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018

Boris-Barboris commented Jan 16, 2018 •

edited

Loading

Boris-Barboris commented Jan 16, 2018 •

edited

Loading