gnr::forwarder with std::bind -- static_assert(std::is_trivially_copyable<functor_type> ? #3
Comments
Basically:

    #include <functional>
    #include <iostream>
    #include <type_traits>  // std::is_trivially_copyable, std::decay_t

    void print_num(int i)
    {
        std::cout << i << '\n';
    }

    int main()
    {
        // the type returned by std::bind is unspecified
        auto bound = std::bind(print_num, 31337);
        static_assert(std::is_trivially_copyable<std::decay_t<decltype(bound)>>{},
                      "functor not trivially copyable");
        auto b2 = bound;
        return 0;
    }

Above the |
Yes, the |
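Hmm... ok I think I see where you're coming from... But: Does a missing destructor produce faster code, even for trivially copyable objects that don't really need a destructor? Or is the static_assert there to avoid possible leaks (or late deallocation, in the case of bound smart pointers), since the destructor (corresponding to placement-new) is not called? |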
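To make the leak concern in the question above concrete, here is a minimal sketch. It is not taken from forwarder.hpp; `Holder` and `owners_after_store` are hypothetical names used only for illustration. A functor that owns a `std::shared_ptr` is copied into raw storage with placement-new, and the function reports how many owners remain depending on whether the stored copy's destructor is run.

```cpp
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical functor that owns a smart pointer.
struct Holder
{
    std::shared_ptr<int> sp;
    int operator()() const { return *sp; }
};

// A class with a std::shared_ptr member has non-trivial copy and destruction.
static_assert(!std::is_trivially_copyable<Holder>::value,
              "Holder is not trivially copyable");

// Hypothetical helper: copies a Holder into raw storage and returns the
// number of shared_ptr owners left once the original Holder is gone.
inline long owners_after_store(bool run_destructor)
{
    auto sp = std::make_shared<int>(42);                  // 1 owner
    alignas(Holder) unsigned char storage[sizeof(Holder)];
    Holder* stored = nullptr;
    {
        Holder original{sp};                              // 2 owners
        stored = ::new (static_cast<void*>(storage)) Holder(original);  // 3 owners
    }                                                     // original destroyed: 2 owners
    if (run_destructor) {
        stored->~Holder();                                // 1 owner
    }
    const long owners = sp.use_count();
    if (!run_destructor) {
        stored->~Holder();  // tidy up so that this demo itself does not leak
    }
    return owners;
}
```

Calling `owners_after_store(true)` and `owners_after_store(false)` from a small `main` shows the difference: without the destructor call, the stored copy keeps one extra reference alive, which is the "late deallocation" scenario asked about.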
It's up to the compiler. You view the disassembly, try different things and
in the end produce something. Code that does not need to be called is best
for performance and smart pointers (heap allocation in general) are death
for performance.
I wrote the old delegate.hpp first, but I saw that std::function was just as
good and sometimes better, so there was no need to keep it around. Then
there was an article where a person implemented a forwarder lookalike and
showed that it essentially compiled to very efficient code, so I used this
idea in forwarder.hpp.
https://avdgrinten.wordpress.com/2013/08/07/c-stdfunction-with-the-speed-of-a-macro/
|
BTW: You didn't try out memfun_ref in your tests, it's even faster than
memfun.
|
I think I know why you demand a type that does not need an explicit destructor. Because
So how does one handle destruction of a previously placement-new constructed type, when one does not know the type?

    buffer->~MyType();  // but one does not know MyType anymore

The "simplest solution" is to just bypass the problem by demanding a type whose destructor does nothing... |
The type of the object you erase is "remembered", since you instantiate a function, that acts as an invoker, i.e. the stub. Similarly you can instantiate functions that act as "deleters" and they remember the type too. Or you can instantiate a static class which remembers the type you erased and store a pointer to it. These are all complications, that affect performance, however. I saw from disassembly, that forwarder produced the fastest code of all alternatives, with caveats. |
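The "stub" and "deleter" idea described above can be sketched as follows. This is only an illustration under stated assumptions: `small_callable` and `counted` are hypothetical names, the signature is fixed to `int(int)` for brevity, and none of this is the actual forwarder.hpp code. Two function pointers are instantiated per erased type `F`; one remembers `F` for the call, the other remembers `F` for destruction.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Hypothetical fixed-size, non-copyable callable wrapper for int(int).
template <std::size_t N>
class small_callable
{
    alignas(std::max_align_t) unsigned char store_[N];
    int (*invoke_)(void*, int) = nullptr;  // stub: remembers F for the call
    void (*destroy_)(void*) = nullptr;     // deleter: remembers F for destruction

public:
    template <typename F>
    explicit small_callable(F f)
    {
        static_assert(sizeof(F) <= N, "functor too large for the buffer");
        static_assert(alignof(F) <= alignof(std::max_align_t), "over-aligned functor");
        ::new (static_cast<void*>(store_)) F(std::move(f));
        // Captureless lambdas convert to plain function pointers; each one is
        // instantiated for this particular F.
        invoke_ = [](void* p, int x) -> int { return (*static_cast<F*>(p))(x); };
        destroy_ = [](void* p) { static_cast<F*>(p)->~F(); };
    }

    small_callable(const small_callable&) = delete;
    small_callable& operator=(const small_callable&) = delete;

    ~small_callable() { destroy_(store_); }

    int operator()(int x) { return invoke_(store_, x); }
};

// Hypothetical functor that counts live instances, to show that the
// deleter really runs the erased type's destructor.
struct counted
{
    int* live;
    explicit counted(int* l) : live(l) { ++*live; }
    counted(const counted& o) : live(o.live) { ++*live; }
    ~counted() { --*live; }
    int operator()(int x) const { return 2 * x; }
};
```

The cost is one extra pointer per object and one extra indirect call on destruction, which is the kind of complication the comment above refers to. Whether that is measurable depends on the compiler and target.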
Ah yes, thanks!
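Very nice! However... this claim about the fastest code has caveats (as you write). It depends heavily on
- platform/architecture (e.g. 64 bit vs 32 bit; ARM vs Intel, etc.)
- compiler and C++ standard used
- optimization settings
- what one is actually measuring (just the function call, or also the time of the constructor and possible destructor)
Because to tell you the truth, in the following measurements
But yes, if I want no heap, then

    #include <functional>  // std::function, std::ref
    // plus the gnr (memfun, forwarder) and embxx (StaticFunction) headers

    struct A
    {
        A(): a(2) {}
        int operator()(int val) { return val * a; }
        int times_a(int val) { return val * a; }  // member referenced by MEMFUN below
        int arr[8] = {};  // pad it fat!
        int a;
    };

    int main()
    {
        A a;
        std::function<int(int)> f1{std::ref(a)};
        auto binder = gnr::memfun<MEMFUN(A::times_a)>(a);
        std::function<int(int)> f2{binder};
        gnr::forwarder<int(int), sizeof(A)> f3{a};
        embxx::util::StaticFunction<int(int), sizeof(A)+4> f4{a};
        for (int i = 0; i < 10; ++i) {
            f1(i);  // slow
            f2(i);  // slow
            f3(i);  // fast
            f4(i);  // fastest
        }
    }

On my machine, the benchmark here gives the following output: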
|
What numbers do you get, if you do:

    git clone https://github.com/user706/CxxFunctionBenchmark.git
    cd CxxFunctionBenchmark/
    mkdir build/
    cd build/
    cmake ..  # or if you want to use a custom boost: cmake -DBOOST_ROOT=/path/to/boost ..
    make -j4 VERBOSE=1
    ./various

? |
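I've run my benchmark again... got new numbers...
https://pastebin.com/ETXb453D
So in addition to:
> However... this thing about the fastest code has caveats (as you write). It is very dependent on
> - platform/architecture (e.g. 64 bit vs 32 bit; ARM vs Intel, etc.)
> - compiler and C++ standard used
> - optimization settings
> - what one is actually measuring (just the function call, or also the time of the constructor and possible destructor)
one needs to add
- what else is running on your PC while you're running the benchmark
This type of benchmarking needs to be taken with a grain of salt, I think... |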
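The point about "what one is actually measuring" can be illustrated with a small timing sketch. It is an assumption-laden example, not the benchmark from the linked repository: `timing`, `time_calls_only` and `time_construct_and_call` are hypothetical names, and an optimizing compiler may inline or remove much of the work, so the numbers it produces deserve the same grain of salt.

```cpp
#include <chrono>
#include <functional>

// Result of one measurement: a checksum (so the work is observable)
// and the elapsed wall-clock time in seconds.
struct timing
{
    long long checksum;
    double seconds;
};

// Times only the calls; the callable is constructed outside the timed region.
template <typename Callable>
timing time_calls_only(Callable& f, int iterations)
{
    long long sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        sum += f(i);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(stop - start).count()};
}

// Times construction, one call and destruction of a std::function
// on every iteration.
inline timing time_construct_and_call(int iterations)
{
    long long sum = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        std::function<int(int)> f = [i](int v) { return v + i; };
        sum += f(1);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {sum, std::chrono::duration<double>(stop - start).count()};
}
```

Comparing the two results for the same wrapper type gives a rough idea of how much of a reported time is call overhead and how much is construction and destruction.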
I took a short glance at embxx/StaticFunction and your results are very
surprising: not only does it have a destructor, it even uses virtual
functions, which is one more way to "remember" the type-erased type.
For example:

    class base
    {
        // bunch of virtuals
    };

    template <typename T>
    class special : public base
    {
    };

You can then access special<T> via base's virtual methods. Essentially this
adds another level of indirection, a hidden function pointer. Anyway, I'm
surprised :)
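A slightly fuller sketch of this virtual-function approach, for comparison with the function-pointer one. All names here (`callable_base`, `callable_impl`, `virtual_callable`, `times_three_via_virtual`) are hypothetical, the signature is fixed to `int(int)`, and this is not the embxx implementation; it only shows how a virtual call and a virtual destructor recover the erased type.

```cpp
#include <cstddef>
#include <new>
#include <utility>

// Interface through which the erased functor is reached.
struct callable_base
{
    virtual ~callable_base() = default;
    virtual int call(int x) = 0;
};

// One instantiation per erased type F; this is where F is "remembered".
template <typename F>
struct callable_impl final : callable_base
{
    F f;
    explicit callable_impl(F fn) : f(std::move(fn)) {}
    int call(int x) override { return f(x); }
};

// Hypothetical fixed-size, non-copyable wrapper without heap allocation.
template <std::size_t N>
class virtual_callable
{
    alignas(std::max_align_t) unsigned char store_[N];
    callable_base* obj_ = nullptr;

public:
    template <typename F>
    explicit virtual_callable(F f)
    {
        static_assert(sizeof(callable_impl<F>) <= N, "functor too large for the buffer");
        static_assert(alignof(callable_impl<F>) <= alignof(std::max_align_t),
                      "over-aligned functor");
        obj_ = ::new (static_cast<void*>(store_)) callable_impl<F>(std::move(f));
    }

    virtual_callable(const virtual_callable&) = delete;
    virtual_callable& operator=(const virtual_callable&) = delete;

    ~virtual_callable() { obj_->~callable_base(); }  // virtual destructor finds the real type

    int operator()(int x) { return obj_->call(x); }  // virtual call: the hidden function pointer
};

// Small usage helper: stores a capturing lambda and calls it once.
inline int times_three_via_virtual(int x)
{
    int factor = 3;
    virtual_callable<32> vc([factor](int v) { return v * factor; });
    return vc(x);
}
```

Marking `callable_impl` as `final` does not remove the indirection in `virtual_callable`, because the call still goes through a `callable_base*`; it only helps where the compiler can see the concrete type.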
|
Here are the results on my machine, an old 64-bit i7. I did test the
forwarder on 32-bit ARM and 64-bit ARM (Raspberry Pi).
[size]
stdex::function<int(int)>: 24
std::function<int(int)>: 32
cxx_function::function<int(int)>: 32
multifunction<int(int)>: 32
boost::function<int(int)>: 32
func::function<int(int)>: 32
generic::delegate<int(int)>: 48
ssvu::FastFunc<int(int)>: 40
fu2::function<int(int)>: 32
fixed_size_function<int(int)>: 128
gnr_forwarder: 64
embxx_util_StaticFunction: 64
[function_pointer]
Perf< no_abstraction >: 0.2190502250 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 0.2912204150 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2811304580 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2838583470 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.2899502330 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2860720720 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.2813547890 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.2560037310 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2822438790 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2900245950 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2900408110 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2891810190 [s] {checksum: 0}
[compile_time_function_pointer]
Perf< no_abstraction >: 0.0650755000 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 0.2192806090 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2194916450 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.1886739930 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.1879196400 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2221924360 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.2190854170 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.1952578350 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2254802030 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2505953180 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2261936670 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2246436990 [s] {checksum: 0}
[compile_time_delegate]
Perf< no_abstraction >: 0.0944580360 [s] {checksum: 0}
Perf< stdex::function<int(int)> >: 0.2256888380 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2205044610 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2378045040 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.2455463540 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2195023050 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1893228600 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.2353120100 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2790692480 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2755830610 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2504297450 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2396675670 [s] {checksum: 0}
[lambda]
Perf< stdex::function<int(int)> >: 0.2199850760 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2196872400 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2226756710 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.1922039320 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2255027130 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1878392010 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.1886518520 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2198399990 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2503789400 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2320170140 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2222990850 [s] {checksum: 0}
[lambda_capture]
Perf< stdex::function<int(int)> >: 0.2219075770 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2261333630 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2048382070 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.2198264010 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2255272320 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1876329790 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.2569013630 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2578576160 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2581670800 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2341760650 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2237431580 [s] {checksum: 0}
[heavy_functor]
Perf< stdex::function<int(int)> >: 0.1880240370 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2259007060 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.1886734220 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.1878468190 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2198834010 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1905719520 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.1944659180 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2189201720 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2396398960 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2278462760 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2182110980 [s] {checksum: 0}
[non_assignable]
Perf< stdex::function<int(int)> >: 0.2216139180 [s] {checksum: 0}
Perf< std::function<int(int)> >: 0.2269405480 [s] {checksum: 0}
Perf< cxx_function::function<int(int)> >: 0.2386603470 [s] {checksum: 0}
Perf< multifunction<int(int)> >: 0.2422907120 [s] {checksum: 0}
Perf< boost::function<int(int)> >: 0.2256537900 [s] {checksum: 0}
Perf< func::function<int(int)> >: 0.1878584440 [s] {checksum: 0}
Perf< generic::delegate<int(int)> >: 0.3658719230 [s] {checksum: 0}
Perf< fu2::function<int(int)> >: 0.2831464890 [s] {checksum: 0}
Perf< fixed_size_function<int(int)> >: 0.2747338570 [s] {checksum: 0}
Perf< gnr_forwarder >: 0.2571728440 [s] {checksum: 0}
Perf< embxx_util_StaticFunction >: 0.2409786010 [s] {checksum: 0}
|
But essentially, your tests are flawed in the sense that you don't define
NDEBUG while compiling. If there are asserts in the code, they will skew
the results.
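For readers unfamiliar with the NDEBUG point: the standard `assert` macro is redefined every time `<cassert>` is included, according to whether NDEBUG is defined at that moment. The sketch below (function names are hypothetical) shows that with NDEBUG the asserted expression is not even evaluated, so any cost it carries disappears from a benchmark.

```cpp
// With NDEBUG defined, assert(expr) expands to nothing and expr is not evaluated.
#ifndef NDEBUG
#define NDEBUG
#endif
#include <cassert>

inline int evaluations_with_ndebug()
{
    int n = 0;
    assert(++n > 0);  // compiled out
    return n;
}

// Re-including <cassert> without NDEBUG re-enables assert for the code that follows.
#undef NDEBUG
#include <cassert>

inline int evaluations_without_ndebug()
{
    int n = 0;
    assert(++n > 0);  // evaluated, and checked at run time
    return n;
}
```

In a real build NDEBUG is normally set once for the whole program through the compiler flags rather than toggled inside a file; the toggling here is only to show both behaviours side by side.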
|
Anyway, I'm surprised, embxx_util_StaticFunction is more versatile than forwarder, produces more code, has virtual functions, yet is faster :) Maybe it's a warning to us, that the choice of a delegate is not as important as actually writing a useful app :) |
No, I do compile with |
Ha ha, never mind. If you have time, you can research why one delegate is better than another. It seems you are interested in this, but it really is not very relevant: you can't always get a perfect compiler/architecture fit. As for me, I'm allergic to everything virtual, so even if a delegate using virtual member functions is faster, I am not going to use it :) BTW: He should have qualified the virtuals as final. |
But it has a stricter license...
yip! |
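> If you have time, you can research why one delegate is better than another. It seems you are interested in this
Ha well partially, but it's a rabbit hole that leads into a gigantic cave of vast proportions, and in the end one does not know up from down. You put it best here:
> Maybe it's a warning to us, that the choice of a delegate is not as important as actually writing a useful app :)
|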
Still, you're not the only one who tried to test as many C++ delegates as
possible - every now and then someone embarks on this crusade. If you
publish an extensive benchmark people will take a look at it, since C++
delegates fascinate many people for some reason.
|
I don't think I'll go beyond the few pull-requests I've done here. But just a thought... What about a different approach: run-time code generation approach!!? (Some JIT approaches could perhaps be used. ref) One would generate custom invocation code that is guaranteed to be the fastest possible. But that would need some time, and maybe I'm just dreaming and in the end it would not really be faster... |
Runtime code generation is so fancy, you better do it in your own repository. It's also architecture-dependent, unless you generate code for some virtual machine (like JVM). If you have an idea about your own delegate, just write one - I don't mind. I doubt though, that JIT is the way towards a faster delegate, C++ compilers are simply too good. Better spend your time on something else. I'm satisfied with forwarder.hpp, callback.hpp (which you didn't test at all) and memfun.hpp. If you figure out why others are faster, be sure to tell me :) |
Just some more thoughts: bothering with delegates isn't worth it. I suggest you find something worthwhile like graphics..., and implement stuff from that field. There's plenty of delegates to choose from, loool. Or maybe you could just use std::function<>. |
Hi,
Currently gnr::forwarder cannot take a std::bind object, since the following static_assert
static_assert(std::is_trivially_copyable<functor_type>{},""
(ref) will issue an error.
Question: is that static_assert needed?
Thanks.
Details:
If you try the following code
https://github.com/user706/code_generic/blob/master/code/intern/forwarder_ex2.cpp
building as follows:
and then change the following snippet https://github.com/user706/code_generic/blob/master/code/intern/forwarder_ex2.cpp#L7
to use gnr::forwarder, as follows
then you'll find that a rebuild with make will fail as follows
The offending lines are the ones using std::bind --- here, here and here.
However, if one removes the following static_assert from code/extern/generic_cmake/generic/forwarder.hpp (ref), then it will compile.
Question: is that static_assert needed?
Thanks.