C++ opaque handle implementation is based on undefined behaviour #119
Comments
I believe that this pattern is fine, because the objects are not accessed through a pointer of the respective API type: the implementations of all functions cast the opaque API pointer back to its real type first. (I could be wrong, but that's my reading of the spec and the code. FWIW, UBSan is currently not complaining.) |
Hi, I am UBsan, and I'm currently complaining about this. |
Absence of evidence is not evidence of absence. See also class.mfct.non-static/2:
|
Do folks have any suggestion how to achieve implementation independence here without using such a cast or making all methods static? (Though, practically speaking, V8 itself relies on patterns like this so fundamentally that I doubt the ones here ever matter.) |
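For illustration, a minimal sketch of the "all methods static" option mentioned above. All names here (Foo, FooImpl, make, kind, destroy) are hypothetical and not taken from wasm.hh. The API pointer is only ever passed around and cast back to the implementation type; no non-static member function is called through it, which is the step the issue identifies as UB.

```cpp
#include <cassert>

// Public header part (hypothetical): Foo is never instantiated, it only
// serves as a distinct pointer type for the API.
class Foo {
 public:
  Foo() = delete;
  static Foo* make(int kind);
  static int kind(const Foo* self);
  static void destroy(Foo* self);
};

// Implementation part (hypothetical): the real object type.
namespace {
struct FooImpl {
  int kind;
};
FooImpl* impl(Foo* p) { return reinterpret_cast<FooImpl*>(p); }
const FooImpl* impl(const Foo* p) { return reinterpret_cast<const FooImpl*>(p); }
}  // namespace

// Every entry point casts the opaque pointer back before touching the object.
Foo* Foo::make(int kind) { return reinterpret_cast<Foo*>(new FooImpl{kind}); }
int Foo::kind(const Foo* self) { return impl(self)->kind; }
void Foo::destroy(Foo* self) { delete impl(self); }

// Example use.
inline void example() {
  Foo* f = Foo::make(7);
  assert(Foo::kind(f) == 7);
  Foo::destroy(f);
}
```

The cost is the one already noted in the thread: callers write Foo::kind(f) instead of f->kind().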
Well played :-)
I know, and in particular UBSan only provides partial coverage. Hence "FWIW".
TIL. Well, for whatever that's worth, this is a common pattern for libraries, so it's unlikely that compilers will be able to afford to break it, but it's good to know that this actually is UB. In that case, I would support moving to a different design, however I don't have a good suggestion for what might be such a design. |
@rossberg do you have a link to where this is done in V8? Would be interested in having a look! I'll have another look tomorrow and see if I can think of another possibility which maintains the performance characteristics while being safer standards-wise and a bit easier to work with, unless someone beats me to it 😄. |
@TartanLlama, just look at the implementation of the factory class. Pretty much every allocation function there goes through one of various general mechanisms for allocating on the GC'ed heap and then casts the result to one of the gazillion object classes. With everything going on in a real-world GC, I have a hard time imagining a sane alternative to this approach. And there probably are many far worse uses of what's technically UB all over the place. Would be a fun challenge to write a production VM in C++ without any UB. I would bet that that's outright impossible. :) |
@rossberg : V8's heap objects actually don't use this pattern any more :-) But V8's API still uses it, and AFAIK it's a fairly common pattern for libraries to have public types that are essentially opaque pointers that get reinterpret_casted to their internal equivalents. The benefits are that (1) the internal classes' implementations are hidden from the public API, and (2) you can use idiomatic member function calls on the external types. Unless I'm missing something, using inheritance would break (1) for non-trivial class hierarchies (maybe unless one uses multiple inheritance?), while switching to static functions would obviously break (2). Maybe something could be built using composition with a forward-declared pointer to the internal object, roughly:
But I haven't really thought through the implications of that yet, e.g. regarding possible overhead, construction/destruction/lifetime management issues, or any non-obvious corner cases.
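A minimal sketch of what such a composition-based design might look like, assuming a public class Foo that owns a forward-declared FooImpl. Both names, and the use of std::unique_ptr for ownership, are assumptions made for illustration; this is not the snippet originally posted here.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Public header part (hypothetical names).
class FooImpl;  // forward declaration only; the layout stays hidden

class Foo {
 public:
  explicit Foo(int kind);
  ~Foo();  // declared here, defined where FooImpl is complete
  Foo(Foo&&) noexcept;
  Foo& operator=(Foo&&) noexcept;
  int kind() const;  // idiomatic member call on the public type

 private:
  std::unique_ptr<FooImpl> impl_;  // pointer to the internal object
};

// Implementation part (hypothetical).
class FooImpl {
 public:
  explicit FooImpl(int k) : kind(k) {}
  int kind;
};

Foo::Foo(int kind) : impl_(std::make_unique<FooImpl>(kind)) {}
Foo::~Foo() = default;
Foo::Foo(Foo&&) noexcept = default;
Foo& Foo::operator=(Foo&&) noexcept = default;
int Foo::kind() const { return impl_->kind; }

// Example use.
inline void example() {
  Foo f(3);
  assert(f.kind() == 3);
  Foo g(std::move(f));  // ownership of the internal object moves along
  assert(g.kind() == 3);
}
```

This keeps both the hidden implementation and the member-call syntax, at the price of one extra indirection per call and a handle object that now has its own lifetime, which are the kinds of implications left open above.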
My understanding is that strictly speaking it's even impossible to implement hello-world in C++ without relying on UB, because one has to assume that any program requires a non-zero amount of stack space, which might not be available in some environments, and behavior on stack overflow is undefined. |
There's a straight-forward fix:
Then implementers can:
Example:
and impl-wasm.cc has:
A few comments on this approach:
which is necessary if we expect the implementer to extend the size of the object when deriving.
The other main alternative is to use |
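The code for this proposal did not survive in this copy of the thread. As a rough sketch only, assuming the approach is "the API class is a real base class, implementers derive from it, and ownership goes through a unique_ptr with an empty deleter that calls destroy()", it could look as follows. Every name here (Foo, FooImpl, Destroyer, own, make_foo) is hypothetical and may not match the original snippet.

```cpp
#include <cassert>
#include <memory>

// Public header part (hypothetical names).
class Foo {
 public:
  int kind() const;
  void destroy();  // the only way for users to free a Foo

 protected:
  Foo() = default;   // only implementation classes can construct
  ~Foo() = default;  // non-virtual, not callable by users
};

// Empty deleter that forwards to destroy().
struct Destroyer {
  template <class T>
  void operator()(T* p) const { p->destroy(); }
};
template <class T>
using own = std::unique_ptr<T, Destroyer>;

own<Foo> make_foo(int kind);

// Implementation part (hypothetical): derives and adds its own state.
class FooImpl final : public Foo {
 public:
  explicit FooImpl(int k) : kind_(k) {}
  int kind_;
};

// Base-to-derived static_cast is well-defined because every Foo object
// that exists really is a FooImpl.
int Foo::kind() const { return static_cast<const FooImpl*>(this)->kind_; }
void Foo::destroy() { delete static_cast<FooImpl*>(this); }
own<Foo> make_foo(int kind) { return own<Foo>(new FooImpl(kind)); }

// Example use.
inline void example() {
  own<Foo> f = make_foo(5);
  assert(f->kind() == 5);
}  // Destroyer calls f->destroy() here
```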
This comment serves as something of a footnote. As far as I know, the existing header file can be implemented without UB in C++17. You just have to construct an object with only a single deleted constructor. No problem.
Note that the base object initialization used This is no longer possible in C++20 because classes with constructors (including deleted constructors) are not considered aggregates for the purposes of uniform initialization. |
@nlewycky, thanks for that! The only downside I see is a further increase of the boilerplate in the header file's class definitions, but this is C++, so who cares. I assume an empty deleter struct will not increase the size of the unique_ptr object? (The current C implementation on top of the C++ API does some really verboten reinterpret casts between |
@rossberg Strictly speaking there's no guarantee about the size or representation of a unique_ptr in any case, but I'd be surprised if you find any production-quality C++ standard library where a unique_ptr with an empty deleter is not simply the pointer. If you were happy with Related: https://stackoverflow.com/questions/13460395/how-can-stdunique-ptr-have-no-size-overhead |
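A quick way to check the size claim on a given toolchain, as a sketch with hypothetical types Foo and EmptyDeleter. The standard does not guarantee the result; the comparison is expected to be true on the mainstream standard libraries, but that is an observation about implementations, not a rule.

```cpp
#include <cstddef>
#include <memory>

struct Foo {
  int kind = 0;
};

// Stateless deleter: no data members, so the library can store it for free
// (typically via the empty base optimization).
struct EmptyDeleter {
  void operator()(Foo* p) const { delete p; }
};

using OwnFoo = std::unique_ptr<Foo, EmptyDeleter>;

constexpr std::size_t kOwnSize = sizeof(OwnFoo);
constexpr std::size_t kRawSize = sizeof(Foo*);

// True when the smart pointer is exactly pointer-sized on this implementation.
constexpr bool kNoOverhead = (kOwnSize == kRawSize);
```

A project that depends on this could turn kNoOverhead into a static_assert so that an unusual library is caught at compile time.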
Can I interest you in creating a PR for making this change? (including adapting the wasm-v8.cc prototype?) :) |
I've mostly implemented it in the draft PR #161. There's a pre-existing bug with Shared<> that I'm not sure what approach to take to address and am looking for input. wasm-v8.cc defines an explicit template specialization for
(not discussed: partial templates or template templates) It's incorrect to call an explicit specialization when expecting an instantiation per https://eel.is/c++draft/temp.expl.spec#7.sentence-1 . So this combination:
wasm-v8.cc:
is a non-starter. (If it helps, you can imagine that an explicit specialization has a different type signature, so it could be mangled differently, though both popular ABIs chose to mangle them the same which is why it still links.) Any code that might trigger implicit instantiation of the declaration of d'tor/destroy() must know that this is defined with an explicit specialization. How would we do that? We could forbid C++ implementations from using an explicit specialization. We could tell users that it's normal to have to include a second vendor-specific header file even though the entire interface defined in wasm.hh is portable at the source level. We could declare all the explicit specializations in wasm.hh and require all implementations to define for each specialization? We could replace |
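To make the rule concrete, here is a single-file sketch of the "declare the explicit specializations in the header" option. The names Shared, Module, Other and the int return value of destroy() are hypothetical and exist only so the example can observe which definition runs; the real wasm.hh declarations differ.

```cpp
#include <cassert>

struct Module {};
struct Other {};

// "Portable header" part (hypothetical).
template <class T>
struct Shared {
  int destroy();
};

// Generic definition, used for every T without an explicit specialization.
template <class T>
int Shared<T>::destroy() { return 0; }

// Declaration of the explicit specialization. It appears before any code
// that could trigger implicit instantiation of Shared<Module>::destroy().
template <>
int Shared<Module>::destroy();

// "User" code: sees the declaration above, so no implicit instantiation
// of the generic definition happens for Module.
inline int use_module() { Shared<Module> s; return s.destroy(); }
inline int use_other() { Shared<Other> s; return s.destroy(); }

// "Vendor" part: the definition, which would normally live in its own .cc.
template <>
int Shared<Module>::destroy() { return 1; }

// Example use.
inline void example() {
  assert(use_module() == 1);  // the explicit specialization
  assert(use_other() == 0);   // the generic definition
}
```

If the declaration of the specialization were missing, use_module() would implicitly instantiate the generic definition while another translation unit supplied the specialization, which is the ill-formed combination described above.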
Thanks a lot for the help, I'll have a look at your PR soon. As for Shared, IIRC my intention was that the template specialisation for each sharable type (currently, only Module) be declared in wasm.hh, since the implementation is always going to be type-specific. I just forgot to do that, and the toolchain did not complain. Do you foresee a problem with that path? (The reason for using a template is so that there is a natural, uniform way of writing shared types.) |
I think there might be an issue with that, but we can defer it to another issue if there is. Making a specialization of a struct means that you replace the entire innards of the struct.
There's no need for the specialization's contents to have any relationship with the template pattern, so the API shown in wasm.hh isn't necessarily the API that the specialization will have. Offhand I'm not sure if there's any special-case rules that come into play if the specialization is declared but not defined and a template pattern is provided, etc. |
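A small sketch of that point, with hypothetical members size() and byte_length() that do not exist in wasm.hh: the explicit specialization of a class template is a separate class definition and inherits nothing from the primary template.

```cpp
#include <cassert>

struct Module {};

// Primary template (hypothetical API).
template <class T>
struct Shared {
  int size() const { return 1; }
};

// Explicit specialization: a completely separate class body.
// It does not have size(), and it adds a member the primary lacks.
template <>
struct Shared<Module> {
  int byte_length() const { return 42; }
};

// Example use.
inline void example() {
  Shared<int> generic;
  assert(generic.size() == 1);

  Shared<Module> special;
  assert(special.byte_length() == 42);
  // special.size();  // would not compile: no such member in the specialization
}
```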
Ah, I may have been using inaccurate terminology above. I meant that the class template specialisation for My understanding is that a class template specialization that is declared but not defined yields an incomplete type, so is only useful as a forward declaration. |
basic.lval/11:
The implementation of opaque handles dynamically allocates an object of one type and casts it to an unrelated type. Accessing the object through a pointer to one of the API types (e.g. in order to call copy or kind) is then undefined behaviour.
Is there documentation somewhere which outlines the use of this strategy as opposed to inheritance- or composition-based ones which don't break this rule? I realise that it may work on the compilers this has been tested with, but it seems somewhat risky to mandate UB through the interface, as implementations may change, or this may trigger sanitizer failures.