C++ opaque handle implementation is based on undefined behaviour #119

TartanLlama · 2019-11-13T11:14:41Z

If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:52

the dynamic type of the object,

a type that is the signed or unsigned type corresponding to the dynamic type of the object, or

a char, unsigned char, or std::byte type.

The implementation of opaque handles dynamically allocates an object of one type and casts it to an unrelated type. Accessing the object through a pointer to one of the API types (e.g. in order to call copy or kind) is then undefined behaviour.

Is there documentation somewhere which outlines the use of this strategy as opposed to inheritence- or composition-based ones which don't break this rule? I realise that it may work on compilers which this has been tested with, but it seems somewhat risky to mandate UB through the interface, as implementations may change, or this may trigger sanitizer fails.

The text was updated successfully, but these errors were encountered:

jakobkummerow · 2019-11-13T13:56:54Z

I believe that this pattern is fine, because the objects are not accessed through a pointer of the respective API type: the implementations of all functions cast the opaque API pointer back to its real type first. (I could be wrong, but that's my reading of the spec and the code. FWIW, UBSan is currently not complaining.)

strega-nil · 2019-11-13T17:53:53Z

Hi, I am UBsan, and I'm currently complaining about this.

peter-b · 2019-11-13T17:56:03Z

UBSan is currently not complaining.

Absence of evidence is not evidence of absence.

See also class.mfct.non-static/2:

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

rossberg · 2019-11-13T18:42:48Z

Do folks have any suggestion how to achieve implementation independence here without using such a cast or making all methods static? (Though, practically speaking, V8 itself relies on patterns like this so fundamentally that I doubt the ones here ever matter.)

jakobkummerow · 2019-11-13T18:44:02Z

Hi, I am UBsan, and I'm currently complaining about this.

Well played :-)

Absence of evidence is not evidence of absence.

I know, and in particular UBSan only provides partial coverage. Hence "FWIW".

If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined.

TIL. Well, for whatever that's worth, this is a common pattern for libraries, so it's unlikely that compilers will be able to afford to break it, but it's good to know that this actually is UB. In that case, I would support moving to a different design, however I don't have a good suggestion for what might be such a design.

TartanLlama · 2019-11-13T19:27:07Z

@rossberg do you have a link to where this is done in V8? Would be interested in having a look!

I'll have another look to tomorrow and see if I can think of another possibility which maintains the performance characteristics while being safer standards-wise and bit easier to work with, unless someone beats me to it 😄.

rossberg · 2019-11-13T19:58:08Z

@TartanLlama, just look at the implementation of the factory class. Pretty much every allocation function there goes through some of various general mechanisms for allocating on the GC'ed heap and then casts the result to one of the gazillion object classes. With everything going on in a real-world GC, I have a hard time imagining a sane alternative to this approach. And there probably are many far worse uses of what's technically UB all over the place.

Would be a fun challenge to write a production VM in C++ without any UB. I would make any bet that that's outright impossible. :)

jakobkummerow · 2019-11-13T21:09:15Z

@rossberg : V8's heap objects actually don't use this pattern any more :-)

But V8's API still uses it, and AFAIK it's a fairly common pattern for libraries to have public types that are essentially opaque pointers that get reinterpret_casted to their internal equivalents. The benefits are that (1) the internal classes' implementations are hidden from the public API, and (2) you can use idiomatic member function calls on the external types. Unless I'm missing something, using inheritance would break (1) for non-trivial class hierarchies (maybe unless one uses multiple inheritance?), while switching to static functions would obviously break (2).

Maybe something could be built using composition with a forward-declared pointer to the internal object, roughly:

//// public.h

namespace internal {
class Internal;
}
class Public {
 public:
  void foo();
 private:
  internal::Internal* impl_;
};

//// internal.cc

namespace internal {
class Internal {
 public:
  foo() { /* real implementation */ }
 private:
  int field_;
};
}  // namespace internal

Public::foo() { impl_->foo(); }

But I haven't really thought through the implications of that yet, e.g. regarding possible overhead, construction/destruction/lifetime management issues, or any non-obvious corner cases.
Also, having to forward-declare a bunch of internal classes might be fine for a project-specific API like V8's, but seems unfortunate for an implementation-independent API like the wasm-c-api. So I'm not arguing in favor of this particular approach, just putting the idea out there.

Would be a fun challenge to write a production VM in C++ without any UB. I would make any bet that that's outright impossible.

My understanding is that strictly speaking it's even impossible to implement hello-world in C++ without relying on UB, because one has to assume that any program requires a non-zero amount of stack space, which might not be available in some environments, and behavior on stack overflow is undefined.

nlewycky · 2020-10-27T22:48:58Z

There's a straight-forward fix:

make the constructors defaulted and protected
make the destructor protected
remove the overloaded operator delete
add a private destroy() method
change own<T> from unique_ptr<T> to unique_ptr<T, Deleter> where Deleter calls destroy()
make the classes friends with their Deleters so that unique_ptr can delete them but Foo::make()->destroy() doesn't compile (if it did, it would lead to a double-free).

Then implementers can:

create their own implementation classes that public-derive from these classes and add their own additional member variables and methods
define the declared make() method to always create objects of the matching derived type.
define the other methods by static_cast'ing the (this) pointer to the derived type which is always valid since the only way to create one is with the make() method that always creates the derived type anyways
like the other methods destroy() casts to derived type, then calls "delete casted_this;" or calls a derived destroy() which does "delete this;"

Example:
wasm.hh has:

template <typename T>
struct Destroyer {
  void operator()(T *ptr) {
    ptr->destroy();
  }
};

template<class T> using own = std::unique_ptr<T, Destroyer<T>>;

template<class T>
auto make_own(T* x) -> own<T> { return own<T>(x); }

class WASM_API_EXTERN Config {
  friend struct Destroyer<Config>;
  void destroy();

protected:
  Config() = default;
  ~Config();

public:
  static auto make() -> own<Config>;

  // Implementations may provide custom methods for manipulating Configs.
};

and impl-wasm.cc has:

class ImplConfig : public Config {
public:
  ImplConfig() = default;
  ~ImplConfig() {}

  static ImplConfig *from(Config *config) {
    return static_cast<ImplConfig *>(config);
  }

private:
  int impl_state;
};

Config::~Config() {}

auto Config::make() -> own<Config> { return make_own<Config>(new ImplConfig); }

void Config::destroy() { delete ImplConfig::from(this); }

A few comments on this approach:

A unique_ptr<Derived, Deleter<Derived>> doesn't freely cast to a unique_ptr<Base, Deleter<Base>>. I think the easiest fix it to pass the explicit T to make_own when doing a make + downcast combined as shown in my definition of make() above.
With this approach the potential bug of deleting the object of base type is prevented:

void test(Config *config) {
  delete config;
}

dt.cc:52:10: error: calling a protected destructor of class 'Config'
  delete config;
         ^
dt.cc:20:3: note: declared protected here
  ~Config();
  ^

which is necessary if we expect the implementer to extend the size of the object when deriving.

Manual memory management is still possible but obvious in code review.

void ok1() {
  auto x = Config::make();
}

void ok2() {
  auto x = Config::make().release();
  Destroyer<Config>()(x);
}

void error1() {
  Config::make()->destroy();
}

dt.cc:63:19: error: 'destroy' is a private member of 'Config'
  Config::make()->destroy();
                  ^
dt.cc:19:8: note: implicitly declared private here
  void destroy();
       ^

The other main alternative is to use virtual. I assume we're avoiding that?

nlewycky · 2020-10-27T23:22:28Z

This comment serves as something of a footnote. As far as I know, the existing header file can be implemented without UB in C++17. You just have to construct an object with only a single deleted constructor.

No problem.

class WASM_API_EXTERN WasmerConfig : public Config {
public:
  WasmerConfig() : Config{} {}

  static WasmerConfig *from(void *base) {
    return reinterpret_cast<WasmerConfig *>(base);
  }

private:
  // TODO: custom config state
};

Config::~Config() {}

void Config::operator delete(void *ptr) { delete WasmerConfig::from(ptr); }

auto Config::make() -> own<Config> { return make_own(new WasmerConfig); }

// TODO: add custom config calls

Note that the base object initialization used Config{} and not Config() because uniform initialization syntax treats Config as an aggregate and we bypass the deleted constructor by directly initializing its elements (all zero of them).

This is no longer possible in C++20 because classes with constructors (including deleted constructors) are not considered aggregates for the purposes of uniform initialization.

rossberg · 2020-10-28T08:38:18Z

@nlewycky, thanks for that! The only downside I see is a further increase of the boilerplate in the header file's class definitions, but this is C++, so who cares.

I assume an empty deleter struct will not increase the size of the unique_ptr object? (The current C implementation on top of the C++ API does some really verboten reinterpret casts between unique_ptr<T>* and T** to avoid copying vectors. =:-} )

nlewycky · 2020-10-28T20:48:07Z

@rossberg Strictly speaking there's no guarantee about the size or representation of a unique_ptr in any case, but I'd be surprised if you find any production-quality C++ standard library where a unique_ptr with an empty deleter is not simply the pointer. If you were happy with reinterpret_cast before, you should be just as happy afterwards.

Related: https://stackoverflow.com/questions/13460395/how-can-stdunique-ptr-have-no-size-overhead

rossberg · 2020-10-29T06:36:29Z

Can I interest you in creating a PR for making this change? (including adapting the wasm-v8.cc prototype?) :)

nlewycky · 2020-11-10T01:31:36Z

I've mostly implemented it in the draft PR #161. There's a pre-existing bug with Shared<> that I'm not sure what approach to take to address and am looking for input. wasm-v8.cc defines an explicit template specialization for Shared<Module>::~Shared(), which is a problem because no specialization was declared and the destructor (or now destroy() in my approach) may be implicitly instantiated. I know this aspect of C++ templates is not well understood so I'll give a primer:

implicit instantiation. The template is defined generally over some template parameters and you use that pattern. This is what happens when you use std::vector<char> for instance. For implicit instantiations, multiple definitions is not a linker error and it is optional whether the compiler chooses to emit the the symbols to the .o file (for example, it may depend on the optimization level whether a function template is inlined into all callers so now there is no need for a symbol -- this is what leads to most cases of mysterious C++ template linking errors).
explicit template instantiation definition. Emits a definition to the .o file for all the symbols in this template. Two definitions may or may not be a linker error, but the definition must be present. For example (template Shared<int>;).
explicit template instantiation declaration. Added in C++11. Informs the compiler that the template is expected to be provided by another .o file, in conjunction with clang's -Wundefined-var-template and -Wundefined-func-template to prevent those mysterious linking errors. (extern template Shared<int>;).
explicit template specialization definition. Replace the body of a template pattern (a whole class or a function or variable) with a different one for a particular set of template arguments. (template<> struct Shared<int> { /* ... */ };)
explicit template specialization declaration. Declare that there exists an explicit specialization for this particular set of template arguments. (template<> struct Shared<int>;)

(not discussed: partial templates or template templates)

It's incorrect to call an explicit specialization when expecting an instantiation per https://eel.is/c++draft/temp.expl.spec#7.sentence-1 . So this combination:
wasm.hh:

class WASM_API_EXTERN Shared {
public:
  Shared() = delete;
  ~Shared();
  void operator delete(void*);
};

wasm-v8.cc:

template<>
Shared<Module>::~Shared() {
  stats.free(Stats::MODULE, this, Stats::SHARED);
  impl(this)->~vec();
}

is a non-starter. (If it helps, you can imagine that an explicit specialization has a different type signature, so it could be mangled differently, though both popular ABIs chose to mangle them the same which is why it still links.) Any code that might trigger implicit instantiation of the declaration of d'tor/destroy() must know that this is defined with an explicit specialization.

How would we do that? We could forbid C++ implementations from using an explicit specialization. We could tell users that it's normal to have to include a second vendor-specific header file even though the entire interface defined in wasm.hh is portable at the source level. We could declare all the explicit specializations in wasm.hh and require all implementations to define for each specialization? We could replace Shared<T> with a non-template?

rossberg · 2020-11-10T09:02:20Z

Thanks a lot for the help, I'll have a look at your PR soon.

As for Shared, IIRC my intention was that the template specialisation for each sharable type (currently, only Module) be declared in wasm.hh, since the implementation is always going to be type-specific. I just forgot doing that, and the toolchain did not complain. Do you foresee a problem with that path? (The reason for using a template is so that there is a natural, uniform way for writing shared types.)

nlewycky · 2020-11-12T00:11:49Z

I think there might be an issue with that, but we can defer it to another issue if there is. Making a specialization of a struct means that you replace the entire innards of the struct.

template<typename T> class Shared { int x; };
template<> class Shared<char> { using x = char; };

There's no need for the specialization's contents to have any relationship with the template pattern, so the API shown in wasm.hh isn't necessarily the API that the specialization will have. Offhand I'm not sure if there's any special-case rules that come into play if the specialization is declared but not defined and a template pattern is provided, etc.

…m-c-api#119 .

rossberg · 2020-11-12T07:54:45Z

Ah, I may have been using inaccurate terminology above. I meant that the class template specialisation for Shared<Module> is defined in the header, but its methods are merely declared.

My understanding is that a class template specialization that is declared but not defined yields an incomplete type, so is only useful as a forward declaration.

peter-b mentioned this issue Nov 13, 2019

Why the Extern hierarchy? #110

Open

nlewycky mentioned this issue Nov 5, 2020

Make it possible to subclass wasm C++ API classes for the implementor. #161

Merged

nlewycky added a commit to wasmerio/wasmer that referenced this issue Nov 12, 2020

Fix wasm.cc to build against the wasm.hh presently in WebAssembly/was…

4f87051

…m-c-api#119 .

rossberg closed this as completed in #161 Nov 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

C++ opaque handle implementation is based on undefined behaviour #119

C++ opaque handle implementation is based on undefined behaviour #119

TartanLlama commented Nov 13, 2019 •

edited

Loading

jakobkummerow commented Nov 13, 2019

strega-nil commented Nov 13, 2019

peter-b commented Nov 13, 2019

rossberg commented Nov 13, 2019

jakobkummerow commented Nov 13, 2019

TartanLlama commented Nov 13, 2019

rossberg commented Nov 13, 2019

jakobkummerow commented Nov 13, 2019

nlewycky commented Oct 27, 2020 •

edited

Loading

nlewycky commented Oct 27, 2020 •

edited

Loading

rossberg commented Oct 28, 2020

nlewycky commented Oct 28, 2020

rossberg commented Oct 29, 2020 •

edited

Loading

nlewycky commented Nov 10, 2020 •

edited

Loading

rossberg commented Nov 10, 2020

nlewycky commented Nov 12, 2020 •

edited

Loading

rossberg commented Nov 12, 2020

C++ opaque handle implementation is based on undefined behaviour #119

C++ opaque handle implementation is based on undefined behaviour #119

Comments

TartanLlama commented Nov 13, 2019 • edited Loading

jakobkummerow commented Nov 13, 2019

strega-nil commented Nov 13, 2019

peter-b commented Nov 13, 2019

rossberg commented Nov 13, 2019

jakobkummerow commented Nov 13, 2019

TartanLlama commented Nov 13, 2019

rossberg commented Nov 13, 2019

jakobkummerow commented Nov 13, 2019

nlewycky commented Oct 27, 2020 • edited Loading

nlewycky commented Oct 27, 2020 • edited Loading

rossberg commented Oct 28, 2020

nlewycky commented Oct 28, 2020

rossberg commented Oct 29, 2020 • edited Loading

nlewycky commented Nov 10, 2020 • edited Loading

rossberg commented Nov 10, 2020

nlewycky commented Nov 12, 2020 • edited Loading

rossberg commented Nov 12, 2020

TartanLlama commented Nov 13, 2019 •

edited

Loading

nlewycky commented Oct 27, 2020 •

edited

Loading

nlewycky commented Oct 27, 2020 •

edited

Loading

rossberg commented Oct 29, 2020 •

edited

Loading

nlewycky commented Nov 10, 2020 •

edited

Loading

nlewycky commented Nov 12, 2020 •

edited

Loading