Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCaml C stub bugfixes #4912

Merged

Conversation

edwintorok
Copy link
Contributor

Fixes a race condition introduced in 40b755b6d, but also some other long standing bugs (wrong number of arguments to C stubs, incorrect O_DIRECT setting).
It also prepares the C stubs for OCaml 5.0 compatibility by removing some global state (although I don't claim this to be fully OCaml 5.0 ready yet).
There are some '[maintenance]' commits as well to make debugging better by preserving errno in a Unix_error instead of losing it with 'failwith', and better editor integration for C stubs (write out a compile_flags.txt with paths to headers/etc. so that clang can act as a language server, just like ocaml-lsp would for OCaml file).

There is also a static analyzer to check that these particular race conditions got fixed everywhere, but that depends on a newer version of GCC, and will come in a separate PR.

edwintorok and others added 9 commits February 16, 2023 15:52
compile_flags.txt (or compile_commands.json) can be used by C language servers,
like 'clangd' to provide in-editor feedback about C code compile
warnings/errors (similar to ocaml-lsp).

By default it wouldn't know where to find the OCaml includes and show a lot of
warnings/errors when editing C stubs, and with this file it works.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
…n and unshadow

OCaml passes Val_unit for calls with '()', see the manual.

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
See the manual https://v2.ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code
"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

_H was previously just a cast, and although it didn't comply with the 'no naked
pointers rule', it did comply with the above rule.
(no naked pointers is only active by default in OCaml 5+)

The commit that made to code compatible with 'no naked pointers mode' has thus
introduced a read race condition: the OCaml GC may have moved the value in
another thread, where the code in the current thread might've already stored
the value in a register, and thus it'd derefence a pointer to something
potentially entirely different (or crash if memory has been compacted and
deallocated).
That race condition is not visible by reviewing the diff of the commit though.

To find more remaining race conditions like this and to avoid reintroducing
this bug in the future I started developing a static analyzer to look for this
problem.

It fixes the following problems:
```
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:271:9-271:184)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:271:9-271:184)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:299:9-299:134)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:311:9-311:106)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:313:9-313:110)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:231:2-231:90)
[Error][Race] DomainLock: must be held when dereferencing OCaml value xch (ocaml/xenopsd/c_stubs/xenctrlext_stubs.c:259:9-259:146)
```

Replace _H with xch_of_val to ensure we get a build failure should anything still use
it (e.g. inside an #ifdef we missed).
This mirrors the changes done in Xen.

Fixes: 40b755b6d ("xenctrlext: fix race conditions")

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
…pport.h

There is no need to declare it by hand (it also has a slightly different
prototype taking a 'const char*' as argument).

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
If a file descriptor is open in the normal way then its flags will be 0,
and the code here that attempts to enable O_DIRECT will do nothing.

This appears to have been an incorrect way of inverting an error condition
`ret < 0`, the negation of which is `ret >= 0`, and not `ret > 0`.

We open the file with O_DIRECT elsewhere though (in vhd-format), but fix this
code to avoid confusion.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Edwin Török <edvin.torok@citrix.com>
…use just errno which is

It is usually set to XC_INTERNAL_ERROR, except in xenguest, but we have no
bindings for that.

This API might get deleted in upstream Xen too: the error code is stored in the
xenctrl handle, but we made that global per process, and shared among all
threads. Which means that xenctrl calls from different threads can overwrite
each-others' error code.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
This is a C call, and might benefit from a bit more parallelism if it inside
the blocking section.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
@edwintorok edwintorok merged commit e7358c9 into xapi-project:master Feb 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants