Description
Consider the string literal "null\0".
When code containing it is compiled to CIR, it turns into something like this (omitting less important parts):
cir.global <...> = #cir.const_array<"null" : !cir.array<!s8i x 4>, trailing_zeros> : !cir.array<!s8i x 6> ...
Notice that the array type of the const_array's string payload (length 4) is shorter than the type of the global that contains it (length 6).
When this global is eventually used, get_global states the type as !cir.array<!s8i x 6>.
The compiler doesn't emit any warnings or errors through all of this.
However, a problem arises when the compiled file is later parsed with MLIR tools, namely mlir::parseSourceFile. Instead of a reference to an object representing the parsed code, it returns nullptr and writes the following text to stderr:
error: 'cir.get_global' op result type pointee type ''!cir.array<!cir.int<s, 8> x 6>'' does not match type '!cir.array<!cir.int<s, 8> x 4>' of the global @.str
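To make the two lengths concrete, here is a minimal C++17 sketch of where 6 and 4 come from for this literal. The constant names are made up for illustration, and the mapping to the CIR types in the comments is an interpretation of the output above, not something taken from the ClangIR sources.

```cpp
#include <cstddef>
#include <string_view>

// The literal from the report: five written characters plus the
// terminator that the compiler appends implicitly.
constexpr char kLiteral[] = "null\0";

// Total storage; this appears to correspond to !cir.array<!s8i x 6>.
constexpr std::size_t kTotalSize = sizeof(kLiteral);

// Length up to the first zero byte; this appears to correspond to the
// "null" payload typed as !cir.array<!s8i x 4>.
constexpr std::size_t kPayloadSize = std::string_view(kLiteral).size();

static_assert(kTotalSize == 6, "5 written chars + implicit terminator");
static_assert(kPayloadSize == 4, "the non-zero prefix \"null\"");
static_assert(kTotalSize - kPayloadSize == 2, "two trailing zero bytes");
```

The difference of 2 is what the trailing_zeros marker seems to stand for: the explicit \0 and the implicit terminator.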
Steps to reproduce:
- Create a .c file with the following code pasted in:
const char *funnyThing() {
  return "null\0";
}
- Compile it to ClangIR:
clang -S -Xclang -emit-cir-flat <just created file>.c
- Create another program that tries to parse CIR files:
#include <clang/CIR/Dialect/IR/CIRDialect.h>
#include <mlir/Parser/Parser.h>

int main(int argc, char *argv[]) {
  // Set up a context that knows about the CIR dialect.
  mlir::MLIRContext context;
  mlir::DialectRegistry registry;
  registry.insert<cir::CIRDialect>();
  context.appendDialectRegistry(registry);
  context.allowUnregisteredDialects();

  // Parse the emitted CIR file; a null module means parsing or verification failed.
  mlir::ParserConfig parseConfig(&context);
  auto module =
      mlir::parseSourceFile<mlir::ModuleOp>("<compiled file>", parseConfig);
  if (module.get() == nullptr) {
    return -1;
  }
  return 0;
}
- Compile and run it
Expected behaviour:
The program exits successfully.
Actual behaviour:
The program exits with a return code of -1 and prints the aforementioned error message to the console.
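As a quick textual check that does not require linking against MLIR, the emitted file can be scanned for const_array attributes that carry trailing_zeros. The sketch below is only a rough aid: findConstArrayLengths is a hypothetical helper, not part of ClangIR or MLIR, and it assumes the short element alias (such as !s8i) and the exact spelling shown in the Description.

```cpp
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper (not part of ClangIR or MLIR): looks for a
// #cir.const_array string attribute with trailing_zeros on one line of
// textual CIR and returns {inner length, outer length} if found.
// Assumes the short `!s8i`-style element alias seen in the report.
std::optional<std::pair<unsigned long, unsigned long>>
findConstArrayLengths(const std::string &line) {
  static const std::regex pattern(
      R"re(#cir\.const_array<"[^"]*" : !cir\.array<\S+ x (\d+)>, )re"
      R"re(trailing_zeros> : !cir\.array<\S+ x (\d+)>)re");
  std::smatch match;
  if (!std::regex_search(line, match, pattern))
    return std::nullopt;
  return std::make_pair(std::stoul(match[1].str()),
                        std::stoul(match[2].str()));
}
```

For the cir.global line quoted in the Description this yields the pair (4, 6). It only reports the two lengths so that affected globals can be located; it does not decide whether a given global will fail to parse.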
Additional notes:
- The same happens if you initialise a char array with a string literal of shorter length:
char notNull[25] = "null";
- Everything is parsed just fine if the null character is placed in the middle of the string, as in:
null\0null
- This behaviour was observed on a clangir repo that we slightly modified for our own needs, although none of the original source code was explicitly changed. It is, however, behind upstream.
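The three cases mentioned in this report differ in how many zero bytes end the array. The C++17 sketch below only tabulates those counts; countTrailingZeros and the constant names are hypothetical, and whether the count is the actual trigger inside ClangIR is an assumption that the report does not confirm.

```cpp
#include <cstddef>

// Hypothetical helper: number of consecutive zero bytes at the end of a
// character array of `size` bytes (terminator and padding included).
constexpr std::size_t countTrailingZeros(const char *data, std::size_t size) {
  std::size_t zeros = 0;
  while (zeros < size && data[size - 1 - zeros] == '\0')
    ++zeros;
  return zeros;
}

constexpr char kTrailingNull[] = "null\0";    // reported as failing
constexpr char kPadded[25] = "null";          // reported as failing
constexpr char kMiddleNull[] = "null\0null";  // reported as parsing fine

static_assert(countTrailingZeros(kTrailingNull, sizeof(kTrailingNull)) == 2);
static_assert(countTrailingZeros(kPadded, sizeof(kPadded)) == 21);
static_assert(countTrailingZeros(kMiddleNull, sizeof(kMiddleNull)) == 1);
```

Both failing cases end in more than one zero byte, while the case that parses ends in exactly one (the implicit terminator). That pattern may help narrow down where the inner and outer array types start to diverge.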