
Improper handling of strings explicitly ending with null #1538

Open
@DanielELog

Description


Consider the string literal "null\0".
When code containing it is compiled to CIR, it turns into something like this (omitting less important parts):
cir.global <...> = #cir.const_array<"null" : !cir.array<!s8i x 4>, trailing_zeros> : !cir.array<!s8i x 6> ....
Note that the type inside const_array (!cir.array<!s8i x 4>) is shorter than the type of the global that contains it (!cir.array<!s8i x 6>).
When this global is later used, cir.get_global reports the pointee type as !cir.array<!s8i x 6>.
The compiler emits no warnings or errors at any point.

However, the problem arises when the compiled file is later parsed with MLIR tooling, namely mlir::parseSourceFile. Instead of returning the parsed module, it returns a null module and writes the following to stderr:
error: 'cir.get_global' op result type pointee type ''!cir.array<!cir.int<s, 8> x 6>'' does not match type '!cir.array<!cir.int<s, 8> x 4>' of the global @.str.

Steps to reproduce:

  • Create a .c file containing the following code:
const char *funnyThing() {
  return "null\0";
}
  • Compile it to ClangIR:
    clang -S -Xclang -emit-cir-flat <just created file>.c

  • Create a second program that parses the resulting CIR file:

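#include <clang/CIR/Dialect/IR/CIRDialect.h>
#include <mlir/IR/BuiltinOps.h>
#include <mlir/IR/MLIRContext.h>
#include <mlir/Parser/Parser.h>

int main(int argc, char *argv[]) {
  // Register the CIR dialect so that cir.* ops, types and attributes can be parsed.
  mlir::MLIRContext context;
  mlir::DialectRegistry registry;
  registry.insert<cir::CIRDialect>();
  context.appendDialectRegistry(registry);
  context.allowUnregisteredDialects();

  // Parse the .cir file produced by the clang invocation above.
  mlir::ParserConfig parseConfig(&context);
  auto module =
      mlir::parseSourceFile<mlir::ModuleOp>("<compiled file>", parseConfig);
  // On failure the diagnostics go to stderr and the returned module is null.
  if (!module) {
    return -1;
  }
  return 0;
}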
  • Compile and run it.

Expected behaviour:

The program exits successfully.

Actual behaviour:

The program exits with return code -1 and prints the error message quoted above to the console.

Additional notes:

  • The same happens if you initialise a char array with a shorter string literal:
    char notNull[25] = "null";

  • Parsing succeeds if the null character is placed in the middle of the string, as in "null\0null".

  • This behaviour was observed on a fork of the clangir repository that we slightly modified for our own needs. None of the original source code was changed, though the fork is behind upstream.
