Skip to content

Three bugs in yytables_fload (binary tables-file loader) #730

@MarkLee131

Description

@MarkLee131

Found these in src/cpp-flex.skl while writing a libFuzzer harness against the runtime tables loader. All three trigger from yytables_fload(fp, scanner) (the public API a consumer calls when %option tables-file= is used). Reproduced against the unmodified loader in master @ 13c8018; same code is present in 2.6.4 unchanged.

Two heap-OOB reads and one integer-driven multi-GB allocation. ASAN traces below; PoC bytes are at the bottom of this issue.

1. heap-OOB read in strlen (yytbl_hdr_read)

src/cpp-flex.skl:4010

th->th_name = th->th_version + strlen (th->th_version) + 1;

th_version was just fread'd (line 4002) into a yyalloc(bytes) buffer (line 3997), where bytes = th_hsize - 14 is set at line 3996. Nothing guarantees those bytes contain a NUL. The header sanity check at line 3990 only requires hsize >= 16, so the smallest payload is 2 bytes. With those 2 bytes set to 'A','A', strlen walks past the heap allocation.

ASAN with a 16-byte input file:

==58898==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000d2
READ of size 3 at 0x6020000000d2 thread T0
    #0 strlen
    #1 cl_tables_fload  lex.cl_.c:2701
    #2 main             poc_strlen_oob.c:101

0x6020000000d2 is located 0 bytes after 2-byte region
allocated by thread T0 here:
    #0 malloc
    #1 cl_tables_fload  lex.cl_.c:2701

(The lex.cl_.c:2701 frame is the public cl_tables_fload after Apple clang inlined yytbl_hdr_read and yytbl_fload. Internal frames separate at -O0 -fno-inline.)

2. heap-OOB read in strcmp (yytbl_fload)

src/cpp-flex.skl:4278

if (strcmp(th.th_name, key) != 0) {

Different root cause from #1. Even when the version section IS NUL-terminated, th_name is set unconditionally to th_version + strlen + 1. If the version's NUL lands at the LAST byte of the buffer, th_name points just past the heap region, and strcmp walks past it. PoC: 16-byte input where hsize=16, payload is 'A','\0'. ASAN reports READ of size 1 past a 2-byte region, attributed to cl_tables_fload.

The fix for #1 and #2 is the same shape: validate that th_version and th_name each find a NUL inside the buffer. Roughly:

size_t vlen = strnlen(th_version, bytes);
if (vlen >= bytes - 1) { yyfree(th_version,...); return -1; }
size_t nlen = strnlen(th_version + vlen + 1, bytes - vlen - 1);
if (nlen >= bytes - vlen - 1) { yyfree(th_version,...); return -1; }
th_name = th_version + vlen + 1;

3. integer-driven multi-GB allocation (yytbl_data_load)

src/cpp-flex.skl:4070-4072

if ((td.td_flags & YYTD_STRUCT)) {
    bytes = sizeof(struct yy_trans_info) * td.td_lolen * (td.td_hilen ? td.td_hilen : 1);
} else {
    bytes = td.td_lolen * (td.td_hilen ? td.td_hilen : 1) * dmap->dm_sz;
}
... yyalloc(bytes, ...);

td_lolen and td_hilen are flex_uint32_t read straight from the file, no validation. A 50-byte tables file with td_lolen = 0xF1238000 makes the loader call malloc(8091271168) (7.54 GB) for one record. On any consumer with realistic memory limits (containers, embedded, mobile) this is fatal: either the OOM-killer takes the process or malloc returns NULL and YY_FATAL_ERROR("out of dynamic memory in yytbl_data_load()") calls exit().

There's already a precedent fix for the build-time tool's allocators in commit 9c54eb6 build: detect overflow for [re]allocate_array. The runtime loader was never given the same guard. A two-line clamp before the multiplication would mirror that commit.

Reproduction

PoCs are standalone C main() programs that link the unmodified lex.cl_.c from flex --tables-file=cl.tab sample.l:

clang -g -O1 -fno-omit-frame-pointer \
      -fsanitize=address,undefined -fno-sanitize-recover=undefined \
      poc_strlen_oob.c lex.cl_.c -o poc_strlen_oob
ASAN_OPTIONS=detect_leaks=0:halt_on_error=1 ./poc_strlen_oob

All three reproduce in <10ms through the public cl_tables_fload (≡ yytables_fload). PoC files attached. Affects only scanners generated with %option tables-file= or --tables-file=.

Suggested fixes

  1. and 2.: replace strlen with bounded strnlen and verify both NULs fall inside the buffer (snippet above).
  2. clamp td_lolen and td_hilen to a sane upper bound before the multiplication, in the same spirit as 9c54eb6. Something like if (td_lolen > YYTBL_MAX_DIM || td_hilen > YYTBL_MAX_DIM) return -1; with YYTBL_MAX_DIM set to whatever real scanners actually need (a few million is more than enough; 32k DFA-states is the documented limit).

Happy to send a PR covering all three if you'd like.

Found via OSS-Fuzz-shaped libFuzzer harness, expanding what the upstream OSS-Fuzz projects/flex/ integration covers (the upstream flex-patch.diff neuters the m4 fork, which incidentally also keeps the loader off the fuzzed code path; that's why none of these have surfaced from continuous fuzzing).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions