Surprising tokenization of f-strings #135251

Closed as not planned
Description

@nedbat

Bug report

Bug description:

Tokenizing an f-string with double braces produces tokens with single braces:

import tokenize, token

TEXT = b"f'{hello:.23f} this: {{braces}} done'"
# tokenize.tokenize expects a readline-style callable that yields the
# source one line (of bytes) at a time.
f = iter([TEXT]).__next__

for ty, st, _, _, _ in tokenize.tokenize(f):
    print(f"{token.tok_name[ty]}, {st!r}")

Running this with 3.12 shows:

ENCODING, 'utf-8'
FSTRING_START, "f'"
OP, '{'
NAME, 'hello'
OP, ':'
FSTRING_MIDDLE, '.23f'
OP, '}'
FSTRING_MIDDLE, ' this: {'
FSTRING_MIDDLE, 'braces}'
FSTRING_MIDDLE, ' done'
FSTRING_END, "'"
NEWLINE, ''
ENDMARKER, ''

Should the FSTRING_MIDDLE tokens have single braces? Will it stay this way? Are they guaranteed to be split at the braces as shown here, or might they become one FSTRING_MIDDLE token ' this: {braces} done'? To recreate the original source, is it safe to always double the braces found in an FSTRING_MIDDLE token, or are there edge cases I haven't thought of?

Related to nedbat/coveragepy#1980

CPython versions tested on:

3.12, 3.13, 3.14, CPython main branch

Operating systems tested on:

No response

Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), topic-parser, type-bug (An unexpected behavior, bug, or error)