Complicated regex somehow breaks shed --refactor #92

DRMacIver · 2023-04-28T11:30:36Z

Haven't looked into what's going on beyond to minimize the bug, but the following code:

import re

def tokenize_csv(csv_string: str, delimiter: str = ",", quote_char: str = '"') -> list:
    token_strings = re.split(fr"\{delimiter}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))?", csv_string)
    tokens = [re.sub(fr"{quote_char}(?=(?:(?:\{quote_char})*[^\\{quote_char}]))|\\", "", token) for token in token_strings]
    return tokens

Results in the following error:

Internal error formatting 'shedbug.py': ParserSyntaxError: Syntax Error @ 1:1.
tokenizer error: unterminated string literal

import re
^
    Please report this to https://github.com/Zac-HD/shed/issues

The text was updated successfully, but these errors were encountered:

Zac-HD · 2023-04-28T22:54:18Z

Reproduced with libcst andrf"\\", so moved upstream to Instagram/LibCST#917.

For local mitigation let's handle this as suggested in #93; we can have shed explicitly check if it's a libcst bug and report that if so.

Zac-HD closed this as completed Apr 28, 2023

Zac-HD mentioned this issue Apr 28, 2023

Proposal: shed should use ast.parse to validate code on errors #93

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Complicated regex somehow breaks shed --refactor #92

Complicated regex somehow breaks shed --refactor #92

DRMacIver commented Apr 28, 2023

Zac-HD commented Apr 28, 2023

Complicated regex somehow breaks shed --refactor #92

Complicated regex somehow breaks shed --refactor #92

Comments

DRMacIver commented Apr 28, 2023

Zac-HD commented Apr 28, 2023