Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mystery solved! A fix for the infamous 5 year old \x bug, that is driving users away. #422

Closed
JamesParrott opened this issue Apr 26, 2023 · 1 comment

Comments

@JamesParrott
Copy link

JamesParrott commented Apr 26, 2023

The test function test_bug_148 in /tests/test_api.py was incorrectly coded five years ago [^1]. This has prevented /uiri/toml's automated testing and CI from showing that the original fix for bug 148 was also incorrect (a notoriously difficult to understand block of code in encoder._dump_str) .

Guys, it's about time we put that right!

Over the last 5 years, numerous frustrated users have raised multiple separate issues to no avail [^3], which unbeknownst to them, related to this unfixed bug, which this flawed test enabled to persist undetected. Furthermore, a couple of projects [^4] have migrated to tomli-w. And the core Python developers decided against /uiri/toml, to read TOML natively in Python 3.11.

[^1] This gist of mine contains two test files that prove this, test_bug_148_original.py and test_bug_148_fixed.py

Firstly, in test_bug_148_original.py, the original test function has been abstracted into a factory returning a function
that simply applies that same test to any module containing a dumps function. It also contains a second factory, that
uses exactly the same string literals (and dictionaries) as in the original test_bug_148, but applies toml.loads to the
LHSs instead of toml.dumps to the RHSs of the three equations. [^5]

test_bug_148_original.py enables a fair comparison between toml, tomli-w, tomlkit and toml_tools using
the original test function test_bug_148. Running test_bug_148_original.py with pytest in a venv (with all the deps installed), one test passes, but all 7 other tests fail (8 in >= Python 3.11). toml passes its own test as would be expected. But not only do Python 3.11's native tomllib, its precursor tomli, tomli-w, tomlkit, and toml_tools (and my own fork of toml) all fail it, toml.loads fails the inverse version of this same test, that only toml.dumps passes.

I assert test_bug_148 was incorrectly coded and hope you agree. But how should it be fixed, or what should replace it?

Inside the second file in the gist, test_bug_148_correctly.py the code is much the same, but contains what I believe to be the correct string literals, to achieve what /uiri originally intended to test (that a Python string literal r'\x' is encoded in a TOML
quoted string literal r'\x's, even when (and only when) an even number of backslashes precedes it in its repr string). This
not only allows a fair comparison between toml, tomli/tomllib, tomlkit and toml_tools as before, but between toml.dumps and toml.loads.

Running this we find that 7 tests pass (8 in >= Python 3.11), and only toml.dumps fails its test function from the new factory (for encoders).

Who doesn't love the colourised output from pytest?:

image

Same results in Ubuntu 22.04 WSL, Python 3.10:
image

I believe this was an honest mistake on the part of /uiri . Considering how they chose to write the original bug fix by analysing strings almost manually, instead of using a regular expression, and how they used normal string literals instead of r-string literals containing half the number of \s in test_bug_148, perhaps their test function was a victim of the backslash plague.

I will submit PRs shortly that correct these issues, one containing test functions from the factories in test_bug_148_correctly.py and one containing the fix I wrote for my fork (toml_tools -I have my own specific requirements (Iron Python) that mean I do need my own fork of toml).

[^3] These issues include:
#411
#404
#261
#201
and more

[^4] Projects that have migrated from toml to tomli-w et al:
davidfokkema/tailor@ece266a
gramineproject/gramine@5619486
remarshal-project/remarshal#19 (comment)
dictation-toolbox/Caster@0ba75e1

[^5] Coding note/ apology. I could've defined the string literals as constants, to guaranteed the second factory uses the same values as the first. However I wanted to demonstrate to anyone reading that it contains exactly the same code as in /toml/tests/tests/test_api.py:test_bug_148

FYI, in your own code, using pytest.mark.parametrize would achieve the same goal more concisely. But similarly, I wanted to be transparent, and to produce understandable test results in the command line output via the test functions' names.

@JamesParrott
Copy link
Author

Code from the gists for inspection, so you don't have to download random untrusted .py files:
test_bug_148_original.py

r"""  Commands to reproduce:
python -m venv test_bug_148
Posix: source test_bug_148/bin.activate.sh
Windows: .\test_bug_148\Scripts\activate.bat
pip install pytest toml tomli tomli-w tomlkit toml_tools
pytest --tb=no -rA -q test_bug_148_original.py

Expected on Windows 10 Python 3.12:
F.FFFFFFF                                                                                                                                                          [100%]
======================================================================== short test summary info ======================================================================== 
PASSED test_bug_148_original.py::test_bug_148_toml
FAILED test_bug_148_original.py::test_inverse_bug_148_toml - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomllib - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomli - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_tomlkit - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_inverse_bug_148_toml_tools - AssertionError: assert {'a': 'd'} == {'a': '\\x64'}
FAILED test_bug_148_original.py::test_bug_148_tomli_w - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
FAILED test_bug_148_original.py::test_bug_148_tomlkit - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
FAILED test_bug_148_original.py::test_bug_148_toml_tools - assert 'a = "\\u0064"\n' == 'a = "\\\\x64"\n'
8 failed, 1 passed in 0.11s


"""

import sys
if sys.version_info >= (3,11):
    import tomllib


import tomli
import tomli_w
import tomlkit
import toml
import toml_tools


def make_bug_148_test_function(toml_module):
    def test_bug_148():
        assert 'a = "\\u0064"\n' == toml_module.dumps({'a': '\\x64'})
        assert 'a = "\\\\x64"\n' == toml_module.dumps({'a': '\\\\x64'})
        assert 'a = "\\\\\\u0064"\n' == toml_module.dumps({'a': '\\\\\\x64'})
    return test_bug_148



def make_inverse_bug_148_test_function(toml_module):
    def test_bug_148_inverse():
        assert toml_module.loads('a = "\\u0064"\n') == {'a': '\\x64'}
        assert toml_module.loads('a = "\\\\x64"\n') == {'a': '\\\\x64'}
        assert toml_module.loads('a = "\\\\\\u0064"\n') == {'a': '\\\\\\x64'}
    return test_bug_148_inverse



test_inverse_bug_148_toml = make_inverse_bug_148_test_function(toml)
test_bug_148_toml = make_bug_148_test_function(toml)

# native in Python 3.11
try:
    test_inverse_bug_148_tomllib = make_inverse_bug_148_test_function(tomllib)
except NameError:
    pass

test_inverse_bug_148_tomli = make_inverse_bug_148_test_function(tomli)
test_inverse_bug_148_tomlkit = make_inverse_bug_148_test_function(tomlkit)
test_inverse_bug_148_toml_tools = make_inverse_bug_148_test_function(toml_tools)

test_bug_148_tomli_w = make_bug_148_test_function(tomli_w)
test_bug_148_tomlkit = make_bug_148_test_function(tomlkit)
test_bug_148_toml_tools = make_bug_148_test_function(toml_tools)

test_bug_148_fixed.py

r"""  Commands to reproduce:
python -m venv test_bug_148
Posix: source test_bug_148/bin.activate.sh
Windows: .\test_bug_148\Scripts\activate.bat
pip install pytest toml tomli tomli-w tomlkit toml_tools
pytest --tb=no -rA -q test_bug_148_fixed.py

Expected on Windows 10 Python 3.12:
.F.......                                                                                                                                                          [100%]
======================================================================== short test summary info ========================================================================
PASSED test_bug_148_correctly.py::test_inverse_bug_148_toml
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomllib
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomli
PASSED test_bug_148_correctly.py::test_inverse_bug_148_tomlkit
PASSED test_bug_148_correctly.py::test_inverse_bug_148_toml_tools
PASSED test_bug_148_correctly.py::test_bug_148_tomli_w
PASSED test_bug_148_correctly.py::test_bug_148_tomlkit
PASSED test_bug_148_correctly.py::test_bug_148_toml_tools
FAILED test_bug_148_correctly.py::test_bug_148_toml - assert 'a = "\\\\x64"\n' == 'a = "\\u0064"\n'
1 failed, 8 passed in 0.13s
"""


import sys
if sys.version_info >= (3,11):
    import tomllib


import tomli
import tomli_w
import tomlkit
import toml
import toml_tools


def make_correct_bug_148_test_function(toml_module):
    def test_bug_148():
        assert r'a = "\\x64"' + '\n' == toml_module.dumps({'a': r'\x64'})
        assert r'a = "\\\\x64"' + '\n' == toml_module.dumps({'a': r'\\x64'})
        assert r'a = "\\\\\\x64"' + '\n' == toml_module.dumps({'a': r'\\\x64'})

    # original from  
    #     assert 'a = "\\u0064"\n' == toml_module.dumps({'a': '\\x64'})
    #     assert 'a = "\\\\x64"\n' == toml_module.dumps({'a': '\\\\x64'})
    #     assert 'a = "\\\\\\u0064"\n' == toml_module.dumps({'a': '\\\\\\x64'})
    return test_bug_148



def make_correct_inverse_bug_148_test_function(toml_module):
    def test_bug_148_inverse():
        assert toml_module.loads(r'a = "\\x64"' + '\n') == {'a': r'\x64'}
        assert toml_module.loads(r'a = "\\\\x64"' + '\n') == {'a': r'\\x64'}
        assert toml_module.loads(r'a = "\\\\\\x64"' + '\n') == {'a': r'\\\x64'}
    return test_bug_148_inverse



test_inverse_bug_148_toml = make_correct_inverse_bug_148_test_function(toml)
test_bug_148_toml = make_correct_bug_148_test_function(toml)

# native in Python 3.11
try:
    test_inverse_bug_148_tomllib = make_correct_inverse_bug_148_test_function(tomllib)
except NameError:
    pass

test_inverse_bug_148_tomli = make_correct_inverse_bug_148_test_function(tomli)
test_inverse_bug_148_tomlkit = make_correct_inverse_bug_148_test_function(tomlkit)
test_inverse_bug_148_toml_tools = make_correct_inverse_bug_148_test_function(toml_tools)

test_bug_148_tomli_w = make_correct_bug_148_test_function(tomli_w)
test_bug_148_tomlkit = make_correct_bug_148_test_function(tomlkit)
test_bug_148_toml_tools = make_correct_bug_148_test_function(toml_tools)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant