Incorrect handling of backslashes in dict keys #280

Open
AlexMedia opened this issue Jan 8, 2020 · 2 comments
Labels
component: encoder Related to serialising in `toml.dump` syntax: strings Related to string literals type: bug A confirmed bug or unintended behavior

Comments

@AlexMedia

Using toml 0.10.0 installed via pip, I run into some issues when dealing with dictionaries with backslashes in the keys.

I've used this snippet of code to create the initial TOML file:

import toml

obj = {'a\\': {'key': 'value'}}
with open("target.toml", "w") as f:
    toml.dump(obj, f)

Which produces the following TOML file:

["a\"]
key = "value"

When I try to read the file again, using the following snippet of Python...

import toml
with open("target.toml", "r") as r:
    toml.load(r)

I get this error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/usr/local/lib/python3.7/site-packages/toml/decoder.py", line 134, in load
    return loads(f.read(), _dict, decoder)
  File "/usr/local/lib/python3.7/site-packages/toml/decoder.py", line 301, in loads
    raise TomlDecodeError("Unbalanced quotes", original, i)
toml.decoder.TomlDecodeError: Unbalanced quotes (line 1 column 7 char 6)

According to the TOML spec, the issue is that the backslash in the key (a\) was not escaped when the TOML file was written. As a result, the decoder treats the quote character following the backslash as an escaped quote that is still part of the key, not as the end of the key.
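
To illustrate what the encoder would need to do for quoted keys, here is a minimal sketch of basic-string escaping as I read the TOML spec. The helper name `escape_basic_string` is hypothetical and is not part of the toml package; this is not how the library is implemented.

```python
def escape_basic_string(s: str) -> str:
    """Return s as a quoted TOML basic string (hypothetical helper, not toml's API)."""
    # Characters that have a short escape form in TOML basic strings.
    short = {
        "\\": "\\\\",
        '"': '\\"',
        "\b": "\\b",
        "\t": "\\t",
        "\n": "\\n",
        "\f": "\\f",
        "\r": "\\r",
    }
    out = []
    for ch in s:
        if ch in short:
            out.append(short[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Remaining control characters use the \uXXXX form.
            out.append("\\u%04X" % ord(ch))
        else:
            out.append(ch)
    return '"' + "".join(out) + '"'


# The key from this report is the two characters: a, backslash.
header = "[" + escape_basic_string("a\\") + "]"
print(header)  # prints ["a\\"]
```

With the backslash doubled, the closing quote of the table header is no longer swallowed by the escape.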

If I manually escape the backslash in the TOML file, a second issue pops up when reading the file:

import toml
with open("target.toml", "r") as r:
    obj = toml.load(r)

for k in obj.keys():
    print(k)

This leads to the following output being printed:
a\\

As you can see, the double backslash in the TOML file has been read back as two separate characters. It should have been decoded as a single backslash.
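
The decoding I would expect can be sketched as a left-to-right scan over the text between the quotes, consuming each escape sequence exactly once. The function name `unescape_basic_string` is hypothetical, and this is only a sketch of my reading of the spec, not the decoder in the toml package.

```python
import string


def unescape_basic_string(body: str) -> str:
    """Decode the text between the quotes of a TOML basic string (hypothetical helper)."""
    simple = {"b": "\b", "t": "\t", "n": "\n", "f": "\f", "r": "\r", '"': '"', "\\": "\\"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash at end of string")
        esc = body[i + 1]
        if esc in simple:
            # Two source characters (backslash + letter) become one decoded character.
            out.append(simple[esc])
            i += 2
        elif esc in "uU":
            # \uXXXX takes 4 hex digits, \UXXXXXXXX takes 8.
            width = 4 if esc == "u" else 8
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise ValueError("invalid unicode escape")
            out.append(chr(int(digits, 16)))
            i += 2 + width
        else:
            raise ValueError("reserved escape sequence: \\" + esc)
    return "".join(out)


key = unescape_basic_string("a\\\\")  # file content between the quotes: a, backslash, backslash
print(len(key))  # prints 2
```

The decoded key here has two characters, an `a` followed by a single backslash, which matches the key that was originally dumped.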

@dmerejkowsky

Got hit by the same bug. If you're looking for an alternative, you may want to take a look at https://pypi.org/project/tomlkit/

dmerejkowsky added a commit to your-tools/tbump that referenced this issue Feb 4, 2020
We were hit by this bug: uiri/toml#280

Since tomlkit is a dependency of poetry, it's less likely to
happen again.

This fixes tests on Windows, which involved the string `python\3.7.6\x64\python.exe`

Note that tomlkit uses its own classes for the contents of the TOML file, so

* we have to be more careful when modifying the contents in place
  (as in the hook tests)

* we can no longer put the regex directly into the parsed contents, and must
  do it when we instantiate the Config object
@cur33

cur33 commented Feb 5, 2020

@dmerejkowsky I got hit by this too; I was going to work around it, but I see that tomlkit is style-preserving, which I was also about to work around using toml. If tomlkit is stable enough, that will likely save me a lot of time. Thanks for the recommendation!

facebook-github-bot pushed a commit to facebook/sapling that referenced this issue Apr 21, 2020
Summary:
On Windows, path components are usually separated by '\', and since the
repository path is stored in a toml file, whatever character is after a '\',
will be escaped. In my case, this is followed by U (for C:\Users), and thus
toml expects the next characters to be an escaped unicode. That's obviously
not the case and thus EdenFS fails to parse the config, preventing me from
cloning fbsource.

Since Windows is perfectly fine with '/' as path separator, let's just
replace '\' with '/'.

The underlying bug appears to be in the toml Python code: uiri/toml#280

Manually trying some random path is pretty conclusive:
  (Pdb) toml.dumps({'foo': 'c:\\Users\\wez'})
  'foo = "c:\\\\Users\\\\wez"\n'
  (Pdb) toml.dumps({'foo': 'c:\\Users\\xavier'})
  'foo = "c:\\Users\\xavier"\n'

Reviewed By: chadaustin

Differential Revision: D21143545

fbshipit-source-id: 448471da12c253dd37680f6a28251a1e69850920
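
The workaround described in the commit message above (replacing '\' with '/') could look roughly like this in Python. The function name `to_forward_slashes` is hypothetical, and the actual change in facebook/sapling may be written differently or in another language.

```python
def to_forward_slashes(path: str) -> str:
    """Replace Windows backslash separators with forward slashes (hypothetical helper)."""
    return path.replace("\\", "/")


# With no backslashes left in the value, nothing in it needs escaping.
safe = to_forward_slashes("c:\\Users\\xavier")
print(safe)  # prints c:/Users/xavier
```

This avoids the escaping problem for paths only; keys or values that genuinely need a backslash are still affected by the bug.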
@pradyunsg pradyunsg added component: encoder Related to serialising in `toml.dump` type: bug A confirmed bug or unintended behavior syntax: strings Related to string literals labels Apr 20, 2022