Add support for newlines, backslashes, trailing comments and unquoted UTF-8 #148

Merged: 4 commits were merged on Dec 5, 2018.


bbc2 (Collaborator) commented Oct 28, 2018

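This adds support for:

  • multiline values (i.e. containing newlines or escaped \n), fixes #89
  • backslashes in values, fixes #112
  • trailing comments, fixes #141
  • UTF-8 in unquoted values, fixes #147

This supersedes a previous pull request, #142, which would add support for
multiline values in `Dotenv.parse` but not in the CLI (`dotenv get` and `dotenv set`).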

The internal change is significant but I have added a lot of test cases to reduce the risk of breaking anything. Previous test cases are still present, so I wouldn't expect any major backward incompatibility.

I have written detailed commit messages and made my code as clear as possible. Let me know if anything should be improved. I'll be happy to fix anything unsatisfactory.

coveralls commented Oct 28, 2018

Coverage increased (+0.9%) to 90.244% when pulling 7bfa3d5 on bbc2:improve-parser into 3b7e60e on theskumar:master.

bbc2 force-pushed the improve-parser branch 3 times, most recently from a5f4853 to 46d5a74, on October 28, 2018 16:36

bbc2 (Collaborator, Author) commented Oct 28, 2018

I added a test case so that coverage is kept at a high level. 📈

theskumar (Owner) commented:

This looks great! I'm happy to merge this in; could you bring it up to date with master? Let me know so I can hold up any further merges to master.

The conflict seems to be due to 43af2c5

bbc2 (Collaborator, Author) commented Oct 31, 2018

I've just rebased, this should be good to go!

I noticed two small issues when rebasing, which I've fixed and added test cases for:

  • Parsing could fail when a line would only contain a comment.
  • A special character in the name of a variable (it's legitimate but very unusual) would cause the parser to loop forever.
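
Both fixes come down to how a regex-driven parse loop advances through the text. The sketch below illustrates the second one under stated assumptions: the `_LINE` pattern and the `parse` function are hypothetical and much simpler than python-dotenv's parser. Because every part of the pattern is optional, it can succeed without consuming anything (for example at the `-` in `B-C=2`), and a loop that only does `position = match.end()` would then never terminate. The guard skips to the next line instead.

```python
import re

# Hypothetical, simplified pattern (not python-dotenv's). Every part is
# optional, so it can match zero characters, e.g. at the '-' in "B-C=2".
_LINE = re.compile(
    r"[ \t]*(?:([A-Za-z_]\w*)[ \t]*=[ \t]*([^#\r\n]*))?"  # optional KEY=value
    r"[ \t]*(?:#[^\r\n]*)?"                              # optional comment
    r"(?:\r\n|\n|\r)?"                                   # optional line break
)


def parse(text):
    """Return (key, value) pairs; comment-only and unparsable lines yield nothing."""
    bindings = []
    position = 0
    while position < len(text):
        match = _LINE.match(text, position)  # never None: "" always matches
        end = match.end()
        if end == position:
            # Zero-width match: `position` would never change and the loop
            # would spin forever, so skip to the start of the next line.
            newline = text.find("\n", position)
            end = len(text) if newline == -1 else newline + 1
        elif match.group(1) is not None:
            bindings.append((match.group(1), match.group(2).strip()))
        position = end
    return bindings


print(parse("# only a comment\nA=1\nB-C=2\nD=3 # trailing\n"))  # [('A', '1'), ('D', '3')]
```

Skipping the line is just one possible recovery; this sketch makes no claim about how python-dotenv itself treats keys containing special characters.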

* Fix deprecation warning for POSIX variable regex

This was also caught by Flake8 as:

    ./dotenv/main.py:19:2: W605 invalid escape sequence '\$'
    ./dotenv/main.py:19:4: W605 invalid escape sequence '\{'
    ./dotenv/main.py:19:8: W605 invalid escape sequence '\}'
    ./dotenv/main.py:19:12: W605 invalid escape sequence '\}'
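
W605 fires because `\$`, `\{` and `\}` are not valid escape sequences in an ordinary Python string literal; the usual fix is a raw string, so the backslashes reach the regex engine untouched. A minimal sketch, assuming the pattern is `\$\{[^\}]*\}` (consistent with the four flagged positions, but the actual line 19 is not shown in this thread); `POSIX_VARIABLE`, `find_references` and `compile_warnings` are illustrative names.

```python
import re
import warnings

# Assumed pattern; the real line 19 of dotenv/main.py is not shown here.
# Raw string: the regex engine sees \$, \{ and \} as escaped literals.
POSIX_VARIABLE = re.compile(r"\$\{[^\}]*\}")


def find_references(value):
    """Return every ${NAME} reference in a value."""
    return POSIX_VARIABLE.findall(value)


def compile_warnings(source):
    """Compile a Python expression and return the names of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "eval")
    return [w.category.__name__ for w in caught]


print(find_references("${HOME}/bin:${PATH}"))  # ['${HOME}', '${PATH}']
# Non-raw literal: DeprecationWarning on Python 3.6 to 3.11, SyntaxWarning on 3.12+.
print(compile_warnings("'\\$\\{[^\\}]*\\}'"))
# Raw literal: compiles cleanly.
print(compile_warnings("r'\\$\\{[^\\}]*\\}'"))  # []
```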

* Turn get_stream into a context manager

This avoids the use of the `is_file` class variable by abstracting away
the difference between `StringIO` and a file stream.
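
With a context manager, callers always write `with get_stream(...) as stream:` and never check whether they are responsible for closing the stream. A standalone sketch, assuming the source is either an `io.StringIO` or a file path; the function signature is illustrative, not python-dotenv's actual method.

```python
import io
from contextlib import contextmanager


@contextmanager
def get_stream(source):
    """Yield a readable text stream for an io.StringIO or a file path.

    Illustrative standalone sketch, not python-dotenv's actual method.
    """
    if isinstance(source, io.StringIO):
        # The caller owns the in-memory stream, so it is left open.
        yield source
    else:
        # A file opened here is closed when the caller's `with` block exits.
        with open(source, encoding="utf-8") as stream:
            yield stream


buffer = io.StringIO("A=1\n")
with get_stream(buffer) as stream:
    print(repr(stream.read()))  # 'A=1\n'
print(buffer.closed)  # False
```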

* Deduplicate parsing code and abstract away lines

Parsing .env files is a critical part of this package.  To make it
easier to change it and test it, it is important that it is done in only
one place.

Also, code that uses the parser now doesn't depend on the fact that each
key-value binding spans exactly one line.  This will make it easier to
handle multiline bindings in the future.
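
One way to picture this: a single generator owns the file syntax and yields bindings, and every caller (loading all values, looking up one key) only iterates over those bindings. In the sketch below, `Binding`, `parse_stream`, `to_dict` and `get_value` are illustrative names, and the naive one-line-per-binding `parse_stream` is deliberately the only piece that would have to change once multiline values are supported.

```python
import io
from collections import namedtuple

# Illustrative names only; not python-dotenv's API.
Binding = namedtuple("Binding", ["key", "value"])


def parse_stream(stream):
    """The single place that knows the syntax (here: naive KEY=VALUE lines)."""
    for line in stream:
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        yield Binding(key.strip(), value.strip())


def to_dict(stream):
    """Caller 1: load every binding; a later binding overrides an earlier one."""
    return {binding.key: binding.value for binding in parse_stream(stream)}


def get_value(stream, key):
    """Caller 2: look up one key with the same override rule as to_dict."""
    value = None
    for binding in parse_stream(stream):
        if binding.key == key:
            value = binding.value
    return value


text = "# comment\nA=1\nB = two\nA=3\n"
print(to_dict(io.StringIO(text)))         # {'A': '3', 'B': 'two'}
print(get_value(io.StringIO(text), "A"))  # 3
```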

* Parse newline, UTF-8, trailing comment, backslash

This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes theskumar#89
* backslashes in values, fixes theskumar#112
* trailing comments, fixes theskumar#141
* UTF-8 in unquoted values, fixes theskumar#147

Parsing is no longer line-based.  That's why `parse_line` was replaced
by `parse_binding`.  Thanks to the previous commit, users of
`parse_stream` don't have to deal with this change.
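
To illustrate what a position-based `parse_binding` buys over `parse_line`, here is a minimal sketch that matches one binding at a time from a position in the whole text, so a quoted value can span several lines and an unquoted value stops at a trailing comment. The regular expression is a simplified, hypothetical one written for this example, not python-dotenv's actual pattern, and escape decoding is left out.

```python
import io
import re
from collections import namedtuple

Binding = namedtuple("Binding", ["key", "value"])

# Hypothetical, simplified pattern for this sketch; not python-dotenv's regex.
_BINDING = re.compile(
    r"""
    [ \t]*(?:export[ \t]+)?           # leading whitespace, optional 'export'
    (?P<key>[^=\#\s]+)[ \t]*=[ \t]*   # key and equal sign
    (?P<value>
        '[^']*'                       # single-quoted value (may span lines)
      | "(?:\\.|[^"\\])*"             # double-quoted value (may span lines)
      | [^\#\r\n]*                    # unquoted value: stops at '#' or newline
    )
    [ \t]*(?:\#[^\r\n]*)?             # optional trailing comment
    (?:\r\n|\n|\r)?                   # end of the binding
    """,
    re.VERBOSE,
)


def parse_binding(text, position):
    """Parse one binding at `position`; return (Binding or None, next position)."""
    match = _BINDING.match(text, position)
    if match is None:
        # Blank line, comment-only line or something unparsable: skip the line.
        newline = text.find("\n", position)
        return None, len(text) if newline == -1 else newline + 1
    value = match.group("value")
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]  # strip quotes; escape decoding is left out here
    else:
        value = value.strip()
    return Binding(match.group("key"), value), match.end()


def parse_stream(stream):
    """Yield bindings from a text stream without assuming one per line."""
    text = stream.read()
    position = 0
    while position < len(text):
        binding, position = parse_binding(text, position)
        if binding is not None:
            yield binding


sample = 'A=1 # note\nexport B="two\nlines"\n# comment only\nC=café\n'
for binding in parse_stream(io.StringIO(sample)):
    print(binding)
```

Because `parse_binding` returns the position where the next binding starts, `parse_stream` never needs to know how many lines a binding covered.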

This supersedes a previous pull-request, theskumar#142, which would add support for
multiline values in `Dotenv.parse` but not in the CLI (`dotenv get` and `dotenv
set`).

The key-value binding regular expression was inspired by
https://github.com/bkeepers/dotenv/blob/d749366b6009126b115fb7b63e0509566365859a/lib/dotenv/parser.rb#L14-L30

Parsing of escapes was fixed thanks to
https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python/24519338#24519338
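
The referenced Stack Overflow answer decodes only the escape sequences themselves instead of running the whole value through the `unicode_escape` codec, which would mangle non-ASCII text such as UTF-8 values. A reduced sketch of that approach, handling only single-character escapes (the answer also covers `\x`, `\u`, `\U`, `\N{...}` and octal escapes); `decode_escapes` is an illustrative name here.

```python
import codecs
import re

# Reduced version of the approach from the cited answer; names are illustrative.
# Matches a backslash followed by one of the single-character escape letters.
_ESCAPE_SEQUENCE = re.compile(r"""\\[\\'"abfnrtv]""")


def decode_escapes(value):
    """Decode escape sequences in `value`, leaving all other characters as-is."""
    def decode_match(match):
        # Each match is pure ASCII (e.g. '\\n'), so 'unicode-escape' is safe here.
        return codecs.decode(match.group(0), "unicode-escape")

    return _ESCAPE_SEQUENCE.sub(decode_match, value)


print(repr(decode_escapes("café\\tcrème")))  # 'café\tcrème'
print(repr(decode_escapes("C:\\\\temp")))     # 'C:\\temp'
```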
amancevice commented:

Any plans to release soon with this patch included?

theskumar merged commit d33366c into theskumar:master on Dec 5, 2018

theskumar (Owner) commented:

Thanks @bbc2 and others for your patience. I've had a rather busy schedule. I'm updating the README etc. and will then make a release.

bbc2 deleted the improve-parser branch on December 5, 2018 21:13

mungojam commented Jan 4, 2019

This was a breaking change for us. We had an entry `output = "C:\temp"` in our .env file, which this change broke because the `\t` is now interpreted as a tab character. We didn't pick up on it initially because VS Code doesn't apply this new behaviour when parsing the .env file.
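
Python's own string literals show the same effect, since `\t` is also a recognized escape there. As an analogy only (plain Python, not python-dotenv code), a double-quoted `"C:\temp"` holds a tab, while doubling the backslash or using a raw string keeps it literal:

```python
# Python-literal analogy only; not python-dotenv code.
path = "C:\temp"           # '\t' is decoded: C, :, TAB, e, m, p
escaped_path = "C:\\temp"  # doubled backslash: a literal backslash survives
raw_path = r"C:\temp"      # raw string: escapes are not processed at all

print(len(path), len(escaped_path), len(raw_path))  # 6 7 7
print("\t" in path, escaped_path == raw_path)       # True True
```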

OlegSmelov commented:

That's interesting, I thought Bash would interpret `\t` as a tab character, but it doesn't:

    $ a="1\t2"
    $ echo $a
    1\t2
    $ bash --version
    GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
    Copyright (C) 2016 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

    This is free software; you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.

However, Zsh's `echo` outputs a tab instead of `\t`. I guess the "correct" behavior is whatever most other similar libraries are doing.
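
For comparison with the shell session above, Python's `shlex` module (which only approximates POSIX shell quoting, not Bash exactly) keeps `\t` inside double quotes as a literal backslash followed by `t`, while a backslash before `"` is still treated as an escape:

```python
import shlex

# shlex approximates POSIX shell rules; it is not Bash.
# POSIX mode: inside double quotes, a backslash only escapes '"' and '\'.
print(shlex.split(r'a="1\t2"'))  # ['a=1\\t2']  (backslash and 't' kept)
print(shlex.split(r'a="1\"2"'))  # ['a=1"2']    (escaped quote)
```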

mungojam commented Jan 4, 2019

I've avoided the issue by removing the speech marks now. That seems to deal fine with spaces, so I shouldn't really need to use speech marks for file paths.

ChrisC413 commented:

This was a breaking change for me. I use a local .env file to store a password for local testing, and the password contains a `#`.

bbc2 (Collaborator, Author) commented Feb 12, 2019

I'm sorry it broke your use cases. I'll try to fix this soon. My plan:

  • Add test cases to ensure we don't break things again by accident.
  • Update the changelog to notify users that the new version can break some .env files.
  • Open an issue to determine how python-dotenv should parse and interpret those cases in the future.

johnbergvall pushed a commit to johnbergvall/python-dotenv that referenced this pull request Aug 13, 2021
… UTF-8 (theskumar#148)

7 participants