Add support for newlines, backslashes, trailing comments and unquoted UTF-8 #148

bbc2 · 2018-10-28T14:54:09Z

This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes "Cannot get multiline strings to work?" #89
* backslashes in values, fixes ".env file having '\' does not work properly" #112
* trailing comments, fixes "Parser does not support trailing comments" #141
* UTF-8 in unquoted values, fixes "Non-quoted UTF-8 isn't parsed correctly" #147

This supersedes a previous pull-request, #142, which would add support for
multiline values in Dotenv.parse but not in the CLI (dotenv get and dotenv set).

The internal change is significant but I have added a lot of test cases to reduce the risk of breaking anything. Previous test cases are still present, so I wouldn't expect any major backward incompatibility.

I have written detailed commit messages and made my code as clear as possible. Let me know if anything should be improved. I'll be happy to fix anything unsatisfactory.

coveralls · 2018-10-28T14:56:19Z

Coverage increased (+0.9%) to 90.244% when pulling 7bfa3d5 on bbc2:improve-parser into 3b7e60e on theskumar:master.

bbc2 · 2018-10-28T16:40:03Z

I added a test case so that coverage is kept at a high level. 📈

theskumar · 2018-10-31T11:49:06Z

This looks great! I'm happy to merge this in. Could you bring it up to date with master? Let me know so I can hold off on any further merges to master.

The conflict seems to be due to 43af2c5

bbc2 · 2018-10-31T14:56:59Z

I've just rebased; this should be good to go!

I noticed two small issues when rebasing, which I've fixed and added test cases for:

* Parsing could fail when a line contained only a comment.
* A special character in the name of a variable (legitimate but very unusual) would cause the parser to loop forever.


amancevice · 2018-12-03T14:18:08Z

Any plans to release soon with this patch included?

theskumar · 2018-12-05T06:56:54Z

Thanks @bbc2 and others for your patience. My schedule has been a bit busy. I'm updating the README etc. and will then make a release.

mungojam · 2019-01-04T11:30:52Z

This was a breaking change for us. We had an entry `output = "C:\temp"` in our .env file, which this change broke because the `\t` is now interpreted as a tab character. We didn't pick up on it initially because VS Code doesn't apply this new behaviour when parsing the .env file.
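To make the report above concrete, here is a minimal sketch of escape decoding in the spirit of the Stack Overflow answer linked in the PR. It is not the code from `dotenv/main.py`; the helper name `decode_escapes` and the exact set of escapes handled are assumptions made for illustration.

```python
import codecs
import re

# Hypothetical pattern: the escape sequences this sketch chooses to decode.
# Anything else (including non-ASCII text) is left untouched.
_ESCAPE_RE = re.compile(r"""\\(?:u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|[\\'"abfnrtv])""")


def decode_escapes(value):
    """Decode backslash escapes found in `value`, one match at a time."""
    def _decode(match):
        # Only the matched escape is passed to the codec, so characters
        # outside the match are never re-encoded.
        return codecs.decode(match.group(0), "unicode_escape")

    return _ESCAPE_RE.sub(_decode, value)


# The Windows path from the report: "\t" is read as an escape for a tab.
print(repr(decode_escapes(r"C:\temp")))
# Doubling the backslash keeps it literal under this sketch.
print(repr(decode_escapes(r"C:\\temp")))
```

Under this sketch, a value containing `\t` turns into a tab unless the backslash is doubled, which matches the symptom described above.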

OlegSmelov · 2019-01-04T14:00:15Z

That's interesting, I thought Bash would interpret \t as a tab character, but it doesn't:

$ a="1\t2"
$ echo $a
1\t2

$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

However, Zsh outputs a tab instead of \t. I guess the "correct" behavior is whatever most other similar libraries are doing.

mungojam · 2019-01-04T15:31:05Z

I've avoided the issue by removing the speech marks now. That seems to deal fine with spaces, so I shouldn't really need to use speech marks for file paths.

ChrisC413 · 2019-02-08T16:14:28Z

This was a breaking change for me. I use a local .env file to store a password for local testing, and the password contains a #.
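A simplified sketch of why trailing-comment support can truncate such a value. The regular expression and the function name `parse_binding_sketch` are hypothetical and much cruder than the parser in this PR; the only assumption carried over is that a `#` in an unquoted value starts a comment.

```python
import re

# Hypothetical single-line pattern: key, "=", then a double-quoted,
# single-quoted, or unquoted value, then an optional trailing comment.
_BINDING_RE = re.compile(
    r'''
    \s*(?P<key>[A-Za-z_][A-Za-z0-9_]*)\s*=\s*
    (?:
        "(?P<dq>(?:\\.|[^"\\])*)"   # double-quoted value
      | '(?P<sq>[^']*)'             # single-quoted value
      | (?P<raw>[^#\r\n]*)          # unquoted value, stops at a hash sign
    )
    \s*(?:\#.*)?$
    ''',
    re.VERBOSE,
)


def parse_binding_sketch(line):
    """Return (key, value) for one line, or None if it is not a binding."""
    match = _BINDING_RE.match(line)
    if match is None:
        return None
    for group in ("dq", "sq"):
        if match.group(group) is not None:
            return match.group("key"), match.group(group)
    return match.group("key"), match.group("raw").strip()


print(parse_binding_sketch("PASSWORD=abc#123"))    # value cut at the hash
print(parse_binding_sketch('PASSWORD="abc#123"'))  # quoting keeps the hash
```

In this sketch, quoting the value is what keeps the `#`; whether that is the right rule for python-dotenv is exactly the kind of question left open for the follow-up issue.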

bbc2 · 2019-02-12T10:12:41Z

I'm sorry it broke your use cases. I'll try to fix this soon. My plan:

Add test cases to ensure we don't break things again by accident.
Update the changelog to notify users that the new version can break some .env files.
Open an issue to determine how python-dotenv should parse and interpret those cases in the future.

… UTF-8 (theskumar#148)

* Fix deprecation warning for POSIX variable regex

This was also caught by Flake8 as:

./dotenv/main.py:19:2: W605 invalid escape sequence '\$'
./dotenv/main.py:19:4: W605 invalid escape sequence '\{'
./dotenv/main.py:19:8: W605 invalid escape sequence '\}'
./dotenv/main.py:19:12: W605 invalid escape sequence '\}'

* Turn get_stream into a context manager

This avoids the use of the `is_file` class variable by abstracting away the difference between `StringIO` and a file stream.

* Deduplicate parsing code and abstract away lines

Parsing .env files is a critical part of this package. To make it easier to change it and test it, it is important that it is done in only one place.

Also, code that uses the parser now doesn't depend on the fact that each key-value binding spans exactly one line. This will make it easier to handle multiline bindings in the future.

* Parse newline, UTF-8, trailing comment, backslash

This adds support for:

* multiline values (i.e. containing newlines or escaped \n), fixes theskumar#89
* backslashes in values, fixes theskumar#112
* trailing comments, fixes theskumar#141
* UTF-8 in unquoted values, fixes theskumar#147

Parsing is no longer line-based. That's why `parse_line` was replaced by `parse_binding`. Thanks to the previous commit, users of `parse_stream` don't have to deal with this change.

This supersedes a previous pull-request, theskumar#142, which would add support for multiline values in `Dotenv.parse` but not in the CLI (`dotenv get` and `dotenv set`).

The key-value binding regular expression was inspired by https://github.com/bkeepers/dotenv/blob/d749366b6009126b115fb7b63e0509566365859a/lib/dotenv/parser.rb#L14-L30

Parsing of escapes was fixed thanks to https://stackoverflow.com/questions/4020539/process-escape-sequences-in-a-string-in-python/24519338#24519338
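The move away from line-based parsing described in the commit message can be sketched as follows. This is an illustration only: the pattern and the name `iter_bindings` are hypothetical and do not reproduce the actual `parse_binding` or `parse_stream` code. The idea is to match one binding at a time against the whole text, starting from a position, so that a quoted value may span several lines.

```python
import re

# Hypothetical pattern for one binding. A double-quoted value may contain
# newlines; a bare value runs to the end of the line.
_BINDING = re.compile(
    r'\s*(?P<key>[A-Za-z_][A-Za-z0-9_]*)[ \t]*=[ \t]*'
    r'(?:"(?P<quoted>(?:\\.|[^"\\])*)"|(?P<bare>[^\r\n]*))'
    r'[ \t]*(?:\r\n|\n|\r|$)',
    re.DOTALL,
)


def iter_bindings(text):
    """Yield (key, value) pairs, consuming `text` binding by binding."""
    pos = 0
    while pos < len(text):
        match = _BINDING.match(text, pos)
        if match is None:
            # Not a binding here: skip to the next line so the loop
            # always makes progress and cannot spin forever.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline + 1
            continue
        if match.group("quoted") is not None:
            value = match.group("quoted")
        else:
            value = match.group("bare").strip()
        yield match.group("key"), value
        pos = match.end()  # always past at least the key, so pos advances


sample = 'A=1\nB="line1\nline2"\n# a comment\nC=3\n'
print(list(iter_bindings(sample)))
```

Because the consumer iterates over bindings rather than lines, callers do not need to know whether a binding spanned one line or several.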

bbc2 force-pushed the improve-parser branch 3 times, most recently from a5f4853 to 46d5a74, on October 28, 2018 16:36

theskumar added enhancement help wanted labels Oct 31, 2018

theskumar mentioned this pull request Oct 31, 2018

Allow multi-line ENV values #142

Closed

bbc2 force-pushed the improve-parser branch from 46d5a74 to 4f29088 on October 31, 2018 14:50

robinmitra mentioned this pull request Nov 5, 2018

PMI-753: Read Google auth credentials from environment variables alphagov/verify-performance-scripts#12

Merged

OlegSmelov reviewed Nov 14, 2018


bbc2 added 4 commits November 14, 2018 21:18

Turn get_stream into a context manager

950a0e7

This avoids the use of the `is_file` class variable by abstracting away the difference between `StringIO` and a file stream.
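A minimal sketch of the idea in this commit, assuming a helper that accepts either a `StringIO` or a path. The name `open_stream` and the fallback to an empty stream for a missing file are assumptions for illustration, not the actual `get_stream` implementation.

```python
import io
import os
from contextlib import contextmanager


@contextmanager
def open_stream(source):
    """Yield a readable text stream for a StringIO or a file path."""
    if isinstance(source, io.StringIO):
        # Already a stream: hand it over and leave closing to the owner.
        yield source
    elif os.path.isfile(source):
        # A real file: open it here and close it when the block exits.
        with io.open(source, encoding="utf-8") as stream:
            yield stream
    else:
        # Assumption of this sketch: a missing file reads as empty.
        yield io.StringIO("")


# Callers no longer need to check what kind of source they were given.
with open_stream(io.StringIO("A=1\n")) as stream:
    print(stream.read())
```

The caller uses the same `with` block in every case, which is what makes a flag such as `is_file` unnecessary.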

bbc2 force-pushed the improve-parser branch from 4f29088 to 7bfa3d5 on November 14, 2018 20:19

theskumar merged commit d33366c into theskumar:master Dec 5, 2018

bbc2 deleted the improve-parser branch December 5, 2018 21:13

bbc2 mentioned this pull request Dec 28, 2018

Option for Docker compatibility #92

Closed

bbc2 mentioned this pull request Feb 16, 2019

Breaking changes in 0.10.0 and future changes #170

Closed

bbc2 mentioned this pull request May 21, 2019

Refactor parser to fix inconsistencies #180

Merged

vorpal-buildbot mentioned this pull request Jul 27, 2020

Scheduled weekly dependency update for week 30 PennyDreadfulMTG/Penny-Dreadful-Tools#7568

Closed

vorpal-buildbot mentioned this pull request Aug 3, 2020

Scheduled weekly dependency update for week 31 PennyDreadfulMTG/Penny-Dreadful-Tools#7600

Closed
