Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix "Error: invalid byte sequence in UTF-8" error when retrieving catalogues with invalid encoding from PuppetDB on Puppet 3 server #38

Merged
merged 12 commits into from Oct 14, 2020
Merged
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
9 changes: 9 additions & 0 deletions lib/puppet/catalog-diff/comparer.rb
Expand Up @@ -138,6 +138,15 @@ def str_diff(cont1, cont2)
sum2 = Digest::MD5.hexdigest(str2)
end

unless str1.valid_encoding?
Puppet::debug("content parameter in old resource has invalid #{str1.encoding} encoding.")
str1.encode!('UTF-8', 'UTF-8', :invalid => :replace)
end
unless str2.valid_encoding?
Puppet::debug("content parameter in new resource has invalid #{str2.encoding} encoding.")
str2.encode!('UTF-8', 'UTF-8', :invalid => :replace)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm used to dealing with this by forcing locales in the environment. Does this have any side effects, such as deteriorating characters in a C environment for ex?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a unit test to this PR that reproduces the problem. This test adds latin1 data to the content of a file and test to show the diff. Without this fix the test will fail with mentioned error on line 91.
Even if I try to run the test like LC_ALL=C pdk test unit --verbose it gives me an error without the fix so I don't think it can be solved only by setting environment variables. I set default encoding to UTF-8 in the test on line 86 and 87 to make the test run consistently independent of LC_ALL, so to test with other environment variables you might want to remove these lines, but it still failed for me without these while running with LC_ALL=C.
The fix passes all tests so I think it shouldn't deteriorate characters that has worked fine in earlier versions. The fix only replaces invalid characters with � in diffs where the program else would have crashed. So it makes the program more robust to handle other encodings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It could be interesting to have a wrapper function for that so we can just call:

validate_encoding(str1)
validate_encoding(str2)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored the code into that function now. Much better! Unit test still runs fine and I also tested diffing different branches on a Puppet 5 master as well as diffing between a Puppet 3 and 5 Master and it works fine. So this should be solved now! 🙂


return nil unless str1 && str2
return nil if sum1 == sum2

Expand Down