fix: Handle various HTML encodings in documentation parsing #58
Summary
Fixes encoding errors that occur when parsing Home Manager and nix-darwin documentation served with non-UTF-8 encodings.
Problem
Users were encountering intermittent errors like:
This happened because:
- resp.text forced Python's requests library to decode with the declared encoding, which could fail
Solution
Changed BeautifulSoup(resp.text, "html.parser") to BeautifulSoup(resp.content, "html.parser"):
- resp.content provides raw bytes instead of pre-decoded text
Changes
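The difference between handing a parser pre-decoded text and handing it raw bytes can be sketched with the standard library alone. The helper below, decode_html, is a hypothetical and much simplified stand-in for the kind of encoding detection an HTML parser can perform on bytes. It is not BeautifulSoup's actual algorithm and not code from mcp_nixos, and the sample page is invented for illustration.

```python
import re

# Looks for a charset declaration inside a <meta> tag (simplified on purpose).
META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([\w-]+)', re.IGNORECASE)


def decode_html(raw: bytes, fallbacks=("utf-8", "windows-1252")) -> str:
    """Hypothetical helper: decode HTML bytes using in-document hints first."""
    candidates = []
    match = META_CHARSET.search(raw[:2048])
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next candidate
    # Last resort: never raise, substitute undecodable bytes instead.
    return raw.decode("utf-8", errors="replace")


# Assumption for the example: a windows-1252 page whose declared (header)
# encoding is wrongly UTF-8.
raw = (
    '<html><head><meta charset="windows-1252"></head>'
    "<body>caf\u00e9</body></html>"
).encode("windows-1252")

try:
    raw.decode("utf-8")  # trusting the wrong declared encoding, strictly
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

text = decode_html(raw)  # working from the raw bytes instead
```

Keeping the bytes intact until the parser sees them means a wrong or missing declared encoding no longer decides the outcome on its own.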
Core Fix
- mcp_nixos/server.py: Use resp.content instead of resp.text in parse_html_options()
Test Coverage
Added comprehensive encoding tests:
Test Updates
Updated all existing tests to use mock_resp.content (bytes) instead of mock_resp.text (string):
- tests/test_server.py - 9 tests updated + 3 new encoding tests
- tests/test_edge_cases.py - 3 tests updated
- tests/test_evals.py - 1 test updated
- tests/test_flakes.py - 7 tests updated
- tests/test_mcp_tools.py - 1 test updated
- tests/test_plain_text_output.py - 5 tests updated
Testing
✅ All 344 tests passing
✅ Code formatted with ruff
✅ All linting checks pass
✅ Type checking passes (mypy)
Test the fix
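A minimal, self-contained check in the spirit of the updated tests might look like the sketch below. The function page_text, the URL, and the sample markup are hypothetical stand-ins chosen for illustration. They are not taken from mcp_nixos, and the project's real tests exercise its own parsing code instead.

```python
from unittest.mock import Mock


def page_text(session, url: str) -> str:
    """Hypothetical stand-in for code that fetches a page and reads raw bytes."""
    resp = session.get(url, timeout=10)
    resp.raise_for_status()
    # Read resp.content (bytes) rather than resp.text, so decoding happens
    # here instead of being driven by the declared encoding.
    return resp.content.decode("utf-8", errors="replace")


# Mock response that exposes bytes through .content, as the updated tests do.
mock_resp = Mock()
mock_resp.content = "<dt>programs.git.enable</dt>".encode("utf-8")
mock_resp.raise_for_status = Mock()

mock_session = Mock()
mock_session.get.return_value = mock_resp

result = page_text(mock_session, "https://example.invalid/options.xhtml")
```

Setting mock_resp.content to a bytes value keeps the mock faithful to what a real response object provides, so the code under test sees the same type in tests as in production.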
Impact
This is a bug fix with no breaking changes. The fix makes the MCP server more robust when fetching documentation from various sources with different encoding configurations.