Implements the OCR pipeline (task 04) with an httpx-based async HTTP client, exponential backoff retry logic, file validation, and structured response parsing into OCRResult. Adds ocr_endpoint to the config, httpx and pytest-asyncio dependencies, and 32 tests at 100% module coverage.

Closes #7

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
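The exponential backoff retry described above can be sketched as a small async helper. This is a minimal illustration under stated assumptions, not the code in src/docproc/ocr.py: the names retry_with_backoff, max_attempts, base_delay and context are hypothetical, and it uses only the standard library so it runs without the DeepFellow API. In the real client the retryable tuple would hold httpx exception types. Because httpx.TimeoutException is a subclass of httpx.TransportError, catching TransportError (as the follow-up commit does) still covers timeouts.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable
from typing import TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


class OCRError(Exception):
    """Hypothetical stand-in for the module's OCR error type."""


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    *,
    retryable: tuple[type[BaseException], ...],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    context: str = "<unknown file>",
) -> T:
    """Await operation(), retrying on `retryable` errors with exponential backoff.

    The wait after failed attempt n (1-based) is base_delay * 2 ** (n - 1), so with
    base_delay=0.5 the waits run 0.5 s, 1 s, 2 s, and so on. Errors that are not
    in `retryable` propagate immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retryable as exc:
            if attempt == max_attempts:
                # Final failure: log at ERROR and surface a single domain error.
                logger.error("OCR failed for %s after %d attempts: %s", context, attempt, exc)
                raise OCRError(f"OCR failed for {context} after {attempt} attempts") from exc
            delay = base_delay * 2 ** (attempt - 1)
            # Include the file in the warning so retries can be traced per document.
            logger.warning(
                "OCR attempt %d/%d for %s failed (%s); retrying in %.2fs",
                attempt, max_attempts, context, exc, delay,
            )
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Doubling the delay keeps the first retry cheap while backing off quickly under a sustained outage; a production version might also add random jitter so that concurrent clients do not retry in lockstep.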
- Broaden retry catch from TimeoutException to TransportError (ConnectError, ReadError, etc.)
- Validate 'pages' key exists in API response to prevent silent empty results
- Wrap KeyError/TypeError from malformed page entries into OCRError
- Handle response.json() decode failures with clear OCRError
- Read file bytes once before retry loop, wrap OSError into OCRError
- Use is_file() instead of exists() in _validate_file
- Add file context to retry warning logs, ERROR log on final failure
- Extract mock_ocr_client fixture to reduce test boilerplate
- Add tests for ConnectError retry, non-JSON response, malformed API response, missing page keys

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
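The file-handling changes in this commit (is_file() instead of exists(), reading the bytes once before the retry loop, and wrapping OSError into OCRError) might look roughly like this. A sketch only: validate_file and read_file_bytes are hypothetical names, and the PR's actual _validate_file signature is not shown here.

```python
from pathlib import Path


class OCRError(Exception):
    """Hypothetical stand-in for the module's OCR error type."""


def validate_file(path: Path) -> None:
    """Reject anything that is not an existing regular file (hypothetical helper)."""
    # exists() would also accept a directory; is_file() does not.
    if not path.is_file():
        raise OCRError(f"Not an existing regular file: {path}")


def read_file_bytes(path: Path) -> bytes:
    """Validate and read the file once, before any retry loop starts."""
    validate_file(path)
    try:
        return path.read_bytes()
    except OSError as exc:
        # e.g. PermissionError, or the file being removed after validation
        raise OCRError(f"Could not read {path}: {exc}") from exc
```

Reading up front means every retry sends identical bytes, and a transient network error never triggers another disk read.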
- Add pydantic.ValidationError to the caught exception types in _parse_response so invalid API values (e.g. page_number=0) are wrapped as OCRError instead of escaping as raw ValidationError
- Move OCRResult construction inside the try block for full coverage
- Add test for invalid page_number triggering OCRError

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
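The response-parsing hardening from the last two commits (non-JSON bodies, a missing 'pages' key, malformed page entries, and invalid values such as page_number=0) can be sketched with the standard library. Assumptions: the text field on each page is invented, the function takes the raw body string rather than an httpx response, and a dataclass whose __post_init__ raises ValueError stands in for the PR's pydantic model. The substitution is close in spirit because pydantic's ValidationError subclasses ValueError, but the real code catches pydantic.ValidationError explicitly.

```python
import json
from dataclasses import dataclass


class OCRError(Exception):
    """Hypothetical stand-in for the module's OCR error type."""


@dataclass(frozen=True)
class OCRPage:
    """Hypothetical page model; the PR uses pydantic, and `text` is an assumed field."""

    page_number: int
    text: str

    def __post_init__(self) -> None:
        # Plays the role of a pydantic constraint such as Field(ge=1).
        if self.page_number < 1:
            raise ValueError(f"page_number must be >= 1, got {self.page_number}")


@dataclass(frozen=True)
class OCRResult:
    pages: list[OCRPage]


def parse_response(body: str) -> OCRResult:
    """Turn a raw JSON body into an OCRResult, raising OCRError on any bad input."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError as exc:
        raise OCRError(f"OCR API returned a non-JSON body: {exc}") from exc

    # Without this check, a missing key could quietly produce an empty result.
    if not isinstance(data, dict) or "pages" not in data:
        raise OCRError("OCR API response has no 'pages' key")

    try:
        pages = [OCRPage(page_number=p["page_number"], text=p["text"]) for p in data["pages"]]
        # Built inside the try so a failure here is wrapped like any page error.
        return OCRResult(pages=pages)
    except (KeyError, TypeError, ValueError) as exc:
        # KeyError: missing field; TypeError: wrong shape (e.g. a page that is
        # not an object); ValueError: failed validation, standing in for
        # pydantic.ValidationError.
        raise OCRError(f"Malformed OCR API response: {exc!r}") from exc
```

Funnelling every failure mode into OCRError gives callers one exception type to handle, while `from exc` keeps the original cause available for debugging.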
Summary
- `src/docproc/ocr.py`: async OCR extraction via the DeepFellow easyOCR API with `httpx`
- `ocr_endpoint` field added to `DeepfellowConfig` and `config-example.yaml`
- `httpx>=0.28.0` as an explicit dependency; `pytest-asyncio>=0.25.0` as a dev dependency with `asyncio_mode = "auto"`
- Version bump `0.1.3` → `0.1.4`

Test plan
- `uv run ruff check src/ tests/`: passes
- `uv run ruff format --check src/ tests/`: passes
- `uv run ty check src/`: passes
- `uv run pytest`: 105 tests pass, 96.48% coverage (≥80%)

Closes #7
🤖 Generated with Claude Code