feat(import): Add pagination, rate limiting, retry, and truncation warnings (#518)
Merged
Conversation
why: Users importing large numbers of repositories had to guess a high number for --limit. Using 0 as "no limit" is a common CLI convention.

what:
- Change ImportOptions validation from limit < 1 to limit < 0
- Convert limit=0 to sys.maxsize in __post_init__()
- Update --limit help text to document 0 = no limit
- Update tests to reflect new validation boundary
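The `limit=0` convention above might look like the following sketch. This is an illustrative subset of `ImportOptions`, not the project's actual code: the other fields and the error message are assumptions.

```python
import sys
from dataclasses import dataclass

@dataclass
class ImportOptions:
    """Illustrative subset of the real ImportOptions; other fields omitted."""
    limit: int = 0  # 0 means "no limit"

    def __post_init__(self) -> None:
        if self.limit < 0:
            raise ValueError("--limit must be 0 (no limit) or a positive integer")
        if self.limit == 0:
            # Substitute a practically infinite cap; the class is not frozen,
            # so a plain assignment suffices.
            self.limit = sys.maxsize
```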
why: Rate-limited requests previously failed immediately. Automatic retry with backoff lets imports succeed through transient rate limits.

what:
- Add max_retries and retry_base_delay params to HTTPClient.__init__()
- Wrap HTTPClient.get() request in retry loop for HTTP 429 responses
- Add _calculate_retry_delay() using Retry-After header or exponential backoff
- Extend mock_urlopen to support HTTPError objects in response sequences
- Add parametrized retry tests and standalone backoff/cap tests
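The delay calculation described above could be sketched as a standalone function. The name mirrors the commit's `_calculate_retry_delay()`, but the jitter and the 60-second cap are assumptions, not the project's actual values.

```python
import random
from typing import Optional

def calculate_retry_delay(attempt: int, retry_after: Optional[str] = None,
                          base_delay: float = 1.0, max_delay: float = 60.0) -> float:
    """Prefer the server's Retry-After header; otherwise exponential backoff."""
    if retry_after is not None:
        try:
            return min(float(retry_after), max_delay)
        except ValueError:
            pass  # Retry-After can also be an HTTP-date; fall back to backoff
    # Exponential backoff with a little jitter, capped at max_delay.
    return min(base_delay * (2 ** attempt) + random.uniform(0, 0.1), max_delay)
```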
why: GitLab rate limit headers were silently ignored, providing no warning before hitting the limit. GitHub already had this logging.

what:
- Add _log_rate_limit() method to GitLabImporter using ratelimit-remaining/limit
- Capture response headers in _paginate_repos() and _fetch_search()
- Call _log_rate_limit() after each API request
- Add parametrized tests for all header scenarios
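One way to sketch `_log_rate_limit()` is as a pure function that returns the warning text instead of logging it, which also makes it easy to test. The header names come from the commit; the low-quota threshold and message wording are assumptions.

```python
from typing import Optional

def rate_limit_warning(headers: dict) -> Optional[str]:
    """Return warning text when GitLab rate-limit headers show low quota."""
    remaining = headers.get("ratelimit-remaining")
    limit = headers.get("ratelimit-limit")
    if remaining is None or limit is None:
        return None  # this instance does not send rate-limit headers
    remaining, limit = int(remaining), int(limit)
    if remaining == 0:
        return f"Rate limit exhausted (0/{limit} requests remaining)"
    if remaining < limit // 10:  # assumed threshold: under 10% left
        return f"Rate limit low: {remaining}/{limit} requests remaining"
    return None
```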
why: Users saw "Found 100 repositories" with no indication that hundreds more were silently truncated, leading to incomplete imports.

what:
- GitLab: Extract x-total and x-next-page headers to detect truncation
- GitLab: Add _warn_truncation() method with two warning variants
- GitHub search: Use total_count from JSON body to detect truncation
- GitHub user/org: Use mid-page limit hit as "more available" signal
- Add parametrized truncation warning tests for both providers
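The two warning variants for GitLab could be sketched as follows. The `x-total` / `x-next-page` header names are from the commit; the exact message wording is an assumption.

```python
from typing import Optional

def truncation_warning(shown: int, headers: dict) -> Optional[str]:
    """Sketch of _warn_truncation(): exact count when x-total is present,
    otherwise a vaguer hint based on x-next-page."""
    total = headers.get("x-total")  # some instances omit this header
    if total is not None and int(total) > shown:
        return f"Showing {shown} of {total} repositories"
    if headers.get("x-next-page"):  # non-empty when more pages exist
        return f"Showing {shown} repositories; more may be available"
    return None
```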
why: PR #518 adds several bug fixes and features that need changelog entries.

what:
- Document silent truncation fix for GitLab/GitHub results
- Document HTTP 429 retry with exponential backoff
- Document --limit 0 as "no limit" convention
- Document GitLab rate-limit header logging
Codecov Report ❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #518 +/- ##
==========================================
+ Coverage 83.01% 83.45% +0.44%
==========================================
Files 29 29
Lines 3556 3633 +77
Branches 705 723 +18
==========================================
+ Hits 2952 3032 +80
+ Misses 388 386 -2
+ Partials 216 215 -1

☔ View full report in Codecov by Sentry.
why: When count reached the limit on the last item of a full page, the for-loop ended naturally — the count guard never fired, so more_available stayed False and no truncation warning was emitted.

what:
- Add post-loop boundary check in _paginate_repos
- Add test_github_truncation_at_page_boundary covering the edge case
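The boundary bug and its fix can be reproduced in a toy pagination loop. All names here are illustrative, not the project's actual code; a tiny page size keeps the example small.

```python
def paginate(pages, limit, page_size=2):
    """Toy reproduction of the page-boundary fix in _paginate_repos."""
    repos = []
    last_page_full = False
    for page in pages:
        last_page_full = len(page) == page_size
        for repo in page:
            if len(repos) >= limit:
                return repos, True  # limit hit mid-page: more are available
            repos.append(repo)
    # The fix: if the limit was reached exactly on the last item of a full
    # page, the in-loop guard above never fired, yet more results may exist.
    more_available = len(repos) >= limit and last_page_full
    return repos, more_available
```

Without the post-loop check, the boundary case (limit reached on the final item of a full page) would report `more_available=False` and suppress the truncation warning.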
why: ImportOptions is not frozen, so object.__setattr__ is unnecessary.

what:
- Replace object.__setattr__(self, "limit", sys.maxsize) with self.limit = sys.maxsize
why: Every path through the retry loop already returns or raises. The last_exc tracking and post-loop _handle_http_error call were unreachable dead code.

what:
- Remove last_exc variable declaration and assignment
- Remove post-loop _handle_http_error(last_exc) call
- Keep minimal fallback raise required by mypy (cannot statically prove range(max_retries + 1) is non-empty)
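The shape of the cleaned-up loop might look like this sketch, with the fetch callable and exception type standing in for the real HTTP machinery; names are assumptions.

```python
import time

class RateLimited(Exception):
    """Stands in for an HTTP 429 error in this sketch."""

def get_with_retry(fetch, max_retries: int = 3, base_delay: float = 1.0):
    """Simplified retry loop: every path through it returns or raises."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted; surface the 429 to the caller
            time.sleep(base_delay * (2 ** attempt))
    # Unreachable at runtime, but mypy cannot statically prove that
    # range(max_retries + 1) is non-empty, so keep a minimal fallback.
    raise AssertionError("unreachable")
```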
why: The comment "Capture pagination metadata from first page" was misleading because x-next-page is updated on every page, not just the first. Only x-total is captured from the first response.

what:
- Change comment to "Capture x-total from first response" at both locations (_fetch_search and _paginate_repos)
why: The changelog claimed all providers print the same message format, but GitHub user/org mode uses "more may be available" (no total count) rather than "Showing N of M".

what:
- Replace specific message quote with generic description
…able

why: Live testing of all permutations revealed two undocumented behaviors: slash-notation for direct subgroup targeting and the workspace nesting rules relative to target depth.

what:
- Add "Subgroup targeting" section with slash notation examples
- Add workspace structure table covering all target/flatten combinations
- Clarify that --flatten-groups is a no-op when target is already a leaf
Summary
- `--limit 0` means "no limit": common CLI convention so users don't have to guess a large number when importing many repos
- HTTP 429 responses are retried using the `Retry-After` header or exponential backoff, instead of failing immediately
- GitLab `ratelimit-remaining`/`ratelimit-limit` headers are logged after each API request (GitHub already had this)
- Warn when `--limit` caps results: both GitLab and GitHub importers now warn when results are silently truncated, showing "Showing N of M repositories"

Changes
- `base.py`: Add `max_retries`, `retry_base_delay` to `HTTPClient`; retry loop with `_calculate_retry_delay()` for 429s
- `gitlab.py`: Add `_log_rate_limit()`, `_warn_truncation()`; capture response headers in pagination methods
- `github.py`: Add truncation detection using `total_count` (search) and mid-page limit hit (user/org)
- `cli/import_cmd/_common.py`: Allow `limit=0` → `sys.maxsize` in `ImportOptions`

Test plan
- `--limit 0` validation: limit < 0 rejected, 0 converts to `sys.maxsize`
- Retry: `Retry-After` header honored, exponential backoff, max retries exhausted
- Truncation: `--limit 1` shows truncation warning, `--limit 0` fetches all 15