
feat(import) Add pagination, rate limiting, retry, and truncation warnings #518

Merged

tony merged 11 commits into master from import-pagination-rate-limit-retry on Feb 19, 2026

Conversation

tony (Member) commented Feb 18, 2026

Summary

  • --limit 0 means "no limit": Common CLI convention so users don't have to guess a large number when importing many repos
  • Retry with exponential backoff on HTTP 429: Rate-limited requests now automatically retry (up to 3 times) using Retry-After header or exponential backoff, instead of failing immediately
  • GitLab rate limit header logging: Log ratelimit-remaining/ratelimit-limit headers after each API request (GitHub already had this)
  • Truncation warnings when --limit caps results: Both GitLab and GitHub importers now warn when results are silently truncated, showing "Showing N of M repositories"
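The `--limit 0` convention above can be sketched with a minimal, hypothetical `argparse` setup (illustrative only, not vcspull's actual CLI wiring):

```python
import argparse
import sys

# Hypothetical parser showing the "--limit 0 means no limit" convention;
# the flag name matches the PR, everything else is a stand-in.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--limit",
    type=int,
    default=50,
    help="Maximum repositories to import (0 = no limit)",
)

args = parser.parse_args(["--limit", "0"])
# Convert the 0 sentinel to an effectively unbounded cap.
limit = sys.maxsize if args.limit == 0 else args.limit
```

Treating 0 as "unbounded" spares users from guessing an arbitrarily large number when importing many repositories.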

Changes

  • base.py: Add max_retries, retry_base_delay to HTTPClient; retry loop with _calculate_retry_delay() for 429s
  • gitlab.py: Add _log_rate_limit(), _warn_truncation(); capture response headers in pagination methods
  • github.py: Add truncation detection using total_count (search) and mid-page limit hit (user/org)
  • cli/import_cmd/_common.py: Allow limit=0 (converted to sys.maxsize) in ImportOptions
  • 9 files changed, 867 insertions(+), 39 deletions(-)

Test plan

  • --limit 0 validation: limit < 0 rejected, 0 converts to sys.maxsize
  • Retry: parametrized tests for 429 with Retry-After, exponential backoff, max retries exhausted
  • Rate limit logging: parametrized tests for all GitLab header scenarios
  • Truncation warnings: parametrized tests for both GitLab and GitHub providers
  • Integration verified: GitLab test groups populated with 5 repos each; --limit 1 shows truncation warning, --limit 0 fetches all 15

tony added 5 commits February 18, 2026 17:11
why: Users importing large numbers of repositories had to guess a high number
for --limit. Using 0 as "no limit" is a common CLI convention.
what:
- Change ImportOptions validation from limit < 1 to limit < 0
- Convert limit=0 to sys.maxsize in __post_init__()
- Update --limit help text to document 0 = no limit
- Update tests to reflect new validation boundary
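A standalone sketch of the validation change described above — the field name mirrors the PR's ImportOptions, but this dataclass is a simplified stand-in:

```python
import sys
from dataclasses import dataclass


@dataclass
class ImportOptions:
    """Stand-in for the PR's ImportOptions; only the limit field is shown."""

    limit: int = 50

    def __post_init__(self) -> None:
        # Validation boundary moved from `limit < 1` to `limit < 0`,
        # so 0 is now accepted as the "no limit" sentinel.
        if self.limit < 0:
            raise ValueError("limit must be >= 0 (0 means no limit)")
        if self.limit == 0:
            self.limit = sys.maxsize
```

Because the class is not frozen, the conversion is a plain attribute assignment in `__post_init__()`.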
why: Rate-limited requests previously failed immediately. Automatic retry
with backoff lets imports succeed through transient rate limits.
what:
- Add max_retries and retry_base_delay params to HTTPClient.__init__()
- Wrap HTTPClient.get() request in retry loop for HTTP 429 responses
- Add _calculate_retry_delay() using Retry-After header or exponential backoff
- Extend mock_urlopen to support HTTPError objects in response sequences
- Add parametrized retry tests and standalone backoff/cap tests
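The delay calculation can be sketched as follows — honor Retry-After when the server sends it, otherwise fall back to exponential backoff. The function name mirrors the PR's `_calculate_retry_delay()`, but the base delay and cap here are illustrative assumptions:

```python
from __future__ import annotations


def calculate_retry_delay(
    attempt: int,
    retry_after: str | None = None,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
) -> float:
    """Delay before retrying a 429 response, in seconds."""
    if retry_after is not None:
        try:
            # Server-provided hint wins, clamped to the cap.
            return min(float(retry_after), max_delay)
        except ValueError:
            pass  # non-numeric header; fall through to backoff
    # Exponential backoff: base * 2^attempt, clamped to the cap.
    return min(base_delay * (2**attempt), max_delay)
```

With these assumed defaults, attempts 0, 1, 2 wait 1 s, 2 s, 4 s; a `Retry-After: 30` header overrides the schedule.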
why: GitLab rate limit headers were silently ignored, providing no warning
before hitting the limit. GitHub already had this logging.
what:
- Add _log_rate_limit() method to GitLabImporter using ratelimit-remaining/limit
- Capture response headers in _paginate_repos() and _fetch_search()
- Call _log_rate_limit() after each API request
- Add parametrized tests for all header scenarios
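A minimal sketch of the header logging described above — the header names follow GitLab's ratelimit-remaining/ratelimit-limit, but the function itself is a stand-in, not the PR's `_log_rate_limit()`:

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)


def log_rate_limit(headers: dict[str, str]) -> str | None:
    """Log remaining/total rate limit from response headers, if present."""
    remaining = headers.get("ratelimit-remaining")
    limit = headers.get("ratelimit-limit")
    if remaining is None or limit is None:
        # Headers absent, e.g. rate limiting disabled on the instance.
        return None
    msg = f"Rate limit: {remaining}/{limit} requests remaining"
    log.debug(msg)
    return msg
```

Returning the formatted message (rather than only logging it) keeps the helper easy to parametrize in tests across header scenarios.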
why: Users saw "Found 100 repositories" with no indication that hundreds more
were silently truncated, leading to incomplete imports.
what:
- GitLab: Extract x-total and x-next-page headers to detect truncation
- GitLab: Add _warn_truncation() method with two warning variants
- GitHub search: Use total_count from JSON body to detect truncation
- GitHub user/org: Use mid-page limit hit as "more available" signal
- Add parametrized truncation warning tests for both providers
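The two warning variants can be sketched as one helper. Here `total` stands in for GitLab's x-total header or GitHub search's total_count; when only a "more available" signal exists (GitHub user/org mode), `total` is None. The function name is hypothetical:

```python
from __future__ import annotations


def truncation_warning(
    shown: int, total: int | None, more_available: bool
) -> str | None:
    """Return a warning string when results were truncated, else None."""
    if total is not None and shown < total:
        # Exact total known (x-total header or search total_count).
        return f"Showing {shown} of {total} repositories"
    if total is None and more_available:
        # No total available; only a mid-page limit hit signals more results.
        return f"Showing {shown} repositories; more may be available"
    return None
```

Keeping the None-total variant separate is what the later changelog fix reflects: GitHub user/org mode cannot print "Showing N of M" because no total count exists.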
why: PR #518 adds several bug fixes and features that need changelog entries.
what:
- Document silent truncation fix for GitLab/GitHub results
- Document HTTP 429 retry with exponential backoff
- Document --limit 0 as "no limit" convention
- Document GitLab rate-limit header logging
codecov bot commented Feb 18, 2026

Codecov Report

❌ Patch coverage is 86.59794% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.45%. Comparing base (2c5c16c) to head (8e6163b).
⚠️ Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
src/vcspull/_internal/remotes/base.py 78.04% 8 Missing and 1 partial ⚠️
src/vcspull/_internal/remotes/gitlab.py 90.24% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #518      +/-   ##
==========================================
+ Coverage   83.01%   83.45%   +0.44%     
==========================================
  Files          29       29              
  Lines        3556     3633      +77     
  Branches      705      723      +18     
==========================================
+ Hits         2952     3032      +80     
+ Misses        388      386       -2     
+ Partials      216      215       -1     


tony added 6 commits February 18, 2026 18:06
why: When count reached the limit on the last item of a full page,
the for-loop ended naturally — the count guard never fired, so
more_available stayed False and no truncation warning was emitted.

what:
- Add post-loop boundary check in _paginate_repos
- Add test_github_truncation_at_page_boundary covering the edge case
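A minimal sketch of the boundary bug and its fix, assuming an in-memory page fetcher — names and structure are illustrative, not the PR's exact `_paginate_repos` code:

```python
from __future__ import annotations

from typing import Callable


def paginate(
    fetch_page: Callable[[int], list[str]],
    limit: int,
    per_page: int,
) -> tuple[list[str], bool]:
    """Collect up to `limit` items; second element signals truncation."""
    collected: list[str] = []
    page_num = 1
    while True:
        page = fetch_page(page_num)
        for item in page:
            if len(collected) >= limit:
                # In-loop guard: limit hit mid-page, so more items exist.
                return collected, True
            collected.append(item)
        if len(page) < per_page:
            # Short page: the API is exhausted, nothing was truncated.
            return collected, False
        # Post-loop boundary check (the fix): limit reached exactly on the
        # last item of a full page. Without this, the in-loop guard never
        # fires and the truncation warning is silently skipped.
        if len(collected) >= limit:
            return collected, True
        page_num += 1
```

With `limit=2` and two-item pages, the limit lands exactly on a page boundary; only the post-loop check reports truncation.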
why: ImportOptions is not frozen, so object.__setattr__ is unnecessary.
what:
- Replace object.__setattr__(self, "limit", sys.maxsize) with self.limit = sys.maxsize
why: Every path through the retry loop already returns or raises.
The last_exc tracking and post-loop _handle_http_error call were
unreachable dead code.

what:
- Remove last_exc variable declaration and assignment
- Remove post-loop _handle_http_error(last_exc) call
- Keep minimal fallback raise required by mypy (cannot statically
  prove range(max_retries + 1) is non-empty)
why: The comment "Capture pagination metadata from first page" was
misleading because x-next-page is updated on every page, not just
the first. Only x-total is captured from the first response.

what:
- Change comment to "Capture x-total from first response" at both
  locations (_fetch_search and _paginate_repos)
why: The changelog claimed all providers print the same message format,
but GitHub user/org mode uses "more may be available" (no total count)
rather than "Showing N of M".

what:
- Replace specific message quote with generic description
…able

why: Live testing of all permutations revealed two undocumented behaviors:
slash-notation for direct subgroup targeting and the workspace nesting
rules relative to target depth.

what:
- Add "Subgroup targeting" section with slash notation examples
- Add workspace structure table covering all target/flatten combinations
- Clarify that --flatten-groups is a no-op when target is already a leaf
tony merged commit e9db410 into master Feb 19, 2026
9 checks passed
tony deleted the import-pagination-rate-limit-retry branch February 19, 2026 00:37