feat(composio): promote GitHub from catalog-only to native memory provider #2413
Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` / `clickup` providers, joining them as the fifth native memory-ingest provider in `composio/providers/`. Until now `github/mod.rs` declared itself a curated tool catalog only, with no native `ComposioProvider` implementation yet; this change closes that gap so the connected user's assigned GitHub issues stream into the Memory Tree on the periodic scheduler.

Implementation mirrors the ClickUp (tinyhumansai#2291, merged) and Linear (tinyhumansai#2402, in review) providers 1:1: same `SyncState` semantics, same `persist_single_item` ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.

## Sync model

1. `SyncState` load + daily budget gate.
2. Resolve viewer login via `GITHUB_GET_AUTHENTICATED_USER`.
3. Re-check budget after the probe (per CodeRabbit lesson on tinyhumansai#2291 / tinyhumansai#2402: never burn a list call when the probe just spent the last budget slot).
4. Page through `GITHUB_SEARCH_ISSUES` with `is:issue assignee:<login> sort:updated-desc`. GitHub's search caps results at 1000, so pagination is naturally bounded; we also stop early on cursor boundary or short page.
5. Per issue, persist as one memory document via `persist_single_item`. The composite `issue_id@updated_at` dedup key re-syncs edited issues.
6. Advance cursor to newest `updated_at`, record `last_sync_at_ms`, save state.

The transport-error path in the pagination loop persists state before returning the error (CodeRabbit lesson on tinyhumansai#2402: a flap mid-pagination must not roll back budget accounting).

## Privacy posture

`assignee:<viewer_login>` is constructed inside the provider, never accepted from a caller, so the privacy boundary can't be tunnelled around. This matches the discipline gmail / notion / clickup / linear already follow.

## Source-id convention

`composio-github-issue-<global_id>`. GitHub's `id` is globally unique across all of GitHub, so it's a stable upsert key. The document title surfaces the canonical `owner/repo#number` form (e.g. `GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way contributors refer to issues in conversation.

## Curated tool catalog

`GITHUB_CURATED` (already in `github/tools.rs` since the catalog-only era) is unchanged. `GITHUB_GET_AUTHENTICATED_USER` and `GITHUB_SEARCH_ISSUES` were already curated, which is the minimum the sync path needs. No new actions added.

## Files

Added:
- composio/providers/github/provider.rs: GitHubProvider impl (~415)
- composio/providers/github/sync.rs: payload helpers (~360)
- composio/providers/github/tests.rs: 18 unit tests (~165)

Modified:
- composio/providers/github/mod.rs: declare new modules + re-export provider; remove stale "no native impl yet" disclaimer
- composio/providers/mod.rs: `has_native_provider` arm + `native_provider_sync_interval` arm + new regression test `capability_matrix_includes_github_as_native_memory_provider` (`catalog_for_toolkit` already pointed at `github::GITHUB_CURATED`; no change needed there)
- composio/providers/registry.rs: `register_provider` in `init_default_providers`
- composio/providers/descriptions.rs: GitHub description updated to mention Memory Tree sync

## Verification

- `cargo check --lib`: clean (pre-existing warnings only)
- `cargo test --lib composio::providers::github`: 40/40 pass
- `cargo test --lib composio::providers`: 305/305 pass (no regression on gmail / notion / slack / clickup)
- `cargo test --lib capability_matrix_includes_github`: 1/1 pass
- `cargo fmt --check`: clean
- `cargo clippy --lib --no-deps`: no new warnings in `github/`

Closes tinyhumansai#2408
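The source-id, dedup-key, and search-query conventions described in this commit message could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the provider's actual code: the function names `source_id`, `dedup_key`, and `assigned_issues_query` are hypothetical, and representing the issue id as `u64` is an assumption.

```rust
/// Upsert key for one issue document, following the
/// `composio-github-issue-<global_id>` convention.
/// (Hypothetical helper name; `u64` id type is an assumption.)
fn source_id(global_id: u64) -> String {
    format!("composio-github-issue-{global_id}")
}

/// Composite dedup key `issue_id@updated_at`: an edited issue gets a
/// new `updated_at`, hence a new key, hence it is ingested again.
fn dedup_key(issue_id: u64, updated_at: &str) -> String {
    format!("{issue_id}@{updated_at}")
}

/// Search query scoped to the viewer. The only input is the login
/// resolved by the provider itself; there is deliberately no parameter
/// through which a caller could pass a different assignee.
fn assigned_issues_query(viewer_login: &str) -> String {
    format!("is:issue assignee:{viewer_login} sort:updated-desc")
}
```

Because the dedup key embeds `updated_at`, two syncs of an unchanged issue produce the same key, while any edit produces a different one.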
No actionable comments were generated in the recent review. 🎉
🚥 Pre-merge checks: ✅ 4 passed, ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)

`src/openhuman/composio/providers/github/mod.rs` (1)

24-25: ⚡ Quick win: Prefer `*_test.rs` wiring for extracted module tests.

Since tests were extracted, wire them via a sibling `*_test.rs` file to match the repository convention.

♻️ Suggested update

```diff
 #[cfg(test)]
-mod tests;
+#[path = "github_test.rs"]
+mod tests;
```

Also rename `src/openhuman/composio/providers/github/tests.rs` to `src/openhuman/composio/providers/github/github_test.rs`.

As per coding guidelines: "When extracting Rust tests out of an implementation file, prefer a sibling `*_test.rs` file wired in with `#[cfg(test)] #[path = "..._test.rs"] mod tests;`."
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/openhuman/composio/providers/github/mod.rs`:
- Around line 24-25: Replace the current inline test module wiring (the existing
#[cfg(test)] mod tests;) so it points to a sibling test file using a path
attribute and rename the extracted tests file accordingly: create/rename
src/openhuman/composio/providers/github/tests.rs to github_test.rs and change
the module declaration to use #[cfg(test)] with #[path = "github_test.rs"] mod
tests; so the tests are loaded from the sibling github_test.rs file instead of
inline.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4551cde9-34d5-4fa2-b112-369040495c1a
📒 Files selected for processing (7)
- src/openhuman/composio/providers/descriptions.rs
- src/openhuman/composio/providers/github/mod.rs
- src/openhuman/composio/providers/github/provider.rs
- src/openhuman/composio/providers/github/sync.rs
- src/openhuman/composio/providers/github/tests.rs
- src/openhuman/composio/providers/mod.rs
- src/openhuman/composio/providers/registry.rs
Status note for anyone scanning this surface:
If anyone has a parallel GitHub-provider implementation in progress or planned, please flag it in #2408 first so we can coordinate the scope split.
# Conflicts:
#   src/openhuman/composio/providers/github/mod.rs
#   src/openhuman/composio/providers/mod.rs
#   src/openhuman/memory_sync/composio/providers/github/provider.rs
#   src/openhuman/memory_sync/composio/providers/github/sync.rs
#   src/openhuman/memory_sync/composio/providers/github/tests.rs
Merge introduced duplicate settings.mascot.customGif* entries.
## Summary

- Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` / `clickup` providers, so GitHub becomes the fifth native memory-ingest provider in `composio/providers/`. Until now `github/mod.rs` declared itself a curated tool catalog only, with no native `ComposioProvider` implementation yet; this PR closes that gap.
- Mirrors the ClickUp and Linear providers: same `SyncState` semantics, same `persist_single_item` ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.
- The `assignee:<viewer_login>` qualifier is constructed inside the provider, never accepted from a caller, so the boundary can't be tunnelled around.

## Problem

`composio/providers/` already has working memory-ingest providers for gmail, notion, slack, clickup, and (pending merge of #2402) linear. `github/` was structurally one level lower: only `tools.rs` was populated, with no `ComposioProvider` impl. For an open-source AI assistant whose own contributors live on GitHub, that's the highest-leverage missing provider: every maintainer / reviewer / external contributor can dogfood it within minutes of merge.

## Solution

Promote `composio/providers/github/` to a full provider by adding the three companion files that the other native providers all have:

- `provider.rs`: `GitHubProvider` impl
- `sync.rs`: payload helpers
- `tests.rs`: unit tests

Plus five registration touchpoints (same shape ClickUp landed in #2291 and Linear in #2402):

- `composio/providers/mod.rs::has_native_provider`: add `"github"`.
- `composio/providers/mod.rs::native_provider_sync_interval`: add `"github" => …`.
- `composio/providers/mod.rs::catalog_for_toolkit`: no change, already points to `github::GITHUB_CURATED`.
- `composio/providers/registry.rs::init_default_providers`: add `register_provider(Arc::new(GitHubProvider::new()))`.
- `composio/providers/descriptions.rs`: update GitHub description to mention Memory Tree sync.

## Sync model (mirroring ClickUp / Linear)

1. `SyncState::load("github", connection_id)` from the shared KV store.
2. Daily budget gate (`DEFAULT_DAILY_REQUEST_LIMIT = 500`).
3. Resolve viewer login via `GITHUB_GET_AUTHENTICATED_USER`, needed for the `assignee:<login>` search filter.
4. Re-check budget after the probe, so a list call is never burned when the probe just spent the last budget slot (CodeRabbit lesson from #2291 / #2402).
5. Page through `GITHUB_SEARCH_ISSUES` with query `is:issue assignee:<viewer_login> sort:updated-desc`. GitHub's Search API caps results at 1000, so pagination is naturally bounded; we also stop early on cursor boundary or short page (`< per_page`).
6. Per issue, persist as one memory document via `persist_single_item`. Dedupe by composite `issue_id@updated_at` so edits re-ingest.
7. Advance cursor to newest `updated_at`, record `last_sync_at_ms`, save state.

The transport-error path in the pagination loop persists state before propagating the error (CodeRabbit's other lesson from #2402: a flap mid-pagination must not roll back budget accounting).
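The stop conditions in the pagination step can be illustrated with a small, self-contained sketch. It is not the code in `provider.rs`: the `Issue` struct, `StopReason` enum, and `collect_new_issues` function are hypothetical, the page fetcher is modelled as a closure instead of a Composio call, and treating an issue whose `updated_at` equals the cursor as already ingested is an assumption. It also assumes timestamps are uniform UTC strings such as `2024-05-01T12:00:00Z`, which compare correctly as plain strings.

```rust
/// Hypothetical stand-in for one search hit; only the fields the loop needs.
#[derive(Debug, Clone, PartialEq)]
struct Issue {
    id: u64,
    updated_at: String,
}

/// Page cap named in the PR description.
const MAX_PAGES_PER_SYNC: u32 = 20;

#[derive(Debug, PartialEq)]
enum StopReason {
    CursorBoundary,
    ShortPage,
    PageCap,
    BudgetExhausted,
}

/// Walks pages sorted by `updated_at` descending and collects issues
/// newer than `cursor`. The budget is checked before every list call,
/// which also covers the re-check after the viewer probe.
fn collect_new_issues<F>(
    mut fetch_page: F,
    per_page: usize,
    cursor: Option<&str>,
    mut budget_left: u32,
) -> (Vec<Issue>, StopReason)
where
    F: FnMut(u32) -> Vec<Issue>,
{
    let mut fresh = Vec::new();
    for page in 1..=MAX_PAGES_PER_SYNC {
        if budget_left == 0 {
            return (fresh, StopReason::BudgetExhausted);
        }
        budget_left -= 1;
        let items = fetch_page(page);
        let fetched = items.len();
        for issue in items {
            // Sorted newest-first: the first issue at or before the
            // cursor means everything after it was seen by a prior sync.
            if cursor.map_or(false, |c| issue.updated_at.as_str() <= c) {
                return (fresh, StopReason::CursorBoundary);
            }
            fresh.push(issue);
        }
        if fetched < per_page {
            return (fresh, StopReason::ShortPage);
        }
    }
    (fresh, StopReason::PageCap)
}
```

With a page cap of 20 and a page size of 50, the loop can read at most 20 × 50 = 1000 issues, which lines up with the 1000-result Search cap mentioned above.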
## Source-id convention

`composio-github-issue-<global_id>`. GitHub's `id` is globally unique across all of GitHub, so it's a stable upsert key. The document title surfaces the canonical `owner/repo#number` form (e.g. `GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way contributors refer to issues in conversation.

## Curated tool catalog

`GITHUB_CURATED` (already in `github/tools.rs` from the catalog-only era) is unchanged. `GITHUB_GET_AUTHENTICATED_USER` and `GITHUB_SEARCH_ISSUES` were already curated (the minimum the sync path needs), so this PR adds zero new tool actions.

## Submission Checklist

- … `owner/repo#number` composition from both `repository.full_name` and `repository_url` shapes, viewer-login strict extractor that refuses to fall back to non-login fields, viewer-id metadata extraction), trait metadata stability, capability matrix registration (`capability_matrix_includes_github_as_native_memory_provider`), and `default_impl_matches_new` (observable equivalence, not a no-op test).
- … `sync()` async happy path (covered behind a `ComposioProviderContext` the existing test harness doesn't stand up; likewise, the gmail / notion / slack / clickup / linear tests don't exercise the live `sync()` end-to-end either). Helper layer is unit-tested directly.
- … `Closes #2408` in `## Related`.

## Impact

- … `SyncState` KV namespaces (`composio-sync-state` keyed by `(toolkit, connection_id)`), their registered tool catalogs, and all catalog-only toolkits are unchanged. The github tool catalog is unchanged; this PR only adds the missing provider impl around it.
- … `MAX_PAGES_PER_SYNC = 20`, `PAGE_SIZE = 50` steady-state (`100` for the initial backfill), `DailyBudget = 500 req/day`, and GitHub's own 1000-result Search cap. Per sync pass: 1 viewer probe + up to 10 search pages.
- … (`assignee:<viewer_login>`) prevents accidental ingest of other contributors' issues. The viewer-login extractor is strict (no fallback to `id` / `name` fields) so the filter can't silently scope to the wrong identity. Composio handles credentials; no new secret-handling code.

## Related

- … `composio/providers/clickup/` (feat(composio): add ClickUp provider for Memory Tree ingest #2291, merged), `composio/providers/linear/` (feat(composio): add Linear provider for Memory Tree ingest #2402, in review).
- … `composio/providers/traits.rs`.
- … `composio/providers/sync_state.rs`.

## AI Authored PR Metadata
### Linear Issue

### Commit & Branch

- … `feat/github-memory-provider`
- … `upstream/main` at fetch time

### Validation Run

- `pnpm --filter openhuman-app format:check`: Rust-only change.
- `pnpm typecheck`: Rust-only change.
- `cargo test --lib composio::providers::github` (40/40 pass; combines 18 in `tests.rs` and 22 in `sync.rs` inline tests); `cargo test --lib composio::providers` (305/305 pass; no regression on gmail / notion / slack / clickup or any catalog-only toolkit).
- `cargo fmt --check` clean; `cargo check --lib` clean (pre-existing warnings only); `cargo clippy --lib --no-deps` no new warnings in `composio/providers/github/`.
- … `app/src-tauri/src/**` changes.

### Validation Blocked

### Behavior Changes

- … `ConnectionCreated` hook.

### Parity Contract

- … `SyncState` KV namespaces are unchanged.
- `GITHUB_CURATED` (already in `github/tools.rs`) is byte-identical: no new tool actions added or removed.
- … `ComposioProvider` trait contract: daily budget, dedup-by-id, cursor-based pagination, idempotent `persist_single_item` upserts.
- `catalog_for_toolkit("github")` continues to return `github::GITHUB_CURATED` (was already wired that way pre-PR; only the trait-impl side of `github/` is new).

### Duplicate / Superseded PR Handling

- … `GitHubProvider` impl.

## Summary by CodeRabbit