feat(composio): promote GitHub from catalog-only to native memory provider #2413

Merged: senamakel merged 4 commits into tinyhumansai:main from justinhsu1477:feat/github-memory-provider on May 25, 2026
@justinhsu1477 justinhsu1477 commented May 21, 2026

Summary

  • Adds GitHubProvider next to the existing gmail / notion / slack / clickup providers — GitHub becomes the fifth native memory-ingest provider in composio/providers/. Until now github/mod.rs self-declared "curated tool catalog only — no native ComposioProvider implementation yet"; this PR closes that gap.
  • Implementation mirrors the ClickUp (#2291, merged) and Linear (#2402, in review) providers 1:1: same SyncState semantics, same persist_single_item ingest path, same daily-budget discipline, same "fetch-what-the-user-sees" assignee-scoped fetch.
  • Privacy posture: only issues where the connected user is the assignee are pulled, never the whole watched-repos issue graph. The assignee:<viewer_login> qualifier is constructed inside the provider — never accepted from a caller — so the boundary can't be tunnelled around.
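
A minimal sketch of the privacy boundary described above, assuming a hypothetical helper named `build_assigned_issues_query` (the name and the login validation rule are illustrative and not taken from the PR). It shows how the query can be assembled only from a login the provider resolved itself, so a caller has no way to inject extra qualifiers.

```rust
/// Hypothetical helper (not from the PR): builds the issue-search query from
/// a viewer login that the provider resolved itself. No caller-supplied
/// qualifier is accepted, only the login.
fn build_assigned_issues_query(viewer_login: &str) -> Option<String> {
    let login = viewer_login.trim();
    // Assumption: a user login consists of ASCII alphanumerics and hyphens.
    // Anything else (spaces, colons, extra qualifiers) is rejected outright.
    let looks_like_login = !login.is_empty()
        && login.chars().all(|c| c.is_ascii_alphanumeric() || c == '-');
    if !looks_like_login {
        return None;
    }
    Some(format!("is:issue assignee:{login} sort:updated-desc"))
}
```

Returning `None` instead of a best-effort query is a deliberate choice in this sketch: a sync that is skipped is preferable to one that is scoped to the wrong identity.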

Problem

composio/providers/ already has working memory-ingest providers for gmail, notion, slack, clickup, and (pending merge of #2402) linear. github/ was structurally one level lower — only tools.rs populated, no ComposioProvider impl. For an open-source AI assistant whose own contributors live on GitHub, that's the highest-leverage missing provider: every maintainer / reviewer / external contributor can dogfood it within minutes of merge.

Solution

Promote composio/providers/github/ to a full provider by adding the three companion files that the other native providers all have:

src/openhuman/composio/providers/github/
  mod.rs        UPDATED  — declares provider/sync/tests modules,
                           drops "no native impl yet" disclaimer
  tools.rs      KEEP     — GITHUB_CURATED unchanged (already includes
                           GITHUB_GET_AUTHENTICATED_USER and
                           GITHUB_SEARCH_ISSUES which is all the sync
                           path needs; no new actions added)
  provider.rs   NEW      — impl ComposioProvider for GitHubProvider
  sync.rs       NEW      — payload helpers (extract_issues,
                           extract_issue_title, extract_issue_updated,
                           extract_repo_qualified_identifier,
                           extract_viewer_login, extract_viewer_id)
  tests.rs      NEW      — 18 unit tests

Plus five registration touchpoints (same shape ClickUp landed in #2291 and Linear in #2402):

  1. composio/providers/mod.rs::has_native_provider: add "github".
  2. composio/providers/mod.rs::native_provider_sync_interval: add "github" => ….
  3. composio/providers/mod.rs::catalog_for_toolkit: no change — already points to github::GITHUB_CURATED.
  4. composio/providers/registry.rs::init_default_providers: add register_provider(Arc::new(GitHubProvider::new())).
  5. composio/providers/descriptions.rs: update GitHub description to mention Memory Tree sync.
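
The first touchpoint can be pictured roughly as follows. This is a sketch only: the real signature and body of `has_native_provider` in composio/providers/mod.rs are not shown in this PR description, so the shape below is an assumption.

```rust
/// Assumed shape of the capability check (the real function may differ).
/// The toolkit slugs listed are the providers named in this PR description.
fn has_native_provider(toolkit: &str) -> bool {
    matches!(toolkit, "gmail" | "notion" | "slack" | "clickup" | "github")
}
```

A regression test along the lines of `capability_matrix_includes_github_as_native_memory_provider` would then assert that `"github"` is reported as native while an unknown toolkit slug is not.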

Sync model (mirroring ClickUp / Linear)

  1. SyncState::load("github", connection_id) from the shared KV store.
  2. Daily request budget check (DEFAULT_DAILY_REQUEST_LIMIT = 500).
  3. Resolve viewer login via GITHUB_GET_AUTHENTICATED_USER — needed for the assignee:<login> search filter.
  4. Re-check budget between viewer probe and search call (CodeRabbit lesson from #2291 / #2402: never burn a list call when the probe just spent the last budget slot).
  5. Page through GITHUB_SEARCH_ISSUES with query is:issue assignee:<viewer_login> sort:updated-desc. GitHub's Search API caps results at 1000 so pagination is naturally bounded; we also stop early on cursor boundary or short page (< per_page).
  6. Per issue, persist as one memory document via persist_single_item. Dedupe by composite issue_id@updated_at so edits re-ingest.
  7. Advance cursor to newest updated_at, record last_sync_at_ms, save state.
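
The early-stop rules in step 5 can be sketched as below. `PageDecision` and `decide_after_page` are hypothetical names introduced for illustration. The sketch also assumes that `updated_at` values are UTC timestamps in one fixed ISO 8601 format, so plain string comparison orders them correctly.

```rust
/// Hypothetical type: what the pagination loop should do after a page.
#[derive(Debug, PartialEq)]
enum PageDecision {
    Continue,
    Stop,
}

/// `page_updated_at` holds the `updated_at` values of one page, newest first
/// (the query sorts by updated-desc). `cursor` is the newest `updated_at`
/// recorded by the previous sync, if there was one.
fn decide_after_page(
    page_updated_at: &[&str],
    per_page: usize,
    cursor: Option<&str>,
) -> PageDecision {
    // Short page: the search has no further results.
    if page_updated_at.len() < per_page {
        return PageDecision::Stop;
    }
    // Cursor boundary: this page already reaches issues seen last time.
    if let Some(cursor) = cursor {
        if page_updated_at.iter().any(|updated| *updated <= cursor) {
            return PageDecision::Stop;
        }
    }
    PageDecision::Continue
}
```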

On a transport error inside the pagination loop, the provider persists state before propagating the error (CodeRabbit's other lesson from #2402: a flap mid-pagination must not roll back budget accounting).
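
One way to picture the persist-before-propagate rule is the sketch below. The `SyncState`, `MemoryStore`, and `run_pages` definitions here are simplified stand-ins written for this example, not the types used in the PR. The sketch also assumes that every fetch attempt counts against the daily budget.

```rust
/// Simplified stand-in for the real sync state (illustrative only).
#[derive(Debug, Default)]
struct SyncState {
    requests_used_today: u32,
}

/// Simplified stand-in for the shared KV store (illustrative only).
#[derive(Debug, Default)]
struct MemoryStore {
    last_saved_requests: Option<u32>,
}

impl MemoryStore {
    fn save(&mut self, state: &SyncState) {
        self.last_saved_requests = Some(state.requests_used_today);
    }
}

/// Pages through results with `fetch_page`, which returns the item count of
/// a page or a transport error. State is saved on both exit paths.
fn run_pages<F>(
    state: &mut SyncState,
    store: &mut MemoryStore,
    max_pages: u32,
    mut fetch_page: F,
) -> Result<u32, String>
where
    F: FnMut(u32) -> Result<usize, String>,
{
    let mut pages_ingested = 0;
    for page in 1..=max_pages {
        // Assumption: an attempt spends budget whether or not it succeeds.
        state.requests_used_today += 1;
        match fetch_page(page) {
            Ok(0) => break,
            Ok(_) => pages_ingested += 1,
            Err(err) => {
                // Persist first so the spent budget survives the failure.
                store.save(state);
                return Err(err);
            }
        }
    }
    store.save(state);
    Ok(pages_ingested)
}
```

With a fetcher that fails on the third page, the call returns the error and the store still records three spent requests.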

Source-id convention

composio-github-issue-<global_id> — GitHub's id is globally unique across all of GitHub so it's a stable upsert key. Document title surfaces the canonical owner/repo#number form (e.g. GitHub tinyhumansai/openhuman#2408: …) so search hits read the way contributors refer to issues in conversation.
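
The identifiers described above could be composed as in this sketch. The three function names are hypothetical; only the string formats come from the description (the source id, the composite `issue_id@updated_at` dedupe key, and the `GitHub owner/repo#number: …` title).

```rust
/// Hypothetical helper: stable upsert key from GitHub's global issue id.
fn source_id(global_id: u64) -> String {
    format!("composio-github-issue-{global_id}")
}

/// Hypothetical helper: composite dedupe key, so an edited issue
/// (new `updated_at`) is ingested again.
fn dedupe_key(global_id: u64, updated_at: &str) -> String {
    format!("{global_id}@{updated_at}")
}

/// Hypothetical helper: document title using the owner/repo#number form.
fn document_title(repo_qualified: &str, issue_title: &str) -> String {
    format!("GitHub {repo_qualified}: {issue_title}")
}
```

Because the source id ignores `updated_at` while the dedupe key includes it, an edit re-ingests the issue but still upserts into the same memory document.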

Curated tool catalog

GITHUB_CURATED (already in github/tools.rs from the catalog-only era) is unchanged. GITHUB_GET_AUTHENTICATED_USER and GITHUB_SEARCH_ISSUES were already curated — the minimum the sync path needs — so this PR adds zero new tool actions.

Submission Checklist

  • Tests added or updated — 18 new unit tests cover sync helpers (results across SEARCH/LIST envelope shapes, title / updated extraction, owner/repo#number composition from both repository.full_name and repository_url shapes, viewer-login strict extractor that refuses to fall back to non-login fields, viewer-id metadata extraction), trait metadata stability, capability matrix registration (capability_matrix_includes_github_as_native_memory_provider), and default_impl_matches_new (observable equivalence — not a no-op test).
  • Diff coverage ≥ 80%: the new code is overwhelmingly the async sync() happy path, which sits behind a Composio ProviderContext that the existing test harness doesn't stand up (the gmail / notion / slack / clickup / linear tests don't exercise live sync() end-to-end either). The helper layer is unit-tested directly.
  • N/A: Coverage matrix updated — extends the existing "Composio memory provider" capability row; no new matrix feature row.
  • N/A: All affected feature IDs from the matrix are listed — extending an existing capability.
  • No new external network dependencies introduced — all GitHub API access goes through the existing Composio backend / direct client.
  • N/A: Manual smoke checklist updated — no release-cut surface changes; new ingest path is feature-flagged behind "user has a GitHub Composio connection".
  • Linked issue closed via Closes #2408 in ## Related.
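
Two of the helper behaviours listed in the first checklist item can be sketched without a JSON library by passing already-extracted fields. The function names, the `ViewerPayload` struct, and the parameter shapes below are assumptions made for this example. The real helpers in sync.rs operate on the Composio payload and may look different.

```rust
/// Composes `owner/repo#number`, preferring `repository.full_name` and
/// falling back to parsing `repository_url`.
fn repo_qualified_identifier(
    full_name: Option<&str>,
    repository_url: Option<&str>,
    number: u64,
) -> Option<String> {
    let repo = full_name
        .map(str::to_owned)
        .or_else(|| repository_url.and_then(owner_repo_from_url))?;
    Some(format!("{repo}#{number}"))
}

/// Parses `https://api.github.com/repos/OWNER/REPO` into `OWNER/REPO`.
fn owner_repo_from_url(url: &str) -> Option<String> {
    let (_, tail) = url.split_once("/repos/")?;
    let mut parts = tail.trim_end_matches('/').split('/');
    let owner = parts.next().filter(|s| !s.is_empty())?;
    let repo = parts.next().filter(|s| !s.is_empty())?;
    Some(format!("{owner}/{repo}"))
}

/// Illustrative view of the viewer payload fields.
struct ViewerPayload<'a> {
    login: Option<&'a str>,
    name: Option<&'a str>,
    id: Option<u64>,
}

/// Strict extractor: only `login` is accepted. `name` and `id` are never
/// used as a fallback, so the assignee filter cannot scope to a wrong value.
fn extract_viewer_login(payload: &ViewerPayload<'_>) -> Option<String> {
    payload
        .login
        .map(str::trim)
        .filter(|login| !login.is_empty())
        .map(str::to_owned)
}
```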

Impact

  • Runtime/platform impact: desktop core only (Rust). No Tauri shell, no frontend changes.
  • Compatibility impact: strictly additive. Existing providers, their SyncState KV namespaces (composio-sync-state keyed by (toolkit, connection_id)), their registered tool catalogs, and all catalog-only toolkits are unchanged. The github tool catalog is unchanged — only adds the missing provider impl around it.
  • Performance impact: bounded by MAX_PAGES_PER_SYNC = 20, PAGE_SIZE = 50 steady-state (100 for the initial backfill), DailyBudget = 500 req/day, and GitHub's own 1000-result Search cap. Per sync pass: 1 viewer probe + up to 20 search pages (MAX_PAGES_PER_SYNC).
  • Security impact: assignee-scoped fetch (assignee:<viewer_login>) prevents accidental ingest of other contributors' issues. Viewer-login extractor is strict (no fallback to id / name fields) so the filter can't silently scope to the wrong identity. Composio handles credentials; no new secret-handling code.
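
As a rough worked example of the bounds in the performance bullet, the sketch below computes how many search pages a single pass can touch for a given page size. The constants mirror the numbers quoted above. The function names, and the assumption that the page cap and the 1000-result cap combine as a simple minimum, are illustrative.

```rust
const MAX_PAGES_PER_SYNC: u32 = 20;
const SEARCH_RESULT_CAP: u32 = 1000;

/// Pages needed to reach the search cap, limited by the per-sync page cap.
fn max_search_pages(page_size: u32) -> u32 {
    let page_size = page_size.max(1); // guard against division by zero
    let pages_to_reach_cap =
        SEARCH_RESULT_CAP / page_size + u32::from(SEARCH_RESULT_CAP % page_size != 0);
    pages_to_reach_cap.min(MAX_PAGES_PER_SYNC)
}

/// One viewer probe plus the search pages.
fn max_requests_per_pass(page_size: u32) -> u32 {
    1 + max_search_pages(page_size)
}
```

Under these assumptions a 50-per-page pass is limited to 20 pages (21 requests including the probe), and a 100-per-page backfill reaches the 1000-result cap after 10 pages (11 requests). Both stay well inside a 500-request daily budget.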

Related


AI Authored PR Metadata

Linear Issue

  • Key: N/A
  • URL: N/A

Commit & Branch

  • Branch: feat/github-memory-provider
  • Commit SHA: (latest on the branch)
  • Base: upstream/main at fetch time

Validation Run

  • N/A: pnpm --filter openhuman-app format:check — Rust-only change.
  • N/A: pnpm typecheck — Rust-only change.
  • Focused tests: cargo test --lib composio::providers::github (40/40 pass — combines 18 in tests.rs and 22 in sync.rs inline tests); cargo test --lib composio::providers (305/305 pass — no regression on gmail / notion / slack / clickup or any catalog-only toolkit).
  • Rust fmt/check: cargo fmt --check clean; cargo check --lib clean (pre-existing warnings only); cargo clippy --lib --no-deps no new warnings in composio/providers/github/.
  • N/A: Tauri fmt/check — no app/src-tauri/src/** changes.

Validation Blocked

  • N/A

Behavior Changes

  • Intended behavior change: users with a Composio-connected GitHub account now have their assigned issues periodically ingested into the Memory Tree on the existing 30-minute scheduler cadence, with initial backfill triggered by the ConnectionCreated hook.
  • User-visible effect: GitHub issue content (title, body, state, labels, assignees, repo, created_at, updated_at as JSON) becomes available to the agent and retrieval layer the same way Gmail / Notion / Slack / ClickUp content already is.

Parity Contract

  • Legacy behavior preserved: existing gmail / notion / slack / clickup providers are completely untouched. Their SyncState KV namespaces are unchanged. GITHUB_CURATED (already in github/tools.rs) is byte-identical — no new tool actions added or removed.
  • Guard/fallback/dispatch parity checks: provider follows the existing ComposioProvider trait contract — daily budget, dedup-by-id, cursor-based pagination, idempotent persist_single_item upserts.
  • catalog_for_toolkit("github") continues to return github::GITHUB_CURATED (was already wired that way pre-PR; only the trait-impl side of github/ is new).

Duplicate / Superseded PR Handling

Summary by CodeRabbit

  • Localization Updates
    • Expanded German language translations with new and updated strings for mascot settings, developer menu features, appearance preferences, and onboarding/workspace functionality.

Review Change Stack

…vider

Adds `GitHubProvider` next to the existing `gmail` / `notion` / `slack` /
`clickup` providers, joining them as the fifth native memory-ingest
provider in `composio/providers/`. Until now `github/mod.rs` declared
itself "curated tool catalog only — no native ComposioProvider
implementation yet"; this change closes that gap so the connected
user's assigned GitHub issues stream into the Memory Tree on the
periodic scheduler.

Implementation mirrors the ClickUp (tinyhumansai#2291, merged) and Linear (tinyhumansai#2402,
in review) providers 1:1 — same `SyncState` semantics, same
`persist_single_item` ingest path, same daily-budget discipline, same
"fetch-what-the-user-sees" assignee-scoped fetch.

## Sync model

  1. SyncState load + daily budget gate.
  2. Resolve viewer login via GITHUB_GET_AUTHENTICATED_USER.
  3. Re-check budget after the probe (per CodeRabbit lesson on
     tinyhumansai#2291 / tinyhumansai#2402 — never burn a list call when the probe just
     spent the last budget slot).
  4. Page through GITHUB_SEARCH_ISSUES with
     `is:issue assignee:<login> sort:updated-desc`. GitHub's search
     caps results at 1000 so pagination is naturally bounded; we
     also stop early on cursor boundary or short page.
  5. Per issue, persist as one memory document via
     persist_single_item. Composite `issue_id@updated_at` dedup key
     re-syncs edited issues.
  6. Advance cursor to newest updated_at, record last_sync_at_ms,
     save state.
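
The budget gate in steps 1 to 3 might look like the following sketch. `DailyBudget`, `SyncOutcome`, and `probe_then_search` are names invented for this illustration. Only the limit of 500 and the rule "re-check after the probe" come from the description.

```rust
const DEFAULT_DAILY_REQUEST_LIMIT: u32 = 500;

/// Illustrative budget tracker (not the real type).
struct DailyBudget {
    used: u32,
    limit: u32,
}

impl DailyBudget {
    /// Spends one request if any budget is left and reports whether it did.
    fn try_spend(&mut self) -> bool {
        if self.used >= self.limit {
            return false;
        }
        self.used += 1;
        true
    }
}

#[derive(Debug, PartialEq)]
enum SyncOutcome {
    SkippedBudgetExhausted,
    ProbedOnly,
    Searched,
}

/// Gate before the viewer probe, then gate again before the search call.
fn probe_then_search(budget: &mut DailyBudget) -> SyncOutcome {
    if !budget.try_spend() {
        return SyncOutcome::SkippedBudgetExhausted;
    }
    // The probe may have used the last slot, so check again before searching.
    if !budget.try_spend() {
        return SyncOutcome::ProbedOnly;
    }
    SyncOutcome::Searched
}
```

Starting from 499 used requests, the probe takes the last slot and the search is skipped, which is the case the re-check exists to handle.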

Transport-error path in the pagination loop persists state before
returning the error (CodeRabbit lesson on tinyhumansai#2402: a flap mid-pagination
must not roll back budget accounting).

## Privacy posture

`assignee:<viewer_login>` is constructed inside the provider —
never accepted from a caller — so the privacy boundary can't be
tunnelled around. Matches the discipline gmail / notion / clickup /
linear already follow.

## Source-id convention

`composio-github-issue-<global_id>`. GitHub's `id` is globally unique
across all of GitHub so it's a stable upsert key. Document title
surfaces the canonical `owner/repo#number` form (e.g.
`GitHub tinyhumansai/openhuman#2408: …`) so search hits read the way
contributors refer to issues in conversation.

## Curated tool catalog

`GITHUB_CURATED` (already in `github/tools.rs` since the catalog-only
era) is unchanged — `GITHUB_GET_AUTHENTICATED_USER` and
`GITHUB_SEARCH_ISSUES` were already curated, which is the minimum the
sync path needs. No new actions added.

## Files

Added:
  - composio/providers/github/provider.rs   — GitHubProvider impl (~415)
  - composio/providers/github/sync.rs       — payload helpers (~360)
  - composio/providers/github/tests.rs      — 18 unit tests (~165)

Modified:
  - composio/providers/github/mod.rs        — declare new modules +
                                              re-export provider;
                                              remove stale
                                              "no native impl yet"
                                              disclaimer
  - composio/providers/mod.rs               — has_native_provider arm +
                                              native_provider_sync_interval
                                              arm + new regression test
                                              `capability_matrix_includes_github_as_native_memory_provider`
                                              (catalog_for_toolkit
                                              already pointed at
                                              github::GITHUB_CURATED;
                                              no change needed there)
  - composio/providers/registry.rs          — register_provider in
                                              init_default_providers
  - composio/providers/descriptions.rs      — GitHub description
                                              updated to mention
                                              Memory Tree sync

## Verification

  - cargo check --lib clean (pre-existing warnings only)
  - cargo test --lib composio::providers::github  40/40 pass
  - cargo test --lib composio::providers          305/305 pass
    (no regression on gmail / notion / slack / clickup)
  - cargo test --lib capability_matrix_includes_github  1/1 pass
  - cargo fmt --check clean
  - cargo clippy --lib --no-deps no new warnings in github/

Closes tinyhumansai#2408
@justinhsu1477 justinhsu1477 requested a review from a team May 21, 2026 05:31
coderabbitai Bot commented May 21, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 57fca3bb-a28f-4794-bc10-112c28d10c7d

📥 Commits

Reviewing files that changed from the base of the PR and between b89790c and a164ff9.

📒 Files selected for processing (1)
  • app/src/lib/i18n/chunks/de-5.ts
💤 Files with no reviewable changes (1)
  • app/src/lib/i18n/chunks/de-5.ts

📝 Walkthrough

Walkthrough

German locale translations in de-5.ts are updated to remove deprecated custom GIF mascot strings and replace them with new mascot library availability, developer menu, and extended UI/onboarding translation keys.

Changes

German Locale Translations

| Layer / File(s) | Summary |
|---|---|
| Mascot and developer menu translation updates (app/src/lib/i18n/chunks/de-5.ts) | Removes deprecated settings.mascot.customGifError, customGifHeading, and customGifLabel entries; adds settings.mascot.libraryUnavailable, settings.mascot.title, settings.developerMenu.composio.*, and additional UI/onboarding/progress/workspace/calls/migration/settings translation strings in the German translation map. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • tinyhumansai/openhuman#2597: Directly modifies the same de-5.ts file, removing and replacing settings.mascot.customGif* keys with new mascot library and developer menu translations.
  • tinyhumansai/openhuman#2522: Updates mascot "custom GIF" translation entries in de-5.ts in the same settings.mascot.* section.
  • tinyhumansai/openhuman#2538: Applies parallel mascot "custom GIF" translation updates in the Chinese locale (zh-CN-5.ts).

Suggested labels

feature, rust-core

Suggested reviewers

  • graycyrus

Poem

🐰 Hello, dearest translations so fine!
Old mascot GIFs, away they must go,
New libraries and menus come into bloom,
German strides ahead with fresh feelings! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The PR title references promoting GitHub to a native memory provider, but the file changed is a German i18n translation chunk with no visible connection to composio provider implementation. | The title describes composio/GitHub provider changes, but the only file summary shows i18n translation updates (de-5.ts). Verify the PR includes the actual composio provider code changes or clarify if this is a partial view of the changeset. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot added the working A PR that is being worked on by the team. label May 21, 2026
@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
src/openhuman/composio/providers/github/mod.rs (1)

24-25: ⚡ Quick win

Prefer *_test.rs wiring for extracted module tests.

Since tests were extracted, wire them via a sibling *_test.rs file to match the repository convention.

♻️ Suggested update
 #[cfg(test)]
-mod tests;
+#[path = "github_test.rs"]
+mod tests;

Also rename src/openhuman/composio/providers/github/tests.rs to src/openhuman/composio/providers/github/github_test.rs.

As per coding guidelines: “When extracting Rust tests out of an implementation file, prefer a sibling *_test.rs file wired in with #[cfg(test)] #[path = "..._test.rs"] mod tests;.”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/composio/providers/github/mod.rs` around lines 24 - 25, Replace
the current inline test module wiring (the existing #[cfg(test)] mod tests;) so
it points to a sibling test file using a path attribute and rename the extracted
tests file accordingly: create/rename
src/openhuman/composio/providers/github/tests.rs to github_test.rs and change
the module declaration to use #[cfg(test)] with #[path = "github_test.rs"] mod
tests; so the tests are loaded from the sibling github_test.rs file instead of
inline.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4551cde9-34d5-4fa2-b112-369040495c1a

📥 Commits

Reviewing files that changed from the base of the PR and between 6281aea and b89790c.

📒 Files selected for processing (7)
  • src/openhuman/composio/providers/descriptions.rs
  • src/openhuman/composio/providers/github/mod.rs
  • src/openhuman/composio/providers/github/provider.rs
  • src/openhuman/composio/providers/github/sync.rs
  • src/openhuman/composio/providers/github/tests.rs
  • src/openhuman/composio/providers/mod.rs
  • src/openhuman/composio/providers/registry.rs

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 21, 2026
@justinhsu1477
Contributor Author

Status note for anyone scanning this surface:

If anyone has a parallel GitHub-provider implementation in progress or planned, please flag in #2408 first so we can coordinate scope split.

@senamakel senamakel self-assigned this May 25, 2026
# Conflicts:
#	src/openhuman/composio/providers/github/mod.rs
#	src/openhuman/composio/providers/mod.rs
#	src/openhuman/memory_sync/composio/providers/github/provider.rs
#	src/openhuman/memory_sync/composio/providers/github/sync.rs
#	src/openhuman/memory_sync/composio/providers/github/tests.rs
Merge introduced duplicate settings.mascot.customGif* entries.
@coderabbitai coderabbitai Bot added feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. labels May 25, 2026
@senamakel senamakel merged commit bf2400f into tinyhumansai:main May 25, 2026
28 of 30 checks passed
Labels

feature Net-new user-facing capability or product behavior. rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. working A PR that is being worked on by the team.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add GitHub as a Composio memory provider (joining gmail / notion / slack / clickup / linear)

2 participants