
vanlabs-dev/cstack


cstack

Multi-tenant Microsoft 365 operations toolkit for MSP engineers. Self-hosted, ML-augmented, open source.

cstack treats a fleet of M365 tenants the way an SRE treats a fleet of services: audit them with rules, score their traffic with per-tenant ML models, and surface findings with engineer-grade narratives instead of vendor portal screenshots.

What it does

The first tool in the toolkit, SignalGuard, ships two halves.

The Conditional Access (CA) audit half evaluates every tenant against a 15-rule catalogue (block legacy auth, MFA on admins, risk-based sign-in, break-glass exclusions, and others). Two further layers sit on top: a coverage matrix that flags weak (user segment × app segment) cells, and an exclusion-hygiene analyser that catches stale, orphaned, or undocumented CA policy exclusions. Every finding is deduplicated by content hash and persisted to DuckDB.
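The coverage-matrix idea can be pictured as a grid keyed by (user segment, app segment), where each cell collects the controls that CA policies enforce there. The sketch below is a hypothetical illustration, not the audit-coverage package's API: the segment names, control names, and the rule that a cell is weak when it lacks a required control are all assumptions.

```python
from itertools import product

# Hypothetical segments and controls; the real catalogue lives in audit-coverage.
USER_SEGMENTS = ["admins", "members", "guests"]
APP_SEGMENTS = ["admin_portals", "office365", "all_apps"]
REQUIRED_CONTROLS = {"mfa"}


def build_matrix(policies):
    """Map each (user segment, app segment) cell to the set of controls enforced there."""
    matrix = {cell: set() for cell in product(USER_SEGMENTS, APP_SEGMENTS)}
    for policy in policies:
        for cell in product(policy["users"], policy["apps"]):
            # setdefault tolerates segments not listed above instead of raising KeyError.
            matrix.setdefault(cell, set()).update(policy["controls"])
    return matrix


def weak_cells(matrix, required=REQUIRED_CONTROLS):
    """Return cells missing at least one required control, sorted for stable output."""
    return sorted(cell for cell, controls in matrix.items() if not required <= controls)


policies = [
    {"users": ["admins"], "apps": APP_SEGMENTS, "controls": ["mfa"]},
    {"users": ["members"], "apps": ["office365"], "controls": ["mfa", "compliant_device"]},
]
weak = weak_cells(build_matrix(policies))  # every guest cell plus two member cells
```

A grid like this makes gaps visible per cell rather than per policy, which is what distinguishes it from the rule catalogue: no single policy is wrong, yet guests have no MFA anywhere.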

The anomaly half watches Entra sign-in events. Each tenant gets a pooled Isolation Forest that flags sign-in rows unlike the tenant's normal pattern, with four hybrid attack-pattern rules layered on top. SHAP attributions explain the top three contributing features on every flagged row. MLflow tracks every training run, and a champion/challenger alias system gates promotion. Calibrated against synthetic fixtures, the detector reaches precision 0.245-0.275 and recall 0.889-0.926 on injected attack scenarios. It targets SMB-tier tenants without Entra ID P2 licensing, where Microsoft's Identity Protection is unavailable.
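Picking the top three contributors from a flagged row reduces to ranking features by the magnitude of their SHAP attribution. A minimal sketch, assuming the attributions have already been computed by a SHAP explainer and arrive as a feature-name to value mapping; the function and feature names are hypothetical, and which sign means "more anomalous" depends on the model output being explained.

```python
import heapq


def top_contributors(attributions: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k features with the largest absolute SHAP attribution, largest first.

    Ranking uses magnitude only; the signed value is kept in the output so a
    narrative can still say which direction each feature pushed the score.
    """
    return heapq.nlargest(k, attributions.items(), key=lambda item: abs(item[1]))


# Hypothetical attributions for one flagged sign-in row.
row = {
    "hour_of_day": -0.41,
    "new_asn": -0.92,
    "country_change": 0.05,
    "failed_attempts_1h": -0.37,
    "device_known": 0.12,
}
top = top_contributors(row)
```

Restricting this to flagged rows keeps the cost of explanation proportional to the alert volume rather than to total sign-in volume.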

A per-user IF topology and a per-user-anchored off-hours-admin rule ship as feature-flagged opt-ins (CSTACK_ML_TRAINING_TOPOLOGY=per_user, CSTACK_ML_OFF_HOURS_ADMIN_ENABLED=true). Sprint 3.5 added the infrastructure; Sprint 3.5b left both switched off pending real-tenant calibration in Sprint 7, because synthetic data could not demonstrate their precision lift.
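One way the topology flag could interact with cold-start handling: with the per-user topology on, a user gets their own model only once they have enough history, and everyone else falls back to the tenant's pooled model. This is a hypothetical sketch, not cstack's code: the "pooled" default, the MIN_USER_SIGNINS threshold, the accepted truthy strings, and the helper names are all assumptions.

```python
import os
from collections.abc import Mapping

MIN_USER_SIGNINS = 50  # hypothetical per-user sample threshold


def flag_enabled(env: Mapping[str, str], name: str) -> bool:
    """Treat common truthy strings as on; anything else, or an unset variable, is off."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def select_model(
    user: str,
    signin_counts: Mapping[str, int],
    per_user_models: Mapping[str, object],
    pooled_model: object,
    env: Mapping[str, str] = os.environ,
) -> object:
    """Pick the model that should score this user's sign-ins."""
    if env.get("CSTACK_ML_TRAINING_TOPOLOGY", "pooled") != "per_user":
        return pooled_model
    # Cold-start guard: thin history or no trained model means the pooled model scores.
    if signin_counts.get(user, 0) < MIN_USER_SIGNINS or user not in per_user_models:
        return pooled_model
    return per_user_models[user]


# The off-hours-admin rule would be gated the same way.
off_hours_admin_on = flag_enabled(os.environ, "CSTACK_ML_OFF_HOURS_ADMIN_ENABLED")
```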

Every finding (audit or anomaly) gets a four-section LLM narrative explaining why it fired, what it means, how to remediate, and when it might be a false positive. Narratives are cached by content address, so two tenants with the same finding share a single generation. The provider layer abstracts Anthropic, OpenAI, and local Ollama behind one Protocol; tests register fakes through the same factory.
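The cache key is SHA-256 over the rule id, canonicalised evidence, prompt version, and model, with the tenant id left out so identical findings collide across tenants. A minimal sketch, assuming canonicalisation means JSON with sorted keys and compact separators and that the four parts are framed as a JSON array; cstack's actual encoding may differ, and the rule id and model name in the example are placeholders.

```python
import hashlib
import json


def narrative_cache_key(rule_id: str, evidence: dict, prompt_version: str, model: str) -> str:
    """Content-addressed key for a generated narrative (assumed encoding)."""
    # Canonicalise so key order and whitespace in the evidence do not change the hash.
    canonical_evidence = json.dumps(evidence, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    # Tenant id is deliberately not an input; evidence must not embed it either.
    # Framing the parts as a JSON array keeps field boundaries unambiguous.
    payload = json.dumps([rule_id, canonical_evidence, prompt_version, model], ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


k1 = narrative_cache_key("CA001", {"policy": "p1", "users": 3}, "v2", "model-x")
k2 = narrative_cache_key("CA001", {"users": 3, "policy": "p1"}, "v2", "model-x")
```

Including the prompt version and model in the key means a prompt revision or model swap invalidates old narratives automatically instead of serving stale text.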

Status

V0.6.0-alpha.1 baseline (containerized stack, pre-live-tenant). See CHANGELOG.md for the per-sprint summary. Live tenant integration is Sprint 7, paused pending tenant access. Today the codebase ships:

  • 266 Python tests across 8 packages and 2 apps
  • 78 web tests across 28 files (Vitest + RTL, jsdom)
  • 19 HTTP endpoints (15 read, 4 action), OpenAPI 3.1 contract committed
  • 7 dashboard screens (Next.js 15 + Tailwind 4), tablet responsive at 768px
  • 20-example hand-curated golden set + rubric-based LLM-as-judge eval harness
  • $4.30 of real Anthropic API spend during Sprint 6 calibration

Everything runs against three synthetic fixture tenants. None of it has touched a production Microsoft 365 tenant yet.

Tools

The cstack toolkit currently ships one tool, with more planned.

  • SignalGuard identity security: CA audit + per-tenant behavioural sign-in anomaly detection with explainable ML scoring and LLM-narrated findings. Complete (against fixtures).
  • Future: LicenseLens, Driftwatch, ChangeRadar, CompliancePulse. Planned for V1.

Architecture at a glance

                   +----------------------- cstack monorepo -----------------------+
                   |                                                              |
fixtures load-all  |   +-- packages/ ----------------------------------+          |
or live extract -->|   | schemas, storage, graph-client, fixtures      |          |
                   |   | audit-{core,coverage,rules,exclusions}        |          |
                   |   | ml-{features,mlops,anomaly}                   |          |
                   |   | llm-{provider,narrative,eval}                 |          |
                   |   +----------------------+------------------------+          |
                   |                          |                                   |
                   |                          v                                   |
                   |   +--- DuckDB ---+   +--- mlruns ---+                        |
                   |   | tenants,     |   | per-tenant   |                        |
                   |   | ca_policies, |   | IF + SHAP    |                        |
                   |   | findings,    |   | @champion /  |                        |
                   |   | signins,     |   | @challenger  |                        |
                   |   | anomaly_     |   +--------------+                        |
                   |   | scores,      |                                           |
                   |   | narrative_   |                                           |
                   |   | cache        |                                           |
                   |   +------+-------+                                           |
                   |          |                                                   |
                   |          v                                                   |
                   |   +------+-----------------+   +----- Anthropic / OpenAI /   |
                   |   | apps/signalguard-api   |--+      Ollama (LLM provider)   |
                   |   | (FastAPI, X-API-Key)   |                                 |
                   |   +------+-----------------+                                 |
                   |          |                                                   |
                   |          v                                                   |
                   |   +------+-----------------+                                 |
                   |   | apps/signalguard-web   |                                 |
                   |   | (Next.js 15 + Tailwind)|                                 |
                   |   +------------------------+                                 |
                   |                                                              |
                   +--------------------------------------------------------------+

For the deeper data flow including LLM cache lookup and bias mitigation, see docs/ARCHITECTURE.md.

Screenshots

Eight screens captured against fixture tenants. Full reference with captions: docs/SCREENSHOTS.md.

How it's built

  • Python 3.12 via uv workspaces. 8 internal packages, 2 apps (CLI + API).
  • Next.js 15 + Tailwind 4 for the dashboard. Server Components first, TanStack Query for client interactions, typed @hey-api client generated from OpenAPI 3.1.
  • FastAPI + DuckDB for the backend. Per-request DuckDB connections, RFC 7807 problem-details on every error, correlation ids on every request and log line.
  • scikit-learn + SHAP + MLflow for the anomaly detector. A StandardScaler + IsolationForest pipeline, SHAP computed only on flagged rows to stay within the runtime budget, and MLflow registry aliases for promotion gating.
  • Provider-agnostic LLM layer with adapters for Anthropic Claude, OpenAI, and Ollama behind a single Protocol. Content-addressed prompt cache, budget caps, pointwise + pairwise eval harness with position-swap bias mitigation.
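The one-Protocol, many-adapters shape, with a factory that tests can register fakes into, can be sketched as follows. The names LLMProvider, complete, register_provider, and get_provider are hypothetical and not the llm-provider package's actual interface; real adapters would wrap the Anthropic, OpenAI, and Ollama clients.

```python
from collections.abc import Callable
from typing import Protocol


class LLMProvider(Protocol):
    """Structural interface every adapter satisfies (hypothetical shape)."""

    name: str

    def complete(self, prompt: str, *, max_tokens: int) -> str: ...


_REGISTRY: dict[str, Callable[[], LLMProvider]] = {}


def register_provider(name: str, factory: Callable[[], LLMProvider]) -> None:
    """Register (or override) the factory used to build a provider by name."""
    _REGISTRY[name] = factory


def get_provider(name: str) -> LLMProvider:
    """Build a provider from its registered factory, failing loudly on unknown names."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name!r}") from None
    return factory()


class FakeProvider:
    """Deterministic stand-in a test could register in place of a real adapter."""

    name = "fake"

    def complete(self, prompt: str, *, max_tokens: int) -> str:
        # max_tokens is accepted for interface parity; the fake ignores it.
        return f"[fake narrative for {len(prompt)}-char prompt]"


register_provider("fake", FakeProvider)
```

Because the Protocol is structural, adapters and fakes need no shared base class; anything with the right attributes and method signature plugs into the same factory.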

The full stack rationale lives in docs/ARCHITECTURE.md.

Running locally

The fastest path is the Docker stack. Prerequisites: Docker Desktop (or any Docker Compose v2 runtime).

git clone <this repo>
cd cstack
docker compose -f infra/docker/compose.yaml up --build

First run takes 60 to 90 seconds (fixtures load, audit runs on all three tenants, anomaly model trains on tenant-a). Subsequent runs come up in seconds.

Visit http://localhost:3000 and enter dev-secret when the API key gate prompts. See infra/docker/README.md for troubleshooting, the fixtures-only override, and how to enable the LLM narrative pass.

Running from source

Use this path for local hacking on the Python or TypeScript code without rebuilding the container on every change. Prerequisites: Python 3.12, Node 22 LTS, uv, pnpm.

uv sync
pnpm install

uv run cstack fixtures load-all
uv run cstack audit all --tenant tenant-b --no-narratives

echo 'SIGNALGUARD_API_DEV_API_KEY=dev-secret' >> .env
uv run signalguard-api --port 8000

# In a second shell.
pnpm --filter signalguard-web dev
# Visit http://localhost:3000/dashboard, enter "dev-secret".
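The dev key set in .env is what the API compares against the X-API-Key header. A minimal sketch of a constant-time check, assuming the key is read from SIGNALGUARD_API_DEV_API_KEY and that header names arrive lower-cased (as ASGI frameworks typically normalise them); api_key_ok is a hypothetical helper, not signalguard-api's actual middleware.

```python
import hmac
import os
from collections.abc import Mapping


def api_key_ok(headers: Mapping[str, str], env: Mapping[str, str] = os.environ) -> bool:
    """Return True only when the X-API-Key header matches the configured dev key."""
    expected = env.get("SIGNALGUARD_API_DEV_API_KEY", "")
    supplied = headers.get("x-api-key", "")
    # Refuse everything when no key is configured rather than accepting an empty header.
    if not expected:
        return False
    # compare_digest avoids leaking, via timing, how many leading characters matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```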

Optional: drop --no-narratives and add ANTHROPIC_API_KEY=sk-ant-... to .env to generate LLM narratives during the audit. Default budget is $1 per run.
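A per-run budget cap can be enforced by charging the estimated cost of each call before making it and refusing any call that would push spend past the cap. Sketch under stated assumptions: BudgetTracker and BudgetExceeded are hypothetical names, and the per-million-token prices are placeholders, not current Anthropic pricing.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call would take the run over its spend cap."""


@dataclass
class BudgetTracker:
    cap_usd: float = 1.00           # the documented $1 default per run
    input_per_mtok: float = 3.00    # placeholder USD per million input tokens
    output_per_mtok: float = 15.00  # placeholder USD per million output tokens
    spent_usd: float = 0.0

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_per_mtok + output_tokens * self.output_per_mtok) / 1_000_000

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Reserve spend for one call; raise without recording anything if the cap would be exceeded."""
        projected = self.spent_usd + self.cost(input_tokens, output_tokens)
        if projected > self.cap_usd:
            raise BudgetExceeded(f"call would bring spend to ${projected:.4f}; cap is ${self.cap_usd:.2f}")
        self.spent_usd = projected
```

Charging with the request's max output tokens as the upper bound makes the check conservative: a run can end under budget, but never over it.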

Optional: ASN feature extraction uses a MaxMind GeoLite2-ASN database when CSTACK_GEOIP_ASN_DB points at a valid .mmdb file. The Docker stack handles this automatically via the geoipupdate service. For local-from-source dev, either download the database manually (free account at https://www.maxmind.com/) and point the env var at it, or skip it: the ASN lookup then falls back to a deterministic prefix table that the synthesizer's fixture IPs already use.

To run the anomaly detector end-to-end against fixtures:

uv run cstack signins extract --tenant tenant-a --scenario baseline
uv run cstack anomaly train --tenant tenant-a --lookback-days 365
uv run cstack anomaly promote --tenant tenant-a --force
uv run cstack signins extract --tenant tenant-a --scenario replay-attacks
uv run cstack anomaly score --tenant tenant-a
uv run cstack anomaly alerts --tenant tenant-a --n 20

The CLI is a thin layer over the same packages the API uses; see apps/cstack-cli/ for the full subcommand catalogue.

Documentation

Start at docs/INDEX.md, which indexes the major docs.

Engineering decisions worth calling out

  • Per-tenant pooled IF for sub-P2 tenants, with per-user models as a gated opt-in. Per-user models would be more sensitive, but every newly joined user starts cold. Sprint 3.5 added per-user models with a pooled fallback for users below the sample threshold; they stay behind CSTACK_ML_TRAINING_TOPOLOGY=per_user pending real-tenant calibration.
  • Content-addressed prompt cache for cross-tenant narrative reuse. Cache key is SHA-256(rule_id, canonicalised(evidence), prompt_version, model), excluding tenant id; identical findings across tenants share one generation.
  • Custom LLM provider abstraction (not LiteLLM). One Protocol, three adapters, ~250 lines. LiteLLM ships its own opinions about retries and observability that conflict with ours; owning the abstraction means we can ship adapter-level fixes immediately (Claude 4.7 deprecating temperature mid-sprint was a real test of this).
  • Pairwise LLM-as-judge eval harness with bias mitigation. The judge is a different model from the generator (Sonnet judges Opus output), every pairwise comparison is position-swapped with the result downgraded to a tie when the judge flips on swap, and judging runs at low temperature. Pointwise scoring alone misled us in Sprint 6 calibration; the pairwise check caught it.
  • OpenAPI-first contract. The web client is generated from apps/signalguard-api/openapi.json and CI fails on drift. The web app cannot ship a request shape the backend does not support.
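The position-swap rule from the eval harness can be sketched in a few lines: run the pairwise judge twice with the candidates in opposite slots and keep the verdict only if it survives the swap. The judge callable, its "first"/"second"/"tie" return values, and the fake judges in the example are assumptions for illustration, not the llm-eval package's interface.

```python
from collections.abc import Callable

# A judge sees (prompt, candidate in slot 1, candidate in slot 2) and returns "first", "second", or "tie".
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, a: str, b: str) -> str:
    """Return "a", "b", or "tie"; a verdict that flips under the swap is downgraded to "tie"."""
    forward = judge(prompt, a, b)  # a in slot 1
    swapped = judge(prompt, b, a)  # b in slot 1
    # Map both slot-based verdicts back to candidate identities.
    v1 = {"first": "a", "second": "b"}.get(forward, "tie")
    v2 = {"first": "b", "second": "a"}.get(swapped, "tie")
    return v1 if v1 == v2 else "tie"


def always_first(prompt: str, x: str, y: str) -> str:
    """Pure position bias: prefers whatever sits in slot 1."""
    return "first"


def prefers_longer(prompt: str, x: str, y: str) -> str:
    """A content-based (if crude) judge whose verdict does not depend on slot order."""
    if len(x) == len(y):
        return "tie"
    return "first" if len(x) > len(y) else "second"
```

A purely position-biased judge always produces a tie under this rule, so slot preference cannot masquerade as a quality signal.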

License

MIT.

Contributing

cstack is a personal portfolio project that welcomes external contribution. See docs/CONTRIBUTING.md for the local dev workflow, conventional-commit rules, and the project's hard rules on code style and tone.

Security

Security issues go to leunis@vanlabs.dev per SECURITY.md, not GitHub issues.
