Multi-tenant Microsoft 365 operations toolkit for MSP engineers. Self-hosted, ML-augmented, open source.
cstack treats a fleet of M365 tenants the way an SRE treats a fleet of services: audit them with rules, score their traffic with per-tenant ML models, and surface findings with engineer-grade narratives instead of vendor portal screenshots.
The first tool in the toolkit, SignalGuard, ships two halves.
The CA audit half evaluates every tenant against a 15-rule catalogue (block legacy auth, MFA on admins, risk-based sign-in, break-glass exclusions, and others). On top of that, a coverage-matrix layer flags weak (user segment × app segment) cells, and an exclusion-hygiene analyser catches stale, orphaned, or undocumented CA policy exclusions. Every finding is deduplicated by content hash and persisted to DuckDB.
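A minimal sketch of content-hash deduplication, assuming a finding is identified by tenant id, rule id, and a JSON-serialisable evidence payload. The `Finding` dataclass, the `finding_hash` and `dedupe` helpers, and the rule id `CA001` are hypothetical; cstack's actual schema and hash inputs are not shown in this README.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class Finding:
    # Hypothetical minimal shape of an audit finding.
    tenant_id: str
    rule_id: str
    evidence: dict[str, Any]


def finding_hash(finding: Finding) -> str:
    """SHA-256 over a canonical JSON encoding of the finding's content."""
    payload = json.dumps(
        {"tenant": finding.tenant_id, "rule": finding.rule_id, "evidence": finding.evidence},
        sort_keys=True,  # key order must not change the hash
        separators=(",", ":"),  # no incidental whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dedupe(findings: list[Finding]) -> dict[str, Finding]:
    """Keep the first finding seen for each content hash."""
    unique: dict[str, Finding] = {}
    for finding in findings:
        unique.setdefault(finding_hash(finding), finding)
    return unique


# Same content, different key order: one finding survives.
a = Finding("tenant-a", "CA001", {"policy": "p1", "users": 3})
b = Finding("tenant-a", "CA001", {"users": 3, "policy": "p1"})
assert len(dedupe([a, b])) == 1
```

Canonicalising the evidence (sorted keys, fixed separators) is what makes two logically identical findings hash the same regardless of dictionary order. In this sketch the tenant id is part of the hash, so the same rule firing in two tenants yields two findings.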
The anomaly half watches Entra sign-in events. A pooled Isolation Forest per tenant flags sign-in rows that deviate from the tenant's normal pattern, and four hybrid attack-pattern rules are layered on top. SHAP attributions explain the top three contributing features on every flagged row. MLflow tracks every training run; a champion/challenger alias system gates promotion. Against synthetic fixtures with injected attack scenarios, the detector calibrates at precision 0.245-0.275 and recall 0.889-0.926. It targets SMB-tier tenants without Entra ID P2 licensing, where Microsoft's Identity Protection is unavailable.
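Precision and recall here are read as the usual ratios over flagged rows and injected attack rows. The sketch below uses toy row ids, not cstack's calibration data or harness, to show how a detector can miss few attacks while most of its flags are benign.

```python
def precision_recall(flagged: set[str], injected: set[str]) -> tuple[float, float]:
    """Precision and recall of flagged row ids against injected attack row ids."""
    true_positives = len(flagged & injected)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(injected) if injected else 0.0
    return precision, recall


# Toy data: 40 flagged rows, 11 injected attack rows, 10 in common.
flagged = {f"row-{i}" for i in range(40)}
injected = {f"row-{i}" for i in range(30, 41)}
precision, recall = precision_recall(flagged, injected)
```

With 40 flagged rows of which 10 are among 11 injected attacks, precision is 0.25 and recall is 10/11 (about 0.91).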
A per-user IF topology and a per-user-anchored off-hours-admin rule
ship as feature-flagged opt-ins (CSTACK_ML_TRAINING_TOPOLOGY=per_user,
CSTACK_ML_OFF_HOURS_ADMIN_ENABLED=true). Sprint 3.5 added the
infrastructure; Sprint 3.5b gated the activation pending real-tenant
calibration in Sprint 7 because synthetic data could not demonstrate
their precision lift.
Every finding (audit or anomaly) gets a four-section LLM narrative explaining why it fired, what it means, how to remediate, and when it might be a false positive. Narratives are cached by content address, so two tenants with the same finding share a single generation. The provider layer abstracts Anthropic, OpenAI, and local Ollama behind one Protocol; tests register fakes via the same factory.
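A minimal sketch of the content-addressed cache key, using the key inputs this README lists under design decisions (rule_id, canonicalised evidence, prompt_version, model; no tenant id) and assuming "canonicalised" means sorted-key JSON. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any


def canonicalise(evidence: dict[str, Any]) -> str:
    # Assumption: "canonicalised" means sorted-key, whitespace-free JSON.
    return json.dumps(evidence, sort_keys=True, separators=(",", ":"))


def narrative_cache_key(
    rule_id: str, evidence: dict[str, Any], prompt_version: str, model: str
) -> str:
    """Content address for a narrative. The tenant id is deliberately not an input."""
    # Encoding the parts as a JSON array keeps field boundaries unambiguous,
    # so ("ab", "c") and ("a", "bc") cannot collide.
    material = json.dumps(
        [rule_id, canonicalise(evidence), prompt_version, model],
        separators=(",", ":"),
    )
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Bumping `prompt_version` or switching `model` changes the key, so stale narratives are never served after a prompt or model change.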
V0.6.0-alpha.1 baseline (containerized stack, pre-live-tenant). See CHANGELOG.md for the per-sprint summary. Live-tenant integration is scheduled for Sprint 7, which is paused pending tenant access. Today the codebase ships:
- 266 Python tests across 8 packages and 2 apps
- 78 web tests across 28 files (Vitest + RTL, jsdom)
- 19 HTTP endpoints (15 read, 4 action), OpenAPI 3.1 contract committed
- 7 dashboard screens (Next.js 15 + Tailwind 4), tablet responsive at 768px
- 20-example hand-curated golden set + rubric-based LLM-as-judge eval harness
- $4.30 of real Anthropic API spend during Sprint 6 calibration
Everything runs against three synthetic fixture tenants. None of it has touched a production Microsoft 365 tenant yet.
The cstack toolkit currently ships one tool with more planned.
- SignalGuard identity security: CA audit + per-tenant behavioural sign-in anomaly detection with explainable ML scoring and LLM-narrated findings. Complete (against fixtures).
- Future: LicenseLens, Driftwatch, ChangeRadar, CompliancePulse. Planned for V1.
                   +--------------------- cstack monorepo ----------------------+
                   |                                                            |
fixtures load-all  |  +-- packages/ -----------------------------+              |
or live extract -->|  | schemas, storage, graph-client, fixtures |              |
                   |  | audit-{core,coverage,rules,exclusions}   |              |
                   |  | ml-{features,mlops,anomaly}              |              |
                   |  | llm-{provider,narrative,eval}            |              |
                   |  +--------+--------------------+------------+              |
                   |           |                    |                           |
                   |           v                    v                           |
                   |  +--- DuckDB ------+    +--- mlruns --+                    |
                   |  | tenants,        |    | per-tenant  |                    |
                   |  | ca_policies,    |    | IF + SHAP   |                    |
                   |  | findings,       |    | @champion / |                    |
                   |  | signins,        |    | @challenger |                    |
                   |  | anomaly_scores, |    +-------------+                    |
                   |  | narrative_cache |                                       |
                   |  +--------+--------+                                       |
                   |           |                                                |
                   |           v                                                |
                   |  +--------+----------------+     +-----------------------+ |
                   |  | apps/signalguard-api    |---->| Anthropic / OpenAI /  | |
                   |  | (FastAPI, X-API-Key)    |     | Ollama (LLM provider) | |
                   |  +--------+----------------+     +-----------------------+ |
                   |           |                                                |
                   |           v                                                |
                   |  +--------+----------------+                               |
                   |  | apps/signalguard-web    |                               |
                   |  | (Next.js 15 + Tailwind) |                               |
                   |  +-------------------------+                               |
                   |                                                            |
                   +------------------------------------------------------------+
For the deeper data flow including LLM cache lookup and bias mitigation, see docs/ARCHITECTURE.md.
Eight screens captured against fixture tenants. Full reference with captions: docs/SCREENSHOTS.md.
- Python 3.12 via uv workspaces. 8 internal packages, 2 apps (CLI + API).
- Next.js 15 + Tailwind 4 for the dashboard. Server Components first, TanStack Query for client interactions, and a typed @hey-api client generated from OpenAPI 3.1.
- FastAPI + DuckDB for the backend. Per-request DuckDB connections, RFC 7807 problem-details on every error, correlation ids on every request and log line.
- scikit-learn + SHAP + MLflow for the anomaly detector. A StandardScaler + IsolationForest pipeline, SHAP computed only on flagged rows to stay within the runtime budget, and MLflow registry aliases for promotion gating.
- Provider-agnostic LLM layer with adapters for Anthropic Claude, OpenAI, and Ollama behind a single Protocol. Content-addressed prompt cache, budget caps, pointwise + pairwise eval harness with position-swap bias mitigation.
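A sketch of a single-Protocol provider layer with a factory registry that tests can populate with a fake. `LLMProvider`, `complete`, `register_provider`, `get_provider`, and `FakeProvider` are illustrative names, not cstack's actual API; the Anthropic, OpenAI, and Ollama adapters are omitted.

```python
from collections.abc import Callable
from typing import Protocol


class LLMProvider(Protocol):
    # Hypothetical minimal surface; a real one would also report usage and cost.
    def complete(self, system: str, prompt: str) -> str: ...


_REGISTRY: dict[str, Callable[[], LLMProvider]] = {}


def register_provider(name: str, factory: Callable[[], LLMProvider]) -> None:
    _REGISTRY[name] = factory


def get_provider(name: str) -> LLMProvider:
    factory = _REGISTRY.get(name)
    if factory is None:
        raise ValueError(f"no LLM provider registered as {name!r}")
    return factory()


class FakeProvider:
    """Test double: matches LLMProvider structurally and never calls a network."""

    def __init__(self, reply: str = "fake narrative") -> None:
        self.reply = reply
        self.calls: list[tuple[str, str]] = []

    def complete(self, system: str, prompt: str) -> str:
        self.calls.append((system, prompt))
        return self.reply


# Tests register the fake exactly as a real adapter would be registered.
register_provider("fake", FakeProvider)
```

Because `typing.Protocol` is structural, `FakeProvider` satisfies the interface without inheriting from it, which is what lets tests swap it in through the same factory the real adapters would use.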
The full stack rationale lives in docs/ARCHITECTURE.md.
The fastest path is the Docker stack. Prerequisites: Docker Desktop (or any Docker Compose v2 runtime).
git clone <this repo>
cd cstack
docker compose -f infra/docker/compose.yaml up --build
First run takes 60 to 90 seconds (fixtures load, audit runs on all three tenants, anomaly model trains on tenant-a). Subsequent runs come up in seconds.
Visit http://localhost:3000 and enter dev-secret when the API key gate
prompts. See infra/docker/README.md for
troubleshooting, the fixtures-only override, and how to enable the LLM
narrative pass.
To work on the Python or TypeScript code locally without rebuilding the container on every change, run from source. Prerequisites: Python 3.12, Node 22 LTS, uv, pnpm.
uv sync
pnpm install
uv run cstack fixtures load-all
uv run cstack audit all --tenant tenant-b --no-narratives
echo 'SIGNALGUARD_API_DEV_API_KEY=dev-secret' >> .env
uv run signalguard-api --port 8000
# In a second shell.
pnpm --filter signalguard-web dev
# Visit http://localhost:3000/dashboard, enter "dev-secret".
Optional: drop --no-narratives and add ANTHROPIC_API_KEY=sk-ant-... to .env
to generate LLM narratives during the audit. The default budget is $1 per run.
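A sketch of a per-run budget cap like the $1 default mentioned above. It assumes each LLM call's cost is estimated before the call is made; `RunBudget`, `charge`, and `BudgetExceeded` are hypothetical names, and how cstack actually prices calls is not described here.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised instead of making an LLM call that would cross the cap."""


@dataclass
class RunBudget:
    limit_usd: float = 1.00  # per-run cap; the README's stated default is $1
    spent_usd: float = 0.0

    def charge(self, estimated_cost_usd: float) -> None:
        """Reserve the estimated cost of one call, or refuse it."""
        if estimated_cost_usd < 0:
            raise ValueError("cost cannot be negative")
        projected = self.spent_usd + estimated_cost_usd
        if projected > self.limit_usd:
            raise BudgetExceeded(
                f"projected spend ${projected:.4f} exceeds cap ${self.limit_usd:.2f}"
            )
        self.spent_usd = projected
```

Checking before the call, rather than after, means a run can never overshoot the cap by the cost of one expensive generation.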
Optional: ASN feature extraction uses a MaxMind GeoLite2-ASN database
when CSTACK_GEOIP_ASN_DB points at a valid .mmdb file. The Docker
stack handles this automatically via the geoipupdate service. For
local-from-source dev, either download the database manually (free
account at https://www.maxmind.com/) and point the env var at it,
or skip it: the ASN lookup then falls back to a deterministic prefix
table that the synthesizer's fixture IPs already use.
To run the anomaly detector end-to-end against fixtures:
uv run cstack signins extract --tenant tenant-a --scenario baseline
uv run cstack anomaly train --tenant tenant-a --lookback-days 365
uv run cstack anomaly promote --tenant tenant-a --force
uv run cstack signins extract --tenant tenant-a --scenario replay-attacks
uv run cstack anomaly score --tenant tenant-a
uv run cstack anomaly alerts --tenant tenant-a --n 20
The CLI is a thin layer over the same packages the API uses; see apps/cstack-cli/ for the full subcommand catalogue.
Start at docs/INDEX.md. The major docs:
- ARCHITECTURE.md: system design, repo layout, data flow
- API.md: REST API, auth model, error format, OpenAPI pointer
- MLOPS.md: anomaly detection lifecycle, calibration results
- LLM_OPS.md: narrative generation and eval harness
- DESIGN_TOKENS.md: visual decisions, single source of truth
- DESIGN_SYSTEM.md: component patterns, screen blueprints
- RULES.md: CA audit rule catalogue
- SCREENSHOTS.md: UI reference
- CONTRIBUTING.md: local dev, conventions
- SPRINT_NOTES.md: per-sprint calibration outcomes
- BACKLOG.md: parked work
- Per-tenant pooled IF as the baseline for sub-P2 tenants. Per-user models would be more sensitive but start cold every time a new user joins. Sprint 3.5 added an opt-in per-user topology, designed with a pooled fallback for users below the sample threshold; Sprint 3.5b keeps it feature-flagged off until real-tenant calibration in Sprint 7.
- Content-addressed prompt cache for cross-tenant narrative reuse. The cache key is SHA-256(rule_id, canonicalised(evidence), prompt_version, model) and excludes the tenant id, so identical findings across tenants share one generation.
- Custom LLM provider abstraction (not LiteLLM). One Protocol, three adapters, ~250 lines. LiteLLM ships its own opinions about retries and observability that conflict with ours; owning the abstraction means we can ship adapter-level fixes immediately (Claude 4.7 deprecating temperature mid-sprint was a real test of this).
- Pairwise LLM-as-judge eval harness with bias mitigation. The judge model differs from the generator (Sonnet judges Opus output), every pairwise comparison is position-swapped and downgraded to a tie when the judge flips on swap, and judging runs at low temperature. Pointwise scoring alone misled us in Sprint 6 calibration; the pairwise check caught it.
- OpenAPI-first contract. The web client is generated from apps/signalguard-api/openapi.json, and CI fails on drift, so the web app cannot ship a request shape the backend does not support.
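A sketch of the position-swap rule from the eval-harness bullet: the judge runs once in each order, positional answers are mapped back to candidates, and any disagreement becomes a tie. The `Judge` signature is an assumption, and the two toy judges exist only to exercise the logic.

```python
from collections.abc import Callable
from typing import Literal

Position = Literal["first", "second", "tie"]
Verdict = Literal["A", "B", "tie"]
# A judge sees two narratives in order and says which is better.
Judge = Callable[[str, str], Position]


def pairwise_with_swap(judge: Judge, a: str, b: str) -> Verdict:
    """Judge (a, b) and (b, a); if the verdicts disagree, call it a tie."""
    forward = {"first": "A", "second": "B", "tie": "tie"}[judge(a, b)]
    backward = {"first": "B", "second": "A", "tie": "tie"}[judge(b, a)]
    return forward if forward == backward else "tie"


def always_first(first: str, second: str) -> Position:
    # Toy judge with pure position bias.
    return "first"


def prefers_longer(first: str, second: str) -> Position:
    # Toy judge with a consistent (if naive) preference.
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"
```

The always-first judge is pure position bias, so it ends in a tie; the length-preferring judge is consistent across orders, so its verdict survives the swap.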
MIT.
cstack is a personal portfolio project that welcomes external contribution. See docs/CONTRIBUTING.md for the local dev workflow, conventional-commit rules, and the project's hard rules on code style and tone.
Security issues go to leunis@vanlabs.dev per SECURITY.md, not GitHub issues.