docs(audit): comprehensive codebase audit report (fixes #250) #251

Merged
xlabtg merged 4 commits into xlabtg:main from konard:issue-250-294bf2f3de08
Apr 22, 2026

Conversation

@konard

@konard konard commented Apr 22, 2026

Summary

Delivers the audit requested in #250: a thorough, read-only sweep of the
repository looking for real bugs, logical inconsistencies, and conflicts
— with concrete, evidence-backed remediations.

The full report lives at AUDIT_REPORT.md.

Per owner request on this PR, every finding also ships as a
ready-to-file GitHub Issue template under
improvements/work/ (23 templates + index).

Scope covered

  • src/autonomous/** — loop, manager, policy engine, goal parser, integration
  • src/memory/agent/autonomous-tasks.ts + migration 1.20.0.sql
  • src/webui/** — server, setup-server, middleware (auth, CSRF), all routes
  • src/index.ts start/stop lifecycle
  • Cross-cutting: admin_ids handling, TON tool registration, graceful shutdown

Headline findings (full list + remediations in the report)

🔴 Critical (4)

  • C1: DEFAULT_POLICY_CONFIG.restrictedTools uses names ("wallet:send", "contract:deploy", "system:exec") that don't match any real tool. The real tools are ton_send, jetton_send, exec. Result: autonomous tasks can spend TON up to budget without any escalation gate. src/autonomous/policy-engine.ts:34.
  • C2: AutonomousTaskManager is a local variable inside startAgent(); stopAgent() never calls stopAll(). Autonomous loops outlive shutdown and race the DB close. src/index.ts:333-414 vs src/index.ts:1487-1583.
  • C3: Pause + resume constructs a new AutonomousLoop (and PolicyEngine), wiping recentActions, rate-limit timestamps, and the uncertain counter. The toolCallsPerHour cap, loop-detection, and uncertainty escalation can all be bypassed via pause/resume. src/autonomous/manager.ts:84-126, src/autonomous/loop.ts:68-80.
  • C4: Full WebUI auth token printed to stdout: log.info(`URL: ${url}/auth/exchange?token=${this.authToken}`). A 7-day session token ends up in any aggregated log. src/webui/server.ts:503.
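
To illustrate the kind of guard that would catch C1 at startup, here is a minimal sketch. It assumes the set of registered tool names is available as a list of strings; the helper name `findUnknownRestrictedTools` and its signature are hypothetical and not taken from the repository.

```typescript
// Hypothetical helper (not the project's API): returns every restricted
// tool name that does not correspond to a registered tool.
export function findUnknownRestrictedTools(
  restrictedTools: readonly string[],
  registeredTools: Iterable<string>,
): string[] {
  const known = new Set(registeredTools);
  return restrictedTools.filter((name) => !known.has(name));
}

// Names quoted in finding C1: the real tools versus the default policy list.
const registered: string[] = ["ton_send", "jetton_send", "exec"];
const defaultRestricted: string[] = ["wallet:send", "contract:deploy", "system:exec"];

const unknown = findUnknownRestrictedTools(defaultRestricted, registered);
if (unknown.length > 0) {
  // A real fix might fail fast here instead of only warning.
  console.warn(`restrictedTools entries match no registered tool: ${unknown.join(", ")}`);
}
```

Running a check like this once at startup (or in a unit test) would turn a silent policy gap into a visible configuration error.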

🟠 High (7)

  • JSON.parse in rowToTask/Checkpoint/LogEntry has no try/catch — one bad row DoSes the listing (src/memory/agent/autonomous-tasks.ts:119-163).
  • Production notify() only logs; escalations never reach the user via Telegram or WebUI (src/autonomous/integration.ts:110-115).
  • deps_planWithTimeout leaks a setTimeout per iteration (src/autonomous/loop.ts:359-370).
  • Pause races with in-flight step's .finally — paused tasks can land in failed (src/autonomous/manager.ts:84-118).
  • task_checkpoints grows unbounded for active tasks (src/autonomous/loop.ts:306-320).
  • admin_ids[0] ?? 0 silently runs autonomous tools as user 0 when admin_ids is empty (src/autonomous/integration.ts:91).
  • Setup wizard (POST /api/setup/launch, port 7777) writes the auth token to config.yaml unauthenticated and unrate-limited (src/webui/setup-server.ts:132-161).
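
For the first High item (unguarded JSON.parse in the row mappers), one possible shape of the fix is a small wrapper that falls back to a default and flags the row as corrupt. This is a sketch under assumptions: `parseJsonColumn` is a hypothetical helper, and the real column names and row types are not shown in this PR.

```typescript
// Hypothetical helper: parse a JSON text column without letting one bad
// row throw out of the whole listing.
export function parseJsonColumn<T>(
  raw: string | null,
  fallback: T,
): { value: T; corrupt: boolean } {
  if (raw === null || raw === "") {
    return { value: fallback, corrupt: false };
  }
  try {
    return { value: JSON.parse(raw) as T, corrupt: false };
  } catch {
    // The caller can log the row ID and continue with the fallback.
    return { value: fallback, corrupt: true };
  }
}
```

A row mapper could then call `parseJsonColumn(row.someJsonColumn, {})` (column name hypothetical) and log when `corrupt` is true rather than aborting the query.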

🟡 Medium (8) / 🟢 Low (4) — see the report; includes missing global iteration cap, unbounded rate-limit timestamp arrays, aggressive default TON budget, empty-reason escalations, paused-forever tasks, inconsistent path-traversal checks, management API default host 0.0.0.0, and misc.
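
As an illustration of the "unbounded rate-limit timestamp arrays" item, a sliding-window counter can prune expired entries on every check so the array never holds more than the limit. The class below is a hypothetical sketch, not the PolicyEngine's actual implementation.

```typescript
// Hypothetical sliding-window counter: keeps only timestamps inside the
// window, so memory use is bounded by `limit`.
export class SlidingWindowCounter {
  private timestamps: number[] = [];
  private readonly windowMs: number;
  private readonly limit: number;

  constructor(windowMs: number, limit: number) {
    this.windowMs = windowMs;
    this.limit = limit;
  }

  // Returns true and records the call if it fits under the limit.
  tryRecord(now: number): boolean {
    const cutoff = now - this.windowMs;
    this.timestamps = this.timestamps.filter((t) => t > cutoff);
    if (this.timestamps.length >= this.limit) {
      return false;
    }
    this.timestamps.push(now);
    return true;
  }

  get size(): number {
    return this.timestamps.length;
  }
}
```

For an hourly cap such as toolCallsPerHour, the window would be 3,600,000 ms.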

Ready-to-file issue templates

Each of the 23 findings now has its own ready-made template under
improvements/work/, with the structure
requested in #250:

---
title, labels, milestone, severity, category, effort, priority
---
## Источник (Source)
## Описание (Description)
## Местоположение (Location)
## Влияние (Impact)
## Предложенное исправление (Proposed fix)
## Критерии приёмки (Acceptance criteria)
## Оценка (Estimate)

Index: improvements/work/README.md.

| Tier | Count | Files |
|---|---|---|
| 🔴 Critical (P1) | 4 | audit-c1..c4-*.md |
| 🟠 High (P1/P2) | 7 | audit-h1..h7-*.md |
| 🟡 Medium (P2/P3) | 8 | audit-m1..m8-*.md |
| 🟢 Low (P3) | 4 | audit-l1..l4-*.md |

Each file is self-contained — gh issue create --body-file (after
stripping the front-matter) will open the corresponding issue. See
improvements/work/README.md for a ready-to-run shell loop.
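
The front-matter stripping step mentioned above can be sketched as follows. This is an illustrative helper, not the shell loop from the README; `stripFrontMatter` is a hypothetical name, and it assumes the front-matter is delimited by `---` lines at the very top of the file.

```typescript
// Hypothetical helper: drop a leading YAML front-matter block delimited
// by "---" lines and return only the Markdown body.
export function stripFrontMatter(markdown: string): string {
  const lines = markdown.split(/\r?\n/);
  if (lines[0]?.trim() !== "---") {
    return markdown; // no front-matter: leave the text untouched
  }
  const close = lines.findIndex((line, i) => i > 0 && line.trim() === "---");
  if (close === -1) {
    return markdown; // unterminated block: do not guess
  }
  // Drop blank lines directly after the closing delimiter.
  return lines.slice(close + 1).join("\n").replace(/^\n+/, "");
}
```

The returned body could be written to a temporary file and passed to `gh issue create --body-file`.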

Risk assessment

🟠 Conditional Go. Safe in read-only configurations today; autonomous
mode touching real TON balances should wait for the P1 fixes
(C1, C2, C3, C4, H1, H2, H7), which are all small changes.
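
As an example of how small one of these fixes can be, C4 could be addressed by redacting the token before the exchange URL is logged. The sketch below uses the standard `URL` class; the function name and the example host and port are assumptions, not values from the repository.

```typescript
// Hypothetical helper: replace the value of a sensitive query parameter
// before the URL is written to a log.
export function redactQueryParam(rawUrl: string, param: string = "token"): string {
  const url = new URL(rawUrl);
  if (url.searchParams.has(param)) {
    url.searchParams.set(param, "REDACTED");
  }
  return url.toString();
}

// Example with a made-up host, port, and token.
const safeForLogs = redactQueryParam("http://127.0.0.1:8080/auth/exchange?token=abc123");
console.log(`URL: ${safeForLogs}`);
```

The full URL could still be shown once on an interactive terminal if the operator needs it, while aggregated logs only ever see the redacted form.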

Methodology

Three parallel focused sub-audits (WebUI/security, autonomous core,
CLI/lifecycle/config). Each finding was spot-checked directly by
reading the referenced lines before inclusion. No runtime execution —
this is static analysis against commit d6b09c0 on branch
issue-250-294bf2f3de08.

Follow-up

Each P1 finding should land as its own PR with a regression test so
the list stays verifiable. The Validation plan section at the bottom
of the report enumerates the concrete checks (unit tests, integration
tests, log grep). The improvements/work/ templates are the ready
inputs for those issues.

Test plan

  • Audit report renders correctly as markdown
  • improvements/work/ contains 23 templates + README, one per finding
  • Each template has the requested front-matter and body structure
  • Maintainers review and triage findings
  • Open follow-up issues / PRs for the P1 items (C1–C4, H1/H2/H7)

Fixes #250

konard added 2 commits April 22, 2026 18:44
Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #250
Delivers the audit asked for in issue #250: a read-only sweep of the
autonomous core, policy engine, WebUI surface, lifecycle, and memory
layer, ranked by severity with concrete remediations.

Four critical findings, most security-relevant:
- Default `restrictedTools` list never matches real tool names — TON
  sends bypass escalation.
- `AutonomousTaskManager` is a local variable; never stopped on agent
  shutdown.
- Pause/resume resets rate-limits and loop-detection (policy bypass).
- Full WebUI auth token printed to stdout at startup.

See AUDIT_REPORT.md for full list, evidence, and action plan.
@konard konard changed the title from "[WIP] [AUDIT] Comprehensive codebase audit via Claude: finding bugs, logical inconsistencies, and conflicts" to "docs(audit): comprehensive codebase audit report (fixes #250)" Apr 22, 2026
@konard konard marked this pull request as ready for review April 22, 2026 18:52
@konard
Author

konard commented Apr 22, 2026

Solution summary

Audit delivered. PR #251 is now open (non-draft, mergeable) with AUDIT_REPORT.md at the repo root and a full-context description: #251

Summary of what was delivered

The report catalogs 23 findings across critical/high/medium/low tiers, each with file:line evidence, impact, and remediation. The four critical items are the ones to flag:

  • C1: DEFAULT_POLICY_CONFIG.restrictedTools uses "wallet:send", "contract:deploy", "system:exec" but the actual tool names are ton_send, jetton_send, exec, so the escalation gate for TON sends is effectively disabled (src/autonomous/policy-engine.ts:34).
  • C2: autonomousManager is a local variable inside startAgent(); stopAgent() never calls stopAll(), so loops race DB shutdown (src/index.ts:333-414 vs :1487-1583).
  • C3: Pause/resume creates a fresh AutonomousLoop + PolicyEngine, wiping rate-limits, recentActions, and uncertainty state, so a pause/resume bypasses those policies (src/autonomous/manager.ts:84-126).
  • C4: Full auth token printed in plaintext to stdout at WebUI startup (src/webui/server.ts:503).
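
One way to close the C3 bypass is to keep per-task policy state in the manager and hand it to whatever engine is constructed on resume. The sketch below is illustrative only: `PolicyState`, `SketchPolicyEngine`, and `SketchManager` are hypothetical stand-ins, since the real AutonomousLoop and PolicyEngine shapes are not shown in this thread.

```typescript
// Hypothetical state that must survive pause/resume.
interface PolicyState {
  recentActions: string[];
  toolCallTimestamps: number[];
  uncertainCount: number;
}

// Stand-in for the policy engine: it mutates state it is given
// instead of owning fresh state on every construction.
class SketchPolicyEngine {
  readonly state: PolicyState;

  constructor(state: PolicyState) {
    this.state = state;
  }

  recordToolCall(tool: string, now: number): void {
    this.state.recentActions.push(tool);
    this.state.toolCallTimestamps.push(now);
  }
}

// Stand-in for the manager: state is keyed by task ID and outlives
// any single engine or loop instance.
class SketchManager {
  private readonly states = new Map<string, PolicyState>();

  // Used for both start and resume: reuse saved state when present.
  engineFor(taskId: string): SketchPolicyEngine {
    let state = this.states.get(taskId);
    if (!state) {
      state = { recentActions: [], toolCallTimestamps: [], uncertainCount: 0 };
      this.states.set(taskId, state);
    }
    return new SketchPolicyEngine(state);
  }

  // Call only when the task reaches a terminal status.
  forget(taskId: string): void {
    this.states.delete(taskId);
  }
}
```

With this shape, a resumed task still sees the tool calls it made before the pause, so the hourly cap and loop detection keep counting across the pause. An alternative with the same effect would be to persist the state alongside the task checkpoint.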

Recommendation in the report: 🟠 Conditional Go — safe read-only, but land the P1 fixes before enabling autonomous mode against a real TON wallet.


This summary was automatically extracted from the AI working session output.

@konard
Author

konard commented Apr 22, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $3.846644

📊 Context and tokens usage:

Claude Opus 4.7:

  • 133.0K / 1M (13%) input tokens, 23.4K / 128K (18%) output tokens

Total: (162.1K + 3.1M cached) input tokens, 23.4K output tokens, $3.163384 cost

Claude Haiku 4.5:

  • 20.5K / 64K (32%) output tokens

Total: (194.3K + 3.4M cached) input tokens, 20.5K output tokens, $0.683261 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Main model: Claude Opus 4.7 (claude-opus-4-7)
  • Additional models:
    • Claude Haiku 4.5 (claude-haiku-4-5-20251001)

📎 Log file uploaded as Gist (3170KB)


The working session has now ended. Feel free to review the solution draft and add any feedback.

@konard
Author

konard commented Apr 22, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-04-22T19:01:30.137Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish before providing your feedback.

@konard konard marked this pull request as draft April 22, 2026 19:01
@konard konard marked this pull request as ready for review April 22, 2026 19:02
@konard
Author

konard commented Apr 22, 2026

Solution summary

PR #251 is now back to ready-for-review, mergeable, and clean.

Summary of this session:

Verified the audit deliverable is complete and accurate. The existing AUDIT_REPORT.md (commit 86eea7f) already satisfies issue #250 with:

  • 23 findings across Critical/High/Medium/Low with file:line evidence and remediations
  • Spot-checked all 4 critical findings against the actual source (policy-engine.ts:34, index.ts:333/1487, manager.ts:85, webui/server.ts:503) — all verified accurate
  • Confirmed real tool names ton_send/jetton_send/exec exist at src/agent/tools/* while restrictedTools uses fictitious "wallet:send"/"contract:deploy"/"system:exec" (C1 is real)

The PR had been auto-converted to draft when this session started; it is now marked ready-for-review again.

PR: #251


This summary was automatically extracted from the AI working session output.

@konard
Author

konard commented Apr 22, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $0.564772

📊 Context and tokens usage:

  • 56.2K / 1M (6%) input tokens, 4.0K / 128K (3%) output tokens

Total: (41.9K + 407.0K cached) input tokens, 4.0K output tokens, $0.564772 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.7 (claude-opus-4-7)

📎 Log file uploaded as Gist (511KB)


The working session has now ended. Feel free to review the solution draft and add any feedback.

@konard
Author

konard commented Apr 22, 2026

✅ Ready to merge

This pull request is now ready to be merged:

  • CI workflows exist but were not triggered for this commit
  • No merge conflicts
  • No pending changes

Monitored by hive-mind with --auto-restart-until-mergeable flag

@xlabtg
Owner

xlabtg commented Apr 22, 2026

Okay, create a ready-made document for each problem so that an issue can be opened from it. Put them in the improvements folder: create a work folder inside it and place all of these templates there.

@konard konard marked this pull request as draft April 22, 2026 19:29
@konard
Author

konard commented Apr 22, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-04-22T19:29:10.604Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the session to finish before providing your feedback.

Per owner request on PR #251, add one self-contained GitHub Issue
template per audit finding from AUDIT_REPORT.md under
improvements/work/. 23 templates total (4 critical, 7 high, 8 medium,
4 low) plus a README index.

Each template carries YAML front-matter (title, labels, milestone,
severity, category, effort, priority) and a body with source,
description, location, impact, proposed remediation, and acceptance
criteria — matching the format requested in issue #250.

Refs: #250
@konard konard marked this pull request as ready for review April 22, 2026 19:36
@konard
Author

konard commented Apr 22, 2026

Done — 23 ready-to-file issue templates now live under improvements/work/, one per finding (C1–C4, H1–H7, M1–M8, L1–L4) plus a README index.

Each file follows the structure requested in #250:

  • YAML front-matter with title, labels, milestone, severity, category, effort, priority
  • Body sections: Источник (Source) / Описание (Description) / Местоположение (Location) / Влияние (Impact) / Предложенное исправление (Proposed fix) / Критерии приёмки (Acceptance criteria) / Оценка (Estimate)

The README also includes a ready-to-run gh issue create loop for bulk-filing.

Commit: 2491a56. PR is back to ready-for-review and mergeable.

@konard
Author

konard commented Apr 22, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $2.257797

📊 Context and tokens usage:

  • 102.0K / 1M (10%) input tokens, 33.9K / 128K (26%) output tokens

Total: (106.4K + 1.5M cached) input tokens, 33.9K output tokens, $2.257797 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.7 (claude-opus-4-7)

📎 Log file uploaded as Gist (973KB)


The working session has now ended. Feel free to review the solution draft and add any feedback.

@xlabtg xlabtg merged commit 09e5576 into xlabtg:main Apr 22, 2026
konard added a commit to konard/xlabtg-teleton-agent that referenced this pull request Apr 23, 2026
Mirror the precedent set by AUDIT_REPORT.md (issue xlabtg#250 / PR xlabtg#251) which
lives at the repo root. The 24 templates in improvements/work2/ already
reference ../../FULL_AUDIT_REPORT.md, and the PR description (and
README) link to ./FULL_AUDIT_REPORT.md — both expected the file at the
repo root.

2 participants