Skip to content

Fix: correct terminal command handling logic#124

Merged
asdek merged 1 commit intovxcontrol:feature/project_improvementsfrom
Priyanka-2725:fix/terminal-bug
Feb 22, 2026
Merged

Fix: correct terminal command handling logic#124
asdek merged 1 commit intovxcontrol:feature/project_improvementsfrom
Priyanka-2725:fix/terminal-bug

Conversation

@Priyanka-2725
Copy link

@Priyanka-2725 Priyanka-2725 commented Feb 22, 2026

Fix: Goroutine and Resource Leak in Terminal Exec Timeout Handling

Description of the Change

Problem

getExecResult() in backend/pkg/tools/terminal.go contained a goroutine leak triggered every time a command execution timed out.

When ctx.Done() fired, the function returned early but left a background goroutine permanently blocked on io.Copy(&dst, resp.Reader). The goroutine had no way to be interrupted and was never joined, so it persisted in memory until process termination.

Two additional bugs compounded the issue:

  • Data race — the goroutine captured and wrote to the outer err variable while the function's main goroutine could be reading it, violating the Go memory model.
  • Unbounded accumulation — in long-running PentAGI sessions with many tool invocations, each timeout added one more orphaned goroutine and one unclosed Docker reader. This manifests as gradual memory growth, rising runtime.NumGoroutine() counts, and eventual Docker connection pool exhaustion.
// BEFORE — goroutine leaks on timeout
done := make(chan struct{})
go func() {
    _, err = io.Copy(&dst, resp.Reader) // data race on outer err
    close(done)
}()

select {
case <-done:
case <-ctx.Done():
    // returns here — goroutine still blocked on resp.Reader forever
}

Solution

Replaced the done channel pattern with a buffered errChan and explicit synchronization on timeout:

  • Buffered channel (size 1) ensures the goroutine's send never blocks regardless of which select branch wins.
  • Isolated copyErr variable inside the goroutine eliminates the data race on the outer err.
  • resp.Close() on timeout unblocks the underlying net.Conn read inside io.Copy immediately.
  • <-errChan after close synchronously waits for the goroutine to fully exit before the function returns — guaranteed on every code path.
// AFTER — goroutine is guaranteed to exit before function returns
errChan := make(chan error, 1)

go func() {
    _, copyErr := io.Copy(&dst, resp.Reader) // isolated variable, no race
    errChan <- copyErr
}()

select {
case err := <-errChan:
    // normal completion
case <-ctx.Done():
    resp.Close() // unblocks the Read() inside io.Copy
    <-errChan    // synchronously waits for goroutine to exit
    result := fmt.Sprintf("temporary output: %s", dst.String())
    return "", fmt.Errorf("timeout value is too low, use greater value if you need so: %w: %s", ctx.Err(), result)
}

Closes #


Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔧 Configuration change
  • 🧪 Test update
  • 🛡️ Security update

Areas Affected

  • Core Services (Frontend UI/Backend API)
  • AI Agents (Researcher/Developer/Executor)
  • Security Tools Integration
  • Memory System (Vector Store/Knowledge Base)
  • Monitoring Stack (Grafana/OpenTelemetry)
  • Analytics Platform (Langfuse)
  • External Integrations (LLM/Search APIs)
  • Documentation
  • Infrastructure/DevOps

Testing and Verification

Test Configuration

PentAGI Version: latest (master)
Docker Version: N/A (unit tests use net.Pipe mock)
Host OS: Windows 11 + WSL2 Ubuntu (Go 1.24.1)
LLM Provider: N/A
Enabled Features: N/A

Test Steps

  1. Five targeted unit tests were written for getExecResult() using net.Pipe() as a real in-process connection, so resp.Close() genuinely unblocks the Read() call inside io.Copy.
  2. Tests were run in WSL2 Ubuntu via go test ./pkg/tools/ -run TestGetExecResult -v -count=1.
  3. Goroutine counts (runtime.NumGoroutine()) were measured before and after timeout scenarios to assert no leak.

Test Results

=== RUN   TestGetExecResult_Success
--- PASS: TestGetExecResult_Success (0.05s)
=== RUN   TestGetExecResult_Timeout_NoGoroutineLeak
--- PASS: TestGetExecResult_Timeout_NoGoroutineLeak (0.25s)
=== RUN   TestGetExecResult_Timeout_PartialOutput
--- PASS: TestGetExecResult_Timeout_PartialOutput (0.30s)
=== RUN   TestGetExecResult_AttachError
--- PASS: TestGetExecResult_AttachError (0.00s)
=== RUN   TestGetExecResult_MultipleTimeouts
--- PASS: TestGetExecResult_MultipleTimeouts (0.86s)
PASS
ok      pentagi/pkg/tools       1.581s
Test What it validates
TestGetExecResult_Success Normal path returns full output; goroutine count does not grow
TestGetExecResult_Timeout_NoGoroutineLeak runtime.NumGoroutine() returns to baseline after a single timeout — core regression test
TestGetExecResult_Timeout_PartialOutput Partial data written before deadline is included in the error message
TestGetExecResult_AttachError ContainerExecAttach failure propagates with correct error wrapping
TestGetExecResult_MultipleTimeouts 10 consecutive timeouts leave goroutine count flat at baseline ± 2

Security Considerations

No security model changes. The fix eliminates unclosed Docker readers accumulating in memory, which reduces the attack surface for resource exhaustion under sustained load.


Performance Impact

Positive. The fix prevents goroutine and memory accumulation over time. In the timeout path, the only addition is one resp.Close() call (which was already deferred) and a single channel receive that resolves as soon as the goroutine unblocks — negligible overhead.


Documentation Updates

  • README.md updates
  • API documentation updates
  • Configuration documentation updates
  • GraphQL schema updates
  • Other: Inline code comments added to explain the synchronization contract

Deployment Notes

No environment variable, configuration, schema, or migration changes required. Drop-in fix — fully backward compatible.


Checklist

Code Quality

  • My code follows the project's coding standards
  • I have added/updated necessary documentation
  • I have added tests to cover my changes
  • All new and existing tests pass
  • I have run go fmt and go vet (for Go code)
  • I have run npm run lint (for TypeScript/JavaScript code)

Security

  • I have considered security implications
  • Changes maintain or improve the security model
  • Sensitive information has been properly handled

Compatibility

  • Changes are backward compatible
  • Breaking changes are clearly marked and documented
  • Dependencies are properly updated

Documentation

  • Documentation is clear and complete
  • Comments are added for non-obvious code
  • API changes are documented

Additional Notes

The root pattern — spinning a goroutine around io.Copy without a guaranteed interruption mechanism — is a common mistake when combining context cancellation with blocking I/O in Go. The fix follows the established correct pattern: close the underlying connection to unblock the read, then synchronously wait on a buffered channel to confirm the goroutine has exited before returning.

The defer resp.Close() that already existed in the function is retained as a safe no-op double-close guard for the success path — HijackedResponse.Close() wraps net.Conn.Close() which is safe to call multiple times.

@asdek asdek changed the base branch from master to feature/project_improvements February 22, 2026 19:08
@asdek asdek merged commit 68c8e77 into vxcontrol:feature/project_improvements Feb 22, 2026
@asdek
Copy link
Contributor

asdek commented Feb 22, 2026

thanks for this PR! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants