
Native memory leak when not using async with GitHub #285

@galshi

Description


Context

Without async with GitHub() as github:, every async API call creates and closes a new httpx.AsyncClient. Each construction loads the CA bundle into native OpenSSL memory, and in long-running processes this memory accumulates. In production the growth did stop at some point for unknown reasons; the only observed correlation was that the plateau appeared after POST-style API calls (for example, posting issue comments or pull request reviews), and there are no reliable reproduction steps beyond that. It is still worth fixing, because the behavior is undocumented and the memory growth is real and observable.

It is less noticeable with GET requests, because hishel's ETag caching short-circuits part of the client lifecycle, and most visible with GraphQL and POST requests, which hishel cannot cache.

Not visible in tracemalloc or objgraph. Only shows up in memray --native. That is how the leak was found.
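A minimal stdlib-only sketch of why tracemalloc stays quiet, assuming a Linux host (it reads /proc/self/statm, as the reproduction script does). It creates and keeps alive a batch of contexts from ssl.create_default_context(), used here only as a stand-in for the per-client SSL context, and compares the bytes tracemalloc attributes to Python's allocators with the RSS delta. The helper names and the batch size are illustrative. How large the RSS delta is depends on how the local OpenSSL loads CA certificates, so no particular number should be expected.

```python
import resource
import ssl
import tracemalloc
from pathlib import Path


def rss_bytes() -> int:
    # Resident set size read from /proc (Linux only), in bytes.
    pages = int(Path("/proc/self/statm").read_text().split()[1])
    return pages * resource.getpagesize()


def measure_contexts(count: int = 50) -> tuple[int, int]:
    """Create `count` SSL contexts; return (tracemalloc delta, RSS delta) in bytes."""
    tracemalloc.start()
    traced_before, _ = tracemalloc.get_traced_memory()
    rss_before = rss_bytes()

    # Keep references so nothing is freed before measuring.
    contexts = [ssl.create_default_context() for _ in range(count)]

    traced_after, _ = tracemalloc.get_traced_memory()
    rss_after = rss_bytes()
    tracemalloc.stop()
    del contexts
    return traced_after - traced_before, rss_after - rss_before


if __name__ == "__main__":
    traced, rss = measure_contexts()
    # tracemalloc only sees Python-allocator memory; OpenSSL's own
    # allocations can show up (if at all) only in the RSS figure.
    print(f"tracemalloc delta: {traced / 1024:.1f} KiB")
    print(f"RSS delta:         {rss / 1024:.1f} KiB")
```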

Root Cause

githubkit/core.py lines 289–298:

@asynccontextmanager
async def get_async_client(self) -> AsyncGenerator[httpx.AsyncClient, None]:
    if client := self.__async_client.get():
        print("Reusing existing async client")  # Added this
        yield client
    else:
        print("Creating new async client")  # Added this
        client = self._create_async_client()  # new SSL context every call
        try:
            yield client
        finally:
            await client.aclose()

self.__async_client is a ContextVar that is set only inside __aenter__. Without the context manager, every call takes the else branch, and _create_async_client() ends up calling ssl.create_default_context(cafile=certifi.where()), which loads the CA bundle into the native heap.

This was verified by adding prints to both branches (the lines marked # Added this). In reproduction, the leak run printed Creating new async client on every call, while the async with run printed Reusing existing async client on every call, because it kept using the single client created in __aenter__.
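The same branch logic can be checked without network access. This is a hypothetical reduction, not githubkit's code: FakeClient stands in for httpx.AsyncClient, Core for githubkit's core class, and the ContextVar is named _async_client for readability. Instead of printing, it counts how many clients each calling style constructs.

```python
import asyncio
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager
from contextvars import ContextVar


class FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient that counts creations and closes."""

    created = 0
    closed = 0

    def __init__(self) -> None:
        FakeClient.created += 1

    async def aclose(self) -> None:
        FakeClient.closed += 1


class Core:
    """Hypothetical reduction of the ContextVar pattern in get_async_client()."""

    def __init__(self) -> None:
        self._async_client: ContextVar[FakeClient | None] = ContextVar(
            "async_client", default=None
        )

    async def __aenter__(self) -> "Core":
        # Only the context manager stores a long-lived client.
        self._async_client.set(FakeClient())
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        client = self._async_client.get()
        if client is not None:
            await client.aclose()
            self._async_client.set(None)

    @asynccontextmanager
    async def get_async_client(self) -> AsyncGenerator[FakeClient, None]:
        if client := self._async_client.get():
            yield client  # reuse the client stored by __aenter__
        else:
            client = FakeClient()  # a new client for this one call
            try:
                yield client
            finally:
                await client.aclose()

    async def request(self) -> None:
        async with self.get_async_client():
            pass  # a real request would be sent here


async def count_clients(use_context_manager: bool, calls: int = 20) -> int:
    FakeClient.created = FakeClient.closed = 0
    core = Core()
    if use_context_manager:
        async with core:
            for _ in range(calls):
                await core.request()
    else:
        for _ in range(calls):
            await core.request()
    return FakeClient.created


if __name__ == "__main__":
    print("without async with:", asyncio.run(count_clients(False)))  # prints 20
    print("with async with:", asyncio.run(count_clients(True)))  # prints 1
```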

Memray Trace

The native allocations trace through OpenSSL certificate parsing during the TLS handshake on each new client:

_ssl_MemoryBIO_write_impl  (_ssl.c)
SSL_read_ex
X509_new_ex
ASN1_item_d2i_ex
ASN1_item_new_ex
CRYPTO_malloc / CRYPTO_zalloc   <-- repeated allocations
EVP_CipherInit_ex
OSSL_DECODER_from_bio
d2i_EC_PUBKEY
EVP_PKEY_new
EC_GROUP_dup / EC_GROUP_copy
CRYPTO_zalloc

Each new client parses certificates and allocates EC key and ASN.1 structures in the native heap, outside Python's memory management, which is why tracemalloc and objgraph do not see them.

Reproduction

import asyncio
import resource
from pathlib import Path
from githubkit import GitHub

def rss_mb() -> float:
    pages = int(Path("/proc/self/statm").read_text().split()[1])
    return pages * resource.getpagesize() / 1024 / 1024

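TOKEN = "<token>"  # placeholder: a GitHub token
TURN = 20  # number of GraphQL calls per run


async def memory_leak():
    # No async with: every call builds and closes its own httpx.AsyncClient.
    github = GitHub(TOKEN)
    for i in range(TURN):
        await github.async_graphql("{ viewer { login } }")
        print(f"[leak]    turn={i} rss={rss_mb():.1f}MB")
        await asyncio.sleep(1)


async def no_leak():
    # async with: one client is created in __aenter__ and reused for every call.
    async with GitHub(TOKEN) as github:
        for i in range(TURN):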
            await github.async_graphql("{ viewer { login } }")
            print(f"[no-leak] turn={i} rss={rss_mb():.1f}MB")
            await asyncio.sleep(1)

asyncio.run(memory_leak())
asyncio.run(no_leak())

Output

[leak]    turn=0 rss=72.4MB
[leak]    turn=1 rss=72.9MB
[leak]    turn=2 rss=73.8MB
[leak]    turn=3 rss=74.6MB
[leak]    turn=4 rss=75.4MB
[leak]    turn=5 rss=76.3MB
[leak]    turn=6 rss=77.1MB
[leak]    turn=7 rss=78.0MB
[leak]    turn=8 rss=78.9MB
[leak]    turn=9 rss=79.8MB
[leak]    turn=10 rss=80.6MB
[leak]    turn=11 rss=81.5MB
[leak]    turn=12 rss=82.3MB
[leak]    turn=13 rss=83.1MB
[leak]    turn=14 rss=84.0MB
[leak]    turn=15 rss=84.8MB
[leak]    turn=16 rss=85.6MB
[leak]    turn=17 rss=86.6MB
[leak]    turn=18 rss=87.4MB
[leak]    turn=19 rss=88.3MB
[no-leak] turn=0 rss=89.1MB
[no-leak] turn=1 rss=89.1MB
[no-leak] turn=2 rss=89.1MB
[no-leak] turn=3 rss=89.1MB
[no-leak] turn=4 rss=89.1MB
[no-leak] turn=5 rss=89.1MB
[no-leak] turn=6 rss=89.1MB
[no-leak] turn=7 rss=89.3MB
[no-leak] turn=8 rss=89.3MB
[no-leak] turn=9 rss=89.3MB
[no-leak] turn=10 rss=89.3MB
[no-leak] turn=11 rss=89.3MB
[no-leak] turn=12 rss=89.3MB
[no-leak] turn=13 rss=89.3MB
[no-leak] turn=14 rss=89.3MB
[no-leak] turn=15 rss=89.4MB
[no-leak] turn=16 rss=89.4MB
[no-leak] turn=17 rss=89.4MB
[no-leak] turn=18 rss=89.4MB
[no-leak] turn=19 rss=89.4MB

Why It's Hard To Notice

The quickstart and most examples show github = GitHub(token) followed by direct await calls. The async with pattern appears in some docs, but with no explanation of why it matters. There are no warnings and no existing issues about this, and Python-level profilers such as tracemalloc show nothing wrong.

Suggested Fixes

Emit a ResourceWarning in the else branch of get_async_client() when called without prior __aenter__.
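A sketch of where such a warning could go, using a hypothetical reduced Core and Client rather than githubkit's real classes; the message text and the _create_async_client stub are illustrative assumptions.

```python
import asyncio
import warnings
from collections.abc import AsyncGenerator
from contextlib import asynccontextmanager
from contextvars import ContextVar


class Client:
    """Hypothetical stand-in for httpx.AsyncClient."""

    async def aclose(self) -> None:
        pass


class Core:
    """Hypothetical reduction of githubkit's core, showing the warning placement."""

    def __init__(self) -> None:
        self._async_client: ContextVar[Client | None] = ContextVar(
            "async_client", default=None
        )

    def _create_async_client(self) -> Client:
        return Client()

    @asynccontextmanager
    async def get_async_client(self) -> AsyncGenerator[Client, None]:
        if client := self._async_client.get():
            yield client
        else:
            # No client stored by __aenter__: a fresh client is built for this
            # single request, so tell the caller about the cost.
            warnings.warn(
                "Creating a new async HTTP client for a single request; use "
                "'async with GitHub(...) as github:' to reuse one client.",
                ResourceWarning,
            )
            client = self._create_async_client()
            try:
                yield client
            finally:
                await client.aclose()


async def demo() -> list[warnings.WarningMessage]:
    core = Core()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        async with core.get_async_client():
            pass  # a request would be sent here
    return caught


if __name__ == "__main__":
    for w in asyncio.run(demo()):
        print(w.category.__name__, w.message)
```

One design point: ResourceWarning is ignored by Python's default warning filters, so users would only see it with something like python -X dev or -W default::ResourceWarning. That keeps existing scripts quiet, but it also means the warning complements the documentation rather than replacing it.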

Document that async with is required for connection reuse and that skipping it causes native memory growth on every uncached request.
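For long-running services where wrapping everything in one async with block is awkward (for example, a worker with separate startup and shutdown hooks), the same reuse can be had by entering the client once with contextlib.AsyncExitStack. A sketch, assuming a hypothetical Service class; StubGitHub stands in for GitHub(TOKEN) so it runs offline, and the only githubkit behavior it relies on is that GitHub supports async with, as in the reproduction.

```python
import asyncio
from contextlib import AbstractAsyncContextManager, AsyncExitStack
from typing import Any


class Service:
    """Hypothetical long-running service owning one API client for its lifetime."""

    def __init__(self, client_cm: AbstractAsyncContextManager[Any]) -> None:
        self._client_cm = client_cm
        self._stack = AsyncExitStack()
        self.client: Any = None

    async def start(self) -> None:
        # Like the top of `async with client_cm as client:`, but the block
        # stays open until stop() is called.
        self.client = await self._stack.enter_async_context(self._client_cm)

    async def stop(self) -> None:
        # Runs the context manager's __aexit__, closing the shared client.
        await self._stack.aclose()
        self.client = None


class StubGitHub:
    """Stand-in for GitHub(TOKEN) that records enter/exit calls."""

    def __init__(self) -> None:
        self.entered = 0
        self.exited = 0

    async def __aenter__(self) -> "StubGitHub":
        self.entered += 1
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self.exited += 1


async def main() -> StubGitHub:
    stub = StubGitHub()
    service = Service(stub)  # in a real service: Service(GitHub(TOKEN))
    await service.start()
    try:
        # Request handlers would all use service.client here.
        print("serving with", type(service.client).__name__)
    finally:
        await service.stop()
    return stub


if __name__ == "__main__":
    stub = asyncio.run(main())
    print("entered:", stub.entered, "exited:", stub.exited)
```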

Environment

Python: 3.14
githubkit: 0.15.0
OS: WSL2 (devcontainer), Amazon Linux (EC2)
Profiler: memray --native

Metadata


Assignees: No one assigned
Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
