Description
Context
Without async with GitHub() as github:, every async API call creates and closes a new httpx.AsyncClient. Each construction loads the CA bundle into native OpenSSL memory, and in long-running processes this memory accumulates. In production the growth did stop at some point for unknown reasons. The only observed correlation was that the change appeared after making POST-style API calls (for example posting issue comments or pull request reviews); there are no reliable reproduction steps beyond that. It is still worth fixing because the behavior is undocumented and the memory growth is real and observable.
The growth is less noticeable with GET requests because hishel's ETag caching short-circuits some of the client lifecycle. It is most visible with GraphQL and POST requests, which hishel cannot cache.
The growth is not visible in tracemalloc or objgraph. It only shows up in memray --native, which is how the leak was found.
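To illustrate why a Python-level tracer can miss this, here is a minimal sketch that compares what tracemalloc records against the change in process RSS while building SSL contexts. It does not reproduce the leak itself: it deliberately keeps the contexts alive, and it assumes Linux (for /proc/self/statm) and that the system has a default CA bundle for ssl.create_default_context() to load. The helper names rss_bytes and measure are made up for this example.

```python
import resource
import ssl
import tracemalloc
from pathlib import Path


def rss_bytes() -> int:
    # Second field of /proc/self/statm is the resident set size in pages (Linux only).
    pages = int(Path("/proc/self/statm").read_text().split()[1])
    return pages * resource.getpagesize()


def measure(n: int = 50) -> tuple[int, int]:
    """Return (bytes seen by tracemalloc, RSS delta in bytes) for n SSL contexts."""
    tracemalloc.start()
    traced_before, _ = tracemalloc.get_traced_memory()
    rss_before = rss_bytes()

    # Keep the contexts referenced so their native memory stays allocated
    # while we take the second reading.
    contexts = [ssl.create_default_context() for _ in range(n)]

    traced_after, _ = tracemalloc.get_traced_memory()
    rss_after = rss_bytes()
    tracemalloc.stop()
    del contexts
    return traced_after - traced_before, rss_after - rss_before


if __name__ == "__main__":
    traced, rss = measure()
    print(f"tracemalloc delta: {traced} bytes, RSS delta: {rss} bytes")
```

tracemalloc only hooks Python's own allocators, so memory that OpenSSL obtains directly from the C allocator is expected to show up in the RSS delta but not in the traced delta. The exact numbers depend on the platform and the size of the CA bundle.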
Root Cause
githubkit/core.py lines 289–298:
@asynccontextmanager
async def get_async_client(self) -> AsyncGenerator[httpx.AsyncClient, None]:
    if client := self.__async_client.get():
        print("Reusing existing async client")  # Added this
        yield client
    else:
        print("Creating new async client")  # Added this
        client = self._create_async_client()  # new SSL context every call
        try:
            yield client
        finally:
            await client.aclose()

self.__async_client is a ContextVar set only inside __aenter__. Without the context manager, every call hits the else branch. _create_async_client() calls ssl.create_default_context(cafile=certifi.where()), which loads the CA bundle into native heap.
This was verified by adding prints to both branches (see comments in the code sample). In reproduction the leak run repeatedly printed Creating new async client, while the async with run printed Reusing existing async client once and then reused the client.
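The branch behavior described above can be modeled without any network access. The following is a toy sketch, not githubkit's actual code: FakeClient and MiniGitHub are hypothetical stand-ins that copy only the ContextVar pattern, so the number of client constructions can be counted directly.

```python
import asyncio
from contextlib import asynccontextmanager
from contextvars import ContextVar
from typing import AsyncIterator, Optional


class FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient; counts constructions."""

    created = 0

    def __init__(self) -> None:
        FakeClient.created += 1

    async def aclose(self) -> None:
        pass


class MiniGitHub:
    """Toy model of the ContextVar pattern; not the real GitHub class."""

    def __init__(self) -> None:
        self._client: ContextVar[Optional[FakeClient]] = ContextVar(
            "client", default=None
        )

    async def __aenter__(self) -> "MiniGitHub":
        # The shared client only exists between __aenter__ and __aexit__.
        self._client.set(FakeClient())
        return self

    async def __aexit__(self, *exc_info) -> None:
        client = self._client.get()
        if client is not None:
            await client.aclose()
        self._client.set(None)

    @asynccontextmanager
    async def get_async_client(self) -> AsyncIterator[FakeClient]:
        if client := self._client.get():
            yield client  # reuse the client created in __aenter__
        else:
            client = FakeClient()  # fresh client for this call only
            try:
                yield client
            finally:
                await client.aclose()

    async def request(self) -> None:
        async with self.get_async_client():
            pass  # a real implementation would send the request here


async def count_clients(use_context_manager: bool, calls: int = 5) -> int:
    FakeClient.created = 0
    github = MiniGitHub()
    if use_context_manager:
        async with github:
            for _ in range(calls):
                await github.request()
    else:
        for _ in range(calls):
            await github.request()
    return FakeClient.created


if __name__ == "__main__":
    print("without async with:", asyncio.run(count_clients(False)))
    print("with async with:   ", asyncio.run(count_clients(True)))
```

In this model, five calls without the context manager construct five clients, while five calls inside async with construct one.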
Memray Trace
The native allocations trace through OpenSSL certificate parsing during the TLS handshake on each new client:
_ssl_MemoryBIO_write_impl (_ssl.c)
SSL_read_ex
X509_new_ex
ASN1_item_d2i_ex
ASN1_item_new_ex
CRYPTO_malloc / CRYPTO_zalloc <-- repeated allocations
EVP_CipherInit_ex
OSSL_DECODER_from_bio
d2i_EC_PUBKEY
EVP_PKEY_new
EC_GROUP_dup / EC_GROUP_copy
CRYPTO_zalloc
Each new client parses certificates and allocates EC key and ASN.1 structures in native heap that are not tracked by Python's memory management.
Reproduction
import asyncio
import resource
from pathlib import Path

from githubkit import GitHub


def rss_mb() -> float:
    # Second field of /proc/self/statm is the resident set size in pages (Linux only).
    pages = int(Path("/proc/self/statm").read_text().split()[1])
    return pages * resource.getpagesize() / 1024 / 1024


TOKEN = "<token>"
TURN = 20


async def memory_leak():
    # No context manager: each call creates and closes its own client.
    github = GitHub(TOKEN)
    for i in range(TURN):
        await github.async_graphql("{ viewer { login } }")
        print(f"[leak] turn={i} rss={rss_mb():.1f}MB")
        await asyncio.sleep(1)


async def no_leak():
    # Context manager: one client is shared by all calls.
    async with GitHub(TOKEN) as github:
        for i in range(TURN):
            await github.async_graphql("{ viewer { login } }")
            print(f"[no-leak] turn={i} rss={rss_mb():.1f}MB")
            await asyncio.sleep(1)


asyncio.run(memory_leak())
asyncio.run(no_leak())

Output
[leak] turn=0 rss=72.4MB
[leak] turn=1 rss=72.9MB
[leak] turn=2 rss=73.8MB
[leak] turn=3 rss=74.6MB
[leak] turn=4 rss=75.4MB
[leak] turn=5 rss=76.3MB
[leak] turn=6 rss=77.1MB
[leak] turn=7 rss=78.0MB
[leak] turn=8 rss=78.9MB
[leak] turn=9 rss=79.8MB
[leak] turn=10 rss=80.6MB
[leak] turn=11 rss=81.5MB
[leak] turn=12 rss=82.3MB
[leak] turn=13 rss=83.1MB
[leak] turn=14 rss=84.0MB
[leak] turn=15 rss=84.8MB
[leak] turn=16 rss=85.6MB
[leak] turn=17 rss=86.6MB
[leak] turn=18 rss=87.4MB
[leak] turn=19 rss=88.3MB
[no-leak] turn=0 rss=89.1MB
[no-leak] turn=1 rss=89.1MB
[no-leak] turn=2 rss=89.1MB
[no-leak] turn=3 rss=89.1MB
[no-leak] turn=4 rss=89.1MB
[no-leak] turn=5 rss=89.1MB
[no-leak] turn=6 rss=89.1MB
[no-leak] turn=7 rss=89.3MB
[no-leak] turn=8 rss=89.3MB
[no-leak] turn=9 rss=89.3MB
[no-leak] turn=10 rss=89.3MB
[no-leak] turn=11 rss=89.3MB
[no-leak] turn=12 rss=89.3MB
[no-leak] turn=13 rss=89.3MB
[no-leak] turn=14 rss=89.3MB
[no-leak] turn=15 rss=89.4MB
[no-leak] turn=16 rss=89.4MB
[no-leak] turn=17 rss=89.4MB
[no-leak] turn=18 rss=89.4MB
[no-leak] turn=19 rss=89.4MB
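For a rough per-call growth rate, the first and last readings of each run above can be compared. This is plain arithmetic on the printed numbers and says nothing about variance between runs; growth_per_call is a helper name made up for this example.

```python
TURNS = 20


def growth_per_call(first_mb: float, last_mb: float, turns: int = TURNS) -> float:
    # There are turns - 1 intervals between the first and last reading.
    return (last_mb - first_mb) / (turns - 1)


leak_rate = growth_per_call(72.4, 88.3)
no_leak_rate = growth_per_call(89.1, 89.4)

print(f"leak:    {leak_rate:.2f} MB/call")
print(f"no-leak: {no_leak_rate:.3f} MB/call")
```

That works out to roughly 0.84 MB per call without the context manager, against about 0.016 MB per call with it.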
Why It's Hard To Notice
The quickstart and most examples show github = GitHub(token) with direct await calls. The async with pattern appears in some docs but with no explanation of why it matters. There are no warnings and no existing issues about this, and a Python-level profiler shows nothing wrong.
Suggested Fixes
Emit a ResourceWarning in the else branch of get_async_client() when called without prior __aenter__.
Document that async with is required for connection reuse and that skipping it causes native memory growth on every uncached request.
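The first suggestion could look something like the sketch below. It only shows the warning mechanics. warn_if_unmanaged and its has_shared_client flag are hypothetical names, and where exactly such a call would live in githubkit is left open.

```python
import warnings


def warn_if_unmanaged(has_shared_client: bool) -> None:
    """Hypothetical helper: warn when no client was set up by __aenter__."""
    if not has_shared_client:
        warnings.warn(
            "GitHub client used outside 'async with'; a new HTTP client "
            "is created and closed for this call",
            ResourceWarning,
            stacklevel=2,
        )


# ResourceWarning is ignored by Python's default filters, so it is enabled
# explicitly here to make the warning observable.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ResourceWarning)
    warn_if_unmanaged(has_shared_client=False)  # else branch: warns
    warn_if_unmanaged(has_shared_client=True)   # shared client: silent

print(len(caught), caught[0].category.__name__)
```

Because ResourceWarning is hidden by default, users would only see it with -W default, python -X dev, or a test runner that enables warnings, so the documentation change is still needed alongside it.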
Environment
| Python | 3.14 |
| githubkit | 0.15.0 |
| OS | WSL2 (devcontainer), Amazon Linux (EC2) |
| Profiler | memray --native |