aiohttp ignores HTTP_PROXY/HTTPS_PROXY env vars: CLI fails with DNS errors in proxy-gated environments #96

@iayanpahwa

Summary

The zyte-api CLI and the AsyncZyteAPI / ZyteAPI Python clients fail with a
ClientConnectorDNSError in any environment where outbound HTTP traffic is routed
through a system proxy, even when HTTP_PROXY / HTTPS_PROXY environment variables
are correctly set. The root cause is that aiohttp.ClientSession is created without
trust_env=True, so it bypasses the proxy entirely and attempts direct DNS resolution,
which fails.

Affected environments

Any environment where internet access goes through an HTTP/HTTPS proxy, including:

  • Corporate networks with SSL inspection proxies
  • Docker containers without host networking (proxy injected via env vars)
  • CI/CD runners (GitHub Actions, GitLab CI, etc.) in restricted networks
  • AI agent sandboxes (e.g. Claude.ai computer use, similar agentic platforms)
  • Cloud VPC environments with NAT gateways or egress proxies

Steps to reproduce

  1. Set up an environment with HTTPS_PROXY / HTTP_PROXY pointing to a working proxy.
  2. Confirm outbound connectivity works via curl (which respects proxy env vars by default).
  3. Run the CLI:
     zyte-api urls.txt --api-key YOUR_KEY --output results.jsonl

Expected: requests route through the proxy, reach api.zyte.com successfully.

Actual:

ERROR:zyte_api:Cannot connect to host api.zyte.com:443 ssl:default
[Temporary failure in name resolution]

Successful URLs: 0 of N
Success ratio:   0.0%
Exception types: [(<class 'aiohttp.client_exceptions.ClientConnectorDNSError'>, N)]

Note: passing --api-url https://api.zyte.com/v1/ does not fix this. The problem is
not the URL; aiohttp never routes the request through the proxy.
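
To rule out a misconfigured environment before blaming the client, it can help to check which proxies the Python process itself can see. The sketch below uses only the standard library and makes no network calls; visible_proxies is a hypothetical helper name, and the assumption that aiohttp consults a similar lookup when trust_env=True is about its internals and not verified here.

```python
import os
import urllib.request


def visible_proxies(schemes=("http", "https")):
    """Return the proxy URL Python derives from the environment, per scheme."""
    # getproxies() reads <scheme>_proxy variables (case-insensitively) and,
    # on some platforms, falls back to system proxy settings.
    found = urllib.request.getproxies()
    return {scheme: found.get(scheme) for scheme in schemes}


if __name__ == "__main__":
    for scheme, proxy in visible_proxies().items():
        print(f"{scheme}: {proxy or 'no proxy configured'}")
```

If this reports a proxy for https while the CLI still fails with a DNS error, the environment is probably fine and the session is simply not reading it.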

Root cause

In zyte_api/_utils.py, create_session() creates the aiohttp.ClientSession without
trust_env=True:

def create_session(
    connection_pool_size: int = 100, **kwargs: Any
) -> aiohttp.ClientSession:
    kwargs.setdefault("timeout", _AIO_API_TIMEOUT)
    if "connector" not in kwargs:
        kwargs["connector"] = TCPConnector(limit=connection_pool_size, force_close=True)
    return aiohttp.ClientSession(**kwargs)  # <-- trust_env not set

aiohttp does support reading proxy env vars, but only when trust_env=True is
passed to ClientSession (the alternative is to pass proxy= explicitly on each
request). Without either, it opens TCP connections directly and performs its own
local DNS resolution, which fails in proxy-only environments.

This is confirmed by the aiohttp docs:
https://docs.aiohttp.org/en/stable/client_advanced.html#proxy-support

"To use proxy env variables, pass trust_env=True to ClientSession"

Confirmation

Monkey-patching create_session to inject trust_env=True resolves the issue
completely:

import zyte_api._utils as _u
_orig = _u.create_session
def _patched(*a, **kw):
    kw['trust_env'] = True
    return _orig(*a, **kw)
_u.create_session = _patched

After this patch, the CLI runs successfully (2/2 URLs, 100% success rate) in an
environment where it previously failed with 0% success.

Proposed fix

1: Set trust_env=True by default

In zyte_api/_utils.py:

def create_session(
    connection_pool_size: int = 100, **kwargs: Any
) -> aiohttp.ClientSession:
    kwargs.setdefault("timeout", _AIO_API_TIMEOUT)
    kwargs.setdefault("trust_env", True)          # <-- add this line
    if "connector" not in kwargs:
        kwargs["connector"] = TCPConnector(limit=connection_pool_size, force_close=True)
    return aiohttp.ClientSession(**kwargs)

Using setdefault means callers who explicitly pass trust_env=False retain control.
This is the minimal, low-risk change.
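
The override behaviour of setdefault can be checked in isolation. This is a minimal sketch with a stand-in function, build_session_kwargs (a hypothetical name, not part of zyte-api), that mirrors only the kwargs handling so it runs without aiohttp installed.

```python
from typing import Any


def build_session_kwargs(**kwargs: Any) -> dict[str, Any]:
    """Stand-in for the kwargs handling in create_session (no aiohttp needed)."""
    # setdefault only writes the key when the caller did not supply it.
    kwargs.setdefault("trust_env", True)
    return kwargs


print(build_session_kwargs())                 # {'trust_env': True}
print(build_session_kwargs(trust_env=False))  # {'trust_env': False}
```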

or

2: Expose as a CLI flag

Add --trust-env / --no-trust-env to the CLI argument parser, defaulting to True.
This gives users explicit control and makes the behaviour discoverable. This could be
combined with Option 1 (default on, opt-out via flag).
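
A paired on/off flag like this could be sketched with argparse as follows. The parser here is hypothetical and standalone; how the real zyte-api CLI defines its arguments is not shown in this issue, so treat the wiring as an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real zyte-api CLI defines its own arguments.
    parser = argparse.ArgumentParser(prog="zyte-api-sketch")
    # BooleanOptionalAction (Python 3.9+) generates both --trust-env and
    # --no-trust-env from a single declaration.
    parser.add_argument(
        "--trust-env",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="Honour HTTP_PROXY / HTTPS_PROXY environment variables.",
    )
    return parser


args = build_parser().parse_args(["--no-trust-env"])
print(args.trust_env)  # False
```

The parsed value would then be forwarded as trust_env when the session is created.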

or

3: Auto-detect proxy env vars

Check whether HTTP_PROXY / HTTPS_PROXY / ALL_PROXY are set in the environment
and automatically enable trust_env=True only in that case. Slightly more complex but
makes the behaviour fully transparent.
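
A rough sketch of such a check, using only the standard library. proxy_env_present is a hypothetical helper, and the variable list is taken from the option text above; whether aiohttp itself honours every one of them (ALL_PROXY in particular) is an assumption not verified here.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the option description; whether aiohttp honours
# each of them (ALL_PROXY in particular) is not verified here.
PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


def proxy_env_present(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any proxy variable is set to a non-empty value."""
    env = os.environ if environ is None else environ
    # Proxy variables are commonly written in either case, so check both.
    names = {n for var in PROXY_VARS for n in (var, var.lower())}
    return any(env.get(name, "").strip() for name in names)


# The result could then feed the session default, e.g.
# kwargs.setdefault("trust_env", proxy_env_present())
```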

Recommendation

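Option 1 is the right default. trust_env=True should have no downside in environments
without a proxy: aiohttp simply finds no proxy vars and connects directly, as it does
today. One side effect to be aware of is that trust_env=True also lets aiohttp read
credentials from ~/.netrc when a request carries no explicit auth. The current default
of False actively breaks users in proxy environments with no obvious error message
pointing to the cause.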

Environment

  • zyte-api version: 0.8.1
  • Python: 3.12
  • aiohttp version: (whatever ships with 0.8.1)
  • OS: Linux (Ubuntu 24, sandboxed container)
  • Proxy: HTTP CONNECT proxy via HTTPS_PROXY env var
