
feat: improve typing for ibis.<backend>.connect() #11188

Open
@NickCrews

Description


Is your feature request related to a problem?

The current implementation, which relies on the dynamic module-level __getattr__ in the top-level "ibis/__init__.py", makes it impossible for my IDE to infer that ibis.duckdb.connect() returns a DuckDB backend, ibis.postgres.connect() returns a Postgres backend, and so on.

What is the motivation behind your request?

No response

Describe the solution you'd like

Currently we build the "module proxy" dynamically:

def load_backend(name: str) -> BaseBackend:
    """Load backends in a lazy way with `ibis.<backend-name>`.

    This also registers the backend options.

    Examples
    --------
    >>> import ibis
    >>> con = ibis.sqlite.connect(...)

    When the `sqlite` attribute of the `ibis` module is accessed, this function
    is called and tries to load a backend named `sqlite` from the
    `ibis.backends` entry points. If successful, the `ibis.sqlite`
    attribute is "cached", so this function is only called the first time.

    """

    entry_points = {ep for ep in util.backend_entry_points() if ep.name == name}

    if not entry_points:
        msg = f"module 'ibis' has no attribute '{name}'. "
        if name in _KNOWN_BACKENDS:
            msg += (
                f"If you are trying to access the '{name}' backend, "
                f"try installing it first with `pip install 'ibis-framework[{name}]'`"
            )
        raise AttributeError(msg)

    if len(entry_points) > 1:
        raise RuntimeError(
            f"{len(entry_points)} packages found for backend '{name}': "
            f"{entry_points}\n"
            "There should be only one, please uninstall the unused packages "
            "and just leave the one that needs to be used."
        )

    import types

    import ibis

    (entry_point,) = entry_points
    try:
        module = entry_point.load()
    except ImportError as exc:
        raise ImportError(
            f"Failed to import the {name} backend due to missing dependencies.\n\n"
            f"You can pip or conda install the {name} backend as follows:\n\n"
            f'  python -m pip install -U "ibis-framework[{name}]"  # pip install\n'
            f"  conda install -c conda-forge ibis-{name}           # or conda install"
        ) from exc
    backend = module.Backend()
    # The first time a backend is loaded, we register its options, and we set
    # it as an attribute of `ibis`, so `__getattr__` is not called again for it
    backend.register_options()

    # We don't want to expose all the methods on an unconnected backend to the user.
    # In lieu of a full redesign, we create a proxy module and add only the methods
    # that are valid to call without a connect call. These are:
    #
    # - connect
    # - compile
    # - has_operation
    # - _from_url
    #
    # We also copy over the docstring from `do_connect` to the proxy `connect`
    # method, since that's where all the backend-specific kwargs are currently
    # documented. This is all admittedly gross, but it works and doesn't
    # require a backend redesign yet.

    def connect(*args, **kwargs):
        return backend.connect(*args, **kwargs)

    connect.__doc__ = backend.do_connect.__doc__
    connect.__wrapped__ = backend.do_connect
    connect.__module__ = f"ibis.{name}"

    proxy = types.ModuleType(f"ibis.{name}")
    setattr(ibis, name, proxy)
    proxy.connect = connect
    proxy.compile = backend.compile
    proxy.has_operation = backend.has_operation
    proxy.name = name
    proxy._from_url = backend._from_url

    # Add any additional methods that should be exposed at the top level
    for attr in getattr(backend, "_top_level_methods", ()):
        setattr(proxy, attr, getattr(backend, attr))

    return proxy
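Stripped of the entry-point lookup, the pattern above can be reduced to the runnable sketch below. `FakeBackend`, `lazy_ibis`, `_module_getattr`, and the `database` parameter are hypothetical stand-ins, not ibis APIs. It illustrates the gap: runtime introspection does see the backend's parameters (because `inspect.signature` follows `__wrapped__`), but the proxy's `connect` wrapper carries no annotations and the proxy itself is a plain `types.ModuleType`, so a static checker has nothing concrete to infer the return type from.

```python
from __future__ import annotations

import inspect
import types
import typing


class FakeBackend:
    """Hypothetical stand-in for a concrete backend class (not an ibis API)."""

    database: str

    def connect(self, *args: typing.Any, **kwargs: typing.Any) -> FakeBackend:
        # Return a fresh backend instance connected with the given arguments.
        new = type(self)()
        new.do_connect(*args, **kwargs)
        return new

    def do_connect(self, database: str = ":memory:") -> None:
        """Connect to `database` (hypothetical parameter)."""
        self.database = database


def load_backend(name: str) -> types.ModuleType:
    backend = FakeBackend()

    # Untyped wrapper, like the proxy `connect` above.
    def connect(*args, **kwargs):
        return backend.connect(*args, **kwargs)

    # Runtime tools (help(), inspect) use these; static checkers do not.
    connect.__doc__ = backend.do_connect.__doc__
    connect.__wrapped__ = backend.do_connect

    proxy = types.ModuleType(f"ibis.{name}")
    proxy.connect = connect
    return proxy


# Stand-in for the top-level package, with a PEP 562 module-level __getattr__.
lazy_ibis = types.ModuleType("lazy_ibis")


def _module_getattr(name: str) -> types.ModuleType:
    proxy = load_backend(name)
    setattr(lazy_ibis, name, proxy)  # cache, so this hook runs only once per name
    return proxy


lazy_ibis.__getattr__ = _module_getattr

con = lazy_ibis.fake.connect(database="example.db")
print(type(con).__name__, con.database)  # FakeBackend example.db
# inspect follows __wrapped__, so the backend's parameters are visible at runtime...
print(list(inspect.signature(lazy_ibis.fake.connect).parameters))  # ['database']
# ...but the wrapper itself has no annotations for a checker to use.
print(typing.get_type_hints(lazy_ibis.fake.connect))  # {}
```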

I think we could get this to work in one of two ways:

  1. The way I'd prefer: make all of the needed methods classmethods/class attributes on the backend, and return that class, e.g. ibis.duckdb would evaluate to the DuckdbBackend class. Users could then call the classmethod .connect() on it, or other classmethods such as from_connection(), from_url(), etc. This would require refactoring many of the backends and would probably be a breaking change for some users. I have no idea how deep that rabbit hole goes, but this seems the cleanest approach from my perspective.
  2. Hoist the dynamically generated proxy into a statically typed generic class, e.g. class BackendProxy(Generic[BackendT]).
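A minimal sketch of both options, assuming a heavily simplified backend hierarchy: `BaseBackend`, `DuckDBBackend`, `_B`, and the `database` parameter are hypothetical stand-ins, and only the names `BackendProxy` and `BackendT` come from the proposal above. In both variants a type checker can see that `connect()` returns the concrete backend class, which is exactly what the dynamically built module proxy hides.

```python
from __future__ import annotations

from typing import Any, Generic, TypeVar

_B = TypeVar("_B", bound="BaseBackend")


class BaseBackend:
    """Hypothetical, heavily simplified stand-in for ibis's BaseBackend."""

    name = "base"

    def do_connect(self, *args: Any, **kwargs: Any) -> None:
        raise NotImplementedError

    # Option 1: `connect` is a classmethod. `ibis.duckdb` could then be the
    # backend class itself, and `ibis.duckdb.connect()` returns that class.
    @classmethod
    def connect(cls: type[_B], *args: Any, **kwargs: Any) -> _B:
        backend = cls()
        backend.do_connect(*args, **kwargs)
        return backend


class DuckDBBackend(BaseBackend):
    """Hypothetical concrete backend (not the real ibis DuckDB backend)."""

    name = "duckdb"
    database: str

    def do_connect(self, database: str = ":memory:") -> None:
        self.database = database


# Option 2: a statically typed generic proxy that exposes only the methods that
# are safe to call before connecting (here just `name` and `connect`).
BackendT = TypeVar("BackendT", bound=BaseBackend)


class BackendProxy(Generic[BackendT]):
    def __init__(self, backend_cls: type[BackendT]) -> None:
        self._backend_cls = backend_cls
        self.name = backend_cls.name

    def connect(self, *args: Any, **kwargs: Any) -> BackendT:
        # The return type is tied to the class the proxy was built from.
        backend = self._backend_cls()
        backend.do_connect(*args, **kwargs)
        return backend


con1 = DuckDBBackend.connect(database="a.db")  # option 1: a DuckDBBackend
duckdb = BackendProxy(DuckDBBackend)  # option 2: BackendProxy[DuckDBBackend]
con2 = duckdb.connect()  # also a DuckDBBackend
print(type(con1).__name__, con1.database)  # DuckDBBackend a.db
print(type(con2).__name__, con2.database)  # DuckDBBackend :memory:
```

Two gaps remain in this sketch: `*args, **kwargs` on `connect` still hides the backend-specific keyword arguments from checkers, and the top-level `ibis.duckdb` attribute would presumably still need some static declaration (for example under `if typing.TYPE_CHECKING:`) so checkers can see it without every backend being imported eagerly.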

I am looking for:

  1. your general level of support here, e.g. whether you are psyched to help drive this forward, willing to give detailed reviews but probably won't make changes yourself, or think this isn't a priority and will be busy with other things
  2. what would be dealmakers/dealbreakers for you, e.g. users calling ibis.duckdb.connect() should not be affected
  3. any tips/hazards that you can foresee in this quest.

What version of ibis are you running?

main

What backend(s) are you using, if any?

duckdb and postgres

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees: No one assigned

Labels: feature (Features or general enhancements)

Type: No type

Projects: Status: backlog

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
