Skip to content

feat: make current_catalog and current_schema reflect backend state, separate from frontend's "default" catalog and schema #11187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
1 task done
NickCrews opened this issue May 6, 2025 · 0 comments
Labels
feature Features or general enhancements

Comments

@NickCrews
Copy link
Contributor

Is your feature request related to a problem?

Currently, when you pass a database or schema into the constructor of the backend, it goes and actually changes the state of the backend so that this is the default. In other words, we are trying to keep the frontend's state in sync with the backend's. This has a few problems:

  1. Every time you call .table(), you have to do a round trip to actually go and SELECT CURRENT_SCHEMA(). In my usage with hosted postgres on neon.tech, this leads to a difference of .7sec to execute conn.sql("SELECT * FROM "my_table").head().execute() vs 1.3 sec to execute con.table("my_table").head().execute().

  2. It also lends itself to race conditions (albeit unlikely):

  1. in postgres, you go and fetch CURRENT_SCHEMA(), then start building up an expression
  2. someone else on a different connection modifies current_schema()
  3. By the time you actually execute the expression, your state is out of sync.
  1. Because .current_database is an attribute and not a method, it hides the fact that some expensive computation is happening. I would love to see this a method. This is orthogonal to the other problems I bring up here, and we can solve this separately. In general, I would love it so that as much as possible, attributes on a backend never trigger a roundtrip to the backend. Methods may or may not trigger them.

What is the motivation behind your request?

I have some web scraping code that processes a bunch of URLs, and every time an item completes (each URL takes a few seconds to process) I want to conn.insert() the result so that I don't lose results if the fetching errors. A difference of 1.3 vs .7 seconds adds up over the several thousand URLs I need to process. I could re-structure this of course, but I want the naive approach to work.

Describe the solution you'd like

a) make current_database and current_catalog be functions
b) add default_catalog: str | None and default_database: str | None attributes to all backends (where applicable). These are set during .do_connect(). Users can also modify these after the connection is created, eg conn.default_database = "foo". Then, in .table(), .insert(), etc, we change the logic from (using the Postgres.get_schema() as an example)

    def get_schema(
        self,
        name: str,
        *,
        catalog: str | None = None,
        database: str | None = None,
    ):
        dbs = [database or self.current_database]
        # ...

to

    def get_schema(
        self,
        name: str,
        *,
        catalog: str | None = None,
        database: str | None = None,
    ):
        dbs = [database or self.default_database]
        # ...

I am almost sure I am overlooking some other important implications of this change. Thanks for helping think through them with me.

What version of ibis are you running?

main

What backend(s) are you using, if any?

postgres

Code of Conduct

  • I agree to follow this project's Code of Conduct
@NickCrews NickCrews added the feature Features or general enhancements label May 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Features or general enhancements
Projects
Status: backlog
Development

No branches or pull requests

1 participant