You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Currently, when you pass a database or schema into the constructor of the backend, it goes and actually changes the state of the backend so that this is the default. In other words, we are trying to keep the frontend's state in sync with the backend's. This has a few problems:
Every time you call .table(), you have to do a round trip to actually go and SELECT CURRENT_SCHEMA(). In my usage with hosted postgres on neon.tech, this leads to a difference of .7sec to execute conn.sql("SELECT * FROM "my_table").head().execute() vs 1.3 sec to execute con.table("my_table").head().execute().
It also lends itself to race conditions (albeit unlikely):
in postgres, you go and fetch CURRENT_SCHEMA(), then start building up an expression
someone else on a different connection modifies current_schema()
By the time you actually execute the expression, your state is out of sync.
Because .current_database is an attribute and not a method, it hides the fact that some expensive computation is happening. I would love to see this a method. This is orthogonal to the other problems I bring up here, and we can solve this separately. In general, I would love it so that as much as possible, attributes on a backend never trigger a roundtrip to the backend. Methods may or may not trigger them.
What is the motivation behind your request?
I have some web scraping code that processes a bunch of URLs, and every time an item completes (each URL takes a few seconds to process) I want to conn.insert() the result so that I don't lose results if the fetching errors. A difference of 1.3 vs .7 seconds adds up over the several thousand URLs I need to process. I could re-structure this of course, but I want the naive approach to work.
Describe the solution you'd like
a) make current_database and current_catalog be functions
b) add default_catalog: str | None and default_database: str | None attributes to all backends (where applicable). These are set during .do_connect(). Users can also modify these after the connection is created, eg conn.default_database = "foo". Then, in .table(), .insert(), etc, we change the logic from (using the Postgres.get_schema() as an example)
Is your feature request related to a problem?
Currently, when you pass a database or schema into the constructor of the backend, it goes and actually changes the state of the backend so that this is the default. In other words, we are trying to keep the frontend's state in sync with the backend's. This has a few problems:
Every time you call .table(), you have to do a round trip to actually go and
SELECT CURRENT_SCHEMA()
. In my usage with hosted postgres on neon.tech, this leads to a difference of .7sec to executeconn.sql("SELECT * FROM "my_table").head().execute()
vs 1.3 sec to executecon.table("my_table").head().execute()
.It also lends itself to race conditions (albeit unlikely):
.current_database
is an attribute and not a method, it hides the fact that some expensive computation is happening. I would love to see this a method. This is orthogonal to the other problems I bring up here, and we can solve this separately. In general, I would love it so that as much as possible, attributes on a backend never trigger a roundtrip to the backend. Methods may or may not trigger them.What is the motivation behind your request?
I have some web scraping code that processes a bunch of URLs, and every time an item completes (each URL takes a few seconds to process) I want to
conn.insert()
the result so that I don't lose results if the fetching errors. A difference of 1.3 vs .7 seconds adds up over the several thousand URLs I need to process. I could re-structure this of course, but I want the naive approach to work.Describe the solution you'd like
a) make current_database and current_catalog be functions
b) add
default_catalog: str | None
anddefault_database: str | None
attributes to all backends (where applicable). These are set during.do_connect()
. Users can also modify these after the connection is created, egconn.default_database = "foo"
. Then, in.table()
,.insert()
, etc, we change the logic from (using the Postgres.get_schema() as an example)to
I am almost sure I am overlooking some other important implications of this change. Thanks for helping think through them with me.
What version of ibis are you running?
main
What backend(s) are you using, if any?
postgres
Code of Conduct
The text was updated successfully, but these errors were encountered: