Description
Is your feature request related to a problem?
I find that joins tend to be more 'wordy' in ibis than they are in other APIs like polars or pyspark and I think it would be quite simple to change that.
What is the motivation behind your request?
It is a very common pattern to use left joins to add information to a table that's currently "under construction". Here is an example:
product_attributes = ... # columns product_id, title, description, size, color, ...
df = (
order_items
.filter(...)
.group_by(...)
.agg(...)
)
filtered_product_attributes = product_attributes.filter(...)
df = (
df
.join(product_attributes, ['product_id'], how='left')
.select(
df,
filtered_product_attributes['title'],
filtered_product_attributes['description'],
filtered_product_attributes['size'],
...
)
)
My two main issues with this are that
- To avoid ending up with a spurious
product_id_right
column I have to do a projection, which means I have to explicitly list all the columns offiltered_product_attributes
that I want to keep. - Both the left and the right table must be assigned to a python variable so that I can do the projection. This means I have to break the method chaining to do a join which (I find) makes the code less readable. It also means that I can't do the filtering on
product_attributes
in-place in the join because I need to refer to the filtered table again in the projection.
Describe the solution you'd like
I would like the code to look like this:
product_attributes = ... # columns product_id, title, description, size, color, ...
df = (
order_items
.filter(...)
.group_by(...)
.agg(...)
.join(
product_attributes.filter(...),
['product_id'],
how='left',
coalesce=True,
)
)
where the effect of the coalesce
option is to remove the _right
version of the product_id
column that we joined on. (The name is borrowed from the polars API.) This makes the projection unnecessary, which means I can filter product_attributes
in-place and do not have to explicitly list the columns I want to keep. Both the polars and the pyspark APIs can automatically remove the duplicates of the columns the you joined on, and their joins are more concise and readable as a result.
What version of ibis are you running?
10.5.0
What backend(s) are you using, if any?
No response
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Type
Projects
Status