Description
Is your feature request related to a problem?
In SQL, you can create aggregates over scalars. However, in ibis, the ir.Scalar Values don't have any of the aggregation functions:
import ibis
ibis.options.interactive = True
# aggregating over constants works
sql = """
WITH t as (
SELECT 1 AS a, 5 AS b UNION ALL
SELECT 1 AS a, 7 AS b UNION ALL
SELECT 2 AS a, 3 AS b
)
SELECT AVG(b) AS avg_b, AVG(NULL) as avg_null, AVG(43) AS avg_const
FROM t
GROUP BY a
"""
from_sql = ibis.duckdb.connect().sql(sql)
print(from_sql.execute())
# works
# You don't even need to perform the aggregation over a table/group/etc
sql = """
SELECT AVG(NULL) as avg_null, AVG(43) AS avg_const
"""
from_sql = ibis.duckdb.connect().sql(sql)
print(from_sql.execute())
# works
t = ibis.memtable({"a": [1, 1, 2], "b": [5, 7, 3]})
t.group_by("a").aggregate(
avg_b=t.b.mean(),
avg_null=ibis.null(float).mean(), # AttributeError: 'FloatingScalar' object has no attribute 'mean'
avg_const=ibis.literal(43).mean(), # AttributeError: 'IntegerScalar' object has no attribute 'mean'
)
What is the motivation behind your request?
I have some dataclass like
@dataclass
class Address:
street1: ir.StringValue
street2: ir.StringValue
city: ir.StringValue
...
which I use as an abstraction layer to smooth over the differences in schemas that my vaiours tables have. Then I have some factory method, where the street2 field is optional. If not given, it infers NULL:
def make_address(t: ibis.Table):
if "street2" is not in t.columns
street2 = ibis.null(str)
else:
street2 = t.street2
....
return Address(...)
Then I want to be able to use these as
def canonicalize_addresses(t: ibis.Table):
address = make_address(t)
quality = rank_quality(address)
aggs = {
street1: address.street1.first(order_by=quality),
street2: address.street2.first(order_by=quality),
....,
},
return t.group_by("address_id").agg(**aggs)
In other words, once I get the Address dataclass back, I want to be able to treat these as values from the relation. This abstraction works great except for when I have to perform aggregations like in the above.
Describe the solution you'd like
Several options
- Don't change anything. I will need to add in an extra .select() somehwere in all using code so that the
ibis.null()
is turned from a ir.Scalar to to a ir.Column. - Add a
as_column()
method to Scalars, Columns, and Tables, similar to the existing .as_scalar() and .as_table() methods. Then I will have to explicitly call this function. - Add aggregation methods to ir.Scalar objects. IDK if we should make it so that
ibis.literal(42).mean().execute()
gives an error (similar to current behavior) or perhaps we should do what SQL does, and have this result in 42.
What version of ibis are you running?
main
What backend(s) are you using, if any?
duckdb, but I think this should affect all SQL backends the same. I think duckdbs handling of SELECT AVG(42)
should be the same in all backends?
Code of Conduct
- I agree to follow this project's Code of Conduct
Metadata
Metadata
Assignees
Type
Projects
Status