Closed
Description
Please describe the issue
I cannot figure out how to define UDFs that take a struct as input.
This is my attempt with the PySpark backend:
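import ibis
import pandas as pd
from ibis import _
from pyspark.sql import SparkSession

# Assumed setup: the original snippet relied on an existing `spark` session.
spark = SparkSession.builder.getOrCreate()
df = pd.DataFrame({"a": [1, 2, 3]})
sdf = spark.createDataFrame(df)
sdf.createOrReplaceTempView("df")
con = ibis.pyspark.connect(spark)
tbl = con.table("df")
SelectorType = ibis.expr.datatypes.Struct({"a": str})

@ibis.udf.scalar.pandas
def udf(x: SelectorType) -> int:
    return 1

tbl.mutate(struct=ibis.struct({"a": _.a})).mutate(struct2=udf(_.struct)).execute()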
I get the following signature error:
SignatureValidationError: udf_3(r0 := DatabaseTable: df
  a int64
r1 := Project[r0]
  a: r0.a
  struct: StructColumn(names=['a'], values=[r0.a])
struct: r1.struct) has failed due to the following errors:
  `x`: r0 := DatabaseTable: df
  a int64
r1 := Project[r0]
  a: r0.a
  struct: StructColumn(names=['a'], values=[r0.a])
struct: r1.struct of type <class 'ibis.expr.types.structs.StructColumn'> is not matching ValueOf(dtype='Struct([('a', String(length=None, nullable=True))], nullable=True))
Expected signature: udf_3(x: struct<a: string>)
Would it be possible to provide an example in the official docs of how to build a custom UDF with struct input and/or output types?
Thanks!
Code of Conduct
- I agree to follow this project's Code of Conduct