[SPARK-52497][DOCS] Add docs for SQL user-defined functions #51281


Closed

Conversation

allisonwang-db
Contributor

What changes were proposed in this pull request?

This PR adds docs for SQL UDFs.

Why are the changes needed?

Add documentation for a new Spark 4 feature.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually verify the documentation build

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the DOCS label Jun 25, 2025
@allisonwang-db
Contributor Author

cc @cloud-fan @srielau


When `TEMPORARY` is specified, the function is only available for the current session. Otherwise, it is persisted in the catalog and available across sessions. The `OR REPLACE` option allows updating an existing function definition, while `IF NOT EXISTS` prevents errors when creating a function that already exists.

The function parameters must be specified with their data types. The return type can be either a scalar data type or a table with an optional schema definition.
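To make the options above concrete, here is a sketch of the syntax being documented (the function name and body are hypothetical, not from the PR):

```sql
-- Temporary scalar UDF: visible only in the current session.
CREATE TEMPORARY FUNCTION area(width DOUBLE, height DOUBLE)
RETURNS DOUBLE
RETURN width * height;

-- OR REPLACE updates an existing persistent function definition;
-- IF NOT EXISTS would instead skip creation if the function already exists.
CREATE OR REPLACE FUNCTION area(width DOUBLE, height DOUBLE)
RETURNS DOUBLE
RETURN width * height;
```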
Contributor

oh, the return table schema definition is optional?

Contributor

actually, the entire RETURN clause is optional, right?

Contributor

`RETURNS` is optional; `RETURN` is not.

Contributor Author

Yes, the `RETURNS` clause is optional for scalar UDFs, and the `RETURNS TABLE` schema is optional for TVFs.

Contributor

So it should be: "The return type can be either a scalar data type or a table with a schema definition. If not specified, the return type will be inferred from the function body"?

Contributor Author

Yes, it is mentioned below (in the `RETURNS` clause section). Here it simply means that the function return type can be a scalar or a table value.

- **RETURNS [data_type](sql-ref-datatypes.md)**

The return data type of the scalar function. This clause is optional. The data type will be derived from the SQL function body if it is not provided.
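As an illustration of this clause (the function names are hypothetical): omitting `RETURNS` lets the type be derived from the body, while a table function declares `RETURNS TABLE`, optionally with a schema:

```sql
-- RETURNS omitted: the return type is derived from the body.
CREATE TEMPORARY FUNCTION double_it(x INT)
RETURN x * 2;

-- Table function with an explicit RETURNS TABLE schema.
CREATE TEMPORARY FUNCTION squares(n INT)
RETURNS TABLE (id BIGINT, squared BIGINT)
RETURN SELECT id, id * id FROM range(n);
```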


### Syntax

```sql
```

Contributor

I know in Spark we use `sql` for these blocks, but it makes no sense to do that for the syntax grammar itself. I wish there were a BNF ...

allisonwang-db and others added 2 commits June 26, 2025 11:29
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
@xinrong-meng
Copy link
Member

LGTM! `python3.9: not found` in CI; shall we rebase on the master branch?

@cloud-fan
Copy link
Contributor

The CI issue is unrelated as this is a doc-only PR. Thanks, merging to master/4.0!

@cloud-fan cloud-fan closed this in 82860bf Jun 30, 2025
cloud-fan pushed a commit that referenced this pull request Jun 30, 2025

Closes #51281 from allisonwang-db/spark-52497-sql-udf-docs.

Lead-authored-by: Allison Wang <allison.wang@databricks.com>
Co-authored-by: Allison Wang <allisonwang@apache.org>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
4 participants