Skip to content

[presto] Move out M2Y from RegressionState for regr_slope and regr_intercept functions #25475

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

kletkavrubashku
Copy link
Contributor

@kletkavrubashku kletkavrubashku commented Jul 1, 2025

Description

Currently we don't enforce intermediate/return type are the same in Coordinator and Prestissimo Worker.
Velox creates vectors for intermediate/return results based on a plan that comes from Coordinator. Then Prestissimo tries to use those vector and not crash.

In practise we had a crash some time ago due to such a mismatch (D74199165).
And I added validation to Velox to catch such kind of mismatches early: facebookincubator/velox#13322
But we wasn't able to enable it in prod, because the validation failed for "regr_slope" and "regr_intercept" functions.

What's changed?

In this diff I'm fixing "regr_slope" and "regr_intercept" intermediate types. Basically in Java AggregationState for all these functions is the same:

    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
    AggregationFunction("regr_sxy")
    AggregationFunction("regr_sxx")
    AggregationFunction("regr_syy")
    AggregationFunction("regr_r2")
    AggregationFunction("regr_count")
    AggregationFunction("regr_avgy")
    AggregationFunction("regr_avgx")

But in Prestissimo the state storage is more optimal:

    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")

These 2 aggregation functions don't have M2Y field. And this is more efficient, because we don't waste memory and CPU on the field, that aren't needed.

So I moved M2Y to extended class, the same as it works in Velox: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/CovarianceAggregates.cpp?fbclid=IwY2xjawLRTetleHRuA2FlbQIxMQBicmlkETFiT0N3UFR0M2VKOHl6MHRhAR6KRQ1VUQdCkZXzwj14sMQrVZ-R9QBH1utuGJb5U_lyGzDwt8PwV317QRVNJg_aem_-ePxZ-fHO5MNgfUmayVJFA#L326-L337

No major changes, mostly just reorganized the code.

Next steps

In this diff I'm trying to apply the same optimization to Java. With this fix, the signatures will become the same in Java and Prestissimo and we will be able to enable the validation

Differential Revision: D77625566

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@kletkavrubashku kletkavrubashku requested a review from a team as a code owner July 1, 2025 23:27
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D77625566

…tercept functions (prestodb#25475)

Summary:

## Context
Currently we don't enforce intermediate/return type are the same in Coordinator and Prestissimo Worker.
Velox creates vectors for intermediate/return results based on a plan that comes from Coordinator. Then Prestissimo tries to use those vector and not crash.

In practise we had a crash some time ago due to such a mismatch (D74199165).
And I added validation to Velox to catch such kind of mismatches early: facebookincubator/velox#13322
But we wasn't able to enable it in prod, because the validation failed for "regr_slope" and "regr_intercept" functions.

## What's changed?
In this diff I'm fixing "regr_slope" and "regr_intercept" intermediate types. Basically in Java `AggregationState` for all these functions is the same:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
    AggregationFunction("regr_sxy")
    AggregationFunction("regr_sxx")
    AggregationFunction("regr_syy")
    AggregationFunction("regr_r2")
    AggregationFunction("regr_count")
    AggregationFunction("regr_avgy")
    AggregationFunction("regr_avgx")
```
But in Prestissimo the state storage is more optimal:
```
    AggregationFunction("regr_slope")
    AggregationFunction("regr_intercept")
```
These 2 aggregation functions don't have M2Y field. And this is more efficient, because we don't waste memory and CPU on the field, that aren't needed.

So I moved M2Y to extended class, the same as it works in Velox: https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/CovarianceAggregates.cpp?fbclid=IwY2xjawLRTetleHRuA2FlbQIxMQBicmlkETFiT0N3UFR0M2VKOHl6MHRhAR6KRQ1VUQdCkZXzwj14sMQrVZ-R9QBH1utuGJb5U_lyGzDwt8PwV317QRVNJg_aem_-ePxZ-fHO5MNgfUmayVJFA#L326-L337

No major changes, mostly just reorganized the code.

## Next steps
In this diff I'm trying to apply the same optimization to Java. With this fix, the signatures will become the same in Java and Prestissimo and we will be able to enable the validation

Differential Revision: D77625566
@facebook-github-bot
Copy link
Collaborator

This pull request was exported from Phabricator. Differential Revision: D77625566

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants