-
Notifications
You must be signed in to change notification settings - Fork 50
feat: create deploy_remote_function
and deploy_udf
functions to immediately deploy functions to BigQuery
#1832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This commit refactors the implementation of immediate deployment for remote functions and UDFs to eliminate code duplication introduced in a previous commit. Changes: - The `remote_function` and `udf` methods in `bigframes.functions._function_session.FunctionSession` now accept an optional `deploy_immediately: bool` parameter (defaulting to `False`). The previous `deploy_remote_function` and `deploy_udf` methods in `FunctionSession` have been removed, and their logic is now incorporated into the unified methods. - The public API functions `bigframes.pandas.deploy_remote_function` and `bigframes.pandas.deploy_udf` now call the corresponding `FunctionSession` methods with `deploy_immediately=True`. - The public API functions `bigframes.pandas.remote_function` and `bigframes.pandas.udf` call the `FunctionSession` methods with `deploy_immediately=False` (relying on the default). - Unit tests in `tests/unit/functions/test_remote_function.py` have been updated to patch the unified `FunctionSession` methods and verify the correct `deploy_immediately` boolean is passed based on which public API function is called. Note: The underlying provisioning logic in `FunctionSession` currently deploys functions immediately regardless of the `deploy_immediately` flag. This flag serves as an indicator of intent and allows for future enhancements to support true lazy deployment if desired, without further API changes.
This commit corrects a previous refactoring attempt to eliminate code duplication and properly separates immediate-deployment functions from standard (potentially lazy) functions. Changes: - `bigframes.functions._function_session.FunctionSession` now has distinct methods: `remote_function`, `udf`, `deploy_remote_function`, and `deploy_udf`. The `deploy_immediately` flag has been removed from this class. - `deploy_remote_function` and `deploy_udf` methods in `FunctionSession` are responsible for ensuring immediate deployment by calling the underlying provisioning logic directly. The standard `remote_function` and `udf` methods in `FunctionSession` also currently call this provisioning logic, meaning all functions are deployed immediately as of now, but the structure allows for future lazy evaluation for standard functions without changing the deploy variants' contract. - Public API functions in `bigframes.pandas` (`remote_function`, `udf`, `deploy_remote_function`, `deploy_udf`) now correctly delegate to their corresponding distinct methods in `FunctionSession` (via the `Session` object). - Unit tests in `tests/unit/functions/test_remote_function.py` have been updated to mock and verify calls to the correct distinct methods on `bigframes.session.Session`. This resolves the issue of using a boolean flag to control deployment type and instead relies on calling specific, dedicated methods for immediate deployment, aligning with your request.
|
||
# If the user forces the cloud function service argument to None, throw | ||
# an exception | ||
if cloud_function_service_account is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks duplicated with remote_function
This commit simplifies the implementation of `deploy_remote_function` and `deploy_udf` within `bigframes.functions._function_session.FunctionSession`. Given that the standard `remote_function` and `udf` methods in `FunctionSession` already perform immediate deployment of resources (as the underlying provisioning logic they call is immediate), the `deploy_remote_function` and `deploy_udf` methods in the same class are simplified to directly call `self.remote_function(...)` and `self.udf(...)` respectively. This change makes the distinction between the `deploy_` variants and the standard variants in `FunctionSession` primarily a matter of semantic clarity and intent at that level; both paths currently result in immediate deployment. The public API in `bigframes.pandas` continues to offer distinct `deploy_` functions that call these `FunctionSession.deploy_` methods, preserving your user-facing API and its documented behavior of immediate deployment. No changes were needed for the public API in `bigframes.pandas` or the unit tests, as they were already aligned with calling distinct methods on the `Session` object, which in turn calls the now-simplified `FunctionSession` methods.
def deploy_remote_function( | ||
self, | ||
*, | ||
input_types: Union[None, type, Sequence[type]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it makes more sense just to use *args
and **kwargs
here since we're just passing through to remote_function
?
For detailed argument descriptions, see :func:`~bigframes.pandas.remote_function`. | ||
""" | ||
return global_session.with_default_session( | ||
bigframes.session.Session.deploy_remote_function, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're actually missing that method right now.
mock_session_deploy_rf_method.assert_called_once() | ||
# Check some key args passed to the session's deploy_remote_function method | ||
args, kwargs = mock_session_deploy_rf_method.call_args | ||
# The first arg to Session.deploy_remote_function is the actual python function | ||
assert callable(args[0]) | ||
assert args[0].__name__ == "my_remote_func" | ||
# Other args are passed as kwargs by with_default_session | ||
assert kwargs.get("cloud_function_service_account") == "test_sa@example.com" | ||
assert kwargs.get("reuse") is True # Default reuse is True | ||
assert kwargs.get("name") is None # Default name is None | ||
|
||
# Test that the function is still callable locally | ||
assert my_remote_func(10) == 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe better to assert that the create function query was called?
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕