Skip to content

feat: create deploy_remote_function and deploy_udf functions to immediately deploy functions to BigQuery #1832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tswast
Copy link
Collaborator

@tswast tswast commented Jun 17, 2025

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

This commit refactors the implementation of immediate deployment
for remote functions and UDFs to eliminate code duplication introduced
in a previous commit.

Changes:
- The `remote_function` and `udf` methods in
  `bigframes.functions._function_session.FunctionSession` now accept
  an optional `deploy_immediately: bool` parameter (defaulting to `False`).
  The previous `deploy_remote_function` and `deploy_udf` methods in
  `FunctionSession` have been removed, and their logic is now
  incorporated into the unified methods.
- The public API functions `bigframes.pandas.deploy_remote_function`
  and `bigframes.pandas.deploy_udf` now call the corresponding
  `FunctionSession` methods with `deploy_immediately=True`.
- The public API functions `bigframes.pandas.remote_function` and
  `bigframes.pandas.udf` call the `FunctionSession` methods
  with `deploy_immediately=False` (relying on the default).
- Unit tests in `tests/unit/functions/test_remote_function.py` have
  been updated to patch the unified `FunctionSession` methods and
  verify the correct `deploy_immediately` boolean is passed based on
  which public API function is called.

Note: The underlying provisioning logic in `FunctionSession` currently
deploys functions immediately regardless of the `deploy_immediately`
flag. This flag serves as an indicator of intent and allows for
future enhancements to support true lazy deployment if desired, without
further API changes.
This commit corrects a previous refactoring attempt to eliminate
code duplication and properly separates immediate-deployment functions
from standard (potentially lazy) functions.

Changes:
- `bigframes.functions._function_session.FunctionSession` now has
  distinct methods: `remote_function`, `udf`,
  `deploy_remote_function`, and `deploy_udf`. The
  `deploy_immediately` flag has been removed from this class.
- `deploy_remote_function` and `deploy_udf` methods in
  `FunctionSession` are responsible for ensuring immediate
  deployment by calling the underlying provisioning logic directly.
  The standard `remote_function` and `udf` methods in
  `FunctionSession` also currently call this provisioning logic,
  meaning all functions are deployed immediately as of now, but the
  structure allows for future lazy evaluation for standard functions
  without changing the deploy variants' contract.
- Public API functions in `bigframes.pandas` (`remote_function`,
  `udf`, `deploy_remote_function`, `deploy_udf`) now correctly
  delegate to their corresponding distinct methods in `FunctionSession`
  (via the `Session` object).
- Unit tests in `tests/unit/functions/test_remote_function.py` have
  been updated to mock and verify calls to the correct distinct
  methods on `bigframes.session.Session`.

This resolves the issue of using a boolean flag to control
deployment type and instead relies on calling specific, dedicated
methods for immediate deployment, aligning with your request.
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. labels Jun 17, 2025

# If the user forces the cloud function service argument to None, throw
# an exception
if cloud_function_service_account is None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks duplicated with remote_function

This commit simplifies the implementation of
`deploy_remote_function` and `deploy_udf` within
`bigframes.functions._function_session.FunctionSession`.

Given that the standard `remote_function` and `udf` methods in
`FunctionSession` already perform immediate deployment of resources
(as the underlying provisioning logic they call is immediate),
the `deploy_remote_function` and `deploy_udf` methods in the
same class are simplified to directly call `self.remote_function(...)`
and `self.udf(...)` respectively.

This change makes the distinction between the `deploy_` variants and
the standard variants in `FunctionSession` primarily a matter of
semantic clarity and intent at that level; both paths currently
result in immediate deployment. The public API in `bigframes.pandas`
continues to offer distinct `deploy_` functions that call these
`FunctionSession.deploy_` methods, preserving your user-facing API
and its documented behavior of immediate deployment.

No changes were needed for the public API in `bigframes.pandas` or
the unit tests, as they were already aligned with calling distinct
methods on the `Session` object, which in turn calls the
now-simplified `FunctionSession` methods.
def deploy_remote_function(
self,
*,
input_types: Union[None, type, Sequence[type]] = None,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it makes more sense just to use *args and **kwargs here since we're just passing through to remote_function?

For detailed argument descriptions, see :func:`~bigframes.pandas.remote_function`.
"""
return global_session.with_default_session(
bigframes.session.Session.deploy_remote_function,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're actually missing that method right now.

Comment on lines +106 to +118
mock_session_deploy_rf_method.assert_called_once()
# Check some key args passed to the session's deploy_remote_function method
args, kwargs = mock_session_deploy_rf_method.call_args
# The first arg to Session.deploy_remote_function is the actual python function
assert callable(args[0])
assert args[0].__name__ == "my_remote_func"
# Other args are passed as kwargs by with_default_session
assert kwargs.get("cloud_function_service_account") == "test_sa@example.com"
assert kwargs.get("reuse") is True # Default reuse is True
assert kwargs.get("name") is None # Default name is None

# Test that the function is still callable locally
assert my_remote_func(10) == 20
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe better to assert that the create function query was called?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery-dataframes API. size: l Pull request size is large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant