-
Notifications
You must be signed in to change notification settings - Fork 115
Utility functions for workspace, lakehouse, and tables #527
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@Kay-Personal please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.
Contributor License AgreementContribution License AgreementThis Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
|
src/sempy_labs/_helper_functions.py
Outdated
@@ -17,7 +17,7 @@ | |||
from IPython.display import display, HTML | |||
import requests | |||
import sempy_labs._authentication as auth | |||
|
|||
from pyspark.sql import DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this import should be within the function itself. otherwise if SLL is opened in a pure python notebook, SLL will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. Removed, and will place this within the functions from now on.
src/sempy_labs/_helper_functions.py
Outdated
Parameters | ||
---------- | ||
path : str | ||
The path or name of the delta table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have it say 'The abfss path or name of the delta table'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated accordingly
src/sempy_labs/_helper_functions.py
Outdated
Defaults to None which leaves the description blank. | ||
Returns | ||
------- | ||
Tuple[str, UUID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple[str, uuid.UUID]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx. Updated.
src/sempy_labs/_helper_functions.py
Outdated
|
||
def _get_or_create_workspace( | ||
workspace: Optional[str | UUID] = None, | ||
capacity_id: Optional[UUID] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capacity: Optional[str | UUID] = None,
We can resolve a capacity name or ID to an ID using resolve_capacity_id() (see #533)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Code updated and parameter description updated.
src/sempy_labs/_helper_functions.py
Outdated
The Fabric workspace name or ID. | ||
Defaults to None which resolves to the workspace of the attached lakehouse | ||
or if no lakehouse attached, resolves to the workspace of the notebook. | ||
capacity_id : uuid.UUID, default=None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capacity : str | uuid.UUID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
src/sempy_labs/_helper_functions.py
Outdated
# Otherwise create a new workspace. | ||
try: | ||
# But only if a human-friendly name was provided. If it's a Guid, raise an exception. | ||
UUID(workspace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can also use if _is_valid_uuid(workspace):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Code simplified
src/sempy_labs/_helper_functions.py
Outdated
|
||
except WorkspaceNotFoundException: | ||
# Otherwise create a new workspace. | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to avoid use try/except when possible because they can mask other issues. We could do something like
filter_condition = urllib.parse.quote(workspace)
dfW = fabric.list_workspaces(filter=f"name eq '{filter_condition}'")
if dfW.empty:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. Refactored.
src/sempy_labs/_helper_functions.py
Outdated
Defaults to None which leaves the description blank. | ||
Returns | ||
------- | ||
Tuple[str, UUID] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tuple[str, uuid.UUID]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/sempy_labs/_helper_functions.py
Outdated
return (lakehouse, lakehouse_id) | ||
|
||
|
||
def _save_as_delta_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, updated the description of your function to make clear that it accepts both, Pandas and PySpark dataframes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the _save_as_delta_table version
src/sempy_labs/_helper_functions.py
Outdated
dataframe.write.mode("overwrite").format("delta").save(filePath) | ||
print(f"{icons.green_dot} Delta table '{delta_table_name}' created and {dataframe.count()} rows inserted.") | ||
|
||
def _insert_into_delta_table( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we already have this function as save_as_delta_table(write_mode='append')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still relevant? I've made many changes to these functions since which have made the functions work for both pyspark and pure python.
src/sempy_labs/_helper_functions.py
Outdated
Tuple[str, uuid.UUID] | ||
A tuple holding the name and ID of the workspace. | ||
""" | ||
import urllib.parse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this import already exists at the top of the file
src/sempy_labs/_helper_functions.py
Outdated
# Otherwise create a new workspace. | ||
if _is_valid_uuid(workspace): | ||
# But only if a human-friendly name was provided. If it's a Guid, raise an exception. | ||
raise ValueError("For new workspaces, the workspace parameter must be string, not a Guid. Please provide a workspace name.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the icons.red_dot
src/sempy_labs/_helper_functions.py
Outdated
raise ValueError("For new workspaces, the workspace parameter must be string, not a Guid. Please provide a workspace name.") | ||
elif not workspace: | ||
# And also make sure the workspace parameter isn't empty. | ||
raise ValueError("For new workspaces, the workspace parameter cannot be None or empty. Please provide a workspace name.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add the icons.red_dot
src/sempy_labs/_helper_functions.py
Outdated
# And also make sure the workspace parameter isn't empty. | ||
raise ValueError("For new workspaces, the workspace parameter cannot be None or empty. Please provide a workspace name.") | ||
|
||
if _is_valid_uuid(capacity): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the resolve_capacity_id() logic. it handles this.
src/sempy_labs/_helper_functions.py
Outdated
# Make sure the workspace exists. Raises WorkspaceNotFoundException otherwise. | ||
(workspace_name, workspace_id) = resolve_workspace_name_and_id(workspace) | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of using try can we do something like
dfI = fabric.list_items(type='Lakehouse', workspace=workspace)
and then filter that to see if the lakehouse exists. I prefer not to use try/except if we don't have to
No description provided.