Making Segmented Ref Profiles upload as ZIP #1469

Closed


murilommen
Contributor

  • This PR intends to use songbird's latest Header to generate the URL on logAsync, which will be used to write potentially all kinds of profiles in the future

  • It also extracts some methods out of WhyLabsWriter.write to make it easier to understand

  • I have reviewed the Guidelines for Contributing and the Code of Conduct.

@murilommen murilommen force-pushed the dev/murilommen/zip-segmented-reference-with-songbird branch from f5001a9 to 1fa4953 Compare February 16, 2024 20:15
@@ -475,6 +478,27 @@ def get_writables(self) -> Optional[List[Writable]]:
)
return results

def in_memory_zip(self) -> Optional[bytes]:
Contributor

Might be too big for in memory?

Contributor Author

True, in case we're dealing with a gigantic SegmentedResultSet, which can contain multiple SegmentedDatasetProfiles with a number of columns.
I'll turn it into a static method on WhyLabsWriter then, so other types of Writables can leverage it

Contributor Author

It will be an in-memory zipped file that is then persisted to tmp_dir. Same logic as before, only extracted into its own method.
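
For readers following along, the "zip in memory, then persist to a temporary directory" flow can be sketched roughly as below. This is a minimal standalone illustration using only the standard library, not the code from this PR: `zip_in_memory`, `persist_zip`, and the `profiles.zip` file name are hypothetical names chosen for the example.

```python
import io
import zipfile
from pathlib import Path
from typing import Dict


def zip_in_memory(files: Dict[str, bytes]) -> bytes:
    """Build a ZIP archive in memory from {archive_name: payload} pairs (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            # writestr adds one archive member directly from bytes, no temp file needed
            archive.writestr(name, payload)
    # The archive is only complete once the ZipFile context has closed
    return buffer.getvalue()


def persist_zip(zipped: bytes, directory: str, filename: str = "profiles.zip") -> Path:
    """Write the already-zipped bytes to `directory` and return the resulting path."""
    target = Path(directory) / filename
    target.write_bytes(zipped)
    return target
```

Note that the whole archive lives in memory until `persist_zip` runs, which is the concern raised above for very large result sets.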

@murilommen murilommen force-pushed the dev/murilommen/zip-segmented-reference-with-songbird branch from 7979c21 to 4acbefc Compare February 21, 2024 18:13
@murilommen murilommen force-pushed the dev/murilommen/zip-segmented-reference-with-songbird branch from 239f922 to d54d30e Compare February 23, 2024 22:58
@patch("whylogs.api.writer.whylabs.WhyLabsWriter._do_upload", return_value=(True, "Success"))
@patch("whylogs.api.writer.whylabs.WhyLabsWriter._get_dataset_timestamp", return_value=1234567890)
@patch("whylogs.api.writer.whylabs.WhyLabsWriter._upload_zipped_files", return_value=(True, "Success"))
def test_write_segmented_reference_result_set(
Contributor

I would probably leave this test out... it won't survive the refactoring.

Contributor Author

This "all mock" test has saved me some trouble, especially in understanding exactly which methods get called. I'd vote for leaving it and just changing the patches, or deleting it, after the refactor.
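
To illustrate what an "all mock" test buys, here is a small self-contained sketch with `unittest.mock`. `FakeWriter` and its methods are hypothetical stand-ins, not the whylogs `WhyLabsWriter` API; the point is only that the patched collaborators record how `write` called them.

```python
from unittest.mock import patch


class FakeWriter:
    """Hypothetical stand-in for a writer class; not the whylogs API."""

    def _get_timestamp(self) -> int:
        raise RuntimeError("would inspect a real profile")

    def _upload(self, payload: bytes, timestamp: int):
        raise RuntimeError("would hit the network")

    def write(self, payload: bytes):
        timestamp = self._get_timestamp()
        return self._upload(payload, timestamp)


def run_all_mock_write():
    """Patch every collaborator, call write(), and report what was called."""
    with patch.object(FakeWriter, "_get_timestamp", return_value=1234567890) as get_ts, \
            patch.object(FakeWriter, "_upload", return_value=(True, "Success")) as upload:
        result = FakeWriter().write(b"zip-bytes")
    # The mocks keep their call records after the patch context exits
    return result, get_ts.call_count, upload.call_args
```

Such a test pins down the call sequence rather than behavior, which is why it tends to break when methods are renamed or extracted during a refactor.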

@@ -448,6 +481,15 @@ def tag_custom_performance_column(
)
return False, str(e)

def _tag_custom_perf_metrics(self, view: Union[DatasetProfileView, SegmentedDatasetProfileView]) -> None:
Contributor

Perhaps this should handle SegmentedDatasetProfileView or not be Union?

Contributor Author

It is called within _get_uncompounded_view, which is part of the flow for both DatasetProfileView and SegmentedDatasetProfileView, but inside it there is an isinstance check for DatasetProfileView only. Very confusing, so I didn't want to modify it; it's out of scope for this PR.
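
The mismatch being discussed can be reduced to a toy example. The classes and `tag_metrics` below are hypothetical and only mimic the shape of the situation (a `Union` in the signature, an `isinstance` check for one member); they are not the whylogs implementation.

```python
from typing import List, Union


class PlainView:
    """Hypothetical stand-in for an unsegmented profile view."""


class SegmentedView:
    """Hypothetical stand-in for a segmented profile view."""


def tag_metrics(view: Union[PlainView, SegmentedView]) -> List[str]:
    """Accepts both view types, but only acts on PlainView."""
    if isinstance(view, PlainView):
        return ["tagged"]
    # SegmentedView passes type checking yet silently does nothing here
    return []
```

Either narrowing the annotation to the handled type or adding a branch for the second type would make the signature match the behavior, which is the reviewer's suggestion.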

@murilommen murilommen force-pushed the dev/murilommen/zip-segmented-reference-with-songbird branch from 5a4ec3e to 1a80dfd Compare February 28, 2024 22:03
@FelipeAdachi
Contributor

Closing because an alternative PR was merged
