Skip to content

Separation of data preparation and plotting for criterion_plot() #600

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

r3kste
Copy link
Contributor

@r3kste r3kste commented May 26, 2025

Summary of changes

Separated the logic for data preparation and plotting in criterion_plot(). This is done to easily enable the addition of plotting backends.

  1. Dataclass LineData stores required data for plotting a single line.
  2. Dataclass CriterionPlotData stores all backend agnostic data needed for criterion_plot() [lines and multistart_lines if applicable]
  3. Dataclass PlotConfig stores global settings of plots

Additionally, we could add a new parameter such as return_data - which would return the plot_data instead of plotting. This can be discussed, if it is worth looking into.

Copy link

codecov bot commented May 27, 2025

Codecov Report

Attention: Patch coverage is 91.58879% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/optimagic/visualization/history_plots.py 91.42% 9 Missing ⚠️
Files with missing lines Coverage Δ
src/optimagic/optimization/optimize_result.py 94.16% <100.00%> (+0.04%) ⬆️
src/optimagic/visualization/history_plots.py 92.78% <91.42%> (+1.93%) ⬆️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@r3kste r3kste force-pushed the separate_data_from_plot branch from 7fb293a to 839ca0c Compare May 27, 2025 15:45
Copy link
Member

@timmens timmens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very nice already. I have a few remarks.

Let me know if you have any questions; thank you!!

@r3kste
Copy link
Contributor Author

r3kste commented Jun 3, 2025

I have made the suggested changes and I believe that the failing test is unrelated.

@r3kste r3kste requested a review from timmens June 3, 2025 13:47
@r3kste
Copy link
Contributor Author

r3kste commented Jun 16, 2025

mypy is failing due to addition of type hints in criterion_plot(). I am not entirely sure how to tackle this.

Copy link
Member

@timmens timmens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. Regarding the type-check errors, I had a look locally.

  1. You can ignore (on a per-line basis) the mypy errors that are introduced by passing "invalid" data to the History class
  2. There seem to be a few bugs that are caught by mypy for which we do not have tests. For example this line is flagged fun = res.multistart_info.exploration_results.tolist()[::-1] + stacked.fun by mypy, saying the [...].exploration_results has no method tolist(). This seems to be correct. For these cases, can you check whether it is possible to write a simple test case that would be triggered in the current version of the code, and then fix it?
  3. As mentioned elsewhere, for the problem where mypy thinks multistart_info is None, you can do if res.multistart_info directly, which should work. We will have to see if this is as readable as before.

I have a few more comments on _OptimizeData:

  1. I would like to see it defined before it is used in the return type hint of _retrieve_optimization_data. This also applies for the other dataclasses and functions.
  2. Is there no way to get the start_params in the case of _retrieve_optimization_data_from_results_object?
  3. Although I proposed it, I am not entirely happy with the class name _OptimizeData. It is too closely related to OptimizeResult. Can you propose some alternatives that make clear that it is simply a data container with intermediate results containing the histories?
  4. Given that it is a data container it should be a frozen. In the current setup this is problematic with the name field. You could just pass the name argument to the retrieval functions. Alternatively, instead of returning a list of optimize data, you can also just return a dict[str, _OptimizeData].

def _extract_criterion_plot_data(
data: list["_OptimizeData"],
max_evaluations: int | None,
palette: Iterator[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
palette: Iterator[str],
palette: itertools.cycle[str],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes a TypeError: 'type' object is not subscriptable. The problem is that we can't subscript [str] for itertools.cycle. So, I think that we could:

  1. Use as palette_cycle: itertools.cycle. However, this doesn't convey that it is of type str.
  2. Leave as palette_cycle: Iterator[str]

Personally I believe using Iterator[str] would be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, very good point! Another approach would be to encapsulate the cycle type hint with strings, like "itertools.cycle[str]". I think I prefer this, simply because we can convey the information that palette is a cycle and not just a finite Iterator.

@r3kste r3kste force-pushed the separate_data_from_plot branch from cb9f65b to baeae7a Compare June 18, 2025 10:15
@r3kste
Copy link
Contributor Author

r3kste commented Jun 18, 2025

Thanks for the review. I have pushed the suggested changes and mypy seems to be passing.

3. Although I proposed it, I am not entirely happy with the class name _OptimizeData. It is too closely related to OptimizeResult. Can you propose some alternatives that make clear that it is simply a data container with intermediate results containing the histories?

How about _CriterionOptimizeData or _CriterionHistory ?

There seem to be a few bugs that are caught by mypy for which we do not have tests. For example this line is flagged fun = res.multistart_info.exploration_results.tolist()[::-1] + stacked.fun by mypy, saying the [...].exploration_results has no method tolist(). This seems to be correct. For these cases, can you check whether it is possible to write a simple test case that would be triggered in the current version of the code, and then fix it?

def run_explorations(
internal_problem: InternalOptimizationProblem,
sample: NDArray[np.float64],
n_cores: int,
step_id: int,
) -> dict[str, NDArray[np.float64]]:

According to the type hints of run_explorations(), it looks like res.multistart_info.expoloration_results is actually of type NDArray[np.float64] and not list[float]. The mypy error is caused because of this and can be fixed by correcting the type hint of exploration_results.

I believe this is not a bug and the usage of tolist() seems to be correct, as I couldn't find a case where exploration_results is a list.

diff --git a/src/optimagic/optimization/optimize_result.py b/src/optimagic/optimization/optimize_result.py
index f2895cf..65860b2 100644
@@ -218,7 +219,7 @@ class MultistartInfo:
     start_parameters: list[PyTree]
     local_optima: list[OptimizeResult]
     exploration_sample: list[PyTree]
-    exploration_results: list[float]
+    exploration_results: NDArray[np.float64]

@r3kste r3kste force-pushed the separate_data_from_plot branch from 9886891 to fe77e81 Compare June 28, 2025 20:28
Copy link
Member

@timmens timmens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good now. I only have a few minor comments.

As a general comment: I would try to avoid mentioning types in the docstrings altogether. Sometimes this is done for return types.

Thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this will be handled in a separate PR.

Comment on lines +225 to +226
class _CriterionHistory:
"""Data retrieved from optimization result."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class _CriterionHistory:
"""Data retrieved from optimization result."""
class _PlottingMultistartHistory:
"""Data container for an optimization history and metadata. Contains local histories in the multistart case.
This dataclass is only used internally!
"""

"""Data retrieved from optimization result."""

history: History
direction: Direction
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The direction field can be deleted. It is not used and can be recovered from the history field.

Comment on lines +276 to +277
msg = "results must be (or contain) an OptimizeResult or a path to a log"
f"file, but is type {type(res)}."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether the second line will be assigned to msg? Shouldn't it be:

Suggested change
msg = "results must be (or contain) an OptimizeResult or a path to a log"
f"file, but is type {type(res)}."
msg = (
"results must be (or contain) an OptimizeResult or a path to a log "
f"file, but is type {type(res)}."
)

monotone=False,
show_exploration=False,
):
results: list[OptimizeResult | str | Path] | dict[str, OptimizeResult | str | Path],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the single optimization case not supported?

history: History
direction: Direction
name: str | None
start_params: list[Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
start_params: list[Any]
start_params: PyTree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be imported from optimagic.typing I believe.

def _extract_criterion_plot_data(
data: list[_CriterionHistory],
max_evaluations: int | None,
palette_cycle: Iterator[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I prefer the following:

Suggested change
palette_cycle: Iterator[str],
palette_cycle: "itertools.cycle[str]",

You are right in that this does not work without enclosing-quotation marks; however, the type checkers and IDE can interpret it correctly with quotation marks and it communicates that palette_cycle is indeed a cycle object (infinite Iterator) and not just any finite Iterator.

Comment on lines +477 to +488
@dataclass(frozen=True)
class CriterionPlotData:
"""Backend agnostic data for criterion plot.

Attributes:
lines: Main optimization paths.
multistart_lines: Multistart optimization paths, if applicable.

"""

lines: list[LineData]
multistart_lines: list[LineData]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I advocated for CriterionPlotData in the beginning for various reasons, I believe given the current state it adds net-complexity. I would therefore argue that we should:

  1. Remove the CriterionPlotData dataclass
  2. Rewrite _extract_criterion_plot_data to return lines and multistart_lines
  3. Adjust the rest of the module (see also comment below)

legend: dict[str, Any]


def _plotly_criterion_plot(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Rename this to _plotly_line_plot, taking only the arguments lines: list[LineData] and plot_config: ....
  • Adjust call to this function

@@ -187,3 +195,99 @@ def test_harmonize_inputs_to_dict_str_input():
def test_harmonize_inputs_to_dict_path_input():
path = Path("test.db")
assert _harmonize_inputs_to_dict(results=path, names=None) == {"0": path}


def _compare_criterion_history_with_result(data, res, res_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Adjust function name with new class name
  • Use type hints
  • The isinstance check for res is not necessary
  • The isinstance check for data should be done outside of this function

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants