-
Notifications
You must be signed in to change notification settings - Fork 45
Separation of data preparation and plotting for criterion_plot()
#600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
... and 5 files with indirect coverage changes 🚀 New features to boost your workflow:
|
7fb293a
to
839ca0c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks very nice already. I have a few remarks.
Let me know if you have any questions; thank you!!
I have made the suggested changes and I believe that the failing test is unrelated. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. Regarding the type-check errors, I had a look locally.
- You can ignore (on a per-line basis) the mypy errors that are introduced by passing "invalid" data to the
History
class - There seem to be a few bugs that are caught by mypy for which we do not have tests. For example this line is flagged
fun = res.multistart_info.exploration_results.tolist()[::-1] + stacked.fun
by mypy, saying the[...].exploration_results
has no methodtolist()
. This seems to be correct. For these cases, can you check whether it is possible to write a simple test case that would be triggered in the current version of the code, and then fix it? - As mentioned elsewhere, for the problem where mypy thinks
multistart_info
is None, you can doif res.multistart_info
directly, which should work. We will have to see if this is as readable as before.
I have a few more comments on _OptimizeData
:
- I would like to see it defined before it is used in the return type hint of
_retrieve_optimization_data
. This also applies for the other dataclasses and functions. - Is there no way to get the
start_params
in the case of_retrieve_optimization_data_from_results_object
? - Although I proposed it, I am not entirely happy with the class name
_OptimizeData
. It is too closely related toOptimizeResult
. Can you propose some alternatives that make clear that it is simply a data container with intermediate results containing the histories? - Given that it is a data container it should be a frozen. In the current setup this is problematic with the
name
field. You could just pass thename
argument to the retrieval functions. Alternatively, instead of returning a list of optimize data, you can also just return a dict[str, _OptimizeData].
def _extract_criterion_plot_data( | ||
data: list["_OptimizeData"], | ||
max_evaluations: int | None, | ||
palette: Iterator[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
palette: Iterator[str], | |
palette: itertools.cycle[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a TypeError: 'type' object is not subscriptable
. The problem is that we can't subscript [str]
for itertools.cycle
. So, I think that we could:
- Use as
palette_cycle: itertools.cycle
. However, this doesn't convey that it is of typestr
. - Leave as
palette_cycle: Iterator[str]
Personally I believe using Iterator[str]
would be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, very good point! Another approach would be to encapsulate the cycle type hint with strings, like "itertools.cycle[str]"
. I think I prefer this, simply because we can convey the information that palette
is a cycle
and not just a finite Iterator.
cb9f65b
to
baeae7a
Compare
Thanks for the review. I have pushed the suggested changes and mypy seems to be passing.
How about
optimagic/src/optimagic/optimization/multistart.py Lines 291 to 296 in 9a40583
According to the type hints of I believe this is not a bug and the usage of diff --git a/src/optimagic/optimization/optimize_result.py b/src/optimagic/optimization/optimize_result.py
index f2895cf..65860b2 100644
@@ -218,7 +219,7 @@ class MultistartInfo:
start_parameters: list[PyTree]
local_optima: list[OptimizeResult]
exploration_sample: list[PyTree]
- exploration_results: list[float]
+ exploration_results: NDArray[np.float64] |
9886891
to
fe77e81
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good now. I only have a few minor comments.
As a general comment: I would try to avoid mentioning types in the docstrings altogether. Sometimes this is done for return types.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, this will be handled in a separate PR.
class _CriterionHistory: | ||
"""Data retrieved from optimization result.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class _CriterionHistory: | |
"""Data retrieved from optimization result.""" | |
class _PlottingMultistartHistory: | |
"""Data container for an optimization history and metadata. Contains local histories in the multistart case. | |
This dataclass is only used internally! | |
""" |
"""Data retrieved from optimization result.""" | ||
|
||
history: History | ||
direction: Direction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The direction
field can be deleted. It is not used and can be recovered from the history
field.
msg = "results must be (or contain) an OptimizeResult or a path to a log" | ||
f"file, but is type {type(res)}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether the second line will be assigned to msg
? Shouldn't it be:
msg = "results must be (or contain) an OptimizeResult or a path to a log" | |
f"file, but is type {type(res)}." | |
msg = ( | |
"results must be (or contain) an OptimizeResult or a path to a log " | |
f"file, but is type {type(res)}." | |
) |
monotone=False, | ||
show_exploration=False, | ||
): | ||
results: list[OptimizeResult | str | Path] | dict[str, OptimizeResult | str | Path], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the single optimization case not supported?
history: History | ||
direction: Direction | ||
name: str | None | ||
start_params: list[Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_params: list[Any] | |
start_params: PyTree |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be imported from optimagic.typing
I believe.
def _extract_criterion_plot_data( | ||
data: list[_CriterionHistory], | ||
max_evaluations: int | None, | ||
palette_cycle: Iterator[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the following:
palette_cycle: Iterator[str], | |
palette_cycle: "itertools.cycle[str]", |
You are right in that this does not work without enclosing-quotation marks; however, the type checkers and IDE can interpret it correctly with quotation marks and it communicates that palette_cycle
is indeed a cycle
object (infinite Iterator) and not just any finite Iterator.
@dataclass(frozen=True) | ||
class CriterionPlotData: | ||
"""Backend agnostic data for criterion plot. | ||
|
||
Attributes: | ||
lines: Main optimization paths. | ||
multistart_lines: Multistart optimization paths, if applicable. | ||
|
||
""" | ||
|
||
lines: list[LineData] | ||
multistart_lines: list[LineData] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I advocated for CriterionPlotData
in the beginning for various reasons, I believe given the current state it adds net-complexity. I would therefore argue that we should:
- Remove the
CriterionPlotData
dataclass - Rewrite
_extract_criterion_plot_data
to returnlines
andmultistart_lines
- Adjust the rest of the module (see also comment below)
legend: dict[str, Any] | ||
|
||
|
||
def _plotly_criterion_plot( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Rename this to
_plotly_line_plot
, taking only the argumentslines: list[LineData]
andplot_config: ...
. - Adjust call to this function
@@ -187,3 +195,99 @@ def test_harmonize_inputs_to_dict_str_input(): | |||
def test_harmonize_inputs_to_dict_path_input(): | |||
path = Path("test.db") | |||
assert _harmonize_inputs_to_dict(results=path, names=None) == {"0": path} | |||
|
|||
|
|||
def _compare_criterion_history_with_result(data, res, res_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Adjust function name with new class name
- Use type hints
- The isinstance check for res is not necessary
- The isinstance check for data should be done outside of this function
Summary of changes
Separated the logic for data preparation and plotting in
criterion_plot()
. This is done to easily enable the addition of plotting backends.LineData
stores required data for plotting a single line.CriterionPlotData
stores all backend agnostic data needed forcriterion_plot()
[lines and multistart_lines if applicable]PlotConfig
stores global settings of plotsAdditionally, we could add a new parameter such as
return_data
- which would return theplot_data
instead of plotting. This can be discussed, if it is worth looking into.