Loss function clarity #28

Merged
merged 29 commits into dev from loss-function-clarity on Feb 7, 2022
Changes from 22 commits
Commits
29 commits
210b570
enabling general np.ndarray as params in scans, not just 1D
MSRudolph Dec 6, 2021
645b3e8
docstring length
MSRudolph Dec 6, 2021
ae46603
enabling PCA for ND np.ndarray parameters
MSRudolph Dec 6, 2021
3e437cc
whitespace
MSRudolph Dec 6, 2021
6bde08c
specify that Hessian calculation is not supported for ND numpy arrays
MSRudolph Dec 6, 2021
9acc21d
style
MSRudolph Dec 6, 2021
c7563b7
style
MSRudolph Dec 6, 2021
da2825d
possibly enabling ND parameter vectors
MSRudolph Dec 6, 2021
96f5e14
Merge branch 'dev' into loss-function-clarity
MSRudolph Jan 20, 2022
62de902
fixing gradient for ND numpy arrays
MSRudolph Jan 20, 2022
31519d6
enabling ND numpy array for Hessians
MSRudolph Jan 21, 2022
4919e36
implementing custom ND tensor norm
MSRudolph Jan 21, 2022
a66752c
Adding new Aliases
MSRudolph Jan 21, 2022
d29f036
defining return types
MSRudolph Jan 21, 2022
9156347
Starting with FAQ section
MSRudolph Jan 21, 2022
b8d16e2
comments for aliases
MSRudolph Jan 21, 2022
c1fbe0f
Revert EvalFunction to LossFunction
MSRudolph Jan 21, 2022
2206feb
Update README.md
MSRudolph Jan 27, 2022
8c3035d
addressing minor comments
MSRudolph Jan 27, 2022
91d105e
Merge branch 'loss-function-clarity' of https://github.com/zapatacomp…
MSRudolph Jan 27, 2022
9288d64
merge conflicts
MSRudolph Jan 27, 2022
a5019ea
update the loss_function documentation according to the new LossFunct…
MSRudolph Jan 27, 2022
9cbb003
Added optional to perform_1D_scan.
mstechly Feb 5, 2022
e17194f
Fixing style issues.
mstechly Feb 5, 2022
034f42a
Fixed typing issues.
mstechly Feb 5, 2022
3470fcc
Added dev dependencies to style workflow.
mstechly Feb 5, 2022
0f09414
fixup! Added dev dependencies to style workflow.
mstechly Feb 5, 2022
b642843
eigenvectors are direction vectors
MSRudolph Feb 7, 2022
3778fe0
make numpy array so that list of parameter vectors is valid
MSRudolph Feb 7, 2022
11 changes: 11 additions & 0 deletions README.md
@@ -59,6 +59,17 @@ This code results in the following plot:

![Image](docs/example_plot.png)

## FAQ

**What are the expected type and shape for the parameters?**\
Parameters should be a `numpy.ndarray` of real numbers. In recent releases, the parameter array can have any shape that `numpy` supports, i.e., ragged arrays with inconsistent sizes per dimension are not allowed. Up to version `0.1.1`, the parameter array had to be one-dimensional.
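
For instance, a minimal sketch of what counts as a valid parameter array (plain `numpy`, no orqviz-specific code):

```python
import numpy as np

params_1d = np.array([0.1, -0.5, 2.3])                     # always valid
params_2d = np.random.uniform(-np.pi, np.pi, size=(4, 3))  # valid in recent releases

# Not valid: a "ragged" array with inconsistent sizes per dimension,
# e.g. [[0.1, 0.2], [0.3]], does not form a numeric numpy.ndarray.
```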

**What is the format of the `loss_function` that most `orqviz` methods expect?**\
We define a `loss_function` as a function that receives only the parameters of the model and returns a real number (float). That value could, for example, be the cost function of an optimization problem, the prediction of a classifier, or the fidelity with respect to a fixed quantum state. All computation required to arrive at that value must happen inside your function. Check out the above code for a full minimal example; a schematic sketch follows below.
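
```python
import numpy as np

def loss_function(params: np.ndarray) -> float:
    # Everything needed to evaluate the model happens inside this function;
    # only the parameters come in, and a single real number comes out.
    return float(np.sum(np.sin(params) ** 2))
```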

**What can I do if my loss function requires additional arguments?**\
In that case, you need to wrap the function in another function such that it again receives only the parameters of the model. We provide a wrapper class called `LossFunctionWrapper` that you can import from `orqviz.loss_function`. It is a thin wrapper with helpful perks, such as measuring the average evaluation time of a single loss function call and the total number of calls.
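
For example, a sketch using `functools.partial` to fix the extra arguments; the commented-out `LossFunctionWrapper` call is hypothetical in its exact signature, which is not shown here:

```python
from functools import partial

import numpy as np

def loss_with_data(params: np.ndarray, data: np.ndarray) -> float:
    return float(np.mean((np.sin(params) - data) ** 2))

data = np.array([0.1, 0.4, -0.3])
# Fix the extra argument so the result again receives only the parameters:
loss_function = partial(loss_with_data, data=data)

# Hypothetical wrapper usage; check the class for its exact constructor:
# from orqviz.loss_function import LossFunctionWrapper
# loss_function = LossFunctionWrapper(partial(loss_with_data, data=data))
```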

## Authors

The leading developer of this package is Manuel Rudolph at Zapata Computing.\
263 changes: 175 additions & 88 deletions docs/examples/advanced_example_notebook.ipynb

Large diffs are not rendered by default.

20 changes: 14 additions & 6 deletions docs/examples/gradient_descent_optimizer.py
@@ -1,23 +1,31 @@
from typing import Callable, Optional, Tuple

import numpy as np

from orqviz.aliases import (
ArrayOfParameterVectors,
FullGradientFunction,
LossFunction,
ParameterVector,
)
from orqviz.gradients import calculate_full_gradient


def gradient_descent_optimizer(
init_params: np.ndarray,
loss_function: Callable[[np.ndarray], float],
init_params: ParameterVector,
loss_function: LossFunction,
n_iters: int,
learning_rate: float = 0.1,
full_gradient_function: Optional[Callable] = None,
full_gradient_function: FullGradientFunction = None,
eval_loss_during_training: bool = True,
) -> Tuple[np.ndarray, np.ndarray]:
) -> Tuple[ArrayOfParameterVectors, np.ndarray]:
"""Function perform gradient descent optimization on a loss function.

Args:
init_params: Initial parameter vector from which to start the optimization.
loss_function: Loss function with respect to which the gradient is calculated.
loss_function: Function with respect to which the gradient is calculated. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
n_iters: Number of iterations to optimize.
learning_rate: Learning rate for gradient descent. The calculated gradient
is multiplied with this value and then updates the parameter vector.
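
For context, a usage sketch of the updated signature (the loss function here is a made-up example; with `full_gradient_function` left as `None`, the optimizer presumably falls back to a finite-difference gradient via the imported `calculate_full_gradient`):

```python
import numpy as np

def loss_function(params: np.ndarray) -> float:
    return float(np.sum(np.sin(params) ** 2))

init_params = np.random.uniform(-np.pi, np.pi, size=(2, 3))  # ND arrays now allowed
parameter_trajectory, losses = gradient_descent_optimizer(
    init_params, loss_function, n_iters=100, learning_rate=0.1
)
```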
18 changes: 15 additions & 3 deletions src/orqviz/aliases.py
@@ -1,3 +1,5 @@
from typing import Callable

import numpy as np

"""
@@ -8,7 +10,17 @@
the only dimension, is always of size number_of_parameters, while the other dimensions
indicate how many of them there are.
"""
ParameterVector = np.ndarray # 1D array
ArrayOfParameterVectors = np.ndarray # 2D array
GridOfParameterVectors = np.ndarray # 3D array
ParameterVector = np.ndarray # ND array
ArrayOfParameterVectors = np.ndarray # Array of ND arrays
GridOfParameterVectors = np.ndarray # Grid of ND arrays
Weights = np.ndarray # 1D vector of floats from 0-1
DirectionVector = np.ndarray # ND array with same shape as ParameterVector
LossFunction = Callable[
[ParameterVector], float
] # Function that can be scanned with orqviz
GradientFunction = Callable[
[ParameterVector, DirectionVector], float
] # Returns partial derivative of LossFunction wrt DirectionVector
FullGradientFunction = Callable[
[ParameterVector], np.ndarray
] # Returns all partial derivatives of LossFunction wrt each parameter
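
As an illustration, functions satisfying the new aliases could look like the following (a finite-difference sketch, not code from this PR):

```python
import numpy as np

def my_loss(params: np.ndarray) -> float:
    """Satisfies LossFunction: parameters in, real number out."""
    return float(np.sum(params ** 2))

def my_full_gradient(params: np.ndarray, eps: float = 1e-3) -> np.ndarray:
    """Satisfies FullGradientFunction: one partial derivative per parameter."""
    grad = np.zeros_like(params)
    for idx in np.ndindex(params.shape):
        shift = np.zeros_like(params)
        shift[idx] = eps
        grad[idx] = (my_loss(params + shift) - my_loss(params - shift)) / (2 * eps)
    return grad
```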
18 changes: 12 additions & 6 deletions src/orqviz/elastic_band/auto_neb.py
@@ -3,16 +3,16 @@
import numpy as np
from scipy.interpolate import interp1d

from ..aliases import ParameterVector
from ..aliases import FullGradientFunction, LossFunction, ParameterVector
from .data_structures import Chain
from .neb import run_NEB


# Nudged-Elastic-Band
def run_AutoNEB(
init_chain: Chain,
loss_function: Callable[[ParameterVector], float],
full_gradient_function: Optional[Callable[[ParameterVector], np.ndarray]] = None,
loss_function: LossFunction,
full_gradient_function: FullGradientFunction = None,
n_cycles: int = 4,
n_iters_per_cycle: int = 10,
max_new_pivots: int = 1,
@@ -39,7 +39,10 @@ def run_AutoNEB(

Args:
init_chain: Initial chain that is optimized with the algorithm.
loss_function: Loss function that is used to optimize the chain.
loss_function: Function that is used to optimize the chain. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
full_gradient_function: Function to calculate the gradient w.r.t.
the loss function for all parameters. Defaults to None.
n_cycles: Number of cycles between which new pivots can be inserted.
@@ -118,7 +121,7 @@ def run_AutoNEB(

def _insert_pivots_to_improve_approximation(
chain: Chain,
loss_function: Callable[[ParameterVector], float],
loss_function: LossFunction,
max_new_pivots: int = 1,
percentage_tol: float = 0.2,
absolute_tol: float = 0.0,
@@ -129,7 +132,10 @@ def _insert_pivots_to_improve_approximation(

Args:
chain: Current Chain
loss_function: Loss function for the NEB training
loss_function: Function for NEB training. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
max_new_pivots: Maximum number of pivots inserted to Chain. Defaults to 1.
percentage_tol: Percentage error threshold to insert new pivots.
Be mindful of the magnitude and sign of typical loss values.
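
For context, a hypothetical call with ND parameters (assumes `Chain` from `data_structures` below, and that `run_AutoNEB` returns the optimized chain, which this diff does not show):

```python
import numpy as np

def loss_function(params: np.ndarray) -> float:
    return float(np.sum(np.sin(params) ** 2))

# Interpolate 8 pivots between two (2, 3)-shaped parameter arrays:
pivots = np.linspace(np.zeros((2, 3)), np.ones((2, 3)), num=8)
init_chain = Chain(pivots=pivots)

trained_chain = run_AutoNEB(init_chain, loss_function, n_cycles=4, n_iters_per_cycle=10)
```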
21 changes: 14 additions & 7 deletions src/orqviz/elastic_band/data_structures.py
@@ -1,11 +1,12 @@
from __future__ import annotations

from typing import Callable, NamedTuple
from typing import Callable, NamedTuple, Tuple

import numpy as np
from scipy.interpolate import interp1d

from ..aliases import ArrayOfParameterVectors, Weights
from ..aliases import ArrayOfParameterVectors, LossFunction, ParameterVector, Weights
from ..geometric import _norm_of_arrayofparametervectors
Contributor:
Since you are using _norm_of_arrayofparametervectors in a different module than the one it comes from (i.e., in elastic_band and not in geometric), I think it makes more sense to not make it private? Not 100% sure though, thoughts @alexjuda?

Contributor Author:
That's a good point. It is only a helper function for us, though. I doubt that anyone would need it apart from us.

Contributor:
I can recommend a 3-fold distinction for project symbols – public/internal/fileprivate – based on a convention with leading underscores in the module and symbol names:
  • public: module, function
  • internal: _module, function
  • fileprivate: w/e, _function
The above use case looks like an "internal" one :). Thoughts?

Contributor:
@alexjuda In our case the content of the geometric package should be public, so not really ;)
@MSRudolph fine 🤷‍♂️

from ..scans import eval_points_on_path


@@ -21,14 +22,14 @@ class Chain(NamedTuple):
pivots: ArrayOfParameterVectors

def get_weights(self) -> Weights:
chain_weights = np.linalg.norm(np.diff(self.pivots, axis=0), axis=1)
chain_weights = _norm_of_arrayofparametervectors(np.diff(self.pivots, axis=0))
chain_weights /= np.sum(chain_weights)
cum_weights = np.cumsum(chain_weights)
matching_cum_weights = np.insert(cum_weights, 0, 0)
matching_cum_weights[-1] = 1
return matching_cum_weights

def evaluate_on_pivots(self, loss_function: Callable) -> np.ndarray:
def evaluate_on_pivots(self, loss_function: LossFunction) -> np.ndarray:
return eval_points_on_path(self.pivots, loss_function)

@property
@@ -37,7 +38,11 @@ def n_pivots(self) -> int:

@property
def n_params(self) -> int:
return len(self.pivots[0])
return np.prod(self.param_shape)

@property
def param_shape(self) -> Tuple[int, ...]:
return np.atleast_1d(self.pivots[0]).shape


class ChainPath(NamedTuple):
@@ -65,7 +70,7 @@ def generate_uniform_chain(self, n_points: int) -> Chain:
return self._get_chain_from_weights(weights)

def evaluate_points_on_path(
self, n_points: int, loss_function: Callable, weighted: bool = False
self, n_points: int, loss_function: LossFunction, weighted: bool = False
) -> np.ndarray:
if weighted:
chain = self.generate_chain(n_points)
@@ -74,8 +79,10 @@ def evaluate_points_on_path(
return chain.evaluate_on_pivots(loss_function)

def _get_chain_from_weights(self, weights: Weights) -> Chain:
distance_between_pivots = np.diff(self.primary_chain.pivots, axis=0)

chain_diff = np.cumsum(
np.linalg.norm(np.diff(self.primary_chain.pivots, axis=0), axis=1)
_norm_of_arrayofparametervectors(distance_between_pivots)
)
chain_diff /= max(chain_diff)
chain_diff = np.insert(chain_diff, 0, 0)
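
To illustrate the new `param_shape` and `n_params` properties, a sketch (that `n_pivots` counts the pivots is an assumption consistent with the class above):

```python
import numpy as np

pivots = np.random.normal(size=(5, 2, 3))  # 5 pivots, each of shape (2, 3)
chain = Chain(pivots=pivots)

assert chain.param_shape == (2, 3)
assert chain.n_params == 6  # np.prod((2, 3))
```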
38 changes: 27 additions & 11 deletions src/orqviz/elastic_band/neb.py
@@ -2,17 +2,23 @@

import numpy as np

from ..aliases import ParameterVector, Weights
from ..aliases import (
DirectionVector,
FullGradientFunction,
LossFunction,
ParameterVector,
Weights,
)
from ..gradients import calculate_full_gradient
from .data_structures import Chain, ChainPath


def run_NEB(
init_chain: Chain,
loss_function: Callable[[ParameterVector], float],
full_gradient_function: Optional[Callable[[ParameterVector], np.ndarray]] = None,
loss_function: LossFunction,
full_gradient_function: FullGradientFunction = None,
n_iters: int = 10,
eps: float = 0.1,
eps: float = 1e-3,
learning_rate: float = 0.1,
stochastic: bool = False,
calibrate_tangential: bool = False,
@@ -29,12 +35,16 @@

Args:
init_chain: Initial chain that is optimized with the algorithm.
loss_function: Loss function that is used to optimize the chain.
loss_function: Function that is used to optimize the chain. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
full_gradient_function: Function to calculate the gradient w.r.t.
the loss function for all parameters. Defaults to None.
n_iters: Number of optimization iterations. Defaults to 10.
eps: Stencil for finite difference gradient if full_gradient_function
is not provided. Defaults to 0.1.
is not provided. For noisy loss functions,
we recommend increasing this value. Defaults to 1e-3.
learning_rate: Learning rate/ step size for the gradient descent optimization.
Defaults to 0.1.
stochastic: Flag to indicate whether to perform stochastic gradient descent
@@ -86,16 +96,19 @@ def _full_gradient_function(pars: ParameterVector) -> ParameterVector:

def _get_gradients_on_pivots(
chain: Chain,
loss_function: Callable[[ParameterVector], float],
full_gradient_function: Callable[[ParameterVector], np.ndarray],
loss_function: LossFunction,
full_gradient_function: FullGradientFunction,
calibrate_tangential: bool = False,
) -> np.ndarray:
"""Calculates gradient for every pivot on the chain w.r.t. the loss function
using the gradient function.

Args:
chain: Chain to calculate the gradients on.
loss_function: Loss function for which to calculate the gradient.
loss_function: Function that is used to optimize the chain. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
full_gradient_function: Function to calculate the gradient w.r.t.
the loss function for all parameters.
calibrate_tangential: Flag to indicate whether next neighbor for finding
@@ -105,7 +118,7 @@ def _get_gradients_on_pivots(

# We initialize with zeros, as we always want first and last gradient
# to be equal to 0.
gradients_on_pivots = np.zeros(shape=(chain.n_pivots, chain.n_params))
gradients_on_pivots = np.zeros(shape=(chain.n_pivots, *chain.param_shape))

for ii in range(1, chain.n_pivots - 1):
before = chain.pivots[ii - 1]
@@ -118,7 +131,10 @@
if calibrate_tangential and loss_function(after) > loss_function(before):
tan = after - this
tan /= np.linalg.norm(tan)
tangential_grad = np.dot(full_grad, tan) * tan
ax_indices = tuple(range(len(full_grad.shape)))
tangential_grad = (
np.tensordot(full_grad, tan, axes=(ax_indices, ax_indices)) * tan
)
# save update
gradients_on_pivots[ii] = full_grad - tangential_grad

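
The full `tensordot` contraction generalizes the previous `np.dot` to ND parameter arrays; a small sketch of the equivalence (made-up arrays):

```python
import numpy as np

full_grad = np.random.normal(size=(2, 3))
tan = np.random.normal(size=(2, 3))
tan /= np.linalg.norm(tan)

ax_indices = tuple(range(full_grad.ndim))
# Contracting over all axes yields a scalar, like np.dot on flattened vectors:
overlap = np.tensordot(full_grad, tan, axes=(ax_indices, ax_indices))
assert np.isclose(overlap, np.dot(full_grad.flatten(), tan.flatten()))
tangential_grad = overlap * tan  # component of the gradient along `tan`
```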
9 changes: 6 additions & 3 deletions src/orqviz/elastic_band/plots.py
@@ -3,23 +3,26 @@
import matplotlib
import numpy as np

from ..aliases import ParameterVector
from ..aliases import LossFunction, ParameterVector
from ..plot_utils import _check_and_create_fig_ax
from ..scans import eval_points_on_path
from .neb import Chain


def plot_all_chains_losses(
all_chains: List[Chain],
loss_function: Callable[[ParameterVector], float],
loss_function: LossFunction,
ax: Optional[matplotlib.axes.Axes] = None,
**plot_kwargs,
) -> None:
"""Function to plot

Args:
all_chains: List of Chains to evaluate the loss on.
loss_function: Loss function to evaluate the Chains
loss_function: Function to evaluate the chain pivots on. It must receive only a
numpy.ndarray of parameters, and return a real number.
If your function requires more arguments, consider using the 'LossFunctionWrapper'
class from 'orqviz.loss_function'.
ax: Matplotlib axis to plot on. If None, a new axis is created
from the current figure. Defaults to None.
plot_kwargs: kwargs for plotting with matplotlib.pyplot.plot (plt.plot)
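
A usage sketch (assumes `loss_function` and the chains from the earlier run_AutoNEB sketch; keyword arguments are passed through to `plt.plot`):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plot_all_chains_losses([init_chain, trained_chain], loss_function, ax=ax, marker="o")
ax.set_ylabel("Loss")
plt.show()
```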
33 changes: 24 additions & 9 deletions src/orqviz/geometric.py
@@ -1,24 +1,24 @@
from typing import Optional, Tuple
from typing import Optional, Tuple, Union

import numpy as np
from scipy.interpolate import interp1d

from .aliases import ArrayOfParameterVectors, ParameterVector
from .aliases import ArrayOfParameterVectors, DirectionVector, ParameterVector


def get_random_normal_vector(dimension: int) -> ParameterVector:
def get_random_normal_vector(dimension: Union[int, Tuple]) -> DirectionVector:
"""Helper function to generate a vector with a specified dimension and norm=1."""
random_vector = np.random.normal(0, 1, size=dimension)
return random_vector / np.linalg.norm(random_vector)


def get_random_orthonormal_vector(base_vector: ParameterVector) -> ParameterVector:
def get_random_orthonormal_vector(base_vector: DirectionVector) -> DirectionVector:
"""Helper function to generate a random orthogonal vector with respect to
a provided base vector."""
random_vector = np.random.normal(size=base_vector.shape)
random_vector = np.random.normal(size=np.shape(base_vector))
new_vector = (
random_vector
- np.dot(random_vector, base_vector)
- np.dot(random_vector.flatten(), base_vector.flatten())
* base_vector
/ np.linalg.norm(base_vector) ** 2
)
@@ -87,7 +87,7 @@ def relative_periodic_trajectory_wrap(

def get_coordinates_on_direction(
points: ArrayOfParameterVectors,
direction: np.ndarray,
direction: DirectionVector,
origin: Optional[ParameterVector] = None,
in_units_of_direction: bool = False,
) -> np.ndarray:
@@ -107,12 +107,19 @@ def get_coordinates_on_direction(
norm_direction = np.linalg.norm(direction)
if in_units_of_direction:
direction = direction / norm_direction
return np.dot(points, direction) / norm_direction
return (
np.tensordot(
points,
direction,
axes=(range(1, len(points.shape)), range(len(direction.shape))),
)
/ norm_direction
)


def direction_linspace(
origin: ParameterVector,
direction: np.ndarray,
direction: DirectionVector,
n_points: int,
endpoints: Tuple[float, float] = (-1, 1),
) -> ArrayOfParameterVectors:
@@ -148,3 +155,11 @@ def uniformly_distribute_trajectory(
)
eval_points = np.linspace(0, 1, num=n_points)
return weight_interpolator(eval_points)


def _norm_of_arrayofparametervectors(param_array: ArrayOfParameterVectors):
Contributor:
Not sure about the name of the function... Thoughts, @alexjuda?

Contributor:
How about just `def _norm(param_array: ArrayOfParameterVectors):`?

ax_indices = tuple(range(len(param_array.shape)))
t_dot = np.tensordot(
param_array, param_array, axes=(ax_indices[1:], ax_indices[1:])
)
return np.array(np.sqrt(np.diag(t_dot)))
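
For intuition, the helper returns one Euclidean norm per ND entry; a sketch of an equivalence check (note the `tensordot` route also computes cross terms that `np.diag` then discards):

```python
import numpy as np

param_array = np.random.normal(size=(5, 2, 3))  # 5 entries of shape (2, 3)

norms = _norm_of_arrayofparametervectors(param_array)

# Equivalent: flatten each entry and take its vector norm.
expected = np.linalg.norm(param_array.reshape(len(param_array), -1), axis=1)
assert np.allclose(norms, expected)
```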