add gev distribution and test #1140
Conversation
@srvasude @brianwa84 Could you help review this PR? Thanks!

Pavel, could you take a look?
SiegeLordEx left a comment:
Looks good, thanks! I glanced over the math and it seemed okay. I mainly have concerns about the gradients in a few spots.
| that supports broadcasting (e.g. `loc + scale + concentration` is valid).
| Args:
|   loc: Floating point tensor, the means of the distribution(s).
It's not the actual mean. Maybe just call it "the location parameter of the distribution(s)"
Thanks for pointing this out; I will also fix all the other places where mean and loc are misused.
|     parameters=parameters,
|     name=name)
| @staticmethod
Rather than the `_param_shapes` and `_param_event_ndims` methods, we have a new mechanism for describing this. In this case, you can do:

  @classmethod
  def _parameter_properties(cls, dtype, num_classes=None):
    # pylint: disable=g-long-lambda
    return dict(
        loc=parameter_properties.ParameterProperties(),
        scale=parameter_properties.ParameterProperties(
            default_constraining_bijector_fn=(
                lambda: softplus_bijector.Softplus(low=dtype_util.eps(dtype)))),
        concentration=parameter_properties.ParameterProperties())
    # pylint: enable=g-long-lambda

This new style lets us automatically constrain the parameters of the distribution.
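For context, here is a minimal sketch (not part of the PR) of what the `default_constraining_bijector_fn` buys you: a `Softplus` bijector with a small `low` value maps an unconstrained real parameter to a strictly positive `scale`, which is what lets tooling parameterize the distribution automatically. The tensor values and the `low=1e-6` constant are made up for illustration; the PR's code uses `dtype_util.eps(dtype)`.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# Unconstrained values, e.g. raw optimizer variables (made-up numbers).
unconstrained = tf.constant([-3.0, 0.0, 3.0])

# Softplus with a small `low` keeps the constrained value strictly positive,
# mirroring softplus_bijector.Softplus(low=dtype_util.eps(dtype)) above.
constrain_scale = tfp.bijectors.Softplus(low=1e-6)
scale = constrain_scale(unconstrained)  # all entries > 0
```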
|   def _entropy(self):
|     # Use broadcasting rules to calculate the full broadcast sigma.
|     scale = self.scale * tf.ones_like(self.loc)
Do `tf.broadcast_to(self.scale, ps.broadcast_shape(ps.shape(self.scale), ps.shape(self.loc)))` instead; it's a little more self-descriptive (you can drop the comment) and doesn't waste flops.
`ps` should be defined as `from tensorflow_probability.python.internal import prefer_static as ps` up top.
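A minimal sketch of the two approaches side by side (the shapes are chosen just for illustration, not taken from the PR):

```python
import tensorflow as tf
from tensorflow_probability.python.internal import prefer_static as ps

loc = tf.zeros([3, 4])       # hypothetical batch shape
scale = tf.constant([2.0])   # broadcasts against loc

# Original: materializes a ones tensor and multiplies through it.
scale_old = scale * tf.ones_like(loc)

# Suggested: broadcast directly to the combined shape; no extra multiply.
scale_new = tf.broadcast_to(
    scale, ps.broadcast_shape(ps.shape(scale), ps.shape(loc)))
```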
|     return self.loc + self.scale * mode_z
|   def _default_event_space_bijector(self):
Drop this entirely; the superclass has a good default implementation for this distribution (it'll use the composition of sigmoid + GevCDF). Identity is particularly bad, since this distribution often isn't supported on the entire real line.
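For context, a minimal NumPy sketch (not from the PR; the parameter values are made up) of why an Identity event-space bijector is problematic here: for concentration `c != 0` the GEV density is only defined where `1 + c * (x - loc) / scale > 0`, so an Identity bijector would map unconstrained reals onto points outside the support.

```python
import numpy as np

loc, scale, c = 0.0, 1.0, 0.5     # hypothetical parameters with c > 0
x = np.array([-3.0, -1.0, 0.0, 5.0])

# Support condition for the GEV density when c != 0.
in_support = 1.0 + c * (x - loc) / scale > 0.0
print(in_support)  # [False  True  True  True]; support is bounded below at loc - scale / c
```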
|     g1_square = tf.exp(tf.math.lgamma(1. - conc)) ** 2
|     g2 = tf.exp(tf.math.lgamma(1. - 2.*conc))
|     std_z = tf.where(equal_zero,
Same comment about NaN gradients.
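For readers following along, a minimal generic sketch (not the PR's code) of the issue being flagged: `tf.where` computes gradients through both branches, so a NaN or infinite gradient in the unselected branch still poisons the result; the usual fix is to substitute "safe" inputs before applying the problematic op.

```python
import tensorflow as tf

def grad_safe_sqrt(x):
  """sqrt(x) for x > 0, 0 elsewhere, with finite gradients everywhere."""
  is_positive = x > 0.
  # tf.sqrt has an infinite gradient at 0 and a NaN gradient below it; feed it
  # a safe dummy value where the result is discarded by the outer tf.where.
  safe_x = tf.where(is_positive, x, tf.ones_like(x))
  return tf.where(is_positive, tf.sqrt(safe_x), tf.zeros_like(x))
```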
|     less_than_half = tf.less(conc, 0.5)
|     g1_square = tf.exp(tf.math.lgamma(1. - conc)) ** 2
|     g2 = tf.exp(tf.math.lgamma(1. - 2.*conc))
Spaces around binary ops, please.
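E.g. the quoted lines with spacing around the binary operators (same code, formatting only):

```python
g1_square = tf.exp(tf.math.lgamma(1. - conc)) ** 2
g2 = tf.exp(tf.math.lgamma(1. - 2. * conc))
```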
| tfd = tfp.distributions
| class _GEVTest(object):
Since I had all those concerns about NaN gradients, would you mind adding a test for those? See `probability/tensorflow_probability/python/distributions/generalized_normal_test.py`, lines 168 to 186 (at 612a9cc):
  @test_util.numpy_disable_gradient_test
  def testFiniteGradientAtDifficultPoints(self):
    def make_fn(dtype, attr):
      x = np.array([-100., -20., -5., 5., 20., 100.]).astype(dtype)
      return lambda m, s, p: getattr(  # pylint: disable=g-long-lambda
          tfd.GeneralizedNormal(loc=m, scale=s, power=p, validate_args=True),
          attr)(x)
    # TODO(b/157524947): add 'log_cdf', currently fails at -100, -20, in fp32.
    for attr in ['log_prob', 'prob', 'cdf']:
      value, grads = self.evaluate(tfp.math.value_and_gradient(
          make_fn(self.dtype, attr),
          [tf.constant(0, self.dtype),  # mu
           tf.constant(1, self.dtype),  # scale
           tf.constant(2.1, self.dtype)]))  # power
      self.assertAllFinite(value)
      self.assertAllFinite(grads[0])  # d/d mu
      self.assertAllFinite(grads[1])  # d/d scale
      self.assertAllFinite(grads[2])  # d/d power
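A hypothetical adaptation of that pattern to the GEV distribution might look like the sketch below; the class name `GeneralizedExtremeValue` and the chosen evaluation points are assumptions, not taken from the PR.

```python
@test_util.numpy_disable_gradient_test
def testFiniteGradientAtDifficultPoints(self):
  def make_fn(dtype, attr):
    # Points chosen to lie inside the support for concentration = 0.1.
    x = np.array([-5., -1., 0., 1., 5.]).astype(dtype)
    return lambda m, s, c: getattr(  # pylint: disable=g-long-lambda
        tfd.GeneralizedExtremeValue(
            loc=m, scale=s, concentration=c, validate_args=True),
        attr)(x)
  for attr in ['log_prob', 'prob', 'cdf']:
    value, grads = self.evaluate(tfp.math.value_and_gradient(
        make_fn(self.dtype, attr),
        [tf.constant(0., self.dtype),     # loc
         tf.constant(1., self.dtype),     # scale
         tf.constant(0.1, self.dtype)]))  # concentration
    self.assertAllFinite(value)
    self.assertAllFinite(grads[0])  # d/d loc
    self.assertAllFinite(grads[1])  # d/d scale
    self.assertAllFinite(grads[2])  # d/d concentration
```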
Hi @SiegeLordEx, thanks a lot for all your valuable comments. I have fixed all of them (although I didn't reply to each one inline :) ). There are two places to point out:
Let me know if there is anything else that needs to be updated.

Looks good, thanks. I'll send this along to our internal review and that'll get this merged.
PiperOrigin-RevId: 344308630
Resolve issue #874