Parameter space representations
===============================

Parameter space representations are :math:`d \times d` objects that define metrics in parameter space, such as:

- Fisher Information Matrices/Gauss-Newton matrices
- Second moment of the gradients (sometimes called the *Empirical Fisher*)
- Other covariances, such as those arising in Bayesian Deep Learning

These matrices are often too large to fit in memory, for instance when :math:`d` is on the order of :math:`10^6 - 10^8`,
as is typical in current deep networks. Below is the list of parameter space representations available in NNGeometry,
computed on a small network and displayed as images, where each pixel represents a component of the matrix and the color
indicates the magnitude of that component. The matrices are normalized by their diagonal (i.e. they are shown as
correlation matrices) for better visualization:

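Whatever the representation, construction goes through a common interface. Below is a minimal sketch of how a
representation is typically instantiated via the :func:`nngeometry.metrics.FIM` helper; the toy model, data loader and
``n_output`` value are illustrative assumptions, not part of this page:

.. code-block:: python

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    from nngeometry.metrics import FIM
    from nngeometry.object.pspace import PMatKFAC  # or any representation below

    # Toy classification setup (illustrative only)
    model = nn.Sequential(nn.Linear(10, 6), nn.ReLU(), nn.Linear(6, 3))
    x = torch.randn(100, 10)
    y = torch.randint(0, 3, (100,))
    loader = DataLoader(TensorDataset(x, y), batch_size=25)

    # Build the Fisher Information Matrix in the chosen representation
    F = FIM(model=model, loader=loader, representation=PMatKFAC, n_output=3)
    print(F.trace())
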
:class:`nngeometry.object.pspace.PMatDense` representation: this is the usual dense matrix. Memory cost: :math:`d \times d`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatDense.png
   :width: 400

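As a sketch of what this representation allows, reusing the toy setup above (``random_pvector`` and
``LayerCollection`` are taken from NNGeometry's utilities):

.. code-block:: python

    from nngeometry.object.pspace import PMatDense
    from nngeometry.object.vector import random_pvector
    from nngeometry.layercollection import LayerCollection

    F_dense = FIM(model=model, loader=loader,
                  representation=PMatDense, n_output=3)

    # The full d x d tensor is materialized: only tractable for small d
    print(F_dense.get_dense_tensor().shape)

    # Quadratic form v^T F v with a random parameter space vector
    v = random_pvector(LayerCollection.from_model(model))
    print(F_dense.vTMv(v))
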
:class:`nngeometry.object.pspace.PMatBlockDiag` representation: a block-diagonal representation where diagonal blocks are
dense matrices corresponding to parameters of a single layer, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l d_l \times d_l` where :math:`d_l` is the number of parameters of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatBlockDiag.png
   :width: 400

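To make the memory cost concrete, here is a back-of-the-envelope comparison of :math:`d \times d` versus
:math:`\sum_l d_l \times d_l`, computed directly from the toy model used above:

.. code-block:: python

    # Per-layer parameter counts d_l for the toy model above
    d_layers = [sum(p.numel() for p in layer.parameters())
                for layer in model
                if any(p.numel() for p in layer.parameters())]
    d = sum(d_layers)

    print("dense cost      :", d * d)                            # d x d
    print("block-diag cost :", sum(dl * dl for dl in d_layers))  # sum_l d_l x d_l
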
:class:`nngeometry.object.pspace.PMatKFAC` representation :cite:p:`martens2015optimizing, grosse2016kronecker`: a block-diagonal representation where diagonal blocks are
factored as the Kronecker product of two smaller matrices, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l g_l \times g_l + a_l \times a_l`, where :math:`a_l` is the number of neurons at the
input of layer :math:`l` and :math:`g_l` is the number of pre-activations at the output of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatKFAC.png
   :width: 400

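A typical use of this representation is solving the linear system :math:`Fx = v` through the Kronecker structure, e.g.
for natural gradient updates. A minimal sketch, reusing the toy setup above (the damping value ``regul`` is an
illustrative choice):

.. code-block:: python

    F_kfac = FIM(model=model, loader=loader,
                 representation=PMatKFAC, n_output=3)

    v = random_pvector(LayerCollection.from_model(model))

    # Solve (F + regul * I) x = v block by block using the Kronecker factors
    x = F_kfac.solve(v, regul=1e-3)
    print(F_kfac.vTMv(x))
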
:class:`nngeometry.object.pspace.PMatEKFAC` representation :cite:p:`george2018fast`: a block-diagonal representation where diagonal blocks are
factored as a diagonal matrix in a Kronecker-factored eigenbasis, and cross-layer interactions are ignored (their coefficients are
set to :math:`0`). Memory cost: :math:`\sum_l g_l \times g_l + a_l \times a_l + d_l`, where :math:`a_l` is the number of neurons at the
input of layer :math:`l`, :math:`g_l` is the number of pre-activations at the output of layer :math:`l`, and :math:`d_l` is the
number of parameters of layer :math:`l`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatEKFAC.png
   :width: 400

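The diagonal scaling in the eigenbasis can be cheaply re-estimated on fresh data, which is the core of the EKFAC method
:cite:p:`george2018fast`. The sketch below assumes an ``update_diag`` method for this re-estimation step; treat that
call as an assumption rather than a documented API:

.. code-block:: python

    F_ekfac = FIM(model=model, loader=loader,
                  representation=PMatEKFAC, n_output=3)

    # Re-estimate the diagonal in the Kronecker-factored eigenbasis
    # (assumed API; the eigenbasis itself is kept fixed)
    F_ekfac.update_diag(loader)
    print(F_ekfac.trace())
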
:class:`nngeometry.object.pspace.PMatDiag` representation: a diagonal representation that ignores all interactions between parameters.
Memory cost: :math:`d`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatDiag.png
   :width: 400

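As a sketch, the stored coefficients can be inspected through the common interface, here via ``get_dense_tensor``
(used only for inspection, since it materializes the full :math:`d \times d` matrix):

.. code-block:: python

    F_diag = FIM(model=model, loader=loader,
                 representation=PMatDiag, n_output=3)

    # Only d coefficients are stored; the dense view is for inspection only
    print(torch.diag(F_diag.get_dense_tensor())[:5])
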
:class:`nngeometry.object.pspace.PMatQuasiDiag` representation :cite:p:`ollivier2015riemannian`: a diagonal representation where, for each neuron, an additional
coefficient is stored that measures the interaction between this neuron's weights and the corresponding bias.
Memory cost: :math:`2 \times d`.

.. image:: https://github.com/tfjgeorge/nngeometry/raw/master/examples/repr_img/PMatQuasiDiag.png
   :width: 400

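Since all representations share a common interface, swapping one for another is a one-line change. As a final sketch,
the loop below builds each representation on the toy setup above and prints the trace it computes:

.. code-block:: python

    from nngeometry.object.pspace import (PMatDense, PMatBlockDiag, PMatKFAC,
                                          PMatEKFAC, PMatDiag, PMatQuasiDiag)

    for repr_class in [PMatDense, PMatBlockDiag, PMatKFAC,
                       PMatEKFAC, PMatDiag, PMatQuasiDiag]:
        F = FIM(model=model, loader=loader,
                representation=repr_class, n_output=3)
        print(f"{repr_class.__name__:15s} trace: {float(F.trace()):.4f}")
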
@article{george2018fast,
  title={Fast Approximate Natural Gradient Descent in a Kronecker Factored Eigenbasis},
  author={George, Thomas and Laurent, C{\'e}sar and Bouthillier, Xavier and Ballas, Nicolas and Vincent, Pascal},
  journal={Advances in Neural Information Processing Systems},
  volume={31},
  pages={9550--9560},
  year={2018}
}

@inproceedings{grosse2016kronecker,
  title={A Kronecker-factored approximate Fisher matrix for convolution layers},
  author={Grosse, Roger and Martens, James},
  booktitle={International Conference on Machine Learning},
  pages={573--582},
  year={2016},
  organization={PMLR}
}

@inproceedings{martens2015optimizing,
  title={Optimizing neural networks with Kronecker-factored approximate curvature},
  author={Martens, James and Grosse, Roger},
  booktitle={International Conference on Machine Learning},
  pages={2408--2417},
  year={2015},
  organization={PMLR}
}

@article{ollivier2015riemannian,
  title={Riemannian metrics for neural networks I: feedforward networks},
  author={Ollivier, Yann},
  journal={Information and Inference: A Journal of the IMA},
  volume={4},
  number={2},
  pages={108--153},
  year={2015},
  publisher={Oxford University Press}
}