Release date and comparison question #1
Hi Robert,

We will be releasing the official code before the NeurIPS conference. In the meantime you can find code from the supplementary material on OpenReview (https://openreview.net/forum?id=h8Bd7Gm3muB); note that it is still a bit rough around the edges.
In terms of comparison to FRePo:
1. Our method is primarily designed for kernel ridge regression with infinite-width NNGP/NTK kernels, whereas theirs is mainly concerned with training networks with gradient descent. You'll notice that finite-network performance is a bit less emphasized in our paper for this reason. Also note that we use a very wide network when we do finite-network SGD, so directly comparing the performance of the two is a bit difficult. (There is a rough sketch of the kernel ridge regression setup after this list.)
2. You could actually consider FRePo and RFAD to be the same algorithm if you set certain hyperparameters to match, namely:
   1. Set FRePo's max-online-steps to 1 (so that a new random model is sampled every time it is used).
   2. Set RFAD's number-of-models parameter (M) to 1 (so that a single network is used at each iteration rather than several). Note that in our paper we use M = 8 as the default.
   3. Use the MSE loss instead of the Platt loss in RFAD.
   4. Also, FRePo uses a slightly different architecture than us (they use a different number of conv channels for each layer, whereas we keep it the same).
   Overall it's pretty interesting that these two papers came out at the same time and both use a very similar idea of using the conjugate/NNGP kernel.
3. Because our algorithm's runtime is proportional to M, and we use M = 8, we would expect FRePo to run around 8x faster than RFAD at its default settings, but if you use M = 1 they should have the same memory/time complexity in theory.
4. In practice, our RFAD code isn't very well optimized, so you could probably shave off a good bit of time by moving it to a faster library like JAX, where everything can be jit-compiled. Note that FRePo uses JAX, so it gets a bit of a speed boost just from doing that. (There is a small jit example after this list as well.)
Thanks for showing interest in our paper. If you have any more questions
I'd be happy to answer.
Noel
On Mon, Nov 21, 2022 at 11:22 AM Robert Krug wrote:

Hello,

I currently plan to write my Master's thesis about Data Distillation and I am very interested in your work. Is there already a date for the publication of your code, or any other way to get access to it?

Also, I would like to ask how you evaluate the computational time and memory requirements of RFAD compared to the FRePo method of the paper "Dataset Distillation using Neural Feature Regression" (https://arxiv.org/pdf/2206.00719.pdf)?

Thank you very much in advance!