Release date and comparison question #1
Hi Robert,

We will be releasing the official code before the NeurIPS conference. In the meantime you can find code from the supplementary material on OpenReview (https://openreview.net/forum?id=h8Bd7Gm3muB); note that it is still a bit rough around the edges.
In terms of comparison to FRePo:
1. Our method is primarily designed for kernel ridge regression with infinite-width NNGP/NTK kernels, whereas theirs is mainly concerned with training networks with gradient descent. You'll notice that finite-network performance is a bit less emphasized in our paper for this reason. Also note that we use a very wide network when we do finite-network SGD, so directly comparing the performance of the two is a bit difficult. (There is a rough sketch of the kernel ridge regression setup after this list.)
2. You could actually consider FRePo and RFAD to be the same algorithm if you set certain hyperparameters to match, namely:
   1. Set FRePo's max-online-steps to 1 (so that a new random model is sampled every time it is used).
   2. Set RFAD's number-of-models parameter (M) to 1 (so that a single network is used at each iteration rather than several). Note that in our paper we use M = 8 as the default.
   3. Use the MSE loss instead of the Platt loss in RFAD.
   4. Also, FRePo uses a slightly different architecture than us (they use a different number of conv channels for each layer, whereas we keep it the same).
   Overall it's pretty interesting that these two papers came out at the same time and both use a very similar idea of using the conjugate/NNGP kernel.
3. Because our algorithm's runtime is proportional to M, and we use M = 8, we would expect FRePo to run around 8x faster than RFAD at its default settings, but if you use M = 1 they should have the same memory/time complexity in theory.
4. In practice, our RFAD code isn't very well optimized, so you could probably shave off a good bit of time by moving it to a faster library like JAX, where everything can be jit-compiled. Note that FRePo uses JAX, so it gets a bit of a speed boost just from doing that. (There is a small jit example after this list as well.)
Thanks for showing interest in our paper. If you have any more questions
I'd be happy to answer.
Noel
On Mon, Nov 21, 2022 at 11:22 AM Robert Krug wrote:

Hello,

I currently plan to write my Master's thesis about Data Distillation and I am very interested in your work. Is there already a date for the publication of your code, or any other way to get access to it?

Also, I would like to ask how you evaluate the computational time and memory requirements of RFAD compared to the FRePo method of the paper "Dataset Distillation using Neural Feature Regression" (https://arxiv.org/pdf/2206.00719.pdf)?

Thank you very much in advance!