
Training PET on data which is too large to fit in RAM #39

Closed
ghost opened this issue Jun 13, 2021 · 8 comments


ghost commented Jun 13, 2021

I am training a PET model on 500 GB of text. I have properly processed the data, but I can't load it all into a variable since I don't have nearly enough RAM to do that.
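
(For context, a minimal sketch of what streaming the corpus could look like instead of loading it into one variable, assuming the processed data is stored as newline-delimited text files; the glob pattern, helper names, and batch size are illustrative, not part of PET's API.)

```python
# Minimal sketch (not part of PET): stream examples from disk one at a time
# instead of holding the full 500 GB corpus in memory. Assumes newline-delimited
# text files; the glob pattern and batch size are illustrative.
import glob

def iter_examples(pattern="data/part-*.txt"):
    """Yield one text example at a time without loading a whole file into RAM."""
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield line

def iter_batches(batch_size=32):
    """Group streamed examples into small batches for training."""
    batch = []
    for example in iter_examples():
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```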

@chris-aeviator

Random side note: I believe other projects have been solving this with the DeepSpeed/DeeperSpeed libraries; it might need loads of code rework before you can use it.
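
(To give a rough idea of what that rework would involve, the generic DeepSpeed training loop looks something like the sketch below; it is not PET-specific, and the tiny model, random data, and config values are placeholders.)

```python
# Rough sketch of the generic DeepSpeed training loop (not PET-specific).
# The tiny model, random tensors, and config values are placeholders.
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},  # ZeRO partitions optimizer state to save memory
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
}

model = torch.nn.Linear(128, 2)  # stand-in for the Transformer PET would use
dataset = torch.utils.data.TensorDataset(
    torch.randn(1024, 128), torch.randint(0, 2, (1024,))
)

engine, _, loader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config=ds_config,
)

for inputs, labels in loader:
    inputs, labels = inputs.to(engine.device), labels.to(engine.device)
    loss = torch.nn.functional.cross_entropy(engine(inputs), labels)
    engine.backward(loss)
    engine.step()
```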


ghost commented Jun 14, 2021

Oh, that's sad, because I can't do any code rework on my own :(

> Random side note: I believe other projects have been solving this with the DeepSpeed/DeeperSpeed libraries; it might need loads of code rework before you can use it.

So is there no simple way to do it? Could you help me?


ghost commented Jun 14, 2021

In what ways did they use MS DeepSpeed for this?

@timoschick (Owner)

Hi @BleepLogger, the focus of PET is few-shot learning from 0-1000 examples, so I'm not sure this is really the right library for you if you've got 500 GB of data to train on. We currently don't plan any modifications to PET that would support such large training datasets, so if you really want to use PET, you'll probably have to make them yourself. However, if you just want to use the 500 GB of data for pretraining, a better approach would be to first use another library for pretraining and then use the resulting model with PET.
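
(A hedged sketch of that two-step route, using Hugging Face Transformers for the masked-language-model pretraining step; the model name, file paths, and hyperparameters are placeholders. The saved directory could then be handed to PET, e.g. via --model_name_or_path, together with a small labeled training set.)

```python
# Hedged sketch of the "pretrain elsewhere, then plug into PET" route, using
# Hugging Face Transformers for MLM pretraining. Paths, model name, and
# hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# The "text" loader memory-maps the corpus instead of reading it all into RAM.
dataset = load_dataset("text", data_files={"train": "corpus/part-*.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pretrained-model", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
trainer.save_model("pretrained-model")
# Afterwards: fine-tune PET on a small labeled dataset, pointing it at the
# saved directory (e.g. --model_name_or_path pretrained-model).
```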


ghost commented Jun 15, 2021

Okay, I'm willing to make those modifications on my own. How do I make them?


ghost commented Jun 15, 2021

Also, I have the data in 80 parts.
What if, instead of fine-tuning PET on all of my data at once, I first fine-tune it on part 1, then fine-tune part 1's resulting model on part 2, and so on? Could this work?
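
(A hedged sketch of the loop that idea describes; train_on_part is a hypothetical helper standing in for whatever training call is used, not part of PET's API, and the paths are placeholders.)

```python
# Hedged sketch of the sequential idea: fine-tune on part 1, then continue from
# that checkpoint on part 2, and so on. `train_on_part` is a hypothetical helper
# wrapping whatever training call is used; it is not part of PET's API.
checkpoint = "roberta-base"  # starting model (placeholder)
for part in range(1, 81):    # the 80 data parts
    output_dir = f"checkpoints/part-{part:02d}"
    train_on_part(
        model_name_or_path=checkpoint,           # resume from the previous part's model
        train_file=f"data/part-{part:02d}.txt",  # placeholder path for this part
        output_dir=output_dir,
    )
    checkpoint = output_dir  # the next part starts from this checkpoint
```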


ghost commented Jun 15, 2021

@timoschick

@timoschick (Owner)

> Okay, I'm willing to make those modifications on my own. How do I make them?

Sorry if I haven't been clear. What I was trying to say is that I don't know what the best way to train on such large datasets would be, so you don't just have to implement the modifications yourself; you'll also have to figure out which modifications are required on your own.
