Computational Resources and Time #9

Closed · sa5r opened this issue May 17, 2022 · 4 comments
Labels: question (Further information is requested)

sa5r commented May 17, 2022

Can you provide a recommendation for the computational resources needed to run one of the downstream tasks, such as run_contact with fine-tuning enabled (i.e., do_train = True)? For example, the suggested number of cores and amount of memory, and how long it is expected to take. Also, what resources were used in your experiments, and how long did they take?

I am trying to run the protein contact prediction task on 16 cores and 120 GB of memory, estimating that about a week is needed to get results; however, the process keeps getting killed due to insufficient memory.

cheng-siyuan (Contributor) commented May 17, 2022

Since we have not used CPUs to run downstream tasks such as contact prediction, we cannot give specific running configurations for that setup, but I can tell you that we used four 32 GB V100 GPUs when we fine-tuned this downstream task. Running the contact task requires substantial computing power, so we do not recommend fine-tuning the model without GPUs.

Alexzhuan added the question (Further information is requested) label on May 17, 2022
sa5r (Author) commented May 18, 2022

Thanks. We are using a GPU to run the code.
Can you provide information about how long it took you to run any of the downstream tasks, including fine-tuning?

cheng-siyuan (Contributor) commented

We ran for about five hours. The actual training time depends on the GPUs you use and the number of epochs you configure.

jasperhyp commented Sep 22, 2022

It might be late, but I also ran into this issue. I think it's pretty normal to see OOM in contact prediction: the TAPE contact prediction head (PyTorch version) is indeed memory-costly, especially in these steps:

# inputs: [batch, seq_len, feat_dim]
prod = inputs[:, :, None, :] * inputs[:, None, :, :]  # [batch, seq_len, seq_len, feat_dim]
diff = inputs[:, :, None, :] - inputs[:, None, :, :]  # [batch, seq_len, seq_len, feat_dim]
pairwise_features = torch.cat((prod, diff), -1)       # [batch, seq_len, seq_len, 2 * feat_dim]

Say you have a protein with 2000 amino acids: by the third line you'll have two tensors (assuming batch_size = 1) of shape [1, 2000, 2000, feat_dim], and you're concatenating them into a tensor of shape [1, 2000, 2000, 2*feat_dim]. Of course this will easily take up more than 40 GB of GPU memory:

>>> import sys, torch
>>> x = torch.ones(1, 2000, 2000, 1280).type(torch.float)
>>> sys.getsizeof(x.storage())
20480000048
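
(That matches a back-of-the-envelope calculation: 1 × 2000 × 2000 × 1280 float32 elements × 4 bytes ≈ 20.5 GB per tensor, so prod, diff, and the concatenated output together need roughly 80 GB.)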

I think you either need to find a way to batch the contact map more efficiently (e.g., building it in chunks, as sketched below), limit the protein length in each pass, or reduce feat_dim first with something like SVD.
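
To illustrate the chunking idea, here is a minimal sketch that builds the pairwise tensor one row-chunk at a time so the prod/diff temporaries stay small. pairwise_features_chunked and chunk are hypothetical names of my own, not part of TAPE:

import torch

def pairwise_features_chunked(inputs, chunk=256):
    # inputs: [batch, seq_len, feat_dim]
    # returns: [batch, seq_len, seq_len, 2 * feat_dim]
    b, n, d = inputs.shape
    out = inputs.new_empty(b, n, n, 2 * d)
    for start in range(0, n, chunk):
        rows = inputs[:, start:start + chunk, None, :]    # [b, c, 1, d]
        cols = inputs[:, None, :, :]                      # [b, 1, n, d]
        out[:, start:start + chunk, :, :d] = rows * cols  # prod, broadcast to [b, c, n, d]
        out[:, start:start + chunk, :, d:] = rows - cols  # diff
    return out

Note this only bounds the temporaries: the output itself is still O(seq_len^2 * feat_dim), so for very long sequences you would still need to crop the sequence or move results off the GPU.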

If you are very familiar with PyTorch, I would also think about dynamic batching of tensors (see the sketch below), though I don't know exactly how TAPE's pipeline would accommodate it.
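
As a rough illustration of what dynamic batching could look like: bucket sequences by length so that batch_size × max_len² stays under a fixed budget, which bounds the size of the padded pairwise tensor per batch. length_bucketed_batches and max_square_tokens are hypothetical names for this sketch, not TAPE or PyTorch API:

from torch.utils.data import DataLoader

def length_bucketed_batches(lengths, max_square_tokens=4_000_000):
    # lengths[i] is the sequence length of dataset item i.
    # Yields lists of indices; a padded batch costs about
    # batch_size * max_len**2 pairwise positions, kept under the budget.
    order = sorted(range(len(lengths)), key=lengths.__getitem__)
    batch, max_len = [], 0
    for i in order:
        new_max = max(max_len, lengths[i])
        if batch and (len(batch) + 1) * new_max ** 2 > max_square_tokens:
            yield batch
            batch, new_max = [], lengths[i]
        batch.append(i)
        max_len = new_max
    if batch:
        yield batch

# Usage (assuming a `dataset` and its per-item `lengths` exist):
# loader = DataLoader(dataset, batch_sampler=list(length_bucketed_batches(lengths)))

Sorting by length also keeps padding waste low, since sequences in a batch have similar lengths; an oversized sequence simply ends up in a batch by itself.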
