Computational Resources and Time #9

Closed · sa5r opened this issue May 17, 2022 · 4 comments
Labels: question (Further information is requested)

sa5r commented May 17, 2022

Can you provide a recommendation for the computational resources needed to run one of the downstream tasks, such as run_contact with fine-tuning enabled (i.e., do_train = True)? For example, the suggested number of cores and amount of memory, and how long it is expected to take. Also, what resources were used in your experiments, and how long did they take?

I am trying to run the protein contact prediction task on 16 cores and 120 GB of memory, estimating that about a week is needed to get results; however, the process keeps getting killed due to insufficient memory.

cheng-siyuan (Contributor) commented May 17, 2022

Since we have not used CPUs to run downstream tasks such as contact prediction, we cannot give specific running configurations for that setup, but I can tell you that we used four 32 GB V100 GPUs when we fine-tuned this downstream task. Running the contact task requires substantial computing power, so we do not recommend fine-tuning the model without GPUs.

Alexzhuan added the question (Further information is requested) label on May 17, 2022
sa5r (Author) commented May 18, 2022

Thanks. We are using a GPU to run the code.
Can you provide information about how long it took you to run any of the downstream tasks, including fine-tuning?

cheng-siyuan (Contributor) commented

We ran for about five hours. The actual training time depends on the GPUs you use and the number of epochs you configure.

jasperhyp commented Sep 22, 2022

It might be late, but I also ran into this issue. I think it's pretty normal to see OOM in contact prediction: the TAPE contact prediction head (PyTorch version) is indeed memory-costly, especially in these steps:

# inputs: [batch, seq_len, feat_dim]
prod = inputs[:, :, None, :] * inputs[:, None, :, :]  # [batch, seq_len, seq_len, feat_dim]
diff = inputs[:, :, None, :] - inputs[:, None, :, :]  # [batch, seq_len, seq_len, feat_dim]
pairwise_features = torch.cat((prod, diff), -1)       # [batch, seq_len, seq_len, 2 * feat_dim]

Say you have a protein with 2000 amino acids: by the third line you'll have two tensors (assuming batch_size = 1) of shape [1, 2000, 2000, feat_dim], and you're concatenating them into a tensor of shape [1, 2000, 2000, 2*feat_dim]. Of course this will easily take up more than 40 GB of GPU memory:

>>> import sys, torch
>>> x = torch.ones(1, 2000, 2000, 1280).type(torch.float)
>>> sys.getsizeof(x.storage())
20480000048
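
(That matches a back-of-the-envelope calculation: 1 × 2000 × 2000 × 1280 float32 elements × 4 bytes ≈ 20.5 GB per tensor, so prod, diff, and the concatenated output together need roughly 80 GB.)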

I think you either need to find a way to batch the contact map more efficiently (e.g., building it in chunks, as sketched below), limit the protein length in each pass, or reduce feat_dim first with something like SVD.
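
To illustrate the chunking idea, here is a minimal sketch that builds the pairwise tensor one row-chunk at a time so the prod/diff temporaries stay small. pairwise_features_chunked and chunk are hypothetical names of my own, not part of TAPE:

import torch

def pairwise_features_chunked(inputs, chunk=256):
    # inputs: [batch, seq_len, feat_dim]
    # returns: [batch, seq_len, seq_len, 2 * feat_dim]
    b, n, d = inputs.shape
    out = inputs.new_empty(b, n, n, 2 * d)
    for start in range(0, n, chunk):
        rows = inputs[:, start:start + chunk, None, :]    # [b, c, 1, d]
        cols = inputs[:, None, :, :]                      # [b, 1, n, d]
        out[:, start:start + chunk, :, :d] = rows * cols  # prod, broadcast to [b, c, n, d]
        out[:, start:start + chunk, :, d:] = rows - cols  # diff
    return out

Note this only bounds the temporaries: the output itself is still O(seq_len^2 * feat_dim), so for very long sequences you would still need to crop the sequence or move results off the GPU.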

If you are very familiar with PyTorch, I would also think about dynamic batching of tensors (see the sketch below), though I don't know exactly how TAPE's pipeline would accommodate it.
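
As a rough illustration of what dynamic batching could look like: bucket sequences by length so that batch_size × max_len² stays under a fixed budget, which bounds the size of the padded pairwise tensor per batch. length_bucketed_batches and max_square_tokens are hypothetical names for this sketch, not TAPE or PyTorch API:

from torch.utils.data import DataLoader

def length_bucketed_batches(lengths, max_square_tokens=4_000_000):
    # lengths[i] is the sequence length of dataset item i.
    # Yields lists of indices; a padded batch costs about
    # batch_size * max_len**2 pairwise positions, kept under the budget.
    order = sorted(range(len(lengths)), key=lengths.__getitem__)
    batch, max_len = [], 0
    for i in order:
        new_max = max(max_len, lengths[i])
        if batch and (len(batch) + 1) * new_max ** 2 > max_square_tokens:
            yield batch
            batch, new_max = [], lengths[i]
        batch.append(i)
        max_len = new_max
    if batch:
        yield batch

# Usage (assuming a `dataset` and its per-item `lengths` exist):
# loader = DataLoader(dataset, batch_sampler=list(length_bucketed_batches(lengths)))

Sorting by length also keeps padding waste low, since sequences in a batch have similar lengths; an oversized sequence simply ends up in a batch by itself.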
