Using FUBAR cache to resume a stalled run? #1569

Closed
hayleyjaywilson opened this issue Feb 13, 2023 · 9 comments

Comments

@hayleyjaywilson

Hello, I am new to using FUBAR to detect selection in my alignment. I am working on a Slurm system where job runtime is limited to 12 hours. My FUBAR job runs for the full 12 hours and then exits without completing. Am I right in thinking that the .cache file stores progress, so that a run can be resumed from where it stopped rather than restarted? I have tried resubmitting the job, but it seems to start from scratch each time. My command is:

hyphy fubar ENV="USE_MEMORY_SAVING_DATA_STRUCTURES=1e8;" --alignment FINAL_for_fubar.fasta --tree FINAL_for_dnds_only.final_tree.tree

Am I missing something? Apologies if this is obvious.
Thanks

@spond
Member

spond commented Feb 13, 2023

Dear @hayleyjaywilson,

How big is your alignment? FUBAR will save "milestone" checkpoints, so it doesn't have to redo them, but it does not store partial step information. In other words, if a step (e.g. fitting some model stage) takes > 12 hours, then it won't be cached.
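
To illustrate the difference between milestone caching and partial-step caching, here is a minimal Python sketch. It is not HyPhy's implementation: the JSON cache layout, the function `run_with_milestones`, and the stage names `nucleotide_fit` and `grid` are all hypothetical and chosen only for illustration. The point it shows is that a stage is recorded only after it finishes, so a stage that is killed partway through has to start over on the next run.

```python
import json
import os
import tempfile


def run_with_milestones(stages, cache_path):
    """Run (name, function) stages in order, caching each finished stage.

    Hypothetical sketch: the JSON layout and stage names are assumptions,
    not HyPhy's real cache format.
    """
    # Load results of stages that completed in an earlier run, if any.
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path) as handle:
            cache = json.load(handle)

    executed = []  # stages that actually had to run this time
    for name, stage in stages:
        if name in cache:
            continue  # milestone already reached; reuse the stored result
        # If the job is killed inside stage(), nothing is written for it,
        # so the next run starts this stage again from its beginning.
        cache[name] = stage(cache)
        executed.append(name)
        # Write to a temporary file and rename it, so that a kill during
        # the write cannot corrupt the existing cache.
        tmp_path = cache_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(cache, handle)
        os.replace(tmp_path, cache_path)
    return cache, executed


def demo():
    """Simulate a run killed in its second stage, then a resubmission."""
    def killed(_cache):
        raise TimeoutError("simulated scheduler kill")

    cache_path = os.path.join(tempfile.mkdtemp(), "demo.cache.json")

    # First submission: stage one finishes, stage two is interrupted.
    first = [("nucleotide_fit", lambda c: 1.0), ("grid", killed)]
    try:
        run_with_milestones(first, cache_path)
    except TimeoutError:
        pass

    # Resubmission: stage one is skipped, stage two restarts from scratch.
    second = [
        ("nucleotide_fit", lambda c: 1.0),
        ("grid", lambda c: c["nucleotide_fit"] + 1.0),
    ]
    return run_with_milestones(second, cache_path)
```

In this sketch, calling `demo()` reruns only the interrupted stage on resubmission. If a single stage never fits inside the time limit, no new milestone is ever written and every resubmission repeats the same work, which matches the behavior described in the question.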

I am always interested in testing interesting use cases, so if you don't mind sharing the input file with me, I'll be happy to run and report on what is going on.

Best,
Sergei

@hayleyjaywilson
Author

hayleyjaywilson commented Feb 14, 2023 via email

@spond
Member

spond commented Feb 14, 2023

Dear @hayleyjaywilson,

Yeah, that might do it. I am curious to benchmark this kind of alignment because I don't routinely come across something this long. As this is likely a big file, please use the following dropbox link to send it to me (thanks!). It'll stay open for a few days.

https://www.dropbox.com/request/8tvfcvj8R6yZLqbLN7Pq

Best,
Sergei

@spond
Member

spond commented Feb 15, 2023

Dear @hayleyjaywilson,

Got your file, working through it. I don't think I've ever had such a large and LONG alignment to run HyPhy on, so it's exposing some interesting behavior. Thanks for sharing -- I'll post the results here when I have them, and I will probably also spend some time optimizing the code for the next release so such "large and long" alignments are processed more efficiently.

Stay tuned.

Best,
Sergei

@hayleyjaywilson
Author

hayleyjaywilson commented Feb 15, 2023 via email

@spond
Member

spond commented Feb 15, 2023

Dear @hayleyjaywilson,

In 999/1000 cases HyPhy is used for single-gene alignments, so we are talking on the order of 1000s of bp. Occasionally it has been used on smaller LONG alignments (~10s of sequences and >>1000s of bp). I think this is the first case I know of where there are >1000 sequences and >1,000,000 bp. As a software developer, I really appreciate the chance to stress test things and find ways to improve HyPhy performance.

Best,
Sergei

@github-actions

Stale issue message

@hayleyjaywilson
Author

hayleyjaywilson commented Apr 24, 2023 via email

@spond
Member

spond commented Apr 24, 2023

Dear @hayleyjaywilson,

GitHub automation sometimes closes issues due to inactivity. I performed some diagnostics and realized that the current implementation of HyPhy does not scale for such long alignments. I started doing some optimizations to improve the performance but didn't quite finish.

Let me circle back to it. I have an idea for an algorithmic improvement for such large datasets (it's based on https://pubmed.ncbi.nlm.nih.gov/34734192/), which allows much faster approximate estimation on such long alignments.

Best,
Sergei

@spond spond reopened this Apr 24, 2023
@github-actions github-actions bot closed this as completed May 2, 2023