Example fails invoking mafft #12

Open
rcedgar opened this issue Jun 4, 2021 · 10 comments

@rcedgar

rcedgar commented Jun 4, 2021

Cloned the git repo today onto a clean Ubuntu install (AWS c5a.4xlarge instance with Ubuntu 20.04).
Installed dendropy dependency.

cd example
python3 ../magus.py -d outputs -i unaligned_sequences.txt -o magus_result.txt

# ...some output deleted...

subprocess.CalledProcessError: 
Command '/home/ubuntu/magus/MAGUS-master/tools/mafft/mafft --localpair --maxiterate 1000 --ep 0.123 --quiet --thread 16 --anysymbol /home/ubuntu/magus/MAGUS-master/example/outputs/decomposition/initial_tree/skeleton_sequences.txt > /home/ubuntu/magus/MAGUS-master/example/outputs/decomposition/initial_tree/temp_initial_align.txt' 
returned non-zero exit status 126.
@vlasmirnov
Owner

Thanks a lot for writing.
Regarding your issue, there are two possibilities that come to mind:

  1. Make sure MAFFT (and the other tools that are packaged with MAGUS) have execute permission set
  2. You might need to replace the packaged MAFFT executable with one built for your system (https://mafft.cbrc.jp/alignment/software/)

Please let me know if any of this helps. Also, there might be more information in the error log.
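
For anyone checking the first possibility: exit status 126 is the conventional shell code for "command found but could not be executed", which fits both a missing execute bit and a binary built for a different system. Below is a minimal sketch of a pre-flight check, not part of MAGUS. The function name `diagnose_executable` is hypothetical, and the path in the comment is taken from the error message above.

```python
import os


def diagnose_executable(path):
    """Classify why invoking `path` might fail before the program even starts.

    Hypothetical helper for illustration; it is not part of MAGUS.
    """
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        # The shell would report this case as exit status 126.
        return "not executable"
    # Note: "ok" only means the execute bit is usable. A binary built for
    # another platform can pass this check and still fail to run.
    return "ok"


# Example call (path as it appears in the error message, relative to the repo):
# print(diagnose_executable("tools/mafft/mafft"))
```

A result of "ok" followed by the same error would point toward the second possibility, an executable that does not match the system.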

@rcedgar
Author

rcedgar commented Jun 4, 2021

Vlad -- Thanks for the quick reply.

The install instructions do not mention setting permissions.

The execute bit was not set for the main mafft script, but setting it did not fix the problem: I get the same error after the execute bit is set.

FYI, this is for comparative validation against other MSA methods and I have limited patience for trouble-shooting buggy code / buggy install instructions here. If you can provide complete instructions for setting up MAGUS on a clean Ubuntu 20.04 I will be glad to include MAGUS in the comparison. A simple way for you to fix the install instructions is to install on a clean Ubuntu 20.04 on an AWS t2.micro instance; this is free tier, so it will not cost anything.

@vlasmirnov
Owner

Sounds good, I'll take a look at what's going on with AWS when I get the chance. I apologize for the inconvenience. My guess is that this MAFFT distribution was for debian, although it seems to work on my home Ubuntu.

In the meantime, if you were planning to include MAFFT in your comparison, the easiest thing to do would be to overwrite MAGUS's packaged MAFFT with your working MAFFT copy (the mafft script goes into tools/mafft/mafftdir/bin, and the binaries go into tools/mafft/mafftdir/libexec). Or, in configuration.py, change the "mafftPath" line to wherever the mafft script is installed.
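
As a rough illustration of the second option, the sketch below looks up a MAFFT that is already installed on the system so its path can be used for the "mafftPath" line. It assumes a working `mafft` is on the PATH; the function name `locate_mafft` is hypothetical, and the exact layout of configuration.py is not shown in this thread, so the final assignment is left as a comment.

```python
import shutil


def locate_mafft(names=("mafft",)):
    """Return the full path of the first matching executable on PATH, or None.

    Hypothetical helper for illustration; it is not part of MAGUS.
    """
    for name in names:
        found = shutil.which(name)
        if found is not None:
            return found
    return None


# Assumed usage: print the path, then paste it into the "mafftPath" line
# of configuration.py by hand.
# print(locate_mafft())
```

`shutil.which` only reports files that are executable by the current user, so a `None` result means no usable MAFFT was found on the PATH.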

Alternatively, if you were planning to include PASTA in your comparison, MAGUS copies PASTA's directory structure for MAFFT, so you can just copy PASTA's MAFFT installation directly over.

@rcedgar
Author

rcedgar commented Jun 4, 2021

I do plan to include stand-alone MAFFT, but I don't see the relevance -- I would install it and run it on a totally separate machine (i.e. separate AWS instance) without MAGUS. What is PASTA? Maybe I should ask which Warnow lab method(s) I should be testing for large input datasets?

@vlasmirnov
Owner

PASTA (https://github.com/smirarab/pasta) is an alignment method for large datasets, which grew out of a previous method called SATe II. MAGUS grew out of PASTA in turn. In a sense, PASTA is "SATe III" and MAGUS is "SATe IV".
For very large datasets, another method to consider is UPP (https://github.com/smirarab/sepp). It tends to be faster than PASTA/MAGUS, but accuracy tends to suffer.

The best choice of method would depend on how large your datasets are. MAGUS and PASTA both use MAFFT -linsi internally, so if your dataset is a few hundred sequences, then standalone MAFFT -linsi should give about the same result. For larger and more heterogeneous datasets, the other methods tend to give better results.

@rcedgar
Author

rcedgar commented Jun 4, 2021

Great feedback, thanks. My main interest is in aligning 140k RdRP sequences for novel RNA virus species recently discovered by mining the SRA (https://www.biorxiv.org/content/10.1101/2020.08.07.241729v2). That has got me interested in MSA methods again, and I'm working on a new algorithm and a new benchmark. The RdRPs are an ideal real-world case for applying and validating methods like MAGUS because there is an independent check on the alignments by identifying conserved motifs (https://github.com/rcedgar/palmscan).

@vlasmirnov
Owner

I see, that makes sense. I'd be very curious to see how well MAGUS performs on biological datasets different from those that we used to test it. If you'd like to obtain a MAGUS alignment but are having issues getting it to work in your environment, I'd be happy to try aligning your dataset on our campus cluster.

@MinhyukPark

Bumping an old thread, but I encountered the same issue while running MAGUS in a VM. Instead of setting the individual scripts as executable, running chmod -R +x ./tools/mafft/ seemed to do the trick for me.
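
For setups where this has to happen from a Python install script rather than a shell, here is a rough sketch of the same idea. It is an approximation and not part of MAGUS: unlike chmod -R +x it only touches regular files (not directories), it ignores the umask, and the name `add_execute_bits` is hypothetical.

```python
import os
import stat

# Execute permission for user, group, and others.
EXEC_BITS = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def add_execute_bits(root):
    """Add execute bits to every file under `root`; return the paths changed.

    Hypothetical helper for illustration; it is not part of MAGUS.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = stat.S_IMODE(os.stat(path).st_mode)
            if (mode & EXEC_BITS) != EXEC_BITS:
                os.chmod(path, mode | EXEC_BITS)
                changed.append(path)
    return changed


# Assumed usage, run from the MAGUS repository root:
# add_execute_bits("./tools/mafft/")
```

Returning the list of changed paths makes it easy to see which of the packaged files were missing the bit.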

@rmukaila

rmukaila commented Apr 13, 2022

I had the same issue recently, but it turns out MAGUS has issues with special characters in sequence headers. A friend reported that preprocessing the sequence headers to look as simple as those in the example sequences file in the MAGUS repo fixed it. That is, assuming you are able to run the example sequences without problems.
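
A minimal sketch of that kind of preprocessing is below. Which characters actually cause trouble is not stated here, so the choice to keep only letters, digits, and underscores is an assumption, and the name `simplify_fasta_headers` is hypothetical. The function returns a mapping so the original headers can be restored after alignment.

```python
import re


def simplify_fasta_headers(lines):
    """Rewrite FASTA headers to plain identifiers.

    Returns (new_lines, mapping), where mapping goes from each simplified
    header to the original one. Hypothetical helper, not part of MAGUS.
    """
    mapping = {}
    out = []
    for line in lines:
        if line.startswith(">"):
            original = line[1:].strip()
            # Assumption: anything other than letters, digits, and
            # underscores is treated as a "special" character.
            safe = re.sub(r"[^A-Za-z0-9_]", "_", original)
            candidate = safe
            n = 1
            # Keep simplified headers unique by appending a counter.
            while candidate in mapping:
                n += 1
                candidate = f"{safe}_{n}"
            mapping[candidate] = original
            out.append(">" + candidate)
        else:
            out.append(line.rstrip("\n"))
    return out, mapping
```

Sequence lines pass through unchanged apart from the trailing newline being stripped, so the output can be written back with one line per entry.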

@lrauschning
Collaborator

Ran into a similar issue while writing an nf-core module for MAGUS.
Just chatted with @mashehu at the nf-core hackathon and we were able to figure out that the issue is caused by the busybox version of chown (which runs on the AWS machines) not having the --from parameter.
Writing this here in case anyone else comes across it in the future; it took quite a while to figure out.
