No result after run commands #53

Open
pcampiteli opened this issue May 27, 2022 · 9 comments

Comments

@pcampiteli

Greetings, I'm trying to use MCScanX_h. I've prepared the necessary files following the manual, yet neither my data nor the example data works.

My .gff file for the 5 species is formatted following the CH# gene start end layout:
Ta1 TA20_000001 40390 41754
...

My .homology file was obtained by running OrthoFinder and extracting the pairwise data for each species, without the optional third column, as follows:
TH179_000002 TH3844_011373
...

Following other GitHub issues, I tried using tab-delimited files and moving them into the program folder, but that didn't resolve it. The example data also produces no output.
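The layouts described above (four tab-separated columns in the .gff; two, or optionally three, in the .homology) can be checked before running MCScanX_h. This is a minimal sketch with awk; check_gff and check_homology are hypothetical helper names, not part of MCScanX, and the sketch assumes chromosome and gene names never contain spaces.

```shell
#!/bin/bash
# Hypothetical pre-flight checks (not part of MCScanX) for the layouts above.
# Each prints the offending line numbers and returns non-zero if any line is bad.
# Assumes chromosome and gene names never contain spaces.

# .gff layout: chr<TAB>gene<TAB>start<TAB>end, with integer start/end.
check_gff() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    / / { printf "line %d: contains a space\n", NR; bad = 1; next }
    NF != 4 || $3 !~ /^[0-9]+$/ || $4 !~ /^[0-9]+$/ {
      printf "line %d: expected 4 tab-separated fields: chr gene start end\n", NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}

# .homology layout: gene1<TAB>gene2, with an optional third column.
check_homology() {
  awk -F '\t' '
    BEGIN { bad = 0 }
    / / { printf "line %d: contains a space\n", NR; bad = 1; next }
    NF < 2 || NF > 3 {
      printf "line %d: expected 2 or 3 tab-separated fields\n", NR
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, check_gff aa_bb.gff && check_homology aa_bb.homology prints nothing and succeeds only if every line matches the expected layout (the file names are placeholders).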

"using example data"
/home/h.paulocampiteli/MCScanX-master/MCScanX /home/h.paulocampiteli/MCScanX-master/data/
Reading BLAST file and pre-processing
Generating BLAST list
0 matches imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/data/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Done! [0.000 seconds elapsed]

"using my own data in another folder"
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /storage4/h.paulocampiteli/synteny/mscscan_analysis/
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /storage4/h.paulocampiteli/synteny/mscscan_analysis/.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage

"using my data in the MCScanX folder"
/home/h.paulocampiteli/MCScanX-master/MCScanX_h /home/h.paulocampiteli/MCScanX-master/MCScanX
Reading homologs and pre-processing
Generating homolog list
0 homologous pairs imported (0 discarded)
0 pairwise comparisons
0 alignments generated
Pairwise collinear blocks written to /home/h.paulocampiteli/MCScanX-master/MCScanX.collinearity [0.001 seconds elapsed]
Writing multiple syntenic blocks to HTML files
Print statistics:
Species # of collinear homolog pairs # of homolog pairs Percentage
Done! [0.001 seconds elapsed]

I could not find any other response regarding this problem. Does anyone know what sorcery I need to perform to get the program working?

Thanks in advance
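One likely cause is visible in the logs above: MCScanX and MCScanX_h treat their argument as a file prefix, not a directory. Passing .../data/ produced .../data/.collinearity, and passing .../MCScanX produced MCScanX.collinearity, which suggests the programs looked for .gff/.blast or .gff/.homology files with exactly that prefix. The working run later in this thread (MCScanX_h aa_bb, next to aa_bb.gff and aa_bb.homology) follows the same convention. Below is a sketch of a pre-flight check; check_prefix is a hypothetical helper, not part of MCScanX.

```shell
#!/bin/bash
# Hypothetical pre-flight check (not part of MCScanX). The argument to
# MCScanX / MCScanX_h is a file prefix: "run/aa_bb" means run/aa_bb.gff plus
# run/aa_bb.blast (MCScanX) or run/aa_bb.homology (MCScanX_h).
check_prefix() {
  local prefix="$1" ext="$2" f   # ext: "blast" or "homology"
  case $prefix in
    */) echo "error: '$prefix' ends in /; pass a prefix such as dir/aa_bb" >&2
        return 1 ;;
  esac
  if [ -d "$prefix" ]; then
    echo "error: '$prefix' is a directory; pass a prefix such as dir/aa_bb" >&2
    return 1
  fi
  # Both input files must exist and be non-empty.
  for f in "$prefix.gff" "$prefix.$ext"; do
    if [ ! -s "$f" ]; then
      echo "error: missing or empty input file: $f" >&2
      return 1
    fi
  done
}
```

Usage, assuming MCScanX_h is on your PATH: check_prefix aa_bb homology && MCScanX_h aa_bb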

@Botantisty

Hey,
Did you ever come up with a solution to this issue? I am encountering the same problem with both the supplied test data and my data.
~Best

@thesnakeguy

Same for me, it's not working. I am using the right .gff input data (according to other users, since there is conflicting info in the man pages here) -> chr gene start stop. Has anyone got this software running?

@pcampiteli
Author

pcampiteli commented Jan 10, 2023 via email

@thesnakeguy

Many thanks for your reply! I just got it running: the chromosome names of both species needed correct formatting in the gff, and there were still some spaces instead of tabs. Now I've tried to visualize the results with SynVisio, but I get nothing, although my collinearity file is definitely not empty. Thanks for your suggestion and best wishes!
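The spaces-instead-of-tabs fix described above can be scripted. A minimal sketch, assuming no chromosome or gene name contains internal spaces (true for the examples in this thread); to_tabs is a hypothetical helper name. It also strips Windows carriage returns, which is an extra assumption beyond what this thread reports, since a trailing \r could similarly keep gene names from matching.

```shell
#!/bin/bash
# Hypothetical helper: rewrite whitespace-separated columns as tab-separated.
# Assumes no chromosome or gene name contains internal spaces.
to_tabs() {
  awk '
    BEGIN { OFS = "\t" }
    { sub(/\r$/, "") }      # drop a Windows carriage return, if any
    NF { $1 = $1; print }   # rebuild the line with tabs; skip blank lines
  ' "$@"
}
```

For example, to_tabs aa_bb.gff > aa_bb.tabs.gff writes a cleaned copy (placeholder file names); the original file is left untouched.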

@AnezkaKar

Hi, thanks for the tip. I have the same problem: this software works neither with the example data provided here in the "data" folder nor with my data. I'll try out the other tool then.

@kimlu1998

> Sorry guys, I could not fix the problem and gave up on MCScanX. But I'm using Synima (Synteny Imager), which performs the synteny analysis with three software options (OrthoFinder, OrthoMCL, RBH) and plots good-quality synteny graphs 😊
>
> On Tue., Jan 10, 2023, 7:56 AM, thesnakeguy @.> wrote:
> Same for me, it's not working. I am using the right .gff input data (according to other users since there is conflicting info in the man pages here) -> chr gene start stop. Anyone got this software running?
For Synima, can I use it to detect tandem duplicates? My main goal was to use this to identify tandem duplicates and possibly classify them by gene family. Thank you.


@cdizzel

cdizzel commented Jul 16, 2024

Hi all,
I was able to get this to work by doing the following:

Homology File

Protein FASTA files were cleaned to remove all information other than gene names, and special characters were removed.
">Species1||Cb2||7383158||7410177||CQ013704-RA||-1||CDS||3396420774||25012||frame0"
became
">CQ013704"

That was accomplished using the following code plus some manual tidy-up.
This will differ depending on your file structure.

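#!/bin/bash

input_file="Species1-prot.fasta"
output_file="Species1-prot-reformat.fasta"

# Split header lines on the literal "||" separator and keep only the
# 5th field (the gene/transcript ID); sequence lines pass through unchanged.
awk '
    BEGIN { FS = "\\|\\|" }
    /^>/  { print ">" $5; next }
          { print }
' "$input_file" > "$output_file"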
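The awk step above keeps the entire fifth field (e.g. CQ013704-RA), so the "-RA" in the example presumably went in the manual tidy-up. A minimal sketch that removes it automatically, assuming the suffix is always "-R" followed by capital letters as in the example; strip_isoform_suffix is a hypothetical helper name. The trimmed IDs presumably need to match the gene column of the .gff exactly.

```shell
#!/bin/bash
# Hypothetical helper: drop a trailing transcript suffix such as "-RA" from
# FASTA headers (">CQ013704-RA" -> ">CQ013704"). Assumes the suffix is always
# "-R" followed by capital letters; adjust the regex for other naming schemes.
strip_isoform_suffix() {
  awk '
    /^>/ { sub(/-R[A-Z]+$/, "") }   # edit header lines only
    { print }                       # print every line, edited or not
  ' "$@"
}
```

For example: strip_isoform_suffix Species1-prot-reformat.fasta > Species1-prot-clean.fasta (the output name is a placeholder).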

A homology search was performed using blastp
blastp -query species1.protein.fa -subject species2.protein.fa -outfmt 6 -evalue 1e-10 -max_hsps 5 -max_target_seqs 5 -out aa_bb.blast

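Convert the BLAST output (-outfmt 6) to a .homology file, keeping the query, the subject and the bit score (column 12), and writing tab-separated columns directly:
awk 'BEGIN { OFS = "\t" } { print $1, $2, $12 }' aa_bb.blast > aa_bb.homology

This gives a tab-delimited file that looks like: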

C0000001 CQ025429 689
C0000001 CQ055736 602
C0000002 CQ025428 71.2
C0000003 CQ025424 613
C0000003 CQ055734 575
C0000003 CQ052761 192
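Because MCScanX_h has to place both genes of every .homology pair using the .gff, a quick count of pairs whose two IDs both appear in the .gff gene column can reveal naming mismatches, which is what the header clean-up above is meant to prevent. A sketch, assuming both files are tab-separated; count_matched_pairs is a hypothetical helper, and the .gff must be passed first.

```shell
#!/bin/bash
# Hypothetical check: count .homology pairs whose two gene IDs both appear in
# column 2 (gene) of the .gff. Assumes both files are tab-separated and that
# the .gff is non-empty (it is read first to build the ID set).
count_matched_pairs() {
  awk -F '\t' '
    NR == FNR { genes[$2] = 1; next }            # file 1 (.gff): collect IDs
    { total++ }                                  # file 2 (.homology): count pairs
    ($1 in genes) && ($2 in genes) { found++ }   # ...and pairs fully present
    END { printf "%d of %d pairs have both genes in the .gff\n", found, total }
  ' "$1" "$2"
}
```

For example, count_matched_pairs aa_bb.gff aa_bb.homology; a result of "0 of N" points to IDs that differ between the two files.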

BED file

I converted my input gff files with agat_convert_sp_gff2bed.pl.

agat_convert_sp_gff2bed.pl --gff species1.gff3 -o aa.gff
agat_convert_sp_gff2bed.pl --gff species2.gff3 -o bb.gff
cat aa.gff bb.gff > aa_bb.gff

The columns were then rearranged into the correct order because, as others have suggested, the BED file (named with a .gff extension) must be formatted as: chr gene start stop

cf1 C0000001 9845 13412
cf1 C0000002 25196 35998
cf1 C0000003 61576 65469
cf1 C0000004 97774 99106
...
qa1 QA053298 6234676 6237979
qa1 QA053299 6297794 6299368
qa1 QA053300 6346001 6350418
qa1 QA053301 6350608 6357388
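The column rearrangement described above can be done with awk rather than by hand. In standard BED the first four columns are chrom, start, end and name, so this sketch assumes agat wrote the gene ID to column 4; bed_to_mcscanx_gff is a hypothetical helper. BED start positions are 0-based and are copied unchanged here. A uniform 1-bp offset does not change gene order, but add 1 to $2 if you want GFF-style 1-based starts.

```shell
#!/bin/bash
# Hypothetical helper: reorder BED columns (chrom, start, end, name, ...) into
# the chr<TAB>gene<TAB>start<TAB>end layout MCScanX_h expects.
# Assumes the gene ID is in BED column 4; start is copied as-is (0-based in BED).
bed_to_mcscanx_gff() {
  awk -F '\t' '
    BEGIN { OFS = "\t" }
    /^(track|browser|#)/ { next }      # skip BED header/comment lines
    NF >= 4 { print $1, $4, $2, $3 }   # chr, gene, start, end
  ' "$@"
}
```

For example, bed_to_mcscanx_gff aa.gff bb.gff > aa_bb.gff could stand in for the plain cat step, given agat output files named as in the commands above.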

I made sure the chr names were two letters + number and all lowercase, although I'm not sure that changed anything.
Both the .homology and .gff files were placed in their own directory (in this case "homology") with nothing else in it. I was able to run MCScanX_h from outside its data directory.

cd homology

From within the homology directory, MCScanX_h was called as follows:
/home/bioinformatics/tools/MCScanX/MCScanX_h aa_bb

Hope this helps someone.

@SalvadorGJ

Hi all,

First, thank you for all your advice. I would like to add a few more tips:

  • It's not necessary to follow the 'two letter + number' format to name your chromosomes
  • It's absolutely necessary to give your custom BED file the suffix '.gff' instead of '.bed'
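If several custom BED files need the '.gff' suffix mentioned above, the renaming can be scripted. A minimal sketch; rename_bed_to_gff is a hypothetical helper that only changes file names (the content must already be in the chr gene start end layout) and refuses to overwrite an existing .gff.

```shell
#!/bin/bash
# Hypothetical helper: rename every *.bed file in a directory (default: the
# current one) to *.gff. It only renames; the content must already be in the
# chr gene start end layout. Existing .gff files are never overwritten.
rename_bed_to_gff() {
  local f target
  for f in "${1:-.}"/*.bed; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    target="${f%.bed}.gff"
    if [ -e "$target" ]; then
      echo "skip: $target already exists" >&2
    else
      mv -- "$f" "$target"
    fi
  done
}
```

For example, rename_bed_to_gff homology renames homology/aa_bb.bed to homology/aa_bb.gff.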

Hope it helps.
