Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vicaller.pl 1164 line is strange. #2

Closed
byeonggill opened this issue Apr 11, 2019 · 9 comments
Closed

Vicaller.pl 1164 line is strange. #2

byeonggill opened this issue Apr 11, 2019 · 9 comments

Comments

@byeonggill
Copy link

byeonggill commented Apr 11, 2019

Hi, I have three issue.

  1. I find a strange line at Vicaller.pl
    I don't know perl well but line 1164 seems to something wrong.
    image
    system ("perl ${directory}Scripts/Extract_fasta.pl $GI $virus_genome >${directory}Database/GI/${GI}.fa");

  2. and file name is incorrect: "Scripts/Extract_read_informationpl "--> "Scripts/Extract_read_information.pl"
    Are the two changes correct?

  3. I have trouble running validate function.
    I get the following warning:
    Warning: [blastn] Query is Empty!
    Warning: [blastn] Query is Empty!
    Warning: [blastn] Query is Empty!

but my config and blastn probably have no problem.
so, I take a look at Vicaller.pl
blastn run using query "${input_sampleID2}_aligned_both.fas"
That means "${input_sampleID2}_aligned_both.fas" is not created.

${input_sampleID2}_aligned_both.fas is made this function.

validate function

sub v_obtain_seq {
system ("perl ${directory}Scripts/Extract_specific_loci_final_reads.pl ${input_sampleID}_f2 ${input_sampleID2}.virus_f2 >${input_sampleID2}.information");
my $cmd=q(awk '{print$8}');
system ("$cmd ${input_sampleID2}.information |sort |uniq >${input_sampleID2}.id");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_1.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_1.fuq");
system ("perl ${directory}Scripts/Extract_fastq.pl -f ${input_sampleID}_2.1fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_2.fuq");
system ("perl ${directory}Scripts/Extract_fuq_split.pl -f ${input_sampleID}_1sf.fuq -b ${input_sampleID2}.id -o ${input_sampleID2}_aligned_sf.fuq");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta.pl ${input_sampleID2}");
system ("perl ${directory}Scripts/Convert_fastq_to_fasta_for_split_read.pl $input_sampleID2");
system ("cat ${input_sampleID2}_aligned.fas ${input_sampleID2}_aligned_sf.fas >${input_sampleID2}_aligned_both.fas");
}

In conclusion, i wonder ${input_sampleID2}.virus_f2 is a previously generated file.
Or is there something else wrong?

  1. Among the parameter values ​​in the validation function, Is the virus name only an abbreviation?
    Should not the full name in the output file 'Candidate_virus' column is possible?
@xunchen85
Copy link
Owner

Thanks for running the VIcaller_v1.1.

  1. Yes, that path is the one i use on our server, i have revised it correspondingly.

  2. I have delete the Extract_read_information.pl script which is only used when i debug the tool. And the two changes you made are correct.

  3. There is a file is not created, I have revised the VIcaller.pl script. I also corrected the script of "Extract_fuq_split.pl" which have a bug about the "\n".

With those changes, you should be able to run the validation function now.

Please let me know if you found any other bugs. Thanks.

@byeonggill
Copy link
Author

I appreciate your help.

I download revised scripts and running validate function and calculate function.
but, I got the same error.

My data is targeted sequencing data.
and I attach output file.
fcd_Genareal_1.xlsx

file lists:
image

validate cmd: nohup perl /data/program/VIcaller_v1.1/VIcaller.pl validate -i 11FCD -c /data/program/VIcaller_v1.1/VIcaller.config -t 4 -S 11FCD_14_75126278_75126928_abelson_murine_leukemia_virus_61487 -G 61487 -V murine_leukemia_virus > validate.out
validate.out:
image

calculate cmd: nohup perl /data/program/VIcaller_v1.1/VIcaller.pl calculate -i 11FCD -c /data/program/VIcaller_v1.1/VIcaller.config -t 4 -F .fastq.gz -C 14 -P 75126936 -B 2 -N 147 > cal.out
cal.out:
image

@xunchen85
Copy link
Owner

A small correction was made for the VIcaller.pl script, can you try the latest one.

May I ask if you enriched specific viruses for target sequencing? What is the read length?

@byeonggill
Copy link
Author

I'm sorry that the answer was delayed.
The validate function seems to work normally.
but, calculate function have error.
In my samtools version,
system ("${samtools_d}samtools view ${input_sampleID}s_h.bam chr${Chr}:${position1}-${position2} >${input_sampleID}${Chr}_${Position}_h.sort.sam")
chr1 not 1

Anyway, why does it change to 1 even if I put another number in the c parameter?

cmd: nohup perl /data/program/VIcaller_v1.1/VIcaller.pl calculate -i PYH024 -t 4 -F _s_h.bam -C 4 -I -P 112387642 -B 2 -N 96 > cal.out

this is my revision.
image

this is results.
image

@xunchen85
Copy link
Owner

If the input file is FASTQ format, you can use "" for -F parameter. Or you can change to "-F _h.bam" if you have the PYH024_h.bam file indexed.

For the chromosome ID issue, because the "chr" for the chromosome IDs are deleted in the output file in the current VIcaller version, thus you can add the "chr" in the script to extract corresponding reads from the BAM file for now.

I will update the VIcaller.pl in the next version to support any chromosome ID format soon.

@byeonggill
Copy link
Author

My previous question is even if i take parameter -C 4, VIcaller receive chr1.

Sorry but i have another question.
I want to modify the virus library.
I want viral genome library composed only hepatitis. so, I modified viral.fa, viral.taxonomy and viral.virus_list files.
but, virus names in output files were recorded of unknown.
virus GI seem to normally.

Please let me know how to modify viral genome library in detail.

@xunchen85
Copy link
Owner

Sorry for my late reply.

Really appreciate for your feedback, i have correct the bug in the VIcaller main script.
Specifically, you can either downloaded the latest VIcaller script, or directly revised the line 8 defining the chromosome ID as follow:
" "C|Chr=s" => $Chr,"

For your second question:
Can you show me an example of your modified three files? If you are able to detect the virus GI in the FASTA file, but cannot find the corresponding virus name, it means something wrong with your modified virus_list, or viral.taxonomy file; Or it may means that you did not specify the correct paths for those two files in the config file.

virus_list file: only have two column, 1) GI #, and 2) virus_name without space;
viral.taxonomy: 11 columns separated by "$": column 1: GI; column 2: GB, column 3: length; column 6: original virus name; column 9: original virus name; column 10: modified_virus_name (no space)

As an example of HBV, you can modify the corresponding columns, and you should be able to find all the info from the NCBI NT database using the GI:

110225259$AB246317.1$3185$Hepatitis B virus; Viruses; Retro-transcribing viruses; Hepadnaviridae; Orthohepadnavirus.$Hepatitis B virus DNA, complete genome, isolate: TA112.$Hepatitis B virus$TA112$0$Hepatitis B virus$hepatitis_b_virus$0

@byeonggill
Copy link
Author

thank you,
I'm running it just using non-modified virome reference.

but, I have another question,
I have some problem when using parameter -r in detect function such as Repeatmasker is so confused and i can't download Rebase database and TRF output position is strange.

so, I did not use parameter -r. but, I'm worried that this might be big difference.
Can you give me advice on removing repeat region?

@xunchen85
Copy link
Owner

For the repeatmasker, you need to apply for Rebase database and then use it with repeatmasker follow their online manuscript.

You can show some example of TRF output, I can check it out for you.

I would suggest you to disable Repeatmasker function in the script, and then only use TRF and DUST to remove repeat regions. I would be happy to help you with it if you want.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants