not working without parameter selection #41

Closed
kmoosi opened this issue Jun 16, 2021 · 11 comments

@kmoosi

kmoosi commented Jun 16, 2021

Hi,

I don't have paired-end reads, but as described in previous issues, I copied my input FASTQ, removed the UMIs (16 N long), and used the copy as the second input file, as shown in the screenshot. I got an error message:

no error or minimizer parameters passed. Selecting parameters based on barcode and inferred read length
Inferred read length 55 from sample of 10000 reads

Then I tried the example command, adjusting only my input file names and the barcode length, and my output (cluster) file was generated. But I'm not sure whether this is the right parameter selection for my sequences. They are very short: only 55 bases, including the 16-base UMI.

I then tried to use the generated cluster file with calib_cons. There was no error message, but the output files were empty. So my question is: does the example command below refer to the same input files as the first calib command for clustering, or to another FASTQ file, different from that input?

To run Calib error correction, run:

calib_cons -c <cluster_file> -q <space_separated_FASTQ_list> -o <space_separated_output_prefix_list>

For example:

calib_cons -c R.cluster -q R1.fastq R2.fastq -o R1. R2.

Thanks in advance, and sorry if these are dumb questions for experts; I'm new to this topic (:

@baraaorabi
Collaborator

I don't think your screenshot is attached. Can you add it again?

Let me make sure I've got this right: you have R1, which has no barcode and is 39 bp long, and R2, which has the barcode (16 bp) followed by a sequence identical to R1?

@kmoosi
Author

kmoosi commented Jun 17, 2021

[Screenshot: calib error1]

It's the other way round. I have R1 with 16 bp barcodes followed by 39 bp of sequence, and R2 with the sequence (39 bp) only. And yes, the only difference is that the barcodes are missing from the sequences in R2.
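
For readers following along, the layout described here (R1 = 16 bp UMI + 39 bp sequence, R2 = the same reads without the UMI) could be produced with a short script. This is only a sketch under stated assumptions: plain 4-line FASTQ records with unwrapped sequence lines, and the function names and file paths are hypothetical, not part of Calib.

```python
from typing import Iterable, Iterator

UMI_LENGTH = 16  # barcode length reported in this thread


def strip_umi(lines: Iterable[str], umi_length: int = UMI_LENGTH) -> Iterator[str]:
    """Yield FASTQ lines with the first `umi_length` bases and qualities removed.

    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        # Line 2 (sequence) and line 4 (quality) of each record are trimmed;
        # the header and '+' separator lines pass through unchanged.
        if index % 4 in (1, 3):
            line = line[umi_length:]
        yield line + "\n"


def write_umi_free_copy(src_path: str, dst_path: str,
                        umi_length: int = UMI_LENGTH) -> None:
    """Write a UMI-free copy of `src_path` to `dst_path` (hypothetical paths)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_umi(src, umi_length))
```

Trimming the quality line together with the sequence line keeps both the same length, which FASTQ parsers generally expect.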

@baraaorabi
Collaborator

I see. This is failing because Calib's default parameter sets have been tested for read lengths between 60 and 250 bp, so you will have to select the parameters yourself. I suggest starting with -e 1 -k 4 -m 5 -t 3. You could also consider increasing -e to 2.
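
A quick pre-check along these lines can tell you whether your reads fall outside the tested range before running Calib. The sketch below is not Calib's own logic: averaging the first 10000 reads, treating the 60 to 250 bp bounds as inclusive, and the function names are all assumptions made for illustration. The flag values are the starting point suggested above.

```python
from itertools import islice
from typing import Iterable, List

TESTED_MIN_BP = 60   # lower bound of the tested range quoted above
TESTED_MAX_BP = 250  # upper bound of the tested range quoted above
STARTING_FLAGS = ["-e", "1", "-k", "4", "-m", "5", "-t", "3"]  # suggested above


def infer_read_length(lines: Iterable[str], sample_size: int = 10000) -> int:
    """Return the rounded mean sequence length of the first `sample_size` reads.

    Assumes 4-line FASTQ records. Averaging is this sketch's choice and may
    differ from how Calib infers the length.
    """
    # Sequence lines are every fourth line, starting at index 1.
    seq_lines = (line.strip() for i, line in enumerate(lines) if i % 4 == 1)
    lengths = [len(seq) for seq in islice(seq_lines, sample_size)]
    if not lengths:
        raise ValueError("no reads found")
    return round(sum(lengths) / len(lengths))


def suggested_flags(read_length: int) -> List[str]:
    """Return the starting flags when the length is outside the tested range."""
    if TESTED_MIN_BP <= read_length <= TESTED_MAX_BP:
        return []  # defaults are expected to apply
    return list(STARTING_FLAGS)
```

For the 55 bp reads in this issue, `suggested_flags(55)` returns the manual starting flags, while a 100 bp read length returns an empty list.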

@kmoosi
Author

kmoosi commented Jun 18, 2021

[Screenshot: calib error2]

Thank you for the quick answer. It's working now: I got my cluster file and tried to run the calib_cons command (screenshot), but all I got were empty files. Did I choose the wrong input FASTQ, since I used the same files as in the calib command?
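
For reference, the calib_cons usage quoted earlier in this thread can be assembled programmatically. The sketch below uses only the flags shown in that usage line (-c, -q, -o). The helper name is hypothetical, and the check that there is one output prefix per FASTQ file is an assumption inferred from the example (`-q R1.fastq R2.fastq -o R1. R2.`), not a documented rule.

```python
from typing import List, Sequence


def build_calib_cons_command(
    cluster_file: str,
    fastq_files: Sequence[str],
    output_prefixes: Sequence[str],
) -> List[str]:
    """Build the calib_cons argument list from the usage quoted in this thread."""
    # Assumption: each FASTQ file is paired with exactly one output prefix,
    # as in the quoted example.
    if len(fastq_files) != len(output_prefixes):
        raise ValueError("expected one output prefix per FASTQ file")
    return [
        "calib_cons",
        "-c", cluster_file,
        "-q", *fastq_files,
        "-o", *output_prefixes,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Later comments in this thread suggest the FASTQ files given to `-q` are the same ones used for clustering.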

@baraaorabi
Collaborator

baraaorabi commented Jun 18, 2021 via email

@kmoosi
Author

kmoosi commented Jun 18, 2021

[Screenshot: calib3]

Is it the first number in the first column? Then it will be 94.
But I should say that for this first test I used a file containing only about the first 101 sequences of my whole NGS data. So maybe the input size and/or variety is too small?

@baraaorabi
Collaborator

baraaorabi commented Jun 18, 2021 via email

@kmoosi
Author

kmoosi commented Jun 20, 2021

OK, I've tried it with my complete data set and now it seems to be working. Thank you very much!
I have just one more question for my understanding. I get a FASTQ and an MSA file as output. The FASTQ is a list of my consensus reads, right? But what is the meaning of the first line of an entry (or the first two lines of the first entry), especially the number after the @?
Does the ID line, after this number, list the entries that belong to the consensus?

[Screenshot: calib4]

And the MSA file just lists the consensus generation in detail, with all the aligned reads that belong to it, right?

@baraaorabi
Collaborator

baraaorabi commented Jun 21, 2021 via email

@kmoosi
Author

kmoosi commented Jul 1, 2021

OK, thanks for explaining. And as far as I understand, the second file (R2, consisting of the reads without UMIs) is processed in the same manner as R1, even though it contains no UMIs?

@baraaorabi
Collaborator

Yep, exactly (sorry for late reply, was on vacation)
