Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception in thread "main" java.lang.NegativeArraySizeException #7

Open
Al313 opened this issue Dec 11, 2023 · 4 comments
Open

Exception in thread "main" java.lang.NegativeArraySizeException #7

Al313 opened this issue Dec 11, 2023 · 4 comments

Comments

@Al313
Copy link

Al313 commented Dec 11, 2023

Describe the bug

When trying to create the "strandcount" tables from bam files I get the following error:

./my-data/step0-input/iii-13-13MT2EXPIIIVP10FullRepseq25052018_S1_L001-sorted.bam.
Exception in thread "main" java.lang.NegativeArraySizeException: -449
	at makereadcount.Read.<init>(Read.java:39)
	at makereadcount.MakeReadCount.readFromFile(MakeReadCount.java:447)
	at makereadcount.MakeReadCount.readData(MakeReadCount.java:427)
	at makereadcount.MakeReadCount.run(MakeReadCount.java:62)
	at makereadcount.MakeReadCount.main(MakeReadCount.java:44)

Here's the code that I run for this step:

for i in $(ls ./my-data/step0-input); do echo ${I} && java -cp ./lib/htsjdk-unspecified-SNAPSHOT.jar:./lib/picocli-4.1.2.jar:./lib/pal-1.5.1.jar:./lib/cache2k-all-1.0.2.Final.jar:./lib/commons-math3-3.6.1.jar:./jar/MakeReadCount.jar makereadcount.MakeReadCount ./my-data/step0-input/${i}; done

Additional context

  1. The same command works for example bam files provided within the package.
  2. My bam files were created using BWA aligner with default parameters.
@cristina86cristina
Copy link
Contributor

cristina86cristina commented Dec 11, 2023 via email

@Al313
Copy link
Author

Al313 commented Dec 12, 2023

Thank you very much for the swift response.

We have done paired-end short-read Illumina sequencing of HIV-1 genome as part of a long-term experimental evolution. Currently there are more than 50 longitudinal sequencing data for each of our 8 independent passaging lines.

The reference sequence that I used is the consensus sequence of the plasmid used to produce the ancestral virus (HIV-1 NL4-3) in fasta format. The command that I use for mapping is as follows (it's a snippet of my snakemake pipeline):

rule all:
    input:
        expand("/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}-sorted.bam", zip,experiment=EXP, line=LINES, sample=SAMPLES)



rule bwa_map:
    input:
        ref="/home/amovas/data/genome-evo-proj/data/reference/plasmid/hiv_plasmid_ref_genome.fasta", 
        read1="/home/amovas/data/genome-evo-proj/data/freezed-raw-data/fastq/{experiment}/{line}/{sample}_R1_001.fastq.gz",
        read2="/home/amovas/data/genome-evo-proj/data/freezed-raw-data/fastq/{experiment}/{line}/{sample}_R2_001.fastq.gz"
    output:
        temp("/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}.bam.tmp")
    threads: 1
    shell:
        "(bwa mem -t {threads} {input.ref} {input.read1} {input.read2} | "
        "samtools view -Sb - > {output})"



rule samtools_sort:
    input:
        "/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}.bam.tmp"
    output:
        "/home/amovas/data/scrap/test-harold/HaROLD/my-data/step0-input/{experiment}-{line}-{sample}-sorted.bam"
    shell:
        "samtools sort {input} > {output}"

Please let me know if it would help if I share one sequencing sample + reference sequence to pinpoint the problem (unfortunately sequencing files are a bit larger than GitHub size limit to share here).

Thanks in advance.

@cristina86cristina
Copy link
Contributor

cristina86cristina commented Dec 12, 2023

Thanks for the info. I think you need to run the first step in the README file under "prepare input files". Even if not necessary depending on the software used for alignment, it is needed for bam files produced with bwa.

samtools view -h -G69 file.bam | samtools view -h -G133 > newfile.bam

This gets rid of unmapped/duplicated multi-aligned reads. You can check the number of mapped reads with samtools in both files:

samtools view -c -F 260 file.bam

samtools view -c -F 260 newfile.bam

This should give you the same number.

Let me know if this solves the problem! If not, yes please share a bam file with me and I'll have a look!

@Al313
Copy link
Author

Al313 commented Dec 12, 2023

Yes, I did try running "makereadcount" both on original and new bam files but the error persisted.

I have uploaded the reference sequence + sequencing reads of one sample on Google Drive. I will share the link with you via email.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants