dbm.error: db file doesn't exist; use 'c' or 'n' flag to create a new db #208

sunsetyerin · 2023-01-08T06:39:27Z

Describe the bug
I was able to generate the out_SampComp.db file, but it cannot be opened.

To Reproduce
Steps to reproduce the behavior:

singularity shell "library://aleg/default/nanocompore:1.0.3"
python
from nanocompore.SampCompDB import SampCompDB, jhelp
db = SampCompDB (
db_fn = "/path/to/out_SampComp.db",
fasta_fn = "/path/to/transcriptome_reference.fa")

Expected behavior
Open the out_SampComp.db file and assign it to the 'db' object.

Error
db = SampCompDB (
db_fn = "/projects/ly_vu_dir... ect_rna/MetaCompore/results/nanocompore/nanocompore_sampcomp/outSampComp.db",
fasta_fn = "/proje... cts/ly_vu_direct_rna/MetaCompore/results/input/get_transcriptome/transcriptome_reference.fa")
2023-01-07 21:49:12.802 | INFO | nanocompore.SampCompDB:__init__:55 - Loading SampCompDB
2023-01-07 21:49:12.820 | ERROR | nanocompore.common:__init__:21 - The result database cannot be opened
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/nanocompore/SampCompDB.py", line 59, in __init__
    with shelve.open(db_fn, flag='r') as db:
  File "/usr/local/lib/python3.8/shelve.py", line 243, in open
    return DbfilenameShelf(filename, flag, protocol, writeback)
  File "/usr/local/lib/python3.8/shelve.py", line 227, in __init__
    Shelf.__init__(self, dbm.open(filename, flag), protocol, writeback)
  File "/usr/local/lib/python3.8/dbm/__init__.py", line 85, in open
    raise error[0]("db file doesn't exist; "
dbm.error: db file doesn't exist; use 'c' or 'n' flag to create a new db

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/site-packages/nanocompore/SampCompDB.py", line 77, in __init__
    raise NanocomporeError("The result database cannot be opened")
nanocompore.common.NanocomporeError: The result database cannot be opened

Additional context
I generated the out_SampComp.db file with the container "library://aleg/default/nanocompore:1.0.3".

lmulroney · 2023-01-09T10:31:49Z

sunsetyerin,

Could you share all the contents of the nanocompore_sampcomp directory?

Thanks,
Logan

sunsetyerin · 2023-01-10T02:02:08Z

Hi @lmulroney, I used the MetaCompore pipeline to run nanocompore:

[yekim@gphost08 MetaCompore]$ ls -l /path/to/results/nanocompore/nanocompore_sampcomp
total 286717468
1108278509 Dec 30 04:26 outnanocompore_results.tsv
1167914580 Dec 30 04:26 outnanocompore_shift_stats.tsv
290162606080 Dec 29 03:46 outSampComp.db
8489297 Dec 29 23:52 out_sampcomp.log

lmulroney · 2023-01-10T12:53:33Z

Hi sunsetyerin,

It appears that you are missing the following files, which the plotting API requires to open the database file:
outSampComp.db.bak
outSampComp.db.dat
outSampComp.db.dir

Do you have values that make sense in the results.tsv file?

Does the end of your log file look something like this:
2022-01-26T17:02:10.709087+0000 INFO - Process-3 | All Done. Transcripts processed: 1
2022-01-26T17:02:10.754651+0000 INFO - MainProcess | Loading SampCompDB
2022-01-26T17:02:10.764074+0000 DEBUG - MainProcess | Reading Metadata
2022-01-26T17:02:10.769103+0000 DEBUG - MainProcess | Loading list of reference ids
2022-01-26T17:02:10.769542+0000 DEBUG - MainProcess | Checking files and arg values
2022-01-26T17:02:10.790533+0000 INFO - MainProcess | Calculate results
2022-01-26T17:02:11.580120+0000 DEBUG - MainProcess | Save reports to ./oligo1
2022-01-26T17:02:11.581240+0000 DEBUG - MainProcess | Saving extended tabular report
2022-01-26T17:02:12.513242+0000 DEBUG - MainProcess | Saving shift results

If yes to either question, could you try executing nanocompore sampcomp outside of the metacompore pipeline to check if those database files are created?
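To see which of these files are actually on disk, a minimal Python sketch along these lines may help. It assumes the database was written through Python's `shelve` module (as the traceback suggests) and relies only on the standard-library `dbm.whichdb`. The helper name `inspect_shelf` is hypothetical and not part of nanocompore.

```python
import dbm
import os


def inspect_shelf(db_fn):
    """Report which dbm backing files exist for db_fn (hypothetical helper)."""
    # dbm.dumb, Python's pure-Python fallback backend, stores a database as
    # <name>.dat and <name>.dir, and may keep a <name>.bak backup of the index.
    suffixes = ["", ".dat", ".dir", ".bak"]
    present = {suffix: os.path.exists(db_fn + suffix) for suffix in suffixes}
    # whichdb returns the backend module name, an empty string if the format
    # cannot be guessed, or None if the file is missing or unreadable.
    backend = dbm.whichdb(db_fn)
    return present, backend


# Example call (the path is a placeholder):
# present, backend = inspect_shelf("/path/to/outSampComp.db")
# print(present, backend)
```

If `backend` comes back as `None`, `shelve.open(db_fn, flag='r')` fails with the same "db file doesn't exist" message shown in the traceback.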

sunsetyerin · 2023-01-10T18:55:12Z

Hi @lmulroney,

Yes, the end of my log file looks like the one you posted. I then tried to run nanocompore sampcomp after activating the conda env from the [nanocompore_pipeline](https://github.com/tleonardi/nanocompore_pipeline) .yaml file, but I keep receiving errors. The full log is attached to the email. Can you please take a look?

Best,
Yerin

lmulroney · 2023-01-11T10:22:19Z

Sorry, I don't see an attachment here on GitHub.

sunsetyerin · 2023-01-12T00:06:02Z

out_sampcomp.log
Here is the log file @lmulroney

lmulroney · 2023-01-12T14:51:56Z

From the log file, it appears that Nanocompore crashed while processing the reference transcript, ENST00000336079.8, with a "KeyboardInterrupt" error code. Usually this means that the job was killed by the user for some reason, or that the run went above the memory limit provided for the job.

There may be an issue with that particular reference transcript, so it might be worth checking whether anything about ENST00000336079.8 differs from the other transcripts in the reference.

Let me know if any of these explanations make sense.

sunsetyerin · 2023-01-13T23:52:30Z

@lmulroney , thanks for the explanation.

I found that when I run nanocompore with many threads, it just crashes with memory issues, and when I run it with fewer threads, it stops running without error messages and doesn't make progress.

Also, when I run nanocompore with the container "library://aleg/default/nanocompore:1.0.3", it doesn't produce any files other than outSampComp.db.
But I really need the following files:
outSampComp.db.bak
outSampComp.db.dat
outSampComp.db.dir

It seems that nanocompore requires a lot of memory. Is there any other way I can generate the files that I need?

lmulroney · 2023-01-16T10:55:57Z

I typically run nanocompore sampcomp with 60 GB of memory and 6 threads on a typical 6-sample human dataset. Note that one thread is dedicated to the error log and one thread is dedicated to writing the results; only the third and any additional threads do data processing. So if you are using only 3 threads, a single thread does all of the processing and it can take a while.

I honestly do not know why the container is not producing the db support files (.bak, .dat, and .dir). Nanocompore writes those three files as the last thing it does, so a processing error could possibly explain the issue.

Depending on what type of plot you are interested in, you can do much of the plotting by reading in the results.tsv file, which has all of the p-values and statistical test results by position, and then plotting with your preferred language (R, Python, MATLAB, etc.). You really only need to load the database if you are interested in plotting raw ionic current values for each position.

sunsetyerin · 2023-01-17T05:09:59Z

Thanks @lmulroney,
I can see the function SampCompDB.save_to_bed, but I was wondering if there is any way to generate a .bed file from the .db file without
outSampComp.db.bak
outSampComp.db.dat
outSampComp.db.dir
I need a .bed file from outSampComp.db.

lmulroney · 2023-01-17T10:35:23Z

All of the information required to make a BED file is in the results tsv. You need to parse the position and chromosome information from the results tsv file, and you can use one of the p-value and/or LOR columns to filter which positions are added to your BED file. This also gives you a little more control over how the filtering is done and lets you do something like peak calling on top of the p-value filtering if you want. I personally use my own Python script instead of the database methods for these reasons.
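As a rough illustration of that approach (not the script mentioned above), here is a minimal sketch that filters a results tsv by p-value and emits BED6 rows. The column names `chr`, `genomicPos`, `ref_id`, `strand` and `GMM_logit_pvalue`, the 0.01 cutoff, and the treatment of `genomicPos` as a 0-based start are all assumptions; check them against the header and coordinates of your own file.

```python
import csv
import math


def results_tsv_to_bed(tsv_lines, pvalue_col="GMM_logit_pvalue", max_pvalue=0.01):
    """Yield BED6 tuples for positions passing a p-value cutoff.

    Column names are assumptions about the results tsv header.
    """
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    required = {"chr", "genomicPos", "ref_id", "strand", pvalue_col}
    missing = required - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    for row in reader:
        try:
            pvalue = float(row[pvalue_col])
            start = int(float(row["genomicPos"]))
        except (ValueError, TypeError):
            continue  # skip rows with missing or non-numeric values (e.g. NA)
        if math.isnan(pvalue) or pvalue > max_pvalue:
            continue
        # Assumption: genomicPos is used as a 0-based start; the score is a placeholder.
        yield (row["chr"], start, start + 1, row["ref_id"], 0, row["strand"])


# Example use (paths are placeholders):
# with open("results.tsv") as tsv, open("sites.bed", "w") as bed:
#     for fields in results_tsv_to_bed(tsv):
#         bed.write("\t".join(map(str, fields)) + "\n")
```

Rows without genome coordinates are skipped, so an empty output usually means the coordinate columns were never filled in.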

sunsetyerin · 2023-01-17T17:48:16Z

@lmulroney, thanks.
This is what my results table looks like.

The table doesn't include the chromosome information needed to generate .bed files.
Is it because I used the wrong reference file? ("http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz")
If so, can you recommend the right reference file?

lmulroney · 2023-01-18T10:28:24Z

It is strange that the chr information is not in the results tsv. If the chr information is missing from the results tsv file, it is also likely missing from the db, so getting access to the db probably wouldn't help in this case.

Did you save the nanopolish eventalign file, or did you pipe it through to nanocompore eventalign collapse? I ask in order to verify whether the chr information is in either or both of those files.

sunsetyerin · 2023-01-18T19:20:11Z

@lmulroney, I checked the nanopolish eventalign file and the nanocompore eventalign collapse file.
Neither contains chromosome information, and the documentation for each tool doesn't mention chromosome information in these files either.
Would it be different if I provided a BED file to nanocompore sampcomp?
--bed BED BED file with annotation of transcriptome used for mapping (optional)

lmulroney · 2023-01-19T11:11:29Z

Hi Sunsetyerin,

The contig column (column 1) in the nanopolish eventalign file is your chromosome information. If that is NA, then you have an issue with running nanopolish eventalign (or f5c eventalign).

For the nanocompore eventalign collapse tsv file, the chr information is after the read name. If that is NA or there is no text after the read name in the nanocompore eventalign collapse tsv file, and there was information in the eventalign file, then you have an issue with running nanocompore eventalign collapse.

Using a gzipped reference file shouldn't cause these kinds of issues, but it is possible. If there is missing data in those two files, perhaps try rerunning the pipeline with an uncompressed reference file.
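A quick way to look at the contig column without opening the whole file is sketched below. It assumes a tab-separated eventalign file with a single header line and the contig in the first column; the function name `contig_counts` and the row limit are illustrative only.

```python
from collections import Counter
from itertools import islice


def contig_counts(lines, max_rows=100_000):
    """Count first-column values in the first max_rows data rows."""
    counts = Counter()
    # Skip the header line, then read at most max_rows data rows.
    for line in islice(lines, 1, max_rows + 1):
        contig = line.split("\t", 1)[0].strip()
        if contig:
            counts[contig] += 1
    return counts


# Example use (the path is a placeholder):
# with open("eventalign.tsv") as handle:
#     print(contig_counts(handle).most_common(5))
```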

sunsetyerin · 2023-01-19T23:13:29Z

@lmulroney,

This is what my eventalign output looks like.

And this is what my nanocompore eventalign collapse output looks like.

I think the reference FASTA file doesn't include chromosome information, only gene names, so the final nanocompore table doesn't include that information at all.
Can you please recommend the right reference file to use for nanocompore?

lmulroney · 2023-01-20T16:53:23Z

I see the problem now. Sorry for not grasping what you were trying to do earlier. When you run nanocompore sampcomp, you can include the --bed option with a BED file of the transcriptome in reference genome coordinates. This should add genome coordinates to the results.tsv file.

sunsetyerin · 2023-01-25T07:03:28Z

Hi @lmulroney,
I've tried to provide a BED file to nanocompore sampcomp, but I keep receiving errors.
It seems that nanocompore sampcomp cannot read or match the BED file info.

I used this transcriptome FASTA file throughout the pipeline:
http://ftp.ensembl.org/pub/release-106/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all.fa.gz

Do you mind if I ask how your group generated the BED file for the analyses?

lmulroney · 2023-01-25T14:02:26Z

Hi @sunsetyerin,

Which bed files have you used for that reference transcriptome file, and what errors were you getting from nanocompore when you used a bed file?

Thanks,
Logan

sunsetyerin · 2023-01-26T07:56:06Z

@lmulroney

I had the error above.

This is my assumption, but since I already provided a transcriptome FASTA file that doesn't include genomic coordinates, nanocompore sampcomp won't give me results with chromosome information even if I provide the correct BED file.
That is because nanocompore sampcomp requires the FASTA file that was used during the mapping step, and for that step I also used a transcriptome FASTA with no genomic coordinates.

Is that correct?

lmulroney · 2023-01-26T10:28:45Z

Hi @sunsetyerin,

Yes, you should use the same transcriptome reference in sampcomp that you used for mapping the reads. The BED file that you're using should have the genome coordinates of every transcript in that transcriptome reference. Sampcomp will internally map the transcriptome coordinates to genome coordinates using the BED file provided. Based on the error message, it appears that there are more references in your FASTA file than exist in the BED file you used.
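One way to find the references that have no BED entry is sketched below. It assumes the transcript ID is the text after `>` up to the first whitespace in the FASTA header and that the BED name column (column 4) holds the same ID; the function names are hypothetical.

```python
def fasta_ids(lines):
    """Collect sequence IDs: text after '>' up to the first whitespace."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def bed_names(lines):
    """Collect the name column (column 4) of each BED record."""
    names = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            names.add(fields[3])
    return names


def missing_from_bed(fasta_lines, bed_lines):
    """Return FASTA IDs with no matching BED name (assumes IDs match exactly)."""
    return sorted(fasta_ids(fasta_lines) - bed_names(bed_lines))


# Example use (paths are placeholders; gzip handles the compressed FASTA):
# import gzip
# with gzip.open("transcriptome.fa.gz", "rt") as fa, open("annotation.bed") as bed:
#     print(missing_from_bed(fa, bed)[:10])
```

Ensembl cDNA headers carry a version suffix (for example ENST00000336079.8), so IDs that differ only by that suffix will show up as missing here.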

lmulroney closed this as completed May 9, 2023
