This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Database locked while consumer runs #546

Open
robsdedude opened this issue Jun 3, 2019 · 11 comments

@robsdedude

robsdedude commented Jun 3, 2019

First of all: Thanks for the nice project!

I've got the project running on a Raspberry Pi 3 B+ (so basically a toaster). This means the consumer takes a looong time to consume the PDFs, which per se is no problem for me. However, I noticed that the consumer seems to lock the database while it's processing PDFs. So I can't edit already consumed documents while there are any left in the consumption dir. I get a 500 (OperationalError: database is locked) when I try to save any model while the consumer is working.

Is this necessary, or could the consumer close/unlock the database connection until OCR, guesswork, and so on are done?

https://github.com/the-paperless-project/paperless/blob/master/src/documents/consumer.py#L115 this seems to be the code in question.
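If I read it right, the structure boils down to something like this (simplified sketch, not the exact consumer.py code):

# Simplified sketch (assumed structure, method bodies condensed): one
# transaction wraps the entire consumption run, which seems to be why
# SQLite reports "database is locked" to other writers while OCR and
# guessing are still running.
from django.db import transaction

class Consumer:
    @transaction.atomic
    def try_consume_file(self, file):
        # ... parsing, OCR, date/tag guessing: minutes on a Raspberry Pi ...
        # ... only at the very end: document = self._store(...)
        ...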

@robsdedude changed the title from "Database locked when consumer runs" to "Database locked while consumer runs" on Jun 3, 2019
@ddddavidmartin
Contributor

Thanks for bringing this up! I have run into this as well. I'm fairly sure this has not always been the case and I used to edit documents whilst some were still being consumed.

@stgarf
Contributor

stgarf commented Jun 4, 2019 via email

@joshwizzy

joshwizzy commented Jun 5, 2019

The issue seems to be specific to SQLite:
https://docs.djangoproject.com/en/2.2/ref/databases/#database-is-locked-errors
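For reference, the workaround those docs describe is giving SQLite a longer lock timeout via the OPTIONS dict (values below are illustrative), though with multi-minute consumptions that mostly just delays the error rather than fixing the broad lock:

# settings.py -- per the Django docs linked above, SQLite can be told to
# wait longer for the lock before raising "database is locked".
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/path/to/db.sqlite3",   # illustrative path
        "OPTIONS": {
            "timeout": 30,  # seconds to wait for the lock (default is 5)
        },
    }
}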

There is only one database write in the try_consume_file function, and it happens at the end of the consumption process:

document = self._store(

Is the transaction.atomic decorator necessary if you don't have document_consumption_started or document_consumption_finished signal handlers that write to the database?

Would it be possible to have a setting which toggles atomic transactions, defaulting to True, and then have a try_consume_file_atomic wrapper? Something like:

if CONSUME_FILE_ATOMIC:
    result = self.try_consume_file_atomic(file)
else:
    result = self.try_consume_file(file)

@transaction.atomic
def try_consume_file_atomic(self, file):
    return self.try_consume_file(file)

def try_consume_file(self, file):
    ...
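The flag itself could presumably follow the usual settings pattern; a rough sketch with a hypothetical environment variable name (not an existing paperless option):

# settings.py (sketch only): read the proposed toggle from the environment
# and default to the current atomic behaviour.
import os

CONSUME_FILE_ATOMIC = os.getenv("PAPERLESS_CONSUME_FILE_ATOMIC", "true").lower() == "true"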

@ddddavidmartin
Contributor

ddddavidmartin commented Jun 5, 2019

Would it help to just decorate the _store method with the @transaction.atomic decorator instead of the whole try_consume_file consumption method? Then I'd think it would not lock the database for the whole consumption but only for actually writing the consumed file to the database.

I'd expect that there would be only one instance of the consumer running in any case.
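Roughly, what I have in mind is something like this (simplified sketch with illustrative helper names, not the actual code):

# Only the write path holds a transaction, so OCR and guessing no longer
# keep the database locked.
from django.db import transaction

class Consumer:
    def try_consume_file(self, file):
        text = self._parse(file)   # slow: OCR, guessing -- outside any transaction
        return self._store(text)   # short: only this needs the write lock

    @transaction.atomic
    def _store(self, text):
        ...  # create the Document, attach tags, move the file, etc.

    def _parse(self, file):
        ...  # placeholder for the slow steps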

@joshwizzy

Isn't the @transaction.atomic only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back together?
The _store method is only one database operation, so it doesn't need the @transaction.atomic.

@ddddavidmartin
Contributor

Isn't the @transaction.atomic only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back together?

My experience with databases and Django is practically zero, so I have no idea. My suggestion was just based on the lock possibly being too broad, and on the observation that a lot of the time spent in the initial consumption in try_consume_file seems unrelated to the database.

@LorenzBischof

LorenzBischof commented Oct 24, 2019

Because of this, I have to preprocess my documents with OCR. The consumer then doesn't run as long and the database lock is held for a shorter time.

@MasterofJOKers
Contributor

Thinking through this for the third time now, there doesn't seem to be a good way to handle this properly while keeping the same assumptions: that anything connected to the pre- or post-consumption signals may change the database, and that if a signal handler fails, everything should be rolled back.

I'm in favor of changing the behavior and moving @transaction.atomic to _store() instead of try_consume_file(). There doesn't seem to be anything happening in the DB before the pre-signal. We have to make sure all the steps in _store() are atomic, as that's where all of the document's data is saved. In the default pre- and post-signal handlers, we shell out into another process. Since we run in a transaction, no other process should be able to change the same tables we do (especially for SQLite databases).
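Extending the sketch above, the change would roughly look like this (signal arguments, model fields and import paths are my assumption rather than the exact paperless API):

# Signals fire outside the transaction (so a failing handler no longer
# rolls back the document), and every DB write is grouped inside _store().
from django.db import transaction
from documents.models import Document
from documents.signals import (          # assumed location of the signals
    document_consumption_started,
    document_consumption_finished,
)

class Consumer:
    def try_consume_file(self, file):
        document_consumption_started.send(sender=self.__class__, filename=file)
        text, tags = self._parse(file)    # slow part, no transaction open
        document = self._store(text, tags)
        document_consumption_finished.send(sender=self.__class__, document=document)
        return document

    @transaction.atomic
    def _store(self, text, tags):
        # The Document row and its tag relations succeed or fail together.
        document = Document.objects.create(content=text)
        document.tags.add(*tags)
        return document

    def _parse(self, file):
        ...  # OCR / parsing / guessing, unchanged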

@languitar
Contributor

Another nasty side-effect of this: while the consumer is doing its work you cannot log into the web interface.

@stueja

stueja commented May 30, 2020

Is there anything that can be done about the immensely long runtime? I remember that in my "old" installation using MariaDB/MySQL, I never experienced the long runtime or the database locking.
I imagine the database locking wouldn't be much of a problem if the runtime weren't so long.

@LorenzBischof

@stueja It could be related to the new Tesseract version that was activated (v3 -> v4, 3050ff1#diff-3254677a7917c6c01f55212f86c57fbf). It uses neural networks, and I noticed a decrease in performance on my system. But generally OCR is expensive and takes a while. Not much you can do...

Please open a new issue if you are having performance problems. This issue is about the database lock being broader than necessary, which causes usability issues because the database stays locked longer than it needs to be.
