This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Database locked while consumer runs #546

Open
robsdedude opened this issue Jun 3, 2019 · 11 comments

@robsdedude

robsdedude commented Jun 3, 2019

First of all: Thanks for the nice project!

I've got the project running on a Raspberry Pi 3 B+ (so basically a toaster). This means the consumer takes a looong time to consume the PDFs, which per se is no problem for me. However, I noticed that the consumer seems to lock the database while it's processing PDFs. So I can't edit already consumed documents while there are any left in the consumption dir. I get a 500 (OperationalError: database is locked) when I try to save any model while the consumer is working.

Is this necessary, or could the consumer close/unlock the database connection until OCR, guesswork, and so on are done?

https://github.com/the-paperless-project/paperless/blob/master/src/documents/consumer.py#L115 this seems to be the code in question.
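If I read it right, the structure boils down to something like this (simplified sketch, not the exact consumer.py code):

# Simplified sketch (assumed structure, method bodies condensed): one
# transaction wraps the entire consumption run, which seems to be why
# SQLite reports "database is locked" to other writers while OCR and
# guessing are still running.
from django.db import transaction

class Consumer:
    @transaction.atomic
    def try_consume_file(self, file):
        # ... parsing, OCR, date/tag guessing: minutes on a Raspberry Pi ...
        # ... only at the very end: document = self._store(...)
        ...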

@robsdedude changed the title from "Database locked when consumer runs" to "Database locked while consumer runs" on Jun 3, 2019
@ddddavidmartin
Contributor

Thanks for bringing this up! I have run into this as well. I'm fairly sure this has not always been the case and I used to edit documents whilst some were still being consumed.

@stgarf
Contributor

stgarf commented Jun 4, 2019 via email

@joshwizzy

joshwizzy commented Jun 5, 2019

The issue seems to be specific to SQLite:
https://docs.djangoproject.com/en/2.2/ref/databases/#database-is-locked-errors
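For reference, the workaround those docs describe is giving SQLite a longer lock timeout via the OPTIONS dict (values below are illustrative), though with multi-minute consumptions that mostly just delays the error rather than fixing the broad lock:

# settings.py -- per the Django docs linked above, SQLite can be told to
# wait longer for the lock before raising "database is locked".
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/path/to/db.sqlite3",   # illustrative path
        "OPTIONS": {
            "timeout": 30,  # seconds to wait for the lock (default is 5)
        },
    }
}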

There is only one database write in the try_consume_file function, and it happens at the end of the consumption process:

document = self._store(

Is the transaction.atomic decorator necessary if you don't have document_consumption_started or document_consumption_finished signal handlers that write to the database?

Would it be possible to have a setting which toggles atomic transactions, defaulting to True, and then have a try_consume_file_atomic wrapper? Something like:

if CONSUME_FILE_ATOMIC:
    result = self.try_consume_file_atomic(file)
else:
    result = self.try_consume_file(file)

@transaction.atomic
def try_consume_file_atomic(self, file):
    return self.try_consume_file(file)

def try_consume_file(self, file):
    ...
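The flag itself could presumably follow the usual settings pattern; a rough sketch with a hypothetical environment variable name (not an existing paperless option):

# settings.py (sketch only): read the proposed toggle from the environment
# and default to the current atomic behaviour.
import os

CONSUME_FILE_ATOMIC = os.getenv("PAPERLESS_CONSUME_FILE_ATOMIC", "true").lower() == "true"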

@ddddavidmartin
Contributor

ddddavidmartin commented Jun 5, 2019

Would it help to just decorate the _store method with the @transaction.atomic decorator instead of the whole try_consume_file consumption method? Then I'd think it would not lock the database for the whole consumption but only for actually writing the consumed file to the database.

I'd expect that there would be only one instance of the consumer running in any case.
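Roughly, what I have in mind is something like this (simplified sketch with illustrative helper names, not the actual code):

# Only the write path holds a transaction, so OCR and guessing no longer
# keep the database locked.
from django.db import transaction

class Consumer:
    def try_consume_file(self, file):
        text = self._parse(file)   # slow: OCR, guessing -- outside any transaction
        return self._store(text)   # short: only this needs the write lock

    @transaction.atomic
    def _store(self, text):
        ...  # create the Document, attach tags, move the file, etc.

    def _parse(self, file):
        ...  # placeholder for the slow steps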

@joshwizzy

Isn't the @transaction.atomic only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back together?
The _store method is only one database operation, so it doesn't need the @transaction.atomic.

@ddddavidmartin
Contributor

Isn't the @transaction.atomic only necessary if you are writing to the database in the signal handlers, so that there is a set of DB operations that may be rolled back together?

My experience with databases and Django is practically zero, so I have no idea. My suggestion was just based on the lock possibly being too broad, and on the observation that a lot of the time spent in the initial consumption in try_consume_file seems unrelated to the database.

@LorenzBischof

LorenzBischof commented Oct 24, 2019

Because of this, I have to preprocess my documents with OCR. The consumer then doesn't run as long and the database lock is held for a shorter time.

@MasterofJOKers
Contributor

Thinking through this for the third time now, there doesn't seem to be a good way to handle this properly while keeping the same assumptions: that anything connected to the pre- or post-consumption signals may change the database, and that if a signal handler fails, everything should be rolled back.

I'm in favor of changing the behavior and moving @transaction.atomic to _store() instead of try_consume_file(). There doesn't seem to be anything happening in the DB before the pre-signal. We have to make sure all the steps in _store() are atomic, as that's where all of the document's data is saved. In the default pre- and post-signal handlers, we shell out into another process. Since we run in a transaction, no other process should be able to change the same tables we do (especially for SQLite databases).
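Extending the sketch above, the change would roughly look like this (signal arguments, model fields and import paths are my assumption rather than the exact paperless API):

# Signals fire outside the transaction (so a failing handler no longer
# rolls back the document), and every DB write is grouped inside _store().
from django.db import transaction
from documents.models import Document
from documents.signals import (          # assumed location of the signals
    document_consumption_started,
    document_consumption_finished,
)

class Consumer:
    def try_consume_file(self, file):
        document_consumption_started.send(sender=self.__class__, filename=file)
        text, tags = self._parse(file)    # slow part, no transaction open
        document = self._store(text, tags)
        document_consumption_finished.send(sender=self.__class__, document=document)
        return document

    @transaction.atomic
    def _store(self, text, tags):
        # The Document row and its tag relations succeed or fail together.
        document = Document.objects.create(content=text)
        document.tags.add(*tags)
        return document

    def _parse(self, file):
        ...  # OCR / parsing / guessing, unchanged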

@languitar
Contributor

Another nasty side-effect of this: while the consumer is doing its work you cannot log into the web interface.

@stueja

stueja commented May 30, 2020

Is there anything that can be done about the immensely long runtime? I remember that in my "old" installation using MariaDB/MySQL, I never experienced the long runtime or the database locking.
I imagine the database locking wouldn't be much of a problem if the runtime weren't so long.

@LorenzBischof

@stueja It could be related to the new Tesseract version that was activated (v3 -> v4, 3050ff1#diff-3254677a7917c6c01f55212f86c57fbf). It uses neural networks, and I noticed a decrease in performance on my system. But generally OCR is expensive and takes a while. Not much you can do...

Please open a new issue if you are having performance problems. This issue is about the database lock being broader than necessary, which causes usability issues because the database stays locked longer than it needs to be.
