Database locked while consumer runs #546
Comments
Thanks for bringing this up! I have run into this as well. I'm fairly sure this has not always been the case; I used to edit documents whilst some were still being consumed.
I've noticed this too, and it makes paperless unusable if you're doing something like an initial ingestion of a boatload of PDFs and you also want to manage them in the UI...
The issue seems to be specific to SQLite. There is only one database write in the `try_consume_file` function, which happens at the end of the consumption process: src/documents/consumer.py, line 154 (commit 8e6d7cb).
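To make the SQLite-specific behavior concrete, here is a minimal stdlib sketch (not paperless code) of why a long-held write transaction produces exactly the `OperationalError: database is locked` error from the issue: SQLite allows only one writer at a time, so a second connection's write fails once its busy timeout expires.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file: "a" plays the consumer,
# "b" plays the web UI. A short timeout makes the lock error show quickly.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
a = sqlite3.connect(path, timeout=0.1)
b = sqlite3.connect(path, timeout=0.1)

a.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT)")
a.commit()

# "a" opens a write transaction and holds it open, mimicking a
# long-running atomic block around the whole consumption process.
a.execute("INSERT INTO doc (title) VALUES ('being consumed')")

locked = False
try:
    b.execute("INSERT INTO doc (title) VALUES ('edit from UI')")
except sqlite3.OperationalError:
    locked = True  # "database is locked"

a.commit()  # releasing the write transaction unblocks other writers
b.execute("INSERT INTO doc (title) VALUES ('edit from UI')")
b.commit()
```

With a server database like PostgreSQL or MariaDB, row-level locking means the second write would simply succeed, which matches the reports below that the problem does not appear with MySQL.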
Is the `transaction.atomic` decorator necessary if you don't have …? Is it possible to have a setting which toggles atomic transactions, defaulting to `True`? Then:

```python
if CONSUME_FILE_ATOMIC:
    result = self.try_consume_file_atomic(file)
else:
    result = self.try_consume_file(file)
```

```python
@transaction.atomic
def try_consume_file_atomic(self, file):
    return self.try_consume_file(file)

def try_consume_file(self, file):
    ...
```
Would it help to just decorate the …? I'd expect that there would be only one instance of the consumer running in any case.
Isn't the @transaction.atomic only necessary if you are writing to the database in the signal handlers?
My experience with databases and Django is practically zero, so no idea. My suggestion was just based on the lock possibly being too broad, and on a lot of the time in the initial consumption being spent in …
Because of this, I have to preprocess my documents with OCR. Then the consumer doesn't run as long and the database lock is held for a shorter time.
Thinking through this for the third time now, there doesn't seem to be a good way to properly handle this while keeping the same assumption: that anything connected to the pre- or post-signals may change the database, and that if a pre- or post-signal fails, we should roll back everything. I'm in favor of changing the behavior and moving …
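The assumption described above can be sketched with the stdlib alone (the names `store_document` and the handler functions are hypothetical, not paperless API): the document row and anything the signal handlers write must commit together or roll back together, which is what forces one wide transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT)")

def store_document(conn, title, post_signal):
    # All-or-nothing: the document row and whatever the signal handlers
    # write are committed together, or rolled back together.
    try:
        with conn:  # connection as context manager = one transaction
            conn.execute("INSERT INTO doc (title) VALUES (?)", (title,))
            post_signal()  # handlers may also write to the database
        return True
    except Exception:
        return False  # the whole transaction was rolled back

def failing_handler():
    raise RuntimeError("post_consume handler failed")

store_document(conn, "good.pdf", lambda: None)    # committed
store_document(conn, "bad.pdf", failing_handler)  # rolled back

titles = [row[0] for row in conn.execute("SELECT title FROM doc")]
```

Dropping that assumption, e.g. by committing the document first and running signal handlers in their own transactions, would shorten the lock but leave a stored document behind when a handler fails.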
Another nasty side effect of this: while the consumer is doing its work, you cannot log into the web interface.
Is there anything that can be done regarding the immensely long runtime? I remember that in my "old" installation using mariadb/mysql, I never experienced the long runtime nor the database locking.
@stueja It could be related to the new tesseract version that was activated (v3 -> v4): 3050ff1#diff-3254677a7917c6c01f55212f86c57fbf. It uses neural networks, and I noticed a decrease in performance on my system. But generally OCR is expensive and takes a while; not much you can do... Please open a new issue if you are having performance issues. This issue is about the database lock being too broad, which causes usability issues because the database is locked longer than it needs to be.
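Until the lock scope is narrowed, one possible mitigation (a workaround, not a fix) is raising SQLite's busy timeout so that UI writes wait for the consumer's transaction instead of erroring out immediately. A sketch, assuming Django's default sqlite3 backend, which passes `timeout` through to `sqlite3.connect()`; the path is a placeholder:

```python
# settings.py -- sketch; only the OPTIONS block is the point here
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "/path/to/db.sqlite3",
        "OPTIONS": {
            # Seconds a writer waits on a locked database before raising
            # OperationalError (default is 5).
            "timeout": 30,
        },
    }
}
```

This only helps if the consumer's transaction finishes within the timeout; a multi-minute OCR run on a Raspberry Pi would still exhaust it.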
First of all: Thanks for the nice project!
I've got the project running on a Raspberry Pi 3 B+ (so basically a toaster). This means that the consumer takes a looong time to consume the PDFs, which per se is no problem for me. However, I noticed that the consumer seems to lock the database while it's processing PDFs, so I can't edit already consumed documents while there are any left in the consumption dir. I get a 500 (`OperationalError: database is locked`) when I try to save any model while the consumer is working. Is this necessary, or could the consumer close/unlock the database connection until ocr, guesswork, and stuff is done?
This seems to be the code in question: https://github.com/the-paperless-project/paperless/blob/master/src/documents/consumer.py#L115