Deskew/Despeckle #20
Comments
Just opening this issue for thoughts on how to handle this automatically. There are a number of commercial products, but I haven't found anything OSS that I've gotten to work well.
I hear that unpaper is the go-to tool for this sort of thing, but integrating it would have to be done on the command-line level (like we're already doing for Tesseract). I'm not opposed to this, but given that all of my scans so far have been really good quality, I'm not prioritising it. However, if someone wants to write a PR to make this work, I'd probably merge it :-)
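For anyone picking this up, here is a minimal sketch of what command-line integration could look like from Python. It is not how Paperless does it: the helper names `build_unpaper_command` and `clean_scan` are made up for illustration, and it assumes the scan is already in an image format unpaper accepts (such as PNM) and that unpaper's default processing is good enough without extra options.

```python
import shutil
import subprocess
from pathlib import Path


def build_unpaper_command(source, target, binary="unpaper"):
    """Return the argument list for a basic unpaper run (hypothetical helper)."""
    # unpaper takes the input file followed by the output file.
    # No extra options are passed here, so unpaper's defaults apply.
    return [binary, str(source), str(target)]


def clean_scan(source, target, binary="unpaper"):
    """Run unpaper on `source` and return the path of the image to OCR."""
    if shutil.which(binary) is None:
        # Assumption: if unpaper is not installed, carry on with the raw scan.
        return Path(source)
    # check=True raises CalledProcessError if unpaper exits with an error.
    subprocess.run(build_unpaper_command(source, target, binary), check=True)
    return Path(target)
```

The returned path would then be handed to the OCR step in place of the original scan. Falling back to the untouched image when the binary is missing keeps unpaper optional, which seems sensible given that good scans don't need it.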
This would be incredible!
Just to keep you guys in the loop: I have a working version of Paperless with unpaper. I did some limited testing (mainly because I don't have any bad scan samples at hand), but from what I have seen, the OCR results at least didn't get worse for decent scans. If somebody either (a) has a bad scan for me to test with or (b) can test it for themselves, some feedback would be great!
If there is any help I can lend with unpaper, please don't hesitate to tag me :)
@Flameeyes as @pitkley suggested, you could post a link to a low-quality scan you'd like to see work, or even try out his fork and test some stuff yourself :-) Edit: whoops, I just realised that you're the guy who made unpaper! Nifty! Well in that case, maybe you can take a look at @pitkley's fork and see if there's anything you'd change, like if there's a Pythonic way to interface with unpaper that hasn't been tried yet?
Sorry, I should have been clearer: I meant that as a current unpaper maintainer. Although I'd be happy to try this out once I have some spare time!
Will there be a PR for paperless with unpaper or are there any problems with it right now?
@Cyber1000 I have opened PR #74 which adds |
I've merged @pitkley's unpaper integration PR, so I'm going to go ahead and close this.
Thanks, I'll take a look at it.
Oftentimes when scanning documents (especially in bulk), you run into situations where the document is slightly skewed and/or has speckles that throw off OCR.