This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Deskew/Despeckle #20

Closed
rileytg opened this issue Feb 11, 2016 · 12 comments


@rileytg

rileytg commented Feb 11, 2016

Often, when scanning documents (especially in bulk), you run into situations where the document is slightly skewed and/or has speckles that throw off OCR.

[Image: skew]
[Image: speckle]

@rileytg rileytg changed the title Deskew/Despeckle Deskewe/Despeckle Feb 11, 2016
@rileytg rileytg changed the title Deskewe/Despeckle Deskew/Despeckle Feb 11, 2016
@rileytg
Author

rileytg commented Feb 11, 2016

Just opening this issue for thoughts on how to automatically process for this. There are a number of commercial products, but I haven't found anything OSS that I've gotten to work well.

@danielquinn
Collaborator

I hear that unpaper is the go-to tool for this sort of thing, but integrating it would have to be done on the command-line level (like we're already doing for Tesseract). I'm not opposed to this, but given that all of my scans so far have been really good quality, I'm not prioritising it.

However, if someone wants to write a PR to make this work, I'd probably merge it :-)
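As a rough illustration of what "command-line level" integration could look like, here is a minimal Python sketch that shells out to unpaper before OCR. It is not the code from any Paperless branch or PR: the function names `build_unpaper_command` and `clean_scan` are hypothetical, and the sketch assumes an `unpaper` binary on the PATH, the `unpaper [options] <input> <output>` calling convention with the `--overwrite` flag, and input in an image format unpaper can read (PNM is the usual one). It relies on unpaper's default processing rather than tuning any deskew or noise-filter options.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_unpaper_command(src: Path, dst: Path, binary: str = "unpaper") -> List[str]:
    """Return the argv for one unpaper run: options, then input, then output."""
    # --overwrite lets unpaper replace an output file left over from a previous run.
    return [binary, "--overwrite", str(src), str(dst)]


def clean_scan(src: Path, dst: Path, binary: str = "unpaper") -> Path:
    """Try to clean a scan with unpaper; return the path OCR should read.

    Falls back to the untouched scan if unpaper is missing or fails,
    so a broken cleanup step never blocks document consumption.
    """
    if shutil.which(binary) is None:
        return src  # assumption: skipping cleanup is acceptable here

    result = subprocess.run(
        build_unpaper_command(src, dst, binary),
        capture_output=True,
        text=True,
    )
    if result.returncode != 0 or not dst.exists():
        return src
    return dst
```

A caller would then pass the returned path on to the existing Tesseract step, e.g. `ocr_input = clean_scan(Path("page.pnm"), Path("page.unpaper.pnm"))`. Keeping the command construction in its own function makes it easy to check the arguments without having unpaper installed.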

@dimitrieh

This would be incredible!

@pitkley
Member

pitkley commented Feb 16, 2016

Just to keep you guys in the loop: I have a working version of Paperless with unpaper which you can find over at feature/unpaper.
This PR is currently blocked by #34, but as soon as that is resolved I will open a PR for unpaper.

I did some limited testing (mainly because I don't have any bad scan samples at hand), but from what I have seen, the OCR results at least didn't get worse for decent scans.

If somebody either (a) has a bad scan for me to test with or (b) can test it themselves, some feedback would be great!

@Flameeyes
Contributor

If there is any help I can lend with unpaper please don't hesitate to tag me :)

@danielquinn
Collaborator

@Flameeyes as @pitkley suggested, you could post a link to a low-quality scan you'd like to see work, or even try out his fork and test some stuff yourself :-)

Edit: whoops, I just realised that you're the guy who made unpaper! Nifty! Well in that case, maybe you can take a look at @pitkley's fork and see if there's anything you'd change, like if there's a Pythonic way to interface with unpaper that hasn't been tried yet?

@Flameeyes
Contributor

Sorry, I should have been clearer: I meant that as a current unpaper maintainer :)

Although I'd be happy to try this out once I have some spare time!

On Wed, Feb 17, 2016 at 8:36 AM Daniel Quinn wrote:

> @Flameeyes as @pitkley suggested, you could post a link to a low-quality scan you'd like to see work, or even try out his fork and test some stuff yourself :-)

Diego Elio Pettenò (aka Flameeyes)

@Cyber1000

Will there be a PR for paperless with unpaper or are there any problems with it right now?

@danielquinn
Collaborator

I see that @pitkley has a branch where he's started the unpaper integration, but it's fallen out of sync with master for now. My priority right now is the UI, but when that's ready, I'll be looking at this issue, unless @pitkley has already submitted a PR by then.

@pitkley pitkley mentioned this issue Mar 6, 2016
@pitkley
Member

pitkley commented Mar 6, 2016

@Cyber1000 I have opened PR #74 which adds unpaper if you want to test it.

@danielquinn
Collaborator

I've merged @pitkley's unpaper integration PR, so I'm going to go ahead and close this.

@Cyber1000

Thanks, I'll take a look at it.

jayme-github pushed a commit to jayme-github/paperless that referenced this issue Nov 29, 2020
also: discard document changes button