This repository has been archived by the owner on Feb 19, 2021. It is now read-only.

Server setup via Docker #2

Closed
gamesbook opened this issue Jan 15, 2016 · 25 comments

Comments

@gamesbook

It would be really cool to have a small Docker script that allows this project to be built and deployed behind an Apache or nginx server; that way a person could simply drop it onto a cheap host somewhere in the cloud and have the service accessible to them from everywhere...

@danielquinn
Collaborator

The big problem with this is that a "cheap host" is exactly the kind of place you don't want to be sending your private documents. This kind of app is likely to play host to things like your national insurance (social security, whatever) number, VAT, tax information -- all the fiddly stuff people get sent to them in the post for the purposes of a proper paper trail. If we do include a Docker implementation, I think it would have to come with a Great Big Warning that this shouldn't be used on an untrusted service.

As for the suggestion that I actually do this, I'm afraid I don't know how :-( I've never used Docker (I'm a Gentooer, everything is built on bare-metal), so I wouldn't know where to start. However, if you want to submit a PR with such a script, I'd be happy to roll it into the scripts directory, alongside the stuff I've already got for systemd.

@waynew

waynew commented Jan 15, 2016

It shouldn't be too hard to make a Dockerfile for this. If one were so inclined they could probably include tesseract et al so all you'd have to do is something like docker run <someone>/paperless.

If I have some time and nobody else is interested in it, I might have a go. Technically speaking the Dockerfile would need to live in the project root (unless you wrote a script to move it ;) )

@danielquinn
Collaborator

If those are the requirements, I'm cool with it. Send me a PR if/when you have time and I'll give it a shot locally before I merge it :-)

@gamesbook
Author

TBH, I never envisaged storing all my insurance (social security, whatever), VAT, and tax information in the cloud?! I saw this app as more useful for scanning in the day-to-day expenses from grocery purchases et al. If someone really wanted to hack into that - well, I am sure they would be bored very quickly.

@waynew I would be happy to test.

@danielquinn
Collaborator

Well I guess that's the advantage of opening up your code. People see uses for it that you didn't expect. By all means, if one of you wants to write a Docker setup, I'd love to add it to the repo.

@gamesbook
Author

@danielquinn Indeed - and they can make it easily usable where that otherwise might be a barrier.

@avichalp
Contributor

Even if one does not want to deploy to the cloud, can we have a Vagrant setup for development purposes?
What do you guys think?

@danielquinn
Collaborator

I'm not opposed to having a Vagrant setup available, but I'm not going to write it unless I suddenly find myself with an abundance of free time on my hands. However, if someone submits a Vagrantfile, I'll give it a shot and merge it if it works.

@avichalp
Contributor

I have written a Vagrantfile here. Take a look.

@stelund

stelund commented Feb 2, 2016

I've been thinking about this and I cannot see the reason to add an nginx in front of Django unless you want SSL. It would require some specific configuration to get the certs in place. The Docker approach for this problem would also be to host nginx and this Django app in separate containers, especially since there is no static data.

I think the most reasonable way to package this with Docker is with a small modification of Django's image. Should it run the consumer by default, then?

But there is no drop-point open unless the user links in some directory or an FTP service from another container.

@waynew

waynew commented Feb 2, 2016

I've started some work and I've got the server started, but it doesn't appear to be parsing files from the input directory. My next step was going to be adding the logging module to see if I could pinpoint the cause, as it's failing silently.

@waynew

waynew commented Feb 3, 2016

Ah. I see - I expected the server to run all of the things, but I see it's not doing that ;)

@danielquinn
Collaborator

Nope, there are two processes: the runserver and consume_documents. There's also an exporter, and the details of all three are here.

@danielquinn
Collaborator

@stelund there are a few elements to consider:

  1. The webserver cannot run on a privileged port unless it's run by root -- not something I'd recommend generally, which is why you typically see nginx coupled with gunicorn or something. The webserver does indeed have some static files though: there's a bunch of css, js, and images used for the admin.
  2. The consumer is a separate process that needs to be run in parallel to the webserver. It monitors a directory and consumes what's there, inserting into the database used by the webserver and writing to a local directory so that the webserver can serve-up the PDFs when requested.
  3. A means of transferring the scans to the server. In my own setup, I have proftpd running so that my scanner can push the documents to the server where the consumer can find them. It doesn't have to be FTP though, as some scanners support writing to a Windows share (Samba) or you could even just run things locally and let users copy files into the consumption directory manually.

I don't know how any of this plays into Docker though, whether everything needs to be in separate containers or whatnot, but it should be noted that:

  • The consumer and webserver read and write to the database (sqlite)
  • The consumer writes to the MEDIA_ROOT
  • The webserver reads from the MEDIA_ROOT
  • The consumption directory must be writeable by the scanner (somehow)
  • The consumption directory must be readable and writeable by the consumer
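
The access requirements above can be checked up front. What follows is a minimal sketch, assuming hypothetical directory paths (`consume_dir`, `media_root`) and that the script runs as the same user as the process being checked. The helpers `check_access`, `check_consumer`, and `check_webserver` are illustrative names and not part of Paperless:

```python
import os
import tempfile


def check_access(path, need_read=False, need_write=False):
    """Return a list of problems found for `path` (an empty list means OK)."""
    if not os.path.isdir(path):
        return [f"{path}: not a directory"]
    problems = []
    # Listing or creating entries in a directory also needs the execute bit.
    if need_read and not os.access(path, os.R_OK | os.X_OK):
        problems.append(f"{path}: not readable")
    if need_write and not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"{path}: not writeable")
    return problems


def check_consumer(consume_dir, media_root):
    """The consumer reads and writes the consumption directory and writes MEDIA_ROOT."""
    return (check_access(consume_dir, need_read=True, need_write=True)
            + check_access(media_root, need_write=True))


def check_webserver(media_root):
    """The webserver only needs to read from MEDIA_ROOT."""
    return check_access(media_root, need_read=True)


if __name__ == "__main__":
    # Hypothetical layout: temporary directories stand in for the real paths.
    with tempfile.TemporaryDirectory() as consume_dir, \
            tempfile.TemporaryDirectory() as media_root:
        found = check_consumer(consume_dir, media_root) + check_webserver(media_root)
        for problem in found:
            print(problem)
```

Running the same check once as the consumer's user and once as the webserver's user would show whether a shared directory or volume is set up the way the list describes. It says nothing about whether the scanner can write to the consumption directory, since that depends on the transfer service in use.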

@waynew

waynew commented Feb 3, 2016

I've currently run into an issue where the image built from the Python 3.5 Dockerfile is producing a wonky version of ImageMagick - namely, it has no ability to convert from PDFs.

@waynew

waynew commented Feb 3, 2016

Looks like it's just not installing ghostscript as a default dependency. Very strange, given that both of them have the exact same apt sources.list. There must be some other wizardry at play there. Looks like I've got the document converter running now 🤘
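
A missing converter like this can be caught before the consumer starts. Here is a minimal sketch, assuming the tools are installed under their usual executable names (`convert` for ImageMagick, `gs` for Ghostscript, `tesseract` for Tesseract). The `missing_tools` helper is an illustrative name and not part of Paperless:

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust to whatever the image actually installs.
    required = ["convert", "gs", "tesseract"]
    missing = missing_tools(required)
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found.")
```

This only confirms that the binaries exist on the PATH. It does not prove that ImageMagick can actually convert a PDF, so a test conversion of a sample file is still the more reliable check.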

@stelund

stelund commented Feb 3, 2016

@danielquinn Yep, thanks for the heads up. I think getting an FTP server up with Docker and having both containers mount the shared data directory should be easy enough.

@waynew Great to hear! I haven't gotten that far, so I think I'll leave you to it. Holler if I can help you.

@pitkley
Member

pitkley commented Feb 12, 2016

In case anyone is interested, I have written a Dockerfile for paperless which is available as the pitkley/paperless container, source is on GitHub.
The interesting thing -- at least in my opinion -- is that the webserver and the consumer are running in different containers accessing the same volumes, which makes the Dockerfile pretty simple.

There are still a few things to do, mainly adding a user to not use root, but it is functional and I am using it 'in production'.

Update: I have implemented both using a different user and being able to install additional languages (readme has still to be updated to reflect the latter).

@TheConnMan TheConnMan mentioned this issue Feb 13, 2016
@danielquinn
Collaborator

Some of you guys may be interested in the pull request @TheConnMan issued. It's apparently incomplete, but it looks like a simple start.

@gamesbook
Author

The "Dockerfile" (https://github.com/pitkley/dockerfiles/blob/master/paperless/Dockerfile) link is broken?

@pitkley
Member

pitkley commented Feb 14, 2016

@gamesbook The URL you posted was the original one. I've since moved the Dockerfile to its own repository and edited my comment. I assume you used the link from the GitHub notification e-mail? Anyway, this is the correct link: https://github.com/pitkley/docker-paperless/blob/master/Dockerfile.

@pitkley
Member

pitkley commented Feb 15, 2016

Just an FYI, I just opened PR #39 as an alternative Dockerfile implementation to #28. You might want to check it out 👍 (and feedback would be greatly appreciated!)

@danielquinn
Collaborator

My Docker-fu is quite minimal, but having looked over #39 it's clear you've done your homework on how to set this up nicely. I'm inclined to merge it as-is, but I'd like to hear from others here before I do, as the collective Docker skills here are considerable.

@pitkley
Member

pitkley commented Feb 18, 2016

The mentioned PR #39 is now merged, so you can now have an official Paperless Docker image. (It's not on Docker Hub yet, but we'll get there too!)


@danielquinn I think you can close this one?

@danielquinn
Collaborator

Indeed! Thanks to everyone for your input!


6 participants