Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding a warning about using SQLite in production #17

Closed
pydanny opened this issue Feb 16, 2017 · 19 comments
Closed

Consider adding a warning about using SQLite in production #17

pydanny opened this issue Feb 16, 2017 · 19 comments

Comments

@pydanny
Copy link
Member

pydanny commented Feb 16, 2017

While PythonAnywhere allows and seems to encourage it, and Charles Leifer seems to think it's a wonderful option, our personal experience is that SQLite works great in production until it doesn't. Then it's a huge pain to migrate off to something, be it PostgreSQL or MySQL (migration via fixtures is painful, if it works at all).

Our opinion is that while we love SQLite in its sweet spot (as a local database for computers, phones, or browsers), for a system with any number of users sending data to a database or requiring anything but light concurrency, it's a nightmare in the making. We know, we've experience it ourselves and heard horror stories from others.

Before we put in a warningbox against using it in production, it would be nice to get other input

Reference articles:

@audiolion
Copy link

Maybe instead of warning against use in production, warn that using in production can cause significant growing pains and if the website is expecting anything other than x users per y time unit then you should consider a full fledged DBMS from the start.

@natecox
Copy link

natecox commented Feb 16, 2017

I actually support warning against using SQLite in general, for two reasons:

  1. My experience is consistent with yours in saying that SQLite is asking for disaster past very simple usage in production, and
  2. my experience has demonstrated to me that migrations across different platforms aren't foolproof. E.g., migration may run fine on SQLite locally but fail horribly on the production MySQL/Postgres for edge-case integrity reasons.

My personal policy is that while I respect the database agnosticism of Django's ORM, I save myself quite a headache by simply using the same database solution locally as I do remotely. Since SQLite is out of the question for production, it's out of the question for local development too.

@natecox
Copy link

natecox commented Feb 25, 2017

Given this issue is marked as help wanted, is there anything specific we can do to assist?

@aptiko
Copy link

aptiko commented Apr 18, 2017

I have also treated this issue at https://djangodeployment.com/2016/12/23/which-database-should-i-use-on-production/, of which I include an excerpt. I don't believe you can just say "don't use in production"; it depends on the kind of application and deployment. The first paragraph of the excerpt below illustrates an extreme case where anything but SQLite would really be overkill. I suspect there must be many less extreme examples. As I put it in the article, trucks have valid use cases, but when I want to carry a couple of books I use a bicycle.

I’m using SQLite in production in one application that has no models; all the data is stored elsewhere and is retrieved through a webservice API. The only significant data stored in SQLite is the users’ names and passwords used for login, by django.contrib.auth. It’s hardly three users. Recreating them would be easier than maintaining a PostgreSQL installation. So SQLite it is.

What if your database is small and you don’t have many users, but you store mission-critical data in the database? That’s a hard one. The thing is, no-one really knows if SQLite is appropriate, because no-one is using it for mission-critical data. Thunderbird doesn’t use it for storing emails, but for storing indexes, which can be recreated. Likewise for Firefox. The SQLite people claim it’s appropriate for mission-critical applications, but industry experience on that is practically nonexistent. I’ve never seen corruption in SQLite. I’ve seen corruption in PostgreSQL, but we are comparing apples to oranges. I have a gut feeling (but no hard data) that I can trust SQLite more than MySQL.

If I ever choose to use SQLite for mission-critical data, I will make sure I not just backup the database file, but also backup a plain text dump of the database. I trust plain text dumps more than database files in case there is silent corruption that can go unnoticed for some time.

@natecox
Copy link

natecox commented Apr 18, 2017

@aptiko in the context of a best practices book, I don't think it's wise or realistic to address edge cases that contradict established conventional wisdom. Sure, SQLite may fit the needs of a website out there--it exists for a reason, and it's ideal for small embedded databases. However, I would argue that within the context of a Django project a case where SQLite does fit, but Postgres, etc. does not fit is absolutely an edge case.

Furthermore, I would estimate that the vast majority of users who are using SQLite are using it because they don't know better, rather than that they're addressing the specific needs of their project. Most of the people I've met that tell you to avoid SQLite like the plague are doing so for the same reason I started doing so: because it's burned them one too many times. At some point, the time spent on failed deployments and debugging will outweigh the convenience of not having to configure Postgres.

tl;dr: I agree SQLite has its place, but I don't think it's within the context of this book.

@pydanny
Copy link
Member Author

pydanny commented Apr 18, 2017

The problem with getting into discussing edge cases for SQLite is that they are, as @natecox pointed out, edge cases. We could fill in pages of special SQLite edge cases. For example, a local Django tool that runs on Windows machines for organizing file workflow on desktops (I know US government agencies that do this kind of thing).

However, my fear is that by bringing up these edge cases, readers may think their special case for an app with more than one user makes it ideal. They'll say something like, "Five users can't break SQLite!". That's true, but if their project grows suddenly by management fiat to one hundred users, they might run into difficulties. For what it's worth, this isn't far-fetched: I experienced it personally in 2009 and I know several others who have run into the same thing since.

We're still undecided about this warning. For a tiny but vocal minority, it's contentious, but TSD is supposed to be opinionated. In theory, we should fight this battle, but in practice we're wondering if it's truly worth fighting for.

@aptiko
Copy link

aptiko commented Apr 20, 2017

This has been a very interesting discussion, thank you. It's exactly what I had been looking for when I was researching the issue. I will update my blog post and my book accordingly.

Just to play the devil's advocate: A person who doesn't know better than SQLite decides to deploy. They see the warning, they think "let me do it with PostgreSQL just to be sure". They install PostgreSQL, they struggle a bit with authentication, they manage to do it without really understanding what they've done. What is better? SQLite or a PostgreSQL managed by an ignorant person? They might not even know they can't just backup the data files, and instead of the horror of needing to migrate from SQLite to PostgreSQL a few years later, they might go through the horror of losing the data.

So a warning on its own might be insufficient for someone who's on a tight budget/deadlines, as is usually the case.

@audiolion
Copy link

Strange hypothetical, IMO if its been a few years and you havent figured out how to backup your database, you are the only one to blame, not the book you read 3 years ago for saying use this tool but didnt warn you that you should take time to learn it if you use it.

@natecox
Copy link

natecox commented Apr 20, 2017

@aptiko Honestly, I think your contention relies on the implication that managing a basic Postgres database is very difficult; which I have never found to be true.

If you're scaling to multi-server sharding and redundant live backups then sure it can be very complicated, but those are inherently advanced topics and well outside of the purvue of this discussion. Basic administration of Postgres is not unreasonably difficult, and past the threshold of an exceedingly simple database, the administration will probably be easier than dealing with SQLite gotchas.

@pydanny
Copy link
Member Author

pydanny commented Apr 20, 2017

On a side note, @aptiko, I didn't know you had written a book on Django. And if I'm not mistaken, in LaTeX! Grats and you have my respect! From tons of experience, I know how hard (and yet rewarding) writing a book can be.

Please take a look at https://www.twoscoopspress.com/pages/django-references. Figure out where you think your book and courses should be and send me a brief description of what your stuff is. We'll list it there for you, which is linked to from our book. 😄

@pydanny
Copy link
Member Author

pydanny commented Apr 20, 2017

Back to the topic at hand, @aptiko you bring up a good point. Many tutorials (including Django Girls) start you with SQLite. Python Anywhere even lets you run production apps using it!

As for what @natecox said, running MySQL or PostgreSQL is no longer hard. There are so many recipes and tools for them for both local and deployed code that the point is moot. And if you get to the point where you have to carefully edit settings for those tools, that means you should have enough business to justify the time and money spent in configuration.

After a lot of thought, I think we should take a stand. We'll put in something that says in effect:

  • First off, SQLite is a great database. We use it constantly. Just not for web development.
  • Don't use SQLite in production. It works great until it doesn't. Then it's a huge pain to migrate off to something, be it PostgreSQL or MySQL (migration via fixtures is painful, if it works at all)
  • That some power users of the tool (the 1%) can get away with it is not justification for using SQLite in production
  • As we advocate using the same database engine in all locations, for the purposes of Django development, don't use SQLite.

@natecox
Copy link

natecox commented Apr 20, 2017

@pydanny Sounds good to me

@pydanny
Copy link
Member Author

pydanny commented Apr 20, 2017

@natecox We'll try and have that updated in TSD review today or tomorrow. Once we get it up there, mind taking a look and giving it the normal sanity check? 😉

@natecox
Copy link

natecox commented Apr 20, 2017

Sure thing!

@pydanny
Copy link
Member Author

pydanny commented Apr 20, 2017

@natecox Posted!

@aptiko
Copy link

aptiko commented Apr 26, 2017

@pydanny Re Django book (#17 (comment)) I replied at info at twoscoopspress dot org several days ago (2017-04-22 09:40 UTC). Sometimes it can be difficult to reach you by email so I was wondering whether you received it.

@pydanny
Copy link
Member Author

pydanny commented Apr 26, 2017

@aptiko Weird, I can't find your email anywhere. Including any I might have sent. Can you resend it?

@pydanny
Copy link
Member Author

pydanny commented Apr 26, 2017

Closing as this issue is resolved

@pydanny pydanny closed this as completed Apr 26, 2017
@aptiko
Copy link

aptiko commented Apr 27, 2017

@pydanny Resent just now to four addresses: [info|hello] at twoscoopspress dot [org|com]. If you still not receive it and you want to investigate without torturing this ticket, you can contact me at antonis@djangodeployment.com.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants