Database is written as a whole to disk on each run --> SSD wear-out, especially SD cards on Raspberry Pi #690
I did some research. According to https://charlesleifer.com/blog/going-fast-with-sqlite-and-python/ it should be sufficient to do the following:
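Presumably the call being referred to is the WAL journal-mode pragma, the same one used in the test script further below; a minimal sketch with a placeholder filename:

```python
import sqlite3

con = sqlite3.connect("example.db")
# Switch the journal to write-ahead logging; the pragma returns the
# resulting mode, so the switch can be verified.
mode = con.execute("pragma journal_mode=wal").fetchone()[0]
print(mode)  # should print "wal"
```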
The database should now be in WAL mode.

Update: WAL mode is working, as can be seen from the temporary files SQLite creates next to the database, but the same amount of data is still written. So that didn't help either. Is this a general SQLite problem that can't be solved any other way?
Further investigating this led to this test code:

```python
#!/usr/bin/python3
import sqlite3
import random
import os
import time

filename = "example.db"
targetSizeMB = 20
numBytesPerEntry = 1024

con = sqlite3.connect(filename)
con.execute('pragma journal_mode=wal')
cur = con.cursor()
targetSizeBytes = targetSizeMB * 1024 * 1024


def createDB():
    print("Creating DB, this can take some time")
    cur.execute("CREATE TABLE test (id integer, data text)")
    iterCount = int(targetSizeMB * 1024 * 1024 / numBytesPerEntry / 1.33)
    for i in range(0, iterCount):
        randomValue = ''.join(random.choice([chr(c) for c in range(ord('a'), ord('z'))])
                              for _ in range(numBytesPerEntry))
        cur.execute(f'INSERT INTO test VALUES({i},"{randomValue}")')
        if i % 1000 == 0:
            pct = round(i / iterCount * 100, 2)
            print(f"Progress {pct} %")


# Fill the database with random data on the first run only.
dbSize = os.path.getsize(filename)
if dbSize < targetSizeBytes:
    createDB()

# Each subsequent run only adds one row with a random id plus one
# INSERT OR REPLACE row, roughly mimicking a single urlwatch run.
randID = random.randint(10000000000, 100000000000)
randomValue = ''.join(random.choice([chr(c) for c in range(ord('a'), ord('z'))])
                      for _ in range(numBytesPerEntry))
cur.execute(f'INSERT INTO test VALUES({randID},"{randomValue}")')
# Note: without a unique constraint on id, OR REPLACE behaves like a
# plain INSERT here.
cur.execute(f'INSERT OR REPLACE INTO test VALUES(900000000000,"{randomValue}")')
con.commit()

wait = 10
print("Waiting", wait, "seconds to examine files")
time.sleep(wait)
con.close()
```

First an SQLite database is created and filled with random data; every subsequent run then only inserts a couple of rows and commits. Plain SQLite does not rewrite the whole file for that, so this is either a urlwatch problem or a minidb problem.
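One way to quantify how much data such a run actually writes is to read the kernel's per-process I/O counters before and after the commit. This is only a rough, Linux-only sketch using /proc/self/io, which is not part of the script above:

```python
def written_bytes():
    # Cumulative bytes this process has caused to be written to the
    # block layer (Linux-only, read from /proc/self/io).
    with open("/proc/self/io") as f:
        for line in f:
            if line.startswith("write_bytes:"):
                return int(line.split()[1])

before = written_bytes()
# ... run the INSERTs and con.commit() from the script above ...
after = written_bytes()
print(f"Bytes written by this run: {after - before}")
```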
Hello @JsBergbau, I maintain a backward-compatible fork called webchanges that's optimized for web browsing, here. That fork uses SQLite natively (instead of through the minidb package), and this may help you in your quest since you have direct access to the SQLite code. The class you're looking for is here. I note that every time the database is closed there is a call to VACUUM. I think the broader goal is to find a solution that strikes a balance between not writing the database as a whole to disk on each run while at the same time preventing the database from growing to infinity. Suggestions welcomed. Feel free to open an issue on that project.
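One possible middle ground between a full rewrite on every close and unbounded growth could be SQLite's incremental auto-vacuum, which returns free pages to the filesystem without rewriting the whole file. This is only a sketch (the cache.db path is a placeholder), not something either project does today:

```python
import sqlite3

con = sqlite3.connect("cache.db")  # placeholder path
# auto_vacuum only takes effect on a new database, or after a one-time
# full VACUUM of an existing one.
con.execute("PRAGMA auto_vacuum = INCREMENTAL")
con.execute("VACUUM")  # one-time full rewrite to apply the setting

# ... normal inserts/deletes of a run go here ...
con.commit()

# Return up to 100 free pages to the filesystem; this touches far fewer
# pages than a full VACUUM on every close.
con.execute("PRAGMA incremental_vacuum(100)")
con.close()
```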
Hello @mborsetti, thank you very much for your answer. BTW: what do you mean by "optimized for web browsing"?
I didn't expect it to be so easy, and didn't expect that the database would be vacuumed on every write, so I didn't search for vacuum. Yes, of course VACUUM rewrites the database.
I've commented out the line https://github.com/thp/minidb/blob/0dbecfa68f34199ccae9cb9f4201dc35e2a3ec67/minidb.py#L171 and now the rewriting of the database is gone. Thank you so much. That also speeds up urlwatch a lot for me (because the SD card on the Raspberry Pi is quite slow).
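Rather than deleting the line, another option would be to make the vacuum opt-in. The sketch below uses hypothetical names and is not minidb's actual API:

```python
import sqlite3

class Store:
    # Illustrative only; minidb's real class is structured differently.
    def __init__(self, filename, vacuum_on_close=False):
        self.db = sqlite3.connect(filename)
        self.vacuum_on_close = vacuum_on_close

    def close(self):
        self.db.commit()
        if self.vacuum_on_close:
            # VACUUM rewrites the whole file, so only do it on request
            # (e.g. after entries have been deleted).
            self.db.execute("VACUUM")
        self.db.close()
```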
VACUUM is then needed if you delete entries, to gain the space back. Since urlwatch only appends data (except for …). So only for …
Hello @JsBergbau,
Check out https://webchanges.readthedocs.io/en/stable/migration.html
Thanks for the quote and explanation.
The database in … The one in …
Exactly, and that's why a VACUUM command does not gain any additional space, only when run with …
In your case, when deleting old data, then yes, a VACUUM is needed to make the DB smaller. Deleting in SQLite is just a logical delete (a flag); only with VACUUM is the data really removed. According to https://stackoverflow.com/questions/26812249/sqlite-vacuuming-fragmentation-and-performance-degradation the free space is reused, so even when keeping just the last 4 entries your database won't grow forever.
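A small sketch that illustrates the free-page reuse, using SQLite's page_count and freelist_count pragmas (the demo filename, table, and sizes are arbitrary):

```python
import sqlite3

con = sqlite3.connect("reuse-demo.db")  # arbitrary demo file
con.execute("CREATE TABLE IF NOT EXISTS test (id INTEGER PRIMARY KEY, data TEXT)")

def pages():
    used = con.execute("PRAGMA page_count").fetchone()[0]
    free = con.execute("PRAGMA freelist_count").fetchone()[0]
    return used, free

for cycle in range(5):
    con.execute("INSERT INTO test (data) VALUES (?)", ("x" * 1_000_000,))
    # Keep only the newest row; the freed pages land on the freelist and
    # are reused by the next insert, so page_count stops growing.
    con.execute("DELETE FROM test WHERE id < (SELECT MAX(id) FROM test)")
    con.commit()
    print(cycle, pages())
```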
Thank you so much for your expertise and research. Will remove it (in …). Appreciate the help very much!
Thank you for pointing me to the right place.
This should be fixed in the master branch now (the mentioned commit + cfee541).
Thanks for fixing.
Command:
./urlwatch --urls sonstiges.url --cache sonstigesWatch.db
File size: sonstigesWatch.db is about 12 MB.
In the Prometheus stats you can see the disk writes, marked as 1 in the graph.
Now I use another DB of about 35 MB; in the graph those disk writes are marked with the number 2.
The job with the 12 MB database runs every 5 minutes. That's 8760 hours per year * 60 minutes / 5 minutes * 12 MB = 1,261,440 MB, or about 1.2 TB written per year.
The job with the 35 MB database runs every 30 minutes; that's about 600 GB written per year.
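As a sanity check on those numbers (pure arithmetic, nothing urlwatch-specific):

```python
def mb_written_per_year(db_size_mb, interval_minutes):
    runs_per_year = 365 * 24 * 60 / interval_minutes  # 8760 h * 60 min
    return db_size_mb * runs_per_year

print(mb_written_per_year(12, 5) / 1024 / 1024)   # ~1.2 TB per year
print(mb_written_per_year(35, 30) / 1024 / 1024)  # ~0.6 TB (about 600 GB) per year
```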
That's a lot of write load per year just because of urlwatch, and for the SD card of a Raspberry Pi this is a really huge workload.
The DB is even rewritten when no change is detected, because the timestamp of the last run is updated.
Since this is basically SQLite, I can't imagine that there is no other way than rewriting the whole file.
Update: This also affects urlwatch performance with large databases, since rewriting the database on every run slows down the urlwatch process.