Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

whotracks.me/whotracksme/data/assets/trackerdb.sql -> sqlite3 error #277

Open
jpgpi250 opened this issue May 13, 2022 · 9 comments
Open
Assignees

Comments

@jpgpi250
Copy link

Using the latest version of sqlite3, the database cannot be build from the sql script.

this has been discussed here, non working and working example included. It appears older sqlite3 versions ignore the error, the latest version apparently has a problem with it.

For clarification, using the debian bullseye version 3.34.1-3 ignores the problem, pihole-FTL has the latest version (3.38.2) of sqlite3 embedded, this version does show the problem.

Whould it be possible to update the sql script to allow newer versions of sqlite3 process the script without errors.

Thanks for your time and effort.

@jpgpi250
Copy link
Author

did some tests. apparently, the version of sqlite3, used to create trackersdb.sql (.dump ???) creates a script, that isn't compatible with the latest version of sqlite3. This would mean you need to install a more recent (latest) version of sqlite3 on the system, used to create the script.

@philipp-classen philipp-classen self-assigned this May 16, 2022
@philipp-classen
Copy link
Member

Thanks for reporting. I have some difficulties to reproduce though. I'm able to create a database from the sql when testing with sqlite 3.38.5 (on Arch).

This is what I tried:

$ sqlite3 --version
3.38.5 2022-05-06 15:25:27 78d9c993d404cdfaa7fdd2973fa1052e3da9f66215cff9c5540ebe55c407alt1

$ wget https://raw.githubusercontent.com/whotracksme/whotracks.me/master/whotracksme/data/assets/trackerdb.sql

$ sqlite3 ./whotracks.db < ./trackerdb.sql

$ ls -alhgtr ./whotracks.db 
... 2.2M May 16 12:34 ./whotracks.db

So, on the latest sqlite3 version, it seems to work.

@jpgpi250
Copy link
Author

Thanks for looking at this, difficult, since there are 3 parties involved (you, me and the pihole-FTL developpers). I'm trying to keep everybody in the loop, and reported your findings and my additional test here. Will get back to you as soon as possible.

@Bucking-Horn
Copy link

Bucking-Horn commented May 16, 2022

I have some difficulties to reproduce though.

You should be able to reproduce by including
PRAGMA foreign_keys=ON;

Your current code seems to assume foreign key constraints are not enforced by the database engine.
That may not be a valid assumption (of course, that may be debatable if that script would be intended for internal use only, and targeting a specific database, version and binary where the actual value would be known to be ON. ;) )

Different SQLite3 versions may use either ON or OFF as a database default, see https://sqlite.org/pragma.html#pragma_foreign_keys.
This may even be true for the same version, depending on the compile options used to produce the binary, see https://sqlite.org/foreignkeys.html#fk_enable.

You could consider to run the statements in an order that would comply with foreign key constraints.

Alternatively, if you are sure that execution would never result in any foreign key violations, and if your SQL would target SQLite3 only, you could also include the respective PRAGMA statements in your script.

@philipp-classen
Copy link
Member

I have some difficulties to reproduce though.

You should be able to reproduce by including PRAGMA foreign_keys=ON;

Yes, when adding that, it fails with such errors:

...
Runtime error near line 4925: UNIQUE constraint failed: tracker_domains.tracker, tracker_domains.domain (19)
...
Parse error near line 6642: no such table: main.trackers
Runtime error near line 7634: FOREIGN KEY constraint failed (19)
...

@Bucking-Horn
Copy link

Bucking-Horn commented May 16, 2022

For changing the order of SQL statements, you could take a peek at yubiuser's suggestion from https://discourse.pi-hole.net/t/interesting-study-leaky-forms/55401/4

@DL6ER
Copy link

DL6ER commented May 19, 2022

I do agree that this should be changed in your script. PRAGMA foreign_keys=ON; will eventually become an issue in the future as the SQLite3 maintainers consider to make foreign key enforcement the new default in a future release:

Note, however, that future releases of SQLite might change so that foreign key constraints enabled by default. Careful developers will not make any assumptions about whether or not foreign keys are enabled by default [...].

(https://www.sqlite.org/foreignkeys.html)

And in your case, it isn't even necessary to deal with foreign keys at all. Just change the order of the commands to ensure you are not referencing something which does not exist at this time. Create parents before children (please follow @Bucking-Horn's link for a clear explanation). It'd be better if the script is already prepared for this (especially because the fix is simple and easy).

@philipp-classen
Copy link
Member

Looked at, but it is unfortunately it is hard to change the order. It is not that we generate the SQL directly; it comes indirectly by dumping an already existing sqlite database with https://pypi.org/project/sqlite-dump/

The code does something like that:

import sqlite3
from sqlite_dump import iterdump

conn = sqlite3.connect("./test.db")
for line in iterdump(conn):
    print(line)

I'm currently not aware of an elegant way to fix the order without rewriting the whole exporter.

For the moment, the best workaround that I can see is to import it with PRAGMA foreign_keys=off; if you are hit by the problem.

@philipp-classen
Copy link
Member

philipp-classen commented May 3, 2023

We recently opened all the data: https://github.com/ghostery/trackerdb. It's now the recommended place to start and will replace the trackerdb.sql file in this repository (#315).

In trackerdb releases, we export the data in different formats. There is an sqlite binary dump (trackerdb.db), but perhaps the
JSON representation would be easier to process (trackerdb.json).

You can find the latest releases here:
https://github.com/ghostery/trackerdb/releases

@DL6ER @Bucking-Horn @jpgpi250 If you have any feedback, please let us know (either here or open a ticket on https://github.com/ghostery/trackerdb/issues).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants