Eevee edited this page Jan 13, 2015 · 2 revisions

Schema redesign

The pokedex schema is showing its age. It has some ancient assumptions baked into it that have been broken by Game Freak repeatedly in the meantime. Before I go ahead upgrading (or rewriting) the web interface, I want to give the schema a long hard look and make it a little more futureproof.

Problems

  • Change over time is handled awkwardly at best, via the "changelog" tables. It's not easy to get the data as it stood in a specific version. Type effects are not versioned correctly. Pokémon and move types are not versioned at all. Pokémon stats are not versioned, so pre-gen6 stats are lost entirely, and gen1 stats have never existed since gen2. There are a good few hacks for entities we assumed would exist throughout a generation (such as Pokémon!), because there are always occasional exceptions such as gizamimi Pichu or the original Deoxys forms.

  • Side games are very poorly represented. We have Conquest data, but only as a group of conquest_* tables that occasionally join to the main tables. Any game not in the main series doesn't get an entry in the "versions" table, so we don't even have any real representation of Stadium. Yet it's not clear how much overlap there should be: is the list of Pokémon shared between all games? What about abilities or moves, which have the same names in other series (e.g. PMD) but work very differently?

  • Version groups are a fabrication, and should surely be removed, but this would have a significant impact on the size of the database.

  • Storing the data as CSV is unwieldy: it's hard to read, hard to edit, and hard for third parties (especially those not using Python) to do anything with.
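
To make the first problem concrete, here is a rough sketch of the kind of question the current schema can't answer: "what was this stat as of generation N?" This is only an illustration, not a proposal for the actual layout. The `StatRange` record, the `stat_as_of` helper, the `first_gen`/`last_gen` columns, and the "examplemon" rows with their values are all made up for the example; none of them exist in the current database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StatRange:
    """Hypothetical row: one stat value, valid for a range of generations."""
    pokemon: str
    stat: str
    value: int
    first_gen: int
    last_gen: Optional[int]  # None means "still current"


# Made-up data: a stat that changed value in generation 6.
ROWS = [
    StatRange("examplemon", "special-attack", 80, first_gen=1, last_gen=5),
    StatRange("examplemon", "special-attack", 90, first_gen=6, last_gen=None),
]


def stat_as_of(rows, pokemon, stat, gen):
    """Return the stat value in effect in generation `gen`, or None if no row covers it."""
    for row in rows:
        if row.pokemon != pokemon or row.stat != stat:
            continue
        still_valid = row.last_gen is None or gen <= row.last_gen
        if row.first_gen <= gen and still_valid:
            return row.value
    return None


old_value = stat_as_of(ROWS, "examplemon", "special-attack", 3)
new_value = stat_as_of(ROWS, "examplemon", "special-attack", 6)
```

With unversioned stats, only the second row would exist, and `old_value` would be unrecoverable. Note that storing ranges like this is itself a form of consolidation, so it runs into the same tradeoffs as version groups.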

SEE ALSO: This gist, which should probably be merged into this document: https://gist.github.com/eevee/6a257a9d42400e2d03f9

Constraints

  • Reloading the database now takes rather a long time: the move and encounter tables in particular are fairly massive, and that's with the move table already semi-optimized by using version groups.

  • Ultimately some UI needs to be able to make sense of the data. The more double-checking it needs to do to eliminate redundancy, the harder it is to make the UI useful. (On the other hand, we've had quite a number of awkward bugs as a result of our attempts to consolidate data when importing, and some of the consolidation is already necessary anyway, e.g. the move table's grouping.)

Proposals

...