Manage index creation #109

Closed
ezyang opened this Issue Dec 31, 2012 · 14 comments

Comments

Projects
None yet
9 participants
@ezyang

ezyang commented Dec 31, 2012

By default, persistent generates no indexes. persistent should permit index suggestions in the model, attempt to create them, and give warnings when it is not possible to create an index in the current backend.

Original: yesodweb/yesod#444 (comment)

Related: http://stackoverflow.com/questions/11861339/sql-indices-with-database-persist-yesod-web-framework

@gregwebs

This comment has been minimized.

Show comment Hide comment
@gregwebs

gregwebs Aug 29, 2014

Owner

See https://github.com/jcristovao/migrationplus which has some sqlite index support.
This is going to be tough for persistent to manage though. Index creation varies across backends, and automatically creating them on deployment can potentially degrade performance.

Owner

gregwebs commented Aug 29, 2014

See https://github.com/jcristovao/migrationplus which has some sqlite index support.
This is going to be tough for persistent to manage though. Index creation varies across backends, and automatically creating them on deployment can potentially degrade performance.

@snoyberg

This comment has been minimized.

Show comment Hide comment
@snoyberg

snoyberg Jul 19, 2015

Owner

Closing out old issues, please reopen if still relevant

Owner

snoyberg commented Jul 19, 2015

Closing out old issues, please reopen if still relevant

@snoyberg snoyberg closed this Jul 19, 2015

@RasmusKlett

This comment has been minimized.

Show comment Hide comment
@RasmusKlett

RasmusKlett Mar 8, 2016

I think this is still very relevant. With it, I would be able to handle all my SQL needs from Persistent, which would be much easier to manage.
Additionally, maybe Django's approach of adding indexes to foreign key fields by default should be considered? I am guessing it would be beneficial to most, and do very little harm to those who do not spend time removing unnecessary indices.

In this issue it is mentioned that maybe a manual alternative should simply be documented instead, has that been done? That would be a reasonable alternative, but I cannot find it anywhere.

I think this is still very relevant. With it, I would be able to handle all my SQL needs from Persistent, which would be much easier to manage.
Additionally, maybe Django's approach of adding indexes to foreign key fields by default should be considered? I am guessing it would be beneficial to most, and do very little harm to those who do not spend time removing unnecessary indices.

In this issue it is mentioned that maybe a manual alternative should simply be documented instead, has that been done? That would be a reasonable alternative, but I cannot find it anywhere.

@gregwebs

This comment has been minimized.

Show comment Hide comment
@gregwebs

gregwebs Mar 10, 2016

Owner

Automatically handling indexes is fraught with peril. When data gets large, index creation can take an extremely long time. Generally it is best to get some advise from persistent about how to do migrations but manage them yourself. You can try sqitch, dbmigrations, or even ActiveRecord migrations. Let us know what works for you.

Owner

gregwebs commented Mar 10, 2016

Automatically handling indexes is fraught with peril. When data gets large, index creation can take an extremely long time. Generally it is best to get some advise from persistent about how to do migrations but manage them yourself. You can try sqitch, dbmigrations, or even ActiveRecord migrations. Let us know what works for you.

@RasmusKlett

This comment has been minimized.

Show comment Hide comment
@RasmusKlett

RasmusKlett Mar 10, 2016

In the Django world where I come from, it seems to work very well, and I really don't see why it wouldn't here. People with very big data can just not use the feature, and for the rest of us it would be very nice.
But anyway, if manually managed migrations are recommended, maybe it should be documented somwhere easily findable? Everyone with medium-sized data is bound to have this issue.

In the Django world where I come from, it seems to work very well, and I really don't see why it wouldn't here. People with very big data can just not use the feature, and for the rest of us it would be very nice.
But anyway, if manually managed migrations are recommended, maybe it should be documented somwhere easily findable? Everyone with medium-sized data is bound to have this issue.

@gregwebs

This comment has been minimized.

Show comment Hide comment
@gregwebs

gregwebs Mar 10, 2016

Owner

Where else should this be documented?

Owner

gregwebs commented Mar 10, 2016

Where else should this be documented?

@RasmusKlett

This comment has been minimized.

Show comment Hide comment
@RasmusKlett

RasmusKlett Mar 10, 2016

I was imagining something like a github wiki page showing how to get a basic setup running. Or it might even be in the Yesod book?

I was imagining something like a github wiki page showing how to get a basic setup running. Or it might even be in the Yesod book?

@cies

This comment has been minimized.

Show comment Hide comment
@cies

cies Aug 4, 2016

With it, I would be able to handle all my SQL needs from Persistent

For me not all needs (I certainly need Esqueleto and some just-drop-down-to-SQL-here'n'there), but it would allow me to do all migrations from Haskell. I also would prefer to see them in the TH DSL, then in some .sql file that need to be executed "every once in a while" (I dont know if dbs skip the action "superfluously creating already existing indexes" gracefully -- if so that would make keeping indexes in raw SQL quite a bit more annoying when deploying to different targets).

cies commented Aug 4, 2016

With it, I would be able to handle all my SQL needs from Persistent

For me not all needs (I certainly need Esqueleto and some just-drop-down-to-SQL-here'n'there), but it would allow me to do all migrations from Haskell. I also would prefer to see them in the TH DSL, then in some .sql file that need to be executed "every once in a while" (I dont know if dbs skip the action "superfluously creating already existing indexes" gracefully -- if so that would make keeping indexes in raw SQL quite a bit more annoying when deploying to different targets).

@rdnetto

This comment has been minimized.

Show comment Hide comment
@rdnetto

rdnetto Nov 26, 2016

Contributor

Speaking as someone new to Persistent (who found this issue when trying to figure out how to handle index creation), the thing I'd like to see most is the best practice (whatever that is) for index creation documented in the Yesod book. The alternative is that everyone has to re-discover this for themselves, and repeat the same mistakes.

Contributor

rdnetto commented Nov 26, 2016

Speaking as someone new to Persistent (who found this issue when trying to figure out how to handle index creation), the thing I'd like to see most is the best practice (whatever that is) for index creation documented in the Yesod book. The alternative is that everyone has to re-discover this for themselves, and repeat the same mistakes.

@sboosali

This comment has been minimized.

Show comment Hide comment
@sboosali

sboosali Dec 12, 2017

This limitation is reasonable needs to be documented, probably in runMigration. Also, by "persistent advising you", do you mean to just print out showMigration and add indices manually?

This limitation is reasonable needs to be documented, probably in runMigration. Also, by "persistent advising you", do you mean to just print out showMigration and add indices manually?

@gregwebs

This comment has been minimized.

Show comment Hide comment
@gregwebs

gregwebs Dec 13, 2017

Owner

Yeah, print out persistent's migrations. You can also generate SQL names from persistent for indexes, we could actually have functions around generating this SQL. Still I would want to leave running actual migrations in the hands of the user. And I would like to use existing migration frameworks rather than re-invent this wheel. Here are ones you could try:

FlywayDB

Every feature you want, especially on the JVM
https://flywaydb.org/documentation/migration/sql
can use a CLI also, including docker image: https://github.com/shouldbee/docker-flyway

Go CLI

sql-migrate: https://github.com/rubenv/sql-migrate
supports many DBs
no Env variable expansion!

rambler: https://github.com/elwinar/rambler
mysql/postgres/sqlite

Sqitch (Perl)

sqitch: http://sqitch.org/
just runs the db client (e.g. psql)
has a nice story for versioning, etc

LiquidBase

http://www.liquibase.org/index.html

Owner

gregwebs commented Dec 13, 2017

Yeah, print out persistent's migrations. You can also generate SQL names from persistent for indexes, we could actually have functions around generating this SQL. Still I would want to leave running actual migrations in the hands of the user. And I would like to use existing migration frameworks rather than re-invent this wheel. Here are ones you could try:

FlywayDB

Every feature you want, especially on the JVM
https://flywaydb.org/documentation/migration/sql
can use a CLI also, including docker image: https://github.com/shouldbee/docker-flyway

Go CLI

sql-migrate: https://github.com/rubenv/sql-migrate
supports many DBs
no Env variable expansion!

rambler: https://github.com/elwinar/rambler
mysql/postgres/sqlite

Sqitch (Perl)

sqitch: http://sqitch.org/
just runs the db client (e.g. psql)
has a nice story for versioning, etc

LiquidBase

http://www.liquibase.org/index.html

@tysonzero

This comment has been minimized.

Show comment Hide comment
@tysonzero

tysonzero Mar 21, 2018

@gregwebs

Is there any chance that the ability to add indexes in the TH DSL will be considered? This is probably the biggest remaining issue I am having with persistent at the moment, a missing index is basically a correctness issue for the data I am dealing with, as the queries I want to run just will not ever finish without those indices present.

I can see how people dealing with complex migrations might perhaps want to go beyond running the auto generated migrations. However for my (and I'm sure many other people's) use case, complex migrations of existing data is not an issue, I just need to be able to rely on those indices being there even if I drop and rebuild the table.

There already exists infrastructure for creating Unique constraints, which actually in postgresql/sqlite automatically creates an index under the covers, so it seems to me as though the code for regular old indexes should not be any harder or more backend specific than Unique constraint are, but I could be wrong.

Thanks for considering this, it really would be a massive QOL improvement to have this feature.

tysonzero commented Mar 21, 2018

@gregwebs

Is there any chance that the ability to add indexes in the TH DSL will be considered? This is probably the biggest remaining issue I am having with persistent at the moment, a missing index is basically a correctness issue for the data I am dealing with, as the queries I want to run just will not ever finish without those indices present.

I can see how people dealing with complex migrations might perhaps want to go beyond running the auto generated migrations. However for my (and I'm sure many other people's) use case, complex migrations of existing data is not an issue, I just need to be able to rely on those indices being there even if I drop and rebuild the table.

There already exists infrastructure for creating Unique constraints, which actually in postgresql/sqlite automatically creates an index under the covers, so it seems to me as though the code for regular old indexes should not be any harder or more backend specific than Unique constraint are, but I could be wrong.

Thanks for considering this, it really would be a massive QOL improvement to have this feature.

@MaxGabriel

This comment has been minimized.

Show comment Hide comment
@MaxGabriel

MaxGabriel Mar 21, 2018

Member

Personally I am in favor of Persistent handling indexes. MySQL and Postgres now support online index creation, so queries aren’t blocked anymore, unless you have an exceptionally large table (like, multi-day index creation times)

So I would welcome a PR for this

Member

MaxGabriel commented Mar 21, 2018

Personally I am in favor of Persistent handling indexes. MySQL and Postgres now support online index creation, so queries aren’t blocked anymore, unless you have an exceptionally large table (like, multi-day index creation times)

So I would welcome a PR for this

@tysonzero

This comment has been minimized.

Show comment Hide comment
@tysonzero

tysonzero Mar 21, 2018

@MaxGabriel That's good to know, thanks! I unfortunately do not know Template Haskell yet and do not have any free time at the moment, so it will be quite some time before I can make such a PR, if anyone else wants to make it please go ahead.

@MaxGabriel That's good to know, thanks! I unfortunately do not know Template Haskell yet and do not have any free time at the moment, so it will be quite some time before I can make such a PR, if anyone else wants to make it please go ahead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment