Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the use of multiple databases #539

Merged
merged 14 commits into from Apr 15, 2019

Conversation

dopatraman
Copy link
Contributor

@dopatraman dopatraman commented Mar 26, 2019

Right now, this library does not support saving historical to a database separate from its operational model. This PR allows the historical record to be saved to the same db as the historical record table.

Description

  • Save historical record to the database associated with the historical record table by passing in the db alias contained in the historical instance.

Motivation and Context

As of now, the historical records are created using the database associated with the operational model being tracked, not the database associated with the historical table. That means if the operational model is in a separate database, the library attempts to look up the historical table in the operational database, and subsequently fails.

How Has This Been Tested?

So far, I have not been able to get the full test suite to run locally.

That being said, the tests involve creating a record on a second database by passing the db name into the HistoricalRecords constructor and then asserting its existence.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • I have run the make format command to format my code
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • I have added my name and/or github handle to AUTHORS.rst
  • I have added my change to CHANGES.rst
  • All new and existing tests passed.

@rossmechanic
Copy link
Collaborator

@dopatraman thanks for this. Where is instance.history.db populated? Not entirely sure how your implementation would work, but open to having a solution here. You're saying that, at the moment, dsh supports multiple databases, but not having the base model and history model in separate databases, correct? Unsure why we wouldn't be able to accomplish that with routers.

@dopatraman
Copy link
Contributor Author

@rossmechanic no prob, happy to contribute!

You're saying that, at the moment, dsh supports multiple databases, but not having the base model and history model in separate databases, correct?

Correct. Using a router crossed my mind, but it looks like argument passed to post_save is ultimately the decider of where the record goes. I tried to dig for the source of the using argument (declared in models.py, line 416) -- what I found was that it is coming from somewhere in Django core.

I found that if a database router sends a historical table to a separate database, the db property of the instance's HistoryManager was set to that separate database. If I had to guess, I'd say that the db property of instance.history is likely set when django runs the migration for the operational model.

I'm open to a better way to do this. But short of hacking Django core, this seemed like the best option.

Thanks!

@rossmechanic
Copy link
Collaborator

To me, that's a bug. Ideally, history check for it's db in the following order:

  1. history's own using parameter
  2. Router's db for history
  3. base table's db

Right now, it seems that the history table only accept its base table's db, overruling anything else. I'd want the solution to include options 1 and 2 without breaking anything for current users (we can then deprecate if necessary before DSH3). instance.history.db sounds too magical to me without knowing how exactly it's set. If it's set by the router, the solution looks fine to me, but i'd also like to add a using parameter to HistoricalRecords and have a proper hierarchy here. What do you think? @ThePumpingLemma @kseever

@dopatraman
Copy link
Contributor Author

dopatraman commented Mar 26, 2019

@rossmechanic

i'd also like to add a using parameter to HistoricalRecords and have a proper hierarchy here.

I like this solution. However, this means we're explicitly overriding the using argument already passed to post_save (line 414, models.py). This could be confusing for other developers down the line.

Side note, I'm unable to run the test and format tasks.

@rossmechanic
Copy link
Collaborator

Don't think anyone should be using post_save as it's part of dsh's internal api. the hooks the are documented and are public api are pre_create_historical_record and post_create_historical_record, so I just think you pass the database that's actually being used for the historical record into those hooks

@rossmechanic
Copy link
Collaborator

Why can't you test or format?

@dopatraman
Copy link
Contributor Author

@rossmechanic ok, I've updated the code.

FWIW, I'm not 100% happy with how I've written this because the root of the problem seems to be coming from Django itself. Do you know if there is a way to override whatever is passing using to the hook?

@codecov-io
Copy link

codecov-io commented Mar 28, 2019

Codecov Report

Merging #539 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #539      +/-   ##
==========================================
+ Coverage   97.75%   97.75%   +<.01%     
==========================================
  Files          17       17              
  Lines         847      848       +1     
  Branches      117      117              
==========================================
+ Hits          828      829       +1     
  Misses          8        8              
  Partials       11       11
Impacted Files Coverage Δ
simple_history/models.py 99% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6b113e3...10aa065. Read the comment docs.

@dopatraman
Copy link
Contributor Author

This is my output for make test:

__________________________________________________________ summary __________________________________________________________
  py27-django111: commands succeeded
ERROR:  py34-django111: InterpreterNotFound: python3.4
ERROR:  py35-django111: InterpreterNotFound: python3.5
ERROR:  py36-django111: InterpreterNotFound: python3.6
  py37-django111: commands succeeded
ERROR:  py34-django20: InterpreterNotFound: python3.4
ERROR:  py35-django20: InterpreterNotFound: python3.5
ERROR:  py36-django20: InterpreterNotFound: python3.6
  py37-django20: commands succeeded
ERROR:  py35-django21: InterpreterNotFound: python3.5
ERROR:  py36-django21: InterpreterNotFound: python3.6
  py37-django21: commands succeeded
ERROR:  py35-djangotrunk: InterpreterNotFound: python3.5
ERROR:  py36-djangotrunk: InterpreterNotFound: python3.6
ERROR:   py37-djangotrunk: commands failed
  docs: commands succeeded
  flake8: commands succeeded
make: *** [test] Error 1

This strikes me as strange since, IIRC, tox should run tests across all of the missing environments. @rossmechanic

@rossmechanic
Copy link
Collaborator

@dopatraman you need to install pyenv and then install all of the different versions of python using pyenv. Tox can't do anything if the interpreter doesn't exist locally. If that's not in the contributing docs, feel free to add it

@dopatraman
Copy link
Contributor Author

@rossmechanic great, that did the trick.

I'm in a bit of a quandary re: code climate style issues. The complaint is that a function should not have greater than X parameters. Unfortunately, we need to pass an extra argument to HistoricalRecords to ensure the log creation happens on the right database. There is probably a way to accomplish this by refactoring the code, but I think that's outside the purview of this pull request...

Is it possible to relax the rules regarding parameter count for this feature?

@rossmechanic
Copy link
Collaborator

@dopatraman yep I can override the codeclimate issue to merge. DWAI. I'll look into this pr more closely this weekend and get any last feedback before merging

@dopatraman
Copy link
Contributor Author

@rossmechanic any progress on reviewing the PR?

@rossmechanic
Copy link
Collaborator

@dopatraman it doesn't currently respect routers. Ideally we could have a router as the fallback rather than the base model's history. You see any way to do that?

@dopatraman
Copy link
Contributor Author

Not sure I understand what you mean by having the router as a fallback?

@rossmechanic
Copy link
Collaborator

@dopatraman so ideally, we have:

  1. history's own using parameter
  2. Router's db for history
  3. base table's db

Right now, this PR only respects 1 and 3, but if the router were to send these historical tables elsewhere, it wouldn't work. Not sure how to fix this though. Need to look more deeply into it

@dopatraman
Copy link
Contributor Author

Sorry, I'm not sure I follow. In theory, the user of this library would have their own router that sends the historical tables to some database. All that matters is that future records are saved to the same database, right?

@rossmechanic

@rossmechanic
Copy link
Collaborator

Right now that router wouldn't work tho, right? Since the router_db will be overruled by the base table's db? Or if I specify in the router that all models starting with historical in their model name should be written to db2 that would work?

@dopatraman
Copy link
Contributor Author

dopatraman commented Apr 2, 2019

The db router would have to route the historical tables to the desired database. The router could do this any number of ways, for instance:

class DbRouter(object):
    ...
    def allow_migrate(self, db, app_label, model_name, **kwargs):
        if self._is_history_db(db, app_label, model_name, kwargs):
            return True
        elif self._is_default_db(db, app_label, model_name, kwargs):
            return True
        else:
            return False

The implementation of these routing methods is up to the developer.

If your concern is about automating this type of configuration, it may worth looking into, but I wonder if it is within the scope of this pull request. Users of this library may want to organize their historical tables any number of ways -- as of now they have the freedom to write a db router class that meets their specifications without having the library impose something onto them.

@rossmechanic
Copy link
Collaborator

My concern was that the using parameter that the base table passes to the historical table overrides the router. Such that historical tables will always either follow their own using parameter or the using parameter of the base table, but never the router itself. Is that not what you saw when you first created this PR?

@dopatraman
Copy link
Contributor Author

dopatraman commented Apr 2, 2019

I see... to my knowledge the historical tables were never using the using parameter of the router to begin with. I'm not actually sure where the router's using parameter enters the scene, if at all. To my knowledge the only aspect of the router's configuration that is present on the historical model is hidden in a _db attribute.

The reason for this pull request was to force an override of the default using parameter. I was under the impression that should a developer choose to supply the using parameter to the HistoricalRecords constructor, their intention would be to explicitly override the default (I was in this situation before making the PR).

@rossmechanic
Copy link
Collaborator

There is no using parameter of the router. But if the historical model doesn't have a using parameter, it should rely on the router. The router currently cannot pick what database the historical tables are written to, which I know is outside of the scope of this PR, but touches same part of code and was hoping that we could role that into this. Not exactly sure how we would do that though, so I may accept this as is. Just need to spend some time looking into it to make sure we can't solve both problems at the same time.

@rossmechanic
Copy link
Collaborator

Regardless, can you add docs and tests?

@dopatraman
Copy link
Contributor Author

re: docs and tests, will do.

re: router fall back I see what you're saying now. In theory we could capture the router's context when the migrations are run, and then compare the context to the inputs passed to create_historical_model. So, if a user creates a record on a table associated with another database but does not explicitly pass in the using param, we could fall back to the router context.

I'd be in favor of merging this in first and attempting the latter afterwards. I'd be happy to help build it out 😄

@rossmechanic
Copy link
Collaborator

Great. Add the docs and tests and I'll do another pass through. I agree it should be another PR, but didn't want to do anything here that made that second PR more difficult to implement. Thanks for your help!

@rossmechanic
Copy link
Collaborator

Yea, happy with this PR. Add docs and tests and I'll merge 👍

@dopatraman
Copy link
Contributor Author

dopatraman commented Apr 4, 2019

re: tests, I'm not able to find where we run migrations on the test databases. The TestDbRouter does not appear to be used by anything...

@rossmechanic

@dopatraman
Copy link
Contributor Author

@rossmechanic

As mentioned in the reply above, I don't think we can hook into the router without modifying django core. Since I'm on a bit of a time crunch and this changeset does not break anything that already exists, would it be possible to merge and address the inherit_db argument in a later refactor?

@rossmechanic
Copy link
Collaborator

rossmechanic commented Apr 12, 2019

@dopatraman what I'm saying is more along the lines of this:

Pass a parameter to HistoricalRecords called use_base_table_db=True. Then add this check to both of the create_historical_record calls, such that the one on create would look like this:

self.create_historical_record(instance, created and "+" or "-", using=using if self.use_base_table_db else None)

With that, if you pass use_base_table_db=False to HistoricalRecords, then the historical model will rely on router logic rather than the base table. Would that solve your problem? If so, I'm happy to implement it right now or let you implement it. Essentially this allows us to defer to the router rather than having to override it with another using var (which worries me because it only appears on the create)

@dopatraman
Copy link
Contributor Author

dopatraman commented Apr 12, 2019

@rossmechanic

I don't believe this would have the desired effect because the using argument bound to the scope of post_save (line 434) is coming from django core, and will only be set to the database of the operational table.

For illustration purposes -- suppose we have an operational model and its associated historical model. By the time create_historical_record is called, using will be set to 'default'. Should self.user_base_table_db be set to True, the historical table will be tied to the operational table (default). On the off case that self.use_base_table_db is set to False, it will be even worse because using will be passed into the creation function as None.

TLDR, without a using argument passed directly to HistoricalRecords, the historical record can either be tied only to the operational table, or not exist at all.

I don't know how we can get around forcing a table parameter on the historical record without modifying django core, which I think is outside the scope of this PR.

@rossmechanic
Copy link
Collaborator

@dopatraman If you look at https://github.com/treyhunner/django-simple-history/blob/6b113e3e90f2405a2d9a9479bdf91afc0e695ab0/simple_history/models.py#L477 , that's where the instance is saved. If we pass using=None to create_historical_record, then that will be called with history_instance.save(using=None).

Now, if you look at what the save call is actually doing, https://github.com/django/django/blob/master/django/db/models/base.py#L663, it's default is using=None, so it's fine for us to pass using=None to create_historical_record. In fact, if the save call is passed using=None, then the save call defers to the router to figure out which database to use:

https://github.com/django/django/blob/master/django/db/models/base.py#L700

Thus, the method described above should defer to routers. Essentially, I think we made a mistake by merging #507, but adding this flag would allow us to go back to a state where historical models can be completely controlled by routers.

@dopatraman
Copy link
Contributor Author

@rossmechanic

I'm still not sure I understand the interface. What does base_table refer to here? If we say that a HistoricalRecords constructor has use_base_table_db set to True, what does that mean in the context of the history and its associated operational table?

From what I gather, if use_base_table_db defaults to true, we want the history to be saved to the operational database by default. If it is false, it goes to the table defined by the router. Is that correct?

@rossmechanic
Copy link
Collaborator

@dopatraman ‘use_base_table_db’ (actually let’s call it use_base_model_db) would mean whether or not we always use the base model’s DB. When I say base model, I mean MyModel as opposed to HistoricalMyModel. The reason we want use_base_model_db to default to True is to keep this backward compatible, wed likely change the default in the next major version

@dopatraman
Copy link
Contributor Author

@rossmechanic

It looks like the TestDbRouter still does not migrate the models properly. I am not clear on how the migrations happen -- I am not able to hit breakpoints or log statements in the allow_migrate method.

@dopatraman
Copy link
Contributor Author

dopatraman commented Apr 13, 2019

@rossmechanic

I've done my best to repair the tests and refactor. Tell me what you think.

IMO, the test configuration is kind of funky. I'm not sure we're using the TestDbRouter in the best way. FWIW, the codebase I'm using that imports simple_history is able to migrate historical tables without any issues. I'm happy to take a look at how to improve this in a separate ticket.

@rossmechanic
Copy link
Collaborator

rossmechanic commented Apr 13, 2019

Fixed the tests for you in this branch https://github.com/treyhunner/django-simple-history/tree/using-separate-db. You can copy them over from there. You've put a lot of work into this PR so I want your version to be the one merged.

Essentially just added another router particularly for this test case. Migrations are difficult to test so we don't have to worry about them, but the tests do test that the base model and history model are read and written to the correct (separate) dbs. For the record, migrations are run before the test suite which is why they're hard to test. If you copy those tests over and update the docs, I'm happy to merge @dopatraman

@dopatraman
Copy link
Contributor Author

@rossmechanic

Thank you for the assist. I like your solution -- a separate router for the override is a clean way to test.

I've cherry-picked your changes onto this branch. It seemed quicker and I'm happy to share the credit. The tests are passing now... if you're ready, let's merge!

@rossmechanic
Copy link
Collaborator

@dopatraman Looks good to me! Just need to update the docs and I'll merge

docs/multiple_dbs.rst Outdated Show resolved Hide resolved
@dopatraman
Copy link
Contributor Author

@rossmechanic

Looks like we're good to go.

Copy link
Collaborator

@rossmechanic rossmechanic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! Thanks @dopatraman

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Issue accepted for completion
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants