Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bayarea code review #9

Closed
fscottfoti opened this issue Jul 31, 2014 · 7 comments
Closed

bayarea code review #9

fscottfoti opened this issue Jul 31, 2014 · 7 comments

Comments

@fscottfoti
Copy link
Contributor

@jiffyclub I'm hoping you can review what I'm currently doing in the BayArea implementation of UrbanSim and

  1. make sure there's not a more elegant way to do be doing things
  2. make sure we don't want to move any of these methods into UrbanSim
  3. think about doing more functional testing - like, maybe we can run a simulation using the code in a repo like this on only 300 or so parcels (VERY small hdf5) or so just to keep things humming (maybe even build badge and coverage badge) - or more realistically we could run this on a public example implementation repo

Anyway, I'm definitely happy with models.py, variables.py and assumptions.py - the real questionable code is in

https://github.com/synthicity/bayarea_urbansim/blob/master/utils.py
and
https://github.com/synthicity/bayarea_urbansim/blob/master/models.py

One point is on relocating agents. I've actually gone back to using -1 to mark the relocating agents. np.nan coerces the building_id into a float64 dtype, which wreaks havoc on performance with all the joins by building_id to other tables (this was a fun bug to track down). I know I could just record the indexes of the moving agents and carry that from the relocation model to the location model, but I use the -1's to mark the moving agents just to make sure I'm updating the data correctly. Otherwise I think it would be much easier to hide the fact that I'm doing the update wrong (which has happened a couple of times), since all agents would always have a valid building_id, and in lieu of unit testing this I think the -1's and logging is the best option to convince me it's working. Something to think about though.

And one other question is on adding records - like in the developer model and the transition models - is just calling add_table at the end the canonical way to update a dataframe or does this have consequences?

@jiffyclub
Copy link
Member

To your last question, add_table is the way to do that. It's designed to work seamlessly, so if it's not there's a bug.

@jiffyclub
Copy link
Member

Using -1 to mark unfilled IDs (as opposed to real-numbered data) is fine. Are there any changes necessary in urbansim to accommodate that?

@jiffyclub
Copy link
Member

Looks like maybe the neighborhood_vars and price_vars models could actually be table sources?

@jiffyclub
Copy link
Member

This all seems fine! I like the way you're doing a lot of user feedback in the utils/models. User feedback is something that's hard to get right in the library code itself because you never know which users are going to want lots of feedback and which ones will find it annoying. So doing that over here, where users have control over it, is great.

One thing to watch out for is a proliferation of places that modify the simulation state. The ability to update the simulation from anywhere is convenient, but can also lead situations where something is changing the state unexpectedly and it's difficult to track down what exactly is doing it. If we ever move any of the utilities into urbansim we'll want to make them return their results rather than updating the sim themselves.

@fscottfoti
Copy link
Contributor Author

No changes (within urbansim) necessary to use -1's to mark relocating households. Just making a comment as to why I went back to that...

neighborhood_vars and price_vars would get recomputed yearly. So they would be @sim.table rather than @sim.table_source? Is that right (either way that's a good idea)? And I'm realizing something like get_vacant_units should probably be @sim.variable instead of being in utils.

Other comments are good. I'll make a few small revisions when I get a chance. I realized that since these things are all registering functions or objects they can be registered in a random order. The real power of that is that a novice user only needs to thing of one function at a time instead of all the interconnections, which is great. Which then made me think that you could create an awesome web interface where you can browse the tables, variables, models, etc and only see one function at a time (which are usually just a few lines of code a-la a cell in the notebook). And if you're trying to set the injectables you can get a list of the currently registered injectables, and maybe even each function can be tested as you work on it. Anyway, for the distant future, but I'm really liking the flexibility.

@jiffyclub
Copy link
Member

If you want something to be run yearly a model is the easiest way to do that. Another way is to use a table, turn on the cache, and clear the results yearly. For neighborhood_vars and price_vars is something writing those files out yearly so they have to be read back in?

Another suggestion is not skimp on comments, both docstrings and inline comments. Like any code you're going to come back to this in future and wonder why you did something one way as opposed to another, and other people are going to be reading/modifying the code too. For your web UI idea the docstrings could be especially useful for providing descriptions of the models, tables, columns, etc. in the UI.

@jiffyclub
Copy link
Member

One thing that occurred to me today is that models themselves could be injectables.

@sim.injectable('rsh_hedonic')
def rsh_hedonic():
    return models.RegressionModel.from_yaml('rsh.yaml')

@sim.model('rsh-estimate')
def rsh_estimate(rsh_hedonic, homesales, nodes):
    # joins or whatever
    rsh_hedonic.fit(df)
    rsh_hedonic.to_yaml('rsh.yaml')

@sim.model('rsh-simulate')
def rsh_simulate(rsh_hedonic, buildings, nodes):
    # joins or whatever
    new_stuff = rsh_hedonic.predict(df)
    buildings.update_col_from_series('residential_sales_price', new_stuff)

I haven't totally thought through how much code this would save you or how much more readable it would make the simulation, but it does seem like it would make the model instances a more explicit part of the simulation and potentially move some of the yaml specificity up out of the model classes.

fscottfoti added a commit that referenced this issue Jul 29, 2016
…-2-compare-output

remove unused variable
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants