bayarea code review #9

fscottfoti · 2014-07-31T17:42:48Z

@jiffyclub I'm hoping you can review what I'm currently doing in the BayArea implementation of UrbanSim and

make sure there's not a more elegant way to do be doing things
make sure we don't want to move any of these methods into UrbanSim
think about doing more functional testing - like, maybe we can run a simulation using the code in a repo like this on only 300 or so parcels (VERY small hdf5) or so just to keep things humming (maybe even build badge and coverage badge) - or more realistically we could run this on a public example implementation repo

Anyway, I'm definitely happy with models.py, variables.py and assumptions.py - the real questionable code is in

https://github.com/synthicity/bayarea_urbansim/blob/master/utils.py
and
https://github.com/synthicity/bayarea_urbansim/blob/master/models.py

One point is on relocating agents. I've actually gone back to using -1 to mark the relocating agents. np.nan coerces the building_id into a float64 dtype, which wreaks havoc on performance with all the joins by building_id to other tables (this was a fun bug to track down). I know I could just record the indexes of the moving agents and carry that from the relocation model to the location model, but I use the -1's to mark the moving agents just to make sure I'm updating the data correctly. Otherwise I think it would be much easier to hide the fact that I'm doing the update wrong (which has happened a couple of times), since all agents would always have a valid building_id, and in lieu of unit testing this I think the -1's and logging is the best option to convince me it's working. Something to think about though.

And one other question is on adding records - like in the developer model and the transition models - is just calling add_table at the end the canonical way to update a dataframe or does this have consequences?

jiffyclub · 2014-07-31T17:51:26Z

To your last question, add_table is the way to do that. It's designed to work seamlessly, so if it's not there's a bug.

jiffyclub · 2014-08-01T17:17:56Z

Using -1 to mark unfilled IDs (as opposed to real-numbered data) is fine. Are there any changes necessary in urbansim to accommodate that?

jiffyclub · 2014-08-01T17:44:16Z

Looks like maybe the neighborhood_vars and price_vars models could actually be table sources?

jiffyclub · 2014-08-01T18:26:23Z

This all seems fine! I like the way you're doing a lot of user feedback in the utils/models. User feedback is something that's hard to get right in the library code itself because you never know which users are going to want lots of feedback and which ones will find it annoying. So doing that over here, where users have control over it, is great.

One thing to watch out for is a proliferation of places that modify the simulation state. The ability to update the simulation from anywhere is convenient, but can also lead situations where something is changing the state unexpectedly and it's difficult to track down what exactly is doing it. If we ever move any of the utilities into urbansim we'll want to make them return their results rather than updating the sim themselves.

fscottfoti · 2014-08-01T21:29:07Z

No changes (within urbansim) necessary to use -1's to mark relocating households. Just making a comment as to why I went back to that...

neighborhood_vars and price_vars would get recomputed yearly. So they would be @sim.table rather than @sim.table_source? Is that right (either way that's a good idea)? And I'm realizing something like get_vacant_units should probably be @sim.variable instead of being in utils.

Other comments are good. I'll make a few small revisions when I get a chance. I realized that since these things are all registering functions or objects they can be registered in a random order. The real power of that is that a novice user only needs to thing of one function at a time instead of all the interconnections, which is great. Which then made me think that you could create an awesome web interface where you can browse the tables, variables, models, etc and only see one function at a time (which are usually just a few lines of code a-la a cell in the notebook). And if you're trying to set the injectables you can get a list of the currently registered injectables, and maybe even each function can be tested as you work on it. Anyway, for the distant future, but I'm really liking the flexibility.

jiffyclub · 2014-08-01T23:31:50Z

If you want something to be run yearly a model is the easiest way to do that. Another way is to use a table, turn on the cache, and clear the results yearly. For neighborhood_vars and price_vars is something writing those files out yearly so they have to be read back in?

Another suggestion is not skimp on comments, both docstrings and inline comments. Like any code you're going to come back to this in future and wonder why you did something one way as opposed to another, and other people are going to be reading/modifying the code too. For your web UI idea the docstrings could be especially useful for providing descriptions of the models, tables, columns, etc. in the UI.

jiffyclub · 2014-08-07T00:33:36Z

One thing that occurred to me today is that models themselves could be injectables.

@sim.injectable('rsh_hedonic')
def rsh_hedonic():
    return models.RegressionModel.from_yaml('rsh.yaml')

@sim.model('rsh-estimate')
def rsh_estimate(rsh_hedonic, homesales, nodes):
    # joins or whatever
    rsh_hedonic.fit(df)
    rsh_hedonic.to_yaml('rsh.yaml')

@sim.model('rsh-simulate')
def rsh_simulate(rsh_hedonic, buildings, nodes):
    # joins or whatever
    new_stuff = rsh_hedonic.predict(df)
    buildings.update_col_from_series('residential_sales_price', new_stuff)

I haven't totally thought through how much code this would save you or how much more readable it would make the simulation, but it does seem like it would make the model instances a more explicit part of the simulation and potentially move some of the yaml specificity up out of the model classes.

…-2-compare-output remove unused variable

fscottfoti closed this as completed Sep 15, 2014

fscottfoti added a commit that referenced this issue Jul 29, 2016

Merge pull request #9 from MetropolitanTransportationCommission/patch…

539687f

…-2-compare-output remove unused variable

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

bayarea code review #9

bayarea code review #9

fscottfoti commented Jul 31, 2014

jiffyclub commented Jul 31, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

fscottfoti commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 7, 2014

bayarea code review #9

bayarea code review #9

Comments

fscottfoti commented Jul 31, 2014

jiffyclub commented Jul 31, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

fscottfoti commented Aug 1, 2014

jiffyclub commented Aug 1, 2014

jiffyclub commented Aug 7, 2014