-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
27 create db tables aka django modelspy #50
27 create db tables aka django modelspy #50
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Matches our data model!
n_started = models.IntegerField() | ||
n_ended = models.IntegerField() | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you JP! This looks great, the models are consistent with what we were expecting based on the documentation.
app/route_rangers_api/models.py
Outdated
class TransitStation(models.Model): | ||
station_id = models.CharField(max_length=30, primary_key=True) | ||
location = models.PointField() | ||
route = models.CharField(max_length=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a route name (like "Red Line") or route_id like "Chicago_CTA_R" or whatever it ends up being? Given that it's set up way, I take it we went with separate entities for each line for a multi-line stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand Django well enough yet to do a detailed review of the syntax (can do a deeper dive with you or by myself this week if requested); approving this since nothing looks blatantly contrary to the data model.
Once we have proper route path ingestion (i.e. once #43 pull request is in) we may want to create a new issue for adding existing route paths to the db (unless I'm mistaken/missing it here or it's somehow "too late"); class ExistingRoute(models.Model)
or similar; my guess is they'd need a MultiLineStringField()
for route (to handle lines with branches, like the CTA Green line) and some other attributes like a string representing hex for route color.
Sorry for so many nitpick comments. I think the main thing that I was not detailed in checking on the tables is that each table needs a primary key (usually an id field) and the columns that reference other tables need to be the same type, ideally same name and be set as foreign keys. Otherwise all of the types look good as far as my understanding of max char lengths and things. Also Matt makes a good point about maybe adding a color field for ease of visualization but I'm not sure we need that immediately, I think it would be easy to tack on later. Happy to sit tomorrow evening/wednesday and go through it if you have questions. |
I am not sure if this completely answers your question, and @JPMartinezClaeys correct me if I am misstating, but the |
When a primary key is not specified Django creates a primary key for the table which is basically an integer from 1 to N. |
Hi, we wanted to add you as reviewers for this PR given that defining the models correctly will be key for ingestion. We have the following tables in our model:
We have two main questions that we would like to get more detailed feedback:
The second issue is less time pressing given that there is no ingestion that is completely necessary to move forward with the project. Thanks! |
For #2, this is a really good use of a JSONField generally speaking, I'd probably do something similar. A few notes on downsides to keep in mind:
I'm taking a closer look at #1 now. |
|
||
################################# | ||
######## SURVEY MODELS ########## | ||
################################# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small note, these would typically be in their own app when they are this distinct. not necessary for the project, but wanted to mention as a "best practice"
We can discuss a bit today, but I think it might make sense to just go with two different tables that each have a FK to their respective unit. I'm not sure what cross-unit analysis you plan to/need to do, so that might not be the right call, but it'd be preferable to have the model tied directly to the table in question, and aside from something called Generic Foreign Keys which are tough to work with and a bit of a hack, that'd require two different (though nearly identical) tables. |
…tations instances
…tions and lines and split ridership data
app/route_rangers_api/models.py
Outdated
|
||
|
||
class Demographics(models.Model): | ||
census_tract = models.CharField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember @benjaleivas was ingesting at the block level, so as long as he is good with aggregating to the tract level by the time it goes to the db and that wont lose info for the frontend needs (I am pretty sure this is ok but want to double check)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified to the block level
app/route_rangers_api/models.py
Outdated
population = models.IntegerField() | ||
# age = models.IntegerField() | ||
median_income = models.IntegerField() | ||
transportation_to_work = models.CharField( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max chars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing it!
verbose_name="Means of Transportation to Work" | ||
) | ||
work_commute_time = models.FloatField(verbose_name="Time of commuto to work") | ||
vehicles_available = models.IntegerField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would this be a float median/mean estimate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this field is a count of vehicles
app/route_rangers_api/models.py
Outdated
transportation_to_work = models.CharField( | ||
verbose_name="Means of Transportation to Work" | ||
) | ||
work_commute_time = models.FloatField(verbose_name="Time of commuto to work") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would help at some point maybe from @benjaleivas to clarify the units commute in min? hour?
app/route_rangers_api/models.py
Outdated
) | ||
work_commute_time = models.FloatField(verbose_name="Time of commuto to work") | ||
vehicles_available = models.IntegerField() | ||
disability_status = models.IntegerField() # Check the type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe an enum? disabled, not disabled, temporarily disabled?
app/route_rangers_api/models.py
Outdated
class TransitStation(models.Model): | ||
station_id = models.CharField(max_length=30, primary_key=True) | ||
location = models.PointField() | ||
route = models.CharField(max_length=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think multiple routes have the same station_id (if we go along with the Roosevelt
chat we had) so route
or route_id
would also need to be set as a primary key
app/route_rangers_api/models.py
Outdated
""" | ||
Class that represent subway lines and bus routes | ||
""" | ||
city = models.CharField(max_length=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to set cities as a choice of ['CHICAGO', 'NYC', 'PORTLAND']
like this https://docs.djangoproject.com/en/5.0/ref/models/fields/#choices
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the model
app/route_rangers_api/models.py
Outdated
Class that represents bus stops and station complex | ||
(i.e. CTA - Roosevelt) | ||
""" | ||
city = models.CharField(max_length=30) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same choice
type comment here as above
""" | ||
station = models.ForeignKey(TransitStation, on_delete=models.PROTECT) | ||
route = models.ForeignKey(TransitRoute, on_delete=models.PROTECT) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!!
route = models.ForeignKey(TransitRoute, on_delete=models.PROTECT) | ||
date = models.DateField() | ||
ridership = models.IntegerField() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For @jamesturk and @divij-sinha we decided to go with these RidershipRoute
and RidershipStation
tables because we realized that depending on the city the ridership metrics may occur at different levels (i.e. I think Portland may have route level ridership for the subway lines whereas the other cities have them at the station level).
For now it seems like we will have to create one view for Portland ridership and one view for Chicago/NYC ridership since they will pull from different tables so if you suggest handling this differently, i.e. add a city
column and mode
to each table or an association table please advise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JPMartinezClaeys depending on the response to this question, the TransitModes
might need to be accessible by Ridership instead of nested within TransitRoute
@jamesturk @divij-sinha we made some modifications to the model after the conversation we had yesterday. |
app/geodjango/settings.py
Outdated
@@ -124,7 +130,7 @@ | |||
|
|||
LANGUAGE_CODE = "en-us" | |||
|
|||
TIME_ZONE = "UTC" | |||
TIME_ZONE = "America/Chicago" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want a gut check on this. I don't think we currently have anything more granular than day-level dates so this shouldn't cause huge problems, but if we were to have any daily schedule data or if we had users submit the timeframe they want to take a particular route we will have to translate to Chicago time.
Usually when there is conversion I have seen everything get converted to UTC so that we know it is all fully standardized, but if you are ok with keeping Chicago then that is fine, we will just have to note that in future hourly data for non-chicago places will need to be translated.
Here is a pass at the Django models to start ingesting data.
This is based on the diagram and md file that describes the data model for milestone 4.
The only difference is that since the subway information is at the station level and the bus data at the route level I changed the first field from a Foreign Key that matches to the TransitStation class to a CharField that we could still match to the TransitStation table.
If something in the model makes you think it will make ingestion too complicated, let me know so we can discuss how to adapt the models.