Data discussion #1

tanmaysharma19 · 2021-01-11T23:54:03Z

Discussing ideas for finalizing the dataset for the dashboard.

rtaph · 2021-01-12T01:42:24Z

Cancer dataset:

What I like:
- Many quantitative variables. This opens up potential for interesting graphs.
- Few variables have missing data
Limitations (what I like less):
- There only seems to be one categorical dimension (the Geography variable). If we want to build in dimensions as disaggregations/filters, this might be a bit limiting. Of course, we could bin some numeric variables into categories if we wanted.
- The data is at the county level, not the individual level (it is not microdata). This might make it harder to disaggregate and uncover patterns, and it restricts us to inferences at the county level and above. Many variables are already semi-aggregated (e.g. pctbachdeg18_24). This will make it hard to reshape the data, since we cannot cross those columns while reshaping. We can probably only analyse the data marginally, rather than drill down by multiple variables.
- Even if we can aggregate and slice the data above the county level, there is an additional complication: we would likely need to weight every measure by the county's population.
- There are no time-series to plot. This is not mission critical but would be nice to have.
- This is a combined sample, which means some variables were likely clustered in the sampling design itself. That could leave a lot of holes in our data if we try to do geographic mapping at a certain level.
Potential uses / personas:
- Clinical researcher / scientific audience
- Policy planner (municipal gov’t)
- Physician

See the auto-generated data profile report.
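The binning and population-weighting points above could look something like this. A minimal Python sketch, assuming each county row carries a numeric variable to bin, a rate, and a population count; the column names (`median_income`, `death_rate`, `population`), the bin edges and all numbers are hypothetical, not taken from the dataset.

```python
from collections import defaultdict


def weighted_mean(values, weights):
    """Population-weighted mean of county-level values."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def bin_value(x, edges, labels):
    """Map a numeric value to a label using right-open bins [edges[i], edges[i+1])."""
    for lo, hi, label in zip(edges, edges[1:], labels):
        if lo <= x < hi:
            return label
    raise ValueError(f"{x} is outside the bin edges")


def weighted_mean_by_bin(rows, bin_col, edges, labels, value_col, pop_col):
    """Group county rows by a binned variable, then take a population-weighted mean per bin."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        label = bin_value(row[bin_col], edges, labels)
        vals, pops = groups[label]
        vals.append(row[value_col])
        pops.append(row[pop_col])
    return {label: weighted_mean(v, p) for label, (v, p) in groups.items()}


# Hypothetical county rows (made-up names and numbers).
counties = [
    {"median_income": 35000, "death_rate": 200.0, "population": 10000},
    {"median_income": 38000, "death_rate": 180.0, "population": 30000},
    {"median_income": 65000, "death_rate": 160.0, "population": 50000},
]

rates = [c["death_rate"] for c in counties]
pops = [c["population"] for c in counties]
print(weighted_mean(rates, pops))
print(weighted_mean_by_bin(counties, "median_income", [0, 50000, float("inf")],
                           ["low", "high"], "death_rate", "population"))
```

The unweighted mean of the three rates is 180, while the population-weighted mean is about 171; that gap is why aggregating above the county level needs weights.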

rtaph · 2021-01-12T02:27:24Z

NYC agency performance indicators from the FY20 Mayor's Management Report:

What I like:
- It's a collection of KPIs, which is something that people naturally create dashboards for.
- It has time depth (FY16-FY20; probably more years in older datasets), so it can be visualized as several series.
- It has target data. We could make something similar to this scorecard I made (unrelated), or use bullet charts.
- Appears quite complete and clean.
- There probably is a written report somewhere where we can get a lot more information about the data, and validate that our summaries match.
Limitations (what I like less):
- Other than the year, there are not many disaggregating variables. Maybe we need/want more than what exists in the data? We might be able to augment the data ourselves by, say, classifying KPIs into themes.
- It could also be overwhelming if we have too many KPIs.
- Might be hard to determine which indicators can safely be summed across years (vs. being cumulative or needing distinct counts).
- The value variable is represented in different ways for different KPIs (e.g. 72 calls, 0:11 average wait time, $12,300, a "↓" target, 99.2%).
- Some KPIs are nested under others, which makes them a bit tricky to work with. E.g. "– Robbery" is a sub-bullet KPI of "Major felony crime".
Potential uses / personas:
- Mayor of NYC
- Residents of NYC (taxpayer), accountability dashboard
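A rough sketch of normalizing those mixed value formats before plotting, in Python. It assumes `m:ss` strings are minutes:seconds durations, a leading `$` means US dollars, and a bare arrow is a directional target with no number; the function name is hypothetical and the rules would need checking against the actual report.

```python
def parse_kpi_value(raw):
    """Normalize one raw KPI value string into a (number, unit) pair.

    Assumed conventions (not confirmed against the MMR data):
      "m:ss"  -> duration in minutes:seconds, returned in seconds
      "$..."  -> US dollars
      "...%"  -> percentage
      arrows  -> directional target with no numeric value
    """
    s = raw.strip()
    if s in ("↑", "↓"):
        direction = "up" if s == "↑" else "down"
        return None, f"target_{direction}"
    if s.endswith("%"):
        return float(s[:-1].replace(",", "")), "percent"
    if s.startswith("$"):
        return float(s[1:].replace(",", "")), "usd"
    if ":" in s:
        minutes, seconds = s.split(":")
        return float(int(minutes) * 60 + int(seconds)), "seconds"
    # Anything else is treated as a plain count such as "72" or "1,204".
    return float(s.replace(",", "")), "count"


for raw in ["72", "0:11", "$12,300", "↓", "99.2%"]:
    print(raw, parse_kpi_value(raw))
```

Keeping the unit next to the number means we can refuse to sum or compare KPIs whose units differ.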

rtaph · 2021-01-12T03:22:59Z

OECD Business Tendency Data:

What I like:
- OECD data is generally very complete and reliable, especially on economics.
- Monthly data for all OECD countries going back a decade
- Many economic metrics
- Can disaggregate/filter by countries (or regions) and industry
Limitations (what I like less):
- Would be nicer to have more disaggregating variables
- I don't think we can access micro-data.
- Likely requires weights for aggregation.
- Metrics are all on a relative scale. Might make it a little less intuitive for the average person.
Potential uses / personas:
- Economists (which metrics are trending up or down)
- Public servants (planning agencies)
Related ideas:
- Compare these series with Coronavirus time series to see the impact the pandemic has had on numerous economic measures
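One way to make the relative-scale metrics more intuitive, and to line them up against the pandemic, is to rebase each series to a chosen reference month. A minimal Python sketch; the `YYYY-MM` keys, the February 2020 baseline and the values are assumptions for illustration. It only makes sense for strictly positive series; a metric that can go negative would need a difference from the baseline instead.

```python
def rebase(series, base_period):
    """Express a series as an index where the value at base_period equals 100."""
    base = series[base_period]
    if base <= 0:
        # Ratios against a zero or negative base are meaningless.
        raise ValueError("rebasing needs a strictly positive base value")
    return {period: value / base * 100 for period, value in series.items()}


# Hypothetical monthly values for one indicator in one country.
indicator = {"2020-01": 50.0, "2020-02": 40.0, "2020-04": 30.0}
print(rebase(indicator, "2020-02"))
```

Reading "75" as "25% below the pre-pandemic level" is easier for a general audience than the raw relative score.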

dusty736 · 2021-01-12T06:28:45Z

World Happiness Report

What I like:
- Complete dataset
- Contains spatiotemporal features
- All numeric features
- Clear features to filter on (freedom level, life expectancy, generosity, etc.)
Limitations:
- Only 9 features are common across all years of data
- Some years have different features
Potential Uses:
- Travel companies (moving recommendations)
- Public servants (seeing what makes people happy)
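A quick way to check which features survive across all years is a set intersection over each year's columns. Python sketch; the column names below are hypothetical, and the optional `aliases` mapping (also an assumption) handles the case where the same feature is renamed between years.

```python
def common_features(columns_by_year, aliases=None):
    """Return the features present in every year, sorted alphabetically.

    aliases maps a year-specific column name to a canonical name, so a
    renamed column is not mistaken for a missing one.
    """
    aliases = aliases or {}
    column_sets = [{aliases.get(c, c) for c in cols} for cols in columns_by_year.values()]
    if not column_sets:
        return []
    return sorted(set.intersection(*column_sets))


# Hypothetical column lists for two yearly files.
cols = {
    2015: ["Country", "Happiness Score", "Freedom", "Generosity"],
    2019: ["Country or region", "Score", "Freedom to make life choices", "Generosity"],
}
print(common_features(cols))
print(common_features(cols, {"Country or region": "Country",
                             "Score": "Happiness Score",
                             "Freedom to make life choices": "Freedom"}))
```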

jraza19 · 2021-01-12T07:04:10Z

Obesity dataset

What I like:
- simple dataset with time as a variable
- can easily imagine a nice interactive map being built from it
- data was pre-cleaned
Limitations
- Only 3 variables
Personas/usage
- international government agencies
- non profit organizations
- dieticians/public health professionals
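For the map idea, each frame of a choropleth needs one value per country for a given year, and a change map needs the difference between two years. A minimal Python sketch, assuming the three variables are a country, a year and an obesity percentage (the field names `country`, `year`, `obesity_pct` are hypothetical).

```python
def values_for_year(rows, year):
    """One map frame: {country: obesity_pct} for a single year."""
    return {r["country"]: r["obesity_pct"] for r in rows if r["year"] == year}


def change_between(rows, start_year, end_year):
    """Change in obesity_pct per country, for countries present in both years."""
    start = values_for_year(rows, start_year)
    end = values_for_year(rows, end_year)
    return {c: end[c] - start[c] for c in start.keys() & end.keys()}


# Hypothetical rows; the real dataset's countries, years and values will differ.
rows = [
    {"country": "A", "year": 2000, "obesity_pct": 18.0},
    {"country": "A", "year": 2016, "obesity_pct": 20.5},
    {"country": "B", "year": 2000, "obesity_pct": 25.0},
    {"country": "B", "year": 2016, "obesity_pct": 30.0},
    {"country": "C", "year": 2016, "obesity_pct": 12.0},
]
print(values_for_year(rows, 2016))
print(change_between(rows, 2000, 2016))
```

A year slider would simply call `values_for_year` for the selected year; countries missing from either year drop out of the change map rather than being shown as zero.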

tanmaysharma19 · 2021-01-13T00:02:32Z

COVID data

What I like:
- contemporary dataset
- 55 variables
- time series data
- data for all countries
- can subset data easily across time, countries and attributes
Limitations
- some missing data from first half of 2020
Personas/usage
- covid researchers
- public health agencies
- government agencies
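Subsetting by country, date range and attribute, plus a quick check of how much of a column is missing, might look like this. Python sketch; the field names (`country`, `date`, `new_cases`, `new_deaths`) and rows are hypothetical, and dates are assumed to be ISO `YYYY-MM-DD` strings so that plain string comparison orders them correctly.

```python
def subset(rows, countries, start, end, attributes):
    """Keep rows for the given countries within [start, end], with only the chosen attributes."""
    fields = ("country", "date", *attributes)
    return [
        {k: row[k] for k in fields}
        for row in rows
        if row["country"] in countries and start <= row["date"] <= end
    ]


def missing_share(rows, attribute):
    """Fraction of rows where the attribute is missing (None)."""
    if not rows:
        return 0.0
    return sum(row[attribute] is None for row in rows) / len(rows)


# Hypothetical rows; None marks a missing value.
covid = [
    {"country": "Canada", "date": "2020-03-01", "new_cases": None, "new_deaths": None},
    {"country": "Canada", "date": "2020-09-01", "new_cases": 100, "new_deaths": 5},
    {"country": "India", "date": "2020-09-01", "new_cases": 200, "new_deaths": 10},
]
canada = subset(covid, {"Canada"}, "2020-01-01", "2020-12-31", ["new_cases"])
print(canada)
print(missing_share(canada, "new_cases"))
```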

rtaph · 2021-01-13T00:06:44Z

Desiderata:
• Micro-data (each row represents one unit, without aggregation)
• 5+ categorical dimensions for filtering/disaggregating
• 5+ numeric measures
• Geographic variables (ideally hierarchical, e.g. municipality -> province -> country).
• time-series data
• little to no missing data
• no need for weighting
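Some of these desiderata can be checked mechanically against a candidate dataset. A rough Python sketch, assuming rows arrive as a list of dicts with `None` for missing values; the function name is hypothetical, and micro-data, geographic hierarchy and weighting still need a human judgement call. It classifies a column as numeric only if every non-missing value is an `int` or `float`.

```python
def profile(rows):
    """Summarize a list of row dicts against the countable desiderata."""
    columns = list(rows[0].keys()) if rows else []
    numeric, categorical, empty = [], [], []
    for col in columns:
        present = [r[col] for r in rows if r.get(col) is not None]
        if not present:
            empty.append(col)  # no observed values, so the type is unknown
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in present):
            numeric.append(col)
        else:
            categorical.append(col)
    cells = len(rows) * len(columns)
    missing = sum(r.get(col) is None for r in rows for col in columns)
    return {
        "numeric": numeric,
        "categorical": categorical,
        "empty": empty,
        "missing_share": missing / cells if cells else 0.0,
        "meets_numeric": len(numeric) >= 5,
        "meets_categorical": len(categorical) >= 5,
    }


# Hypothetical two-row dataset.
sample = [
    {"country": "A", "year": 2019, "rate": 1.5},
    {"country": "B", "year": 2020, "rate": None},
]
print(profile(sample))
```

Note that a year column counts as numeric here, even though we would likely use it as a time dimension.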

rtaph · 2021-01-16T23:09:40Z

Closing this issue out. The team decided to go with the obesity data during a team meeting.

tanmaysharma19 assigned rtaph, dusty736 and jraza19 Jan 11, 2021

tanmaysharma19 added this to To do in Group1-dashboard Jan 12, 2021

tanmaysharma19 linked a pull request Jan 12, 2021 that will close this issue

Update README #3

Merged

tanmaysharma19 removed a link to a pull request Jan 12, 2021

Update README #3

Merged

rtaph assigned tanmaysharma19 Jan 12, 2021

rtaph moved this from To do to In progress in Group1-dashboard Jan 12, 2021

rtaph closed this as completed Jan 16, 2021

Group1-dashboard automation moved this from In progress to Done Jan 16, 2021
