Inconsistent definition of tidy data #968

Closed
andtheWings opened this issue May 22, 2020 · 0 comments

On the tidyr website index, the GitHub README, and R4DS, tidy data is defined as:

  1. Every column is a variable.
  2. Every row is an observation.
  3. Every cell is a single value.

Then in the Tidy data article derived from vignettes/tidy-data.Rmd, it's defined as:

  1. Each variable forms a column.
  2. Each observation forms a row.
  3. Each type of observational unit forms a table.

Which definition I use changes how I label this motor vehicle collision dataset.

By my assessment, it meets rules 1-3 of the first definition, so I would call it tidy. Under the second definition, it meets rules 1-2 but violates rule 3, because the single table contains variables for three different observational units:

  1. Individual involved in a collision event:
    • PERSONNMB -- Unique numeric sequence value for each person associated with a collision and a vehicle.
    • GENDERCDE -- Code indicating person's gender.
  2. Vehicle involved in a collision event:
    • UNIT_MR_NUMBER -- Unique numeric sequence value for each vehicle in a collision.
    • VEHMAKETXT -- Description indicating the name of the manufacturer of the vehicle.
  3. Collision event:
    • INDIVIDUAL_MR_RECORD -- Unique identifier for each collision.
    • INJUREDNMB -- Total number of people injured in the collision.
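
To make the concern concrete, here is a minimal sketch of how I imagine the second definition would have me split the data into one table per observational unit. The rows are made up for illustration (only the column names come from the dataset), and I'm assuming that a person is identified by collision, vehicle, and person number together, and a vehicle by collision and vehicle number together.

```r
library(dplyr)

# Made-up rows, one per person, so vehicle- and collision-level
# values repeat across rows. Only the column names are real.
collisions <- data.frame(
  INDIVIDUAL_MR_RECORD = c(1001, 1001, 1001, 1002),
  UNIT_MR_NUMBER       = c(1, 1, 2, 1),
  PERSONNMB            = c(1, 2, 3, 1),
  GENDERCDE            = c("F", "M", "M", "F"),
  VEHMAKETXT           = c("FORD", "FORD", "HONDA", "TOYOTA"),
  INJUREDNMB           = c(2, 2, 2, 0)
)

# One table per observational unit. distinct() keeps only the named
# columns and drops duplicate rows.
people <- collisions %>%
  distinct(INDIVIDUAL_MR_RECORD, UNIT_MR_NUMBER, PERSONNMB, GENDERCDE)

vehicles <- collisions %>%
  distinct(INDIVIDUAL_MR_RECORD, UNIT_MR_NUMBER, VEHMAKETXT)

collision_events <- collisions %>%
  distinct(INDIVIDUAL_MR_RECORD, INJUREDNMB)
```

In this toy example the people table keeps all four rows, while the vehicle and collision tables shrink because their values were repeated in the flat table.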

Does that mean the dataset isn't tidy?
