Missing Height value in SOCR_MLB.tsv dataset  #525

Open
@Rayycoding


Missing height value in the SOCR_MLB.tsv dataset

The 'Height' column has a missing value in one of the rows.

  • To reproduce:
  1. Go to the 04-stats-and-probability directory.
  2. Open notebook.ipynb.
  3. Go to the Correlation and Evil Baseball Corp section.
  4. Run the whole notebook.
  5. From this section onward, several cell outputs show nan.
  • Expected behavior
    The correlation cells should produce numerical outputs. Because of the missing value, a few of them output nan instead.

  • To see the row with the missing value, go to the Analyzing Real Data section and add this code below the first cell in that section (after the dataset is read and printed):

# Boolean mask that is True where 'Height' is missing
height_is_null = df['Height'].isnull()

# Use boolean indexing to display rows where 'Height' is null
rows_with_null_height = df[height_is_null]

# Print the resulting DataFrame
print(rows_with_null_height)

The output should look like this:

640 Kirk_Saarloos CIN Starting_Pitcher 72 NaN 27.77

Note: if you would rather look at the file in the data directory directly, press CTRL + F and search for 'Kirk_Saarloos' to find the row with the missing value.
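
A possible workaround, as a minimal sketch: it assumes the notebook computes the correlation with a NumPy call such as np.corrcoef, which propagates nan when any input value is missing. The small DataFrame below is made-up stand-in data (not rows from SOCR_MLB.tsv) with one missing 'Height', and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data with one missing 'Height' (not real SOCR_MLB.tsv rows).
df = pd.DataFrame({
    "Height": [180.0, 175.0, np.nan, 190.0, 185.0],
    "Weight": [80.0, 72.0, 85.0, 95.0, 88.0],
})

# With a NaN in the input, np.corrcoef returns nan for the coefficient.
raw = np.corrcoef(df["Height"], df["Weight"])[0, 1]

# Dropping rows that lack either value gives a numerical result again.
clean = df.dropna(subset=["Height", "Weight"])
fixed = np.corrcoef(clean["Height"], clean["Weight"])[0, 1]

# pandas' Series.corr skips pairs containing NaN, so it needs no cleanup step.
pandas_corr = df["Height"].corr(df["Weight"])

print(raw, fixed, pandas_corr)
```

In the notebook, the same dropna call could be applied to df right after the dataset is loaded, or the missing value could be filled in the data file itself. Which fix is preferable is left to the maintainers.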
