### NumPy Exercise: Complete Baseball Analysis

In this exercise, we use Python's `numpy` and `pandas` libraries to analyze a baseball dataset. We read the data into a pandas DataFrame, extract the players' weights and heights, and convert them to NumPy arrays. With the arrays we access individual elements and build a 2D array of height and weight. We then inspect that array's properties, compute summary statistics for height (mean, median, and standard deviation), and measure the correlation between height and weight. The notebook is a hands-on introduction to data manipulation and basic statistics in Python, using baseball data as the example.


```python
# Import the required libraries
import pandas as pd
import numpy as np

# Step 1: Read the MLB (Baseball) dataset into a pandas DataFrame
mlb = pd.read_csv("MLB (Baseball).txt")
print(mlb.head(3))  # Show the first 3 rows

# Step 2: Extract the weight (lb) and height (in) columns as Python lists
weight_lb = mlb['Weight'].tolist()
height_in = mlb['Height'].tolist()

# Step 3: Convert the lists to NumPy arrays
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

# Step 4: Access individual elements (indexing is zero-based)
print("Weight of 50th player (index 49):", np_weight_lb[49])
# The slice stop is exclusive: indices 99..109, i.e. players 100 through 110
print("Heights of players 100-110:", np_height_in[99:110])

# Step 5: Build a 2D array with height in column 0 and weight in column 1
np_baseball = mlb[['Height', 'Weight']].to_numpy()

# Step 6: Explore the properties of np_baseball
print("Type of np_baseball:", type(np_baseball))
print("Shape of np_baseball:", np_baseball.shape)  # (rows, columns)
print("50th row of np_baseball:", np_baseball[49, :])
print("Height of 124th player:", np_baseball[123, 0])

# Step 7: Compute height statistics (use column 1 to do the same for weight)
avg_height = np.mean(np_baseball[:, 0])
med_height = np.median(np_baseball[:, 0])
stddev_height = np.std(np_baseball[:, 0])  # population standard deviation (ddof=0)
corr_height_weight = np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])  # 2x2 matrix

# Step 8: Print the height statistics
print("Average Height:", avg_height)
print("Median Height:", med_height)
print("Standard Deviation of Height:", stddev_height)
# The off-diagonal entry [0, 1] is the height-weight correlation coefficient
print("Correlation between Height and Weight:", corr_height_weight[0, 1])

# Step 9: Cross-check with the 1D height array; the results should match Step 8
print("Mean of Height:", np.mean(np_height_in))
print("Median of Height:", np.median(np_height_in))
```

```
              Name Team Position  Height  Weight    Age PosCategory
0    Adam_Donachie  BAL  Catcher      74     180  22.99     Catcher
1        Paul_Bako  BAL  Catcher      74     215  34.69     Catcher
2  Ramon_Hernandez  BAL  Catcher      72     210  30.78     Catcher
Weight of 50th player (index 49): 195
Heights of players 100-110: [74 73 74 72 73 69 72 73 75 75 73]
Type of np_baseball: <class 'numpy.ndarray'>
Shape of np_baseball: (1015, 2)
50th row of np_baseball: [ 70 195]
Height of 124th player: 75
Average Height: 73.6896551724138
Median Height: 74.0
Standard Deviation of Height: 2.312791881046546
Correlation between Height and Weight: 0.5315393226146092
Mean of Height: 73.6896551724138
Median of Height: 74.0
```
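NumPy indexing is zero-based and slice stops are exclusive. That is why `np_weight_lb[49]` returns the 50th weight and `np_height_in[99:110]` returns 11 values. The sketch below shows this on a hypothetical array of 1-based player numbers (not the real dataset), where the value `k` is stored at index `k - 1`.

```python
import numpy as np

# Hypothetical 1-based "player numbers": value k is stored at index k - 1
player_no = np.arange(1, 201)

print(player_no[49])           # index 49 holds the 50th value -> 50
print(player_no[99:110])       # stop is exclusive: indices 99..109 -> values 100..110
print(len(player_no[99:110]))  # -> 11
```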
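One of the "essential array operations" mentioned in the introduction is element-wise arithmetic. Multiplying an array by a scalar applies the operation to every element without a loop. The following minimal sketch converts heights to meters and weights to kilograms, using the exact factors 1 in = 0.0254 m and 1 lb = 0.45359237 kg. It uses the three sample rows printed above. The names `toy_height_in` and `toy_weight_lb` are hypothetical stand-ins for `np_height_in` and `np_weight_lb`.

```python
import numpy as np

# Hypothetical stand-ins for np_height_in / np_weight_lb (the three sample rows above)
toy_height_in = np.array([74, 74, 72])
toy_weight_lb = np.array([180, 215, 210])

# Vectorized conversion: each element is multiplied by the exact factor
height_m = toy_height_in * 0.0254        # inches -> meters
weight_kg = toy_weight_lb * 0.45359237   # pounds -> kilograms

print(height_m)
print(weight_kg)
```

The same two lines would work unchanged on the full `np_height_in` and `np_weight_lb` arrays, since NumPy broadcasts the scalar across the whole array.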
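The notebook comments note that weight can be handled "the same way" as height. Because `np_baseball` stores height in column 0 and weight in column 1, passing `axis=0` to a reduction computes the statistic for both columns in one call. Here is a sketch on a small hypothetical array with the same layout. Its values are toy numbers, not the full dataset.

```python
import numpy as np

# Hypothetical toy array laid out like np_baseball:
# column 0 = height (inches), column 1 = weight (pounds)
np_toy = np.array([
    [74, 180],
    [74, 215],
    [72, 210],
    [70, 195],
    [75, 200],
])

# axis=0 reduces down the rows, giving one value per column: [height, weight]
col_means = np.mean(np_toy, axis=0)
col_medians = np.median(np_toy, axis=0)
col_stds = np.std(np_toy, axis=0)

print("Mean [height, weight]:", col_means)      # [ 73. 200.]
print("Median [height, weight]:", col_medians)  # [ 74. 200.]
print("Std [height, weight]:", col_stds)
```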
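`np.std` defaults to `ddof=0`. It divides the sum of squared deviations by `n`, which gives the population standard deviation. Passing `ddof=1` divides by `n - 1` instead, which gives the sample standard deviation. The sketch below checks both against the formula on hypothetical toy heights.

```python
import numpy as np

height = np.array([74, 74, 72, 70, 75], dtype=float)  # hypothetical toy values

pop_std = np.std(height)             # ddof=0 (default): divide by n
sample_std = np.std(height, ddof=1)  # ddof=1: divide by n - 1

# Recompute both from the definition
n = height.size
sq_dev = np.sum((height - height.mean()) ** 2)
print(pop_std, np.sqrt(sq_dev / n))
print(sample_std, np.sqrt(sq_dev / (n - 1)))  # both 2.0 for these values
```

This distinction matters when you compare results across libraries. For example, pandas' `Series.std()` defaults to `ddof=1`, so `mlb['Height'].std()` would be slightly larger than the `np.std` value printed in the notebook.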
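`np.corrcoef(x, y)` returns a 2x2 correlation matrix rather than a single number. The diagonal holds each variable's correlation with itself (1), and the off-diagonal entry is the Pearson coefficient. That is why the notebook prints `[0, 1]`. The Pearson coefficient is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$

The minimal sketch below computes $r$ directly from this formula on hypothetical toy heights and weights and confirms that it matches `np.corrcoef`.

```python
import numpy as np

# Hypothetical toy heights (in) and weights (lb); not the real dataset
height = np.array([74, 74, 72, 70, 75], dtype=float)
weight = np.array([205, 215, 190, 180, 220], dtype=float)

# 2x2 matrix: diagonal ~1.0, off-diagonal entries are Pearson's r
corr_matrix = np.corrcoef(height, weight)
r_numpy = corr_matrix[0, 1]

# Pearson's r from its definition: covariance over product of spreads
dx = height - height.mean()
dy = weight - weight.mean()
r_manual = np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

print(corr_matrix.shape)              # (2, 2)
print(np.isclose(r_numpy, r_manual))  # True
```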
