## Problem Statement


You are a data scientist working for a sports analytics company. The dataset, named **"sports_league_data.csv"**, includes a variety of attributes for players in the league. The columns in the dataset are as follows:

- **player_id:** A unique identifier for each player.
- **age:** Age of the player.
- **position:** The position the player typically plays in (e.g., Forward, Midfielder, Defender, Goalkeeper).
- **team:** The team the player belongs to.
- **games_played:** The total number of games played by the player in the season.
- **player_rating:** An overall performance rating for the player for the season.

**Import Necessary Libraries**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [4]:
# Pandas display options for cleaner output
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 120)

In [5]:
# Load the dataset
path = "sports_league_data.csv"
df = pd.read_csv(path)


## Task1

1. Import the data from the "sports_league_data.csv" file.
2. Display the number of rows and columns.
3. Display the first few rows of the dataset to get an overview.


In [7]:
# Number of rows and columns
n_rows, n_cols = df.shape
print(f"shape:{n_rows} rows * {n_cols} columns")
# Preview the first few rows
display(df.head())

# Quick sanity checks: column dtypes and missing values
print("\nColumn dtypes:")
print(df.dtypes)

print("\nMissing values per column:")
print(df.isna().sum())

shape:1000 rows * 6 columns


|   | player_id | age | position | team | games_played | player_rating |
|---|---|---|---|---|---|---|
| 0 | P0001 | 34 | Forward | Team 14 | 29 | 7.2 |
| 1 | P0002 | 29 | Defender | Team 15 | 27 | 8.3 |
| 2 | P0003 | 31 | Goalkeeper | Team 4 | 39 | 6.4 |
| 3 | P0004 | 36 | Goalkeeper | Team 6 | 35 | 6.6 |
| 4 | P0005 | 34 | Defender | Team 14 | 26 | 7.2 |



Column dtypes:
player_id         object
age                int64
position          object
team              object
games_played       int64
player_rating    float64
dtype: object

Missing values per column:
player_id        0
age              0
position         0
team             0
games_played     0
player_rating    0
dtype: int64


## Task2

**Age Analysis:**
- Mean: Calculate the average age of players in the league.
- Standard Deviation Intervals:
    - Calculate the age range within one standard deviation of the mean.
    - Calculate the age range within two standard deviations of the mean.
    - Calculate the age range within three standard deviations of the mean.
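
One possible way to compute these ranges is sketched below. The helper name `sd_intervals` is hypothetical (it is not part of the assignment), and the sketch assumes the sample standard deviation, which is what pandas' `Series.std()` returns by default (`ddof=1`).

```python
import pandas as pd


def sd_intervals(values: pd.Series, max_k: int = 3) -> pd.DataFrame:
    """Return the ranges mean - k*std .. mean + k*std for k = 1..max_k.

    Hypothetical helper for illustration; uses the sample standard
    deviation (pandas default, ddof=1).
    """
    mean = values.mean()
    std = values.std()
    rows = [
        {"k": k, "lower": mean - k * std, "upper": mean + k * std}
        for k in range(1, max_k + 1)
    ]
    # One row per k, indexed by the number of standard deviations
    return pd.DataFrame(rows).set_index("k")
```

In the notebook this would be called as `sd_intervals(df["age"])`, with `df["age"].mean()` giving the average age.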
  

## Task3

**Player Rating Analysis:**
- Mean: Determine the average player rating across the league.
- Standard Deviation Intervals:
    - Calculate the player rating range within one standard deviation of the mean.
    - Calculate the player rating range within two standard deviations of the mean.
    - Calculate the player rating range within three standard deviations of the mean.
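
The same calculation for ratings could look like the sketch below. The function name `rating_ranges` and the rounding to two decimals are assumptions made for readability, not requirements of the task; the standard deviation is again the sample version (`ddof=1`).

```python
import pandas as pd


def rating_ranges(ratings: pd.Series, decimals: int = 2) -> dict:
    """Map k -> (lower, upper) for mean - k*std .. mean + k*std, k = 1, 2, 3.

    Hypothetical helper for illustration; rounding is only for display.
    """
    stats = ratings.agg(["mean", "std"])  # std is the sample std (ddof=1)
    mean, std = stats["mean"], stats["std"]
    return {
        k: (round(mean - k * std, decimals), round(mean + k * std, decimals))
        for k in (1, 2, 3)
    }
```

In the notebook, `rating_ranges(df["player_rating"])` would return the three ranges, and `df["player_rating"].mean()` the average rating.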

## Task4

Create histograms with Kernel Density Estimation (KDE) curves for both the 'age' and 'player_rating' columns. On each plot, mark the standard deviation intervals (one, two, and three standard deviations from the mean).


- **'age'**

- **'player_rating'**

## Task5

- Calculate the lower and upper boundaries for the 'age' and 'player_rating' columns.
- For each column, identify values that fall outside the calculated lower and upper boundaries. These values are considered outliers.

- **Display age outliers**

- **Display player_rating outliers**
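
The task does not state how the boundaries are defined. The sketch below assumes mean ± 3 standard deviations, in line with the intervals from the earlier tasks; IQR-based fences would be a common alternative. The function name `find_outliers` and the parameter `k` are hypothetical.

```python
import pandas as pd


def find_outliers(df: pd.DataFrame, column: str, k: float = 3.0):
    """Return (lower, upper, outlier_rows) for one column.

    Assumption: boundaries are mean - k*std and mean + k*std with the
    sample standard deviation; the assignment does not fix the method.
    """
    mean, std = df[column].mean(), df[column].std()
    lower, upper = mean - k * std, mean + k * std
    # Rows strictly outside the boundaries count as outliers
    mask = (df[column] < lower) | (df[column] > upper)
    return lower, upper, df[mask]
```

In the notebook, `find_outliers(df, "age")` and `find_outliers(df, "player_rating")` would each return the two boundaries and the rows to display.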

## Task6

- Remove outliers from both the 'age' and 'player_rating' columns, then visualize the adjusted data using histograms.


- **Plot the histograms**
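
One way to approach this step is sketched below. It assumes the outlier boundaries are mean ± 3 standard deviations per column (the task does not fix the method), and the names `remove_outliers` and `plot_histograms` are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def remove_outliers(df: pd.DataFrame, columns, k: float = 3.0) -> pd.DataFrame:
    """Keep rows whose value lies within mean - k*std .. mean + k*std
    (inclusive) for every listed column. Assumed boundary definition."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        mean, std = df[col].mean(), df[col].std()
        keep &= df[col].between(mean - k * std, mean + k * std)
    return df[keep]


def plot_histograms(df: pd.DataFrame, columns):
    """Draw one histogram per column, side by side."""
    fig, axes = plt.subplots(
        1, len(columns), figsize=(6 * len(columns), 4), squeeze=False
    )
    for ax, col in zip(axes[0], columns):
        ax.hist(df[col], bins=20, edgecolor="black")
        ax.set_title(f"{col} (outliers removed)")
        ax.set_xlabel(col)
        ax.set_ylabel("count")
    fig.tight_layout()
    return fig
```

In the notebook, `clean = remove_outliers(df, ["age", "player_rating"])` followed by `plot_histograms(clean, ["age", "player_rating"])` and `plt.show()` would display the adjusted distributions.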