# income.csv

In [1]:
import pandas as pd
import numpy as np 

In [2]:
df=pd.read_csv("income.csv",names=["name","income"],skiprows=[0])
df

Unnamed: 0,name,income
0,Rob,5000
1,Rafiq,6000
2,Nina,4000
3,Sofia,7500
4,Mohan,8000
5,Tao,7000
6,Elon Musk,10000000


In [3]:
df.head()

Unnamed: 0,name,income
0,Rob,5000
1,Rafiq,6000
2,Nina,4000
3,Sofia,7500
4,Mohan,8000


In [5]:
df.income.describe()

count    7.000000e+00
mean     1.433929e+06
std      3.777283e+06
min      4.000000e+03
25%      5.500000e+03
50%      7.000000e+03
75%      7.750000e+03
max      1.000000e+07
Name: income, dtype: float64

| Metric | Value | Meaning |
|---|---|---|
| count | 7.000000e+00 = 7 | There are 7 income entries in total. |
| mean | 1.433929e+06 = 1,433,929 | The average income is ~1.43 million. |
| std | 3.777283e+06 = 3,777,283 | The standard deviation is very high, meaning income values are widely spread. |
| min | 4.000000e+03 = 4,000 | The lowest income is ₹4,000. |
| 25% (Q1) | 5.500000e+03 = 5,500 | 25% of the data lies below ₹5,500. |
| 50% (median) | 7.000000e+03 = 7,000 | The middle value (median) is ₹7,000. |
| 75% (Q3) | 7.750000e+03 = 7,750 | 75% of the data lies below ₹7,750. |
| max | 1.000000e+07 = 10,000,000 | The highest income is ₹1 crore (₹10 million). |
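
The `count`, `mean`, `std`, `min`, and `max` rows can be cross-checked without pandas. The following is a minimal sketch using Python's standard `statistics` module, with the seven incomes typed in by hand from the DataFrame shown earlier (the list `incomes` is an assumption of this sketch, not part of the notebook). `statistics.stdev` computes the sample standard deviation (n - 1 in the denominator), which is also what pandas reports by default.

```python
import statistics

# Incomes copied by hand from the DataFrame shown earlier
incomes = [5000, 6000, 4000, 7500, 8000, 7000, 10_000_000]

count = len(incomes)
mean = statistics.mean(incomes)
std = statistics.stdev(incomes)  # sample standard deviation (n - 1 in the denominator)

print(f"count {count}")
print(f"mean  {mean:,.0f}")
print(f"std   {std:,.0f}")
print(f"min   {min(incomes):,}")
print(f"max   {max(incomes):,}")
```

The printed mean and standard deviation should agree with the `describe()` output once rounded to whole numbers.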

### Insights
The mean (₹1.43M) is much higher than the median (₹7,000), which points to an extreme outlier (here, the ₹10M entry).

The standard deviation (₹3.78M) is larger than the mean itself, so the values are very widely spread. Together with the gap between mean and median, this indicates a strongly right-skewed distribution.

Three quarters of the incomes fall at or below ₹7,750, and six of the seven are at most ₹8,000, but the maximum is ₹10M, a massive jump at the top.

This dataset contains mostly low-income values, with a single extremely high income (Elon Musk, ₹10M) skewing the average. The median gives a better representation of a typical income than the mean in this case.
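
To see how much a single value moves each statistic, one can recompute the mean and median with and without the largest income. This is a small sketch in plain Python; the list `incomes` and the variable names are assumptions made for illustration, with the values taken from the table above.

```python
import statistics

incomes = [5000, 6000, 4000, 7500, 8000, 7000, 10_000_000]

# Drop the single largest value (the 10,000,000 entry)
without_outlier = [x for x in incomes if x != max(incomes)]

mean_all = statistics.mean(incomes)
mean_rest = statistics.mean(without_outlier)
median_all = statistics.median(incomes)
median_rest = statistics.median(without_outlier)

print(f"mean   with outlier: {mean_all:,.0f}   without: {mean_rest:,.0f}")
print(f"median with outlier: {median_all:,.0f}   without: {median_rest:,.0f}")
```

Removing one row changes the mean from about 1.43 million to 6,250, while the median only moves from 7,000 to 6,500.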

In [6]:
# count: number of non-null values (7)
# mean: about 1.43 million (1,433,929), pulled up by the single outlier


In [7]:
df.income.quantile(0)

4000.0

In [8]:
df.income.quantile(0.25)

5500.0

In [9]:
df.income.quantile(0.50)

7000.0

In [10]:
df.income.quantile(0.75)

7750.0

In [11]:
df.income.quantile(1)

10000000.0
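
The five quantiles above can be reproduced by hand. The sketch below assumes linear interpolation between the two nearest sorted values, which is the default `interpolation` setting of `Series.quantile`; the helper `linear_quantile` is a hypothetical name introduced here, not a pandas function.

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two nearest sorted values."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)              # fractional index into the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)  # stay in range when q == 1
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

incomes = [5000, 6000, 4000, 7500, 8000, 7000, 10_000_000]
results = {q: linear_quantile(incomes, q) for q in (0, 0.25, 0.5, 0.75, 1)}
for q, value in results.items():
    print(q, value)
```

For example, `q = 0.25` lands at index 1.5 of the sorted data, halfway between 5,000 and 6,000, which gives 5,500.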

In [16]:
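df['income'] = df['income'].astype(float)  # NaN needs a float column
df.loc[3, 'income'] = np.nan  # .loc avoids chained assignment; np.NaN was removed in NumPy 2.0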

In [17]:
df

Unnamed: 0,name,income
0,Rob,5000.0
1,Rafiq,6000.0
2,Nina,4000.0
3,Sofia,
4,Mohan,8000.0
5,Tao,7000.0
6,Elon Musk,10000000.0


In [18]:
df.income.mean()

1671666.6666666667
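
The mean went up after a value was removed because pandas skips missing values by default (`skipna=True`): the sum now excludes the missing 7,500 and is divided by 6 instead of 7. Here is a plain-Python sketch of that arithmetic, with the list `incomes` typed in by hand as an assumption of the example.

```python
import math

# Incomes after row 3 (Sofia) was set to NaN
incomes = [5000.0, 6000.0, 4000.0, math.nan, 8000.0, 7000.0, 10_000_000.0]

# Keep only the values that are actually present
present = [x for x in incomes if not math.isnan(x)]
mean_skipping_nan = sum(present) / len(present)

print(len(present), mean_skipping_nan)
```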

In [20]:
# Fill the missing income with the mean; the outlier makes this a poor estimate
df_new = df.fillna({"income": df.income.mean()})
df_new

Unnamed: 0,name,income
0,Rob,5000.0
1,Rafiq,6000.0
2,Nina,4000.0
3,Sofia,1671667.0
4,Mohan,8000.0
5,Tao,7000.0
6,Elon Musk,10000000.0


In [21]:
# Fill with the median instead; it stays close to the typical incomes
df_new = df.fillna({"income": df.income.median()})
df_new

Unnamed: 0,name,income
0,Rob,5000.0
1,Rafiq,6000.0
2,Nina,4000.0
3,Sofia,6500.0
4,Mohan,8000.0
5,Tao,7000.0
6,Elon Musk,10000000.0
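
Because Sofia's original income (7,500) is known from the first table, the two fill strategies can be compared against it. This is a rough sketch in plain Python; `true_value`, `mean_fill`, and `median_fill` are names assumed for illustration.

```python
import math
import statistics

true_value = 7500.0  # Sofia's income before it was replaced with NaN
incomes = [5000.0, 6000.0, 4000.0, math.nan, 8000.0, 7000.0, 10_000_000.0]
present = [x for x in incomes if not math.isnan(x)]

mean_fill = statistics.mean(present)      # what the mean-based fill inserts
median_fill = statistics.median(present)  # what the median-based fill inserts

print(f"mean fill   {mean_fill:,.0f} (off by {abs(mean_fill - true_value):,.0f})")
print(f"median fill {median_fill:,.0f} (off by {abs(median_fill - true_value):,.0f})")
```

On this data the median fill misses the true value by 1,000, while the mean fill misses it by more than 1.6 million, which is why the median is the safer choice when an outlier is present.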
