# Introduction
pandas is the one package you absolutely need to learn for data science in Python.

pandas bundles a core package with features from a variety of other packages. That’s convenient, because you can do much of your work using pandas alone.

pandas is like Excel in Python: it stores data in tables (called DataFrames) and applies transformations to them. But it can do a lot more.
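
As a minimal sketch of the table idea, the snippet below builds a small DataFrame by hand and derives a new column from two existing ones. The column names mirror the dataset loaded later in this notebook, but the values and the `ratio` column are made up for illustration.

```python
import pandas as pd

# Toy data; the values are invented purely for illustration.
toy = pd.DataFrame({
    "e_id": ["100001", "100001", "100002"],
    "absolute": [42, 0, 15],
    "target_monthly": [30, 30, 20],
})

# A column-wise transformation, similar to filling a formula down an Excel column.
toy["ratio"] = toy["absolute"] / toy["target_monthly"]

print(toy)
```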

# Elementary functions 

## Read

In [1]:
import pandas as pd

### Load .csv file

In [None]:
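# Load the file into a DataFrame
df = pd.read_csv("data.csv")
# Keep only the columns of interest
df = df.loc[:, ["e_id", "kpi", "Date", "Week_Num", "absolute", "target_monthly", "Target_perc"]]
# Keep the rows labelled 0 through 100 (.loc slices include the end label)
df = df.loc[:100, :]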

In [8]:
df.head()

| | e_id | kpi | Date | Week_Num | absolute | target_monthly | Target_perc |
|---|---|---|---|---|---|---|---|
| 0 | 100001 | KPI1_First_Time_Right_Quantity | 06-06-2018 | 23 | 42 | 30 | 1.4 |
| 1 | 100001 | KPI1_First_Time_Right_Quantity | 07-06-2018 | 23 | 0 | 30 | 0.0 |
| 2 | 100001 | KPI1_First_Time_Right_Quantity | 10-06-2018 | 24 | -21 | 30 | -0.7 |
| 3 | 100001 | KPI1_First_Time_Right_Quantity | 11-06-2018 | 24 | 0 | 30 | 0.0 |
| 4 | 100001 | KPI1_First_Time_Right_Quantity | 12-06-2018 | 24 | 0 | 30 | 0.0 |


### Load .xlsx file
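
Excel workbooks can be loaded with `pd.read_excel`. The following is a minimal sketch, not taken from the original notebook: the helper `load_excel` and the file name `data.xlsx` are hypothetical, and it assumes an Excel engine such as `openpyxl` is installed.

```python
import pandas as pd

def load_excel(path="data.xlsx", sheet_name=0):
    """Read one sheet of an Excel workbook into a DataFrame.

    `path` is a hypothetical file name; `sheet_name=0` selects the first sheet.
    """
    return pd.read_excel(path, sheet_name=sheet_name)
```

A call such as `df = load_excel("data.xlsx")` would then play the same role as `pd.read_csv` does for .csv files.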

## Write 

In [None]:
df.to_csv("data.csv", sep=",", index=False)
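
To see what `index=False` changes, the sketch below writes a small made-up frame to a string instead of a file and compares the two variants. The frame `small` and its values are assumptions for illustration only.

```python
import io
import pandas as pd

small = pd.DataFrame({"e_id": ["100001", "100002"], "absolute": [42, 15]})

# With index=False only the columns are written.
without_index = small.to_csv(index=False)

# With the default index=True the row labels become an extra, unnamed first column.
with_index = small.to_csv()

print(without_index)
print(with_index)

# Reading the second version back yields an "Unnamed: 0" column.
reloaded = pd.read_csv(io.StringIO(with_index))
print(reloaded.columns.tolist())
```

Writing with `index=False` avoids carrying that extra column into the next `pd.read_csv` call.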

## Checking the data

In [None]:
df.shape

In [None]:
df.describe()

In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 392530 entries, 0 to 392529
Data columns (total 7 columns):
e_id              392530 non-null object
kpi               392530 non-null object
Date              392530 non-null object
Week_Num          392530 non-null int64
absolute          392530 non-null int64
target_monthly    392530 non-null int64
Target_perc       347755 non-null float64
dtypes: float64(1), int64(3), object(3)
memory usage: 21.0+ MB


In [13]:
df.columns

Index(['e_id', 'kpi', 'Date', 'Week_Num', 'absolute', 'target_monthly',
       'Target_perc'],
      dtype='object')

## Seeing the data

In [None]:
df.head(3)

In [None]:
df.loc[8]

In [None]:
df.loc[8,"kpi"]

In [None]:
df.loc[range(4,6)]
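
`.loc` selects by label, and a label slice includes both endpoints, which differs from ordinary Python slicing. A small sketch on made-up data (the frame `demo` is hypothetical) contrasts it with the position-based `.iloc`:

```python
import pandas as pd

demo = pd.DataFrame({"kpi": list("abcdefgh")})

by_label = demo.loc[2:4]          # labels 2, 3 and 4: the end label is included
by_position = demo.iloc[2:4]      # positions 2 and 3: the end position is excluded
by_range = demo.loc[range(4, 6)]  # labels 4 and 5, because range() excludes 6

print(len(by_label), len(by_position), len(by_range))
```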

## Get data by feature name

In [23]:
df.loc[:,"kpi"].head(5)

0    KPI1_First_Time_Right_Quantity
1    KPI1_First_Time_Right_Quantity
2    KPI1_First_Time_Right_Quantity
3    KPI1_First_Time_Right_Quantity
4    KPI1_First_Time_Right_Quantity
Name: kpi, dtype: object

## Summary information about your data

In [None]:
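# Most of these reductions only make sense for numeric columns;
# numeric_only=True skips the text columns (e_id, kpi, Date).
# Sum of values in a data frame
df.sum(numeric_only=True)
# Lowest value of each column
df.min()
# Highest value of each column
df.max()
# Index of the lowest value (numeric columns only)
df.select_dtypes("number").idxmin()
# Index of the highest value (numeric columns only)
df.select_dtypes("number").idxmax()
# Statistical summary of the data frame, with quartiles, median, etc.
df.describe()
# Average values
df.mean(numeric_only=True)
# Median values
df.median(numeric_only=True)
# Correlation between columns
df.corr(numeric_only=True)
# To get these values for only one column, select it first
df["absolute"].median()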

# Basic Data Handling

## Drop missing data

In [None]:
df.dropna(axis=0, how='any')

## Replace missing data

In [None]:
# Replace every missing value with 0 (choose a fill value that suits your data)
df.fillna(value=0)

## Check for NANs

Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

In [None]:
# True where a value is missing; summing counts the missing values per column
df.isnull().sum()
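
The three steps above (drop, replace, check) can be tried together on a tiny frame. This is a sketch on made-up values, and filling with 0 is only one possible choice of replacement:

```python
import numpy as np
import pandas as pd

# Made-up values; NaN marks a missing entry.
demo = pd.DataFrame({
    "absolute": [42, 0, -21],
    "Target_perc": [1.4, np.nan, -0.7],
})

missing_per_column = demo.isnull().sum()  # number of NaN values per column
dropped = demo.dropna(axis=0, how="any")  # drop every row that contains a NaN
filled = demo.fillna(0)                   # replace NaN with 0

print(missing_per_column)
print(dropped)
print(filled)
```

None of these calls modifies `demo` itself; each returns a new object.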

## Drop a feature

In [16]:
df.drop('feature_variable_name', axis=1)

## Convert object type to float

In [None]:
pd.to_numeric(df["feature_name"], errors='coerce')
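
With `errors='coerce'`, entries that cannot be parsed become NaN instead of raising an error. A short sketch with made-up strings (the Series `raw` is hypothetical):

```python
import pandas as pd

raw = pd.Series(["1.5", "30", "n/a"])  # made-up strings stored as dtype object
converted = pd.to_numeric(raw, errors="coerce")

print(converted)
```

`pd.to_numeric` returns a new Series, so assign the result back to the column (for example `df["feature_name"] = ...`) to keep the conversion.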

## Convert data frame to numpy array 

In [17]:
# as_matrix() was removed in pandas 1.0; to_numpy() replaces it
df.to_numpy()

array([['100001', 'KPI1_First_Time_Right_Quantity', '06-06-2018', ...,
        42, 30, 1.4],
       ['100001', 'KPI1_First_Time_Right_Quantity', '07-06-2018', ..., 0,
        30, 0.0],
       ['100001', 'KPI1_First_Time_Right_Quantity', '10-06-2018', ...,
        -21, 30, -0.7],
       ...,
       ['L99902', 'KPI8_Subscriptions_of_Rs.1500_Plan_SoftSkew',
        '23-07-2018', ..., 0, 8, 0.0],
       ['L99902', 'KPI8_Subscriptions_of_Rs.1500_Plan_SoftSkew',
        '25-07-2018', ..., 5, 8, 0.625],
       ['L99902', 'KPI8_Subscriptions_of_Rs.1500_Plan_SoftSkew',
        '31-07-2018', ..., 1, 8, 0.125]], dtype=object)

## Sorting your data

In [None]:
# sort_values on a DataFrame needs the column(s) to sort by
df.sort_values(by="absolute", ascending=False)
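
`DataFrame.sort_values` has to be told which column or columns to sort by. A sketch on made-up data (the frame `demo` is hypothetical) shows a single-column and a two-column sort:

```python
import pandas as pd

demo = pd.DataFrame({"Week_Num": [24, 23, 24], "absolute": [0, -5, 42]})

# Sort by one column, largest value first.
by_absolute = demo.sort_values(by="absolute", ascending=False)

# Sort by week ascending, then by absolute descending within each week.
by_week = demo.sort_values(by=["Week_Num", "absolute"], ascending=[True, False])

print(by_absolute)
print(by_week)
```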

## Boolean indexing

In [None]:
# Keep only the rows where the condition holds
df[df["Week_Num"] == 23]

## Apply

In [None]:
df["absolute"].apply(lambda value: 2 * value)
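
`apply` calls the function once per element. For simple arithmetic like this, operating on the whole column at once gives the same result; the sketch below compares the two on made-up values:

```python
import pandas as pd

absolute = pd.Series([42, 0, -21], name="absolute")

doubled_apply = absolute.apply(lambda value: 2 * value)  # element by element
doubled_vectorized = absolute * 2                        # whole column at once

print(doubled_apply.equals(doubled_vectorized))
```

`apply` remains useful when the function cannot be written as column arithmetic.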

## Rename

In [None]:
df.rename(columns = {df.columns[2]:'size'}, inplace=True)

## Get the unique entries of a column

In [None]:
df["kpi"].unique()