# Useful methods and operations
* unique(), nunique(), value_counts(), sort_values(), apply(), index, columns

In [1]:
import numpy as np
import pandas as pd
data_dic = {'col_1':[1,2,3,4,5],
           'col_2':[111,222,333,111,555],
           'col_3':['alpha','bravo','charlie',np.nan,np.nan],
           }
df = pd.DataFrame(data_dic)
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


### <code>unique()</code>
Finds and returns all the unique values in a column.<br>
Let's see how it works on each column in our DataFrame.

In [None]:
df

In [3]:
df['col_2'].unique()

array([111, 222, 333, 555], dtype=int64)

In [4]:
print(df['col_1'].unique())
print(df['col_2'].unique())
print(df['col_3'].unique())
# 111 and NaN are repeated values; unique() returns each of them only once.

[1 2 3 4 5]
[111 222 333 555]
['alpha' 'bravo' 'charlie' nan]
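Since <code>unique()</code> keeps NaN as one of the distinct values, it can help to drop missing values first when only the real labels are wanted. The sketch below rebuilds col_3 as a standalone Series (the names `col_3_values`, `with_nan` and `without_nan` are illustrative, not part of the notebook):

```python
import numpy as np
import pandas as pd

# Rebuild col_3 from the example DataFrame as a standalone Series.
col_3_values = pd.Series(['alpha', 'bravo', 'charlie', np.nan, np.nan], name='col_3')

# unique() reports NaN once, alongside the real labels.
with_nan = col_3_values.unique()

# Dropping missing values first leaves only the real labels.
without_nan = col_3_values.dropna().unique()

print(len(with_nan))       # 4
print(list(without_nan))   # ['alpha', 'bravo', 'charlie']
```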


### <code>nunique()</code>
Finds and returns how many unique values exist.<br>
&#9758; Notice the difference: by default, nunique() does not count NaN (a missing value), so it returns 3 for col_3 even though unique() listed four entries.

In [5]:
print(df['col_1'].nunique())
print(df['col_2'].nunique())
print(df['col_3'].nunique())

5
4
3
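If NaN should be counted as a distinct value, <code>nunique()</code> accepts a `dropna` argument. A minimal sketch, assuming the same col_3 data as above (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

col_3_series = pd.Series(['alpha', 'bravo', 'charlie', np.nan, np.nan], name='col_3')

# Default behaviour: missing values are ignored.
n_default = col_3_series.nunique()

# dropna=False counts NaN as one more distinct value.
n_with_nan = col_3_series.nunique(dropna=False)

print(n_default)    # 3
print(n_with_nan)   # 4
```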


### <code>value_counts()</code>
When we want a table of all the values along with the number of times each appears in our data, value_counts() does the work!<br>
&#9758; NaN is a missing value and is not counted by default, so it does not appear in the output.

In [6]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [7]:
df['col_2'].value_counts()

col_2
111    2
222    1
333    1
555    1
Name: count, dtype: int64

In [8]:
df['col_2'].count()

5

In [9]:
df['col_3'].value_counts()

col_3
alpha      1
bravo      1
charlie    1
Name: count, dtype: int64
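Two optional arguments of <code>value_counts()</code> are often handy: `dropna=False` includes the missing values in the table, and `normalize=True` returns proportions instead of raw counts. The sketch below uses the same data as the notebook; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

example_df = pd.DataFrame({
    'col_2': [111, 222, 333, 111, 555],
    'col_3': ['alpha', 'bravo', 'charlie', np.nan, np.nan],
})

# Include NaN in the table; results are sorted by count, largest first.
counts_with_nan = example_df['col_3'].value_counts(dropna=False)

# Proportions instead of counts: 111 appears in 2 of the 5 rows.
shares = example_df['col_2'].value_counts(normalize=True)

print(counts_with_nan)
print(shares.loc[111])
```

The counts in `counts_with_nan` add up to the number of rows, which is a quick way to confirm that nothing was silently dropped.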

### <code>sort_values()</code>
By default:<br>
* <code>ascending=True</code>
* <code>inplace=False</code>

In [10]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [11]:
df.sort_values(by='col_2') # select * from df order by col_2

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
3,4,111,
1,2,222,bravo
2,3,333,charlie
4,5,555,


In [12]:
df.sort_values(by='col_2', ascending=False) # select * from df order by col_2 desc

Unnamed: 0,col_1,col_2,col_3
4,5,555,
2,3,333,charlie
1,2,222,bravo
0,1,111,alpha
3,4,111,
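<code>sort_values()</code> can also sort by several columns at once and lets us choose where missing values go. The following is a small sketch on the same data (the names `sort_df`, `by_two_columns` and `nan_first` are illustrative):

```python
import numpy as np
import pandas as pd

sort_df = pd.DataFrame({
    'col_1': [1, 2, 3, 4, 5],
    'col_2': [111, 222, 333, 111, 555],
    'col_3': ['alpha', 'bravo', 'charlie', np.nan, np.nan],
})

# Sort by col_2 ascending; break ties on col_1 in descending order.
by_two_columns = sort_df.sort_values(by=['col_2', 'col_1'], ascending=[True, False])
print(by_two_columns.index.tolist())   # [3, 0, 1, 2, 4]

# Missing values are placed last by default; na_position='first' moves them to the top.
nan_first = sort_df.sort_values(by='col_3', na_position='first')
print(nan_first)
```

Because `inplace=False` is the default, `sort_df` itself keeps its original row order in both calls.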


### <code>apply()</code>
Indeed, this is one of the most powerful pandas features. Using the **<code>apply()</code>** method, we can **broadcast** our **customized functions** on our data, so that the function is called on each value.<br>
Let's see how to calculate the square of each value in col_1.

In [13]:
# Our customized function to calculate the squares 
def square(value):
    return value * value

* Let's broadcast our customized function <code>square</code> using the <code>apply</code> method to calculate the squares of col_1 in our DataFrame, df.

In [14]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [15]:
square(3)

9

In [16]:
df['col_1'].apply(square)

0     1
1     4
2     9
3    16
4    25
Name: col_1, dtype: int64

* The same operation can be conveniently carried out using a **<code>lambda</code>** expression!

In [17]:
df['col_1'].apply(lambda value: value * value)

0     1
1     4
2     9
3    16
4    25
Name: col_1, dtype: int64
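For simple arithmetic like squaring, a vectorized expression gives the same result without calling a Python function per element, and <code>apply()</code> on a DataFrame with `axis=1` passes each row to the function. A minimal sketch, assuming the same col_1 and col_2 values (the variable names are illustrative):

```python
import pandas as pd

apply_df = pd.DataFrame({
    'col_1': [1, 2, 3, 4, 5],
    'col_2': [111, 222, 333, 111, 555],
})

# Element-wise apply, as in the cells above.
squares_apply = apply_df['col_1'].apply(lambda value: value * value)

# Vectorized equivalent: operate on the whole column at once.
squares_vectorized = apply_df['col_1'] ** 2

# With axis=1, the function receives one row at a time.
row_sums = apply_df.apply(lambda row: row['col_1'] + row['col_2'], axis=1)

print(squares_vectorized.tolist())   # [1, 4, 9, 16, 25]
print(row_sums.tolist())             # [112, 224, 336, 115, 560]
```

The vectorized form is usually the better choice when one exists; <code>apply()</code> is most useful when the logic cannot be written as column arithmetic.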

## Good to know

In [18]:
# Getting the row index (row labels)
df.index

RangeIndex(start=0, stop=5, step=1)

In [19]:
# Getting column names
df.columns

Index(['col_1', 'col_2', 'col_3'], dtype='object')

In [20]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [21]:
newdf = df.copy()
newdf

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [22]:
aa = df
aa

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,


In [23]:
del newdf['col_1']

In [24]:
newdf

Unnamed: 0,col_2,col_3
0,111,alpha
1,222,bravo
2,333,charlie
3,111,
4,555,


In [25]:
df

Unnamed: 0,col_1,col_2,col_3
0,1,111,alpha
1,2,222,bravo
2,3,333,charlie
3,4,111,
4,5,555,
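The cells above show that deleting a column from `newdf` (created with <code>copy()</code>) leaves `df` untouched. Plain assignment, as in `aa = df`, behaves differently: it only creates a second name for the same object. The sketch below illustrates the contrast on a small stand-in DataFrame; `base`, `alias` and `independent` are hypothetical names used only here.

```python
import pandas as pd

base = pd.DataFrame({'col_1': [1, 2, 3]})

alias = base               # second name for the same DataFrame
independent = base.copy()  # separate object with its own data

# Adding a column through the alias also changes base.
alias['flag'] = True

# Adding a column to the copy does not affect base.
independent['other'] = 0

print(alias is base)              # True
print('flag' in base.columns)     # True
print('other' in base.columns)    # False
```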


In [26]:
newdf['dd']=2

In [27]:
newdf

Unnamed: 0,col_2,col_3,dd
0,111,alpha,2
1,222,bravo,2
2,333,charlie,2
3,111,,2
4,555,,2


In [28]:
newdf.insert(1, 'new-col2', [1,2,3,4,5])
newdf

Unnamed: 0,col_2,new-col2,col_3,dd
0,111,1,alpha,2
1,222,2,bravo,2
2,333,3,charlie,2
3,111,4,,2
4,555,5,,2


In [29]:
newdf['dd'] = newdf['col_2'] * newdf['new-col2']

In [30]:
newdf['ddd']=11

In [31]:
newdf

Unnamed: 0,col_2,new-col2,col_3,dd,ddd
0,111,1,alpha,111,11
1,222,2,bravo,444,11
2,333,3,charlie,999,11
3,111,4,,444,11
4,555,5,,2775,11


# Great Job!
This was a little long, but you did it! Let's have a quick overview and move on to use the skills we have learned in the coming exercises!<br>
This covers everything about pandas that we wanted to learn. <br>
&#9989; Keep practicing to brush up and add new skills.