# 6 Important things you should know about Numpy and Pandas

The data manipulation capabilities of pandas are built on top of the NumPy library; NumPy is a dependency of pandas.
Pandas is best at handling tabular data sets whose columns hold different variable types (integer, float, string, etc.). It covers everything from simple tasks such as loading data to feature engineering on time series data.
NumPy is most suitable for basic numerical computations such as mean, median, and range. It also supports the creation of multi-dimensional arrays.
NumPy can also be used to integrate C/C++ and Fortran code.
Remember, Python is a zero-indexed language, unlike R, where indexing starts at one.
The best part of learning pandas and NumPy is the strong, active community support you'll get from around the world.
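
As a quick illustration of the first and fifth points, here is a minimal sketch. The column names and values are made up for this example; it only shows that a pandas column can be handed back as a NumPy array and that positions start at zero.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: an integer, a float, and a string column.
df = pd.DataFrame({
    "count": [1, 2, 3],
    "price": [0.5, 1.5, 2.5],
    "label": ["a", "b", "c"],
})

# Each column keeps its own data type.
print(df.dtypes)

# A numeric column can be retrieved as a plain NumPy array.
prices = df["price"].to_numpy()
print(type(prices))

# Indexing starts at zero, so position 0 is the first element.
first_price = prices[0]
print(first_price)  # 0.5
```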

In [1]:
import numpy as np

In [2]:
np.__version__

'1.16.1'

In [3]:
L = list(range(10))  # list

In [4]:
[str(c) for c in L]  # list comprehension

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [5]:
[type(item) for item in L]

[int, int, int, int, int, int, int, int, int, int]

# Creating arrays

NumPy arrays are homogeneous: every element shares a single data type (for example, integer or float), unlike Python lists, which can mix types freely.
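
To make the contrast concrete, the sketch below builds a list and an array from the same mixed values. The variable names are chosen for illustration only.

```python
import numpy as np

# A Python list keeps each element's own type.
mixed_list = [1, 2.5, 3]
list_types = [type(item).__name__ for item in mixed_list]
print(list_types)  # ['int', 'float', 'int']

# NumPy converts all elements to one common dtype (here, float).
arr = np.array(mixed_list)
print(arr.dtype)   # float64
print(arr)         # [1.  2.5 3. ]
```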

In [7]:
np.zeros(10,dtype = 'int')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [8]:
#matrix
np.zeros((3,5), dtype = float)

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [9]:
np.full((3,5),1.25)

array([[1.25, 1.25, 1.25, 1.25, 1.25],
       [1.25, 1.25, 1.25, 1.25, 1.25],
       [1.25, 1.25, 1.25, 1.25, 1.25]])

In [11]:
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [12]:
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [13]:
# creating a 3x3 matrix of samples drawn from a normal distribution with mean 0 and standard deviation 1
np.random.normal(0,1,(3,3))

array([[ 1.49876126, -0.60513931,  0.09405014],
       [-1.12503106, -0.67667229,  2.21334808],
       [ 0.58716023,  0.55238784, -0.53478221]])

In [14]:
#identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [20]:
np.random.seed(0)
x1 = np.random.randint(10, size =60)
x2 = np.random.randint(10,size =(3,4))
x3 = np.random.randint(14, size = (2,4,5))

In [21]:
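x3.shape  # (2, 4, 5): length of each dimension
x3.ndim   # 3: number of dimensions
x3.size   # 40: total number of elements (only this last expression is displayed)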

40
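
Because a notebook cell displays only its last expression, the cell above shows just the size. A small sketch that prints all three attributes follows; it uses a zero-filled stand-in array with the same shape as x3 so the result does not depend on the random seed.

```python
import numpy as np

# Stand-in array with the same shape as x3 above.
a = np.zeros((2, 4, 5), dtype=int)

print(a.shape)  # (2, 4, 5)
print(a.ndim)   # 3
print(a.size)   # 40, i.e. 2 * 4 * 5
```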

In [22]:
print(x3)

[[[ 5 12  9 10  4]
  [11  4  6  4  4]
  [ 3 12  4  4  8]
  [ 4  3 10  7 13]]

 [[ 5  5  0  1  5]
  [ 9  3  0  5  0]
  [ 1  2  4  2  0]
  [13  3  2 10 13]]]


In [24]:
x = np.arange(10)

In [25]:
x[:5]

array([0, 1, 2, 3, 4])

In [26]:
x[4:]

array([4, 5, 6, 7, 8, 9])

In [28]:
x[4:7]

array([4, 5, 6])

In [29]:
x[: : 2]

array([0, 2, 4, 6, 8])

In [30]:
x[1::2]

array([1, 3, 5, 7, 9])

In [31]:
x[::-1]

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

# Array concatenation

In [32]:
x =np.array([1,2,3])
y =np.array([3,4,56])
z =[1,2,3,4,5,6]
np.concatenate([x,y,z])

array([ 1,  2,  3,  3,  4, 56,  1,  2,  3,  4,  5,  6])

In [34]:
# You can also use this function to combine 2-dimensional arrays.
grid = np.array([[1,2,3],[1,23,4]])
np.concatenate([grid,grid])

array([[ 1,  2,  3],
       [ 1, 23,  4],
       [ 1,  2,  3],
       [ 1, 23,  4]])

In [35]:
np.concatenate([grid,grid],axis=1)

array([[ 1,  2,  3,  1,  2,  3],
       [ 1, 23,  4,  1, 23,  4]])

Until now, we have concatenated arrays with the same number of dimensions. But what if you need to combine a 2D array with a 1D array? np.concatenate requires all of its inputs to have the same number of dimensions, so it is not the right tool here. Instead, you can use np.vstack or np.hstack. Let's see how!

In [36]:
x = np.array([3,4,5])
grid = np.array([[1,2,3],[17,18,19]])
np.vstack([x,grid])

array([[ 3,  4,  5],
       [ 1,  2,  3],
       [17, 18, 19]])
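
The horizontal counterpart can be sketched the same way. The column array `col` below is a hypothetical value added for illustration; the sketch also shows that np.concatenate rejects the mixed 1D/2D input.

```python
import numpy as np

x = np.array([3, 4, 5])
grid = np.array([[1, 2, 3], [17, 18, 19]])

# np.concatenate refuses inputs with different numbers of dimensions.
try:
    np.concatenate([x, grid])
    concat_error = None
except ValueError as exc:
    concat_error = str(exc)
print(concat_error)

# np.hstack appends columns; the new column needs one entry per row of grid.
col = np.array([[9], [9]])  # hypothetical column, shape (2, 1)
wide = np.hstack([grid, col])
print(wide)
# [[ 1  2  3  9]
#  [17 18 19  9]]
```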

# Pandas

In [38]:
import pandas as pd

# Create a data frame from a dictionary: the keys become column names and the values become the column data.

In [40]:
data =pd.DataFrame({'Country': ['India','China'],
                   'Rank':[1,2]})

In [41]:
data

  Country  Rank
0   India     1
1   China     2


In [42]:
data.describe()

           Rank
count  2.000000
mean   1.500000
std    0.707107
min    1.000000
25%    1.250000
50%    1.500000
75%    1.750000
max    2.000000


In [43]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
Country    2 non-null object
Rank       2 non-null int64
dtypes: int64(1), object(1)
memory usage: 112.0+ bytes


In [44]:
data = pd.DataFrame({'group':['a', 'a', 'a', 'b','b', 'b', 'c', 'c','c'],'ounces':[4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

  group  ounces
0     a     4.0
1     a     3.0
2     a    12.0
3     b     6.0
4     b     7.5
5     b     8.0
6     c     3.0
7     c     5.0
8     c     6.0


In [45]:
# Let's sort the data frame by ounces. With inplace=False (the default),
# a sorted copy is returned; inplace=True would modify data itself.
data.sort_values(by=['ounces'], ascending=True, inplace=False)

  group  ounces
1     a     3.0
6     c     3.0
0     a     4.0
7     c     5.0
3     b     6.0
8     c     6.0
4     b     7.5
5     b     8.0
2     a    12.0
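
The difference between the two inplace settings can be checked directly. This is a minimal sketch on a smaller, made-up frame: with inplace=False the original keeps its order, while inplace=True reorders it and returns None.

```python
import pandas as pd

# Small hypothetical frame, used only to show the effect of inplace.
data = pd.DataFrame({"group": ["a", "b", "c"], "ounces": [4.0, 12.0, 3.0]})

# inplace=False (the default): a sorted copy comes back, data is untouched.
sorted_copy = data.sort_values(by=["ounces"], inplace=False)
original_order = data["ounces"].tolist()
print(original_order)                  # [4.0, 12.0, 3.0]
print(sorted_copy["ounces"].tolist())  # [3.0, 4.0, 12.0]

# inplace=True: data itself is reordered and the call returns None.
result = data.sort_values(by=["ounces"], inplace=True)
print(result)                          # None
print(data["ounces"].tolist())         # [3.0, 4.0, 12.0]
```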


In [47]:
#We can sort the data by not just one column but multiple columns as well.
data.sort_values(by = ['group','ounces'],ascending = [True,False], inplace = False)

  group  ounces
2     a    12.0
0     a     4.0
1     a     3.0
5     b     8.0
4     b     7.5
3     b     6.0
8     c     6.0
7     c     5.0
6     c     3.0


In [49]:
data = pd.DataFrame({'k1':['one']*3 + ['two']*4, 'k2':[3,2,1,3,3,4,4]})
data

    k1  k2
0  one   3
1  one   2
2  one   1
3  two   3
4  two   3
5  two   4
6  two   4


In [50]:
data.drop_duplicates()

    k1  k2
0  one   3
1  one   2
2  one   1
3  two   3
5  two   4
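
By default, drop_duplicates compares whole rows and keeps the first occurrence, which is why rows 4 and 6 disappear above. As a hedged sketch of two related options, the `subset` and `keep` parameters restrict the comparison to chosen columns and select which occurrence survives:

```python
import pandas as pd

data = pd.DataFrame({"k1": ["one"] * 3 + ["two"] * 4, "k2": [3, 2, 1, 3, 3, 4, 4]})

# Boolean mask of rows that repeat an earlier row in full.
dup_mask = data.duplicated()
print(dup_mask.tolist())  # [False, False, False, False, True, False, True]

# Compare on k1 only, keeping the first row for each value.
by_k1_first = data.drop_duplicates(subset=["k1"])
print(by_k1_first.index.tolist())  # [0, 3]

# Same comparison, but keep the last row for each value instead.
by_k1_last = data.drop_duplicates(subset=["k1"], keep="last")
print(by_k1_last.index.tolist())   # [2, 6]
```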


In [51]:
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami','corned beef', 'Bacon', 'pastrami', 'honey ham','nova lox'],
                 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

          food  ounces
0        bacon     4.0
1  pulled pork     3.0
2        bacon    12.0
3     Pastrami     6.0
4  corned beef     7.5
5        Bacon     8.0
6     pastrami     3.0
7    honey ham     5.0
8     nova lox     6.0
