# NumPy Basics

Welcome to the NumPy and Pandas section. These are two of the most widely used Python libraries for data science. NumPy provides a powerful data structure, the multidimensional array. Pandas is another powerful Python library that provides a fast and easy platform for data analysis.

NumPy is a library for scientific computing and data analysis. Its name stands for Numerical Python, and it is built around array-oriented computing.

The most basic object in NumPy is the ndarray, or simply array: an n-dimensional, homogeneous array. By homogeneous, we mean that all the elements in a NumPy array must be of the same data type, which is commonly numeric (float or integer).
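
A quick way to see homogeneity in action: when the input mixes integers and floats, NumPy converts every element to one common type instead of storing mixed types. A minimal sketch (the variable names are only for illustration):

```python
import numpy as np

ints = np.array([1, 2, 3])       # all integers -> an integer dtype
mixed = np.array([1, 2, 3.5])    # one float forces every element to float64

print(ints.dtype.kind)  # i  (signed integer; the exact width is platform dependent)
print(mixed.dtype)      # float64
print(mixed[0])         # the integer 1 is now stored as the float 1.0
```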


# Why NumPy?
Convenience and speed.

For numerical work on large amounts of data, NumPy is typically much faster than equivalent pure-Python code built on lists and loops.

Vectorized code contains no explicit loops or indexing (these happen behind the scenes, in precompiled C code), so it is also much more concise.

Many NumPy operations are implemented in C, which avoids the overhead of Python loops, pointer indirection and per-element dynamic type checking. The exact speedup depends on which operations you are performing.

NumPy arrays are also more compact than lists: an array stores its values in one contiguous block of memory, while a list stores pointers to separate Python objects, so arrays take much less memory.
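
To put a rough number on the memory difference, the sketch below compares 1,000 integers held in a list with the same values in an `int64` array. The list figure is an estimate, assuming CPython: it adds the list's pointer storage to the size of each int object, and slightly overcounts because CPython caches small integers.

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype fixed so the result is platform independent

# A list holds n pointers plus n separate int objects (rough estimate).
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An array holds n raw 8-byte integers in one contiguous buffer.
array_bytes = arr.nbytes

print(f"list : ~{list_bytes} bytes")
print(f"array: {array_bytes} bytes")  # 8000
```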

In [2]:
import numpy

In [24]:
numpy.array([[1,2,3],[4,5,6]])

array([[1, 2, 3],
       [4, 5, 6]])

In [3]:
import numpy as np

In [26]:
a = np.array([1,2,3])

In [27]:
a

array([1, 2, 3])

In [32]:
b = np.array([[1.0,2.0,3.3],[4.6,5.9,6.9],[3,8,9]])

In [33]:
b

array([[1. , 2. , 3.3],
       [4.6, 5.9, 6.9],
       [3. , 8. , 9. ]])

In [30]:
b.shape

(3, 3)

In [7]:
a.shape

(3,)

In [34]:
b.dtype

dtype('float64')

In [35]:
b.ndim

2

In [36]:
print(type(a))

<class 'numpy.ndarray'>


In [37]:
print(type(b))

<class 'numpy.ndarray'>


In [38]:
np.arange(10,20,3)

array([10, 13, 16, 19])

# Performance measurement
As mentioned above, the key advantages of NumPy are convenience and speed of computation.

You will often work with very large datasets, so it is important to understand how much computation time (and memory) you can save by using NumPy instead of standard Python lists.

In [40]:
c = range(100000)
%timeit [i**3 for i in c]


56.7 ms ± 672 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [39]:
l =[1,5,8,9]
[i*9 for i in l]

[9, 45, 72, 81]

In [42]:
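# note: this array has 10,000 elements while the list test above used 100,000,
# so the two timings are not like-for-like; use np.arange(100000) for a fair comparison
c_numpy = np.arange(10000)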
%timeit c_numpy**3

28.6 µs ± 285 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
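
`%timeit` is an IPython/Jupyter magic, so it does not work in a plain `.py` script. A sketch of the same measurement with the standard-library `timeit` module, using the same size for both versions; the repeat count of 10 is an arbitrary choice and the absolute timings will depend on your machine:

```python
import timeit
import numpy as np

n = 100_000
py_values = range(n)
np_values = np.arange(n, dtype=np.int64)  # int64 so the largest cube (~1e15) cannot overflow

# Sanity check: both approaches compute the same cubes.
assert [i**3 for i in py_values] == (np_values**3).tolist()

repeats = 10
t_list = timeit.timeit(lambda: [i**3 for i in py_values], number=repeats) / repeats
t_numpy = timeit.timeit(lambda: np_values**3, number=repeats) / repeats

print(f"list comprehension: {t_list * 1e3:.2f} ms per run")
print(f"numpy vectorized  : {t_numpy * 1e6:.1f} µs per run")
```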


Still not convinced? Here is one more interesting example.

In [22]:
l1 = range(10000)
l2 = [i**2 for i in range(10000)]

In [41]:
a = np.arange(4)
a*3

array([0, 3, 6, 9])

In [23]:
%timeit list(map(lambda x, y: x*y, l1, l2))

2.19 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [24]:
a1 = np.array(l1)
b1 = np.array(l2)

In [25]:
%timeit a1*b1

8.39 µs ± 177 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [28]:
a1

array([   0,    1,    2, ..., 9997, 9998, 9999])

In [29]:
a1 * a1

array([       0,        1,        4, ..., 99940009, 99960004, 99980001])

So can I do all of this without writing a single loop? Yes: for element-wise operations like these, NumPy runs the loop for you in compiled code.

# Creating NumPy arrays

There are multiple ways to create a NumPy array. Let's walk through them.

In [105]:
np.arange(2,12)

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [106]:
np.arange(2,12,2)

array([ 2,  4,  6,  8, 10])

In [43]:
np.zeros((4,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [44]:
np.ones((4,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [45]:
np.eye(8)

array([[1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1.]])

In [48]:
np.full((5,4),9.09)

array([[9.09, 9.09, 9.09, 9.09],
       [9.09, 9.09, 9.09, 9.09],
       [9.09, 9.09, 9.09, 9.09],
       [9.09, 9.09, 9.09, 9.09],
       [9.09, 9.09, 9.09, 9.09]])

In [49]:
np.full((7,6), 8.9, dtype=int)  # np.int was removed in NumPy 1.24; use the built-in int

array([[8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8]])
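
When a float fill value meets an integer dtype, NumPy casts it by truncating toward zero rather than rounding, which is why 8.9 became 8 above. If rounding is what you want, round first. A small sketch:

```python
import numpy as np

truncated = np.full((2, 2), 8.9, dtype=int)       # 8.9  -> 8
neg_truncated = np.full((2, 2), -8.9, dtype=int)  # -8.9 -> -8 (toward zero, not down to -9)
rounded = np.full((2, 2), round(8.9), dtype=int)  # round first -> 9
```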

In [50]:
np.diag([7,9,8,1,6,8,5])

array([[7, 0, 0, 0, 0, 0, 0],
       [0, 9, 0, 0, 0, 0, 0],
       [0, 0, 8, 0, 0, 0, 0],
       [0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 6, 0, 0],
       [0, 0, 0, 0, 0, 8, 0],
       [0, 0, 0, 0, 0, 0, 5]])

In [52]:
v = np.array([1,2,3])
np.tile(v,(4,5)) # repeat v 4 times down and 5 times across -> shape (4, 15)

array([[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
       [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]])
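
`np.tile` repeats the whole array, which is easy to confuse with `np.repeat`, which repeats each element in place. A side-by-side sketch:

```python
import numpy as np

v = np.array([1, 2, 3])

tiled = np.tile(v, 2)       # whole array repeated:  [1 2 3 1 2 3]
repeated = np.repeat(v, 2)  # each element repeated: [1 1 2 2 3 3]
grid = np.tile(v, (4, 5))   # 4 copies down, 5 copies across

print(grid.shape)  # (4, 15)
```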

In [55]:

# a random float in the half-open interval [0, 1)
np.random.random()

0.18607202563237057

In [95]:
# a random float between 2 and 50: stretch [0, 1) to a width of 48, then shift by 2
2 + (50 - 2) * np.random.random()

24.602837644883174

In [130]:
np.random.random([3,3])

array([[0.05549042, 0.96330508, 0.95799428],
       [0.93257085, 0.15308321, 0.95919652],
       [0.10933166, 0.48792935, 0.3782894 ]])
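
Newer NumPy code (version 1.17 and later) generally uses a `Generator` object from `np.random.default_rng` instead of the legacy `np.random.*` functions. Seeding it makes results reproducible, and `uniform(low, high)` handles a range such as 2 to 50 directly. A sketch; the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded -> same numbers on every run

x = rng.random()                 # one float in [0, 1)
y = rng.uniform(2, 50)           # one float in [2, 50)
m = rng.random((3, 3))           # 3x3 array of floats in [0, 1)
k = rng.integers(1, 7, size=5)   # 5 ints from 1 to 6 (high is exclusive)

# Re-creating the generator with the same seed reproduces the sequence.
assert np.random.default_rng(42).random() == x
```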

In [2]:
# 100 evenly spaced values from 1 to 50 (both endpoints included)
a = np.linspace(1,50,100)

In [52]:
a

array([  1.        ,   1.49494949,   1.98989899,   2.48484848,
         2.97979798,   3.47474747,   3.96969697,   4.46464646,
         4.95959596,   5.45454545,   5.94949495,   6.44444444,
         6.93939394,   7.43434343,   7.92929293,   8.42424242,
         8.91919192,   9.41414141,   9.90909091,  10.4040404 ,
        10.8989899 ,  11.39393939,  11.88888889,  12.38383838,
        12.87878788,  13.37373737,  13.86868687,  14.36363636,
        14.85858586,  15.35353535,  15.84848485,  16.34343434,
        16.83838384,  17.33333333,  17.82828283,  18.32323232,
        18.81818182,  19.31313131,  19.80808081,  20.3030303 ,
        20.7979798 ,  21.29292929,  21.78787879,  22.28282828,
        22.77777778,  23.27272727,  23.76767677,  24.26262626,
        24.75757576,  25.25252525,  25.74747475,  26.24242424,
        26.73737374,  27.23232323,  27.72727273,  28.22222222,
        28.71717172,  29.21212121,  29.70707071,  30.2020202 ,
        30.6969697 ,  31.19191919,  31.68686869,  32.18

In [1]:
# memory used by each array element, in bytes
a.itemsize

8
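
`itemsize` is the size of one element; `size` counts the elements and `nbytes` is their product. `linspace` can also report the spacing it used via `retstep=True`. A short sketch with the same `linspace` call as above:

```python
import numpy as np

a = np.linspace(1, 50, 100)  # float64 by default

print(a.dtype)     # float64
print(a.itemsize)  # 8   bytes per element
print(a.size)      # 100 elements
print(a.nbytes)    # 800 = size * itemsize

samples, step = np.linspace(1, 50, 100, retstep=True)
print(step)        # (50 - 1) / 99, about 0.4949
```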

In [101]:
np.arange(24)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [30]:
# -1 lets NumPy infer this dimension: 18 / (2 * 3) = 3
np.arange(18).reshape(2,3,-1)

array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]]])
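
Two rules worth knowing about `reshape`: the total number of elements must stay the same, and for a contiguous array like this one the result is a view that shares data with the original. A sketch:

```python
import numpy as np

x = np.arange(18)

cube = x.reshape(2, 3, -1)        # -1 is inferred as 18 // (2 * 3) = 3
print(cube.shape)                 # (2, 3, 3)
print(np.shares_memory(x, cube))  # True: a view, not a copy

# The total size must still match: 18 is not divisible by 4.
try:
    x.reshape(4, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)             # True
```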

# Accessing NumPy array elements

In [149]:
a = np.array([2,4,6,8,10,12,14,16])


In [140]:
a

array([ 2,  4,  6,  8, 10, 12, 14, 16])

In [141]:
a[2]


6

In [142]:
a[[2,4]]

array([ 6, 10])

In [143]:
a[2:]

array([ 6,  8, 10, 12, 14, 16])

In [144]:
a[2:5]

array([ 6,  8, 10])

In [151]:
a[1::2]

array([ 4,  8, 12, 16])

Let's check the same for a 2-D array.

In [152]:
a = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [148]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [124]:
a[2,2]

9

In [127]:
a > 2

array([[False, False,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

In [128]:
a[a > 2]

array([3, 4, 5, 6, 7, 8, 9])

In [129]:
a[(a > 2) & (a < 5)]

array([3, 4])
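
The parentheses in `(a > 2) & (a < 5)` are required: `&` binds more tightly than comparison operators, so without them Python evaluates a different expression and NumPy raises an error. `|` (or) and `~` (not) follow the same rule, and `np.where` uses a mask to choose between two values element by element. A short sketch:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

outside = a[(a < 3) | (a > 7)]   # or:  [1 2 8 9]
not_even = a[~(a % 2 == 0)]      # not: [1 3 5 7 9]
zeroed = np.where(a > 5, 0, a)   # values above 5 become 0, the rest are kept

# Without parentheses, a > 2 & a < 5 parses as a > (2 & a) < 5,
# a chained comparison that needs a single True/False and so fails.
try:
    a > 2 & a < 5
    raised = False
except ValueError:
    raised = True
```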

# Assignment vs. copying a NumPy array

In [31]:
a = np.arange(10)

In [32]:
b = a

In [33]:
b

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [34]:
b[0] = 11

In [35]:
b

array([11,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [36]:
# Notice that a has changed too: b = a gives the same array a second name, it does not copy it
a

array([11,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [37]:
np.shares_memory(a,b)

True

In [38]:
a = np.arange(10)

In [39]:
b = a.copy()

In [40]:
b[0] = 11

In [41]:
b

array([11,  1,  2,  3,  4,  5,  6,  7,  8,  9])

In [42]:
np.shares_memory(a,b)

False

In [43]:
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
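
The same sharing applies to subsets. A basic slice such as `a[2:5]` is a view into the original data, while indexing with a list (fancy indexing) returns a new copy. A sketch:

```python
import numpy as np

a = np.arange(10)

s = a[2:5]        # basic slice -> a view into a's data
f = a[[2, 3, 4]]  # fancy (list) index -> a new copy

s[0] = 100
print(a[2])       # 100: writing through the view changed a
print(f[0])       # 2: the copy is unaffected
print(np.shares_memory(a, s), np.shares_memory(a, f))  # True False
```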

# More operations

In [44]:
a = np.array([[1,2,3],[4,5,6]])

In [45]:
a


array([[1, 2, 3],
       [4, 5, 6]])

In [46]:
a.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [47]:
b = np.array([[7,8,9],[10,11,12]])

In [48]:
a

array([[1, 2, 3],
       [4, 5, 6]])

In [49]:
b

array([[ 7,  8,  9],
       [10, 11, 12]])

In [50]:
a == b

array([[False, False, False],
       [False, False, False]])

In [51]:
np.vstack((a,b))

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [52]:
np.hstack((a,b))

array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])
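
For 2-D arrays, `vstack` and `hstack` are convenience wrappers around the more general `np.concatenate`, which takes the joining axis explicitly. A sketch with the same two arrays:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[7, 8, 9], [10, 11, 12]])

rows = np.concatenate((a, b), axis=0)  # same result as np.vstack((a, b)) here
cols = np.concatenate((a, b), axis=1)  # same result as np.hstack((a, b)) here

print(rows.shape, cols.shape)  # (4, 3) (2, 6)
```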

# Mathematical operations

In [190]:
a = np.arange(1,10)

In [131]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [132]:
np.cos(a)

array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219,
        0.96017029,  0.75390225, -0.14550003, -0.91113026])

In [133]:
np.exp(a)

array([  2.71828183e+00,   7.38905610e+00,   2.00855369e+01,
         5.45981500e+01,   1.48413159e+02,   4.03428793e+02,
         1.09663316e+03,   2.98095799e+03,   8.10308393e+03])

In [191]:
np.sum(a)

45

In [193]:
np.median(a)

5.0

In [194]:
a.std()

2.5819888974716112
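
`std` computes the population standard deviation by default (`ddof=0`, dividing by n). For a sample estimate, pass `ddof=1` to divide by n - 1 instead. For the values 1 to 9, the squared deviations from the mean 5 sum to 60:

```python
import numpy as np

a = np.arange(1, 10)

pop_std = a.std()           # ddof=0: sqrt(60 / 9)
sample_std = a.std(ddof=1)  # ddof=1: sqrt(60 / 8)

print(pop_std)     # about 2.582
print(sample_std)  # about 2.739
```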

In [54]:
a = np.arange(1,10).reshape(3,3)
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [56]:
np.linalg.det(a)
help(np.linalg.det)

Help on function det in module numpy.linalg.linalg:

det(a)
    Compute the determinant of an array.
    
    Parameters
    ----------
    a : (..., M, M) array_like
        Input array to compute determinants for.
    
    Returns
    -------
    det : (...) array_like
        Determinant of `a`.
    
    See Also
    --------
    slogdet : Another way to represent the determinant, more suitable
      for large matrices where underflow/overflow may occur.
    
    Notes
    -----
    
    .. versionadded:: 1.8.0
    
    Broadcasting rules apply, see the `numpy.linalg` documentation for
    details.
    
    The determinant is computed via LU factorization using the LAPACK
    routine z/dgetrf.
    
    Examples
    --------
    The determinant of a 2-D array [[a, b], [c, d]] is ad - bc:
    
    >>> a = np.array([[1, 2], [3, 4]])
    >>> np.linalg.det(a)
    -2.0
    
    Computing determinants for a stack of matrices:
    
    >>> a = np.array([ [[1, 2], [3, 4]], [[1, 2], [2, 1]], 

In [58]:
np.linalg.inv(a)
help(np.linalg.inv)

Help on function inv in module numpy.linalg.linalg:

inv(a)
    Compute the (multiplicative) inverse of a matrix.
    
    Given a square matrix `a`, return the matrix `ainv` satisfying
    ``dot(a, ainv) = dot(ainv, a) = eye(a.shape[0])``.
    
    Parameters
    ----------
    a : (..., M, M) array_like
        Matrix to be inverted.
    
    Returns
    -------
    ainv : (..., M, M) ndarray or matrix
        (Multiplicative) inverse of the matrix `a`.
    
    Raises
    ------
    LinAlgError
        If `a` is not square or inversion fails.
    
    Notes
    -----
    
    .. versionadded:: 1.8.0
    
    Broadcasting rules apply, see the `numpy.linalg` documentation for
    details.
    
    Examples
    --------
    >>> from numpy.linalg import inv
    >>> a = np.array([[1., 2.], [3., 4.]])
    >>> ainv = inv(a)
    >>> np.allclose(np.dot(a, ainv), np.eye(2))
    True
    >>> np.allclose(np.dot(ainv, a), np.eye(2))
    True
    
    If a is a matrix object, then the return valu

In [61]:
np.linalg.eig(a)
help(np.linalg.eig)

Help on function eig in module numpy.linalg.linalg:

eig(a)
    Compute the eigenvalues and right eigenvectors of a square array.
    
    Parameters
    ----------
    a : (..., M, M) array
        Matrices for which the eigenvalues and right eigenvectors will
        be computed
    
    Returns
    -------
    w : (..., M) array
        The eigenvalues, each repeated according to its multiplicity.
        The eigenvalues are not necessarily ordered. The resulting
        array will be of complex type, unless the imaginary part is
        zero in which case it will be cast to a real type. When `a`
        is real the resulting eigenvalues will be real (0 imaginary
        part) or occur in conjugate pairs
    
    v : (..., M, M) array
        The normalized (unit "length") eigenvectors, such that the
        column ``v[:,i]`` is the eigenvector corresponding to the
        eigenvalue ``w[i]``.
    
    Raises
    ------
    LinAlgError
        If the eigenvalue computation does not 
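
One caveat about the matrix used in these cells: `[[1, 2, 3], [4, 5, 6], [7, 8, 9]]` is singular (the third row equals twice the second minus the first), so its determinant is exactly 0 and it has no inverse. Because of floating-point rounding, `np.linalg.det` may report a tiny nonzero number, and `np.linalg.inv` may either raise `LinAlgError` or return huge, meaningless values. The sketch below checks the rank first and then uses an invertible 2x2 matrix, chosen only for illustration, to verify `inv` and `eig` numerically:

```python
import numpy as np

singular = np.arange(1, 10).reshape(3, 3)
print(np.linalg.matrix_rank(singular))           # 2 (< 3, so not invertible)
print(np.isclose(np.linalg.det(singular), 0.0))  # True

m = np.array([[2.0, 1.0], [1.0, 3.0]])  # det = 2*3 - 1*1 = 5
m_inv = np.linalg.inv(m)
print(np.allclose(m @ m_inv, np.eye(2)))  # True

w, v = np.linalg.eig(m)
# Each column v[:, i] satisfies m @ v[:, i] == w[i] * v[:, i]
print(np.allclose(m @ v, v * w))          # True
```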

In [157]:
a

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [158]:
b = a.T

In [159]:
b

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [160]:
np.dot(a,b)


array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])
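
For 2-D arrays, the `@` operator gives the same matrix product as `np.dot`, while `*` multiplies matching positions only. A sketch contrasting the two with the same `a` and `b = a.T`:

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)
b = a.T

matrix_product = a @ b  # same as np.dot(a, b) for 2-D arrays
elementwise = a * b     # multiplies matching positions only

print(matrix_product[0, 0])  # 14 = 1*1 + 2*2 + 3*3
print(elementwise[0, 1])     # 8  = a[0, 1] * b[0, 1] = 2 * 4
```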

In [187]:
a = np.array([1,1,0], dtype = bool)
b = np.array([1,0,1], dtype = bool)


In [176]:
np.logical_or(a,b)

array([ True,  True,  True], dtype=bool)

In [188]:
np.logical_and(a,b)

array([ True, False, False], dtype=bool)

In [189]:
np.all(a == a)

True

In [179]:
a = np.array([[1,2],[3,4]])

In [180]:
a


array([[1, 2],
       [3, 4]])

In [181]:
a.sum()

10

In [182]:
a.sum(axis=0)

array([4, 6])

In [183]:
a.sum(axis=1)

array([3, 7])

In [185]:
a.max()

4

In [186]:
a.argmax()

3
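
`argmax` without an axis returns a position in the flattened array, which is why the answer above is 3 rather than a (row, column) pair. `np.unravel_index` converts it back, and passing `axis` gives per-column or per-row results. A sketch:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

flat_idx = a.argmax()                           # 3: index into the flattened array
row, col = np.unravel_index(flat_idx, a.shape)
print(row, col)                                 # 1 1
print(a.argmax(axis=0))                         # [1 1]: row of the max in each column
```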

In [195]:
a = np.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [197]:
a.shape

(9,)

In [200]:
a[:,np.newaxis].shape # adds a new axis -> 2D

(9, 1)

In [201]:
np.sort(a)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [202]:
np.argsort(a)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])
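
Because `a` here is already in ascending order, `np.argsort` simply returns 0, 1, 2, and so on. With unsorted data the difference is clearer: `argsort` returns the indices that would sort the array, and those indices can reorder a second array in step. A sketch with made-up example values:

```python
import numpy as np

scores = np.array([30, 10, 20])
names = np.array(["c", "a", "b"])

order = np.argsort(scores)    # [1 2 0]: positions from smallest to largest
print(np.sort(scores))        # [10 20 30]
print(scores[order])          # [10 20 30], same as np.sort
print(names[order])           # ['a' 'b' 'c'], reordered in step with scores
print(np.sort(scores)[::-1])  # [30 20 10], descending order
```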