# Numpy - Getting Started
This is a starter notebook for learning NumPy and doing basic matrix operations.

Python integers have arbitrary precision. They grow as large as a value requires, so very large numbers take a lot of space.

In [1]:
2**1000

10715086071862673209484250490600018105614048117055336074437503883703510511249361224931983788156958581275946729175531468251871452856923140435984577574698574803934567774824230985421074605062371141877954182153046474983581941267398767559165543946077062914571196477686542167660429831652624386837205668069376

For most numerical work that flexibility wastes space and time. Standard fixed-width 32- or 64-bit integers, which NumPy provides, are enough for normal numerical calculations.

In [2]:
import numpy as np

In [3]:
np.int64(2)**8

256

In [4]:
np.int64(2)**100

0

In [5]:
np.int64(2)**1000

0
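The zeros above come from wraparound. Fixed-width int64 arithmetic keeps only the low 64 bits of a result, so it effectively computes modulo 2^64, and both 2^100 and 2^1000 are multiples of 2^64. Below is a minimal sketch that checks this against Python's exact integers. It uses one-element arrays because, in recent NumPy versions, scalar operations like the ones above may print an overflow RuntimeWarning, while integer array arithmetic wraps silently. The variable names are illustrative.

```python
import numpy as np

exact = 2**100                                  # Python int: exact, arbitrary precision
wrapped = np.array([2], dtype=np.int64) ** 100  # int64: only the low 64 bits survive

# The int64 result matches the exact value reduced modulo 2**64
print(exact % 2**64)   # 0
print(wrapped[0])      # 0

# Wraparound is easiest to see one step past the largest int64
info = np.iinfo(np.int64)
edge = np.array([info.max], dtype=np.int64)
print(info.max)        # 9223372036854775807
print(edge + 1)        # wraps to the most negative int64
```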

We see that values too large for 64 bits overflow: int64 arithmetic wraps around, which for these powers of two happens to give zero. Let's create a NumPy array. Notice the default datatype.

In [6]:
arr = np.array([1,2,3])

In [7]:
type(arr)

numpy.ndarray

In [8]:
type(arr[0])

numpy.int64
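The element type shown here comes from the array's dtype. The default integer dtype depends on the platform and NumPy version (before NumPy 2.0 it was 32-bit on Windows), so when size matters it is safer to request a dtype explicitly. A short sketch; the names small and mixed are illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype)                   # platform default integer, e.g. int64

small = np.array([1, 2, 3], dtype=np.int32)   # request 32-bit integers explicitly
print(small.dtype, small.itemsize)            # int32 4  (bytes per element)
print(small.nbytes)                           # 12 = 3 elements * 4 bytes

mixed = np.array([1, 2, 3.5])      # one float promotes the whole array
print(mixed.dtype)                 # float64
```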

In [9]:
arr*arr

array([1, 4, 9])

Let's see how fast NumPy's optimized, vectorized code is. We multiply two arrays of one million random floats each (np.random.rand draws uniform values between 0 and 1), element by element, and time the computation.

In [11]:
v1 = np.random.rand(1000000)
v2 = np.random.rand(1000000)

In [12]:
%time v1*v2

CPU times: user 4.26 ms, sys: 9.53 ms, total: 13.8 ms
Wall time: 94.4 ms


array([0.06887621, 0.40737216, 0.22646868, ..., 0.0333491 , 0.29314552,
       0.07013759])

Now that's impressive: a million multiplications in a few milliseconds of CPU time. In normal use cases, I have seen this take about 900 to 1500 milliseconds at most. I am running this on the cloud, which is why the times here are so low.
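To see where the speed comes from, here is a rough sketch that times the same element-wise product twice: once as a pure-Python loop over lists and once as a single vectorized NumPy operation. It uses time.perf_counter instead of the %time magic so it also runs outside a notebook; absolute numbers will vary from machine to machine, and the variable names are illustrative.

```python
import time
import numpy as np

n = 1_000_000
v1 = np.random.rand(n)
v2 = np.random.rand(n)

# Pure-Python version: loop over plain lists of floats
l1, l2 = v1.tolist(), v2.tolist()
start = time.perf_counter()
loop_result = [a * b for a, b in zip(l1, l2)]
loop_ms = (time.perf_counter() - start) * 1000

# Vectorized version: one call into NumPy's compiled loop
start = time.perf_counter()
vec_result = v1 * v2
vec_ms = (time.perf_counter() - start) * 1000

print(f"Python loop: {loop_ms:.1f} ms, NumPy: {vec_ms:.1f} ms")
```

Both versions compute the same products; the difference is that the NumPy version avoids creating and dispatching a Python float object per element.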

In [13]:
np.dot(arr,arr)

14

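That was the dot product. Alternatively, we can use the @ operator, Python's matrix-multiplication operator, which for two 1-D arrays also gives the inner product.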

In [14]:
arr @ arr

14
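For 1-D arrays, np.dot and @ both give the inner product, the sum of the element-wise products. For 2-D arrays, @ (np.matmul) performs matrix multiplication, which is quite different from the element-wise *. A small sketch; the matrices A, B and P are illustrative:

```python
import numpy as np

arr = np.array([1, 2, 3])
print(arr @ arr, (arr * arr).sum())   # 14 14: inner product = sum of element-wise products

A = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
B = A.T                          # shape (3, 2)
P = A @ B                        # (2, 3) @ (3, 2) -> (2, 2)
print(P.tolist())                # [[5, 14], [14, 50]]
print(np.array_equal(P, np.dot(A, B)))   # True: np.dot agrees for 2-D inputs
```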

The np.arange function works like Python's range but returns an array.

In [15]:
v = np.arange(12)

In [16]:
v

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [17]:
v.shape

(12,)
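np.arange also accepts start, stop and step arguments, and the stop value is excluded, just as with range. The shape (12,) is a one-element tuple because v is one-dimensional. A brief sketch; a and f are illustrative names:

```python
import numpy as np

a = np.arange(2, 10, 3)        # start=2, stop=10 (excluded), step=3
print(a)                       # values 2, 5, 8

f = np.arange(0.0, 1.0, 0.25)  # float steps work too
print(f)                       # values 0.0, 0.25, 0.5, 0.75

v = np.arange(12)
print(v.shape, v.ndim, v.size) # (12,) 1 12
```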

Let's do some matrix operations.

In [18]:
mat = np.arange(12).reshape(4,3)
mat

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

In [19]:
mat.shape

(4, 3)

In [20]:
mat.T

array([[ 0,  3,  6,  9],
       [ 1,  4,  7, 10],
       [ 2,  5,  8, 11]])
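reshape needs the new shape to hold exactly the same number of elements. One dimension may be given as -1, in which case NumPy infers it. Both reshape (when the data allows it, as here) and the transpose .T return views that share memory with the original instead of copying. A sketch; base and auto are illustrative names:

```python
import numpy as np

base = np.arange(12)
mat = base.reshape(4, 3)
print(mat.T.shape)                     # (3, 4)

auto = base.reshape(-1, 6)             # -1: NumPy infers 12 / 6 = 2 rows
print(auto.shape)                      # (2, 6)

# reshape (here) and .T share memory with the array they came from
print(np.shares_memory(base, mat), np.shares_memory(mat, mat.T))  # True True

try:
    base.reshape(5, 3)                 # 15 slots for 12 elements
except ValueError as err:
    print("cannot reshape:", err)
```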

In [21]:
mat[1,:]

array([3, 4, 5])

In [22]:
mat[1:,2:]=7

In [23]:
mat


array([[ 0,  1,  2],
       [ 3,  4,  7],
       [ 6,  7,  7],
       [ 9, 10,  7]])

In [24]:
mat[1:,1:]

array([[ 4,  7],
       [ 7,  7],
       [10,  7]])

In [25]:
mat[1:,:]

array([[ 3,  4,  7],
       [ 6,  7,  7],
       [ 9, 10,  7]])

In [26]:
mat[1:,1]

array([ 4,  7, 10])
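Indexing with a single integer drops that axis, while a slice keeps it, even when the slice covers only one row. That is why mat[1,:] above is a flat array of shape (3,), and mat[1:,1] is 1-D as well. A sketch that recreates mat and compares shapes; the names row_int, row_slice and col_int are illustrative:

```python
import numpy as np

mat = np.arange(12).reshape(4, 3)
mat[1:, 2:] = 7                   # same edit as in the cells above

row_int = mat[1, :]               # integer row index: row axis dropped
row_slice = mat[1:2, :]           # slice of length 1: row axis kept
col_int = mat[1:, 1]              # integer column index: 1-D result

print(row_int.shape, row_slice.shape, col_int.shape)   # (3,) (1, 3) (3,)
```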

In [27]:
arr = np.arange(16).reshape(4,4)
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

## For every row, take the second and the third elements
The colon in the row position means "take all rows".

The column position holds a slice: 1:3 selects the columns at indices 1 and 2, i.e. the second and third elements of each row. As in Python, the stop index is excluded, so the fourth element (index 3) is not included.

In [28]:
arr[:,1:3]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])
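Array slices follow the same start:stop:step rules as Python list slices: the stop index is excluded, and an optional step skips elements. A sketch comparing a list slice with an array slice and adding a step; row is an illustrative name:

```python
import numpy as np

row = [0, 1, 2, 3]
arr = np.arange(16).reshape(4, 4)

print(row[1:3])        # [1, 2]: stop index 3 excluded
print(arr[0, 1:3])     # the same positions taken from the first array row
print(arr[:, ::2])     # every row, every second column (columns 0 and 2)
```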

## Take every row from the third row to the last, with all columns
As in the example above, a colon on its own means all elements along that axis; here it selects every column.

In [29]:
arr[2:,:]

array([[ 8,  9, 10, 11],
       [12, 13, 14, 15]])

## Take every element from the third row to the last and from the third column to the last
The result is the intersection of the two slices. In this 4x4 array that is the bottom-right 2x2 block: 10, 11, 14 and 15.

In [30]:
arr[2:,2:]

array([[10, 11],
       [14, 15]])

Assigning a single value to a slice of an array uses broadcasting: NumPy stretches the scalar to the shape of the slice. This is a common way to overwrite part of an array, for example to zero it out.

In [31]:
arr[2:,2:] = 0
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9,  0,  0],
       [12, 13,  0,  0]])
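Broadcasting in assignment is not limited to scalars: any value whose shape can be stretched to the slice works. A sketch that zeroes one block with a scalar and fills another block by repeating a single row (the values 100 and 200 are arbitrary):

```python
import numpy as np

arr = np.arange(16).reshape(4, 4)
arr[2:, 2:] = 0              # scalar stretched to the 2x2 block
arr[:2, 2:] = [100, 200]     # length-2 row stretched to both rows of the block
print(arr)
```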

# Boolean Indexing

In [32]:
arr = np.arange(12).reshape(3,4)
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [33]:
arr > 7

array([[False, False, False, False],
       [False, False, False, False],
       [ True,  True,  True,  True]])

In [34]:
arr[arr>7]

array([ 8,  9, 10, 11])

In [35]:
arr[(arr>2)&(arr<7)]

array([3, 4, 5, 6])

In [36]:
arr[(arr<2)|(arr>7)]

array([ 0,  1,  8,  9, 10, 11])

In [37]:
arr[~(arr > 5)]

array([0, 1, 2, 3, 4, 5])
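A comparison such as arr > 7 produces a boolean mask of the same shape, and indexing with a mask returns the selected elements as a 1-D array. Masks are combined with &, | and ~ rather than Python's and, or and not, and each comparison needs its own parentheses because & and | bind more tightly than > and <. A sketch that also assigns through a mask and uses np.where; mask, capped and flags are illustrative names:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

mask = (arr > 2) & (arr < 7)      # element-wise AND of two boolean arrays
print(mask.shape, arr[mask])      # (3, 4) and the selected values 3, 4, 5, 6

try:
    (arr > 2) and (arr < 7)       # Python's `and` needs a single True/False
except ValueError as err:
    print("ValueError:", err)

capped = arr.copy()
capped[capped > 7] = 7            # assign through a mask: cap values at 7
flags = np.where(arr > 7, 1, 0)   # 1 where the mask is True, else 0
```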

In [38]:
arr.mean()

5.5

In [39]:
arr.std()

3.452052529534663
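By default, std() computes the population standard deviation, dividing by N; passing ddof=1 gives the sample standard deviation, which divides by N - 1. A sketch that reproduces the value above by hand; pop, sample and manual are illustrative names:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

pop = arr.std()                    # ddof=0 (default): divide by N
sample = arr.std(ddof=1)           # divide by N - 1
manual = np.sqrt(((arr - arr.mean()) ** 2).sum() / arr.size)

print(pop, manual)   # equal up to floating-point rounding
print(sample)        # slightly larger than pop
```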

# Broadcasting

In [40]:
ar = np.arange(3)

In [41]:
ar

array([0, 1, 2])

Multiplying two arrays with * multiplies corresponding elements; it does not compute a dot product.

In [42]:
ar * ar

array([0, 1, 4])

In [43]:
ar ** 2

array([0, 1, 4])

In [44]:
ar / 7

array([0.        , 0.14285714, 0.28571429])

In [45]:
ar + 4

array([4, 5, 6])

In [46]:
v1 = np.arange(3).reshape(3,1)
v1

array([[0],
       [1],
       [2]])

## NumPy stretches arrays to a common shape in element-wise operations such as addition
Here we add a column vector v1 of shape (3, 1) to the 1-D array ar = \[0,1,2\], which behaves like a single row.
NumPy stretches both to a 3x3 shape: the column is repeated once for each element of ar, and ar is repeated once for each row.
* The first element of ar is 0, so NumPy adds 0 to each element of the column vector, and the first column of the result holds the original values \[0,1,2\].
* The second element of ar is 1, so NumPy adds 1 to each element of the column vector, giving \[1,2,3\] as the second column.
* Likewise, the third element, 2, gives \[2,3,4\] as the third column.

In [47]:
ar + v1

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
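The general rule: NumPy compares shapes from the last dimension backwards, and two sizes are compatible when they are equal or one of them is 1, in which case that axis is stretched. Here (3,) is treated as (1, 3), so together with (3, 1) the result is (3, 3). A sketch that spells the stretching out with np.tile and shows a pair of shapes that fails; result and explicit are illustrative names:

```python
import numpy as np

ar = np.arange(3)                  # shape (3,), treated as a row (1, 3)
v1 = np.arange(3).reshape(3, 1)    # shape (3, 1), a column

result = ar + v1
print(np.broadcast(ar, v1).shape)  # (3, 3): the common shape

# The same result with the repetition written out explicitly
explicit = np.tile(ar, (3, 1)) + np.tile(v1, (1, 3))
print(np.array_equal(result, explicit))   # True

try:
    np.arange(3) + np.arange(4)    # trailing sizes 3 and 4: neither is 1
except ValueError as err:
    print("cannot broadcast:", err)
```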

# Array operations

In [48]:
v = np.arange(9).reshape(3,3)
v

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

The product of all elements is 0 if the array contains a zero, so for a small array like this one v.prod() is a quick check for zeros.

In [49]:
v.prod()

0
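The prod() check has a catch: integer products can overflow and wrap around to 0 even when no element is zero. Comparing with zero directly avoids that. A sketch using an illustrative array of 64 twos, whose product 2^64 wraps to 0 in int64:

```python
import numpy as np

v = np.arange(9).reshape(3, 3)
print((v == 0).any())                 # True: the array contains a 0

twos = np.full(64, 2, dtype=np.int64)
print(twos.prod())                    # 0, because 2**64 overflows int64
print((twos == 0).any())              # False: no element is actually zero
```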

By default, reductions such as sum() operate over the entire array or matrix and return a single value.

In [50]:
v.sum()

36

If we pass an axis, the reduction runs along that axis only and returns an array. For a 2D matrix, axis=0 sums down each column (one result per column) and axis=1 sums across each row (one result per row).

In [51]:
v.sum(axis=0)

array([ 9, 12, 15])
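With axis=1 the sum runs across each row instead, and the same axis argument works for other reductions such as mean() and max(). A sketch on the same 3x3 array:

```python
import numpy as np

v = np.arange(9).reshape(3, 3)

print(v.sum(axis=0))    # one total per column: 9, 12, 15
print(v.sum(axis=1))    # one total per row: 3, 12, 21
print(v.mean(axis=0))   # column means: 3.0, 4.0, 5.0
print(v.max(axis=1))    # row maxima: 2, 5, 8
```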

In [52]:
v2 = v.copy()

In [53]:
v2[1,1] = 100

In [54]:
v

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [55]:
v2

array([[  0,   1,   2],
       [  3, 100,   5],
       [  6,   7,   8]])
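copy() creates an independent array, which is why changing v2 left v untouched. Plain assignment and basic slicing do not copy: assignment just adds another name, and a slice is a view into the same memory, so writing to it changes the original. A sketch; alias and view are illustrative names:

```python
import numpy as np

v = np.arange(9).reshape(3, 3)

v2 = v.copy()            # new memory: edits to v2 do not affect v
v2[1, 1] = 100

alias = v                # no copy: alias and v are the same object
view = v[1:, 1:]         # basic slice: a view into v's memory
view[0, 0] = -1          # this writes to v[1, 1]

print(v[1, 1], v2[1, 1])                                   # -1 100
print(alias is v)                                          # True
print(np.shares_memory(v, view), np.shares_memory(v, v2))  # True False
```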

### Dumping and loading data to and from arrays.
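The ndarray.dumps() method serializes an array to bytes with Python's pickle protocol, and pickle.loads() rebuilds the array from those bytes. np.loads is a deprecated alias for pickle.loads, so call pickle.loads directly. Only unpickle data from sources you trust, since loading a pickle can run arbitrary code.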

In [56]:
data = v.dumps()

In [57]:
data

b'\x80\x02cnumpy.core.multiarray\n_reconstruct\nq\x00cnumpy\nndarray\nq\x01K\x00\x85q\x02c_codecs\nencode\nq\x03X\x01\x00\x00\x00bq\x04X\x06\x00\x00\x00latin1q\x05\x86q\x06Rq\x07\x87q\x08Rq\t(K\x01K\x03K\x03\x86q\ncnumpy\ndtype\nq\x0bX\x02\x00\x00\x00i8q\x0cK\x00K\x01\x87q\rRq\x0e(K\x03X\x01\x00\x00\x00<q\x0fNNNJ\xff\xff\xff\xffJ\xff\xff\xff\xffK\x00tq\x10b\x89h\x03XH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x05\x00\x00\x00\x00\x00\x00\x00\x06\x00\x00\x00\x00\x00\x00\x00\x07\x00\x00\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\x00\x00q\x11h\x05\x86q\x12Rq\x13tq\x14b.'

In [59]:
import pickle
v3 = pickle.loads(data)
v3

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
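Besides pickling, NumPy has its own .npy file format through np.save and np.load, and np.load refuses to unpickle arbitrary objects by default (allow_pickle=False). A sketch of both round trips; it writes to an in-memory io.BytesIO buffer instead of a real file so nothing is left on disk, and the names restored, buf and loaded are illustrative:

```python
import io
import pickle
import numpy as np

v = np.arange(9).reshape(3, 3)

# Pickle round trip, like the dumps() cell above
restored = pickle.loads(v.dumps())

# .npy round trip through an in-memory buffer
buf = io.BytesIO()
np.save(buf, v)
buf.seek(0)              # rewind before reading back
loaded = np.load(buf)

print(np.array_equal(v, restored), np.array_equal(v, loaded))   # True True
```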