NumPy is the fundamental package for scientific computing in Python.
It is a Python library that provides a multidimensional array object, `ndarray`. NumPy arrays are data structures for storing multi-dimensional data. They are homogeneous (every element has the same data type), and arithmetic on them is vectorized: an operation applies element-wise to the whole array without an explicit Python loop.
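
As a minimal sketch of these two properties (the variable names are illustrative and not part of the original notebook), the cell below multiplies every element without a loop and shows that mixing integers and floats yields one common dtype:

```python
import numpy as np

values = np.array([1, 2, 3])
doubled = values * 2             # vectorized: applied to every element at once
mixed = np.array([1, 2.5, 3])    # homogeneous: the integers are upcast to float64

print(doubled)       # [2 4 6]
print(mixed.dtype)   # float64
```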

## **Import numpy as np and see the version**

In [None]:
import numpy as np 

np.__version__  # gives numpy version number

'1.21.6'

## **NumPy Basics**

In [None]:
np.array([1,2,3]) # 1D array

array([1, 2, 3])

In [None]:
np.array([(1,2,3),(4,5,6)]) # 2D array

array([[1, 2, 3],
       [4, 5, 6]])

In [None]:
# np.arange(start, stop, step)  # range array; stop is excluded
np.arange(3,7)

array([3, 4, 5, 6])

In [None]:
np.arange(3,7,2)

array([3, 5])

In [None]:
np.arange(0, 2, 0.4)     # accepts float arguments

array([0. , 0.4, 0.8, 1.2, 1.6])

In [None]:
a = np.arange(15).reshape(3,5)
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
a.shape   # the dimensions of the array: (rows, columns) for a 2D array

(3, 5)

In [None]:
a.ndim  #the number of axes (dimensions) of the array.

2

In [None]:
a.dtype.name   # data type of the elements in the array. The default integer type is platform dependent; it is 'int64' on most 64-bit Linux and macOS systems

# note: the signed integer types are int8, int16, int32 and int64; unsigned variants (uint8, uint16, ...) also exist

'int64'

In [None]:
a.itemsize  #the size in bytes of each element of the array.

8

In [None]:
arr = np.array([1,2,3,4])  # With dtype int64, each item of this array consumes 64 bits = 64/8 = 8 bytes of memory
arr.nbytes                 # thus for 4 items in arr, nbytes = 4 * 8 = 32 bytes

32
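
The relationship `nbytes = size * itemsize` holds for any dtype. The sketch below (array names are made up for illustration) pins the dtype explicitly so the result does not depend on the platform default:

```python
import numpy as np

small = np.arange(4, dtype=np.int8)    # 1 byte per element
large = np.arange(4, dtype=np.int64)   # 8 bytes per element

print(small.itemsize, small.nbytes)    # 1 4
print(large.itemsize, large.nbytes)    # 8 32
```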

In [None]:
a.size    #the total number of elements of the array

15
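
The attributes `shape`, `ndim` and `size` are tied together: `ndim` is the length of `shape`, and `size` is the product of its entries. A small check, using a hypothetical array named `grid`:

```python
import numpy as np

grid = np.arange(15).reshape(3, 5)
rows, cols = grid.shape

print(grid.ndim == len(grid.shape))   # True
print(grid.size == rows * cols)       # True

# Reshaping changes the shape but never the number of elements.
tall = grid.reshape(5, 3)
print(tall.shape, tall.size)          # (5, 3) 15
```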

In [None]:
type(a)  #object type

numpy.ndarray

In [None]:
b = np.array([6, 7, 8])

In [None]:
type(b)

numpy.ndarray

In [None]:
np.iinfo('int32')    # some info about 'int32' like max. and min. value it can hold

iinfo(min=-2147483648, max=2147483647, dtype=int32)
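
These limits matter because fixed-width integer arrays wrap around silently when a result exceeds them. A minimal sketch with `int8`, chosen here only because its range is small enough to read at a glance:

```python
import numpy as np

limits = np.iinfo(np.int8)
top = np.array([limits.max], dtype=np.int8)   # [127], the largest int8 value
one = np.array([1], dtype=np.int8)
wrapped = top + one                           # 128 does not fit, so it wraps

print(limits.min, limits.max)   # -128 127
print(wrapped)                  # [-128]
```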

## **Array Creation**

In [None]:
a = np.array([5, 6, 7])
a

array([5, 6, 7])

In [None]:
a.dtype

dtype('int64')

In [None]:
b = np.array([1.5, 3.8, 5.6])

In [None]:
b.dtype   

dtype('float64')

In [None]:
c = np.array([[1, 2], [3, 4]], dtype=complex)    #specifying type of array at the time of creation
c

array([[1.+0.j, 2.+0.j],
       [3.+0.j, 4.+0.j]])

In [None]:
np.zeros((3,4))  # OR: np.zeros([3,4])  # by default, the dtype of the created array is float64 

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
np.ones((2, 3, 4), dtype=np.int16)   # array full of ones

array([[[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=int16)

In [None]:
np.empty((2, 3))    #creates an array without initializing it; the contents are arbitrary and depend on the state of the memory

array([[2.35686625e-316, 0.00000000e+000, 0.00000000e+000],
       [0.00000000e+000, 0.00000000e+000, 0.00000000e+000]])
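
Because the contents of an `np.empty` array are arbitrary, a common pattern is to overwrite every element before reading any of them. A short sketch (the name `scratch` is illustrative):

```python
import numpy as np

scratch = np.empty((2, 3))   # values are arbitrary at this point
scratch.fill(7.0)            # overwrite every element before using the array

print(scratch)
```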

## **Arange VS Linspace**
- `linspace` gives precise control over the end value, whereas `arange` gives more direct control over the increment between values in the sequence.

- The step size is specified for `np.arange()`, and the number of elements is specified for `np.linspace()`.
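
The difference can be sketched side by side on the same interval. The values below were picked so that every number is exactly representable as a float:

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)    # step is fixed; the stop value is excluded
by_count = np.linspace(0, 1, 5)    # count is fixed; the stop value is included

print(by_step)    # [0.   0.25 0.5  0.75]
print(by_count)   # [0.   0.25 0.5  0.75 1.  ]
```

With a non-integer step, the length of an `arange` result can be hard to predict because of floating-point rounding, so `linspace` is usually the safer choice when the end point matters.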

In [None]:
from numpy import pi

In [None]:
np.linspace(0, 2, 9)  # 9 numbers from 0 to 2

array([0.  , 0.25, 0.5 , 0.75, 1.  , 1.25, 1.5 , 1.75, 2.  ])

In [None]:
x = np.linspace(0, 2 * pi, 100)   # useful to evaluate function at lots of points
f = np.sin(x)

## **Create a boolean array**

In [None]:
np.full((3, 3), True, dtype=bool)  # a 3×3 NumPy array filled with True

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [None]:
np.ones((3,3), dtype=bool)   # alternate method

array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

## **Create an array that mixes numbers, strings, and arbitrary Python objects**

In [None]:
arr = np.array(['x', 'y', 'z', 10, None, [1, 2, 3]], dtype = 'object')   # this is possible because of numpy datatype 'object'
arr

array(['x', 'y', 'z', 10, None, list([1, 2, 3])], dtype=object)
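
An `object` array stores references to ordinary Python objects, so each element keeps its own Python type. A small sketch with a shortened, made-up list of items:

```python
import numpy as np

mixed = np.array(['x', 10, None, [1, 2, 3]], dtype=object)
kinds = [type(item).__name__ for item in mixed]

print(mixed.shape)   # (4,)
print(kinds)         # ['str', 'int', 'NoneType', 'list']
```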

## **Import and Export Data**
**Import Data**:
 The main import functions are:

 1. numpy.loadtxt()
 2. numpy.genfromtxt()

### Use np.loadtxt when there is no missing data.

In [None]:
import numpy as np

data = np.loadtxt('Datasets/data.txt', delimiter='\t')
data

In [None]:
data[0, :]   # 1st row and all columns

array([ 1.     , 87.     , 57.54435])

In [None]:
type(data)

numpy.ndarray

**When there are missing values, `np.loadtxt` raises an error.**

In [None]:
# data = np.loadtxt('Datasets/data_miss.txt', delimiter='\t')
# data

### In this situation, use *np.genfromtxt()*. It fills in missing data with *nan*.

In [None]:
data = np.genfromtxt('Datasets/data_miss.txt', delimiter='\t')
data
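
The two behaviours can be reproduced without the dataset files. The sketch below uses a tiny in-memory table (made-up values, not the contents of `data_miss.txt`) whose second row has an empty middle field:

```python
import io
import numpy as np

# Hypothetical tab-separated data; row 2 is missing its second value.
raw = "1\t87\t57.5\n2\t\t61.2\n"

try:
    np.loadtxt(io.StringIO(raw), delimiter="\t")
    loadtxt_failed = False
except ValueError:
    loadtxt_failed = True       # an empty field cannot be converted to float

filled = np.genfromtxt(io.StringIO(raw), delimiter="\t")

print(loadtxt_failed)   # True
print(filled)           # the missing entry shows up as nan
```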

### **CSV File**
Let's try to load a CSV file with column names.

By default, `np.genfromtxt` uses *float* as the dtype. In that case, the text fields (including the header row) cannot be parsed and come back as *nan*.

In [None]:
data = np.genfromtxt('Datasets/Mall_Customers.csv', delimiter=',')
data

So, specify the data type explicitly.

In [None]:
# Change dtype and skip header
data = np.genfromtxt('Datasets/Mall_Customers.csv',
                     delimiter = ',',
                     dtype = 'object',
                     skip_header = 1)

data[:5, :]   # first 5 rows with their columns.

The problem with this is that the values are read as *bytes*, not as numbers, so arithmetic on them does not work directly.

In [None]:
# divide 3rd column by 2nd column. 

# data[:, 3] / data[:, 2]    # ERROR !!

Convert both columns to float first, and then the division works.

In [None]:
output = data[:, 3].astype('float') / data[:, 2].astype('float')
output[:10]

**Better Way: Define the Data Type and then import**

In [None]:
dt = np.dtype({'names':['CustomerID', 'Genre', 'Age', 'Annual_Income', 'Spending_Score'],
               'formats': [np.int16, 'U16', np.int16, np.int16, np.int16]})

In [None]:
# change dtype and skip header
data = np.genfromtxt('Datasets/Mall_Customers.csv',
                     delimiter = ',',
                     dtype = dt,
                     skip_header = 1)
data[:10]

In [None]:
data.shape

(200,)

In [None]:
data[0]['Age']

19

In [None]:
data[0]['Genre']

'Male'

In [None]:
data['Age']
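
The same structured-dtype approach can be tried on an in-memory CSV. The two rows and the three-column layout below are invented for illustration, and `encoding="utf-8"` is passed explicitly as an assumption so that text fields are read as `str` regardless of NumPy version:

```python
import io
import numpy as np

# Hypothetical CSV with a header row and one text column.
csv_text = "CustomerID,Genre,Age\n1,Male,19\n2,Female,21\n"

record_type = np.dtype({'names': ['CustomerID', 'Genre', 'Age'],
                        'formats': [np.int16, 'U16', np.int16]})

records = np.genfromtxt(io.StringIO(csv_text),
                        delimiter=',',
                        dtype=record_type,
                        skip_header=1,
                        encoding="utf-8")

print(records.shape)          # (2,)  one record per row
print(records[0]['Genre'])    # Male
print(records['Age'].mean())  # 20.0
```

Each field is a regular typed column, so numeric fields such as `Age` support arithmetic directly, with no `astype` conversion.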

**Export Data and Load it back**

If it's a single array, save it in *`.npy`* format. If we have multiple arrays to save in same file, use *`.npz`* format.

In [None]:
# Store arrays back to disk

# single array
np.save('TEMP/output.npy', output)

# Multiple arrays: arrays will be saved with names 'arr_0', 'arr_1', ...
np.savez('TEMP/output_data.npz', output, data)

Load it back

In [None]:
# Single array
a = np.load('TEMP/output.npy')
a[:5]

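`allow_pickle=True` is needed only when a stored array has *object* dtype; it has nothing to do with the number of dimensions. Enable it only for files from a trusted source, because unpickling can execute arbitrary code.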

In [None]:
# Multiple arrays
b = np.load('TEMP/output_data.npz', allow_pickle=True)
b

See the arrays stored in it.

In [None]:
b.files

In [None]:
b['arr_0'][:5]

In [None]:
b['arr_1']
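
Instead of relying on the default names `arr_0`, `arr_1`, ..., arrays can be passed to `np.savez` as keyword arguments, and the keywords become the names in the archive. A self-contained sketch that writes to a temporary directory (the array names and file name are illustrative):

```python
import os
import tempfile
import numpy as np

ratios = np.array([0.5, 1.5, 2.5])
ids = np.arange(3)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "arrays.npz")
    np.savez(path, ratios=ratios, ids=ids)   # keyword names become archive keys

    with np.load(path) as archive:           # no pickling needed for numeric arrays
        names = sorted(archive.files)
        restored = archive["ratios"]

print(names)      # ['ids', 'ratios']
print(restored)   # [0.5 1.5 2.5]
```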