# Limitations of Lists

You have already seen what lists are and what you can do with them. To quickly recap, a list is a basic Python data structure which can hold multiple values of multiple data types such as integers, floats, strings, and so on. 

In this example, we have two lists “distance” and “speed.” Each list holds four different readings. These readings correspond to each other. 

In [1]:
distance = [100, 15, 900, 60]
speed = [20, 0.75, 90, 5]

In [2]:
time = distance/speed
time

# TypeError: unsupported operand type(s) for /: 'list' and 'list'

TypeError: unsupported operand type(s) for /: 'list' and 'list'

In [3]:
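def DivList(L1, L2):
    # Element-wise division only makes sense for lists of equal length
    if len(L1) != len(L2):
        raise ValueError("Cannot divide two lists of different lengths")
    L3 = []  # local result list, so repeated calls do not accumulate values
    for i in range(len(L1)):
        L3.append(L1[i] / L2[i])
    return L3

distance = [100, 15, 900, 60]
speed = [20, 0.75, 90, 5]
DivList(distance, speed)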

[5.0, 20.0, 10.0, 12.0]
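
For comparison, the same element-wise division can be sketched more compactly in plain Python with `zip` and a list comprehension. This is only an illustrative alternative to the loop above, not part of the original notebook. Note that `zip` stops at the shorter list, so a length mismatch would pass silently here.

```python
distance = [100, 15, 900, 60]
speed = [20, 0.75, 90, 5]

# zip pairs the i-th distance with the i-th speed
time = [d / s for d, s in zip(distance, speed)]
print(time)  # [5.0, 20.0, 10.0, 12.0]
```

Either way, the element-by-element work has to be spelled out by hand.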

Using these lists, if you try to calculate the time, which is equal to distance divided by speed, Python raises a TypeError. That’s because arithmetic operators such as / are not defined for entire lists; you have to loop over the elements yourself, as DivList does. Now, let’s see how **NumPy** solves this problem.

**A Python library is a collection of related modules. It contains bundles of code that can be used repeatedly in different programs.**

# NumPy

NumPy is a Python library that supports a data container called arrays. **Arrays are like lists, but they can do something that lists can’t.** They easily allow you to **apply mathematical operations over the entire dataset**. Let’s see how this works. 

First, you need to import the NumPy library. You then need to convert the ‘distance’ and ‘speed’ lists into NumPy arrays. Finally, apply the formula for time using these arrays. You can see that NumPy generates an output for each of the four readings without an explicit loop.

This property of NumPy makes a Data Scientist’s job much easier, because it helps them to easily manipulate data by applying mathematical functions over a given dataset.

In [4]:
import numpy as np # calling the library with import and giving it an alias

# sqrt = np.sqrt

In [5]:
np_distance = np.array(distance)
np_speed = np.array(speed)

In [6]:
np_distance

array([100,  15, 900,  60])

In [7]:
np_speed

array([20.  ,  0.75, 90.  ,  5.  ])

In [8]:
time = np_distance/np_speed
time

array([ 5., 20., 10., 12.])

### Arrays

- Arrays can be one-dimensional, two-dimensional, or multi-dimensional. 
- The best way to visualize an array is as rows and columns. You can also look at it by its dimensional axes or rank. 
- You can think of a one-dimensional array as a single row of values. So, a one-dimensional array has one axis or rank 1. 
- Two-dimensional arrays can be visualized with rows and columns. This means that it has two axes or rank 2. 
- A three-dimensional array has three axes and can be visualized as a cube which has height, width, and length.
- Similarly, multi-dimensional arrays have multiple axes.

# Creating and Printing an array

In [9]:
# Create a One Dimensional Array == Row

list1 = [1,2,3,4,5,6]
print (list1)
print ()

oneDimExample = np.array(list1)
print (oneDimExample) # array looks just like a list
print (type(oneDimExample)) # numpy.ndarray
oneDimExample

# n-dimensional array

[1, 2, 3, 4, 5, 6]

[1 2 3 4 5 6]
<class 'numpy.ndarray'>


array([1, 2, 3, 4, 5, 6])

In [10]:
# Create a two dimensional array == Matrix

twoDimExample = np.array([[1,2,3],
                         [4,5,6],
                         [7,8,9]])
twoDimExample

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [11]:
# Create a three dimensional array == Cube

arr_3d = np.array([[[1,2,3], [4,5,6], [7,8,9],[10,11,12]], 
                   [[10,20,30], [40,50,60], [70,80,90], [100,110,120]]])
arr_3d

array([[[  1,   2,   3],
        [  4,   5,   6],
        [  7,   8,   9],
        [ 10,  11,  12]],

       [[ 10,  20,  30],
        [ 40,  50,  60],
        [ 70,  80,  90],
        [100, 110, 120]]])

# Class and Attributes of ndarray

NumPy’s array class is “ndarray,” also referred to as “numpy.ndarray.” Its main attributes are described below.

## ndim

ndarray.ndim – This refers to the number of axes (dimensions) of the array. It tells you whether the array is one-dimensional, two-dimensional, three-dimensional, and so on. It is also called the rank of the array.

In [12]:
print (oneDimExample)
oneDimExample.ndim

[1 2 3 4 5 6]


1

In [13]:
print (twoDimExample)
twoDimExample.ndim

[[1 2 3]
 [4 5 6]
 [7 8 9]]


2

In [14]:
print (arr_3d)
arr_3d.ndim

[[[  1   2   3]
  [  4   5   6]
  [  7   8   9]
  [ 10  11  12]]

 [[ 10  20  30]
  [ 40  50  60]
  [ 70  80  90]
  [100 110 120]]]


3

## shape

ndarray.shape describes the structure of an array. It is a tuple of integers giving the size of the array along each dimension. For a matrix with n rows and m columns, the shape is (n, m). The length of the shape tuple is therefore the rank, or ndim.

The shape tuples of the three example arrays show their size along each dimension.

In [15]:
print (oneDimExample)
oneDimExample.shape

[1 2 3 4 5 6]


(6,)

In [16]:
print (twoDimExample)
twoDimExample.shape

[[1 2 3]
 [4 5 6]
 [7 8 9]]


(3, 3)

In [17]:
print (arr_3d)
arr_3d.shape

# (no. of matrices, no. of rows, no. of cols)

[[[  1   2   3]
  [  4   5   6]
  [  7   8   9]
  [ 10  11  12]]

 [[ 10  20  30]
  [ 40  50  60]
  [ 70  80  90]
  [100 110 120]]]


(2, 4, 3)
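
The relationship between shape and ndim described above can be checked directly. The following is a minimal sketch; it uses `np.zeros` with placeholder values simply to get an array with the same shape as arr_3d, and the name `placeholder_3d` is hypothetical.

```python
import numpy as np

# Placeholder array with the same shape as arr_3d: 2 matrices, 4 rows, 3 columns
placeholder_3d = np.zeros((2, 4, 3))

print(placeholder_3d.shape)       # (2, 4, 3)
print(len(placeholder_3d.shape))  # 3
print(placeholder_3d.ndim)        # 3
```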

## size

ndarray.size gives the total number of elements in an array. It is equal to the product of the elements of its shape tuple.

In [18]:
print (oneDimExample)
oneDimExample.size

[1 2 3 4 5 6]


6

In [19]:
print (twoDimExample)
twoDimExample.size

[[1 2 3]
 [4 5 6]
 [7 8 9]]


9

In [20]:
print (arr_3d)
arr_3d.size

[[[  1   2   3]
  [  4   5   6]
  [  7   8   9]
  [ 10  11  12]]

 [[ 10  20  30]
  [ 40  50  60]
  [ 70  80  90]
  [100 110 120]]]


24
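
As a quick check of the statement that size equals the product of the shape tuple, here is a small sketch. The array `arr` is a hypothetical stand-in built with `np.arange` and `reshape`, and `math.prod` assumes Python 3.8 or later.

```python
import math
import numpy as np

# 24 consecutive integers arranged as 2 matrices of 4 rows and 3 columns
arr = np.arange(24).reshape(2, 4, 3)

print(arr.size)              # 24
print(math.prod(arr.shape))  # 24
```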

# Built-in Mathematical Functions

In [21]:
A = [10,15,17,26]
B = [12,11,21,24]

A = np.array(A)
B = np.array(B)

In [22]:
A + B

array([22, 26, 38, 50])

In [23]:
A - B

array([-2,  4, -4,  2])

In [24]:
A * B

array([120, 165, 357, 624])

In [25]:
A / B

array([0.83333333, 1.36363636, 0.80952381, 1.08333333])

In [26]:
A ** 2 # ** is the power operator

array([100, 225, 289, 676], dtype=int32)

In [27]:
sqrt(A)

NameError: name 'sqrt' is not defined

In [28]:
np.sqrt(A)

array([3.16227766, 3.87298335, 4.12310563, 5.09901951])

In [29]:
print (np.floor([1.2,1.3,4.9,2.6,3.3,5.6])) # round down to the nearest integer

[1. 1. 4. 2. 3. 5.]


In [30]:
print (np.round([1.2,1.3,4.9,2.6,3.3,5.6]))

[1. 1. 5. 3. 3. 6.]


In [31]:
print (np.ceil([1.2,1.3,4.9,2.6,3.3,5.6])) # round up to the nearest integer

[2. 2. 5. 3. 4. 6.]


In [32]:
array_a = np.array([2, 3, 5, 8])
array_a

array([2, 3, 5, 8])

In [33]:
array_d = np.array([2, 3])
array_d

array([2, 3])

In [34]:
array_a + array_d

ValueError: operands could not be broadcast together with shapes (4,) (2,) 

# Broadcasting

NumPy uses broadcasting to carry out arithmetic operations between arrays of different shapes. 

To understand how this works, consider the arrays array_a and array_b in the cells that follow. They have the same shape, (4,), which you can picture as a single row of four values. So, to calculate the product of the two arrays, NumPy performs an element-wise multiplication.

However, a scalar such as 0.3 is a single value, so its shape doesn’t match that of array_a. In this case, NumPy doesn’t have to create copies of the scalar value to multiply it element-wise with the array elements. Instead, it broadcasts the scalar value over the entire array to find the product. This saves memory, as an array takes a lot more memory than a scalar.


Although broadcasting lets you carry out arithmetic between arrays of different shapes, it is subject to a constraint. When NumPy operates on two arrays, it compares their shapes dimension by dimension, starting from the last (trailing) dimension. Two dimensions are compatible only if:

- they are equal, or
- one of them has size 1.

If these conditions are not met, a “ValueError” is raised, indicating that the arrays have incompatible shapes. This is why adding arrays of shapes (4,) and (2,) failed earlier.
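
The compatibility rule can be illustrated with a small sketch. The arrays `column` and `row` are hypothetical examples chosen so that one dimension has size 1. The second part repeats the failing addition of array_a and array_d from earlier.

```python
import numpy as np

column = np.array([[1], [2], [3]])  # shape (3, 1)
row = np.array([10, 20, 30, 40])    # shape (4,)

# Trailing dimensions: 1 vs 4 -> compatible, because one of them is 1.
# The missing leading dimension of row is treated as size 1.
result = column + row
print(result.shape)  # (3, 4)

array_a = np.array([2, 3, 5, 8])  # shape (4,)
array_d = np.array([2, 3])        # shape (2,)
try:
    array_a + array_d  # 4 vs 2: neither equal nor 1
except ValueError as err:
    print("Incompatible shapes:", err)
```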


In [35]:
# [2, 3, 5, 8] * 0.3

In [36]:
array_a = np.array([2, 3, 5, 8])
array_a

array([2, 3, 5, 8])

In [37]:
array_b = np.array([0.3,0.3,0.3,0.3])
array_b

array([0.3, 0.3, 0.3, 0.3])

In [38]:
array_a * array_b

array([0.6, 0.9, 1.5, 2.4])

In [39]:
array_a * 0.3 # it broadcasts the value over the entire array to find the product

array([0.6, 0.9, 1.5, 2.4])
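
One way to see that a broadcast value does not need to be physically copied is `np.broadcast_to`, which returns a read-only view of the scalar stretched to a given shape. This is only a sketch of the concept, not a description of what NumPy does internally when you multiply by a Python scalar. The name `scalar_view` is hypothetical.

```python
import numpy as np

array_a = np.array([2, 3, 5, 8])

# View 0.3 as if it had shape (4,); the value itself is not duplicated
scalar_view = np.broadcast_to(0.3, array_a.shape)
print(scalar_view.shape)    # (4,)
print(scalar_view.strides)  # (0,)

# The view gives the same product as multiplying by the scalar directly
print(np.allclose(array_a * scalar_view, array_a * 0.3))  # True
```

A stride of 0 means that every position in the view reads the same value in memory.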

Let’s look at an example to see how broadcasting works:
- The two datasets represent a worker’s earnings over a period of two weeks, excluding weekends.
- The total earnings for each weekday across the two weeks are found by vector addition, where an **element-wise arithmetic operation is performed**.
- To calculate the number of hours worked for each day in week one, you need to divide the week one dataset by 15, which is the hourly wage. This arithmetic operation is carried out through **broadcasting**.

In [40]:
np_week_one = np.array([105,135,195,120,165])
np_week_two = np.array([123,156,230,200,147])

In [41]:
# **element-wise arithmetic operation is performed**

total_earning = np_week_one + np_week_two
print (total_earning)

[228 291 425 320 312]


In [42]:
# **broadcasting**

np_week_one_hrs = np_week_one/15    # 15, which is the hourly wage
print (np_week_one_hrs)

[ 7.  9. 13.  8. 11.]
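
The same division can be applied to both weeks at once. The sketch below assumes the two weekly arrays are stacked into a single two-dimensional array; the names `hourly_wage`, `np_earnings`, and `np_hours` are hypothetical and not part of the original example.

```python
import numpy as np

hourly_wage = 15  # hourly wage from the example above
np_week_one = np.array([105, 135, 195, 120, 165])
np_week_two = np.array([123, 156, 230, 200, 147])

# Stack the two weeks into a 2 x 5 array: one row per week
np_earnings = np.array([np_week_one, np_week_two])

# The scalar is broadcast across every element of both rows
np_hours = np_earnings / hourly_wage
print(np_hours.shape)  # (2, 5)
print(np_hours[0])     # [ 7.  9. 13.  8. 11.]
```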
