# Week 2: NumPy Mastery

*notebook for classes: January 8th to January 10th, 2024*

Welcome to Week 2 of our Data Analysis with Python course! As we dig deeper into NumPy and its powerful array features, be prepared for a week of insightful exploration and hands-on learning.

**Let's connect** 
> **Course Instructor:** Zartashia Afzal: https://www.linkedin.com/in/zartashiaafzal/
>
> 
> **Moderators:**
> Muhammad Qasim Ali: https://www.linkedin.com/in/muhammad-qasim-ali/
>
> Ayesha Mehboob: https://www.linkedin.com/in/ayesha-mehboob-379643284/
  


# Table of contents

1- Array dimensionality

2- Basic indexing and slicing

3- Boolean indexing

4- Fancy indexing

5- Transposing and swapping axes

# Array dimensionality
Mathematicians sometimes use the terms vector, matrix, and tensor for NumPy arrays of different dimensions.

> A 1D array is also called a vector.

> A 2D array is also called a matrix.

> A 3D array is also called a tensor (the term is also used for arrays with more dimensions).

A 4D array is, in fact, a 1D array of 3D arrays (equivalently, a 2D array of 2D arrays).
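A quick way to check these terms is with the `ndim` and `shape` attributes. The sketch below builds one array of each dimensionality (the variable names are illustrative, not from the course material) and confirms that indexing the first axis of a 4D array yields a 3D array.

```python
import numpy as np

# Illustrative arrays of increasing dimensionality (names are arbitrary)
vector = np.array([1, 2, 3])                   # 1D, shape (3,)
matrix = np.array([[1, 2, 3], [4, 5, 6]])      # 2D, shape (2, 3)
tensor = np.arange(24).reshape((2, 3, 4))      # 3D, shape (2, 3, 4)
four_d = np.arange(48).reshape((2, 2, 3, 4))   # 4D, shape (2, 2, 3, 4)

for name, a in [("vector", vector), ("matrix", matrix),
                ("tensor", tensor), ("four_d", four_d)]:
    print(name, a.ndim, a.shape)

# Indexing along the first axis of the 4D array returns one of its 3D sub-arrays
print(four_d[0].ndim, four_d[0].shape)   # 3 (2, 3, 4)
```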

## Basic indexing and slicing


In [None]:
import numpy as np

arr = np.arange(15)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [None]:
arr[8]

8

In [None]:
arr[4:6]

array([4, 5])

## Broadcasting and Array Slices in NumPy

### 1. Broadcasting Scalar Values

In NumPy, assigning a scalar value to a slice broadcasts (propagates) that scalar to every element of the selected slice. Broadcasting is a powerful feature that simplifies operations on arrays, allowing efficient element-wise computations without explicit loops.

### 2. Views vs. Copies: Lists vs. ndarrays

A key distinction between Python lists and NumPy ndarrays lies in the behavior of slices. An ndarray slice is a view of the original array, so no data is copied and any modification made through the view changes the original array. In contrast, slicing a list creates a new list (a shallow copy) containing the selected elements.
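To make the difference concrete, here is a minimal side-by-side sketch (the variable names are illustrative): the same slice-then-modify steps leave a list untouched but change the ndarray.

```python
import numpy as np

# Python list: slicing creates a new list, so the original is untouched
py_list = [0, 1, 2, 3, 4]
list_slice = py_list[1:3]
list_slice[0] = 99
print(py_list)      # [0, 1, 2, 3, 4]

# NumPy ndarray: slicing creates a view, so writes go through to the original
nd = np.arange(5)
nd_slice = nd[1:3]
nd_slice[0] = 99
print(nd)           # [ 0 99  2  3  4]
```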

### 3. Efficient Operations and Memory Management

- **Broadcasting Efficiency:** Broadcasting enhances the efficiency of array operations by eliminating the need for redundant copies and explicit looping.

- **View-Based Slices:** The view-based nature of array slices in NumPy contributes to memory efficiency, as modifications to slices directly affect the original array without creating additional copies.

Understanding these features is crucial for efficient and memory-conscious data manipulation in NumPy. While broadcasting simplifies array operations, users should be mindful of the view-based behavior of slices to ensure desired outcomes when modifying array elements.


In [None]:
arr[5:7] = 12

In [None]:
arr

array([ 0,  1,  2,  3,  4, 12, 12,  7,  8,  9, 10, 11, 12, 13, 14])

>   In NumPy, a "bare" slice, denoted by `[:]`, is a concise notation that allows for assigning values to all elements in an array. This simple syntax provides a convenient way to perform operations on the entire array without specifying individual indices.
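The distinction matters in practice: `[:]` assignment writes into the existing array, whereas a plain `=` on a name only rebinds that name. A minimal sketch, with illustrative names:

```python
import numpy as np

original = np.arange(6)
view = original[2:5]

view[:] = 0          # bare-slice assignment: writes into the existing memory
print(original)      # [0 1 0 0 0 5]

view = 7             # plain assignment: only rebinds the name 'view' to the int 7
print(original)      # [0 1 0 0 0 5]  (unchanged)
```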

In [None]:
arr[:] = 23
arr

array([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23])

>   NumPy is designed to handle very large arrays efficiently. Because slices are views, changes made through a slice modify the original array instead of a new copy. Not copying slices automatically is a deliberate choice to avoid performance and memory problems; if you need a copy, you must request it explicitly, for example with `.copy()`.


In [None]:
arr_copy = arr[3:6].copy()



In [None]:
arr_copy

array([23, 23, 23])

In [None]:
arr_copy[:] = 10

In [None]:
arr_copy

array([10, 10, 10])

In [None]:
arr  # changes made to the copy leave the original array unchanged

array([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23])
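If you are unsure whether you are holding a view or a copy, NumPy can tell you. This sketch uses `np.shares_memory` and the `.base` attribute, which the cells above do not use, to check both cases:

```python
import numpy as np

arr = np.arange(15)
view = arr[3:6]          # slice: a view of arr
copy = arr[3:6].copy()   # explicit copy: independent data

print(np.shares_memory(arr, view))   # True
print(np.shares_memory(arr, copy))   # False
print(view.base is arr)              # True: the view's data belongs to arr
print(copy.base is None)             # True: the copy owns its data
```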

### Accessing Elements in Higher-Dimensional Arrays

1. **Array Elements as Arrays:**
   - In higher-dimensional arrays, indexing with a single index returns an array (for a 2D array, a whole row) rather than a scalar.

2. **Accessing Individual Elements:**
   - Individual elements can be accessed either recursively (`arr2d[0][2]`) or with a comma-separated list of indices (`arr2d[0, 2]`).

3. **Understanding Axes:**
   - In a 2D array, axis 0 refers to the rows and axis 1 to the columns.
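The three points above can be checked in a few lines. The axis demonstration uses `sum`, which the notebook does not otherwise introduce here; it is included only to show what "axis 0" and "axis 1" mean.

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr2d[1])        # [4 5 6]  a single index returns a whole row (an array)
print(arr2d[0][2])     # 3  recursive: first pick row 0, then element 2 of that row
print(arr2d[0, 2])     # 3  comma-separated: row 0, column 2 in one step

# Summing over axis 0 collapses the rows (one total per column);
# summing over axis 1 collapses the columns (one total per row)
print(arr2d.sum(axis=0))   # [12 15 18]
print(arr2d.sum(axis=1))   # [ 6 15 24]
```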


In [None]:
arr2d  = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2d[1]

array([4, 5, 6])

In [None]:
arr2d[0][2]  # recursive indexing: row index 0, then column index 2


3

In [None]:
arr2d[0,2]

3

In [None]:
arr2d[1,2]

6

### Omitting Later Indices and Assigning Values to Higher-Dimensional Arrays

1. **Retrieving Values Without Later Indices**

In a multidimensional array, if you omit the later indices, the result is a lower-dimensional array containing all the data along the remaining axes. For example, `arr2d[0]` returns the entire first row. This is a concise way to access a subset of the data without spelling out every index.

2. **Assigning Scalar and Array Values to Higher-Dimensional Arrays**

Both scalars and arrays of matching shape can be assigned to such a selection. A scalar is broadcast to every selected element, while an array replaces the selected values element by element.
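The 2D cells that follow show this for rows. As a further sketch, here is the same idea on a small 3D array (the values are illustrative): omitting indices returns a whole 2D block, which can be overwritten with a scalar and then restored from a saved copy.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])   # shape (2, 2, 3)

print(arr3d[0])          # omit the last two indices: a 2x3 array
# [[1 2 3]
#  [4 5 6]]

saved = arr3d[0].copy()  # keep a copy before overwriting
arr3d[0] = 42            # scalar assignment fills the whole 2x3 block
print(arr3d[0])
# [[42 42 42]
#  [42 42 42]]

arr3d[0] = saved         # array assignment restores the original values
print(arr3d[1, 0])       # [7 8 9]  two indices leave a 1D array
```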


In [None]:
arr2d[0]

array([1, 2, 3])

In [None]:
# Create a copy of the second row of the 2D array 'arr2d'
arr_copy = arr2d[1].copy()

# Update the entire first row of 'arr2d' with a new value (12)
arr2d[0] = 12

# Display the modified 2D array 'arr2d'
arr2d


array([[12, 12, 12],
       [ 4,  5,  6],
       [ 7,  8,  9]])

In [None]:
arr2d[0] = arr_copy  # assign an array: row 0 now holds the saved copy of row 1
arr2d

array([[4, 5, 6],
       [4, 5, 6],
       [7, 8, 9]])

### Indexing with Slices
Like Python lists, one-dimensional ndarrays can be sliced with the familiar `start:stop` syntax.

In [None]:
arr

array([23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23])

In [None]:
arr[1:6]

array([23, 23, 23, 23, 23])

#### Slicing higher-dimensional arrays is a bit different.
#### A single slice selects elements along axis 0, that is, the rows.
#### To slice along multiple axes, pass multiple slices separated by commas.

![image.png](attachment:image.png)

In [None]:
arr2d

array([[4, 5, 6],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
arr2d[:2]     # select the first two rows of arr2d.

array([[4, 5, 6],
       [4, 5, 6]])

In [None]:
arr2d

array([[4, 5, 6],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
arr2d[:2, 1:]  # pass multiple slices, just as you can pass multiple indexes
# Slicing like this always returns an array view with the same number of dimensions.

array([[5, 6],
       [5, 6]])

**:2** in the first dimension: This specifies that you want to include rows from the beginning up to (but not including) index 2. It effectively selects the first two rows.

**1:** in the second dimension: This specifies that you want to include columns from index 1 onwards. It effectively selects columns starting from the second column (index 1) to the end of the array.

#### Slicing alone always returns array views with the same number of dimensions.
#### Mixing integer indexes with slices returns lower-dimensional arrays.

In [None]:
arr2d

array([[4, 5, 6],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
arr2d[1:, :3]

array([[4, 5, 6],
       [7, 8, 9]])

In [None]:
arr2d[1,:3]

array([4, 5, 6])

In [None]:
arr2d[:3,2]

array([6, 6, 9])
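Checking `.shape` makes the "same dimensions versus lower dimensions" rule easy to see. This sketch recreates `arr2d` with its current values and also confirms that a mixed integer-and-slice selection is still a view:

```python
import numpy as np

arr2d = np.array([[4, 5, 6], [4, 5, 6], [7, 8, 9]])

print(arr2d[:2, 1:].shape)   # (2, 2)  slices only: still 2D
print(arr2d[1, :2].shape)    # (2,)    integer + slice: drops to 1D
print(arr2d[1:2, :2].shape)  # (1, 2)  a length-1 slice keeps the dimension
print(arr2d[:, :1].shape)    # (3, 1)  a single column, kept 2D

# Basic indexing (integers and slices) gives views, so writing through one changes arr2d
row_part = arr2d[1, :2]
row_part[:] = 0
print(arr2d[1])              # [0 0 6]
```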

#### Assigning a single value to a slice changes every element in the slice

In [None]:
arr2d[:2,2] = 0
arr2d

array([[4, 5, 0],
       [4, 5, 0],
       [7, 8, 9]])

![image.png](attachment:image.png)

# Boolean Indexing

Boolean indexing in NumPy starts with a vectorized comparison, such as `names == 'Mahnoor'`, which produces a Boolean array. That Boolean array can then be used to index the original array, or another array with the same length along the indexed axis.


In [None]:
names = np.array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'])
names =='Mahnoor'

array([ True, False, False,  True, False, False, False])

In [None]:
names[names != 'Mahnoor']

array(['Ayesha', 'Will', 'Will', 'Ayesha', 'Ayesha'], dtype='<U7')

In [None]:
names

array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'],
      dtype='<U7')

In [None]:
data = np.random.randn(7, 4)  # 7 rows, one per name in 'names'; values are random, so your output will differ
data

array([[-0.3843114 ,  0.36393136, -0.14186255,  0.39032766],
       [ 0.45071762, -0.87970416, -0.11478368,  0.29576847],
       [-0.73863148, -0.60716899,  0.41679211, -1.08330498],
       [-0.51108175, -0.34601297,  1.17651688,  0.87251259],
       [-1.16616224, -2.31543321, -0.05657731,  1.33964592],
       [-0.93537209, -0.10719635,  1.06200373, -0.54938915],
       [ 0.17555208,  0.49760441,  2.15389122,  0.72504489]])

In [None]:
data[names=='Mahnoor']

array([[-0.3843114 ,  0.36393136, -0.14186255,  0.39032766],
       [-0.51108175, -0.34601297,  1.17651688,  0.87251259]])

### Boolean Indexing in NumPy

When using boolean indexing in NumPy, the boolean array used for indexing must have the same length as the array axis it indexes. Recent NumPy versions raise an `IndexError` when the lengths do not match, but older versions were more permissive, so it is still worth checking lengths carefully.

Key points:

1. **Length Consistency:** Ensure that boolean arrays used for indexing have the same length as the array being indexed.

2. **Error Handling:** Recent NumPy versions raise an `IndexError` for a length mismatch; some older versions did not, which could silently produce unintended results.

3. **Mixing with Slices and Integers:** NumPy allows the mixing of boolean arrays with slices or integers, providing flexibility in indexing operations.
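Both behaviors can be sketched together. This example uses deterministic illustrative data instead of random numbers, and it assumes a recent NumPy release, where a mismatched boolean index raises `IndexError`:

```python
import numpy as np

names = np.array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'])
data = np.arange(28).reshape((7, 4))   # illustrative data: 7 rows to match 7 names

mask = names == 'Mahnoor'              # length 7: matches axis 0 of data
print(data[mask, 1:3])                 # boolean for rows, slice for columns
# [[ 1  2]
#  [13 14]]

short_mask = mask[:6]                  # length 6: does NOT match axis 0 of data
mismatch_raised = False
try:
    data[short_mask]
except IndexError as err:              # recent NumPy versions reject the mismatch
    mismatch_raised = True
    print("IndexError:", err)
```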


In [None]:
data[names == 'Mahnoor', 2:]

array([[-0.14186255,  0.39032766],
       [ 1.17651688,  0.87251259]])

In [None]:
names

array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'],
      dtype='<U7')

In [None]:
data

array([[-0.3843114 ,  0.36393136, -0.14186255,  0.39032766],
       [ 0.45071762, -0.87970416, -0.11478368,  0.29576847],
       [-0.73863148, -0.60716899,  0.41679211, -1.08330498],
       [-0.51108175, -0.34601297,  1.17651688,  0.87251259],
       [-1.16616224, -2.31543321, -0.05657731,  1.33964592],
       [-0.93537209, -0.10719635,  1.06200373, -0.54938915],
       [ 0.17555208,  0.49760441,  2.15389122,  0.72504489]])

In [None]:
data[names == 'Mahnoor', 3]

array([0.39032766, 0.87251259])



1. **Selection Outside a Condition:**
   - Use the `!=` operator, or negate the condition with `~`, to select elements that do not meet a condition.

2. **Negation with the ~ Operator:**
   - The `~` operator inverts a boolean array, which is useful when a condition is already stored in a variable and you want its opposite.

3. **Combining Multiple Conditions:**
   - Use the operators `&` (and) and `|` (or) to combine conditions, wrapping each condition in parentheses. The Python keywords `and` and `or` do not work with boolean arrays.

4. **Boolean Indexing and Data Copy:**
   - Selecting data with boolean indexing always creates a copy of the data, even if the returned array is unchanged, so modifying the result does not affect the original array. (Assigning through a boolean index, as in `data[mask] = 0`, does modify the original.)


>   In Python, `~` is the bitwise NOT operator. Applied to integers, it flips each bit from 0 to 1 and vice versa; applied to a NumPy boolean array, it flips each `True` to `False` and each `False` to `True`.
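The points above can be sketched in one cell, using deterministic illustrative data in place of the random `data` array: combining and negating masks, the error raised by the `or` keyword, and the fact that a boolean selection is a copy.

```python
import numpy as np

names = np.array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'])
data = np.arange(28).reshape((7, 4))   # illustrative data, one row per name

# Parentheses are required because & and | bind more tightly than ==
mask = (names == 'Mahnoor') | (names == 'Will')
print(mask)            # [ True False  True  True  True False False]
print(~mask)           # [False  True False False False  True  True]

# The 'or' keyword needs a single True/False, so it fails on a multi-element array
or_failed = False
try:
    (names == 'Mahnoor') or (names == 'Will')
except ValueError as err:
    or_failed = True
    print("ValueError:", err)

# Boolean selection returns a copy: changing the result leaves 'data' untouched
subset = data[mask]
subset[:] = -1
print(data[0])         # [0 1 2 3]
```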

In [None]:
names != 'Mahnoor'

array([False,  True,  True, False,  True,  True,  True])

In [None]:
data[names == 'Mahnoor']

array([[-0.3843114 ,  0.36393136, -0.14186255,  0.39032766],
       [-0.51108175, -0.34601297,  1.17651688,  0.87251259]])

In [None]:
data[~(names == 'Mahnoor')]

# The expression `(names == 'Mahnoor')` creates a boolean mask where True corresponds to the condition being satisfied.

# The tilde (~) operator is used to negate the boolean values in the mask.
# Therefore, `~(names == 'Mahnoor')` results in a boolean mask where True corresponds to elements NOT equal to 'Mahnoor'.

# Finally, this boolean mask is used to index the 'data', selecting only those rows where the corresponding 'names' are NOT equal to 'Mahnoor'.


array([[ 0.45071762, -0.87970416, -0.11478368,  0.29576847],
       [-0.73863148, -0.60716899,  0.41679211, -1.08330498],
       [-1.16616224, -2.31543321, -0.05657731,  1.33964592],
       [-0.93537209, -0.10719635,  1.06200373, -0.54938915],
       [ 0.17555208,  0.49760441,  2.15389122,  0.72504489]])

In [None]:
cond = names == 'Mahnoor'
data[~cond]

array([[ 0.45071762, -0.87970416, -0.11478368,  0.29576847],
       [-0.73863148, -0.60716899,  0.41679211, -1.08330498],
       [-1.16616224, -2.31543321, -0.05657731,  1.33964592],
       [-0.93537209, -0.10719635,  1.06200373, -0.54938915],
       [ 0.17555208,  0.49760441,  2.15389122,  0.72504489]])

In [None]:
mask = (names == 'Mahnoor') | (names == 'Will')
mask

array([ True, False,  True,  True,  True, False, False])

In [None]:
data[mask]

array([[-0.3843114 ,  0.36393136, -0.14186255,  0.39032766],
       [-0.73863148, -0.60716899,  0.41679211, -1.08330498],
       [-0.51108175, -0.34601297,  1.17651688,  0.87251259],
       [-1.16616224, -2.31543321, -0.05657731,  1.33964592]])


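> Setting values with boolean arrays works in a common-sense way. A two-dimensional condition such as `data < 0` selects individual elements, while a one-dimensional boolean array such as `names != 'Ayesha'` selects whole rows.

In [None]:
data[data < 0] = 0  # set all negative values in data to 0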

In [None]:
names

array(['Mahnoor', 'Ayesha', 'Will', 'Mahnoor', 'Will', 'Ayesha', 'Ayesha'],
      dtype='<U7')

In [None]:
data[names != 'Ayesha'] = 7
data

array([[7.        , 7.        , 7.        , 7.        ],
       [0.45071762, 0.        , 0.        , 0.29576847],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [7.        , 7.        , 7.        , 7.        ],
       [0.        , 0.        , 1.06200373, 0.        ],
       [0.17555208, 0.49760441, 2.15389122, 0.72504489]])

## Fancy Indexing

### 1. Definition
Fancy Indexing refers to the technique of indexing using integer arrays.

### 2. Selecting Subset of Rows
Passing a list or array of integers selects a subset of rows in the order you specify.

### 3. Negative Indices
When using Fancy Indexing, negative indices can be employed to select rows from the end of the array.


In [None]:
# Create an uninitialized NumPy array with shape (8, 4).
# np.empty does not set the contents, so every element must be assigned before use.
arr = np.empty((8, 4))

# Fill each row (axis 0) with its own row index, so row i contains only the value i.
# Without this loop, the array would hold arbitrary leftover values from memory.
for i in range(8):
    arr[i] = i  # the scalar i is broadcast across all four columns of row i

# Display the resulting array
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [None]:
arr[[4,3,0,2]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [2., 2., 2., 2.]])

In [None]:
arr[[-3,-5,-7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

### Exploring Multiple Index Arrays in NumPy

**1. Selecting a 1D Array with Multiple Index Arrays**

Passing one index array per axis selects a one-dimensional array of elements, one for each tuple of corresponding indices. For instance, given the index arrays `[1, 5, 7, 2]` and `[0, 3, 1, 2]`, the selected elements are at (1, 0), (5, 3), (7, 1), and (2, 2).

**2. Shape of the Result**

When each index array is one-dimensional, the result is always a 1D array, no matter how many dimensions the indexed array has.

**3. Obtaining a Rectangular Subset**

To select a rectangular region (every combination of the chosen rows and columns), index the rows first and then index the columns of the result, as in `arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]`.

**Note**

>   Fancy indexing, unlike slicing, always copies the selected data into a new array, which matters both for memory usage and for whether later changes affect the original. (Assigning through a fancy index, as in `arr[[0, 2]] = 0`, still modifies the original.)


In [None]:
arr = np.arange(32).reshape((8,4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [None]:
arr[[1,5,7,2],[0,3,1,2]]

array([ 4, 23, 29, 10])

In this case, the selected elements were `(1, 0)`, `(5, 3)`, `(7, 1)`, and `(2, 2)`. Because one-dimensional index arrays were passed for both axes, the result is one-dimensional even though `arr` itself has two dimensions.

In [None]:
arr[[1,5,7,2]][:,[0,3,1,2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])

`arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]` first selects rows 1, 5, 7, and 2 of `arr`, then reorders the columns of those rows according to the indices `[0, 3, 1, 2]`. The result is a new array containing that subset of rows and columns.
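The sketch below contrasts element-wise pairs with a rectangular block and also shows `np.ix_`, a standard NumPy helper that the notebook itself does not use, as a one-step alternative for the rectangular case. The last lines confirm that the fancy-indexed result is a copy.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows = [1, 5, 7, 2]
cols = [0, 3, 1, 2]

pairs = arr[rows, cols]             # element-wise pairs: (1,0), (5,3), (7,1), (2,2)
block = arr[rows][:, cols]          # rectangular region: every chosen row x every chosen column
block_ix = arr[np.ix_(rows, cols)]  # the same region in one step with np.ix_

print(pairs)                           # [ 4 23 29 10]
print(np.array_equal(block, block_ix)) # True

# Fancy indexing copies: modifying the result does not touch arr
block[:] = 0
print(arr[1])                          # [4 5 6 7]
```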

### Transposing and Swapping Axes

**1. Transposing - Special Form of Reshaping**

Transposing is a special form of reshaping in NumPy. ndarrays provide a 'transpose' method and a special 'T' attribute for this purpose.

**2. Swapping Axes - 'T' and 'swapaxes' Method**

- 'T' is a shorthand for transposing and represents a special case of swapping axes.
- NumPy also offers the 'swapaxes' method, allowing the switching of indicated axes by providing a pair of axis numbers.

**3. Dot Products with np.dot**

When computing the inner matrix product of an array's transpose with the array itself, `np.dot(arr.T, arr)` is commonly used. In Python 3.5 and later, the `@` operator performs matrix multiplication as well.

**4. Handling Higher Arrays**

When dealing with higher-dimensional arrays, the 'transpose' method can accept a tuple of axis numbers to permute the axes accordingly.

**5. Return Type - Views, Not Copies**

Note that `transpose`, `T`, and `swapaxes` return views of the original array rather than new copies, which keeps them memory-efficient. (`np.dot`, by contrast, computes a new array.)


In [None]:
arr = np.arange(15).reshape((3,5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [None]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])
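The remaining points from the list above (the inner product, axis permutation on a higher-dimensional array, `swapaxes`, and the view behavior) can be sketched in one cell. The 3D array and its shape are illustrative choices.

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))

# Inner matrix product: arr.T has shape (5, 3), so the result has shape (5, 5)
gram = np.dot(arr.T, arr)
print(gram.shape)                          # (5, 5)
print(np.array_equal(gram, arr.T @ arr))   # True: @ also performs matrix multiplication

# For higher-dimensional arrays, transpose accepts a tuple that reorders the axes
arr3d = np.arange(24).reshape((2, 3, 4))
permuted = arr3d.transpose((1, 0, 2))      # new axis order: old axes 1, 0, 2
print(permuted.shape)                      # (3, 2, 4)
print(permuted[2, 1, 3] == arr3d[1, 2, 3]) # True: the first two indices are exchanged

swapped = arr3d.swapaxes(1, 2)             # exchange axes 1 and 2
print(swapped.shape)                       # (2, 4, 3)

# transpose, .T and swapaxes all return views, so no data is copied
print(np.shares_memory(arr, arr.T))        # True
print(np.shares_memory(arr3d, swapped))    # True
```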