# Summer of Code - Artificial Intelligence
## Week 04: Exploratory Data Analysis
### Day 01: Introduction to Numpy

In this notebook, we will learn about **NumPy arrays and common operations**.

# Installing NumPy (or any other package)
To install NumPy, run the following command in a terminal (in a notebook cell, prefix it with `!`):
`pip install numpy`

In [18]:
!pip install numpy



## But What is a Package?
A Python package is a collection of related modules (Python files) organized in a directory, typically with an `__init__.py` file.


In [20]:
# Check Installation
import numpy

numpy.__version__

'2.2.6'

In [22]:
import numpy as np
np.__version__

'2.2.6'

# What is NumPy?

- NumPy is the fundamental package for numerical computing in Python.
- Provides the `ndarray` (n-dimensional array) for fast, vectorized operations.
- Written in C under the hood for performance; operations run much faster than pure Python loops.

## NumPy arrays vs Python lists

- A single element type stored in contiguous memory vs. a list of references to objects of any type.
- Vectorized operations (element-wise) vs. explicit Python loops.
- Rich slicing, indexing, and memory views.
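
To make the differences above concrete, here is a minimal sketch. The names `prices_list` and `prices_array` are made up for this illustration.

```python
import numpy as np

prices_list = [10, 20, 30]
prices_array = np.array(prices_list)

# A list needs an explicit loop (or comprehension) for element-wise math.
doubled_list = [p * 2 for p in prices_list]

# An array applies the operation to every element at once (vectorized).
doubled_array = prices_array * 2

# `*` on a list repeats the list instead of multiplying its elements.
repeated_list = prices_list * 2

# An array has a single dtype; mixing ints and floats upcasts everything to float.
mixed_array = np.array([1, 2.5, 3])

print(doubled_list)       # [20, 40, 60]
print(doubled_array)      # [20 40 60]
print(repeated_list)      # [10, 20, 30, 10, 20, 30]
print(mixed_array.dtype)  # float64
```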


In [37]:
numbers_list = list(range(1000000))
numbers_array = np.array(numbers_list)

In [38]:
type(numbers_list)

list

In [39]:
type(numbers_array)

numpy.ndarray

In [40]:
import time


time.time()

1758538941.1334672

In [None]:
# Calculating execution time for list operation
start_time = time.time()
sum_of_list = sum(numbers_list)
end_time = time.time()

list_exec_time = end_time - start_time

# Calculating execution time for numpy array operation
start_time = time.time()
sum_of_array = np.sum(numbers_array)
end_time = time.time()

arr_exec_time = end_time - start_time


print(f"List operation: {list_exec_time:.5f}")
print(f"Array operation: {arr_exec_time:.5f}")

List operation: 0.06027
Array operation: 0.00196
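
A single `time.time()` measurement can be noisy, so the exact numbers above will vary from run to run. As a rough alternative, the sketch below uses the standard library's `timeit` module to repeat each operation several times. The repeat count of 10 is an arbitrary choice for this illustration.

```python
import timeit
import numpy as np

numbers_list = list(range(1_000_000))
numbers_array = np.array(numbers_list)

# Run each operation 10 times and record the total elapsed time in seconds.
list_time = timeit.timeit(lambda: sum(numbers_list), number=10)
array_time = timeit.timeit(lambda: np.sum(numbers_array), number=10)

# Divide by the number of runs to get an average per-run time.
print(f"List operation:  {list_time / 10:.5f} s per run")
print(f"Array operation: {array_time / 10:.5f} s per run")
```

The absolute times depend on the machine, but the array version is usually faster by a wide margin.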


# Creating NumPy arrays

You can create arrays from Python lists, or using NumPy's array-creation utilities.


In [44]:
arr = np.array([1, 2, 3])
print(arr)

[1 2 3]


In [45]:
type(arr)

numpy.ndarray

In [46]:
a_list = [1, 2, 3, 4]
arr = np.array(a_list)
arr

array([1, 2, 3, 4])

In [47]:
arr = np.array(range(10))
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [51]:
arr = np.arange(2, 10, 3)
arr

array([2, 5, 8])

In [57]:
# array of zeros (0)
np.zeros((2, 3))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [58]:
np.ones(4)

array([1., 1., 1., 1.])

In [59]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [62]:
np.full((2, 5), 10)

array([[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]])

In [63]:
np.full(5, 5)

array([5, 5, 5, 5, 5])

In [65]:
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

## Random arrays

In [117]:
rng = np.random.default_rng()
rng

Generator(PCG64) at 0x2A364D90900

In [138]:
rng.random((2, 3))

array([[0.45568715, 0.77925921, 0.4863272 ],
       [0.81900088, 0.38214844, 0.58665039]])

In [140]:
rng.integers(0, 10, (2, 3))

array([[9, 4, 1],
       [5, 0, 0]])

In [143]:
rng.normal(0.5, 0.5, (2, 3))

array([[ 0.41728445,  0.7541373 , -0.1847521 ],
       [-0.27199782,  0.42009481,  1.05093212]])
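
Because `default_rng()` is called without a seed above, the random arrays differ on every run. If reproducible output is needed, one option is to pass a seed, as in this small sketch. The seed value 42 is an arbitrary assumption for the example.

```python
import numpy as np

# Assumption: 42 is an arbitrary seed chosen for this illustration.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.integers(0, 10, (2, 3))
sample_b = rng_b.integers(0, 10, (2, 3))

# Same seed, same sequence of draws: the two arrays are identical.
print(np.array_equal(sample_a, sample_b))  # True

# integers(low, high) excludes `high`, so every value lies in 0..9.
print(sample_a.min() >= 0 and sample_a.max() <= 9)  # True
```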

## Array attributes and reshaping

Key attributes:
- `ndim`: number of dimensions
- `shape`: size along each dimension
- `size`: total number of elements
- `dtype`: data type of elements

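Reshaping returns the same data with a different shape. Whenever possible the result is a view that shares the underlying memory instead of copying it, and the total number of elements must stay the same.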
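
As a quick sketch of the reshaping behaviour, the example below checks with `np.shares_memory` that a reshaped array still points at the original data. The names `flat`, `grid`, and `tall` are illustrative only.

```python
import numpy as np

flat = np.arange(6)          # [0 1 2 3 4 5]
grid = flat.reshape(2, 3)    # same data, seen as 2 rows x 3 columns

# -1 asks NumPy to infer that dimension from the total size (6 / 3 = 2).
tall = flat.reshape(3, -1)

print(grid.shape)                    # (2, 3)
print(tall.shape)                    # (3, 2)
print(np.shares_memory(flat, grid))  # True: reshape returned a view

# Writing through the view changes the original array as well.
grid[0, 0] = 99
print(flat[0])                       # 99
```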


In [145]:
arr = np.array([1, 2, 3])
arr.size

3

In [146]:
arr.ndim

1

In [147]:
arr.shape

(3,)

In [148]:
arr.dtype

dtype('int64')

In [150]:
arr = np.array([1, 2, 3], dtype='int8')
arr.dtype

dtype('int8')

In [152]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
print(arr_2d)

[[1 2 3]
 [4 5 6]]


In [154]:
arr_2d.size

6

In [155]:
arr_2d.shape

(2, 3)

In [156]:
arr_2d.ndim

2

In [157]:
arr_2d = np.zeros((5, 3))
print(arr_2d)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [158]:
arr_2d.shape

(5, 3)

In [159]:
arr_2d.ndim

2

# Array Operations

## Aggregations and reductions

Common methods:
- `max`: maximum value
- `min` : minimum value
- `argmax` : index of maximum value
- `argmin` : index of minimum value
- `sum` : sum of values
- `mean` : mean (average) of values
- `std` : standard deviation
- `var` : variance

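Most of these accept an `axis` parameter. For a 2D array, `axis=0` aggregates down the rows (one result per column) and `axis=1` aggregates across the columns (one result per row). Without `axis`, the aggregation covers every element.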
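
A small sketch of how `axis` changes the result, using a made-up 2x3 array named `scores`:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(scores.sum())           # 21 (all elements)
print(scores.sum(axis=0))     # [5 7 9]  one sum per column
print(scores.sum(axis=1))     # [ 6 15]  one sum per row
print(scores.argmax(axis=1))  # [2 2]    column index of the max in each row
print(scores.mean(axis=0))    # [2.5 3.5 4.5]
```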


In [165]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([1, 1, 1])


In [166]:
arr1 * arr2

array([1, 2, 3])

In [167]:
arr1 ** 2

array([1, 4, 9])

In [177]:
arr1.max()

np.int64(3)

In [178]:
np.max(arr1)

np.int64(3)

In [179]:
arr1.argmax()

np.int64(2)

In [None]:
arr1.mean()

2.0

In [182]:
arr2 = np.full((3, 4), 2)
arr2

array([[2, 2, 2, 2],
       [2, 2, 2, 2],
       [2, 2, 2, 2]])

In [186]:
arr2.sum(axis=1)

array([8, 8, 8])

In [184]:
arr2.max()

np.int64(2)

## Copying vs views and structural operations

- Basic slicing returns a view (it shares memory with the original); indexing with an integer or boolean array returns a copy.
- Use `.copy()` to force an independent copy.
- Structural operations `np.append`, `np.insert`, and `np.delete` return new arrays (they do not modify the original in place).
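
The sketch below illustrates these points with `np.shares_memory`, which reports whether two arrays overlap in memory. The array `base` and its values are made up for the example.

```python
import numpy as np

base = np.array([10, 20, 30, 40])

slice_view = base[1:3]            # basic slice: a view of `base`
fancy_copy = base[[1, 2]]         # integer-array indexing: a copy
explicit_copy = base[1:3].copy()  # forced copy

print(np.shares_memory(base, slice_view))     # True
print(np.shares_memory(base, fancy_copy))     # False
print(np.shares_memory(base, explicit_copy))  # False

# Structural operations return new arrays; `base` itself is unchanged.
print(np.append(base, 50))     # [10 20 30 40 50]
print(np.insert(base, 1, 15))  # [10 15 20 30 40]
print(np.delete(base, 0))      # [20 30 40]
print(base)                    # [10 20 30 40]
```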


In [187]:
arr2d = rng.integers(0, 20, (3, 4))
print(arr2d)

[[ 9  0  6 16]
 [ 7  9  5 14]
 [17  6  5  6]]


In [198]:
arr = rng.integers(0, 20, (18))
print(arr)

[11 17  7 19 11 15  5 13  0 14  1 13 11 17 16 15  8  0]


In [192]:
arr.reshape(2, 9)

array([[11, 17,  7, 19, 11, 15,  5, 13,  0],
       [14,  1, 13, 11, 17, 16, 15,  8,  0]])

In [194]:
arr.reshape(3, 6)

array([[11, 17,  7, 19, 11, 15],
       [ 5, 13,  0, 14,  1, 13],
       [11, 17, 16, 15,  8,  0]])

In [200]:
arr.reshape(9, -1)

array([[11, 17],
       [ 7, 19],
       [11, 15],
       [ 5, 13],
       [ 0, 14],
       [ 1, 13],
       [11, 17],
       [16, 15],
       [ 8,  0]])

In [201]:
print(arr2d)

[[ 9  0  6 16]
 [ 7  9  5 14]
 [17  6  5  6]]


In [None]:
arr2d[1, 3] # arr[row, col]

np.int64(14)

In [209]:
arr2d[:2, 3]

array([16, 14])

In [212]:
arr2d = rng.integers(0, 10, (20, 20))
print(arr2d)

[[3 9 6 6 0 3 9 5 4 7 7 3 0 1 4 2 5 8 5 4]
 [7 6 7 3 5 0 2 9 4 4 7 2 7 0 7 2 1 3 4 0]
 [4 8 1 7 7 6 0 5 2 7 9 2 3 5 1 6 8 8 1 4]
 [9 8 6 0 0 3 0 9 7 9 0 3 4 6 8 9 2 5 0 3]
 [3 1 4 8 7 4 1 1 3 4 0 3 3 6 7 4 5 9 4 6]
 [1 6 0 7 5 2 3 9 0 8 0 8 4 7 6 1 7 3 7 6]
 [5 2 4 4 1 8 3 8 8 3 5 8 4 2 9 1 8 4 4 0]
 [6 5 4 0 7 7 5 4 9 8 2 4 2 9 5 3 8 8 5 1]
 [3 5 8 6 9 1 3 7 1 9 4 2 5 8 5 1 3 3 4 0]
 [4 8 1 6 5 8 4 5 1 5 0 4 6 1 8 1 5 4 5 9]
 [5 2 6 4 9 6 2 7 9 8 3 3 1 8 5 5 8 0 0 8]
 [5 2 6 9 9 4 8 1 7 6 1 3 4 3 5 1 9 9 9 1]
 [2 4 2 5 2 6 1 1 5 8 9 1 8 5 0 5 3 2 8 9]
 [2 0 6 3 0 9 1 9 1 3 8 0 2 5 9 1 1 4 0 6]
 [5 0 3 5 2 6 9 3 6 8 0 9 5 8 9 6 9 0 2 2]
 [9 1 6 7 1 4 9 7 2 0 4 0 8 1 6 4 2 1 3 6]
 [2 1 3 7 5 7 0 1 0 2 3 0 1 6 9 0 3 3 1 1]
 [9 5 7 1 9 2 8 5 1 1 6 6 6 4 6 9 8 3 7 0]
 [0 1 2 2 2 3 2 8 1 7 3 7 0 5 7 3 6 3 3 3]
 [9 3 6 3 7 9 8 9 6 6 8 7 1 8 7 2 7 8 8 9]]


In [213]:
arr2d[[5, 10, 15]]

array([[1, 6, 0, 7, 5, 2, 3, 9, 0, 8, 0, 8, 4, 7, 6, 1, 7, 3, 7, 6],
       [5, 2, 6, 4, 9, 6, 2, 7, 9, 8, 3, 3, 1, 8, 5, 5, 8, 0, 0, 8],
       [9, 1, 6, 7, 1, 4, 9, 7, 2, 0, 4, 0, 8, 1, 6, 4, 2, 1, 3, 6]])

In [217]:
print(arr2d < 5)

[[ True False False False  True  True False False  True False False  True
   True  True  True  True False False False  True]
 [False False False  True False  True  True False  True  True False  True
  False  True False  True  True  True  True  True]
 [ True False  True False False False  True False  True False False  True
   True False  True False False False  True  True]
 [False False False  True  True  True  True False False False  True  True
   True False False False  True False  True  True]
 [ True  True  True False False  True  True  True  True  True  True  True
   True False False  True False False  True False]
 [ True False  True False False  True  True False  True False  True False
   True False False  True False  True False False]
 [False  True  True  True  True False  True False False  True False False
   True  True False  True False  True  True  True]
 [False False  True  True False False False  True False False  True  True
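   True False False  True False False False  True]
 [ True False False False False  True  True False  True False  True  True
  False False False  True  True  True  True  True]
 [ True False  True False False False  True False  True False  True  True
  False  True False  True False  True False False]
 [False  True False  True False False  True False False False  True  True
   True False False False False  True  True False]
 [False  True False False False  True False  True False False  True  True
   True  True False  True False False False  True]
 [ True  True  True False  True False  True  True False False False  True
  False False  True False  True  True False False]
 [ True  True False  True  True False  True False  True  True False  True
   True False False  True  True  True  True False]
 [False  True  True False  True False False  True False False  True False
  False False False False False  True  True  True]
 [False  True False False  True  True False False  True  True  True  True
  False  True False  True  True  True  True False]
 [ True  True  True False False False  True  True  True  True  True  True
   True False False  True  True  True  True  True]
 [False False False  True False  True False False  True  True False False
  False  True False False False  True False  True]
 [ True  True  True  True  True  True  True False  True False  True False
   True False False  True False  True  True  True]
 [False  True False  True False False False False False False False False
   True False False  True False False False False]]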


In [220]:
arr2d = rng.integers(0, 10, (3, 3))
print(arr2d)

[[4 5 1]
 [5 7 1]
 [7 2 0]]


In [221]:
arr2d < 3

array([[False, False,  True],
       [False, False,  True],
       [False,  True,  True]])

In [223]:
arr2d[arr2d<3]

array([1, 1, 2, 0])

## Sorting arrays
- `array.sort()`: sorts the array in place.
- `np.sort(array)`: returns a sorted copy of the array.

In [224]:
arr

array([11, 17,  7, 19, 11, 15,  5, 13,  0, 14,  1, 13, 11, 17, 16, 15,  8,
        0])

In [225]:
arr.sort()

In [226]:
arr

array([ 0,  0,  1,  5,  7,  8, 11, 11, 11, 13, 13, 14, 15, 15, 16, 17, 17,
       19])

In [227]:
arr = rng.integers(0, 10, 20)
arr

array([0, 3, 4, 2, 0, 3, 8, 2, 8, 7, 8, 4, 0, 5, 5, 7, 6, 0, 9, 0])

In [228]:
np.sort(arr)

array([0, 0, 0, 0, 0, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 8, 9])

In [229]:
arr

array([0, 3, 4, 2, 0, 3, 8, 2, 8, 7, 8, 4, 0, 5, 5, 7, 6, 0, 9, 0])

## Uniqueness
- `np.unique(array)`: returns the sorted unique elements of an array.
- `return_counts=True`: also returns the counts of each unique element.
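
A minimal sketch of both forms, using a made-up array named `votes`:

```python
import numpy as np

votes = np.array([3, 1, 3, 2, 1, 3])

# Unique values come back sorted.
print(np.unique(votes))  # [1 2 3]

# With return_counts=True, a second array gives how often each value occurs.
values, counts = np.unique(votes, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [2 1 3]
```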

# Combining Arrays
- `np.concatenate`: join a sequence of arrays along an existing axis.
- `np.stack`: join a sequence of arrays along a new axis.
- `np.hstack`: stack arrays in sequence horizontally (column-wise).
- `np.vstack`: stack arrays in sequence vertically (row-wise).

In [None]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

np.concatenate([arr1, arr2])

array([1, 2, 3, 4, 5, 6])

In [237]:
arr1 = np.array([[1, 2],
                 [2, 3]])
arr2 = np.array([[4, 5],
                 [6, 7]])

np.concatenate([arr1, arr2], axis=1)

array([[1, 2, 4, 5],
       [2, 3, 6, 7]])
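
The other combining functions listed in this section can be sketched in the same way. The arrays `left` and `right` are made up for this illustration.

```python
import numpy as np

left = np.array([1, 2])
right = np.array([3, 4])

# np.stack adds a new axis: two 1D arrays of length 2 become a 2x2 array.
stacked_rows = np.stack([left, right])          # [[1 2], [3 4]]
stacked_cols = np.stack([left, right], axis=1)  # [[1 3], [2 4]]

# np.hstack joins 1D arrays end to end; np.vstack places them as rows.
side_by_side = np.hstack([left, right])         # [1 2 3 4]
on_top = np.vstack([left, right])               # [[1 2], [3 4]]

print(stacked_rows.shape)  # (2, 2)
print(stacked_cols)
print(side_by_side)        # [1 2 3 4]
print(on_top.shape)        # (2, 2)
```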

## Splitting
- `np.split`: split an array into multiple sub-arrays; when given a number of sections, the array must divide evenly or an error is raised.
- `np.hsplit`: split an array into multiple sub-arrays horizontally.
- `np.vsplit`: split an array into multiple sub-arrays vertically.
- `np.array_split`: split an array into multiple sub-arrays of equal or near-equal size.
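
A short sketch of the splitting functions, using made-up arrays named `data` and `grid`:

```python
import numpy as np

data = np.arange(10)

# 10 elements divide evenly into 2 sections, so np.split works here.
halves = np.split(data, 2)
print(halves[0], halves[1])  # [0 1 2 3 4] [5 6 7 8 9]

# np.split(data, 3) would raise a ValueError because 10 is not divisible by 3;
# np.array_split allows near-equal parts instead.
parts = np.array_split(data, 3)
print([len(part) for part in parts])  # [4, 3, 3]

grid = np.arange(8).reshape(2, 4)
left, right = np.hsplit(grid, 2)  # split the columns in half
top, bottom = np.vsplit(grid, 2)  # split the rows in half
print(left.shape, top.shape)      # (2, 2) (1, 4)
```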

## Exercises

Try these exercises. Each has a hidden or following solution cell.

1) Create a 1D array of the even numbers from 2 to 40 inclusive using two different methods.
2) Create a 3x3 identity-like array but with 5s on the diagonal.
3) Generate a 5x4 matrix of random integers from 10 to 99, then compute column-wise max and the index of the min in each row.
4) Given `a = np.arange(24)`, reshape it to shape `(4, 3, 2)`, then compute the sum along axis 2.
5) Concatenate two arrays `p = [1,2,3]` and `q = [4,5,6]` as:
   - a) a single 1D array
   - b) a 2x3 stacked array
   - c) a 3x2 stacked array
6) Split `np.arange(20)` into 3 nearly equal parts and sort each part descending.
7) Demonstrate view vs copy by slicing and modifying the original array vs a copy.

