# NumPy & Pandas: Beginner Example (Expanded)

**Learning objectives:**
- Understand NumPy arrays vs Python lists
- Create and manipulate basic NumPy arrays
- Build a Pandas DataFrame, select rows/columns, and perform aggregations
- Write small helper functions and run simple tests

Run the cells in order. Exercises and a quiz follow the examples.

## Why NumPy & Pandas?
- NumPy: fixed-type, n-dimensional arrays with fast, vectorized numerical operations.
- Pandas: labeled, table-oriented data structures (Series, DataFrame) built on top of NumPy arrays.
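
To make the contrast with plain lists concrete, here is a small sketch that doubles every element both ways. The variable names are illustrative only.

```python
import numpy as np

values_list = [1, 2, 3, 4, 5]
values_arr = np.array(values_list)

# Multiplying a list by an integer repeats it; it does not scale the elements.
repeated = values_list * 2

# Element-wise scaling of a list needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values_list]

# A NumPy array applies the operation to every element at once (vectorization).
doubled_arr = values_arr * 2

print(repeated)
print(doubled_list)
print(doubled_arr)
```

The array version is shorter and, for large inputs, usually much faster because the loop runs in compiled code.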

In [1]:
import numpy as np
import pandas as pd
print('NumPy', np.__version__, 'Pandas', pd.__version__)

NumPy 2.3.3 Pandas 2.3.3


## NumPy basics

In [2]:
arr = np.array([1,2,3,4,5])
print('arr', arr, 'mean', arr.mean(), 'sum', arr.sum())

arr [1 2 3 4 5] mean 3.0 sum 15


### Exercise 1
Create an array of the even numbers from 2 to 20 and compute its mean and standard deviation (use `np.arange`).

In [3]:
even = np.arange(2, 21, 2)  # stop is exclusive, so use 21 to include 20
print('even', even, 'mean', even.mean(), 'std', even.std())

even [ 2  4  6  8 10 12 14 16 18 20] mean 11.0 std 5.744562646538029
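
Note that the `std` printed above is the population standard deviation, because `ndarray.std` defaults to `ddof=0`. The sketch below compares it with the sample standard deviation (`ddof=1`), which is what many statistics texts report.

```python
import numpy as np

even = np.arange(2, 21, 2)

pop_std = even.std()           # ddof=0: divide by n
sample_std = even.std(ddof=1)  # ddof=1: divide by n - 1

print('population std', pop_std)
print('sample std', sample_std)
```

The sample value is always a little larger, and the gap shrinks as the array grows.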


## Pandas basics

In [4]:
df = pd.DataFrame({'value': arr, 'value_squared': arr**2})
print(df.head())
print('\nmean value', df['value'].mean())

   value  value_squared
0      1              1
1      2              4
2      3              9
3      4             16
4      5             25

mean value 3.0
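
Selecting columns and rows is the next step after building a DataFrame. A minimal sketch, rebuilding the same `df` so the cell runs on its own; the names `col`, `sub`, and `big` are illustrative.

```python
import numpy as np
import pandas as pd

arr = np.array([1, 2, 3, 4, 5])
df = pd.DataFrame({'value': arr, 'value_squared': arr**2})

col = df['value_squared']    # single column name -> Series
sub = df[['value']]          # list of column names -> DataFrame
big = df[df['value'] >= 4]   # boolean mask keeps only the matching rows

print(type(col).__name__, type(sub).__name__)
print(big)
```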


### Exercise 2
Add a column `is_large` that is `True` where `value` is greater than the mean, then group by it to get the row count and the mean of `value_squared` for each group.

In [5]:
df['is_large'] = df['value'] > df['value'].mean()
print(df.groupby('is_large').agg(count=('value','count'), mean_sq=('value_squared','mean')))

          count    mean_sq
is_large                  
False         3   4.666667
True          2  20.500000


## Normalization helper

In [6]:
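def normalize_series(s):
    """Return z-scores of a Series: subtract the mean, divide by the std."""
    # Series.std() defaults to ddof=1 (sample standard deviation).
    return (s - s.mean()) / s.std()

print(normalize_series(df['value']))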

0   -1.264911
1   -0.632456
2    0.000000
3    0.632456
4    1.264911
Name: value, dtype: float64
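
Pandas and NumPy use different defaults here: `Series.std` uses `ddof=1`, while `ndarray.std` uses `ddof=0`, so the same formula can give slightly different z-scores depending on which object you pass in. The sketch below makes the choice explicit. `normalize_series_ddof` is a hypothetical variant of the helper, not part of either library.

```python
import numpy as np
import pandas as pd

def normalize_series_ddof(s, ddof=1):
    """Z-score a Series; ddof=1 matches pandas' default, ddof=0 matches NumPy's."""
    return (s - s.mean()) / s.std(ddof=ddof)

s = pd.Series([1, 2, 3, 4, 5], name='value')
z_sample = normalize_series_ddof(s)       # same result as normalize_series above
z_pop = normalize_series_ddof(s, ddof=0)  # population version

print(z_sample.round(6).tolist())
print(z_pop.round(6).tolist())

# Simple checks: mean 0, and std 1 under the matching ddof.
assert np.isclose(z_sample.mean(), 0.0)
assert np.isclose(z_sample.std(), 1.0)
assert np.isclose(z_pop.std(ddof=0), 1.0)
```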


## Quiz (write answers in a code cell)
1) What is the difference between `.loc` and `.iloc`?
2) Why use NumPy arrays instead of lists?
3) What does z-score normalization do?

In [7]:
# Example answers as strings
answers = ['loc uses labels; iloc uses integer positions',
           'NumPy arrays are vectorized and efficient',
           'Subtract mean and divide by std to get mean 0 and std 1']
print(answers)

['loc uses labels; iloc uses integer positions', 'NumPy arrays are vectorized and efficient', 'Subtract mean and divide by std to get mean 0 and std 1']
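
To see the first answer in action, it helps to use an index whose labels are not 0, 1, 2, so labels and positions differ. A small sketch with a hypothetical `scores` frame:

```python
import pandas as pd

# Hypothetical frame with string labels so labels and positions differ.
scores = pd.DataFrame({'score': [10, 20, 30]}, index=['a', 'b', 'c'])

by_label = scores.loc['b', 'score']  # row label 'b', column label 'score'
by_position = scores.iloc[1, 0]      # second row, first column

# Label slices include the end label; position slices exclude the end position.
label_slice = scores.loc['a':'b']
position_slice = scores.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

Both lookups reach the same cell, but the two slices return different numbers of rows, which is a common source of off-by-one surprises.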


In [8]:
assert np.isclose(arr.mean(), 3.0)  # compare floats with a tolerance
assert arr.sum() == 15              # integer sum, exact comparison is fine
print('Basic tests passed')

Basic tests passed
