(**Click the icon below to open this notebook in Colab**)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/machine-learning-for-actuarial-science/blob/main/2025-spring/week01/notebook/demo.ipynb)

## Python basics refresher

### Variables

* A Python variable is a name that refers to a value stored in memory. Use it to hold values that are used repeatedly in your program. The assignment statement (a.k.a. definition) of a variable normally follows the pattern `<variable_name> = value`. Unlike statically typed languages (such as C, C++, or Java), Python does not require you to declare the data type of the variable.
* `=` is the assignment operator in Python, not the equality comparison (`==`, we'll talk about this later).
* The value can be of any type, such as an integer, float, string, list, tuple, or dictionary.
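
The points above can be sketched in a few lines. The variable names here (`rate`, `is_same`) are made up for illustration and do not appear elsewhere in this notebook.

```python
# A name is bound to a value with `=`; no type declaration is needed.
rate = 0.05
print(type(rate))    # <class 'float'>

# The same name can later be rebound to a value of a different type.
rate = "five percent"
print(type(rate))    # <class 'str'>

# `=` assigns, while `==` compares and produces a boolean.
a = 123
is_same = (a == 123)
print(is_same)       # True
```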

In [1]:
# Declare a variable and initialize it
a = 123
print(a)

123


In [2]:
# Change the value of the variable
a = 456
print(a)

456


In [3]:
# Delete the variable
del a

In [4]:
# Initialize a new variable with a pre-existing variable
a = 123
b = a
print(b)

123


### Primitive Data Types in Python
Python supports several primitive data types:
- int: Integer
- float: Floating point number
- str: String
- bool: Boolean

#### Numbers

In [5]:
x = 1
y = 2.0

**Arithmetic Operations**

| Operation       | Result                                                                      |
|-----------------|-----------------------------------------------------------------------------|
| x + y           | sum of x and y                                                              |
| x - y           | difference of x and y                                                       |
| x * y           | product of x and y                                                          |
| x / y           | quotient of x and y                                                         |
| x // y          | floored quotient of x and y                                                 |
| x % y           | remainder of x / y                                                          |
| -x              | x negated                                                                   |
| +x              | x unchanged                                                                 |
| abs(x)          | absolute value or magnitude of x                                            |
| pow(x, y)       | x to the power y                                                            |
| x ** y          | x to the power y                                                            |
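
As a quick illustration of the table, the sketch below uses the values 7 and 2 (chosen here only because they make the difference between `/`, `//` and `%` visible).

```python
x = 7
y = 2

print(x / y)     # 3.5  true division always returns a float
print(x // y)    # 3    floored quotient
print(x % y)     # 1    remainder
print(x ** y)    # 49   same result as pow(x, y)

# Flooring rounds toward negative infinity, not toward zero.
print(-7 // 2)   # -4
print(-7 % 2)    # 1

# Mixing an int with a float gives a float.
print(1 + 2.0)   # 3.0
```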

#### String
* Strings are textual data and can be written in several ways:
    * Single quotes: 'allows embedded "double" quotes'
    * Double quotes: "allows embedded 'single' quotes"
    * Triple quoted: '''Three single quotes''' or """Three double quotes"""
* Some programming languages use single quotes for character literals and double quotes for string literals. Python makes no such distinction: it has no separate character type, and both quote styles create a `str`.
* You can use `\` to escape special characters
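
A small sketch of the quoting styles and of escaping with `\`. The variable names are illustrative only.

```python
single = 'allows embedded "double" quotes'
escaped = 'It\'s escaped'      # the backslash lets a single quote sit inside single quotes
two_lines = "first\nsecond"    # \n is a newline character
multi = """Triple quotes
can span lines"""

print(escaped)           # It's escaped
print(two_lines)         # printed on two lines
print(len(two_lines))    # 12: the newline counts as one character
```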

In [None]:
x = '123'
print(type(x))

In [None]:
x = "allows embedded 'single' quotes"
print(x)

In [7]:
# Indexing and slicing
x = "Hello, World!"
print(x[0])

H


In [8]:
x[-1]

'!'

In [9]:
x[0:5]

'Hello'


* `+` concatenates two strings
* `*` repeats given string a certain number of times
* `str.lower()` and `str.upper()` return a copy of the string in lower or upper case
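
These operations can be tried out as follows; `greeting` is just an example variable.

```python
greeting = "Hello"

print(greeting + ", World!")   # Hello, World!
print("ab" * 3)                # ababab
print(greeting.lower())        # hello
print(greeting.upper())        # HELLO

# Strings are immutable: the methods return new strings and leave the original alone.
print(greeting)                # Hello
```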

In [10]:
# Concatenate strings
x = 'abc'
y = ' def'
x + y

'abc def'

In [11]:
# String formatting
a = 1
b = 2
c = a + b
# '{} plus {} is {}'.format(b,a,c) 
'{B} plus {A} is {C}'.format(B=b,A=a,C=c) 

'2 plus 1 is 3'
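
An alternative worth knowing is the f-string (available since Python 3.6), which places the expressions directly inside the string. This is a sketch of the same message; the name `message` is introduced here for illustration.

```python
a = 1
b = 2
c = a + b

# Expressions inside {} are evaluated and inserted into the string.
message = f"{b} plus {a} is {c}"
print(message)   # 2 plus 1 is 3
```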

#### Boolean
`bool` is a built-in data type with two values, `True` and `False`. Because `bool` is a subclass of `int`, these values behave like the integers 1 and 0 in arithmetic. Booleans are useful in conditional and comparison expressions.

| Operator                                                              | Description                                                                        |
|-----------------------------------------------------------------------|------------------------------------------------------------------------------------|
| or,\|                                                                 | Boolean OR                                                                         |
| and,\&                                                                | Boolean AND                                                                        |
| not x                                                                 | Boolean NOT                                                                        |
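
A minimal sketch of the operators in the table. The variable names are hypothetical.

```python
is_adult = True
has_claims = False

print(is_adult and has_claims)    # False
print(is_adult or has_claims)     # True
print(not is_adult)               # False

# True and False behave like 1 and 0 in arithmetic.
print(True + True)                # 2
print(sum([True, False, True]))   # 2

# Comparisons produce booleans.
print(3 > 2)                      # True
```

Counting `True` values with `sum` is a common idiom, for example to count how many records satisfy a condition.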

### Other data types
- list
- tuple
- dictionary
- set

In [None]:
# list
x = [1,2,3,4,5]

In [None]:
# tuple
y = (1,2,3,4,5)

In [None]:
# dictionary
z = {'a': 1, 'b': 2, 'c': 3}

In [None]:
# set
xx = set(x)
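
The sketch below shows one typical operation for each of the four types. The values are chosen for illustration only.

```python
x = [1, 2, 3, 4, 5]        # list: ordered and mutable
x.append(6)
print(x)                   # [1, 2, 3, 4, 5, 6]

y = (1, 2, 3)              # tuple: ordered and immutable
print(y[0])                # 1

z = {'a': 1, 'b': 2}       # dictionary: maps keys to values
z['c'] = 3
print(z['c'])              # 3

s = set([1, 1, 2, 3, 3])   # set: keeps unique elements only
print(len(s))              # 3
print(2 in s)              # True
```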

### Functions
Functions are defined using the `def` keyword. The `return` statement is used to return values from the function.

In [None]:
def my_function():
    print("Hello from a function")
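
The function above prints but does not return anything. A sketch of a function that uses `return` follows; `add` and `greet` are hypothetical names used only for this example.

```python
def add(a, b=1):
    """Return the sum of a and b; b defaults to 1."""
    return a + b

print(add(2, 3))   # 5
print(add(2))      # 3

# A function without a return statement returns None.
def greet():
    print("Hello from a function")

result = greet()
print(result)      # None
```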

### `Class` and `Object`
In general
- A `class` is a blueprint for declaring and creating objects
- An `object` is a class instance that allows programmers to use variables and methods from inside the class
- A class defines a set of `attributes` (`<--> properties`) and `methods` (`<--> functions`) that the objects of that class will have.

In [None]:
class table:
    def __init__(self, l, w, h):
        self.l = l
        self.w = w
        self.h = h
        self.has_a_flat_top = True
    
    def hold_weight(self, weight):
        # f-string, so that {weight} is replaced by the argument's value
        print(f'Holding a weight of {weight} kg')
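
To show how a class is used once it is defined, here is a standalone sketch. The class is restated under the name `Table` so the block runs on its own, and the `area` method is a hypothetical addition for illustration.

```python
class Table:
    def __init__(self, l, w, h):
        # attributes: data stored on each object
        self.l = l
        self.w = w
        self.h = h

    def area(self):
        # hypothetical method: surface area of the table top
        return self.l * self.w

desk = Table(2.0, 1.0, 0.8)      # create an object (an instance of Table)
print(desk.h)                    # 0.8
print(desk.area())               # 2.0
print(isinstance(desk, Table))   # True
```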

## Numerical operations with `numpy`

`NumPy` is a Python library for numerical computing. It provides a large collection of functions for operating on large, multi-dimensional arrays and matrices, including linear algebra operations. Its core data structure is the `ndarray`, short for n-dimensional array.

- What’s the difference between a Python list and a NumPy array? [[source](https://numpy.org/doc/stable/user/absolute_beginners.html#whats-the-difference-between-a-python-list-and-a-numpy-array)]
  - NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them. While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays weren’t homogeneous.

- Why use NumPy?
  - NumPy arrays are faster and more compact than Python lists. An array consumes less memory and is convenient to use. NumPy uses much less memory to store data and it provides a mechanism of specifying the data types. This allows the code to be optimized even further.

- What is an array?
  - An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype.
  - An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension.
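
The ideas above (a single dtype, rank, shape, and the different ways of indexing) can be sketched as follows. The arrays are made-up examples.

```python
import numpy as np

mixed = [1, "two", 3.0]          # a Python list can mix types
arr = np.array([1, 2, 3.5])      # an array has a single dtype; the ints are upcast
print(arr.dtype)                 # float64

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.ndim)                 # 2 (the rank)
print(grid.shape)                # (2, 3)
print(grid[1, 2])                # 6, indexed by a tuple of integers
print(grid[grid > 3])            # [4 5 6], indexed by booleans

# Arithmetic applies element-wise without an explicit loop.
print(grid * 2)
```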

In [12]:
import numpy as np

### 1-D array

In [13]:
x = np.array([1,2,3,4,5])
x

array([1, 2, 3, 4, 5])

In [16]:
x.shape

(5,)

In [17]:
x.ndim

1

### 2-D array

In [18]:
y = np.array([[1,2,3],[4,5,6],[7,8,9]])

In [19]:
y.shape

(3, 3)

In [20]:
y.ndim

2

In [21]:
# Transpose a 2-D array
y.T

array([[1, 4, 7],
       [2, 5, 8],
       [3, 6, 9]])

In [23]:
z = np.eye(3)

In [24]:
z.T

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [25]:
# Matrix multiplication
np.dot(y,z)

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [26]:
y.dot(z)

array([[1., 2., 3.],
       [4., 5., 6.],
       [7., 8., 9.]])

In [27]:
# Element-wise multiplication
y*z

array([[1., 0., 0.],
       [0., 5., 0.],
       [0., 0., 9.]])

### Common aggregation functions

In [29]:
# mean of an array
np.mean([1,2,3,4,5])

np.float64(3.0)

In [31]:
np.mean(y, axis=0)

array([4., 5., 6.])

In [32]:
# centering an array
y - np.mean(y, axis=0)

array([[-3., -3., -3.],
       [ 0.,  0.,  0.],
       [ 3.,  3.,  3.]])

In [33]:
# standard deviation of an array
np.std(y)

np.float64(2.581988897471611)

In [34]:
# standard deviation of an array along a specific axis
np.std(y, axis=0)

array([2.44948974, 2.44948974, 2.44948974])

### Quick exercise with numpy
Normalize a given 5x5 random matrix along the columns.

In [38]:
x = np.random.randn(5,5)
x

array([[-0.39735116,  1.57292476,  1.58020188, -1.23987347,  0.39947478],
       [-0.25825565, -0.00703489,  2.17030601,  0.19227513, -0.29264149],
       [-0.22780548,  1.11740092, -0.92124464,  1.98902403, -1.11710449],
       [-1.50034768,  1.46781242, -2.1833489 ,  0.74478051, -0.66615892],
       [-1.04091004,  0.94355383,  0.11770931,  1.12926823,  0.87818638]])
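
One possible approach to the exercise, assuming "normalize" here means standardizing each column to zero mean and unit standard deviation (other definitions, such as min-max scaling, are equally valid). The sketch uses a seeded generator, so its matrix differs from the one printed above.

```python
import numpy as np

rng = np.random.default_rng(0)     # seeded so the example is reproducible
x = rng.standard_normal((5, 5))

col_mean = x.mean(axis=0)          # one mean per column, shape (5,)
col_std = x.std(axis=0)            # population standard deviation (ddof=0) per column

# Broadcasting applies the per-column statistics to every row.
x_norm = (x - col_mean) / col_std

print(np.allclose(x_norm.mean(axis=0), 0.0))   # True
print(np.allclose(x_norm.std(axis=0), 1.0))    # True
```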

## Data manipulation with `pandas`
* `pandas` is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.
* It is included in the installation of the Anaconda distribution
* When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean and process your data. In pandas, a data table is called a `DataFrame`.

In [39]:
import pandas as pd

In [None]:
# Create a pandas DataFrame
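
A minimal sketch of creating a `DataFrame` from a dictionary of columns. The column names and values below are invented for illustration and are not course data.

```python
import pandas as pd

# Hypothetical policy data: each dictionary key becomes a column.
df = pd.DataFrame({
    "policy_id": [101, 102, 103],
    "age": [34, 51, 27],
    "premium": [420.0, 610.5, 380.0],
})

print(df.shape)                # (3, 3)
print(df["premium"].mean())    # average of the premium column
print(df[df["age"] > 30])      # rows where age exceeds 30
```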

## Visualization with `matplotlib`