<div style="
  text-align: center; 
  padding: 1em; 
  margin: 0 auto 1em auto; 
  max-width: 90%; 
  border: 2px solid #00B050; 
  border-radius: 10px; 
  background-color: #f9fff9; 
  box-sizing: border-box;
">
  <h1 style="margin-bottom: 0.2em; color: #006400;">From Taylor Expansions to Gradient Descent</h1>
  <h3 style="margin-top: 0; color: #006400; font-style: italic;">
    Aligned with Lecture 2A-II: Gradient Descent (and Beyond)
  </h3>
</div>

In [None]:
import infra.plot as plot
import lib.gradient_descent as GD
# dev-only: uncomment to reload the modules after editing their source
# import importlib
# importlib.reload(plot)
# importlib.reload(GD)

## 2.1: Visualizing local approximations with Taylor series

Recall our goal from lecture: **minimize a function $l$ efficiently**.
- What is available:
  - **You can evaluate the function** - for any $W$, you can compute $l(W)$
  - **The function is differentiable** - you can compute the gradient (first derivative) at any given point $W$
  - **Local information is accessible** - you know how $l$ behaves in the neighborhood of your current point
- What is unknown:
  - **No global structure** - you don't know where the global minimum is
  - **No closed-form minimizer** - you cannot analytically solve for $\arg\min_W\ l(W)$
  - **No guarantee of convexity** - there may be multiple local minima
- Notation in ML context:
  - $l$: loss function that measures model error
  - $W$: the set of learnable parameters adjusted via optimization methods to reduce model error

To build intuition for gradient-based optimization, we visualize how a function behaves near a point by using its Taylor approximation.
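For reference, this is the standard order-$k$ Taylor approximation of $l$ around an expansion point $W_0$ (the `w0` argument in the plotting calls below). For vector-valued $W$, the first-order term becomes $\nabla l(W_0)^\top (W - W_0)$ and the second-order term uses the Hessian.

$$
l(W) \approx \sum_{n=0}^{k} \frac{l^{(n)}(W_0)}{n!}\,(W - W_0)^n
= l(W_0) + l'(W_0)\,(W - W_0) + \frac{1}{2}\,l''(W_0)\,(W - W_0)^2 + \cdots
$$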

### $1^{st}$-order Taylor approximation of an "ugly" function

In [None]:
func_str = "sin(3*W) * exp(-W**2) + 0.3 * W"
plot.ml2_show_taylor_order_k(func_str, k=1, w0=0.7, w_range=(0, 1.8))
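To check numerically what the plot shows, here is a minimal sketch in plain Python (independent of `infra.plot`) that evaluates the first-order approximation $l(w_0) + l'(w_0)(W - w_0)$ of the same function around `w0 = 0.7`, using a hand-derived derivative. The helper names `f`, `df`, and `taylor1` are illustrative only. The error is tiny next to `w0` and grows as you move away, which is why a gradient step can be trusted only when it is small.

```python
import math


def f(w):
    """The "ugly" function from the cell above: sin(3W) * exp(-W^2) + 0.3 W."""
    return math.sin(3 * w) * math.exp(-w**2) + 0.3 * w


def df(w):
    """Hand-derived derivative: product rule on sin(3W) * exp(-W^2), plus 0.3."""
    return (3 * math.cos(3 * w) - 2 * w * math.sin(3 * w)) * math.exp(-w**2) + 0.3


def taylor1(w, w0):
    """First-order Taylor approximation of f around w0."""
    return f(w0) + df(w0) * (w - w0)


w0 = 0.7
for w in (0.7, 0.75, 0.9, 1.2, 1.5):
    err = abs(f(w) - taylor1(w, w0))
    print(f"W={w:.2f}  f={f(w):+.4f}  T1={taylor1(w, w0):+.4f}  |error|={err:.4f}")
```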

### When first-order isn't enough: a case for $2^{nd}$-order insight

In [None]:
func_str = "W**4 - 3*W**2"
plot.ml2_show_taylor_order_k(func_str, k=1, w0=0, w_range=(-1.5, 1.5))
plot.ml2_show_taylor_order_k(func_str, k=2, w0=0, w_range=(-1.5, 1.5))
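The reason is visible in the derivatives. For $l(W) = W^4 - 3W^2$ we have $l'(W) = 4W^3 - 6W$ and $l''(W) = 12W^2 - 6$, so at $W_0 = 0$ the gradient vanishes and the first-order model is a flat line that gives no direction to move. Since $l''(0) = -6$, the second-order model is $-3W^2$, which reveals that $W = 0$ is a local maximum rather than a minimum. A minimal check in plain Python (helper names are illustrative):

```python
def f(w):
    return w**4 - 3 * w**2


def df(w):
    return 4 * w**3 - 6 * w


def d2f(w):
    return 12 * w**2 - 6


w0 = 0.0


def taylor1(w):
    """First-order model around w0: flat here, because df(0) == 0."""
    return f(w0) + df(w0) * (w - w0)


def taylor2(w):
    """Second-order model: adds the curvature term 0.5 * f''(w0) * (w - w0)^2."""
    return taylor1(w) + 0.5 * d2f(w0) * (w - w0) ** 2


for w in (-1.0, -0.5, 0.5, 1.0):
    print(f"W={w:+.1f}  f={f(w):+.4f}  T1={taylor1(w):+.4f}  T2={taylor2(w):+.4f}")
```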

### [Lab Exercise] Test your own function

In [None]:
# TODO: define your own function of W in `func_str` as a Python string
#   Available math functions:
#       sin, cos, tan, cot, sec, csc, asin, acos, atan,
#       sinh, cosh, tanh, exp, log, ln, sqrt, abs
#   Available constants:
#       pi, E (Euler’s number), oo (infinity)
#   Available operators:
#       +, -, *, /, **, ()
func_str = "sin(W)"

# TODO: modify the function parameters below
#   `k`: order of the Taylor expansion
#   `w0`: expansion point (the approximation is centered at W = w0)
#   `w_range`: tuple (start, end) giving the range of the x-axis
plot.ml2_show_taylor_order_k(
    func_str,
    k=1,
    w0=0,
    w_range=(-1.5, 6.5),
)
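If you keep `sin(W)` with `w0 = 0`, you can predict what the plot should show: the Taylor polynomial of $\sin$ around 0 contains only odd powers, $W - W^3/3! + W^5/5! - \cdots$, so raising `k` from 1 to 2 changes nothing, and each additional odd order extends the range over which the approximation is accurate. The hypothetical helper `taylor_sin` below (plain Python, not part of `infra.plot`) prints the error at $W = 3$ for increasing `k`:

```python
import math


def taylor_sin(w, k):
    """Order-k Taylor polynomial of sin around w0 = 0 (only odd powers survive)."""
    total = 0.0
    for n in range(1, k + 1, 2):
        total += (-1) ** ((n - 1) // 2) * w**n / math.factorial(n)
    return total


for k in (1, 2, 3, 5, 7, 9):
    err = abs(math.sin(3.0) - taylor_sin(3.0, k))
    print(f"k={k}: |sin(3) - T_k(3)| = {err:.4f}")
```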

## 2.2: Full-batch Gradient Descent

Conceptual clarification:
- **Binary Classification**: a task whose goal is to assign each input to one of two classes.
- **Linear Classification**: a method (e.g., the perceptron) whose decision boundary is linear, i.e., a hyperplane in the input space.
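GD repeatedly applies $W \leftarrow W - \eta\, \nabla_W l(W)$, where $\eta$ is the learning rate (`lr`). In the full-batch version, $l$ is the loss averaged over the entire training set, so there is exactly one update per epoch. The sketch below is a minimal pure-Python illustration under assumptions of our own: a hypothetical six-point 2-D dataset, a linear classifier $\sigma(w^\top x + b)$, and the logistic (cross-entropy) loss. `GD.LinearGDExp` may use a different model, loss, or initialization.

```python
import math

# Hypothetical toy data: 2-D points with labels in {0, 1} (not the lab's `case` datasets)
X = [(2.0, 1.0), (1.5, 2.0), (3.0, 0.5), (-1.0, -1.5), (-2.0, -0.5), (-0.5, -2.0)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(w, b):
    """Mean logistic loss over ALL samples, plus its gradient w.r.t. (w, b)."""
    n = len(X)
    loss, g0, g1, gb = 0.0, 0.0, 0.0, 0.0
    for (x0, x1), t in zip(X, y):
        p = sigmoid(w[0] * x0 + w[1] * x1 + b)  # predicted P(label = 1)
        loss -= t * math.log(p) + (1 - t) * math.log(1 - p)
        # d(loss)/dz = p - t for the logistic loss; chain rule gives the w and b parts
        g0 += (p - t) * x0
        g1 += (p - t) * x1
        gb += p - t
    return loss / n, (g0 / n, g1 / n), gb / n


def full_batch_gd(lr=0.1, epochs=100):
    w, b = [0.0, 0.0], 0.0
    history = []  # loss recorded at the start of each epoch
    for _ in range(epochs):
        loss, (g0, g1), gb = loss_and_grad(w, b)
        history.append(loss)
        # exactly one update per epoch, using the gradient of the full-dataset loss
        w = [w[0] - lr * g0, w[1] - lr * g1]
        b -= lr * gb
    return w, b, history


w, b, history = full_batch_gd()
print(f"loss at start: {history[0]:.4f} -> at last epoch: {history[-1]:.4f}")
```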

### Basic version of GD

In [None]:
cfg = {"case": 2, "lr": 0.1, "tr_mode": "gd",}

exp = GD.LinearGDExp(case_study=cfg['case'], lr=cfg['lr'], mode=cfg['tr_mode'])
exp.exec(verbose=True)
plot.ml2_show_dataset_2d(case=cfg['case'])
plot.ml2_show_stats(cfg)

In [None]:
plot.ml2_gen_w_seq(cfg, lastepoch=-1)
plot.ml2_animate(cfg)

### [Lab Exercise] Find a better LR

Now we switch to a more challenging dataset (with larger noise):

In [None]:
cfg = {"case": 3, "lr": 0.1, "tr_mode": "gd",}

exp = GD.LinearGDExp(case_study=cfg['case'], lr=cfg['lr'], mode=cfg['tr_mode'])
exp.exec(verbose=True)
plot.ml2_show_dataset_2d(case=cfg['case'])
plot.ml2_show_stats(cfg)

With the number of epochs fixed, try changing the learning rate `lr` and observe the effect (e.g., 0.005, 1, 100).
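Why `lr` matters can be seen on the simplest possible loss, $l(W) = W^2$, whose gradient is $2W$. One GD step gives $W \leftarrow W - \eta \cdot 2W = (1 - 2\eta)\,W$, so the iterates shrink toward the minimum only when $|1 - 2\eta| < 1$, i.e. $0 < \eta < 1$. A very small $\eta$ barely moves within a fixed number of steps, $\eta$ just below 1 oscillates around the minimum, and $\eta > 1$ diverges. This is a 1-D toy sketch; the stable range for the lab's dataset and loss will differ.

```python
def gd_on_quadratic(lr, w0=1.0, steps=20):
    """Run GD on l(W) = W**2 (gradient 2W) and return the final |W|."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # equivalent to w = (1 - 2 * lr) * w
    return abs(w)


for lr in (0.005, 0.1, 0.5, 0.9, 1.1):
    print(f"lr={lr:<6} |W_final| = {gd_on_quadratic(lr):.3e}")
```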

In [None]:
# TODO: modify the param `lr` below
cfg = {"case": 3, "lr": 0.1, "tr_mode": "gd",}

exp = GD.LinearGDExp(case_study=cfg['case'], lr=cfg['lr'], mode=cfg['tr_mode'])
exp.exec(verbose=True)
plot.ml2_show_stats(cfg)

## 2.3: Compare SGD with GD

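SGD can converge faster: in its standard form it updates $W$ with the gradient of a single sample (or a small mini-batch) instead of the full dataset, so it performs many updates per epoch and often reaches a low loss in fewer epochs than full-batch GD.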
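A minimal pure-Python sketch of the per-sample variant, under our own assumptions (a hypothetical six-point 2-D dataset and the logistic loss; `GD.LinearGDExp` in `"sgd"` mode may batch, shuffle, or schedule the learning rate differently). Each epoch visits the samples in a random order and updates the parameters after every one of them, so one epoch over 6 samples yields 6 updates instead of 1.

```python
import math
import random

# Hypothetical toy data: 2-D points with labels in {0, 1}
X = [(2.0, 1.0), (1.5, 2.0), (3.0, 0.5), (-1.0, -1.5), (-2.0, -0.5), (-0.5, -2.0)]
y = [1, 1, 1, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, b):
    """Mean logistic loss over the whole dataset (used only for reporting)."""
    total = 0.0
    for (x0, x1), t in zip(X, y):
        p = sigmoid(w[0] * x0 + w[1] * x1 + b)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(X)


def sgd(lr=0.1, epochs=5, seed=0):
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    w, b = [0.0, 0.0], 0.0
    order = list(range(len(X)))
    n_updates = 0
    for _ in range(epochs):
        rng.shuffle(order)  # new random visiting order each epoch
        for i in order:
            (x0, x1), t = X[i], y[i]
            # gradient of THIS sample's logistic loss w.r.t. z = w.x + b
            err = sigmoid(w[0] * x0 + w[1] * x1 + b) - t
            w = [w[0] - lr * err * x0, w[1] - lr * err * x1]
            b -= lr * err
            n_updates += 1
    return w, b, n_updates


w, b, n_updates = sgd()
print(f"{n_updates} parameter updates, final mean loss = {mean_loss(w, b):.4f}")
```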

In [None]:
cfg = {"case": 3, "lr": 0.05, "tr_mode": "sgd",}

exp = GD.LinearGDExp(case_study=cfg['case'], lr=cfg['lr'], mode=cfg['tr_mode'])
exp.exec(verbose=False)
plot.ml2_show_stats(cfg)
plot.ml2_gen_w_seq(cfg, lastepoch=-1)
plot.ml2_animate(cfg)

Visualize the GD and SGD trajectories side by side (same dataset, same learning rate)

In [None]:
# Full-batch GD on case 1
cfg1 = {"case": 1, "lr": 0.5, "tr_mode": "gd",}
exp1 = GD.LinearGDExp(case_study=cfg1['case'], lr=cfg1['lr'], mode=cfg1['tr_mode'])
# exp1.exec(verbose=False)
# SGD on the same case, with the same learning rate
cfg2 = {"case": 1, "lr": 0.5, "tr_mode": "sgd",}
exp2 = GD.LinearGDExp(case_study=cfg2['case'], lr=cfg2['lr'], mode=cfg2['tr_mode'])
# exp2.exec(verbose=False)
plot.show_trajectory(cfg1, cfg2)

In [None]:
plot.ml2_show_stats(cfg1)
plot.ml2_show_stats(cfg2)