# Gradient Descent Exercises


Solve the following optimization problems using gradient descent. If you are stuck or just want to check your solution, you can use the prompt after each problem to obtain example code from Copilot (keep in mind that an LLM can make mistakes!).

> Additional question: Write one function with optional arguments that can handle all three problems.

### Question 1 

Optimization problem: 
$$\hat{\theta}= \underset{\theta}{\arg \min}  \left[ y - \max(0, \theta x) \right]^2,$$
where $y=10$ and $x=2$. 
- Use $\theta^{(0)}=3$ and $\eta = 0.1$.
- Let the convergence threshold be $10^{-4}$.
- (Sub)gradient (of the loss scaled by $\tfrac{1}{2}$, which leaves the minimizer unchanged):
$$
\nabla\mathcal{L}(\theta) =
\begin{cases}
-(y - \theta x) \cdot x, & \text{if } \theta x > 0 \\
0, & \text{if } \theta x \le 0
\end{cases}
$$


In [2]:
# Prompt to use to reveal the code:
# Write a python script that solves the following problem using gradient descent: $$\hat{\theta}= \underset{\theta}{\arg \min}  \left[ y - \max(0, \theta x) \right]^2,$$ where $y=10$ and $x=2$. Here $\theta^{(0)}=3$ and $\eta = 0.1$. Let the convergence threshold be $10^{-4}$


In [4]:
# Solution: 
# Note: the minimizer should be $\hat{\theta} = 5$ with possible numerical errors.
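
One way the solution could look is sketched below. The exercise does not say what the convergence threshold applies to, so this sketch assumes the loop stops once the change in $\theta$ between two iterations falls below $10^{-4}$. The names `subgradient` and `max_iter` are illustrative, and the iteration cap is an added safeguard, not part of the problem statement.

```python
# Problem data and hyperparameters from Question 1
y, x = 10.0, 2.0
theta = 3.0          # theta^(0)
eta = 0.1            # learning rate
tol = 1e-4           # convergence threshold
max_iter = 10_000    # safety cap (assumption, not part of the exercise)


def subgradient(theta):
    """Subgradient of 0.5 * (y - max(0, theta * x))**2 with respect to theta."""
    z = theta * x
    return -(y - z) * x if z > 0 else 0.0


for n_iter in range(1, max_iter + 1):
    step = eta * subgradient(theta)
    theta -= step
    # Assumed stopping rule: the update is smaller than the threshold
    if abs(step) < tol:
        break

print(f"theta_hat = {theta:.4f} after {n_iter} iterations")
```

Note that a starting value with $\theta x \le 0$ would never move, because the subgradient is zero there; $\theta^{(0)}=3$ avoids this.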

### Question 2 

Optimization problem
$$
\hat{\theta}= \underset{\theta}{\arg \min} \left( y - \frac{1}{1+\exp(-\theta x)} \right)^2,
$$
where $y=0.3$ and $x=2$. 

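- Use $\theta^{(0)}=0$ and $\eta = 0.1$.
- Let the convergence threshold be $10^{-4}$.
- Gradient (of the loss scaled by $\tfrac{1}{2}$, as in the prompt below; the minimizer is unchanged), with $\sigma(z) = \frac{1}{1+\exp(-z)}$: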
$$
\nabla\mathcal{L}(\theta) = - (y - \sigma(\theta x)) \cdot \sigma(\theta x) \cdot (1 - \sigma(\theta x)) \cdot x
$$
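
Before running gradient descent, it can help to check a hand-derived gradient numerically. The sketch below compares the formula above with a central finite difference of the loss scaled by $\tfrac{1}{2}$. The helper names (`half_loss`, `analytic_grad`), the test points, and the step `h` are arbitrary choices made for illustration.

```python
import numpy as np

y, x = 0.3, 2.0


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def half_loss(theta):
    """Squared error scaled by 1/2, so that its derivative has no factor 2."""
    return 0.5 * (y - sigmoid(theta * x)) ** 2


def analytic_grad(theta):
    """Gradient formula stated in the exercise."""
    s = sigmoid(theta * x)
    return -(y - s) * s * (1.0 - s) * x


h = 1e-6  # finite-difference step (illustrative choice)
max_abs_diff = 0.0
for theta in (-1.0, 0.0, 0.5):
    numeric = (half_loss(theta + h) - half_loss(theta - h)) / (2.0 * h)
    max_abs_diff = max(max_abs_diff, abs(numeric - analytic_grad(theta)))
    print(f"theta={theta:+.1f}  analytic={analytic_grad(theta):+.6f}  numeric={numeric:+.6f}")
```

If the two columns agree to several decimal places, the formula is consistent with the scaled loss.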

In [39]:
# Prompt to use to reveal the code:
# Write a python script that solves the following problem using gradient descent: Optimization problem $$\mathcal{L}_\text{Sigmoid}(\theta) = \frac{1}{2} \left( y - \frac{1}{1+\exp(-\theta x)} \right)^2,$$ where $y=0.3$ and $x=2$.  Here $\theta^{(0)}=0$ and $\eta = 0.1$. Let the convergence threshold be $10^{-4}$

In [35]:
# Solution: 
# Note: the minimizer should be around -0.42 with possible numerical errors.
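
A possible solution is sketched below, again assuming that the loop stops once the change in $\theta$ drops below the threshold (the exercise leaves the criterion open). Because the loss is zero exactly when $\sigma(\theta x) = y$, the result can be compared with the closed-form value $\theta = \log\big(y/(1-y)\big)/x$. The names `gradient`, `theta_exact`, and `max_iter` are illustrative.

```python
import numpy as np

# Problem data and hyperparameters from Question 2
y, x = 0.3, 2.0
theta, eta, tol = 0.0, 0.1, 1e-4
max_iter = 10_000  # safety cap (assumption, not part of the exercise)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gradient(theta):
    """Gradient of 0.5 * (y - sigmoid(theta * x))**2 with respect to theta."""
    s = sigmoid(theta * x)
    return -(y - s) * s * (1.0 - s) * x


for n_iter in range(1, max_iter + 1):
    step = eta * gradient(theta)
    theta -= step
    # Assumed stopping rule: the update is smaller than the threshold
    if abs(step) < tol:
        break

# Closed-form reference: sigmoid(theta * x) = y  <=>  theta = logit(y) / x
theta_exact = np.log(y / (1.0 - y)) / x
print(f"GD: {theta:.4f} ({n_iter} iterations), closed form: {theta_exact:.4f}")
```

The sigmoid is flat near the solution, so the gradient is small there and the loop stops slightly short of the exact minimizer; a tighter threshold narrows the gap.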

### Question 3

Optimization problem
$$
\hat{\theta}= \underset{\theta}{\arg \min} \left[ y - \tanh(\theta x) \right]^2,
$$ 
where $\tanh(\theta x)=\frac{e^{\theta x}-e^{-\theta x}}{e^{\theta x}+e^{-\theta x}}$,  $y=0.3$ and $x=2$. 
- Use $\theta^{(0)}=0$ and $\eta = 0.1$.
- Let the convergence threshold be $10^{-4}$.
- Gradient (of the loss scaled by $\tfrac{1}{2}$, which leaves the minimizer unchanged):
$$
\nabla\mathcal{L}(\theta) = - [y - \tanh(\theta x)] \cdot [1 - \tanh^2(\theta x)] \cdot x
$$


In [None]:
# Prompt to use to reveal the code:
# Write a python script that solves the following problem using gradient descent:  $$\hat{\theta}= \underset{\theta}{\arg \min} \left[ y - \tanh(\theta x) \right]^2,$$  where $\tanh(\theta x)=\frac{e^{\theta x}-e^{-\theta x}}{e^{\theta x}+e^{-\theta x}}$,  $y=0.3$ and $x=2$. Here $\theta^{(0)}=0$ and $\eta = 0.1$. Let the convergence threshold be $10^{-4}$

In [None]:
# Solution: 
# Note: the minimizer should be around 0.155 with possible numerical errors.
# helpful function: np.tanh()
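
The following sketch uses `np.tanh()` as suggested and assumes the same stopping rule as before, namely that the change in $\theta$ falls below the threshold. The closed-form value $\theta = \operatorname{arctanh}(y)/x$ serves as a reference. The names `gradient`, `theta_exact`, and `max_iter` are illustrative.

```python
import numpy as np

# Problem data and hyperparameters from Question 3
y, x = 0.3, 2.0
theta, eta, tol = 0.0, 0.1, 1e-4
max_iter = 10_000  # safety cap (assumption, not part of the exercise)


def gradient(theta):
    """Gradient of 0.5 * (y - tanh(theta * x))**2 with respect to theta."""
    t = np.tanh(theta * x)
    return -(y - t) * (1.0 - t ** 2) * x


for n_iter in range(1, max_iter + 1):
    step = eta * gradient(theta)
    theta -= step
    # Assumed stopping rule: the update is smaller than the threshold
    if abs(step) < tol:
        break

# Closed-form reference: tanh(theta * x) = y  <=>  theta = arctanh(y) / x
theta_exact = np.arctanh(y) / x
print(f"GD: {theta:.4f} ({n_iter} iterations), closed form: {theta_exact:.4f}")
```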

### Question 4: Stochastic Gradient Descent

Optimization problem: 
$$\hat{\theta}= \underset{\theta}{\arg \min}  \frac{1}{2N} \sum_{i=1}^N \left[ y_i - \max(0, \theta x_i) \right]^2,$$
where $\{(x_i,y_i)\}_{i=1}^N$ are generated in the cell below. The rest of the setup is the same as in Question 1.

Tasks:
1. Use gradient descent to solve the problem; denote the solution by $\hat{\theta}_{\rm GD}$
2. Use stochastic gradient descent with mini-batch size $n=10$ to find $\hat{\theta}_{\rm SGD}$

> Optional: Try a range of $n$ from $5$ to $50$ and compare the number of iterations and/or computing time each algorithm takes. 



In [None]:
# Data generation:
# True theta = 8
import numpy as np

# Settings
np.random.seed(42)
N = 100  # sample size
true_theta = 8
noise_std = 0.5

x = np.random.randn(N)  # x_i ~ N(0,1)
noise = np.random.normal(0, noise_std, N)
y = np.maximum(0, true_theta * x) + noise


In [43]:
# Prompt to use to reveal the code:
# Write a python script that solves the following problem using stochastic gradient descent with batch size 10: $$\hat{\theta}= \underset{\theta}{\arg \min}  \frac{1}{2N} \sum_{i=1}^N \left[ y_i - \max(0, \theta x_i) \right]^2,$$ where $\{(x_i,y_i)\}_{i=1}^N$ are generated and saved in x and y with N=100. Here $\theta^{(0)}=3$ and $\eta = 0.1$. Let the convergence threshold be $10^{-4}$
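
A minimal sketch of both tasks follows, with several assumptions that are not part of the exercise:

- The data generation is copied from the cell above so that the block runs on its own.
- Full-batch gradient descent stops once the change in $\theta$ falls below $10^{-4}$.
- SGD reshuffles the data every epoch, using a separate generator with an arbitrary seed.
- SGD checks the change in $\theta$ only once per epoch, with a cap of 200 epochs.

The helper name `relu_grad` is illustrative.

```python
import numpy as np

# Data generation copied from the cell above
np.random.seed(42)
N = 100
true_theta = 8
noise_std = 0.5
x = np.random.randn(N)
noise = np.random.normal(0, noise_std, N)
y = np.maximum(0, true_theta * x) + noise


def relu_grad(theta, xb, yb):
    """Subgradient of (1/(2n)) * sum_i (y_i - max(0, theta * x_i))**2 over a batch."""
    z = theta * xb
    active = z > 0  # samples with theta * x_i <= 0 contribute zero
    return -np.mean((yb - z) * xb * active)


eta, tol = 0.1, 1e-4

# Task 1: full-batch gradient descent
theta_gd = 3.0
for it_gd in range(1, 10_001):  # iteration cap is an added safeguard
    step = eta * relu_grad(theta_gd, x, y)
    theta_gd -= step
    if abs(step) < tol:
        break

# Task 2: mini-batch SGD with batch size n = 10
rng = np.random.default_rng(0)  # seed for shuffling (arbitrary choice)
n = 10
theta_sgd = 3.0
for epoch in range(1, 201):  # epoch cap is an added safeguard
    theta_prev = theta_sgd
    perm = rng.permutation(N)
    for start in range(0, N, n):
        idx = perm[start:start + n]
        theta_sgd -= eta * relu_grad(theta_sgd, x[idx], y[idx])
    # Assumed stopping rule: theta changed by less than tol over one epoch
    if abs(theta_sgd - theta_prev) < tol:
        break

print(f"theta_GD  = {theta_gd:.4f} ({it_gd} iterations)")
print(f"theta_SGD = {theta_sgd:.4f} ({epoch} epochs)")
```

With a constant learning rate, mini-batch noise keeps $\hat{\theta}_{\rm SGD}$ jittering around the minimizer. A threshold-based stop may therefore trigger late or not at all, which is why the epoch cap is included. For the optional comparison, `n` can be varied from 5 to 50.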