## 1) Optimization / Learning Rate (Conceptual)
**1. Why can the same learning rate feel “too large” later in training, even if it worked early on?**
- Early in training we are far from the optimum, so large steps make rapid progress. Near convergence the iterates often sit in a narrower, higher-curvature region where only small steps are needed, so the same step size overshoots the minimum and destabilizes the updates.
- Here “worked early on” means gradient descent was making progress, i.e., the loss was decreasing.
- So the LR has to be decayed: smaller steps let the iterates settle precisely at a minimum instead of bouncing around it.
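As a toy illustration of Q1, the sketch below assumes a 1D quadratic $f(x) = \frac{a}{2}x^2$ as a stand-in for the local loss surface, where $a$ plays the role of curvature (real losses are only locally quadratic). A GD step multiplies $x$ by $(1 - \eta a)$, so one fixed $\eta$ can converge where the surface is flat and blow up where it is sharp. `gd_quadratic` is a hypothetical helper, not from the original notes.

```python
def gd_quadratic(a, eta, x0=1.0, steps=20):
    """Run gradient descent on f(x) = (a/2) x^2, whose gradient is a*x; return the trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - eta * a * xs[-1])
    return xs


eta = 0.5
flat = gd_quadratic(a=1.0, eta=eta)   # per-step factor 1 - 0.5*1 = 0.5: converges
sharp = gd_quadratic(a=5.0, eta=eta)  # per-step factor 1 - 0.5*5 = -1.5: overshoots and grows
print(abs(flat[-1]), abs(sharp[-1]))
```

The same $\eta$ is "fine" for small curvature and "too large" for high curvature, which is the effect the answer above describes.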

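**2. What is “oscillation” near a minimum in gradient descent, and why does it slow down or destabilize convergence?**
- The step overshoots the minimum, so each update jumps to the other side and the iterate bounces back and forth instead of settling, which prevents fine-grained convergence. For a quadratic $f(x) = \frac{a}{2}x^2$, a GD step gives $x_{t+1} = (1 - \eta a)\,x_t$: the iterate flips sign every step when $\eta > 1/a$, converges only slowly as $\eta$ approaches $2/a$, and diverges when $\eta > 2/a$.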
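A minimal sketch of the three regimes above, again on the assumed toy quadratic with $a = 1$: it counts how often the iterate changes sign and reports where it ends up. The helper names are illustrative.

```python
def gd_trace(eta, a=1.0, x0=1.0, steps=30):
    """GD on f(x) = (a/2) x^2: each step is x - eta * a * x = (1 - eta * a) * x."""
    xs = [x0]
    for _ in range(steps):
        xs.append((1 - eta * a) * xs[-1])
    return xs


def sign_flips(xs):
    """Number of consecutive steps on which the iterate jumps across the minimum at 0."""
    return sum(1 for u, v in zip(xs, xs[1:]) if u * v < 0)


for eta in (0.5, 1.8, 2.1):   # below 1/a, between 1/a and 2/a, above 2/a
    xs = gd_trace(eta)
    print(f"eta={eta}: |x_30|={abs(xs[-1]):.2e}, sign flips={sign_flips(xs)}")
```

With $\eta = 1.8$ the iterate crosses the minimum on every step and still sits around $10^{-3}$ after 30 steps, while $\eta = 0.5$ reaches about $10^{-9}$ without a single crossing; with $\eta = 2.1$ the oscillation grows.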

**3. Explain why learning-rate decay is useful. What is the trade-off if the learning rate becomes too small?**
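- LR decay lets training start with a large LR, which makes fast progress in the early stage, and then shrink it so the iterates can settle into a minimum. The trade-off: if the LR becomes too small (or decays too fast), each update barely changes the parameters, so training stalls before it reaches a minimum.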
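To make the "too small" failure concrete, the sketch below (an assumed toy setup, same quadratic as above with the minimum at 0) applies exponential decay $\eta_t = \eta_0 \gamma^t$. The total step budget is $\sum_t \eta_t = \eta_0 / (1 - \gamma)$, so an aggressive $\gamma$ caps how far the parameters can ever move. `gd_with_decay` is a hypothetical helper.

```python
def gd_with_decay(eta0, gamma, steps=200, a=1.0, x0=10.0):
    """GD on f(x) = (a/2) x^2 with exponential decay eta_t = eta0 * gamma**t; return the final x."""
    x = x0
    for t in range(steps):
        eta = eta0 * gamma**t
        x -= eta * a * x   # gradient of (a/2) x^2 is a*x
    return x


print(gd_with_decay(eta0=0.1, gamma=0.99))  # gentle decay: ends very close to the minimum at 0
print(gd_with_decay(eta0=0.1, gamma=0.5))   # aggressive decay: LR dies out and x stalls around 8
```

With $\gamma = 0.5$ the whole schedule only adds up to $\sum_t \eta_t = 0.2$, so $x$ stops far from 0 no matter how many steps are run.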

**4. What is a cosine learning rate schedule (cosine annealing), and what is its intended benefit compared to a step decay?**
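- Cosine annealing decays the LR from $\eta_{\max}$ to $\eta_{\min}$ along half a cosine curve over $T$ steps: $\eta_t = \eta_{\min} + \frac{1}{2}(\eta_{\max} - \eta_{\min})\left(1 + \cos\frac{\pi t}{T}\right)$. Compared with step decay, there are no abrupt drops: the LR stays near $\eta_{\max}$ longer at the start, falls fastest in the middle, and flattens out near $\eta_{\min}$ at the end, with fewer hyperparameters to tune (no milestones or decay factor). A nonzero $\eta_{\min}$ keeps the LR from shrinking to the point where updates stop. The schedule itself is not periodic; periodicity comes from adding warm restarts (Q5).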
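A minimal side-by-side sketch of the two schedules. The values $\eta_{\max} = 0.1$, $T = 90$, and a step decay with `step_size=30`, `gamma=0.1` are assumptions for illustration; the parameter names loosely mirror PyTorch's `CosineAnnealingLR` and `StepLR`, but the functions here are plain Python, not that API.

```python
import math


def cosine_lr(t, T, eta_max=0.1, eta_min=0.0):
    """Cosine annealing: eta_max at t = 0, eta_min at t = T, smooth in between."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T))


def step_lr(t, eta_max=0.1, step_size=30, gamma=0.1):
    """Step decay: multiply the LR by gamma every step_size steps."""
    return eta_max * gamma ** (t // step_size)


T = 90
for t in (0, 29, 30, 45, 60, 89, 90):
    print(f"t={t:2d}  cosine={cosine_lr(t, T):.4f}  step={step_lr(t):.4f}")
```

Step decay drops by 90% in a single step at each milestone, whereas the cosine schedule never changes by more than about $\frac{\pi}{2T}\eta_{\max}$ per step.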

**5. What is warm restart (SGDR-style), and why might restarting the learning rate help in non-convex optimization?**
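- **SGDR** stands for Stochastic Gradient Descent with Warm Restarts: the LR follows a cosine annealing curve, and at the end of each cycle it is reset ("restarted") to its initial large value while the current weights are kept.
- A non-convex loss landscape can have many local minima, saddle points, and plateaus. Once the LR has decayed, the optimizer tends to stay in whatever basin it has reached. A restart brings back a large LR, which can push the iterate out of a sharp basin or a region of stagnation; the LR is then annealed again, so the run may settle into a different, possibly better, basin.
 * basin: a single valley region of the loss surface.
 * stagnation: a state where almost no progress is made (plateau, saddle point).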
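A minimal sketch of an SGDR-style schedule, assuming the usual setup in which cycle $i$ lasts $T_0 \cdot T_{\text{mult}}^{\,i}$ steps and each cycle is a full cosine decay. The names `T_0`, `T_mult`, `eta_min` mirror PyTorch's `CosineAnnealingWarmRestarts`, but `sgdr_lr` is a hypothetical plain-Python helper; `T_mult` is assumed to be an integer ≥ 1.

```python
import math


def sgdr_lr(step, eta_max=0.1, eta_min=0.0, T_0=10, T_mult=2):
    """Cosine annealing with warm restarts: cycle i lasts T_0 * T_mult**i steps."""
    t_cur, T_i = step, T_0
    while t_cur >= T_i:   # skip over the cycles that have already finished
        t_cur -= T_i
        T_i *= T_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / T_i))


# Within a cycle the LR only decreases, so any increase marks a restart.
restarts = [s for s in range(1, 80) if sgdr_lr(s) > sgdr_lr(s - 1)]
print(restarts)  # → [10, 30, 70]
```

Growing the cycle length (`T_mult=2`) means early restarts explore often, while later cycles spend longer annealing inside one basin.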

### Small Step Approximation 
For a small enough step, the function looks almost like a straight line (first-order Taylor expansion).

$f(x + \Delta x) \approx f(x) + \nabla f(x)^T \Delta x$

GD: $f(x - \eta \nabla f(x)) \approx f(x) - \eta \|\nabla f(x)\|^2$

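Substituting $\Delta x = -\eta \nabla f(x)$, the second term becomes $-\eta \|\nabla f(x)\|^2$, which is strictly negative whenever $\eta > 0$ and $\nabla f(x) \neq 0$ → the loss decreases (to first order, i.e., for small enough $\eta$). That is why moving opposite to the gradient is the sensible choice.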
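A quick numerical check of the first-order approximation, on an assumed example $f(x, y) = x^2 + 3y^2$ starting from $(1, 1)$, where $\nabla f = (2x, 6y)$. It compares the true loss after one GD step with $f(x) - \eta \|\nabla f(x)\|^2$.

```python
def f(x, y):
    return x**2 + 3 * y**2


def grad_f(x, y):
    return 2 * x, 6 * y


def check(eta, x=1.0, y=1.0):
    """Return (true loss after one GD step, first-order prediction of it)."""
    gx, gy = grad_f(x, y)
    actual = f(x - eta * gx, y - eta * gy)
    first_order = f(x, y) - eta * (gx**2 + gy**2)
    return actual, first_order


for eta in (0.001, 0.01, 0.5):
    actual, approx = check(eta)
    print(f"eta={eta}: actual={actual:.6f}, first-order={approx:.6f}")
```

For $\eta = 0.001$ the two agree to about $10^{-4}$ and the loss drops below $f(1, 1) = 4$; for $\eta = 0.5$ the approximation predicts a large decrease while the true loss rises to 12, so the guarantee only holds for small steps.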


## 2) Probability / Quant Intuition
6. Consider a strategy with win probability p = 0.55.
   Each win gives +1 and each loss gives -1.
   After N = 100 independent trades:
   (a) Compute the expected total return E[S].
   (b) Compute the variance Var(S).
   (c) Give a rough argument for why the probability of ending with negative total return is not zero.
   (Optional) Approximate P(S < 0) using a normal approximation.
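For Q6, a worked sketch. Write each trade as $X_i \in \{+1, -1\}$ with $P(X_i = +1) = p$. Then $E[X_i] = 2p - 1$ and $\mathrm{Var}(X_i) = 1 - (2p-1)^2 = 4p(1-p)$, so by independence $E[S] = N(2p-1) = 10$ and $\mathrm{Var}(S) = 4Np(1-p) = 99$ (sd ≈ 9.95). Zero is only about one standard deviation below the mean, so a negative total is quite plausible; the normal approximation with a continuity correction puts $P(S < 0)$ near 13%. The code cross-checks this against the exact binomial sum, using $S = 2W - N$ with $W \sim \mathrm{Binomial}(N, p)$ wins; the helper names are illustrative.

```python
import math


def trade_stats(p=0.55, n=100):
    """E[S] and Var(S) for S = sum of n independent +1/-1 trades with win probability p."""
    mean = n * (2 * p - 1)      # per trade: E[X] = p - (1 - p) = 2p - 1
    var = n * 4 * p * (1 - p)   # per trade: Var(X) = 1 - (2p - 1)^2 = 4p(1 - p)
    return mean, var


def p_negative_exact(p=0.55, n=100):
    """Exact P(S < 0): S = 2W - n with W ~ Binomial(n, p), so S < 0 iff 2W < n."""
    return sum(math.comb(n, w) * p**w * (1 - p) ** (n - w) for w in range(n + 1) if 2 * w < n)


def p_negative_normal(p=0.55, n=100):
    """Normal approximation to P(S < 0) with a continuity correction."""
    mean, var = trade_stats(p, n)
    # S moves in steps of 2, so cut halfway between the largest negative value
    # (-2 for even n, -1 for odd n) and the next value up (0 or 1)
    cutoff = -1 if n % 2 == 0 else 0
    z = (cutoff - mean) / math.sqrt(var)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF at z


mean, var = trade_stats()
print(f"E[S] = {mean:.2f}, Var(S) = {var:.2f}, sd = {math.sqrt(var):.2f}")
print(f"P(S < 0): exact {p_negative_exact():.4f}, normal approx {p_negative_normal():.4f}")
```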

7. What does it mean for a return distribution to be “heavy-tailed”?
   Why can assuming a normal distribution be dangerous in finance?
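To see what "heavy-tailed" means numerically, the sketch below compares two-sided tail probabilities $P(|X| > k)$ for a standard normal and for a Student-t with 3 degrees of freedom rescaled to unit variance. The $t_3$ is an assumed stand-in for a heavy-tailed return distribution, picked because its CDF has a closed form; it is not a fitted model of any market.

```python
import math


def normal_tail(k):
    """P(|Z| > k) for a standard normal Z."""
    return math.erfc(k / math.sqrt(2))


def t3_tail(k):
    """P(|X| > k) for X = T / sqrt(3), where T ~ Student-t with 3 degrees of freedom.

    T has variance 3, so X has unit variance. The closed-form t_3 CDF gives
    P(X > k) = 1/2 - (1/pi) * (k / (1 + k^2) + atan(k)).
    """
    one_sided = 0.5 - (k / (1 + k * k) + math.atan(k)) / math.pi
    return 2 * one_sided


for k in (1, 2, 3, 4, 5):
    n, t = normal_tail(k), t3_tail(k)
    print(f"{k}-sigma move: normal {n:.2e}, t3 {t:.2e}, ratio {t / n:.1f}")
```

With the same variance, the heavy-tailed distribution has fewer moderate (1-sigma) moves but far more extreme ones: a 4-sigma move comes out roughly 100 times more likely than the normal model says, which is exactly the risk a normal assumption underprices.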

8. In risk terms, why does lower variance often imply lower risk, and what important risks can remain even if variance is low?
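One risk that low variance can hide is rare, large losses (tail risk), especially when variance is estimated from a finite history. The sketch assumes a hypothetical "selling insurance" style payoff: +0.1 on most days and −20 with probability 0.001 per day. It computes the true mean and variance and the chance that a 250-day history contains no crash at all, in which case the sample variance of that history is exactly zero.

```python
def true_mean_var(gain, loss, q):
    """Mean and variance of a daily payoff: +gain with prob 1 - q, -loss with prob q."""
    mean = (1 - q) * gain - q * loss
    second_moment = (1 - q) * gain**2 + q * loss**2
    return mean, second_moment - mean**2


def no_crash_probability(q, n_days):
    """Probability that n_days independent days contain no crash (crash prob q per day)."""
    return (1 - q) ** n_days


q, gain, loss = 0.001, 0.1, 20.0   # hypothetical payoff parameters
mean, var = true_mean_var(gain, loss, q)
p_quiet = no_crash_probability(q, 250)
print(f"true mean={mean:.4f}, true variance={var:.4f}")
print(f"P(no crash in 250 days) = {p_quiet:.3f}; such a history has sample variance 0")
```

Roughly 78% of 250-day histories would look riskless, yet a single bad day wipes out 200 days of gains; variance also says nothing about liquidity, leverage, or regime changes.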

## 3) Coding Test (Python)
9. Implement a simulation to estimate the probability of a negative total return for the strategy in Q6.
   Requirements:
   - Run M = 10_000 Monte Carlo trials.
   - Each trial simulates N = 100 trades with p = 0.55, payoff ±1.
   - Report:
     (a) estimated P(S < 0)
     (b) sample mean of S
     (c) sample variance of S
     (d) min(S), max(S)

10. Implement `top_k(nums, k)` that returns the k largest elements.
    Requirements:
    - Use a min-heap approach (O(n log k)).
    - Edge cases: k = 0, k >= len(nums), nums may contain negatives/duplicates. 
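For Q10, a minimal sketch using Python's standard `heapq` module. The heap keeps the k largest values seen so far, with the smallest of them at `heap[0]`, so each new element costs O(log k). Returning the result in descending order is an assumption; the question does not specify an order.

```python
import heapq


def top_k(nums, k):
    """Return the k largest elements of nums in descending order (min-heap, O(n log k))."""
    if k <= 0:
        return []
    if k >= len(nums):
        return sorted(nums, reverse=True)   # every element qualifies
    heap = []   # min-heap of the k largest values seen so far
    for x in nums:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)   # pop the current smallest, push x
    return sorted(heap, reverse=True)


print(top_k([3, 1, 4, 1, 5, 9, 2, 6], 3))  # → [9, 6, 5]
print(top_k([-1, -1, -3], 2))              # → [-1, -1]
```

Duplicates are kept (each occurrence counts as a separate element), and negatives need no special handling because the heap compares values directly.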

```python
import random
import statistics

# Q9: Monte Carlo estimate of P(S < 0) for the strategy in Q6
N_TRADES = 100      # trades per trial (N)
P_WIN = 0.55        # win probability (p)
M_TRIALS = 10_000   # Monte Carlo trials (M)

random.seed(0)      # fixed seed so reruns give the same numbers


def one_person(n=N_TRADES, p=P_WIN):
    """Simulate n independent trades paying +1 with probability p, else -1; return the total S."""
    money = 0
    for _ in range(n):
        if random.random() < p:
            money += 1
        else:
            money -= 1
    return money


results = [one_person() for _ in range(M_TRIALS)]

print("(a) estimated P(S < 0):", sum(s < 0 for s in results) / M_TRIALS)
print("(b) sample mean of S:", statistics.mean(results))
print("(c) sample variance of S:", statistics.variance(results))
print("(d) min(S), max(S):", min(results), max(results))
```
 