<a href="https://colab.research.google.com/github/tufts-mathmodeling/HW/blob/master/HW6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
%config InlineBackend.figure_formats = ['svg']
from random import sample
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d
from numpy import linalg as LA

# Math Modeling Homework 6 (Spring 2020)

## Basics

Here's some code that gives you eigen-data for the matrix $M=\begin{bmatrix}2&1\\ 1&1\end{bmatrix}$. Then we can iterate the matrix $M$, normalize the output, and study the long-term dynamics.

In [0]:
M = np.matrix([[2, 1], [1, 1]])
evals, evecs = LA.eig(M)
print('eigenvalues:',evals)
print('corresponding eigenvectors are the columns:')
print(evecs)

In [0]:
M = np.matrix([[2, 1], [1, 1]])
v= np.array([1,1])
for n in range(1,11):
    w = M**n @ v  # for an np.matrix, ** is the matrix power
    print(n,'th step, normalized:',w/LA.norm(w),'vector length:',LA.norm(w))
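
A related sketch, not part of the assignment code: instead of recomputing $M^n$ at every step, you can multiply by $M$ once per step and renormalize. The name `stretch` is made up for this illustration; it records how much each multiplication lengthens the unit vector, and it can be compared against the larger root of the characteristic polynomial $t^2-3t+1$.

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 1.0]])
v = np.array([1.0, 1.0])
v = v / np.linalg.norm(v)

# Larger root of the characteristic polynomial t^2 - 3t + 1.
dominant = (3 + np.sqrt(5)) / 2

stretch = []  # factor by which each multiplication lengthens the unit vector
for n in range(1, 11):
    w = M @ v
    stretch.append(np.linalg.norm(w))
    v = w / stretch[-1]  # renormalize so the entries stay bounded
    print(n, v, stretch[-1])

print('larger eigenvalue:', dominant)
```

Renormalizing at each step keeps the numbers at a manageable size, which matters once you push the iteration count well past 10.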

## Problem 1: Power Method and Markov Chains (8 points)

(a) By entering several different $v$ vectors and experimenting with the number of iterates, describe the dynamics of $M^n\cdot v$.  Can you find a vector that shrinks instead of expands?  Is that still true if you increase the range of powers you consider?  *Optional but recommended*: write code that plots the normalized iterates and include some pictures.

*Warning for the "shrinking" part of (a): unless you're extra careful, you'll eventually start to see rounding errors in Python, so don't be too alarmed if weird things start happening when you demand lots of precision.*
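
Here is a minimal sketch of the rounding issue the warning describes. It assumes you start from a floating-point approximation of an eigenvector for the smaller eigenvalue $(3-\sqrt5)/2$; the names `phi`, `norms`, and `turnaround` are made up for this illustration. The stored vector is never exactly the eigenvector, so a tiny component along the expanding direction is present from the start and eventually dominates.

```python
import numpy as np

M = np.array([[2.0, 1.0], [1.0, 1.0]])
phi = (1 + np.sqrt(5)) / 2

# Eigenvector for the smaller eigenvalue (3 - sqrt(5)) / 2, up to rounding.
v = np.array([1.0, -phi])

norms = []
for n in range(60):
    v = M @ v
    norms.append(np.linalg.norm(v))

# Index of the shortest iterate; after this the rounding error takes over.
turnaround = int(np.argmin(norms))
print('shortest iterate at step', turnaround + 1, 'with length', norms[turnaround])
print('length after 60 steps:', norms[-1])
```

The lengths shrink at first and then turn around and grow, even though exact arithmetic would keep shrinking forever.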

(b) Repeat a similar experiment with the iteration matrix $P$ from the stock market example in class.  Explain why this follows a different pattern of vector length than in the previous example.
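
The matrix $P$ from class is not reproduced in this notebook, so the sketch below uses a made-up 3-state transition matrix purely as a stand-in. It assumes the convention that columns sum to 1 and that states are column vectors; if class used rows instead, transpose accordingly. It tracks both the sum of the entries and the Euclidean length so you can compare them.

```python
import numpy as np

# Hypothetical 3-state transition matrix, NOT the matrix P from class.
# Assumption: each column sums to 1, so P @ v sends a distribution to a distribution.
P = np.array([[0.900, 0.15, 0.25],
              [0.075, 0.80, 0.25],
              [0.025, 0.05, 0.50]])

v = np.array([1.0, 0.0, 0.0])  # start with all the probability in state 0
for n in range(1, 101):
    v = P @ v
    if n in (1, 2, 5, 10, 100):
        print(n, v, 'entry sum:', v.sum(), 'Euclidean length:', np.linalg.norm(v))
```

Swap in the actual $P$ from class before drawing conclusions for part (b).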



## Monte Carlo Everything


## Problem 2: Scrabble tiles (4 points)

Here are all the letters in a Scrabble set, with frequency and value.

---

In [0]:
letters = {
    ' ': (2, 0),
    'a': (9, 1),
    'b': (2, 3),
    'c': (2, 3),
    'd': (4, 2),
    'e': (12, 1),
    'f': (2, 4),
    'g': (3, 2),
    'h': (2, 4),
    'i': (9, 1),
    'j': (1, 8),
    'k': (1, 5),
    'l': (4, 1),
    'm': (2, 3),
    'n': (6, 1),
    'o': (8, 1),
    'p': (2, 3),
    'q': (1, 10),
    'r': (6, 1),
    's': (4, 1),
    't': (6, 1),
    'u': (4, 1),
    'v': (2, 4),
    'w': (2, 4),
    'x': (1, 8),
    'y': (2, 4),
    'z': (1, 10)
}
letter_freq = {letter: v[0] for letter, v in letters.items()}
letter_score = {letter: v[1] for letter, v in letters.items()}
letter_pool = []
for letter, freq in letter_freq.items():
    letter_pool += [letter] * freq

In [0]:
n_trials = 1000
expected_score = sum(letter_freq[letter] * score
                     for letter, score in letter_score.items()) / len(letter_pool)
avg_scores = []
total_score = 0
for trial in range(1, n_trials + 1):
    total_score += letter_score[sample(letter_pool, 1)[0]]
    avg_scores.append(total_score / trial)

In [0]:
plt.xlabel('Trial')
plt.ylabel('Average score')
plt.title('Expected Scrabble tile score')
plt.plot([0, n_trials], [expected_score, expected_score],
         color='orange',
         label='Expected score (exact)')
plt.ylim(expected_score - 1, expected_score + 1)
plt.xlim(0, n_trials)
plt.plot(avg_scores, label='Expected score (estimate)')
plt.legend()

Explain what's happening in the plotting code above and why.  Report how many trials you need before the running average is within 0.1, 0.01, 0.001, and 0.0001 of the exact expected score.  Re-run a few times at each experiment size to get a sense of how stable the results are, given the stochasticity.
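
As a rough guide for what to expect, here is a back-of-the-envelope sketch. It assumes the draws are independent (true above, since each trial samples one tile from the full pool), so the typical error of the average after $n$ draws is about $\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of a single tile's score. The dictionary `count_by_value` is a hand tally of the `letters` table, grouped by point value. This is a heuristic for the typical error, not a guarantee for any single run.

```python
import math

# Tile counts grouped by point value, tallied by hand from the `letters` table.
count_by_value = {0: 2, 1: 68, 2: 7, 3: 8, 4: 10, 5: 1, 8: 2, 10: 2}
n_tiles = sum(count_by_value.values())

mean = sum(value * count for value, count in count_by_value.items()) / n_tiles
second_moment = sum(value**2 * count for value, count in count_by_value.items()) / n_tiles
variance = second_moment - mean**2
sigma = math.sqrt(variance)
print('mean:', mean, 'standard deviation:', sigma)

# Solve sigma / sqrt(n) = eps for n.
for eps in (0.1, 0.01, 0.001, 0.0001):
    n_needed = math.ceil((sigma / eps) ** 2)
    print(f'typical error {eps}: about {n_needed} draws')
```

Each extra digit of accuracy costs roughly 100 times as many draws, which is worth keeping in mind before you set `n_trials` for the smallest tolerance.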

## Problem 3: Distances and volumes (16 points)
The simulation above is not deep—we can just calculate the expected point value of a Scrabble tile directly, so why bother with Monte Carlo sampling? Indeed, Monte Carlo sampling is most useful for problems that don't have an easy (or known) closed-form solution. Consider the problem of determining the expected distance between two points in a cube. This problem _does_ have a closed-form solution, expressed as the big ol' integral

$$\int_{0}^{1} \int_{0}^{1} \int_{0}^{1} \int_{0}^{1} \int_{0}^{1} \int_{0}^{1} \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} \ dx_1 dx_2 dx_3 dy_1 dy_2 dy_3$$

which evaluates to

$$\frac{4 + 17 \sqrt{2} - 6 \sqrt{3} + 21 \log(1 + \sqrt{2}) + 42 \log(2 + \sqrt{3}) - 7\pi}{105} \approx 0.6617 $$

We can avoid dealing with the integrals by using the Monte Carlo method to estimate the answer. Rather than sampling individual Scrabble tiles, we'll sample _pairs_ of points in the cube. First let's plot some pairs to get an idea of the distribution.

In [0]:
n_pairs_plot = 50
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for _ in range(n_pairs_plot):
    # We uniformly sample endpoints from (0, 0, 0)..(1, 1, 1).
    p1 = np.random.random(3)
    p2 = np.random.random(3)
    pair = np.array([p1, p2]).T
    ax.scatter(*pair, color='green')
    ax.plot(*pair, color='red')
plt.show()

In [0]:
n_pairs_monte = 1000
total_distance = 0
avg_distances = []
distances = []
expected_distance = (4 + 17 * np.sqrt(2) - 6 * np.sqrt(3) +
                     21 * np.log(1 + np.sqrt(2)) + 42 * np.log(2 + np.sqrt(3)) -
                     7 * np.pi) / 105
print('exact distance expectation is {:.3f}'.format(expected_distance))
for trial in range(1, n_pairs_monte + 1):
    # We uniformly sample endpoints from (0, 0, 0)..(1, 1, 1).
    p1 = np.random.random(3)
    p2 = np.random.random(3)
    pair_dist = np.sqrt(np.sum(np.power(p1 - p2, 2)))
    total_distance += pair_dist
    distances.append(pair_dist)
    avg_distances.append(total_distance / trial)

In [0]:
plt.plot(avg_distances, label='Expected distance (estimate)')
plt.plot([0, n_pairs_monte], [expected_distance, expected_distance], label='Expected distance (exact)')
plt.xlabel('Trial')
plt.ylabel('Distance')
plt.xlim(0, n_pairs_monte)
plt.ylim(expected_distance - 0.2, expected_distance + 0.2)
plt.title('Distance between pairs in the unit cube')
plt.legend()
plt.show()

In [0]:
plt.hist(distances)
plt.axvline(expected_distance,color='red')

(a) Recall from class that the "cube" in any $\mathbb R^n$ can be modeled by letting each of the $n$ variables range over $[0,1]$.  Use Monte Carlo methods to estimate the average distance between points in the cube in dimension 2 (that's a square!), 4, and 5.  
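
One possible starting point, as a sketch rather than the required approach: the loop above can be vectorized so that all pairs are sampled at once, with the dimension as a parameter. The function name `mean_cube_distance` and the use of `np.random.default_rng` with a fixed seed are choices made for this illustration.

```python
import numpy as np

def mean_cube_distance(dim, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of the mean distance between two uniform points in [0, 1]^dim."""
    rng = np.random.default_rng(seed)
    p1 = rng.random((n_pairs, dim))  # one point per row
    p2 = rng.random((n_pairs, dim))
    return np.linalg.norm(p1 - p2, axis=1).mean()

for dim in (2, 3, 4, 5):
    print(dim, mean_cube_distance(dim))
```

Dimension 3 is included so the output can be checked against the exact value of about 0.6617 given earlier.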

Now let's look at some area and volume estimates.

The plot below illustrates how to estimate the area of a disk: you drop random points in a square and count how many land inside.  Here the square is $[0,1]^2$, which contains one quarter of the unit disk.

In [0]:
x = np.linspace(0, 1, 100)
quarter_circle = np.sqrt(1 - x**2)
fig,ax = plt.subplots()
ax.plot(x, quarter_circle)
ax.scatter(np.random.random(100), np.random.random(100), color='red')
ax.set_aspect('equal')
plt.title('Area under the unit circle')
plt.xlim(0, 1)
plt.ylim(0, 1)

(b) Recall that the "unit ball" in any $\mathbb R^n$ is just the points $(x_1,\dots,x_n)$ satisfying $x_1^2+\dots+x_n^2\le 1$.  (With $=1$ instead of $\le 1$ you get only the boundary sphere, which has zero volume.)  Estimate the volumes of the unit balls in dimensions 2, 3, 4, 5 by estimating what share of random points with coordinates $-1\le x_i \le 1$ are in them.
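
A minimal sketch of this hit-counting estimate, under the assumption that points are drawn uniformly from the cube $[-1,1]^n$, whose volume is $2^n$. The function name `ball_volume_estimate` is made up for this illustration.

```python
import numpy as np

def ball_volume_estimate(dim, n_points=200_000, seed=0):
    """Estimate the unit ball's volume as (share of cube points inside it) * (cube volume)."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(-1.0, 1.0, size=(n_points, dim))
    inside = (pts**2).sum(axis=1) <= 1.0
    return inside.mean() * 2.0**dim

for dim in (2, 3, 4, 5):
    print(dim, ball_volume_estimate(dim))
```

In dimensions 2 and 3 you can compare the output against the familiar $\pi$ and $4\pi/3$.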

(c) Draw yourself a unit circle and GUESS what the average distance between points inside it will be.  Now use Monte Carlo to estimate the actual answer.  Repeat for dimension 3,4,5.  Discuss.
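
The one new ingredient here is sampling points uniformly from inside the ball. A sketch using rejection sampling (draw from the enclosing cube and throw away anything outside the ball) is below; the helper names `sample_ball` and `mean_ball_distance` are made up for this illustration, and other sampling schemes would work too. Make your guess before running it.

```python
import numpy as np

def sample_ball(n_points, dim, rng):
    """Rejection sampling: draw from the cube [-1, 1]^dim, keep points in the unit ball."""
    kept = []
    n_kept = 0
    while n_kept < n_points:
        candidates = rng.uniform(-1.0, 1.0, size=(n_points, dim))
        inside = candidates[(candidates**2).sum(axis=1) <= 1.0]
        kept.append(inside)
        n_kept += len(inside)
    return np.concatenate(kept)[:n_points]

def mean_ball_distance(dim, n_pairs=100_000, seed=0):
    """Monte Carlo estimate of the mean distance between two uniform points in the unit ball."""
    rng = np.random.default_rng(seed)
    p1 = sample_ball(n_pairs, dim, rng)
    p2 = sample_ball(n_pairs, dim, rng)
    return np.linalg.norm(p1 - p2, axis=1).mean()

for dim in (2, 3, 4, 5):
    print(dim, mean_ball_distance(dim))
```

For the disk, the estimate can be checked against the known closed form $128/(45\pi)$. Rejection sampling wastes more and more draws as the dimension grows, since the ball fills a shrinking share of the cube.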

## Problem 4: Buffon's needle (4 points)

In class we're doing the example of Buffon's needle, where you drop a needle on a floor and see if it hits the seams.  Fully explain how the code below works.

In [0]:
def plot_needle(x, y, theta, size):
    x_end = x + size * np.cos(theta)
    y_end = y + size * np.sin(theta)
    plt.plot([x, x_end], [y, y_end], markersize=2)

n_lanes = 5
needle_size = 1
for idx in range(n_lanes + 1):
    plt.plot([idx, idx], [-2 * needle_size, 2 * needle_size],color='black')
plt.ylim(-2 * needle_size, 2 * needle_size)
plt.xlim(-0.5, n_lanes + 0.5)
for _ in range(20):
    plot_needle(n_lanes * np.random.random(), 2 * np.random.random() - 1, 2 * np.pi * np.random.random(), needle_size)
plt.title("Buffon's needle")
plt.show()

In [0]:
buffon_est = []
hits = 0
for trial in range(1, 1001):
    x = np.random.random()
    theta = 2 * np.pi * np.random.random()
    x_end = x + np.cos(theta)
    if min(x, x_end) % 1 > max(x, x_end) % 1:
        hits += 1
    if hits > 0:
        p = hits / trial
        buffon_est.append(2 / p)
plt.title("Value")
plt.xlabel('Trial')
plt.plot(buffon_est)
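
For comparison, here is a vectorized sketch of the same experiment, assuming needle length 1 and seams at the integers as in the cell above. Instead of comparing remainders mod 1, it tests whether the two ends of the needle have different integer parts, which flags the same needles except for ties of probability zero. The reasoning behind the final line: for a fixed angle $\theta$ the needle spans a horizontal width $|\cos\theta|\le 1$, so it crosses a seam with probability $|\cos\theta|$, and averaging over $\theta$ gives $2/\pi$. The function name `buffon_estimate` is made up for this illustration.

```python
import numpy as np

def buffon_estimate(n_needles=200_000, seed=0):
    """Drop unit needles on a floor with seams at the integers; return 2 / (hit rate)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n_needles)                  # horizontal position of one end
    theta = 2 * np.pi * rng.random(n_needles)  # needle direction
    x_end = x + np.cos(theta)                  # horizontal position of the other end
    left = np.minimum(x, x_end)
    right = np.maximum(x, x_end)
    # A seam is crossed when the two ends fall in different unit-width lanes.
    hits = np.floor(left) != np.floor(right)
    return 2 / hits.mean()

print(buffon_estimate())
```

Only the horizontal coordinate matters because the seams are vertical lines, which is why neither version tracks $y$.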