
Commit d9361bf

Added CLT section
1 parent e8e9d8c commit d9361bf

File tree

2 files changed: +95 -2 lines changed


README.md

+2-2
@@ -1,4 +1,4 @@
-# Quantitative interview questions and strategies
+# Quantitative Finance Interview Questions and Strategies
 
 ## Introduction
 
@@ -92,7 +92,7 @@ Probability problems should be fun to solve and let's begin!
 - simplex
 - frog jump
 
-- Approximation trick1: Central limit theorem
+- Approximation trick1: [Central limit theorem](prob_clt.ipynb)
 - fake coin
 - monte carlo integration

prob_clt.ipynb

+93
# Central Limit Theorem (CLT)

## Definition:
Let $X_{1}$, $X_{2}$, $X_{3}$, ... be i.i.d. with mean $\mu$ and variance $\sigma^{2}$, and let $S=\sum_{i=1}^n X_{i}$. As $n \rightarrow \infty$, $S \rightarrow \mathcal{N}(n\mu, n\sigma^{2})$ and $\frac{S-n\mu}{\sqrt{n\sigma^{2}}} \rightarrow \mathcal{N}(0,1)$.

Equivalently, let $M=\frac{1}{n}\sum_{i=1}^n X_{i}$; then $M \rightarrow \mathcal{N}(\mu,\frac{\sigma^{2}}{n})$ and $\frac{M-\mu}{\sqrt{\frac{\sigma^{2}}{n}}} \rightarrow \mathcal{N}(0,1)$.
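A quick simulation makes the statement concrete. The sketch below is a minimal illustration (it assumes NumPy is available, and the sample sizes are arbitrary choices, not from the original): it standardizes sums of i.i.d. uniform variables and checks that they behave like $\mathcal{N}(0,1)$.

```python
import numpy as np

# CLT sanity check: standardized sums of i.i.d. Uniform(0, 1) draws.
# Uniform(0, 1) has mu = 0.5 and sigma^2 = 1/12.
n, trials = 100, 50000
mu, var = 0.5, 1.0 / 12.0

np.random.seed(0)
samples = np.random.uniform(0.0, 1.0, size=(trials, n))
s = samples.sum(axis=1)                  # S = sum of n i.i.d. draws, one value per trial
z = (s - n * mu) / np.sqrt(n * var)      # standardized sums

print(z.mean(), z.var())                 # should be close to 0 and 1
print((z >= 1.0).mean())                 # one-sided tail, close to 0.16
```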
## Discussions:

Naturally, the CLT appears in questions that involve the sum or average of a large number of random variables, especially when the question only asks for an approximate answer. Here are a few quick examples.
***Example 1:***
*Suppose we have a fair coin and we flip it 400 times. What is the probability of seeing 210 heads or more?*

**Exact answer**

Let the outcome of each coin flip be a random variable $I_{i}$. We are then dealing with the random variable $S=\sum_{i=1}^{400}I_{i}$. $S$ is the sum of i.i.d. Bernoulli trials, so it follows a Binomial distribution. The exact answer is $P(S\geq210)= \sum_{k=210}^{400}C_{400}^{k}\left(\frac{1}{2}\right)^{400}$, which requires a program to calculate (try implementing this, beware of round-off errors, and compare it against the approximate answer below).

**Approximation**

We can use the CLT to get an approximate answer quickly. First recognize that each $I_{i}$ has $\mu=0.5$ and $\sigma^2=0.5\times(1-0.5)=0.25$. Then $Z=\frac{S-400\times0.5}{\sqrt{400\times0.25}}=\frac{S-200}{10}$ is approximately $\mathcal{N}(0,1)$, and $S \geq 210$ corresponds to $Z\geq1$. The 68-95-99.7 rule tells us that, for a standard normal distribution, the probability of being more than 1 standard deviation away from the center is $1-0.68=0.32$, so the one-sided probability is $P(Z\geq1) \approx 0.16$.
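Here is one way to do that comparison, as a minimal sketch using only the standard library (the helper names `binom_tail` and `clt_tail` are illustrative); the exact sum is accumulated in log space to avoid overflow and round-off issues.

```python
from math import lgamma, log, exp, erf, sqrt

def binom_tail(n, k_min, p=0.5):
    """Exact P(S >= k_min) for S ~ Binomial(n, p), summing pmf terms in log space."""
    total = 0.0
    for k in range(k_min, n + 1):
        log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(p) + (n - k) * log(1.0 - p))
        total += exp(log_pmf)
    return total

def clt_tail(n, k_min, p=0.5):
    """CLT approximation: P(Z >= z) with z = (k_min - n*p) / sqrt(n*p*(1-p))."""
    z = (k_min - n * p) / sqrt(n * p * (1.0 - p))
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

print(binom_tail(400, 210))   # exact answer, roughly 0.17
print(clt_tail(400, 210))     # CLT approximation, about 0.159
```

Most of the gap between the two numbers comes from skipping the continuity correction; using $209.5$ instead of $210$ in the approximation brings them much closer.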
***Example 2:***

*Suppose you are going to use Monte Carlo simulation to estimate the value of $\pi$. How would you implement it? If we require an error of 0.001, how many trials/data points do you need?*

**Solution**

One possible implementation is to take a rectangle, say $x \in [-1,1], y\in[-1,1]$. If we draw a point uniformly at random from this rectangle, the probability of the point falling into the circle region $x^2+y^2<1$ is the ratio of the areas of the circle and the rectangle.

Formally, let the indicator random variable $I$ take value 1 if the point falls in the circle and 0 otherwise; then $p=P(I=1)=\frac{\pi}{4}$ and $E(I)=p$. If we do $n$ such trials and define $M=\frac{1}{n}\sum_{i=1}^n I_{i}$, then $M$ approximately follows $\mathcal{N}(\mu_{I},\frac{\sigma_{I}^2}{n})$. In this setup, $\mu_{I}=p=\frac{\pi}{4}$ and $\sigma_{I}^2=p(1-p)$.

One thing we need to clarify with the interviewer is what "error" really means. She might tell you to just treat it as the standard deviation of your estimated $\pi$. Since the estimate of $\pi$ is $4M$, the specified error translates into a required standard deviation of $\sigma_{req}=\frac{error}{4}$ for the random variable $M$. Thus $n = \frac{\sigma_{I}^2}{\sigma_{req}^2}$, which is about 2.7 million for our particular case.

We can see that the number of trials $n$ scales as $\frac{1}{error^2}$, which comes from the $\frac{1}{\sqrt{n}}$ scaling of $\sigma_{M}$ in the CLT and is, in general, the computational complexity of Monte Carlo integration.
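A minimal sketch of this setup using only the standard library (the function name `estimate_pi` and the fixed seed are illustrative assumptions, not part of the original):

```python
import random
from math import pi

def estimate_pi(n, seed=0):
    """Estimate pi by drawing n points uniformly from [-1, 1] x [-1, 1]
    and counting the fraction that land inside the unit circle."""
    rnd = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rnd.uniform(-1.0, 1.0), rnd.uniform(-1.0, 1.0)
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / n

# Number of trials needed for a standard deviation of 0.001 on the pi estimate:
error = 0.001
p = pi / 4.0
sigma_req = error / 4.0                        # required sigma of M
n_required = int(p * (1.0 - p) / sigma_req ** 2)
print(n_required)                              # about 2.7 million

print(estimate_pi(n_required))                 # one realization; its standard deviation around pi is about 0.001
```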
Notebook metadata: Python 2 kernel (2.7.13), nbformat 4.

0 commit comments
