Commit b9520cc

Update README.md
1 parent dbaad88 commit b9520cc

1 file changed

+146
-137
lines changed

README.md

Lines changed: 146 additions & 137 deletions
@@ -1,188 +1,197 @@
1-
# Quantitative Interview Preparation Guide
1+
## What is this
22

3-
### Quantitative interviews
4-
in finance industry tend to cover a diverse set of topics in math, statistics, computer science and machine learning, and thus can pose some challenges for preparation. After going through the interview process and/or worked in the industry for a while, a few of us decided to put toegther a preparation guide. We come from a technical background (advanced degrees in quantitative field and/or worked in technology industry) and hope this writing will help people with similar background make a successful transition into quantitative finance (if that is what you decided you want to do, of course).
3+
A short list of resources and topics covering the essential quantitative tools for data scientists, AI/machine learning practitioners, quant developers/researchers and those who are preparing to interview for these roles.
54

6-
Depending on the types of firm (e.g. buy-side vs sell-side), functional areas (risk modelling, portfolio optimization, trading signal generation etc), the amount of code vs. math involved (e.g. developer vs researcher), the distance to trading and the types of trading (strategy, frequency etc), there are many types of quant jobs and we encourage you to explore and understand the distinctions between them in order to better gauge your interest and fit.
5+
At a high level, we can divide things into three main areas:
76

8-
### In this writing
9-
we selected 7 technical areas, they range from "old" math to the industry's new favorate: machine learning. For each area we will have a number of topics. We have intentionally left out finance topics, since we believe we won't be able to do a better job than the existing classic text on them.
7+
1. Machine Learning
8+
2. Coding
9+
3. Math (calculus, linear algebra, probability, etc)
1010

11-
We strive to make sure to only include content that is pertinent and deliver it in a concise, intuitive, and self-contained fashion. We will focus on generalizable knowledge points, methods and problem solving strategies rather than exact questions. (For those interested in interview question pool please visit *link_to_other_sites* instead).
11+
Depending on the type of role, the emphasis can be quite different. For example, AI/ML interviews might go deeper into the latest deep learning models, while quant interviews might cast a wide net over various kinds of math puzzles. Interviews for research-oriented roles might be lighter on coding problems, or at least emphasize algorithms over software design or tooling.
1212

13-
This writing will use Jupyter notebooks due to its easy of displaying LaTex equations, code blocks and graphics. One notebook per topic. This README file will keep a list of topics we plan to write about. Once a topic is drafted, the plain text with be replaced with a link pointing to the [nbviewer](https://nbviewer.jupyter.org/) rendering of the notebook.
1413

15-
### Contribute
16-
If you have feedback or wish to contribute to this writing, please do feel free to open an issue or submit a pull request.
14+
## List of resources
1715

18-
---
19-
## Table of Content
16+
A minimalist list of the best/most practical ones:
2017

21-
* [Calculus](#calculus)
22-
* [Linear algebra](#linear-algebra)
23-
* [Probability](#probability)
24-
* [Statistics](#statistics)
25-
* [Programming essentials](#programming-essentials)
26-
* [Numerical methods and optimization](#numerical-methods-and-optimization)
27-
* [Machine learning](#machine-learning)
18+
![]({{ "cs229.png" | absolute_url }})
19+
![]({{ "mit6006.jpg" | absolute_url }})
20+
![]({{ "stats110.jpg" | absolute_url }})
2821

29-
Created by [gh-md-toc](https://github.com/ekalinin/github-markdown-toc.go)
22+
Machine Learning:
3023

31-
---
24+
- Course on classic ML: Andrew Ng's CS229 (there are several versions; [the Coursera one](https://www.coursera.org/learn/machine-learning) is easily accessible, and I used this [older version](https://www.youtube.com/playlist?list=PLA89DCFA6ADACE599))
25+
- Book on classic ML: Alpaydin's Intro to ML [link](https://www.amazon.com/Introduction-Machine-Learning-Adaptive-Computation/dp/026201243X/ref=la_B001KD8D4G_1_2?s=books&ie=UTF8&qid=1525554938&sr=1-2)
26+
- Course with a deep learning focus: [CS231n](http://cs231n.stanford.edu/) from Stanford, lectures available on YouTube
27+
- Book on deep learning: [Deep Learning](https://www.deeplearningbook.org/) by Ian Goodfellow et al.
28+
- Book on deep learning for NLP: Yoav Goldberg's [Neural Network Methods for Natural Language Processing](https://www.amazon.com/Language-Processing-Synthesis-Lectures-Technologies-ebook/dp/B071FGKZMH)
29+
- Hands-on exercises on deep learning: PyTorch and MXNet/Gluon are easier to pick up than TensorFlow. For any of them, you can find plenty of hands-on examples online. My biased recommendation is [https://d2l.ai/](https://d2l.ai/), which uses MXNet/Gluon and was created by people at Amazon (it came from [mxnet-the-straight-dope](https://github.com/zackchase/mxnet-the-straight-dope))
30+
31+
32+
Coding:
33+
34+
- Course: MIT OCW 6.006 [link](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/)
35+
- Book: Cracking the Coding Interview [link](https://www.amazon.com/Cracking-Coding-Interview-Programming-Questions/dp/098478280X)
36+
- SQL tutorial: from [Mode Analytics](https://community.modeanalytics.com/sql/)
37+
- Practice sites: [LeetCode](https://leetcode.com/), [HackerRank](https://www.hackerrank.com/)
38+
39+
40+
Math:
41+
42+
- Calculus and Linear Algebra: an undergrad class would be best; for a refresher, see the notes from CS229 [link](http://cs229.stanford.edu/section/cs229-linalg.pdf)
43+
- Probability: Harvard Stats110 [link](https://projects.iq.harvard.edu/stat110/home); [book](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) from the same professor
44+
- Statistics: Schaum's Outline [link](https://www.amazon.com/Schaums-Outline-Statistics-5th-Outlines/dp/0071822526)
45+
- Numerical Methods and Optimization: these are two different topics really, college courses are probably the best bet. I have yet to find good online courses for them. But don't worry, most interviews won't really touch on them.
46+
47+
48+
49+
## List of topics
50+
51+
Here is a list of topics from which interview questions are often derived. The depth and trickiness of the questions certainly depend on the role and the company.
52+
53+
Under each topic I try to add a few bullet points covering the key things you should know.
54+
55+
### Machine learning
56+
- Models (roughly in decreasing order of frequency)
57+
- Linear regression
58+
- e.g. assumptions, multicollinearity, derive from scratch in linear algebra form
59+
- Logistic regression
60+
- be able to write out everything from scratch: from defining a classification problem to the gradient updates
61+
- Decision trees/forests
62+
- e.g. how does a tree/forest grow, on a pseudocode level
63+
- Clustering algorithms
64+
- e.g. K-means, agglomerative clustering
65+
- SVM
66+
- e.g. margin-based loss objectives, how do we use support vectors, primal-dual problem
67+
- Generative vs discriminative models
68+
- e.g. Gaussian mixture, Naive Bayes
69+
- Anomaly/outlier detection algorithms (DBSCAN, LOF etc)
70+
- Matrix factorization based models
71+
- Training methods
72+
- Gradient descent, SGD and other popular variants
73+
- Understand momentum, how these variants work, and the differences between the popular ones (RMSProp, Adagrad, Adadelta, Adam etc)
74+
- Bonus point: when to not use momentum?
75+
- EM algorithm
76+
- Andrew's [lecture notes](http://cs229.stanford.edu/notes/cs229-notes8.pdf) are great, also see [this](https://dingran.github.io/EM/)
77+
- Gradient boosting
78+
- Learning theory / best practice (see Andrew's advice [slides](http://cs229.stanford.edu/materials/ML-advice.pdf))
79+
- Bias vs variance, regularization
80+
- Feature selection
81+
- Model validation
82+
- Model metrics
83+
- Ensemble methods, boosting, bagging, bootstrapping
84+
- Generic topics on deep learning
85+
- Feedforward networks
86+
- Backpropagation and computation graph
87+
- I really liked the [miniflow](https://gist.github.com/dingran/154a524003c86ecab4a949c538afa766) project Udacity developed
88+
- In addition, be absolutely familiar with taking derivatives with respect to matrices and vectors, see [Vector, Matrix, and Tensor Derivatives](http://cs231n.stanford.edu/vecDerivs.pdf) by Erik Learned-Miller and [Backpropagation for a Linear Layer](http://cs231n.stanford.edu/handouts/linear-backprop.pdf) by Justin Johnson
89+
- CNN, RNN/LSTM/GRU
90+
- Regularization in NN, dropout, batch normalization
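
As an illustration of the "write out everything from scratch" item under logistic regression, here is a minimal sketch in plain Python: a sigmoid, the mean log-loss gradient, and batch gradient descent. The function names (`fit_logistic`, `predict`), the learning rate, and the toy data are all made up for this example; treat it as one possible way to write it down, not a reference implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean negative log-likelihood."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical toy data: the label is 1 when x > 0.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, predict(X, w, b))
```

The key step to be able to derive on a whiteboard is that the gradient of the log loss with respect to the logit is simply `p - y`.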
91+
92+
### Coding essentials
93+
The bare minimum of coding concepts you need to know well.
94+
95+
- Data structures:
96+
- array, dict, linked list, tree, heap, graph, ways of representing sparse matrices
97+
- Sorting algorithms:
98+
- see [this](https://brilliant.org/wiki/sorting-algorithms/) from brilliant.org
99+
- Tree/Graph related algorithms
100+
- Traversal (BFS, DFS)
101+
- Shortest path (two-sided BFS, Dijkstra)
102+
- Recursion and dynamic programming
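
For the shortest-path item, a minimal sketch of Dijkstra's algorithm using the standard library's `heapq`. The adjacency-list format (node mapped to a list of `(neighbor, weight)` pairs) and the example graph are assumptions made for illustration; weights must be non-negative.

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical example graph.
graph = {
    "a": [("b", 1), ("c", 4)],
    "b": [("c", 2), ("d", 5)],
    "c": [("d", 1)],
    "d": [],
}
print(dijkstra(graph, "a"))
```

Pushing duplicate entries and skipping stale ones on pop avoids needing a decrease-key operation, which `heapq` does not provide.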
103+
104+
### Calculus
105+
106+
Just to spell things out
32107

33-
## Calculus
34108
- Derivatives
35-
- product rule, chain rule, power rule, L'Hospital's rule,
36-
- partial and total derivative (e.g. z = y*x, y= 2*x)
37-
- things worth remembering
109+
- Product rule, chain rule, power rule, L'Hospital's rule
110+
- Partial and total derivative
111+
- Things worth remembering
38112
- derivatives of common functions
39113
- limits and approximations
40-
- Applications of derivatives
41-
- monotonicity, max/min
114+
- Applications of derivatives: e.g. [this](https://math.stackexchange.com/questions/1619911/why-ex-is-always-greater-than-xe)
42115
- Integration
43-
- power rule, integration by sub, integration by part
44-
- change of coordinates
116+
- Power rule, integration by substitution, integration by parts
117+
- Change of coordinates
45118
- Taylor expansion
46-
- single and multiple variables
119+
- Single and multiple variables
47120
- Taylor/Maclaurin series for common functions
48121
- Derive Newton-Raphson
49-
- ODEs
50-
- PDEs
51-
122+
- ODEs, PDEs (common ways to solve them analytically)
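
For "Derive Newton-Raphson": setting the first-order Taylor expansion f(x + Δ) ≈ f(x) + f'(x)Δ to zero gives Δ = -f(x)/f'(x), hence the update x ← x - f(x)/f'(x). Below is a minimal sketch of that iteration; the function name, tolerance, and the square-root example are my own choices for illustration.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until the step is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative vanished; try another starting point")
        step = f(x) / dfx
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the positive root of x^2 - 2, i.e. sqrt(2).
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Convergence is quadratic near a simple root, but the method can diverge from a poor starting point, which is worth mentioning if asked.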
52123

53-
## Linear algebra
54-
- vector and matrix multiplication
55-
- matrix operations (transpose, determinant, inverse etc)
56-
- types of matrices (symmetric, hermition, orthogonal etc)
57-
- eigenvalue and eigenvectors
58-
- matrix calculus (gradients, hessian etc)
59-
- useful theorems
60-
- matrix decomposition
61-
- applications, types of problems
62124

125+
### Linear algebra
126+
- Vector and matrix multiplication
127+
- Matrix operations (transpose, determinant, inverse etc)
128+
- Types of matrices (symmetric, Hermitian, orthogonal etc) and their properties
129+
- Eigenvalues and eigenvectors
130+
- Matrix calculus (gradients, Hessian etc)
131+
- Useful theorems
132+
- Matrix decomposition
133+
- Concrete applications in ML and optimization
63134

64135

65-
## Probability
66-
Probability puzzles are reported to be the hardest and most unpredictable type of interview questions. This does not have to be the case for you.
67-
Solving probability interview questions is really all about pattern recognition and then applying the correct tools/theorems.
68-
Once you recognize the underlying mechanics of a problem it is usually no more than two or three quick steps away from the answer.
69-
What this requires is a thorough and, more importantly, intuitive understanding of the key concepts, coupled with sufficient amount of practice to improve your patter recognition skills.
136+
### Probability
70137

71-
Probability problems should be fun to solve and let's begin!
138+
Solving probability interview questions is really all about pattern recognition. To do well, do plenty of exercises from [this book](https://www.amazon.com/Introduction-Probability-Chapman-Statistical-Science/dp/1466575573/ref=pd_lpo_sbs_14_t_2?_encoding=UTF8&psc=1&refRID=5W11QQ7WW4DFE0Q89N7V) and [this one](https://www.amazon.com/Practical-Guide-Quantitative-Finance-Interviews/dp/1438236662). This topic is particularly heavy in quant interviews and usually quite light in ML/AI/DS interviews.
72139

73-
74-
- [Basic concepts](https://nbviewer.jupyter.org/github/rd1019/quant-interview-tips/blob/master/prob/prob_concepts.ipynb)
140+
- Basic concepts
75141
- Event, outcome, random variable, probability and probability distributions
76-
77142
- Combinatorics
78143
- Permutation
79144
- Combinations
80145
- Inclusion-exclusion
81-
82146
- Conditional probability
83147
- Bayes rule
84148
- Law of total probability
85-
 
86149
- Probability Distributions
87150
- Expectation and variance equations
88151
- Discrete probability and stories
89152
- Continuous probability: uniform, Gaussian (Poisson, often reviewed alongside these, is discrete)
90-
91153
- Expectations, variance, and covariance
92-
- linearity of expectation
154+
- Linearity of expectation
93155
- solving problems with this theorem and symmetry
94-
- law of total expectation
95-
- covariance and correlation
96-
- independence implies zero correlation
97-
- hash collision probability
98-
156+
- Law of total expectation
157+
- Covariance and correlation
158+
- Independence implies zero correlation
159+
- Hash collision probability
99160
- Universality of Uniform distribution
100-
- proof
101-
- circle problem
102-
161+
- Proof
162+
- Circle problem
103163
- Order statistics
104-
- expectation of min and max and random variable
105-
164+
- Expectation of the min and max of random variables
106165
- Graph-based solutions involving multiple random variables
107-
- breaking stick
108-
- meeting at the train station
109-
- simplex
110-
- frog jump
111-
112-
- [Approximation method: Central Limit Theorem](https://nbviewer.jupyter.org/github/rd1019/quant-interview-tips/blob/master/prob/central_limit_theorem.ipynb)
166+
- e.g. breaking sticks, meeting at the train station, frog jump (simplex)
167+
- Approximation method: Central Limit Theorem
113168
- Definition, examples (unfair coins, Monte Carlo integration)
114-
115-
- [Approximation method: Poisson Paradigm](https://nbviewer.jupyter.org/github/rd1019/quant-interview-tips/blob/master/prob/poisson_paradigm.ipynb)
169+
- [Example question](https://github.com/dingran/quant-notes/blob/master/prob/central_limit_theorem.ipynb)
170+
- Approximation method: Poisson Paradigm
116171
- Definition, examples (duplicated draw, near birthday problem)
117-
118-
119172
- Poisson count/time duality
120-
- poisson from poissons
121-
173+
- Poisson from Poissons
122174
- Markov chain tricks
123-
- various games
124-
- introduction of martingale
125-
175+
- Various games, introduction of martingale
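
Simulation is a handy way to sanity-check answers to puzzles like the breaking-stick problem. The sketch below assumes one common version of it: a unit stick is cut at two independent uniform points, and we ask how often the three pieces can form a triangle (the known answer for this version is 1/4). The function name, trial count, and seed are arbitrary choices for illustration.

```python
import random

def triangle_probability(trials=200_000, seed=0):
    """Monte Carlo estimate of P(three pieces of a broken stick form a triangle)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        u, v = sorted((rng.random(), rng.random()))
        a, b, c = u, v - u, 1.0 - v
        # Three lengths summing to 1 form a triangle iff each is below 1/2.
        if max(a, b, c) < 0.5:
            hits += 1
    return hits / trials

estimate = triangle_probability()
print(estimate)  # should land close to 0.25
```

The same condition (every piece shorter than 1/2) is what you would shade on the unit square in the graph-based solution.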
126176

127-
## Statistics
128-
- z score, t-test, F-test, chi2 test
129-
- p-value
130-
- sampling
177+
### Statistics
178+
- Z-score, p-value
179+
- t-test, F-test, chi-squared test (know when to use which)
180+
- Sampling methods
131181
- AIC, BIC
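
To make the Z-score and p-value items concrete, here is a minimal sketch of a one-sample z-test using only the standard library. It assumes a known population standard deviation; the helper names and the numbers in the example are made up for illustration.

```python
import math

def z_score(sample_mean, mu0, sigma, n):
    """Standardized distance of the sample mean from the hypothesized mean mu0."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical numbers: n = 64 observations with mean 10.5, testing mu0 = 10 with sigma = 2.
z = z_score(10.5, 10.0, 2.0, 64)
p = two_sided_p_value(z)
print(z, p)
```

When sigma is unknown and estimated from a small sample, the t-test is the appropriate replacement.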
132182

133-
134-
## Programming essentials
135-
The bare minimum of coding concept you need to know well.
136-
137-
Material on these topics are widely available elsewhere, so we will just cite them here.
138-
139-
- Data structures:
140-
- array, dict, link list, tree, heap, graph, ways of representing sparse matrix
141-
142-
- Sorting:
143-
- brilliant.org
144-
145-
- Tree/Graph related algorithms
146-
- traversal (BFS, DFS)
147-
- shortest path (two sided BFS, djikstra)
148-
- Recuision, iteration and DP
149-
150-
## Numerical methods and optimization
151-
- computer errors (e.g. float)
152-
- root finding (newton method, bisection, secant etc)
153-
- interpolating
154-
- numerical integration and difference
155-
- finite difference
156-
- numerical method in linear algebra
157-
- solving linear equations
158-
- matrix decompositions
159-
- eigen problems
160-
- special cases
161-
162-
## Machine learning
163-
- Models: See [these blog series](http://dshacker.blogspot.com/2015/07/machine-learning-and-statistics-unified.html) for reference
164-
- linear regression
165-
- logistic regression
166-
- tree methods
167-
- SVM
168-
- generative vs descriptive models
169-
- Gaussian mixture, Naive bayes
170-
- Optimization algorithms
171-
- gradient descent, sgd and other popular variants (RMSProp, Adagrad, Adam etc)
172-
- EM algorithm
173-
- Learning theory
174-
- bias vs variance
175-
- feature selection
176-
- model validation
177-
- model metrics
178-
- ensemble method, boosting, bagging
179-
- Deep learning topics
180-
- feedforward networks
181-
- backprop
182-
- cnn
183-
- rnn/lstm/gru
184-
- dropout, batchnorm
185-
183+
### [Optional] Numerical methods and optimization
184+
- Computer errors (e.g. float)
185+
- Root finding (Newton's method, bisection, secant etc)
186+
- Interpolation
187+
- Numerical integration and differentiation
188+
- Numerical linear algebra
189+
- Solving linear equations, direct methods (understand complexities here) and iterative methods (e.g. conjugate gradient)
190+
- Matrix decompositions/transformations (e.g. QR, LU, SVD etc)
191+
- Eigenvalue problems (e.g. power iteration, Arnoldi/Lanczos etc)
192+
- ODE solvers (explicit, implicit)
193+
- Finite-difference method, finite-element method
194+
- Optimization topics: TBA
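
As an example for the eigenvalue item, here is a minimal sketch of power iteration in plain Python. It assumes the matrix has a single dominant eigenvalue and that the all-ones starting vector is not orthogonal to its eigenvector; the helper names and the 2x2 example matrix are made up for illustration.

```python
import math

def matvec(A, v):
    # Dense matrix-vector product for a list-of-rows matrix.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, iters=200):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix."""
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T A v for the unit vector v.
    lam = sum(x * y for x, y in zip(v, matvec(A, v)))
    return lam, v

# Symmetric example whose eigenvalues are (5 +/- sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
lam, v = power_iteration(A)
print(lam, v)
```

The error shrinks roughly like the ratio of the second-largest to the largest eigenvalue magnitude per iteration, which is why Arnoldi/Lanczos are preferred for large problems.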
186195

187196

188197
