# Data Science Interview Prep

Interview questions and practice problems.

---
---

## Table of Contents

* [Probability](#probability)
  * [Fundamentals](#fundamentals)
  * [Combinatorics](#combinatorics)
  * [Bayesian Inference](#bayesian-inference)
  * [Distributions](#distributions)
* [Statistics](#statistics)
  * [Descriptive Statistics](#descriptive-statistics)
  * [Inferential Statistics](#inferential-statistics)
  * [Hypothesis Testing](#hypothesis-testing)
* [Regression](#regression)
  * [Linear Regression](#linear-regression)
  * [Logistic Regression](#logistic-regression)
  * [Non-Linear Regression](#non-linear-regression)
* [Machine Learning](#machine-learning)
  * [Model validation](#model-validation)
  * [Classification](#classification)
  * [Regularization](#regularization)
  * [Feature selection](#feature-selection)
  * [Natural language processing](#natural-language-processing)
  * [Neural networks and deep learning](#neural-networks-and-deep-learning)
* [Technical / Engineering](#technical--engineering)
  * [Python](#python)
  * [SQL](#sql)
* [General / Unclassified](#general--unclassified)

---
---

## Resources

* [Data Science Interview Questions](https://github.com/alexeygrigorev/data-science-interviews)
* How to Ace Data Science Interviews
  * [SQL](https://towardsdatascience.com/how-to-ace-data-science-interviews-sql-b71de212e433)
  * [Statistics](https://towardsdatascience.com/how-to-ace-data-science-interviews-statistics-f3d363ad47b)
* [109 Practice Questions](https://www.springboard.com/blog/data-science-interview-questions/)
* [Over 100 DS Qs and As](https://towardsdatascience.com/over-100-data-scientist-interview-questions-and-answers-c5a66186769a)
* [100+ DS interview Qs for 2020](https://www.mygreatlearning.com/blog/most-common-data-science-interview-questions?utm_source=SlackPOS1)

---
---

## Probability

---

### Fundamentals

* Basic probability formula
* Expected values
* Frequency
* Events and complements

* What is the probability of winning at least 3 out of 5 coin tosses?
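
One way to work the coin-toss question above, assuming a fair coin and reading "winning" as getting at least 3 favorable tosses out of 5 (both are assumptions; the question does not say). The helper name `prob_at_least` is made up for this sketch.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Exactly 3 favorable tosses out of 5: C(5, 3) / 2^5
p_exactly_3 = comb(5, 3) * 0.5**5

# At least 3 favorable tosses out of 5: sum over 3, 4, and 5
p_at_least_3 = prob_at_least(3, 5)

print(p_exactly_3, p_at_least_3)
```

By symmetry, "at least 3 of 5" and "at most 2 of 5" are equally likely for a fair coin, so the second value is one half.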

---

### Combinatorics

* Permutations
* Factorial operations
* Variation with/without repetition
* Combinations
  * Symmetry of combinations
  * Combinations with separate sample spaces
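
The counting rules listed above can be checked quickly with Python's `math` module. This is a small sketch; the values of `n` and `k` and the shirts/trousers example are arbitrary illustrations.

```python
from math import comb, factorial, perm

n, k = 5, 2

permutations_of_n = factorial(n)   # orderings of all n items: n!
variations_no_rep = perm(n, k)     # ordered picks of k from n, no repetition: n! / (n - k)!
variations_with_rep = n**k         # ordered picks of k from n, repetition allowed
combinations = comb(n, k)          # unordered picks of k from n

# Symmetry of combinations: choosing k items is the same as choosing the n - k to leave out
symmetric = comb(n, k) == comb(n, n - k)

# Separate sample spaces multiply, e.g. 3 shirts and 4 trousers (hypothetical example)
outfits = 3 * 4

print(permutations_of_n, variations_no_rep, variations_with_rep, combinations, symmetric, outfits)
```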

---

### Bayesian Inference

* Sets and events
* Set interactions
  * Intersection
  * Union
  * Mutually exclusive
  * Dependence and independence
* Conditional probability formula
* The Law of Total Probability
* The Additive Rule
* The Multiplication Law
* Bayes' Law
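
A minimal sketch of Bayes' Law combined with the Law of Total Probability, using a diagnostic-test setting. All numbers (1% prevalence, 95% sensitivity, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
def posterior(prior: float, likelihood: float, false_positive_rate: float) -> float:
    """P(H | E) via Bayes' Law.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Law of Total Probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: P(disease) = 0.01, P(+ | disease) = 0.95, P(+ | healthy) = 0.05
p_disease_given_positive = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p_disease_given_positive, 3))
```

With these made-up inputs the posterior is only about 16%, because the false positives from the large healthy group outnumber the true positives.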

---

### Distributions

* Discrete
  * Characteristics
  * Uniform
  * Bernoulli
  * Binomial
  * Poisson
* Continuous
  * Normal
  * Standard Normal
  * Student's T-distribution
  * Chi-squared
  * Exponential
  * Logistic
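
A few of the distributions above, sketched with the standard library only. The function names `binomial_pmf` and `poisson_pmf` are made up for this example; `statistics.NormalDist` is part of Python 3.8+.

```python
from math import comb, exp, factorial
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


# Standard normal: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

print(binomial_pmf(3, 5, 0.5))   # 3 successes in 5 fair Bernoulli trials
print(poisson_pmf(0, 2.0))       # no events when 2 are expected on average
print(std_normal.cdf(1.96))      # area to the left of z = 1.96
```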

---
---

## Statistics

---

### Descriptive Statistics

* Types of Data
* Levels of Measurement
* Categorical Variables
* Numerical Variables
* Histogram
* Cross Tables and Scatter Plots
* Mean, Median, and Mode
* Skewness
* Variance
* Standard Deviation and Coefficient of Variation
* Covariance
* Correlation Coefficient
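
The summary measures above can be computed with the `statistics` module. This is a sketch on made-up data; `covariance` and `correlation` are written out by hand here (sample versions, dividing by n - 1) so the formulas stay visible.

```python
import statistics as st

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

mean = st.mean(data)
median = st.median(data)
mode = st.mode(data)
pop_variance = st.pvariance(data)    # population variance (divide by n)
pop_std = st.pstdev(data)            # population standard deviation
coeff_of_variation = pop_std / mean  # unitless spread relative to the mean


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    mx, my = st.mean(xs), st.mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (st.stdev(xs) * st.stdev(ys))


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]  # exactly linear in xs, so the correlation should be 1
print(mean, median, mode, pop_variance, pop_std, coeff_of_variation)
print(covariance(xs, ys), correlation(xs, ys))
```

Here the mean exceeds the median, which hints at a right (positive) skew in `data`.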

---

### Inferential Statistics

* Population and Sample
* Distributions
  * Normal Distribution
  * Standard Normal
  * Central Limit Theorem
  * Standard Error
  * Estimators and Estimates
* Confidence Intervals
  * Population Variance Known: Z-Score
  * Student's T-Distribution
  * Population Variance Unknown: T-Score
  * Margin of Error
  * Confidence Intervals: Two Means, Dependent
  * Confidence Intervals: Two Means, Independent
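
For the "population variance known" case above, a confidence interval for the mean is the sample mean plus or minus a z-score times the standard error. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided CI for the mean when the population standard deviation sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z-score, about 1.96 for 95%
    margin_of_error = z * sigma / sqrt(n)               # z times the standard error
    return sample_mean - margin_of_error, sample_mean + margin_of_error


# Made-up inputs: sample mean 50, known sigma 10, sample size 100
low, high = z_confidence_interval(sample_mean=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))
```

When the population variance is unknown, the same structure applies with a t-score and the sample standard deviation in place of `z` and `sigma`.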

* What is the central limit theorem and why is it important?
* What is sampling? Explain two different sampling methods.
* What is a statistical interaction?
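
For the central limit theorem question above, a small simulation can make the idea concrete. This sketch draws repeated samples from a skewed exponential distribution (mean 1, standard deviation 1) and looks at the distribution of the sample means; the sample size, repetition count, and seed are arbitrary choices.

```python
import random
import statistics as st

random.seed(0)   # fixed seed so the run is reproducible
n = 50           # observations per sample
reps = 5000      # number of repeated samples

# Each entry is the mean of one sample of n exponential draws
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

mean_of_means = st.mean(sample_means)
std_of_means = st.stdev(sample_means)
theoretical_standard_error = 1.0 / n**0.5  # sigma / sqrt(n), with sigma = 1

print(mean_of_means, std_of_means, theoretical_standard_error)
```

Even though the underlying distribution is far from normal, the sample means cluster around the population mean with spread close to the standard error, which is what justifies z- and t-based inference on means.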

---

### Hypothesis Testing

* Framework for experiment design
* Null vs Alternative Hypothesis
* Rejection Region and Significance Level
* Type I and Type II Error
* Test for the Mean: Known Population Variance
* P-Value
* Test for the Mean: Unknown Population Variance
* Test for the Mean: Dependent Samples
* Test for the Mean: Independent Samples

#### Framework for Experimental Design

1. Formulate the research question
2. Identify variables: independent vs dependent
3. Generate hypothesis
4. Determine experimental design
    * How am I going to test the hypothesis?
    * What variables will be involved?
5. Develop experimental task & procedure
    * What algorithms and techniques best support the experiment design?
6. Determine data manipulation & measurements
7. Analyze results

* Explain how you'd design an experiment to determine user behavior.
* What is the difference between type I / type II errors?
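
A sketch of a test for the mean with known population variance, tying together the p-value, the significance level, and the two error types. The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided z-test of H0: mean == mu0, with known population sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # distance from mu0 in standard errors
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result at least this extreme under H0
    return z, p_value


alpha = 0.05  # significance level = accepted probability of a type I error

# Made-up inputs: sample mean 103, H0 mean 100, sigma 15, n = 100
z, p_value = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
reject_null = p_value < alpha
print(z, round(p_value, 4), reject_null)
```

Rejecting a true null hypothesis is a type I error (false positive), and its rate is controlled by `alpha`. Failing to reject a false null hypothesis is a type II error (false negative), and its rate depends on the effect size and sample size.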

---
---

## Regression

* What is regression?
  * Which models can be used to solve a regression problem?

---

### Linear Regression

* What is linear regression?
  * Explain the meaning and significance of each of the following terms:
    * p-value
    * coefficient
    * r-squared
  * What are the assumptions required for linear regression?
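
To ground the terms above, here is a minimal sketch of simple (one-feature) ordinary least squares using the closed-form solution. The data points are made up, and `fit_simple_ols` is a hypothetical helper, not a library function.

```python
def fit_simple_ols(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = s_xy / s_xx                    # coefficient: change in y per unit change in x
    intercept = mean_y - slope * mean_x

    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_residual / ss_total  # share of variance in y explained by the line
    return slope, intercept, r_squared


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up data, roughly y = 2x
slope, intercept, r_squared = fit_simple_ols(xs, ys)
print(slope, intercept, r_squared)
```

The p-value of a coefficient is not computed here; it comes from a t-test of the null hypothesis that the coefficient is zero, using the coefficient's standard error.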

---

### Logistic Regression

---

### Non-Linear Regression

---
---

## Machine Learning

* How are supervised and unsupervised machine learning algorithms different?
* What is the bias-variance tradeoff?

---

### Model validation

* What is overfitting?
* How can a model be validated?
  * Why should a dataset be split into training, validation, and test sets?
* Explain how (k-fold) cross-validation works
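
The mechanics of k-fold cross-validation can be sketched as an index splitter: the data is divided into k folds, and each fold takes one turn as the validation set while the rest is used for training. `k_fold_indices` is a hypothetical helper written for illustration; it does not shuffle, which real workflows usually do first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                   # held-out fold
        train = indices[:start] + indices[start + size:]    # everything else
        yield train, val
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, val in folds:
    print("train:", train, "val:", val)
```

A model would be fit on each `train` split and scored on the matching `val` split, and the k scores averaged to estimate out-of-sample performance.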

---

### Classification

---

### Regularization

---

### Feature selection

---

### Natural language processing

---

### Neural networks and deep learning

* What is gradient descent and how does it work?
  * What is stochastic gradient descent (SGD)? How does it differ from standard (batch) gradient descent?
* What layers make up a CNN?
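
For the gradient descent questions above, a toy comparison of batch gradient descent and SGD on a one-parameter model `y = w * x` with squared-error loss. The data, learning rate, step counts, and seed are arbitrary choices for this sketch.

```python
import random

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2x, so the best w is 2


def grad(w, x, y):
    """Gradient of the squared error (w * x - y)^2 with respect to w."""
    return 2 * (w * x - y) * x


def batch_gd(w=0.0, lr=0.01, steps=200):
    """Batch gradient descent: each step uses the average gradient over all samples."""
    for _ in range(steps):
        g = sum(grad(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * g
    return w


def sgd(w=0.0, lr=0.01, steps=500, seed=0):
    """Stochastic gradient descent: each step uses the gradient of one random sample."""
    rng = random.Random(seed)
    for _ in range(steps):
        i = rng.randrange(len(xs))
        w -= lr * grad(w, xs[i], ys[i])
    return w


w_batch = batch_gd()
w_sgd = sgd()
print(w_batch, w_sgd)
```

Both variants step against the gradient. Batch updates are smooth but touch every sample per step, while SGD updates are cheaper and noisier. The data here is noise-free, so both settle near the same value.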

---
---

## Technical / Engineering

---

### Python

* Pandas
* NumPy
* Scikit-learn
* TensorFlow/PyTorch

---

### SQL

---
---

## General / Unclassified


* What are 5 predictions you have for the next 20 years?
* What is survivorship bias and why is it important?
* How are extrapolation and interpolation different?
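
One way to illustrate the interpolation vs. extrapolation question above: fit a straight line to data from a curved function, then compare the prediction error inside and outside the observed range. The quadratic data and the helper `fit_line` are made up for this sketch.

```python
def fit_line(xs, ys):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


def true_fn(x):
    """The (normally unknown) process that generated the data."""
    return x**2


xs = [0, 1, 2, 3, 4, 5]           # observed range is 0 to 5
ys = [true_fn(x) for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    return slope * x + intercept


interp_error = abs(predict(2.5) - true_fn(2.5))    # inside the observed range
extrap_error = abs(predict(10.0) - true_fn(10.0))  # far outside the observed range
print(interp_error, extrap_error)
```

Interpolation stays within the range the model was fit on, so errors stay bounded by how well the model matches nearby data. Extrapolation relies on the model's form holding where no data was seen, and the error can grow quickly.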