# Estimation and Inference, and Hypothesis Testing
## Estimation and Inference
- Estimation gives us an estimate of a particular population parameter, such as the mean, computed from our sample data.
- Statistical inference goes further: we try to understand the underlying distribution of the population we are sampling from, including how uncertain our estimates are (for example, the standard error of the estimated mean).
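
As a minimal sketch of the two ideas above, the snippet below estimates a mean and its standard error from a sample. The population (normal, mean 50, standard deviation 10) and the helper name `mean_and_standard_error` are assumptions made purely for illustration.

```python
import math
import random
import statistics

def mean_and_standard_error(sample):
    """Return the sample mean and the estimated standard error of that mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Sample standard deviation (n - 1 in the denominator).
    sd = statistics.stdev(sample)
    return mean, sd / math.sqrt(n)

rng = random.Random(0)
# Hypothetical population: normal with mean 50 and standard deviation 10.
sample = [rng.gauss(50, 10) for _ in range(400)]
mean, se = mean_and_standard_error(sample)
print(f"estimated mean = {mean:.2f}, standard error = {se:.2f}")
```

The mean is the estimate; the standard error is the inference part, since it says how much that estimate would vary from sample to sample.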

## Machine Learning and Statistical Inference
Machine learning and statistical inference are similar (a case of computer science borrowing from a long history in statistics).

In both cases, we're using data to learn/infer qualities of a distribution that generated the data (often termed the data-generating process).

We may care either about the whole distribution or just features (e.g. mean).

Machine learning applications that focus on understanding parameters and individual effects
involve more tools from statistical inference (some applications are focused only on results).

## Parametric versus Non-parametric
If statistical inference is about finding the underlying data-generating process of our data, then a statistical model is a set of possible distributions, or even regressions, that the data could follow.

A parametric model is a particular type of statistical model. What differentiates it is that it is constrained to a finite number of parameters and relies on strict assumptions about the distribution from which the data are drawn.

A non-parametric model makes fewer assumptions. In particular, the inference does not rely on the data being drawn from a particular distribution: it is distribution-free inference.
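
To make the contrast concrete, here is a small sketch that estimates P(X <= 1) in two ways. The parametric route assumes the data are normal and fits two parameters; the non-parametric route uses the empirical CDF and assumes no distribution. The simulated data and the helper `ecdf` are illustrative assumptions.

```python
import random
from statistics import NormalDist

rng = random.Random(1)
# Hypothetical data; in practice the generating process is unknown.
data = [rng.gauss(0, 1) for _ in range(2000)]

# Parametric: assume a normal distribution and estimate its two parameters.
fitted = NormalDist.from_samples(data)
p_parametric = fitted.cdf(1.0)

# Non-parametric: the empirical CDF makes no distributional assumption.
def ecdf(sample, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(v <= x for v in sample) / len(sample)

p_nonparametric = ecdf(data, 1.0)
print(f"parametric: {p_parametric:.3f}, non-parametric: {p_nonparametric:.3f}")
```

When the normal assumption holds, both estimates agree closely. If it does not hold, only the empirical CDF remains trustworthy.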

## Common distributions
- Uniform distribution
- Normal/Gaussian distribution
- Log-normal distribution (applying a log transformation gives a normal distribution)
- Exponential distribution
- Poisson distribution
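
The log-normal item can be checked with a short simulation. This is a sketch under assumed parameters (`mu = 0`, `sigma = 0.5`): the raw draws are right-skewed, while their logarithms behave like a normal sample with those parameters.

```python
import math
import random
import statistics

rng = random.Random(2)
# lognormvariate(mu, sigma): the log of each draw is normal(mu, sigma).
log_normal = [rng.lognormvariate(0.0, 0.5) for _ in range(5000)]
logged = [math.log(v) for v in log_normal]

# Right skew shows up as mean > median in the raw data.
print("raw mean:", statistics.fmean(log_normal))
print("raw median:", statistics.median(log_normal))
# After the log transformation, mean and spread match the normal parameters.
print("log mean:", statistics.fmean(logged))
print("log stdev:", statistics.stdev(logged))
```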

## Bayesian and frequentist statistics
- Frequentist statistics is concerned with repeated observations in the limit.
- A process may have a true frequency (a real population mean, for example), and we model probabilities as the long-run frequencies over many, many repeats of an experiment.
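
A minimal sketch of the frequentist view, assuming a simulated fair coin: the relative frequency of heads settles toward the true probability as the number of repeats grows. The function name `running_frequency` is made up for this example.

```python
import random

def running_frequency(n_flips, p_heads=0.5, seed=7):
    """Relative frequency of heads after n_flips simulated coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, running_frequency(n))
```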

## Frequentist vs. Bayesian: Bayesian
A Bayesian describes parameters by probability distributions.

Before seeing any data, a prior distribution (based on the experimenters' belief) is formulated.

This prior distribution is then updated after seeing data (a sample from the distribution).

After updating, the distribution is called the posterior distribution.
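
The prior-to-posterior update can be sketched with a Beta prior on a success probability, which is conjugate to binomial data. The uniform Beta(1, 1) prior and the observed counts (7 successes, 3 failures) are assumptions chosen for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (1, 1)  # uniform prior: no initial preference for any probability
posterior = update_beta(*prior, successes=7, failures=3)
print(posterior)              # (8, 4)
print(beta_mean(*posterior))  # posterior mean, 8 / 12
```

With more data, the posterior concentrates and the influence of the prior shrinks.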


## Hypothesis Testing
A hypothesis is a statement about a population parameter.

We create two hypotheses:
- The null hypothesis (H0)
- The alternative hypothesis (H1, or HA)

We decide which one to call the null depending on how the problem is set up.

A hypothesis testing procedure gives us a rule to decide:
- For which values of the test statistic do we accept H0 (more precisely, fail to reject it)
- For which values of the test statistic do we reject H0 and accept H1

### Type 1 vs Type 2 Error

- A type I error (false-positive) occurs if an investigator rejects a null hypothesis that is actually true in the population.
- A type II error (false-negative) occurs if the investigator fails to reject a null hypothesis that is actually false in the population.
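
Both error rates can be estimated by simulation. The sketch below assumes a two-sided z-test with known standard deviation 1, samples of size 25, and a true mean of 0.5 for the case where the null is false; all of these numbers are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test: True if the null hypothesis mean == mu0 is rejected."""
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, trials=2000, n=25, seed=3):
    """Fraction of simulated experiments in which the null (mean 0) is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_rejects(sample, mu0=0.0, sigma=1.0):
            rejections += 1
    return rejections / trials

type_1_rate = rejection_rate(true_mean=0.0)      # null true, yet rejected
type_2_rate = 1 - rejection_rate(true_mean=0.5)  # null false, yet not rejected
print(f"type I rate: {type_1_rate:.3f}, type II rate: {type_2_rate:.3f}")
```

The type I rate should land near the chosen significance level, while the type II rate depends on the effect size and the sample size.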

### Hypothesis Testing Terminology

- The likelihood ratio is called a test statistic: we use it to decide whether to accept or reject H0.
- The rejection region is the set of values of the test statistic that lead to rejection of H0.
- The acceptance region is the set of values of the test statistic that lead to acceptance of H0.
- The null distribution is the test statistic's distribution when the null is true.
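
As an illustration of these terms, the sketch below uses a z statistic instead of a likelihood ratio (an assumption made to keep the example short). Its null distribution is the standard normal, and the rejection region is the two tails beyond the critical values.

```python
from statistics import NormalDist

def rejection_bounds(alpha=0.05):
    """Critical values of a two-sided z-test under a standard normal null."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return -critical, critical

def in_rejection_region(z, alpha=0.05):
    """True if the test statistic z falls in the rejection region."""
    lower, upper = rejection_bounds(alpha)
    return z < lower or z > upper

lower, upper = rejection_bounds(0.05)
print(f"acceptance region: [{lower:.3f}, {upper:.3f}]")
print(in_rejection_region(2.5), in_rejection_region(1.0))
```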

### Hypothesis Testing: Marketing Intervention

Testing marketing intervention effectiveness:
- For a new direct mail marketing campaign to existing customers, the null hypothesis (H0) suggests the campaign does not impact purchasing.
- The alternative hypothesis (H1) suggests it has an impact.
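
One way to test such a campaign, sketched here as an assumption and not as the course's prescribed method, is a permutation test on the difference in mean purchases between customers who received the mailing and those who did not. The purchase figures are invented.

```python
import random
from statistics import fmean

def permutation_p_value(treated, control, n_permutations=5000, seed=4):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(treated) - fmean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null, group labels are exchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:k]) - fmean(pooled[k:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical monthly purchases per customer.
mailed = [12, 15, 11, 18, 14, 16, 13, 17]
not_mailed = [10, 12, 9, 14, 11, 13, 10, 12]
print(permutation_p_value(mailed, not_mailed))
```

A small p-value is evidence against the null hypothesis that the campaign has no impact on purchasing.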

### Hypothesis Testing: Product Quality/Size

Testing whether a product meets expected size threshold:
- Suppose a product is produced in various factories, with expected size S.
- To confirm that the product size meets the standard within a margin of error, the company might:
  - randomly sample from each production source,
  - establish H0 (product size is not significantly different from S),
  - and H1 (there is a significant deviation in product size),
  - test whether H0 can be rejected in favor of H1, based on the observed mean and standard deviation.
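
The procedure above can be sketched as a one-sample test per factory. This version computes a t-style statistic from the observed mean and standard deviation and, as a simplifying assumption, uses the normal approximation for the p-value (reasonable for larger samples; a t distribution would be more accurate for small ones). The factory names and measurements are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev

def size_test(sample, target_size, alpha=0.05):
    """Return (statistic, p_value, reject_h0) for H0: mean size == target_size."""
    standard_error = stdev(sample) / math.sqrt(len(sample))
    statistic = (fmean(sample) - target_size) / standard_error
    # Two-sided p-value under a normal approximation to the null distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))
    return statistic, p_value, p_value < alpha

# Hypothetical size measurements, with expected size S = 10.0.
factory_samples = {
    "factory_a": [9.9, 10.1, 10.0, 9.8, 10.2, 10.0, 9.9, 10.1],
    "factory_b": [10.4, 10.6, 10.5, 10.3, 10.7, 10.5, 10.4, 10.6],
}
for name, sample in factory_samples.items():
    statistic, p_value, reject = size_test(sample, target_size=10.0)
    print(name, round(statistic, 2), round(p_value, 4), reject)
```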

## Significance Level and P-Values

- A significance level (α) is a probability threshold: when the p-value falls below it, the null hypothesis is rejected.
- We must choose α before computing the test statistic!
- If we don't, we might be accused of p-hacking.
- Choosing α is somewhat arbitrary, but it is often .01 or .05.

Important terminology:
- The p-value: smallest significance level at which the null hypothesis would be rejected.
- The confidence interval: the set of parameter values that, taken as the null hypothesis, would be accepted (not rejected) at the chosen significance level.
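
A short sketch of both terms for a two-sided z-test with known standard deviation (an assumption for simplicity). It also shows how they connect: a hypothesized mean lies inside the 95% interval exactly when its p-value is above 0.05. The numbers are illustrative.

```python
import math
from statistics import NormalDist

def z_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def confidence_interval(sample_mean, sigma, n, alpha=0.05):
    """(1 - alpha) confidence interval for the population mean."""
    half_width = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical study: sample mean 10.3 from n = 100, known sigma = 1.
low, high = confidence_interval(10.3, sigma=1.0, n=100)
for mu0 in (10.0, 10.2):
    p = z_p_value(10.3, mu0, sigma=1.0, n=100)
    print(mu0, round(p, 4), low <= mu0 <= high)
```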

## F Statistic

Power: Bonferroni Correction
- The Bonferroni Correction says "choose the threshold so that the probability of making a Type I error (assuming no effect) is 5%".

Typically choose:
- threshold = 0.05 / (# tests)
- Bonferroni Correction allows the probability of a Type I error to be controlled,
but at the cost of power.
- Effects either need to be larger or the tests need larger samples, to be detected.
- Best practice is to limit the number of comparisons done to a few well-motivated cases.
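
A minimal sketch of the correction, with made-up p-values. The helper `familywise_error` shows why it is needed: with independent tests each run at 0.05, the chance of at least one Type I error grows quickly with the number of tests.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return the per-test threshold and which null hypotheses are rejected."""
    threshold = alpha / len(p_values)
    return threshold, [p < threshold for p in p_values]

def familywise_error(alpha, n_tests):
    """Chance of at least one Type I error across independent uncorrected tests."""
    return 1 - (1 - alpha) ** n_tests

p_values = [0.001, 0.02, 0.04, 0.3]  # hypothetical results of four tests
threshold, rejected = bonferroni_reject(p_values)
print(threshold)  # 0.0125
print(rejected)   # [True, False, False, False]
print(round(familywise_error(0.05, 10), 3))
```

Two of the uncorrected results (0.02 and 0.04) would have passed at 0.05 but fail the corrected threshold, which is the loss of power described above.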

## Correlation vs Causation

### How Correlations are Important
We should be careful about changing X with the hope of changing Y.
- X and Y can be correlated for different reasons:
  - X causes Y (what we want).
  - Y causes X (mixing up cause-and-effect).
  - X and Y are both caused by something else (confounding).
  - X and Y aren't really related, we just got unlucky in the sample (spurious).

### Mixing Up Cause and Effect
1. Student test scores are positively correlated with amount of time studied.

This doesn't mean we should get students to study more by curving everyone's grades
upward (this would likely have the opposite effect!). It is more likely that studying helps
students learn material, so studying causes better performance.

2. Customer satisfaction is negatively correlated with customer service call volume.

This doesn't mean that we should remove or hide the customer service numbers,
with the hope of improving customer satisfaction.


### Confounding Variables
Examples of confounding variables:

1. The number of annual car accidents and the number of people named John are positively
correlated (both are correlated with the population size).

2. The amount of ice-cream sold and the number of drownings in a week are positively
correlated (both are positively correlated with temperature).

3. Number of factories a chip manufacturer owns and the number of chips sold are positively
correlated (but both are driven by demand from the market).
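
The ice-cream example can be simulated. In this sketch, temperature drives both variables and they have no direct link; the coefficients and noise levels are invented. The raw correlation is strong, but it largely disappears once the linear effect of temperature is removed from both series.

```python
import math
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def residuals(ys, xs):
    """What is left of ys after removing a least-squares line in xs."""
    mx, my = fmean(xs), fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(5)
# Hypothetical weekly data: temperature is the confounder.
temperature = [rng.uniform(10, 35) for _ in range(2000)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw = pearson(ice_cream, drownings)
adjusted = pearson(residuals(ice_cream, temperature),
                   residuals(drownings, temperature))
print(f"raw correlation: {raw:.2f}, after adjusting for temperature: {adjusted:.2f}")
```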

### Spurious Correlations
These are correlations that are just "coincidences" due to the particular sample, and would probably not hold on larger or different samples.
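
A sketch of how such coincidences arise: compare one random series against many unrelated random series and keep the strongest correlation found. All series are independent noise, so any correlation is spurious. The sample sizes (10 versus 500) and the number of candidates (1000) are arbitrary choices.

```python
import math
import random
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def strongest_spurious_correlation(sample_size, n_candidates=1000, seed=6):
    """Largest |correlation| between one noise series and many unrelated ones."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(sample_size)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(sample_size)]
        best = max(best, abs(pearson(target, candidate)))
    return best

small = strongest_spurious_correlation(sample_size=10)
large = strongest_spurious_correlation(sample_size=500)
print(f"best match with 10 points: {small:.2f}, with 500 points: {large:.2f}")
```

Searching many variables on a small sample almost guarantees an impressive-looking match; the same search on a larger sample finds much weaker ones.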


## Summary/Review
### Estimation and Inference

Inferential statistics consists of learning characteristics of the population from a sample. The population characteristics are parameters, while the sample characteristics are statistics. A parametric model uses a fixed, finite number of parameters, such as the mean and standard deviation.

The most common way of estimating parameters in a parametric model is through maximum likelihood estimation.
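
As a small sketch of maximum likelihood estimation, consider an exponential model, where the rate that maximizes the likelihood is one over the sample mean. The true rate of 2.0 used to simulate the data is an assumption for illustration.

```python
import math
import random
from statistics import fmean

def exponential_log_likelihood(rate, sample):
    """Log-likelihood of an exponential model: n*log(rate) - rate*sum(x)."""
    return len(sample) * math.log(rate) - rate * sum(sample)

def exponential_mle(sample):
    """Closed-form maximum likelihood estimate of the exponential rate."""
    return 1.0 / fmean(sample)

rng = random.Random(8)
sample = [rng.expovariate(2.0) for _ in range(10_000)]  # hypothetical data
rate_hat = exponential_mle(sample)
print(f"estimated rate: {rate_hat:.3f}")
# Nearby rates give a lower log-likelihood than the estimate itself.
for rate in (0.9 * rate_hat, rate_hat, 1.1 * rate_hat):
    print(round(rate, 3), round(exponential_log_likelihood(rate, sample), 1))
```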

Through a hypothesis test, you test for a specific value of the parameter.

Estimation represents a process of determining a population parameter based on a model fitted to the data.

The most common distributions are: uniform, normal, log-normal, exponential, and Poisson.

A frequentist approach focuses on observing many repeats of an experiment. A Bayesian approach describes parameters through probability distributions.

### Hypothesis Testing
A hypothesis is a statement about a population parameter. You commonly have two hypotheses: the null hypothesis and the alternative hypothesis.

A hypothesis test gives you a rule to decide for which values of the test statistic you accept the null hypothesis and for which values you reject the null hypothesis and accept the alternative hypothesis.

A type 1 error occurs when an effect is due to chance, but we find it to be significant in the model.

A type 2 error occurs when we ascribe the effect to chance, but the effect is non-coincidental.

### Significance level and p-values
A significance level is a probability threshold below which the null hypothesis can be rejected. You must choose the significance level before computing the test statistic. It is usually .01 or .05.

A p-value is the smallest significance level at which the null hypothesis would be rejected. The confidence interval contains the parameter values that, taken as the null hypothesis, would not be rejected at the chosen significance level.

Correlations are useful because they can help predict an outcome, but correlation does not imply causation.

When making recommendations, one should take confounding variables into consideration, along with the fact that a correlation between two variables does not imply that an increase or decrease in one of them will drive an increase or decrease in the other.

Spurious correlations happen in data. They are just coincidences given a particular data sample.