<a href="https://colab.research.google.com/github/tomanizer/stats_in_10_minutes/blob/master/Random_Variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# What is a Random Variable?

A *random variable* is a variable whose value depends on the outcome of a random phenomenon.

Random phenomena can be:
1. An experiment yet to be performed. Because the experiment is in the future, its outcomes are uncertain.
2. A past experiment where some already-existing value is uncertain, e.g. quantum uncertainty.
3. The result of an *objectively random process*, e.g. rolling dice.
4. *Subjective randomness* that results from incomplete knowledge: a result could in theory be known with certainty, but the examiner does not have sufficient knowledge of the underlying process.

Important terms relating to random variables:
- The *domain* of a random variable is the set of possible outcomes of the random phenomenon, also known as the *sample space*.
- A random variable must be *measurable*, in the sense that a value can be assigned to every outcome.
- The *range* of the random variable is the set of possible values which the random variable can take.
- *Discrete random variables* have a countable list of values.
- *Continuous random variables* take numerical values in an interval and have an uncountable range.
- In statistical texts *random variables* are often denoted as $X$.
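
To make these terms more concrete, here is a minimal sketch that treats a random variable as a function from outcomes to numbers. The two-dice experiment and the names `outcomes` and `total` are illustrative assumptions, not part of the original notebook.

```python
from itertools import product

# Domain: every possible outcome of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def total(outcome):
    """Random variable X: maps an outcome (a, b) to the number a + b."""
    a, b = outcome
    return a + b

# The set of values that X can take.
values = sorted({total(o) for o in outcomes})
print(len(outcomes))  # 36
print(values)         # [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

The random variable here is discrete, since it can only take the eleven values from 2 to 12.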

## Creating a random variable

We can easily create a random variable from an *objectively random process* (3).

We can get Python to randomly draw a number from a known distribution and store the result as the value of this *random variable*.




In [20]:
from random import randint

# randint(a, b) returns an integer N with a <= N <= b (both endpoints included)
x = randint(0, 10)
x

1

We can now create a series of instances of this random variable.

In [28]:
n = 20
X = [randint(0, 10) for _ in range(n)]
X

[3, 2, 10, 6, 6, 6, 0, 1, 3, 3, 2, 7, 8, 5, 0, 1, 4, 3, 8, 9]

We can then plot the empirical distribution of these draws.

In [34]:
import plotly.graph_objects as go
fig = go.Figure()
fig.add_histogram(x=X, name=f"X for {n}", histnorm="probability density")

We can also increase the number of samples taken for X.

In [35]:

n = 100
X = [randint(0, 10) for _ in range(n)]
fig.add_histogram(x=X, name=f"X for {n}", histnorm="probability density")


In [36]:
n = 1000
X = [randint(0, 10) for _ in range(n)]
fig.add_histogram(x=X, name=f"X for {n}", histnorm="probability density")

The true distribution of `randint` is a discrete uniform distribution, where each integer in the range has exactly the same probability of being picked. Because `randint(0, 10)` includes both endpoints, there are 11 possible values.

So in this case
$P(X = k) = \frac{1}{n}$ for $k \in \{0, 1, \dots, 10\}$, where $n = 11$.

In [39]:
x = list(range(11))
y = [1/11 for i in x]
fig.add_scatter(x=x, y=y, name="True uniform distribution")
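
The same comparison can be made numerically instead of visually. The sketch below is an illustration rather than part of the original notebook: it counts how often each value is drawn and reports the largest gap between the observed relative frequencies and the theoretical probability. Note that `randint(0, 10)` can return 11 distinct values, so the theoretical probability is 1/11. The helper name `relative_frequencies` and the fixed seed are assumptions made for reproducibility.

```python
from collections import Counter
from random import randint, seed

seed(0)  # fixed seed, chosen only so that the sketch is reproducible

def relative_frequencies(n):
    """Draw n samples from randint(0, 10) and return the share of each value."""
    counts = Counter(randint(0, 10) for _ in range(n))
    return {k: counts[k] / n for k in range(11)}

for n in (20, 1000, 100_000):
    freqs = relative_frequencies(n)
    # Largest deviation from the theoretical probability 1/11.
    largest_gap = max(abs(f - 1 / 11) for f in freqs.values())
    print(n, round(largest_gap, 4))
```

The gap should generally shrink as `n` grows, which is what the overlaid histograms suggest as well.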

# Probability Distribution

A *probability distribution* is a mathematical function which
- takes as input the possible outcomes in a random variable's domain/sample space,
- and outputs the probability of each outcome.

Example:
For a coin toss the domain is {heads, tails}, and the respective probabilities are {0.5, 0.5}.

`P(heads) = 0.5, P(tails) = 0.5`
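
As a rough sketch, the coin-toss distribution can be written as a Python dictionary that maps each outcome to its probability. The variable names and the simulated sample size are illustrative assumptions.

```python
from collections import Counter
from random import choices, seed

# Probability distribution of a fair coin toss: outcome -> probability.
pmf = {"heads": 0.5, "tails": 0.5}

# A valid distribution has non-negative probabilities that sum to 1.
assert all(p >= 0 for p in pmf.values())
assert abs(sum(pmf.values()) - 1.0) < 1e-12

seed(1)  # fixed seed, chosen only for reproducibility
# Simulate tosses by sampling outcomes with the probabilities as weights.
tosses = choices(list(pmf), weights=list(pmf.values()), k=10_000)
counts = Counter(tosses)
print({side: counts[side] / len(tosses) for side in pmf})
```

The printed relative frequencies should be close to 0.5 for each side.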

The probability distribution
- for discrete random variables is called a *probability mass function*.
- for continuous random variables is called a *probability density function*.
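
The difference matters in practice: a density is not itself a probability, and probabilities are obtained by integrating it over an interval. The following sketch illustrates this for a continuous uniform distribution on [0, 10]. The function names and the simple midpoint-rule integration are assumptions chosen for illustration.

```python
def uniform_pdf(x, a=0.0, b=10.0):
    """Density of a continuous uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def interval_probability(pdf, lo, hi, steps=10_000):
    """Approximate P(lo <= X <= hi) by integrating pdf with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width

print(uniform_pdf(3.0))                             # density at a single point: 0.1
print(interval_probability(uniform_pdf, 2.0, 5.0))  # approximately 0.3
```

For a continuous random variable the probability of any single exact value is zero, so only intervals carry probability.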

A *univariate distribution* is a distribution where the sample space is one-dimensional. It assigns probabilities to the possible values of a single random variable.

Examples:
- uniform distribution
- binomial distribution
- hypergeometric distribution
- normal distribution
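
As one worked example from this list, the probability mass function of the binomial distribution can be sketched with the standard library alone. The helper name `binomial_pmf` and the parameters `n = 10`, `p = 0.5` are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf[5])    # 252 / 1024 = 0.24609375
print(sum(pmf))  # 1.0, up to floating-point rounding
```

Like every probability mass function, the values over all possible `k` sum to 1.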

A *random vector* is a list of two or more random variables.

A *multivariate distribution* or *joint probability distribution* gives the probabilities of a *random vector*  taking on a combination of values.
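
A small joint distribution can be enumerated exactly. In the sketch below, the random vector $(X, Y)$ is an assumed example: $X$ is the first of two fair dice and $Y$ is the larger of the two. Summing the joint probabilities over $X$ gives the distribution of $Y$ alone.

```python
from fractions import Fraction
from itertools import product

# Joint distribution of (X, Y): X is the first die, Y is the larger of two dice.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (a, max(a, b))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

# Distribution of Y alone: sum the joint probabilities over all values of X.
marginal_y = {}
for (_, y), prob in joint.items():
    marginal_y[y] = marginal_y.get(y, Fraction(0)) + prob

print(joint[(3, 3)])        # 1/12: first die is 3 and second die is at most 3
print(marginal_y[6])        # 11/36
print(sum(joint.values()))  # 1
```

Using `Fraction` keeps the probabilities exact, which makes it easy to check that they sum to 1.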
