# Overview
There are many ways to classify functions. In this notebook we define the function types encountered in other notebooks.

# One-to-one (injective) functions
A function is said to be "one-to-one" (aka. injective) if distinct elements of the domain always map to distinct elements of the co-domain. Equivalently, no element of the co-domain is mapped to by more than one element of the domain.

If we think of the function $f$ that maps elements of the domain $x$ to the co-domain $y$ we have $f:x \rightarrow y$. Every $x_i$ in $x$ maps to exactly one $y_i$ (this is true of any function). If, in addition, no two distinct $x_i$ share the same $y_i$, then the function is one-to-one.

If multiple $x_i$ map to the same $y_i$ then the function is not injective.
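On a finite domain the definition can be checked directly. The sketch below is only an illustration: the helper name `is_injective` and the sample domain are assumptions made here, not part of the other notebooks.

```python
def is_injective(f, domain):
    """Return True if f sends distinct elements of a finite domain to distinct values."""
    seen = set()
    for x in set(domain):      # ignore repeated domain entries
        y = f(x)
        if y in seen:          # two different x_i produced the same y_i
            return False
        seen.add(y)
    return True


domain = [-2, -1, 0, 1, 2]
print(is_injective(lambda x: 2 * x + 1, domain))  # True
print(is_injective(lambda x: x ** 2, domain))     # False: -1 and 1 both map to 1
```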

For more information see the [Wikipedia page](https://en.wikipedia.org/wiki/Injective_function).

# Onto (surjective) functions

A function is onto (aka. surjective) if every element of the co-domain is mapped to by an element in the domain. 

If we think of the function $f$ that maps elements of the domain $x$ to the co-domain $y$ we have $f:x \rightarrow y$. The set of values that $f$ actually produces is its range (image); $f$ is onto exactly when the range equals the whole co-domain.

If there is an element $y_*$ in the co-domain that is not mapped to by $f$ (ie. there is no $x$ value that will produce the value $y_*$), the function is not onto.
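For finite sets, surjectivity amounts to comparing the image of $f$ with the co-domain. A minimal sketch, where `is_surjective` and the example sets are hypothetical names chosen for illustration:

```python
def is_surjective(f, domain, codomain):
    """Return True if every element of a finite co-domain is hit by some x in the domain."""
    image = {f(x) for x in domain}   # the range of f on this domain
    return set(codomain) <= image    # onto iff the co-domain is contained in the image


domain = [-2, -1, 0, 1, 2]
square = lambda x: x ** 2
print(is_surjective(square, domain, {0, 1, 4}))        # True
print(is_surjective(square, domain, {0, 1, 2, 3, 4}))  # False: nothing maps to 2 or 3
```

The same function can be onto or not depending on which co-domain is declared, which is why the co-domain is part of the definition.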

# One to one and onto (bijective) functions
A function is one to one and onto (aka. bijective) if it is both injective and surjective.

## Applications

### Change of variable

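A bijective function allows for a change of variable. If $f:x \rightarrow y$ is bijective then it has an inverse $f^{-1}: y \rightarrow x$ satisfying $f^{-1}(f(x_i)) = x_i$ for every $x_i$ and $f(f^{-1}(y_i)) = y_i$ for every $y_i$. As such, knowing $f$ or $f^{-1}$ allows us to change a variable from $x$ to $y$ or vice versa.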
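The round trip between $x$ and $y$ can be sketched on a finite domain by tabulating the inverse. The helper `invert` below is a hypothetical name, and the co-domain is taken to be the image of the domain (an assumption that makes the function onto by construction).

```python
def invert(f, domain):
    """Tabulate the inverse of f on a finite domain; fail if f is not one-to-one."""
    inverse = {}
    for x in domain:
        y = f(x)
        if y in inverse and inverse[y] != x:
            raise ValueError(f"not injective: {inverse[y]} and {x} both map to {y}")
        inverse[y] = x
    return inverse


f = lambda x: 2 * x + 1
domain = range(5)              # x values 0, 1, 2, 3, 4
f_inv = invert(f, domain)      # maps each y back to its x

# Changing variable x -> y -> x returns the starting value.
print(all(f_inv[f(x)] == x for x in domain))  # True
print(f_inv[7])                               # 3, since f(3) = 7
```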

# Diffeomorphism

A function $f$ is a diffeomorphism (ie. performs a smooth change of variables) if it is bijective and both it and its inverse $f^{-1}$ are differentiable.

A linear (more precisely, affine) transformation with non-zero slope exhibits the characteristics of a diffeomorphism. For example: $y = 2x+1$.
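As a short worked check of the example $y = 2x + 1$ on the real line, the inverse and both derivatives can be written out explicitly:

$$ f(x) = 2x + 1, \qquad f^{-1}(y) = \frac{y - 1}{2} $$

$$ \frac{df}{dx} = 2, \qquad \frac{df^{-1}}{dy} = \frac{1}{2} $$

Both derivatives exist everywhere, so the conditions are met. By contrast (an example added here for illustration), $y = x^3$ is bijective and differentiable, but its inverse $x = y^{1/3}$ is not differentiable at $y = 0$, so it is not a diffeomorphism on the whole real line.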

## Applications

### Change of variable in probability spaces

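Bijective functions are very useful in probability spaces. For example, the cumulative distribution function of a continuous random variable whose density is strictly positive on an interval is strictly increasing on that interval, and is therefore a bijection from the interval onto its image. (The probability density function, by contrast, is generally not bijective.) Not only can we perform a change of variable with these types of functions, but we can also derive the probability density and cumulative distribution functions of a random variable that is defined as a diffeomorphism of another random variable.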

For example, in joint probability we often deal with unknown variables which are defined as linear combinations of other known variables which follow known distributions. We can exploit the properties of the bijective function and probabilistic relationships to derive missing information about the unknown variables.

Assume $X$ follows a known distribution and $Y$ is defined as a bijective function $f_b$ of $X$. We can derive the distribution function $f_Y$ for the variable $Y$ based on our knowledge of $f_X$ and the bijective relationship between $X$ and $Y$.

$$ Y = f_b(X)$$

#### Proof:
Let the support of $X$ be the set of values $x$ at which the density $f_X$ is greater than zero:

$$ \operatorname{supp}(X) = X_S = \{x \in \mathbb{R} \mid f_X(x) > 0\} $$

Given the definition $Y = f_b(X)$, we then have a corresponding set in the co-domain:

$$ Y_S = f_b(X_S) $$

##### Calculate the CDF
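Assuming $f_b$ is strictly increasing (so that applying $f_b^{-1}$ to both sides preserves the direction of the inequality), we calculate the CDF:

$$ F_Y(y) = \mathbb{P}(Y \le y)$$

$$= \mathbb{P}(f_b(X) \le y)$$

$$= \mathbb{P} \left( f_b^{-1}(f_b(X)) \le f_b^{-1}(y) \right)$$

$$= \mathbb{P}(X \le x), \quad x = f_b^{-1}(y) $$

We have now performed a change of variable. The original expression was in terms of $Y$ but we now have an expression in terms of $X$:

$$= \int_{-\infty}^{x} f_X(t) \ dt = F_X\left(f_b^{-1}(y)\right) $$

If $f_b$ is strictly decreasing instead, the inequality flips and, for a continuous $X$, $F_Y(y) = 1 - F_X\left(f_b^{-1}(y)\right)$.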
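As a numerical sanity check of the derivation above, the sketch below compares $F_X\left(f_b^{-1}(y)\right)$ with an empirical estimate of $\mathbb{P}(Y \le y)$. The choices made here are assumptions for illustration only: $X$ is taken to be standard normal, $f_b(x) = 2x + 1$ (strictly increasing), and the function names are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """CDF F_X of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def f_b(x):
    """Assumed bijective map from X to Y."""
    return 2.0 * x + 1.0


def f_b_inv(y):
    """Inverse of f_b, mapping Y back to X."""
    return (y - 1.0) / 2.0


def cdf_Y(y):
    """CDF of Y obtained by the change of variable: F_Y(y) = F_X(f_b^{-1}(y))."""
    return normal_cdf(f_b_inv(y))


def empirical_cdf_Y(y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of P(Y <= y) from samples of X pushed through f_b."""
    rng = random.Random(seed)
    hits = sum(f_b(rng.gauss(0.0, 1.0)) <= y for _ in range(n_samples))
    return hits / n_samples


for y in (-1.0, 1.0, 3.0):
    print(y, cdf_Y(y), empirical_cdf_Y(y))
```

The two columns should agree to roughly two or three decimal places, with the gap shrinking as `n_samples` grows.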

For a more rigorous treatment of this topic see [Penn State's STAT-414 Lesson 22.2](https://online.stat.psu.edu/stat414/lesson/22/22.2) or the [Wikipedia article](https://en.wikipedia.org/wiki/Integration_by_substitution#Substitution_for_multiple_variables).