# BIOEE 4940 : **Introduction to Quantitative Analysis in Ecology**
### ***Spring 2021***
### Instructor: **Xiangtao Xu** ( ✉️ xx286@cornell.edu)
### Teaching Assistant: **Yanqiu (Autumn) Zhou** (✉️ yz399@cornell.edu)

---

## <span style="color:royalblue">Lecture 2</span> *How to Describe Your Data: a revisit to probability and distributions*
*Partly adapted from [How to be a quantitative ecologist](https://www.researchgate.net/publication/310239832_How_to_be_a_Quantitative_Ecologist_The_'A_to_R'_of_Green_Mathematics_and_Statistics) and [All of Statistics](https://www.stat.cmu.edu/~larry/all-of-statistics/)*




### 1. Data, Process, and Probability

Statistics plays a pivotal role in the quantitative analysis of real-world data because the observations we collect are usually the composite outcome of a series of deterministic and stochastic processes. Probability theory helps us understand how these processes interact to generate the samples/data we see. Statistical inference works in the inverse direction: it helps us separate **signal** from **noise** and learn about the deterministic processes and the invariant properties of the stochastic processes.

<img src="./img/Probability_and_inference.png" alt="Probability and Inference" style="width: 800px;"/>

*Source: All of Statistics*


* **Discussion**: Suppose we have samples of the 15-min average flying speed of birds from GPS trackers. What are the possible data-generating processes? How would signal and noise change with different research targets? (Image from: news.cornell.edu)

<img src="https://news.cornell.edu/sites/default/files/styles/story_thumbnail_xlarge/public/2018-01/0123_tags_0.jpg?h=ebb7e033&itok=FGuLXHhl" alt="Solar-powered tracker" style="width: 200px;"/>


#### 1.1. Definition of Probability
Probability is a mathematical language for quantifying uncertainty for data generating processes.

We need a few more concepts before a rigorous definition of probability:

* *Experiment*: A repeatable data collection process that has well-defined possible outcomes
* *Events*: Collections of experimental outcomes
* *Event/sample space*: All possible outcomes from an experiment

<img src="./img/Venn_diagram.png" alt="Venn Diagram for different relationships between events" style="width: 800px;"/>

*Source: How to be a quantitative ecologist*


* *Probability* is a non-negative real number (between 0 and 1) associated with each event that reflects the *frequency* (Frequentist) or *degree of belief* (Bayesian) of the event.
* An idealistic but intuitive frequentist definition is $p = \lim_{N \to \infty} \frac{n}{N}$, where $N$ is the total number of trials and $n$ is the number of trials in which the target event occurs.

* **Example**: Sampling tree density from a tropical moist forest at Barro Colorado Island, Panama

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# first read data

df_bci = pd.read_excel('./data/bci_census.xlsx')
# a 200 m by 500 m quadrat of the whole census (10 Ha)

In [None]:
# visualize the plot

# only show trees > 50cm

df_plot = df_bci[df_bci['dbh'] >= 500.]

from matplotlib.patches import Circle
from matplotlib.collections import PatchCollection

fig, ax= plt.subplots(1,1,figsize=(3,6))

# prepare circles to represent each tree
# we use a 5 m radius circle to represent a 500mm dbh tree
# we use a 25 m radius circle to represent a 2000mm dbh tree
patches = [Circle((tree['gx'],tree['gy']),radius=(tree['dbh'] - 500.) / 1500. * 20. + 5.,
                   color='xkcd:forest green'
                  )
           for index, tree in df_plot.iterrows()]
p = PatchCollection(patches, facecolor='xkcd:forest green',alpha=0.8)
ax.add_collection(p)

ax.set_xlim((0.,200.))
ax.set_ylim((0.,500.))
ax.set_aspect('equal')

plt.show()

In [None]:
# sample the probability of finding at least 1 50cm+ tree in a 5-by-5 meter quadrat

# record 1000 sampling
sample_N = 1000

site_x = 200.
site_y = 500.
quadrat_size = 5.

# sample the upper left corner of the quadrat
# use NumPy's Generator API (the legacy np.random.* functions still work)
# check https://numpy.org/doc/stable/reference/random/index.html#module-numpy.random
rng = np.random.default_rng() # unseeded generator; pass an integer seed for reproducible results
sample_x = rng.random(sample_N) * (site_x - quadrat_size)
sample_y = rng.random(sample_N) * (site_y - quadrat_size)

tree_number = [df_plot[
    (df_plot['gx'] >= x) & (df_plot['gx'] < x + quadrat_size)
   &(df_plot['gy'] >= y) & (df_plot['gy'] < y + quadrat_size)].count()['treeID'] >= 1
               for x, y in zip(sample_x,sample_y)
    ]

prob = np.cumsum(tree_number) / np.arange(1,sample_N+1)

fig, ax = plt.subplots(1,1)
ax.plot(np.arange(1,sample_N+1),prob)
ax.set_xlabel('# of trials')
ax.set_ylabel('Frequency')
plt.show()

#### 1.2 Quantitative Properties of Probabilities

* Union and Intersection

<img src="./img/probability_addition.png" alt="Probability Addition" style="width: 800px;"/>

*Source: How to be a quantitative ecologist*

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$, where the joint probability $P(A \cap B)$ is commonly abbreviated as $P(AB)$.
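
The addition rule can be checked by brute-force enumeration on a small sample space. The sketch below uses two fair dice, which is a hypothetical experiment chosen only for illustration (it is not part of the BCI data); the events `A` and `B` and the helper `prob` are likewise illustrative names.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair six-sided dice
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a subset of the sample space) with equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

# Event A: the first die shows an even number
A = {o for o in sample_space if o[0] % 2 == 0}
# Event B: the two dice sum to 8 or more
B = {o for o in sample_space if sum(o) >= 8}

p_union = prob(A | B)                              # P(A or B), counted directly
p_incl_excl = prob(A) + prob(B) - prob(A & B)      # addition rule

print(p_union, p_incl_excl)  # 2/3 2/3
```

Subtracting $P(AB)$ corrects for the outcomes that belong to both events and would otherwise be counted twice.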

* Conditional Probability and Independent Events

Often, the probability that an event $E_2$ occurs is affected by the prior occurrence of another event $E_1$.

<img src="./img/conditional_probability.png" alt="Conditional Probability" style="width: 800px;"/>
*Source: How to be a quantitative ecologist*


In this case, we define the conditional probability of E2 given E1 as
$P(E_2 | E_1) = \frac{P(E_{1}E_{2})}{P(E_1)}$

We can also derive **Bayes' law** from the definition:
$P(E_2 | E_1) = \frac{P(E_{1} | E_{2})P(E_2)}{P(E_1)}$
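
The derivation takes one intermediate step, assuming $P(E_1) > 0$ and $P(E_2) > 0$. Applying the definition of conditional probability in both directions gives two expressions for the same joint probability:

$$P(E_{1}E_{2}) = P(E_2 | E_1)P(E_1) = P(E_1 | E_2)P(E_2)$$

Dividing the last two expressions by $P(E_1)$ gives Bayes' law.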


$E_1$ and $E_2$ are considered independent if $P(E_{1}E_{2}) = P(E_1)P(E_2)$. In this case, $P(E_2 | E_1) = P(E_2)$: knowing that $E_1$ occurred tells us nothing about $E_2$.
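
As a minimal sketch, conditional probability, Bayes' law, and independence can all be verified by enumeration. The two-dice experiment and the events `E1`, `E2`, `E3` are hypothetical choices made for illustration only.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair six-sided dice
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event with equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(event & given) / prob(given)

E1 = {o for o in sample_space if o[0] % 2 == 0}   # first die is even
E2 = {o for o in sample_space if sum(o) >= 8}     # sum is at least 8
E3 = {o for o in sample_space if o[1] % 2 == 0}   # second die is even

# Bayes' law: P(E2|E1) = P(E1|E2) P(E2) / P(E1)
bayes_rhs = cond_prob(E1, E2) * prob(E2) / prob(E1)
print(cond_prob(E2, E1), bayes_rhs)  # 1/2 1/2

# Independence: the two dice do not influence each other ...
print(prob(E1 & E3) == prob(E1) * prob(E3))  # True
# ... but the sum depends on the first die
print(prob(E1 & E2) == prob(E1) * prob(E2))  # False
```

Here $P(E_2 | E_1) = 1/2$ is larger than $P(E_2) = 5/12$, so an even first die makes a large sum more likely, which is exactly what dependence means.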

* Discussion: Field census in tropical forests requires tracking a large number of species. Someone proposes to replace the 4-character species code (e.g. POTR) with a new 6-character code (e.g. POSTRE) to avoid code conflicts for species with similar initials. This proposal receives some criticism because it may increase typing errors. Could you explain this criticism from a probability point of view?

* Example: Probability of species and size


In [None]:
# Compare event A: finding a tree of species cecrin and event B: finding a 10+cm tree (dbh >= 100 mm) in a 5-by-5 meter quadrat
sample_N = 1000

site_x = 200.
site_y = 500.
quadrat_size = 5.

# sample the upper left corner of the quadrat
rng = np.random.default_rng() # unseeded generator; pass an integer seed for reproducible results
sample_x = rng.random(sample_N) * (site_x - quadrat_size)
sample_y = rng.random(sample_N) * (site_y - quadrat_size)


event_A = [df_bci[
    (df_bci['gx'] >= x) & (df_bci['gx'] < x + quadrat_size)
   &(df_bci['gy'] >= y) & (df_bci['gy'] < y + quadrat_size)
   &(df_bci['sp'] == 'cecrin')].count()['treeID'] >= 1
               for x, y in zip(sample_x,sample_y)
    ]

event_B = [df_bci[
    (df_bci['gx'] >= x) & (df_bci['gx'] < x + quadrat_size)
   &(df_bci['gy'] >= y) & (df_bci['gy'] < y + quadrat_size)
   &(df_bci['dbh'] >= 100.)
   ].count()['treeID'] >= 1.
               for x, y in zip(sample_x,sample_y)
    ]

event_AB = [df_bci[
    (df_bci['gx'] >= x) & (df_bci['gx'] < x + quadrat_size)
   &(df_bci['gy'] >= y) & (df_bci['gy'] < y + quadrat_size)
   &(df_bci['sp'] == 'cecrin') & (df_bci['dbh'] >= 100.)].count()['treeID'] >= 1
               for x, y in zip(sample_x,sample_y)
    ]

# convert to numpy arrays for easier indexing
event_A = np.array(event_A)
event_B = np.array(event_B)
event_AB = np.array(event_AB)

In [None]:
P_A = sum(event_A) / sample_N
P_B = sum(event_B) / sample_N
P_AB = sum(event_AB) / sample_N
P_A_bar_B = sum(event_AB[event_B > 0]) / sum(event_B > 0)
P_B_bar_A = sum(event_AB[event_A > 0]) / sum(event_A > 0)
print(f'P(A) = {P_A}, P(B) = {P_B}, P(AB) = {P_AB}, P(A)*P(B)={P_A * P_B}')
print(f'P(A|B) = {P_A_bar_B}, P(AB)/P(B) = {P_AB / P_B}')
print(f'P(B|A) = {P_B_bar_A}, P(AB)/P(A) = {P_AB / P_A}')

---
### 2. Random variable and Probability distribution

In reality, we are often concerned with the observable values of the event space of an experiment, which are usually called **random variables** or statistical variables.


<img src="./img/statistical_variables.png" alt="Classification of Statistical Variables" style="width: 800px;"/>

*Source: How to be a quantitative ecologist*


In addition, we usually seek to obtain a probability associated with every possible value across the event space, i.e. **probability distribution**.
    
* For discrete variables, we can define a *probability mass function* (PMF) as $f_X(x) = P(X=x)$ and a *cumulative distribution function* (CDF) as $F_X(x) = P(X \leq x)$
* For continuous variables, it is more natural to first define the CDF as $F_X(x) = P(X \leq x)$ and then the *probability density function* (PDF) as its derivative, $f_X(x) = \frac{dF_X(x)}{dx}$. PDFs have the following key mathematical properties:
    * $CDF(x) = \int_{-\infty}^{x} PDF(t) \,dt$
    * $P(a < X \leq b) = \int_{a}^{b} PDF(x) \,dx$
    * $\int_{-\infty}^{\infty} PDF(x) \,dx = 1$
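
The three PDF properties can be checked numerically. The sketch below uses an exponential density with a hypothetical rate `lam = 2.0` and a hand-written trapezoidal rule (`integrate` is an illustrative helper, not a library function); any other density with a known CDF would work the same way.

```python
import numpy as np

lam = 2.0  # hypothetical rate parameter, chosen only for illustration

def pdf(x):
    """Exponential PDF, valid for x >= 0 (the density is 0 for x < 0)."""
    return lam * np.exp(-lam * x)

def cdf(x):
    """Closed-form exponential CDF for x >= 0."""
    return 1.0 - np.exp(-lam * x)

def integrate(f, lower, upper, n=200_001):
    """Trapezoidal rule on an evenly spaced grid of n points."""
    x = np.linspace(lower, upper, n)
    y = f(x)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

a, b = 0.5, 1.5
# the density is 0 below x = 0 and negligible beyond x = 50,
# so [0, 50] stands in for the whole real line
total_area = integrate(pdf, 0.0, 50.0)
cdf_from_pdf = integrate(pdf, 0.0, b)
prob_a_to_b = integrate(pdf, a, b)

print(f'total area    = {total_area:.6f}')
print(f'CDF(b)        = {cdf_from_pdf:.6f} (closed form: {cdf(b):.6f})')
print(f'P(a < X <= b) = {prob_a_to_b:.6f} (CDF(b) - CDF(a): {cdf(b) - cdf(a):.6f})')
```

Note that the value of a PDF can exceed 1 (here `pdf(0)` is 2); only its integral is a probability.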
    
* Example from tree size distribution

In [None]:
fig, ax = plt.subplots(3,1,figsize=(3,6))

# plot frequency
df_bci.plot(y='dbh',kind='hist',bins=np.linspace(0,1000,100),density=False,ax=ax[0])

# get an approximation of PDF by dividing probability with bin size.
df_bci.plot(y='dbh',kind='hist',bins=np.linspace(0,1000,100),density=True,ax=ax[1])

# CDF
df_bci.plot(y='dbh',kind='hist',bins=np.linspace(0,1000,100),density=True,cumulative=True,ax=ax[2])

ax[2].set_xlabel('DBH (mm)')


#### 2.1 Descriptive statistics

For a given distribution, there are several basic metrics to describe its pattern and properties:
* mean (arithmetic average, useful for spatio-temporal upscaling of quasi-linear processes)
* mode (the peak of the distribution, i.e. its most frequent value, useful to identify the centrality of the distribution)
* range (max - min, gives some information on variability at the extremes)
* percentiles (commonly shown in a box-whisker plot, give more information on the variability and evenness of the sample distribution)

In [None]:
df_bci.describe()

In [None]:
print(df_bci['sp'].mode())
print(df_bci[df_bci['dbh'] > 200.]['sp'].mode())

In [None]:
print(df_bci['sp'].value_counts())

Additional characteristics of distributions can be described more formally using the concept of **expectation**. Generally, the expected value of a function $g$ of a random variable $X$ is defined as

$E(g(X)) = \int_{-\infty}^{\infty}g(x)PDF(x)\,dx$
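
A minimal sketch of this definition, assuming an exponential density with a hypothetical rate `lam = 2.0`: the integral is approximated with a trapezoidal rule and then compared with a plain sample average of $g(X)$. The helper name `expectation` and the seed are illustrative choices.

```python
import numpy as np

lam = 2.0  # hypothetical rate parameter, chosen only for illustration

def pdf(x):
    """Exponential PDF for x >= 0."""
    return lam * np.exp(-lam * x)

def expectation(g, upper=50.0, n=200_001):
    """Approximate E(g(X)) as the integral of g(x) * PDF(x) over [0, upper]."""
    x = np.linspace(0.0, upper, n)
    y = g(x) * pdf(x)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

mean = expectation(lambda x: x)               # E(X)
second_moment = expectation(lambda x: x**2)   # E(X^2)
variance = second_moment - mean**2

# The same expectation estimated as a sample average of g(X)
rng = np.random.default_rng(42)
samples = rng.exponential(scale=1.0 / lam, size=200_000)
mc_second_moment = float(np.mean(samples**2))

print(f'E(X) = {mean:.4f}, E(X^2) = {second_moment:.4f}, Var(X) = {variance:.4f}')
print(f'Monte Carlo estimate of E(X^2) = {mc_second_moment:.4f}')
```

The sample average converges to the integral as the sample size grows, which is why averaging over observations is a practical way to estimate expectations from data.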

In [None]:
# calculate the distribution for the # of neighbouring trees within 20 by 20 meter window of a focal tree.
# only count trees with 20cm+ dbh to reduce computation time

df_20cm = df_bci[df_bci['dbh'] >= 200.].copy() # copy so new columns can be added without a SettingWithCopyWarning

window_size = 20.

neighbour_num = [df_20cm[
    (df_20cm['gx'] >= tree['gx'] - window_size / 2.)
   &(df_20cm['gx'] <  tree['gx'] + window_size / 2.)
   &(df_20cm['gy'] >= tree['gy'] - window_size / 2.)
   &(df_20cm['gy'] <  tree['gy'] + window_size / 2.)
    ].count()['treeID'] for i, tree in df_20cm.iterrows()]
df_20cm['N_num'] = neighbour_num

In [None]:
N_num_diff = df_20cm['N_num'].values - np.nanmean(df_20cm['N_num'].values)
df_20cm['N_num_p1'] = N_num_diff
df_20cm['N_num_p2'] = N_num_diff ** 2
df_20cm['N_num_p3'] = N_num_diff ** 3
df_20cm['N_num_p4'] = N_num_diff ** 4

plot_names = [f'N_num_p{p}' for p in range(1,4+1)]
fig, axes = plt.subplots(2,2,figsize=(6,6))
for i, ax in enumerate(axes.ravel()):
    df_20cm.plot(y=plot_names[i],kind='hist',ax=ax,density=True)

fig.tight_layout()

The expectations produced by raising the values of the variable to integer powers are known as the **moments of the distribution**:
* First order moment -> mean, $\mu = E(X)$
* Second order moment -> variance, $Var(X) = E((X-\mu)^2) = E(X^2) - E(X)^2$
* Third order moment -> skewness (whether the distribution is symmetrical)
* Fourth order moment -> kurtosis (how peaked and heavy-tailed a distribution is)

Useful properties of expectations:
* $E(A+B) = E(A) + E(B)$ where A, B are two random variables
* $E(kA) = kE(A)$ where k is a constant
* For independent random variables, $E(AB) = E(A)E(B)$
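
These properties can be illustrated with simulated samples. In the sketch below, the distributions of `A` and `B`, the constant `k`, and the seed are hypothetical choices; sample means stand in for expectations, so the product rule holds only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

A = rng.uniform(0.0, 2.0, size=n)        # E(A) = 1
B = rng.exponential(scale=3.0, size=n)   # E(B) = 3, drawn independently of A
k = 4.0

# Linearity holds exactly for sample means (up to floating-point error)
print(np.mean(A + B), np.mean(A) + np.mean(B))
print(np.mean(k * A), k * np.mean(A))

# The product rule requires independence
indep_gap = np.mean(A * B) - np.mean(A) * np.mean(B)   # close to 0
dep_gap = np.mean(A * A) - np.mean(A) * np.mean(A)     # close to Var(A) = 1/3
print(f'independent: {indep_gap:.4f}, dependent: {dep_gap:.4f}')
```

The last line shows why the independence condition matters: `A` is perfectly dependent on itself, and the gap $E(A^2) - E(A)^2$ is exactly the variance of `A`.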

#### 2.2 Common theoretical distributions
There are a few extensively studied probability distributions that come in handy to model/approximate different types of randomness. In this lecture, we will learn some basic distributions, which we will revisit in future topics like regression analysis. We will save some more complex distributions, such as the t-distribution, F-distribution, and chi-square distribution, for later when we talk about statistical inference.

* Uniform distribution

$f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$ (and 0 elsewhere)

e.g. the distribution of the X location of each tree, as shown below.

Useful for generating random samples numerically.
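
A minimal sketch of how uniform random numbers drive numerical sampling. The first part rescales Uniform[0, 1) draws to an interval [a, b), the same trick used earlier to place quadrat corners. The second part uses inverse-transform sampling, a standard technique that is not covered in this lecture, to turn the same uniform draws into exponential samples. The bounds, the rate `lam`, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
u = rng.random(100_000)   # draws from Uniform[0, 1)

# 1) Rescale to Uniform[a, b): here a 195 m range of valid quadrat corners
a, b = 0.0, 195.0
x_uniform = a + (b - a) * u

# 2) Inverse-transform sampling: solve u = CDF(x) = 1 - exp(-lam * x) for x
lam = 2.0
x_expo = -np.log(1.0 - u) / lam

print(f'uniform:     min = {x_uniform.min():.2f}, max = {x_uniform.max():.2f}, '
      f'mean = {x_uniform.mean():.2f}')
print(f'exponential: mean = {x_expo.mean():.3f}, var = {x_expo.var():.3f}')
```

With `lam = 2.0`, the exponential samples should have a mean near 0.5 and a variance near 0.25.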


In [None]:
# example, the distribution of the X-coordinate of trees in BCI is quasi-uniform
# suggesting the spatial tree distribution is relatively homogeneous

df_bci[df_bci['gx'] <= 150].plot(y='gx',kind='hist')

* Bernoulli distribution

The simplest experiment, with only two possible outcomes: hit or miss, survival or death, success or failure, ...

$f(x) = p^xq^{1-x}$ for $x \in \{0, 1\}$, where $p$ is the probability of success and $q = 1 - p$



* Binomial distribution

The probability of a certain number of successes in a series of independent Bernoulli trials

$f(x) = {n \choose x}p^xq^{n-x}$, where n is the total number of trials, p is the success rate as in the Bernoulli distribution, q = 1 - p.

For a Binomial distribution $\mu = np$, and $\sigma^2=npq$

Bernoulli and Binomial distributions are useful to model counting (occurrence) or demographic processes (birth and death).
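
The PMF and its moments can be evaluated directly from the formula with the standard library. This is a sketch with hypothetical values `n = 20` and `p = 0.3`; `binom_pmf` is an illustrative helper name.

```python
from math import comb

def binom_pmf(x, n, p):
    """Binomial PMF: probability of exactly x successes in n independent Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 20, 0.3   # hypothetical number of trials and success rate
q = 1 - p

pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean)**2 * f for x, f in enumerate(pmf))

print(f'sum of PMF = {total:.6f}')
print(f'mean = {mean:.4f} (np  = {n * p:.4f})')
print(f'var  = {var:.4f} (npq = {n * p * q:.4f})')

# With a single trial (n = 1) the binomial reduces to the Bernoulli distribution
print(binom_pmf(1, 1, p), binom_pmf(0, 1, p))
```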

In [None]:
# example
# randomly select 20 trees from the forest
# what is the probability of having x number
# of 10cm+ trees?

sample_N = 500
select_N = 20
tree_size = 100

result = [ sum(df_bci.sample(select_N)['dbh'] > tree_size)
          for i in range(sample_N)]

In [None]:
fig, ax = plt.subplots(1,1)

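ax.hist(result,bins=range(select_N+2),align='left',density=True,
        label='observed') # select_N+2 edges so that x = select_N gets its own bin

ax.set_xlabel('# of 10cm+ trees')
ax.set_ylabel('PMF') # discrete variable, so this is a probability mass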

ax.set_xticks(range(0,20,2))

# compare with theoretical distribution
mean=np.nanmean(result) # average number of trees found
var=np.nanvar(result) # variance
print(f'mean (np): {mean}')
print(f'var (npq): {var}')
p = mean / select_N
print(f'p = {mean / select_N}')
from scipy.stats import binom

# define the random variable
rv = binom(select_N,p)
x = range(select_N+1)
ax.plot(x,rv.pmf(x),'r-o',label='theoretical')

ax.legend(loc='upper right')

* Poisson Distribution

Number of occurrences in a unit of time or space

$f(x) = \frac{e^{-\lambda}\lambda^x}{x!}$, where $\lambda$ is a rate parameter.

The mean and variance of the distribution are both $\lambda$.
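
As a sketch, the Poisson PMF can be evaluated from the formula to confirm that its mean and variance both equal $\lambda$. The second half illustrates a standard result that this lecture does not derive: a binomial distribution with many trials and a small success rate ($p = \lambda / n$) is closely approximated by a Poisson distribution. The values of `lam` and `n` are hypothetical.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """Poisson PMF: probability of exactly x occurrences at rate lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 2.5            # hypothetical rate
xs = range(60)       # the PMF is negligible beyond this range for lam = 2.5

pmf = [poisson_pmf(x, lam) for x in xs]
mean = sum(x * f for x, f in zip(xs, pmf))
var = sum((x - mean)**2 * f for x, f in zip(xs, pmf))
print(f'mean = {mean:.4f}, var = {var:.4f}, lambda = {lam}')

# Binomial with many trials and a small success rate vs. Poisson
n = 10_000
p = lam / n
max_diff = max(
    abs(comb(n, x) * p**x * (1 - p)**(n - x) - poisson_pmf(x, lam))
    for x in range(10)
)
print(f'largest PMF difference for x = 0..9: {max_diff:.2e}')
```

This connection is one way to read tree density as a Poisson rate: each small patch of ground is a Bernoulli trial with a small chance of containing a tree.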

In [None]:
# revisit tree density distribution

# tree density can be viewed as the 'rate' 
# of occurrence in space

# If you are 'walking' along the Y-direction in the forest plot
# what is the probability distribution of # of 1+cm tree within
# the 1m band of your walking path?

walking_x = 30. # a random selection

sample_Y = range(500) # maximum length of Y-direction

result = [
    df_bci[
        (df_bci['gx'] >= walking_x - 0.5)
       &(df_bci['gx'] <  walking_x + 0.5)
       &(df_bci['gy'] >= y) 
       &(df_bci['gy'] <  y + 1.)
    ].count()['treeID'] for y in sample_Y
]

In [None]:
fig, ax = plt.subplots(1,1)

ax.hist(result,bins=range(10),align='left',density=True,
        label='observed')

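ax.set_xlabel('# of 1cm+ trees') # no size filter is applied above, so all censused trees are counted
ax.set_ylabel('PMF') # discrete variable, so this is a probability mass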

# compare with theoretical distribution
mean=np.nanmean(result) # average number of trees found
var=np.nanvar(result) # variance
print(f'mean (lambda): {mean}')
print(f'var (lambda): {var}')
lambda_val = (mean + var) / 2.
from scipy.stats import poisson

# define the random variable
rv = poisson(lambda_val)
x = range(10)
ax.plot(x,rv.pmf(x),'r-o',label='theoretical')

ax.legend(loc='upper right')

* Geometric distribution, Negative Binomial distribution, Exponential distribution, and Gamma distribution

    All four distributions can be interpreted as distributions of a *waiting time*.

    * Geometric distribution describes the number of trials until *next* success.

    $f(x)=pq^{x-1}$ with mean as $1/p$, and variance as $q/p^2$

    * Negative binomial distribution describes the number of trials until *k* successes occur

    $f(x) = {x-1 \choose k-1}p^kq^{x-k}$ with mean value as $k/p$ and variance as $kq/p^2$

    The Exponential and Gamma distributions can be interpreted as their counterparts for continuous variables
    
    * Exponential distribution describes the waiting time/distance until the next occurrence of the event.
    
    $f(x) = {\lambda}e^{-{\lambda}x}$, where $\lambda$ is the rate of occurrence. The distribution has a mean value of $1/\lambda$ and variance of $1/\lambda^2$
    
    * Gamma distribution describes the waiting time/distance until the kth occurrence of the event.
    
    $f(x) = x^{k-1}{\lambda}^k\frac{e^{-{\lambda}x}}{\Gamma(k)}$, with a mean value of $k/\lambda$ and variance of $k/\lambda^2$. Here $\Gamma(k)$ is the gamma function, a continuous extension of the factorial with $\Gamma(k) = (k-1)!$ for positive integers $k$.
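
The waiting-time interpretation can be sketched by simulation: the wait for the kth success is the sum of k independent waits for the next success, in both the discrete and the continuous case. The parameters `p`, `k`, `lam`, and the seed below are hypothetical, and `report` is an illustrative helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_sim = 200_000
p, k, lam = 0.2, 3, 0.5   # hypothetical success rate, number of successes, event rate
q = 1 - p

# Discrete waiting times (number of trials)
geom = rng.geometric(p, size=n_sim)                      # trials until the next success
negbin = rng.geometric(p, size=(n_sim, k)).sum(axis=1)   # trials until the kth success

# Continuous waiting times
expo = rng.exponential(scale=1 / lam, size=n_sim)                    # wait until the next event
gamma = rng.exponential(scale=1 / lam, size=(n_sim, k)).sum(axis=1)  # wait until the kth event

def report(name, x, mean_theory, var_theory):
    """Print simulated vs. theoretical mean and variance."""
    print(f'{name:18s} mean {x.mean():7.3f} ({mean_theory:7.3f})  '
          f'var {x.var():7.3f} ({var_theory:7.3f})')

report('geometric', geom, 1 / p, q / p**2)
report('negative binomial', negbin, k / p, k * q / p**2)
report('exponential', expo, 1 / lam, 1 / lam**2)
report('gamma', gamma, k / lam, k / lam**2)
```

The values in parentheses are the theoretical means and variances listed above; the simulated values should agree to within sampling noise.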



In [None]:
# Optional after-class challenge: linking exponential and poisson distribution

# Consider the same forest 'walk' we had above, count the distance between each
# encounter of a 20cm+ tree, examine its distribution and compare it with a 
# theoretical exponential distribution

* Beta distribution

The Beta distribution is a continuous distribution between 0 and 1 with two shape parameters, $\alpha$ and $\beta$, which allow flexible fitting to a wide range of distribution shapes on that interval. This makes it quite useful for setting prior distributions in Bayesian analysis.

$f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}$, where $0 < x < 1$.

In [None]:
# example of the shapes of beta distributions
# note that to make the interactive plot work, you need to install ipywidgets using the following conda commands:
# conda install ipywidgets
# conda install nodejs
# jupyter labextension install @jupyter-widgets/jupyterlab-manager

# remember to update your jupyterlab to the newest version
# check https://ipywidgets.readthedocs.io/en/latest/user_install.html for more details

from scipy.stats import beta
from ipywidgets import interactive



# define a plotting function
def plot_beta(alpha_val,beta_val):
    fig, ax = plt.subplots(1,1)
    plot_x = np.arange(0.001,1,0.001)
    
    plot_y = beta.pdf(plot_x,alpha_val,beta_val)
    #label = f'a={alpha_val},b={beta_val}'
    ax.plot(plot_x,plot_y)
    
    ax.set_ylim([0,3])
    
    #ax.legend(loc='upper center')
    plt.show()

interactive_plot = interactive(plot_beta, 
                               alpha_val=(0.1, 5,0.1), 
                               beta_val=(0.1, 5, 0.1))

# show the interactive plot
interactive_plot

* Normal (Gaussian) distribution, the central limit theorem, and log-Normal distribution

    * Normal distribution is probably the most frequently used probability distribution in quantitative analysis. It has a bell-shaped, symmetric PDF: $f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}$, where $\mu$ represents the mean and $\sigma^2$ represents the variance. The normal distribution with a mean of 0 and a variance of 1 is called the *standard normal distribution*.

    * The **central limit theorem** is a strong and important result in statistics. It states that the *sum of many independent, identically distributed* random variables (whatever their distribution, as long as its variance is finite) is approximately normally distributed, with mean $N\mu$ and variance $N\sigma^2$, where $N$ is the number of variables and $\mu$ and $\sigma^2$ denote the mean and variance of each variable. The approximation improves as $N$ grows. The CLT is one of the cornerstones of inferential statistics and one reason why approximately normally distributed variables are so prevalent in nature.

    * If a random process consists of the multiplication of many independent and identically distributed positive random variables, we can infer from the CLT that the logarithm of the process (which converts the product into a sum) converges to a normal distribution. We call such a process **lognormally distributed**. Like the Gamma distribution, the log-normal distribution is only defined for positive values and has the PDF:

    $f(x)=\frac{1}{x\sqrt{2\pi\sigma^2}}e^{-\frac{(\ln x-\mu)^2}{2\sigma^2}}$. Here, $\mu$ and $\sigma^2$ denote the mean and variance of the *log-transformed* variable $\ln X$, which is normally distributed. The mean and variance of $X$ are

    $E(X)=e^{\mu+\frac{1}{2}\sigma^2}$ and $var(X)=(e^{\sigma^2}-1)e^{2\mu+\sigma^2}$
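
A simulation sketch of these three points with synthetic data, where the true parameters are known. The choice of an exponential distribution for the summands, a uniform distribution for the multiplicative factors, all parameter values, and the helper `skewness` are hypothetical and serve only as illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_rep, N = 20_000, 50

def skewness(x):
    """Sample skewness: the third moment of the standardized values."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3))

# 1) CLT: sums of N i.i.d. exponential variables (each with mean 2 and variance 4)
draws = rng.exponential(scale=2.0, size=(n_rep, N))
sums = draws.sum(axis=1)
print(f'sum: mean = {sums.mean():.2f} (N*mu = {N * 2.0}), '
      f'var = {sums.var():.2f} (N*sigma^2 = {N * 4.0})')
print(f'skewness: single variable = {skewness(draws.ravel()):.2f}, sum = {skewness(sums):.2f}')

# 2) Products of N i.i.d. positive factors: the log of the product is a sum of logs
factors = rng.uniform(0.8, 1.25, size=(n_rep, N))
log_prod = np.log(factors).sum(axis=1)
prod = np.exp(log_prod)
print(f'skewness: product = {skewness(prod):.2f}, log(product) = {skewness(log_prod):.2f}')

# 3) Mean and variance formulas of the log-normal distribution
mu, sigma = 0.5, 0.4
x = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
mean_theory = np.exp(mu + 0.5 * sigma**2)
var_theory = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)
print(f'lognormal: mean = {x.mean():.4f} ({mean_theory:.4f}), '
      f'var = {x.var():.4f} ({var_theory:.4f})')
```

The sum is far more symmetric than its strongly skewed summands, and the product becomes nearly symmetric only after the log transform, which is the signature of a log-normal variable.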

In [None]:
# Distribution of average DBH

# First, the distribution of DBH within the forest plot is not normally distributed
fig, ax = plt.subplots(1,1)
df_bci.plot(y='dbh',kind='hist',density=True,bins=np.arange(10,510+1,5), ax=ax)
ax.set_ylabel('Probability Density')
plt.show()
df_bci['dbh'].describe() # show mean and std

In [None]:
# now let's randomly sample N trees and calculate the average dbh
# Repeat this 1000 times
sample_N = 1000

# let's try 10 trees first ...
tree_N = 10

dbh_avg = np.array([df_bci['dbh'].sample(tree_N).mean(skipna=True)
               for i in range(sample_N)])

# now let's plot the distribution of dbh_avg
mu = np.mean(dbh_avg)
std = np.std(dbh_avg)
fig, ax = plt.subplots(1,1)

ax.hist(dbh_avg,bins=np.arange(0,100,1),align='left',density=True,
        label=f'N={tree_N},mu={mu:4.2f},sqrt(N)*std={std*(tree_N**0.5):4.2f}')

# overlay a theoretical normal distribution
from scipy.stats import norm
rv = norm(loc=np.mean(dbh_avg),scale=std)
plot_x = np.arange(0,100,1)
ax.plot(plot_x,rv.pdf(plot_x),'r-o',label = 'theoretical')
ax.legend()

In [None]:
# log-normal distribution
# distribution of tree density of every 10-by-10 meter sub-quadrat
# We can view the distribution as the outcome of a stochastic recruitment process

# Assume initial density is N0 for all sub-quadrats, 
# each year the net relative recruitment rate is a random variable (1 + r)
# r could be positive or negative
# After T years, N(T) = N0*(1+r1)*(1+r2)*...(1+rT)
# If N0 is constant, N(T) would be a lognormally distributed variable.

quadrat_size = 10.
quadrats = [] # record upper left corner of each quadrat
for x in np.arange(0,150,quadrat_size):
    for y in np.arange(0,500,quadrat_size):
        quadrats.append((x,y))
        
density = [ df_bci[
        (df_bci['gx'] >= q[0] )
       &(df_bci['gx'] <  q[0] + quadrat_size)
       &(df_bci['gy'] >= q[1]) 
       &(df_bci['gy'] <  q[1] + quadrat_size)
    ].count()['treeID'] / quadrat_size**2 for q in quadrats]

In [None]:
fig, ax = plt.subplots(2,1)

ax[0].hist(density,align='left',density=True,bins=40)
mu = np.mean(density)
std = np.std(density)

# overlay a theoretical normal distribution
rv = norm(loc=mu,scale=std)
plot_x = np.linspace(np.amin(density),np.amax(density),20)
ax[0].plot(plot_x,rv.pdf(plot_x),'r-o')


# quadrats without trees have zero density, and log(0) is -inf, so drop them first
log_density = np.log([d for d in density if d > 0])
ax[1].hist(log_density,align='left',density=True,bins=40)
mu = np.mean(log_density)
std = np.std(log_density)

# overlay a theoretical normal distribution
rv = norm(loc=mu,scale=std)
plot_x = np.linspace(np.amin(log_density),np.amax(log_density),20)
ax[1].plot(plot_x,rv.pdf(plot_x),'r-o')


### Summary:
1. Probability links processes with observations.
2. Quantitative properties of probabilities (conditional prob. and independent events)
3. Common discrete and continuous probability distributions and their interpretations
4. Central limit theorem (sum of random processes) and log-normal distribution (product of random processes)
