## Lognormal Distribution

The lognormal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. If Z is a random variable with a normal distribution with mean μ and standard deviation σ, then X = exp(Z) has a lognormal distribution with parameters μ and σ.

### Probability Density Function (PDF)

For a lognormal distribution with parameters μ and σ, the probability density function is given by:

$$f(x; μ, σ) = \frac{1}{x σ \sqrt{2π}} \exp\left(-\frac{(\ln x - μ)^2}{2σ^2}\right)$$

where:
- x > 0 is the value at which the density is evaluated
- μ is the mean of the natural logarithm of the variable
- σ is the standard deviation of the natural logarithm of the variable
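
As a quick sanity check, the density formula above can be evaluated directly and compared with `scipy.stats.lognorm`. This is a minimal sketch: the helper name `lognormal_pdf` and the values of μ and σ are illustrative assumptions. SciPy's parameterization uses `s = σ` and `scale = exp(μ)`, with `loc` left at 0.

```python
import numpy as np
from scipy.stats import lognorm

def lognormal_pdf(x, mu, sigma):
    """Evaluate the lognormal density from the formula above (requires x > 0)."""
    x = np.asarray(x, dtype=float)
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))

mu, sigma = 0.3, 0.8  # illustrative values, not taken from the data below
x = np.linspace(0.05, 10, 200)

pdf_formula = lognormal_pdf(x, mu, sigma)
# SciPy's lognorm: shape s = sigma, scale = exp(mu), loc = 0
pdf_scipy = lognorm.pdf(x, s=sigma, scale=np.exp(mu))

print(np.allclose(pdf_formula, pdf_scipy))
```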

### Cumulative Distribution Function (CDF)

The cumulative distribution function is:

$$F(x; μ, σ) = \frac{1}{2} + \frac{1}{2}\text{erf}\left(\frac{\ln x - μ}{σ\sqrt{2}}\right)$$

where erf is the error function.
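
The CDF formula can be checked the same way with `scipy.special.erf`. A minimal sketch, with `lognormal_cdf` as a hypothetical helper name and illustrative parameter values:

```python
import numpy as np
from scipy.special import erf
from scipy.stats import lognorm

def lognormal_cdf(x, mu, sigma):
    """Evaluate the lognormal CDF from the erf formula above (requires x > 0)."""
    x = np.asarray(x, dtype=float)
    return 0.5 + 0.5 * erf((np.log(x) - mu) / (sigma * np.sqrt(2)))

mu, sigma = 0.3, 0.8  # illustrative values
x = np.linspace(0.05, 10, 200)

cdf_formula = lognormal_cdf(x, mu, sigma)
cdf_scipy = lognorm.cdf(x, s=sigma, scale=np.exp(mu))

print(np.allclose(cdf_formula, cdf_scipy))
# At x = exp(mu) the erf argument is zero, so the CDF is exactly one half
print(lognormal_cdf(np.exp(mu), mu, sigma))
```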

### Properties

1. Mean: $E[X] = \exp(μ + \frac{σ^2}{2})$
2. Median: $\text{Median}[X] = \exp(μ)$
3. Variance: $\text{Var}[X] = [\exp(σ^2) - 1]\exp(2μ + σ^2)$
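
The three closed forms above can be compared with the values reported by a frozen `scipy.stats.lognorm` object. The parameter values in this sketch are illustrative assumptions.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = -1.0, 0.6  # illustrative values
dist = lognorm(s=sigma, scale=np.exp(mu))

mean_formula = np.exp(mu + sigma**2 / 2)
median_formula = np.exp(mu)
var_formula = (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)

print(np.isclose(mean_formula, dist.mean()))
print(np.isclose(median_formula, dist.median()))
print(np.isclose(var_formula, dist.var()))
```

Because σ² > 0, the mean is always larger than the median, which reflects the right skew of the distribution.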

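The lognormal distribution is often used to model variables that are the product of many independent, identically distributed positive variables. The logarithm of such a product is a sum of independent terms, so by the central limit theorem it is approximately normal, which makes the product itself approximately lognormal.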
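
A small simulation can illustrate this multiplicative effect. The sketch below assumes uniform factors on [0.5, 1.5], 50 factors per product, and a fixed seed; all of these are arbitrary choices made for illustration. If the argument holds, the products should be strongly right-skewed while their logarithms should be nearly symmetric.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
n_factors, n_samples = 50, 20000  # arbitrary illustrative sizes

# Each row holds the positive factors of one product
factors = rng.uniform(0.5, 1.5, size=(n_samples, n_factors))
products = factors.prod(axis=1)
log_products = np.log(products)

print(f"skewness of the products:     {skew(products):.2f}")
print(f"skewness of the log-products: {skew(log_products):.2f}")
```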

### Partial Expectation of Lognormal Distribution

For a lognormal distribution, the partial expectation $E[X \cdot \mathbf{1}_{\{X<x\}}]$ is the contribution to the mean of X from values below some threshold x. It is not the conditional expectation $E[X \mid X<x]$, which is the partial expectation divided by the CDF value $F(x)$. The partial expectation is defined by:

$$E[X \cdot \mathbf{1}_{\{X<x\}}] = \int_0^x t \cdot f(t) \, dt$$

where f(t) is the probability density function (PDF) of the lognormal distribution.

For a lognormal distribution with parameters μ and σ, this integral can be expressed in terms of the cumulative distribution function of the standard normal distribution, Φ:

$$E[X \cdot \mathbf{1}_{\{X<x\}}] = e^{μ + \frac{σ^2}{2}} \cdot Φ\left(\frac{\ln x - μ - σ^2}{σ}\right)$$

where:
- x is the upper bound of integration
- μ is the mean of the natural logarithm of the variable
- σ is the standard deviation of the natural logarithm of the variable
- Φ is the cumulative distribution function of the standard normal distribution

As a function of x, the partial expectation tends to 0 as x approaches 0 and approaches the mean of the lognormal distribution, $e^{μ + \frac{σ^2}{2}}$, as x approaches infinity.
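
The closed form above can be checked against direct numerical integration of t · f(t). This is a minimal sketch: the helper name `partial_expectation`, the parameter values, and the upper bound are assumptions chosen for illustration. Dividing the result by the CDF at the same bound gives the conditional mean of X below that bound.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import lognorm, norm

def partial_expectation(x, mu, sigma):
    """Closed form of the integral of t * f(t) over (0, x)."""
    return np.exp(mu + sigma**2 / 2) * norm.cdf((np.log(x) - mu - sigma**2) / sigma)

mu, sigma = 0.0, 0.75  # illustrative values
dist = lognorm(s=sigma, scale=np.exp(mu))
x_upper = 2.0

# Numerical integration of t * f(t) from 0 to x_upper
numeric, _ = quad(lambda t: t * dist.pdf(t), 0, x_upper)
closed = partial_expectation(x_upper, mu, sigma)

# Conditional mean of X given X < x_upper: partial expectation / CDF
conditional = closed / dist.cdf(x_upper)

print(np.isclose(numeric, closed))
print(conditional < x_upper)
```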

This partial expectation is useful in various applications, such as in finance for calculating expected shortfall, or in reliability engineering for determining expected lifetime under certain conditions.

In [41]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import lognorm
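# cumtrapz was removed in SciPy 1.14; cumulative_trapezoid is its replacement
from scipy.integrate import cumulative_trapezoid as cumtrapz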

x_range = [0, 5]
y_range = [0, 2]

# Generate x values
x = np.linspace(*x_range, 1000)

# Create the figure
fig = make_subplots(rows=1, cols=1)

# Generate sigma values on a log scale
sigmas = np.logspace(np.log10(0.25), np.log10(2), 20)

# Create lognormal distribution traces for different sigmas
for sigma in sigmas:
    # PDF
    pdf = lognorm.pdf(x, s=sigma, scale=np.exp(0))
    pdf_trace = go.Scatter(x=x, y=pdf, mode='lines', name='PDF', visible=False, line=dict(color='blue'))
    fig.add_trace(pdf_trace)

    # CDF
    cdf = lognorm.cdf(x, s=sigma, scale=np.exp(0))
    cdf_trace = go.Scatter(x=x, y=cdf, mode='lines', name='CDF', visible=False, line=dict(color='red'))
    fig.add_trace(cdf_trace)

    # First moment (Expected value function)
    moment_function = cumtrapz(x * pdf, x, initial=0)
    moment_trace = go.Scatter(x=x, y=moment_function, mode='lines', name='E[X]', visible=False, line=dict(color='green'))
    fig.add_trace(moment_trace)

# Make the first set of traces visible
fig.data[0].visible = True
fig.data[1].visible = True
fig.data[2].visible = True

# Create slider
steps = []
for i in range(len(sigmas)):
    step = dict(
        method="update",
        args=[
            {"visible": [False] * (3 * len(sigmas))},
            {"title": f"Lognormal Distribution"}
        ],
        label=f"{sigmas[i]:.2f}"
    )
    step["args"][0]["visible"][3*i:3*i+3] = [True, True, True]  # Make the i-th set of traces visible
    steps.append(step)

sliders = [dict(
    active=0,
    currentvalue={"prefix": "σ: "},
    pad={"t": 50},
    steps=steps
)]

# Update layout
fig.update_layout(
    sliders=sliders,
    title="Lognormal Distribution",
    xaxis_title='x',
    yaxis_title='Probability / Cumulative Value',
    showlegend=True,
    legend=dict(orientation="h", yanchor="bottom", y=1.02, xanchor="right", x=1),
    xaxis=dict(range=x_range),
    yaxis=dict(range=y_range)
)

# Show the plot
fig.show()

# Density-Cumulative Equilibrium Point

In [37]:
import numpy as np
from scipy.stats import gaussian_kde
from scipy.interpolate import interp1d
from scipy.optimize import brentq

def find_pdf_ex_intersection(volumes, bw_method: float = 0.1, use_scaled: bool = True):
    """
    Find the x-coordinate where the KDE-estimated PDF crosses the cumulative-sum
    curve labelled E[X], either scaled or original.

    Parameters:
    volumes (array-like): The input volume data.
    bw_method (float): Bandwidth passed to scipy.stats.gaussian_kde.
    use_scaled (bool): If True, scale both curves to a maximum of 1. If False, use original functions.

    Returns:
    float: x-coordinate of the intersection point.
    """
    # Sort volumes and calculate E[X]
    sorted_volumes = np.sort(volumes)
    ex = sorted_volumes.cumsum()
    
    if use_scaled:
        ex_scaled = ex / ex[-1]
    else:
        ex_scaled = ex

    # Create interpolated E[X] function
    epsilon = 1e-10
    sorted_volumes += epsilon  # To avoid NaNs
    ex_interp = interp1d(sorted_volumes, ex_scaled, kind='linear', fill_value='extrapolate')

    # Create PDF using KDE
    kde = gaussian_kde(volumes, bw_method=bw_method)
    
    if use_scaled:
        max_kde = kde(volumes).max()
        pdf = lambda x: kde(x) / max_kde
    else:
        pdf = kde

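    # Function whose root is the intersection (PDF - E[X]); kde returns a
    # length-1 array for scalar input, so convert to a plain float for brentq
    root_func = lambda x: float(pdf(x)[0] - ex_interp(x))

    # Find the intersection using Brent's method; brentq raises ValueError
    # if root_func has the same sign at both ends of the data range
    intersection_x = brentq(root_func, sorted_volumes[0], sorted_volumes[-1])

    return intersection_x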

# Example usage (the function returns only the x-coordinate):
# volumes = your_volume_data
# x_scaled = find_pdf_ex_intersection(volumes, use_scaled=True)
# x_original = find_pdf_ex_intersection(volumes, use_scaled=False)
# print(f"Scaled intersection point: x = {x_scaled:.4f}")
# print(f"Original intersection point: x = {x_original:.4f}")

### Generate random distribution

In [46]:
import numpy as np
from scipy.stats import lognorm

def generate_volumes(n_samples=200, sigma=0.5, target_median=0.025):
    """
    Generate a random distribution of volumes mainly in [0, 0.05) using a lognormal distribution.
    
    :param n_samples: Number of samples to generate
    :param sigma: Shape parameter of the lognormal distribution
    :param target_median: Target median of the distribution
    :return: Array of generated volumes
    """
    # Calculate mu to achieve the target median
    mu = np.log(target_median)
    
    # Generate samples
    volumes = lognorm.rvs(s=sigma, scale=np.exp(mu), size=n_samples)
    
    return volumes

# Generate volumes
sigma = 1.3
volumes = generate_volumes(n_samples=200, sigma=sigma)

print(f"Sigma {sigma}")
print(f"Generated {len(volumes)} volumes")
print(f"Min: {volumes.min():.6f}, Max: {volumes.max():.6f}")
print(f"Mean: {volumes.mean():.6f}, Median: {np.median(volumes):.6f}")

Sigma 1.3
Generated 200 volumes
Min: 0.000991, Max: 0.679946
Mean: 0.053672, Median: 0.023860


### Fit lognormal law

In [47]:
from scipy.stats import lognorm

def fit_lognormal(data):
    """
    Fit a lognormal distribution to the given data.

    The location parameter is fixed at 0 (floc=0), which constrains the
    support of the distribution to start at 0. This is often appropriate for
    data that cannot be negative, such as volumes.
    
    Note:
    - shape: sigma, the standard deviation of log(data)
    - location: Fixed at 0 in this case
    - scale: exp(mu), which equals the median of the distribution when loc = 0
    """
    shape, loc, scale = lognorm.fit(data, floc=0)
    return shape, loc, scale

# Fit lognormal distribution to volumes
shape, loc, scale = fit_lognormal(volumes)

print(f"Fitted lognormal distribution:")
print(f"Shape (sigma): {shape:.6f}")
print(f"Location: {loc:.6f}")
print(f"Scale: {scale:.6f}")

Fitted lognormal distribution:
Shape (sigma): 1.212426
Location: 0.000000
Scale: 0.024839
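
To relate the fitted values to μ and σ: with `floc=0`, the fitted scale corresponds to exp(μ) and the fitted shape to σ, so they should be close to the mean and standard deviation of the log-data. The sketch below checks this on a synthetic sample; the sample size, seed, and variable names are assumptions and are independent of the `volumes` array used in this notebook.

```python
import numpy as np
from scipy.stats import lognorm

rng = np.random.default_rng(0)
mu_true, sigma_true = np.log(0.025), 1.3  # same median and sigma as the example above
sample = lognorm.rvs(s=sigma_true, scale=np.exp(mu_true), size=5000, random_state=rng)

shape_hat, loc_hat, scale_hat = lognorm.fit(sample, floc=0)
mu_hat = np.log(scale_hat)  # scale = exp(mu) when loc = 0

log_sample = np.log(sample)
print(f"mu from fit:    {mu_hat:.4f}, mean of log-data: {log_sample.mean():.4f}")
print(f"sigma from fit: {shape_hat:.4f}, std of log-data:  {log_sample.std():.4f}")
```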


### Plot histogram, PDF, CDF, and E[X]

In [48]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy.stats import lognorm
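# cumtrapz was removed in SciPy 1.14; cumulative_trapezoid is its replacement
from scipy.integrate import cumulative_trapezoid as cumtrapz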


def plot_normalized_distribution(data, shape, loc, scale, x_intersect):

    """
    Plot histogram, PDF, CDF, and E[X] for the given data and fitted lognormal distribution.
    The CDF and E[X] curves are scaled to match the histogram height; the PDF is plotted unscaled.
    Also includes the Q3 line and the intersection point.
    
    :param data: Array of data
    :param shape: Shape parameter of the fitted lognormal distribution
    :param loc: Location parameter of the fitted lognormal distribution
    :param scale: Scale parameter of the fitted lognormal distribution
    :param x_intersect: x-coordinate of the intersection point
    """
    # Create figure
    fig = make_subplots(rows=1, cols=1)

    # Generate x values for plotting
    x = np.linspace(0, max(data), 1000)

    # Calculate PDF, CDF, and E[X]
    pdf = lognorm.pdf(x, shape, loc, scale)
    cdf = lognorm.cdf(x, shape, loc, scale)
    ex = cumtrapz(x * pdf, x, initial=0)

    # Calculate histogram
    hist, bin_edges = np.histogram(data, bins='auto', density=True)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

    # Scale CDF and E[X] to match histogram height
    max_hist_height = np.max(hist)
    cdf_scaled = cdf * max_hist_height
    ex_scaled = ex * (max_hist_height / ex[-1])

    # Evaluate the fitted PDF at x_intersect so the marker sits on the fitted PDF curve
    y_intersect = lognorm.pdf(x_intersect, shape, loc, scale)
    
    # Calculate ranges for CDF and E[X]
    cdf_range = f"[{cdf[0]:.2f}, {cdf[-1]:.2f}]"
    ex_range = f"[{ex[0]:.2f}, {ex[-1]:.2f}]"
    intersect_coords = f"({x_intersect:.3f}, {y_intersect:.3f})"

    # Add histogram
    fig.add_trace(go.Bar(x=bin_centers, y=hist, name='Histogram', opacity=0.7))

    # Add PDF
    fig.add_trace(go.Scatter(x=x, y=pdf, mode='lines', name='PDF', line=dict(color='red')))

    # Add scaled CDF
    fig.add_trace(go.Scatter(x=x, y=cdf_scaled, mode='lines', name=f'CDF (scaled)<br>{cdf_range}', line=dict(color='green')))

    # Add scaled E[X]
    fig.add_trace(go.Scatter(x=x, y=ex_scaled, mode='lines', name=f'E[X] (scaled)<br>{ex_range}', line=dict(color='purple')))

    # Calculate Q3 and add vertical line
    q3 = np.percentile(data, 75)
    fig.add_vline(x=q3, line_dash="dash", line_color="goldenrod", line_width=2, name='Q3', annotation_text="Q3", annotation_position="top right")

    # Add intersection point
    fig.add_trace(go.Scatter(
        x=[x_intersect], y=[y_intersect], mode='markers', 
        marker=dict(size=10, color='black', symbol='star'),
        name=f'Intersection<br>{intersect_coords}'
    ))

    # Update layout
    fig.update_layout(
        title=f"Lognormal Distribution (σ = {shape:.4f})",
        xaxis_title='Volume',
        yaxis_title='Density',
        legend_title='Function',
        hovermode='x unified',
    )

    # Show the plot
    fig.show()

# Search for PDF and E[X] intersection
x_intersect = find_pdf_ex_intersection(volumes)
print(f"Intersection point: x = {x_intersect:.4f}")

# Plot the distribution
plot_normalized_distribution(volumes, shape, loc, scale, x_intersect)

Intersection point: x = 0.0520


Depending on the distribution of volumes, Q3, the expected intersection point, and the true intersection point may lie close together or noticeably apart.
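
One possible way to compare these quantities is to compute the intersection directly from a fitted lognormal law instead of from the data. The sketch below rests on an assumption about what is being compared: the PDF is scaled by its value at the mode and the partial expectation is scaled by the mean, so both curves peak at 1, and the crossing is searched to the right of the mode. The helper name `fitted_intersection` is hypothetical, and the parameters are the fitted values printed earlier.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import lognorm, norm

def fitted_intersection(mu, sigma):
    """Crossing of the mode-scaled PDF and the mean-scaled partial expectation."""
    dist = lognorm(s=sigma, scale=np.exp(mu))
    mode = np.exp(mu - sigma**2)
    pdf_max = dist.pdf(mode)

    def gap(x):
        scaled_pdf = dist.pdf(x) / pdf_max
        scaled_partial = norm.cdf((np.log(x) - mu - sigma**2) / sigma)
        return scaled_pdf - scaled_partial

    # gap > 0 at the mode and gap < 0 far in the right tail, so a root is bracketed
    upper = np.exp(mu + sigma**2 + 10 * sigma)
    return brentq(gap, mode, upper)

mu, sigma = np.log(0.024839), 1.212426  # fitted scale and shape printed earlier
x_star = fitted_intersection(mu, sigma)
q3_fitted = lognorm.ppf(0.75, s=sigma, scale=np.exp(mu))

print(f"Intersection from fitted law: {x_star:.4f}")
print(f"Q3 from fitted law:           {q3_fitted:.4f}")
```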