# Logistic distribution — the “sigmoid” law on ℝ

The **logistic** distribution is a symmetric, bell-shaped continuous distribution on the real line whose CDF is the **logistic (sigmoid) function**.
It is closely tied to **log-odds (logit) transformations**, to **logistic regression** (as an error model), and it provides a simple, heavier-tailed alternative to the normal distribution.

## What you’ll learn
- how the PDF/CDF/quantile relate to the sigmoid and logit
- closed-form moments (mean/variance/skewness/kurtosis), MGF/CF, and entropy
- parameter interpretation (location $\mu$, scale $s$) and how shape changes
- **NumPy-only** sampling via inverse transform + Monte Carlo validation
- practical usage via `scipy.stats.logistic` (`pdf`, `cdf`, `rvs`, `fit`)


In [None]:
import os
import platform

import numpy as np

import plotly.graph_objects as go
import plotly.io as pio
from plotly.subplots import make_subplots

import scipy
from scipy import optimize, stats
from scipy.stats import chi2, logistic, norm

# Plotly rendering (CKC convention)
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")

# Reproducibility
rng = np.random.default_rng(7)
np.set_printoptions(precision=4, suppress=True)

print("Python", platform.python_version())
print("NumPy", np.__version__)
print("SciPy", scipy.__version__)


## 1) Title & Classification

- **Name**: `logistic`
- **Type**: **continuous** distribution
- **Support**: $x \in (-\infty, \infty)$
- **Parameter space**: location $\mu \in \mathbb{R}$ and scale $s > 0$

We write:

$$X \sim \mathrm{Logistic}(\mu, s).$$

The **standard logistic** is $\mathrm{Logistic}(0,1)$.

> SciPy uses the same location/scale form: `stats.logistic(loc=mu, scale=s)`.


## 2) Intuition & Motivation

### 2.1 What it models
The logistic distribution is a good model for **real-valued noise** that is:

- **symmetric** (centered around $\mu$)
- **unimodal** (single peak at $\mu$)
- **heavier-tailed than a normal** (but still exponentially decaying)

A practical intuition: compared to a normal distribution with the same variance, the logistic has a slightly higher peak and puts **more probability mass in the far tails** (roughly beyond two standard deviations).

### 2.2 Typical real-world use cases
- **Latent-variable view of logistic regression**: if a latent score is perturbed by *logistic* noise and thresholded, the resulting class probability is a sigmoid.
- **Log-odds modeling**: if $P \in (0,1)$ is a random probability, then $\log\!\left(\frac{P}{1-P}\right)$ lives on $\mathbb{R}$; logistic is a natural simple choice for such log-odds.
- **Convenient alternative to a normal**: similar bell shape, simple CDF/quantile.
- **Mixture models / generative models**: mixtures of logistics are used to model complex continuous densities (notably in some neural image models).

### 2.3 Relations to other distributions
- **Uniform ↔ logistic (logit link)**: if $U\sim\mathrm{Unif}(0,1)$, then
  $$\log\!\left(\frac{U}{1-U}\right) \sim \mathrm{Logistic}(0,1).$$
  Conversely, if $X\sim\mathrm{Logistic}(\mu,s)$ then $F(X)$ is Uniform$(0,1)$.

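- **Gumbel difference**: if $G_1, G_2$ are independent Gumbel variables with a common scale $\beta$, then $G_1 - G_2$ is logistic with scale $\beta$ (and location equal to the difference of the two Gumbel locations).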

- **Normal approximation**: matching variances gives
  $$\mathrm{Logistic}(0, s) \approx \mathcal{N}(0,1) \quad\text{when}\quad s=\sqrt{3}/\pi\approx 0.5513.$$

- **Log-logistic**: if $X\sim\mathrm{Logistic}(\mu,s)$, then $\exp(X)$ is log-logistic.
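
The first two relations above can be checked by simulation. The following is a minimal sketch, not part of the notebook's helper functions: the seed, sample size, and clipping constant are arbitrary choices, and the Kolmogorov-Smirnov distance to the standard logistic CDF is used only as a rough closeness measure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch
n = 50_000

# Uniform -> logistic via the logit transform
u = np.clip(rng.random(n), 1e-12, 1.0 - 1e-12)  # guard against log(0)
x_logit = np.log(u) - np.log1p(-u)

# Difference of two independent Gumbel(0, 1) draws
g1 = rng.gumbel(loc=0.0, scale=1.0, size=n)
g2 = rng.gumbel(loc=0.0, scale=1.0, size=n)
x_gumbel = g1 - g2

# KS distance to the standard logistic CDF (small values indicate agreement)
ks_logit = stats.kstest(x_logit, stats.logistic.cdf).statistic
ks_gumbel = stats.kstest(x_gumbel, stats.logistic.cdf).statistic
print(f"KS distance, logit(U): {ks_logit:.4f}")
print(f"KS distance, G1 - G2 : {ks_gumbel:.4f}")
```

Both distances should be on the order of $1/\sqrt{n}$ if the relations hold.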


## 3) Formal Definition

Let

$$z = \frac{x-\mu}{s}.$$

### 3.1 PDF
Different equivalent forms are useful:

$$f(x\mid\mu,s) = \frac{e^{-z}}{s\,(1+e^{-z})^2}
= \frac{1}{s}\,\sigma(z)\bigl(1-\sigma(z)\bigr)
= \frac{1}{4s}\,\operatorname{sech}^2\!\left(\frac{z}{2}\right),$$

where $\sigma(z)=\frac{1}{1+e^{-z}}$.
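
As a quick sanity check, the three forms can be compared numerically. This sketch uses `scipy.special.expit` for $\sigma$ and illustrative parameter values; it deliberately uses the naive exponential form, which is fine on this moderate grid but not for extreme $z$.

```python
import numpy as np
from scipy.special import expit  # expit(z) = 1 / (1 + exp(-z))

mu, s = 0.5, 1.3  # illustrative parameters
x = np.linspace(-6.0, 6.0, 13)
z = (x - mu) / s

form_exp = np.exp(-z) / (s * (1.0 + np.exp(-z)) ** 2)
form_sigmoid = expit(z) * (1.0 - expit(z)) / s
form_sech = 1.0 / (4.0 * s * np.cosh(z / 2.0) ** 2)  # sech = 1 / cosh

print(np.allclose(form_exp, form_sigmoid), np.allclose(form_exp, form_sech))
```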

### 3.2 CDF

$$F(x\mid\mu,s) = \sigma\!\left(\frac{x-\mu}{s}\right)=\frac{1}{1+e^{-z}}.$$

### 3.3 Quantile function (inverse CDF)
For $p\in(0,1)$:

$$F^{-1}(p) = \mu + s\,\log\!\left(\frac{p}{1-p}\right).$$

This closed-form inverse CDF makes **inverse transform sampling** especially simple.
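
A small round-trip check of the quantile formula, sketched with `scipy.special.logit` and `expit` and compared against `scipy.stats.logistic.ppf`. The parameter values and probability grid are arbitrary.

```python
import numpy as np
from scipy import stats
from scipy.special import expit, logit  # logit(p) = log(p / (1 - p))

mu, s = -1.0, 0.7  # illustrative parameters
p = np.array([0.01, 0.25, 0.5, 0.75, 0.99])

q_closed = mu + s * logit(p)                     # closed-form quantile
q_scipy = stats.logistic.ppf(p, loc=mu, scale=s)  # SciPy's quantile
p_back = expit((q_closed - mu) / s)               # CDF applied to the quantiles

print(np.allclose(q_closed, q_scipy), np.allclose(p_back, p))
```

Note that the median ($p=0.5$) maps to $\mu$, since $\log(0.5/0.5)=0$.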


In [None]:
def sigmoid(z: np.ndarray) -> np.ndarray:
    # Stable logistic function σ(z) = 1 / (1 + exp(-z)).

    z = np.asarray(z, dtype=float)
    out = np.empty_like(z)

    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))

    ez = np.exp(z[~pos])
    out[~pos] = ez / (1.0 + ez)

    return out


def logistic_cdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    return sigmoid(z)


def logistic_pdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    p = sigmoid(z)
    return (p * (1.0 - p)) / s


def logistic_logpdf(x: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # Stable log-PDF using logaddexp:
    # log f(x) = -log s - z - 2 log(1 + exp(-z)), where z=(x-mu)/s.

    if s <= 0:
        raise ValueError("scale s must be > 0")
    z = (np.asarray(x, dtype=float) - mu) / s
    return -np.log(s) - z - 2.0 * np.logaddexp(0.0, -z)


def logistic_ppf(p: np.ndarray, mu: float = 0.0, s: float = 1.0, eps: float = 1e-12) -> np.ndarray:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    p = np.asarray(p, dtype=float)
    p = np.clip(p, eps, 1.0 - eps)
    return mu + s * (np.log(p) - np.log1p(-p))


def logistic_rvs(
    rng: np.random.Generator,
    size: int | tuple[int, ...],
    mu: float = 0.0,
    s: float = 1.0,
) -> np.ndarray:
    # NumPy-only sampling via inverse CDF.

    u = rng.random(size=size)
    return logistic_ppf(u, mu=mu, s=s)


def logistic_moments(mu: float = 0.0, s: float = 1.0) -> dict:
    if s <= 0:
        raise ValueError("scale s must be > 0")

    mean = mu
    var = (np.pi * s) ** 2 / 3.0

    return {
        "mean": mean,
        "variance": var,
        "skewness": 0.0,
        "kurtosis": 4.2,  # non-excess
        "excess_kurtosis": 6.0 / 5.0,
        "median": mu,
        "mode": mu,
    }


def logistic_entropy(s: float = 1.0) -> float:
    if s <= 0:
        raise ValueError("scale s must be > 0")
    return float(np.log(s) + 2.0)


def logistic_mgf(t: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # MGF M_X(t) = E[e^{tX}] for |t| < 1/s; entries outside that range are returned as NaN.

    if s <= 0:
        raise ValueError("scale s must be > 0")

    t = np.asarray(t, dtype=float)
    x = np.pi * s * t

    out = np.full_like(t, np.nan, dtype=float)
    ok = np.abs(t) < (1.0 / s)

    ratio = np.empty_like(x)
    small = np.abs(x) < 1e-4
    ratio[small] = 1.0 + (x[small] ** 2) / 6.0 + 7.0 * (x[small] ** 4) / 360.0
    ratio[~small] = x[~small] / np.sin(x[~small])

    out[ok] = np.exp(mu * t[ok]) * ratio[ok]
    return out


def logistic_cf(t: np.ndarray, mu: float = 0.0, s: float = 1.0) -> np.ndarray:
    # Characteristic function φ_X(t) = E[e^{itX}] for real t.

    if s <= 0:
        raise ValueError("scale s must be > 0")

    t = np.asarray(t, dtype=float)
    x = np.pi * s * t

    ratio = np.empty_like(x)
    small = np.abs(x) < 1e-4
    ratio[small] = 1.0 - (x[small] ** 2) / 6.0 + 7.0 * (x[small] ** 4) / 360.0
    ratio[~small] = x[~small] / np.sinh(x[~small])

    return np.exp(1j * mu * t) * ratio


## 4) Moments & Properties

Let $X\sim\mathrm{Logistic}(\mu,s)$.

### 4.1 Mean, variance, skewness, kurtosis
- **Mean**: $\mathbb{E}[X] = \mu$.
- **Variance**: $\mathrm{Var}(X) = \dfrac{\pi^2 s^2}{3}$.
- **Skewness**: $0$ (symmetry).
- **Kurtosis**: $4.2$ (so **excess kurtosis** is $6/5=1.2$).

Also:

- **Median**: $\mu$.
- **Mode**: $\mu$.

### 4.2 MGF and characteristic function
The MGF exists only on an open interval around 0 (the tails decay like $e^{-|x|/s}$, so $e^{tx}f(x)$ is integrable only when $|t|<1/s$):

$$M_X(t)=\mathbb{E}[e^{tX}] = e^{\mu t}\,\frac{\pi s t}{\sin(\pi s t)},\qquad |t|<\frac{1}{s}.$$

The characteristic function exists for all real $t$:

$$\varphi_X(t)=\mathbb{E}[e^{itX}] = e^{i\mu t}\,\frac{\pi s t}{\sinh(\pi s t)}.$$
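
The characteristic function can be checked by direct numerical integration. This is a sketch under stated assumptions: the density is written in its sech form, the integration range is truncated to $\mu \pm 60s$ (where the tails are negligible), and the values of $\mu$, $s$, $t$ are illustrative.

```python
import numpy as np
from scipy import integrate

mu, s, t = 0.4, 0.8, 1.5  # illustrative values

def pdf(x):
    # Logistic density in sech form: sech^2(z / 2) / (4 s)
    return 0.25 / (s * np.cosh(0.5 * (x - mu) / s) ** 2)

lo, hi = mu - 60.0 * s, mu + 60.0 * s  # truncation range (an assumption)

re, _ = integrate.quad(lambda x: np.cos(t * x) * pdf(x), lo, hi, limit=400)
im, _ = integrate.quad(lambda x: np.sin(t * x) * pdf(x), lo, hi, limit=400)
cf_numeric = complex(re, im)

a = np.pi * s * t
cf_closed = np.exp(1j * mu * t) * a / np.sinh(a)

print(cf_numeric, cf_closed)
```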

### 4.3 Differential entropy
The logistic distribution has a simple differential entropy:

$$h(X) = \ln(s) + 2.$$
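
A numerical cross-check of the entropy formula, sketched with `scipy.integrate.quad` on a truncated range and compared to SciPy's own `entropy()` method. The scale value and the truncation at $\pm 50s$ are arbitrary choices.

```python
import numpy as np
from scipy import integrate, stats

s = 1.7  # illustrative scale
dist = stats.logistic(loc=0.0, scale=s)

def integrand(x):
    # -f(x) log f(x); negligible beyond +/- 50 s
    return -dist.pdf(x) * dist.logpdf(x)

h_numeric, _ = integrate.quad(integrand, -50.0 * s, 50.0 * s)
h_closed = np.log(s) + 2.0
h_scipy = float(dist.entropy())

print(h_numeric, h_closed, h_scipy)
```

The entropy does not depend on $\mu$, which is why the location is fixed at 0 here.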

### 4.4 Tail behavior
For large $|x|$, the logistic density behaves like

$$f(x) \approx \frac{1}{s}e^{-|x-\mu|/s},$$

so it has **exponential** tails (heavier than Gaussian, lighter than power-law tails).
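
To make the tail statement concrete, the sketch below compares upper-tail probabilities of a unit-variance logistic (scale $\sqrt{3}/\pi$) with those of a standard normal, and measures how close the density is to its exponential approximation ten scale units from the center. The thresholds are illustrative.

```python
import numpy as np
from scipy import stats

s_match = np.sqrt(3.0) / np.pi  # logistic scale that gives variance 1
logi = stats.logistic(loc=0.0, scale=s_match)
norm01 = stats.norm(loc=0.0, scale=1.0)

thresholds = [2.0, 3.0, 4.0, 6.0]
ratios = [logi.sf(k) / norm01.sf(k) for k in thresholds]
for k, r in zip(thresholds, ratios):
    print(f"P(X > {k:.0f}): logistic / normal = {r:.3g}")

# Density vs. its exponential approximation far in the tail
x_far = 10.0 * s_match
approx = np.exp(-x_far / s_match) / s_match
rel_err = abs(logi.pdf(x_far) - approx) / approx
print(f"relative error of the tail approximation at z = 10: {rel_err:.2e}")
```

The ratio grows quickly with the threshold, which is the practical meaning of "heavier than Gaussian".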


In [None]:
# Quick numerical checks: moments + MGF (Monte Carlo)
mu0, s0 = 0.7, 1.3
n = 200_000

samples = logistic_rvs(rng, size=n, mu=mu0, s=s0)

mom = logistic_moments(mu=mu0, s=s0)
mean_mc = samples.mean()
var_mc = samples.var(ddof=0)

skew_mc = stats.skew(samples)
kurt_mc = stats.kurtosis(samples, fisher=False)  # non-excess

mom, mean_mc, var_mc, skew_mc, kurt_mc


In [None]:
# MGF check for a few t in the valid range |t| < 1/s
# (Monte Carlo estimate: mean(exp(tX)))

ts = np.array([-0.4, -0.2, 0.2, 0.4]) / s0  # safely within (-1/s, 1/s)

mgf_theory = logistic_mgf(ts, mu=mu0, s=s0)
mgf_mc = np.array([np.mean(np.exp(t * samples)) for t in ts])

np.column_stack([ts, mgf_theory, mgf_mc])


## 5) Parameter Interpretation

### 5.1 Meaning of the parameters
- **Location $\mu$** shifts the distribution left/right.
  - mean = median = mode = $\mu$

- **Scale $s$** stretches/compresses the distribution.
  - standard deviation: $\sigma = \dfrac{\pi s}{\sqrt{3}}$
  - interquartile range (IQR):
    $$\mathrm{IQR} = F^{-1}(0.75)-F^{-1}(0.25)=2s\log 3.$$

### 5.2 Shape changes
- Increasing $s$ makes the density **wider** and the peak **lower**.
- Decreasing $s$ concentrates mass more tightly around $\mu$.

Because this is a location–scale family, changing $(\mu,s)$ never changes the *fundamental* shape; it only shifts and rescales it.
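
The location-scale property can be verified directly: after standardizing, every member of the family has the same density, i.e. $s\,f(\mu+sz\mid\mu,s)=f(z\mid 0,1)$. A minimal sketch with arbitrary parameter choices:

```python
import numpy as np
from scipy import stats

z = np.linspace(-5.0, 5.0, 11)
std_pdf = stats.logistic.pdf(z)  # Logistic(0, 1) density on a z-grid

matches = []
for mu, s in [(0.0, 0.5), (2.0, 1.0), (-3.0, 4.0)]:
    x = mu + s * z                                      # map z back to the x-scale
    rescaled = s * stats.logistic.pdf(x, loc=mu, scale=s)
    matches.append(bool(np.allclose(rescaled, std_pdf)))

print(matches)
```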


In [None]:
# Useful scale relationships

def logistic_sd(s: float) -> float:
    return float(np.pi * s / np.sqrt(3.0))


def logistic_iqr(s: float) -> float:
    return float(2.0 * s * np.log(3.0))

for s in [0.5, 1.0, 2.0]:
    print(f"s={s:>4}: sd={logistic_sd(s):.4f}, IQR={logistic_iqr(s):.4f}")


## 6) Derivations

### 6.1 Expectation
A very convenient representation comes from inverse-CDF sampling.
If $U\sim\mathrm{Unif}(0,1)$ then

$$X = \mu + s\,\log\!\left(\frac{U}{1-U}\right).$$

So

$$\mathbb{E}[X]=\mu + s\,\mathbb{E}\left[\log\!\left(\frac{U}{1-U}\right)\right].$$

But the integrand is antisymmetric around $1/2$:

\begin{align}
\mathbb{E}\left[\log\!\left(\frac{U}{1-U}\right)\right]
&=\int_0^1 \log\!\left(\frac{u}{1-u}\right)\,du \\
&= -\int_0^1 \log\!\left(\frac{u}{1-u}\right)\,du \quad (u\mapsto 1-u),
\end{align}

so the integral must be $0$. Therefore $\mathbb{E}[X]=\mu$.

### 6.2 MGF and variance
Let $Z\sim\mathrm{Logistic}(0,1)$ with CDF $F(z)=\sigma(z)$. Use the substitution $u=F(z)$.
Because $du=f(z)\,dz$, we get

\begin{align}
M_Z(t)
&=\int_{-\infty}^{\infty} e^{tz} f(z)\,dz \\
&=\int_0^1 \exp\left(t\log\!\left(\frac{u}{1-u}\right)\right)\,du \\
&=\int_0^1 u^t (1-u)^{-t}\,du \\
&= B(1+t,1-t) = \Gamma(1+t)\Gamma(1-t).
\end{align}

This integral is finite only if $t\in(-1,1)$.
Using $\Gamma(1+t)=t\,\Gamma(t)$ together with the reflection identity $\Gamma(t)\Gamma(1-t)=\dfrac{\pi}{\sin(\pi t)}$ gives $\Gamma(1+t)\Gamma(1-t)=\dfrac{\pi t}{\sin(\pi t)}$, so we obtain

$$M_Z(t)=\frac{\pi t}{\sin(\pi t)},\qquad |t|<1.$$

For a general location–scale transform $X=\mu+sZ$,

$$M_X(t)=e^{\mu t}M_Z(st)=e^{\mu t}\,\frac{\pi s t}{\sin(\pi s t)},\qquad |t|<\frac{1}{s}.$$

To get the variance, expand around $t=0$. Using

$$\frac{x}{\sin x} = 1 + \frac{x^2}{6} + O(x^4),$$

we get

\begin{align}
M_X(t)
&= e^{\mu t}\left(1 + \frac{(\pi s t)^2}{6} + O(t^4)\right) \\
&= 1 + \mu t + \left(\frac{\mu^2}{2} + \frac{\pi^2 s^2}{6}\right)t^2 + O(t^3).
\end{align}

So $\mathbb{E}[X]=M_X'(0)=\mu$ and $\mathbb{E}[X^2]=M_X''(0)=\mu^2+\dfrac{\pi^2 s^2}{3}$.
Therefore

$$\mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2=\frac{\pi^2 s^2}{3}.$$
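
The key steps of sections 6.1 and 6.2 can be checked numerically for the standard logistic. This sketch integrates over $u\in(0,1)$ with `scipy.integrate.quad` (the endpoint singularities are logarithmic and integrable) and compares the Gamma form of the MGF with the sine form at one illustrative $t$.

```python
import numpy as np
from scipy import integrate
from scipy.special import gamma

def logit(u):
    return np.log(u) - np.log1p(-u)

# E[Z] and E[Z^2] as integrals over u in (0, 1)
m1, _ = integrate.quad(logit, 0.0, 1.0)
m2, _ = integrate.quad(lambda u: logit(u) ** 2, 0.0, 1.0)

# MGF: Gamma-function form vs. the form after the reflection identity
t = 0.3  # any value in (-1, 1) except 0 works here
mgf_gamma = gamma(1.0 + t) * gamma(1.0 - t)
mgf_sine = np.pi * t / np.sin(np.pi * t)

print(m1, m2, np.pi ** 2 / 3.0)
print(mgf_gamma, mgf_sine)
```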

### 6.3 Likelihood (iid sample)
For data $x_1,\ldots,x_n$ i.i.d. from $\mathrm{Logistic}(\mu,s)$,

$$L(\mu,s)=\prod_{i=1}^n \frac{e^{-z_i}}{s(1+e^{-z_i})^2},\qquad z_i=\frac{x_i-\mu}{s}.$$

The log-likelihood is

\begin{align}
\ell(\mu,s)
&=\sum_{i=1}^n \log f(x_i\mid\mu,s)\\
&= -n\log s - \sum_{i=1}^n z_i - 2\sum_{i=1}^n \log(1+e^{-z_i}).
\end{align}

There is no closed-form MLE in general; it is typically found by **numerical optimization**.
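
Differentiating $\ell$ gives the score equations $\partial\ell/\partial\mu=\frac{1}{s}\sum_i\bigl(2\sigma(z_i)-1\bigr)$ and $\partial\ell/\partial s=\frac{1}{s}\bigl(\sum_i z_i(2\sigma(z_i)-1)-n\bigr)$. The sketch below is one possible way to use them: it supplies the analytic gradient to `scipy.optimize.minimize` and checks it against finite differences. The function name, seed, and the choice of L-BFGS-B with a lower bound on $s$ are assumptions made for illustration.

```python
import numpy as np
from scipy import optimize
from scipy.special import expit

def neg_loglik_and_grad(theta, x):
    """Negative log-likelihood and its gradient w.r.t. (mu, s)."""
    mu, s = theta
    z = (x - mu) / s
    # log f = -log s - z - 2 log(1 + exp(-z))
    nll = x.size * np.log(s) + np.sum(z) + 2.0 * np.sum(np.logaddexp(0.0, -z))
    w = 2.0 * expit(z) - 1.0            # equals tanh(z / 2)
    d_mu = np.sum(w) / s                # d(loglik) / d(mu)
    d_s = (np.sum(z * w) - x.size) / s  # d(loglik) / d(s)
    return nll, -np.array([d_mu, d_s])

rng = np.random.default_rng(1)  # arbitrary seed
x = rng.logistic(loc=1.2, scale=0.8, size=2_000)
theta0 = np.array([np.median(x), np.std(x) * np.sqrt(3.0) / np.pi])

# Finite-difference check of the analytic gradient at the starting point
fd_err = optimize.check_grad(
    lambda th: neg_loglik_and_grad(th, x)[0],
    lambda th: neg_loglik_and_grad(th, x)[1],
    theta0,
)

res = optimize.minimize(
    neg_loglik_and_grad, theta0, args=(x,), jac=True,
    method="L-BFGS-B", bounds=[(None, None), (1e-8, None)],
)
mu_hat, s_hat = res.x
print(fd_err, mu_hat, s_hat)
```

The first score equation says that at the MLE the fitted CDF values average to one half, $\sum_i\sigma(z_i)=n/2$.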


In [None]:
def logistic_loglik(x: np.ndarray, mu: float, s: float) -> float:
    return float(np.sum(logistic_logpdf(x, mu=mu, s=s)))


def fit_logistic_mle(x: np.ndarray, mu_init: float | None = None, s_init: float | None = None):
    x = np.asarray(x, dtype=float)

    if mu_init is None:
        mu_init = float(np.median(x))
    if s_init is None:
        s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)
        s_init = max(s_init, 1e-3)

    def nll(theta: np.ndarray) -> float:
        mu, log_s = float(theta[0]), float(theta[1])
        s = float(np.exp(log_s))
        return -logistic_loglik(x, mu=mu, s=s)

    res = optimize.minimize(nll, x0=np.array([mu_init, np.log(s_init)]), method="BFGS")
    mu_hat, log_s_hat = res.x
    return {
        "mu_hat": float(mu_hat),
        "s_hat": float(np.exp(log_s_hat)),
        "success": bool(res.success),
        "message": res.message,
        "fun": float(res.fun),
    }


# Compare our simple MLE to SciPy's fit on simulated data
x_data = logistic_rvs(rng, size=5_000, mu=1.2, s=0.8)

ours = fit_logistic_mle(x_data)
scipy_loc, scipy_scale = stats.logistic.fit(x_data)

ours, (scipy_loc, scipy_scale)


## 7) Sampling & Simulation

### 7.1 Inverse transform sampling
Because the logistic CDF is invertible in closed form, we can sample using the inverse CDF.

If $U\sim\mathrm{Unif}(0,1)$ and $X=F^{-1}(U)$, then $X$ has CDF $F$.
For logistic:

$$X = \mu + s\,\log\!\left(\frac{U}{1-U}\right).$$

### 7.2 Practical notes
- When implementing $\log\!\left(\frac{U}{1-U}\right)$ numerically, use
  $$\log U - \log(1-U),$$
  evaluating the second term as `np.log1p(-U)` so that it stays accurate when $U$ is close to 0.
- Clip $U$ away from exactly 0 and 1 to avoid returning $\pm\infty$.

**Algorithm (vectorized)**

1. Draw $u \leftarrow \mathrm{Uniform}(0,1)$
2. Set $u \leftarrow \mathrm{clip}(u,\varepsilon, 1-\varepsilon)$
3. Return $x \leftarrow \mu + s(\log u - \log(1-u))$
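
The three steps can be written out directly and compared with NumPy's built-in `Generator.logistic` sampler. This is a standalone sketch: the seed, parameters, and $\varepsilon$ are arbitrary, and the two-sample Kolmogorov-Smirnov statistic is used only as a rough agreement check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed
mu, s, n, eps = 1.0, 2.0, 100_000, 1e-12

u = rng.random(n)                             # step 1: uniform draws in [0, 1)
u = np.clip(u, eps, 1.0 - eps)                # step 2: keep away from 0 and 1
x_inv = mu + s * (np.log(u) - np.log1p(-u))   # step 3: inverse CDF

x_np = rng.logistic(loc=mu, scale=s, size=n)  # NumPy's built-in sampler

# Two-sample KS statistic: small values are consistent with equal distributions
ks = stats.ks_2samp(x_inv, x_np)
print(ks.statistic, ks.pvalue)
```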


In [None]:
# Sampling sanity checks
mu0, s0 = -0.5, 1.7

x = logistic_rvs(rng, size=200_000, mu=mu0, s=s0)

# 1) Mean/variance
print('mean (mc)', x.mean(), 'theory', logistic_moments(mu0, s0)['mean'])
print('var  (mc)', x.var(ddof=0), 'theory', logistic_moments(mu0, s0)['variance'])

# 2) Probability integral transform: F(X) should look Uniform(0,1)
u = logistic_cdf(x, mu=mu0, s=s0)
print('u mean', u.mean(), 'u var', u.var(ddof=0))

# Compare a few quantiles to Uniform(0,1)
qs = np.array([0.01, 0.1, 0.5, 0.9, 0.99])
print('empirical u-quantiles:', np.quantile(u, qs))
print('target quantiles     :', qs)


## 8) Visualization

We’ll visualize:
- the theoretical **PDF** and **CDF** for several parameter choices
- **Monte Carlo** samples from the NumPy-only sampler


In [None]:
# PDF/CDF for several parameter choices

params = [
    (0.0, 0.6),
    (0.0, 1.0),
    (0.0, 2.0),
    (2.0, 1.0),
]

# choose an x-range that covers all cases (0.001 to 0.999 quantiles)
lo = min(logistic_ppf(1e-3, mu=mu, s=s) for mu, s in params)
hi = max(logistic_ppf(1 - 1e-3, mu=mu, s=s) for mu, s in params)
xx = np.linspace(lo, hi, 800)

fig = make_subplots(rows=1, cols=2, subplot_titles=("PDF", "CDF"))

for mu, s in params:
    label = f"μ={mu}, s={s}"
    fig.add_trace(go.Scatter(x=xx, y=logistic_pdf(xx, mu=mu, s=s), mode="lines", name=label), row=1, col=1)
    fig.add_trace(go.Scatter(x=xx, y=logistic_cdf(xx, mu=mu, s=s), mode="lines", showlegend=False), row=1, col=2)

fig.update_xaxes(title_text="x", row=1, col=1)
fig.update_xaxes(title_text="x", row=1, col=2)
fig.update_yaxes(title_text="density", row=1, col=1)
fig.update_yaxes(title_text="probability", row=1, col=2)

fig.update_layout(title="Logistic distribution: PDF and CDF", width=950, height=420)
fig.show()


In [None]:
# Monte Carlo histogram + PDF overlay

mu0, s0 = 0.0, 1.0
samples_mc = logistic_rvs(rng, size=80_000, mu=mu0, s=s0)

x_grid = np.linspace(logistic_ppf(1e-4, mu0, s0), logistic_ppf(1 - 1e-4, mu0, s0), 900)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=samples_mc,
        nbinsx=70,
        histnorm="probability density",
        name="Monte Carlo (NumPy-only)",
        opacity=0.55,
    )
)
fig.add_trace(
    go.Scatter(
        x=x_grid,
        y=logistic_pdf(x_grid, mu=mu0, s=s0),
        mode="lines",
        name="True PDF",
        line=dict(width=3),
    )
)

fig.update_layout(title=f"Logistic(μ={mu0}, s={s0}): histogram vs PDF", width=900, height=420)
fig.show()


In [None]:
# CDF: theoretical vs empirical

x_grid = np.linspace(logistic_ppf(1e-4, mu0, s0), logistic_ppf(1 - 1e-4, mu0, s0), 700)

emp_x = np.sort(samples_mc)
emp_cdf = np.arange(1, emp_x.size + 1) / emp_x.size

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_grid, y=logistic_cdf(x_grid, mu=mu0, s=s0), mode="lines", name="True CDF"))
fig.add_trace(
    go.Scatter(
        x=emp_x[::200],
        y=emp_cdf[::200],
        mode="markers",
        name="Empirical CDF (subsampled)",
        marker=dict(size=4, opacity=0.55),
    )
)

fig.update_layout(title=f"Logistic(μ={mu0}, s={s0}): CDF vs empirical", width=900, height=420)
fig.show()


## 9) SciPy Integration (`scipy.stats.logistic`)

SciPy parameterization:

```python
stats.logistic(loc=mu, scale=s)
```

- `loc` is the location parameter $\mu$.
- `scale` is the scale parameter $s>0$.

SciPy provides:
- `pdf`, `logpdf`, `cdf`, `ppf`
- `rvs` for sampling
- `fit` for MLE


In [None]:
dist = stats.logistic(loc=mu0, scale=s0)

x_test = np.linspace(-2, 2, 5)

pdf = dist.pdf(x_test)
cdf = dist.cdf(x_test)
samples_scipy = dist.rvs(size=5, random_state=rng)

pdf, cdf, samples_scipy


In [None]:
# MLE fit example
true_mu, true_s = 1.5, 0.9
x_fit = stats.logistic(loc=true_mu, scale=true_s).rvs(size=10_000, random_state=rng)

mu_hat, s_hat = stats.logistic.fit(x_fit)  # returns (loc, scale)

true_mu, true_s, mu_hat, s_hat


## 10) Statistical Use Cases

### 10.1 Hypothesis testing (location)
If you assume data are logistic with unknown $(\mu,s)$, a common hypothesis is

$$H_0: \mu = \mu_0 \quad \text{vs} \quad H_1: \mu \ne \mu_0.$$

You can use a **likelihood-ratio test (LRT)**:

$$\Lambda = 2\bigl(\ell(\hat\mu,\hat s) - \ell(\mu_0, \tilde s)\bigr) \overset{\text{approx}}{\sim} \chi^2_1 \quad \text{under } H_0,$$

where $(\hat\mu,\hat s)$ are the unrestricted MLEs and $\tilde s$ is the MLE under $H_0$.
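
One way to compute this statistic with SciPy alone is to let `stats.logistic.fit` produce both fits, using the `floc` keyword to hold the location fixed under $H_0$. The sketch below assumes simulated data with an arbitrary seed and true parameters; it is an illustration, not a replacement for a careful analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # arbitrary seed
x = stats.logistic.rvs(loc=0.35, scale=1.0, size=400, random_state=rng)
mu_null = 0.0

loc_hat, scale_hat = stats.logistic.fit(x)           # unrestricted MLE
_, scale_null = stats.logistic.fit(x, floc=mu_null)  # MLE of s with mu fixed

ll_full = stats.logistic.logpdf(x, loc=loc_hat, scale=scale_hat).sum()
ll_null = stats.logistic.logpdf(x, loc=mu_null, scale=scale_null).sum()

lrt = 2.0 * (ll_full - ll_null)
p_value = stats.chi2.sf(lrt, df=1)  # upper-tail probability of chi-square(1)
print(lrt, p_value)
```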

### 10.2 Bayesian modeling
- **Error model**: logistic noise is a heavier-tailed alternative to Gaussian noise (its tails are still exponential, not power-law).
- **Latent-variable logistic regression**: if $Y=\mathbf{1}\{\eta+\varepsilon>0\}$ with $\varepsilon\sim\mathrm{Logistic}(0,1)$, then
  $$\Pr(Y=1\mid\eta)=\sigma(\eta).$$
  This gives the familiar logistic likelihood used in Bayesian logistic regression.
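
The latent-variable identity follows from $\Pr(\varepsilon>-\eta)=1-\sigma(-\eta)=\sigma(\eta)$, and it is easy to confirm by simulation. In this sketch the values of $\eta$, the seed, and the sample size are illustrative.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(3)  # arbitrary seed
n = 200_000

etas = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])  # illustrative linear predictors
eps = rng.logistic(loc=0.0, scale=1.0, size=(n, etas.size))
y = etas + eps > 0.0                          # thresholded latent score

freq = y.mean(axis=0)   # empirical P(Y = 1 | eta)
prob = expit(etas)      # sigmoid(eta)
max_abs_err = np.max(np.abs(freq - prob))
print(np.column_stack([etas, freq, prob]))
```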

### 10.3 Generative modeling
- **Inverse-CDF sampling** makes logistic a convenient base distribution.
- **Mixtures of logistics** can model multimodal or skewed densities and appear in modern neural generative models.


In [None]:
# 10.1 Likelihood-ratio test example: H0: mu = 0

rng_test = np.random.default_rng(123)

n = 400
mu_true, s_true = 0.35, 1.0
x = logistic_rvs(rng_test, size=n, mu=mu_true, s=s_true)


def mle_unrestricted(x: np.ndarray):
    x = np.asarray(x, dtype=float)

    def nll(theta: np.ndarray) -> float:
        mu, log_s = float(theta[0]), float(theta[1])
        s = float(np.exp(log_s))
        return -logistic_loglik(x, mu=mu, s=s)

    mu_init = float(np.median(x))
    s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)

    res = optimize.minimize(nll, x0=np.array([mu_init, np.log(max(s_init, 1e-3))]), method="BFGS")
    mu_hat, log_s_hat = res.x
    return float(mu_hat), float(np.exp(log_s_hat)), float(-res.fun)


def mle_mu_fixed(x: np.ndarray, mu0: float):
    x = np.asarray(x, dtype=float)

    def nll(theta: np.ndarray) -> float:
        # theta is a length-1 array holding log(s); index it explicitly
        # (calling float() on a 1-element array is deprecated in recent NumPy).
        s = float(np.exp(theta[0]))
        return -logistic_loglik(x, mu=mu0, s=s)

    s_init = float(np.std(x, ddof=0) * np.sqrt(3.0) / np.pi)
    res = optimize.minimize(nll, x0=np.array([np.log(max(s_init, 1e-3))]), method="BFGS")
    s_hat = float(np.exp(res.x[0]))
    return s_hat, float(-res.fun)


mu0 = 0.0
mu_hat, s_hat, ll1 = mle_unrestricted(x)
s_tilde, ll0 = mle_mu_fixed(x, mu0=mu0)

lrt = 2.0 * (ll1 - ll0)
p_value = chi2.sf(lrt, df=1)  # survival function; more accurate than 1 - cdf for tiny p-values

{
    "n": n,
    "true": (mu_true, s_true),
    "mle_unrestricted": (mu_hat, s_hat),
    "mle_H0": (mu0, s_tilde),
    "LRT": lrt,
    "p_value": p_value,
}


In [None]:
# 10.2 Bayesian example: posterior over mu with known scale (grid approximation)

x = logistic_rvs(rng, size=200, mu=0.6, s=1.0)
s_known = 1.0

# Prior: mu ~ Normal(0, 2^2)
mu_grid = np.linspace(-2.5, 2.5, 1201)
log_prior = norm(loc=0.0, scale=2.0).logpdf(mu_grid)

# Log-likelihood for each mu on the grid
log_like = np.array([logistic_loglik(x, mu=mu, s=s_known) for mu in mu_grid])
log_post_unnorm = log_prior + log_like
log_post = log_post_unnorm - np.max(log_post_unnorm)
post = np.exp(log_post)
post /= post.sum()

post_mean = float(np.sum(mu_grid * post))
post_cdf = np.cumsum(post)
ci_low = float(mu_grid[np.searchsorted(post_cdf, 0.025)])
ci_high = float(mu_grid[np.searchsorted(post_cdf, 0.975)])

(post_mean, (ci_low, ci_high))


In [None]:
# Visualize the posterior

fig = go.Figure()
fig.add_trace(go.Scatter(x=mu_grid, y=post, mode="lines", name="posterior"))
fig.add_vline(x=post_mean, line_dash="dash", line_color="black", annotation_text="posterior mean")
fig.add_vrect(x0=ci_low, x1=ci_high, fillcolor="gray", opacity=0.2, line_width=0)

fig.update_layout(
    title="Posterior over μ (known s): grid approximation",
    xaxis_title="μ",
    yaxis_title="posterior probability (per grid point)",
    width=900,
    height=420,
)
fig.show()


In [None]:
# 10.3 Generative modeling: a simple mixture of logistics

weights = np.array([0.55, 0.45])
components = [(-1.2, 0.6), (1.4, 0.9)]  # (mu, s)


def mixture_logistic_pdf(x: np.ndarray) -> np.ndarray:
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for w, (mu, s) in zip(weights, components):
        out += w * logistic_pdf(x, mu=mu, s=s)
    return out


def mixture_logistic_rvs(rng: np.random.Generator, size: int) -> np.ndarray:
    k = rng.choice(len(weights), size=size, p=weights)
    out = np.empty(size, dtype=float)
    for idx in range(len(weights)):
        mask = k == idx
        mu, s = components[idx]
        out[mask] = logistic_rvs(rng, size=int(mask.sum()), mu=mu, s=s)
    return out


mix_samples = mixture_logistic_rvs(rng, size=60_000)

x_grid = np.linspace(np.quantile(mix_samples, 0.001), np.quantile(mix_samples, 0.999), 900)

fig = go.Figure()
fig.add_trace(
    go.Histogram(
        x=mix_samples,
        nbinsx=90,
        histnorm="probability density",
        name="samples",
        opacity=0.55,
    )
)
fig.add_trace(go.Scatter(x=x_grid, y=mixture_logistic_pdf(x_grid), mode="lines", name="mixture PDF", line=dict(width=3)))

fig.update_layout(title="Mixture of logistics: histogram vs PDF", width=900, height=420)
fig.show()


## 11) Pitfalls

- **Invalid scale**: $s\le 0$ is not a valid logistic distribution.
- **Overflow in naive formulas**:
  - `np.exp(-z)` overflows if $z$ is very negative.
  - use stable forms (piecewise sigmoid, `logaddexp`, `log1p`).
- **Sampling at the boundaries**:
  - the inverse CDF uses $\log\!\left(\frac{p}{1-p}\right)$; if $p$ is exactly 0 or 1, you get $\pm\infty$.
  - clip $p$ (or the underlying uniform draws) away from {0,1}.
- **MGF domain**:
  - $M_X(t)$ exists only for $|t|<1/s$.
- **Parameterization confusion**:
  - some sources parameterize logistic by a “steepness” $k=1/s$.
  - SciPy uses `(loc, scale)`.
- **Fitting**:
  - for small samples, MLE can be noisy; prefer robust starting points (median + variance-based scale).
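
The overflow pitfall is easy to reproduce. The sketch below contrasts a direct transcription of the standardized log-density with the `logaddexp` form; both function names are hypothetical and exist only for this comparison.

```python
import numpy as np

def naive_logpdf(z):
    # Direct transcription of log( e^{-z} / (1 + e^{-z})^2 ); fragile for large |z|
    return np.log(np.exp(-z) / (1.0 + np.exp(-z)) ** 2)

def stable_logpdf(z):
    # Same quantity using logaddexp(0, -z) = log(1 + e^{-z})
    return -z - 2.0 * np.logaddexp(0.0, -z)

z = np.array([-800.0, -5.0, 0.0, 5.0, 800.0])
with np.errstate(over="ignore", invalid="ignore", divide="ignore"):
    naive = naive_logpdf(z)
stable = stable_logpdf(z)

print(naive)
print(stable)
```

For moderate $z$ the two agree; at the extremes the naive version degrades to `nan` or `-inf` while the stable version returns the finite value $\approx -|z|$.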


## 12) Summary

- `logistic` is a **continuous** distribution on $\mathbb{R}$ with CDF equal to the **sigmoid**.
- Parameters: location $\mu\in\mathbb{R}$ and scale $s>0$ (a pure shift/scale family).
- Key formulas:
  - $\mathbb{E}[X]=\mu$,
  - $\mathrm{Var}(X)=\pi^2 s^2/3$,
  - $h(X)=\ln(s)+2$,
  - $M_X(t)=e^{\mu t}\,\pi s t/\sin(\pi s t)$ for $|t|<1/s$.
- Sampling is easy via inverse CDF: $\mu+s\log\!\left(\frac{U}{1-U}\right)$.

**References**
- SciPy documentation: `scipy.stats.logistic`.
- Reflection identity: $\Gamma(z)\Gamma(1-z)=\pi/\sin(\pi z)$.
- Mixture of logistics in neural generative modeling: *PixelCNN++* (Salimans et al., 2017) uses discretized logistic mixtures.
