# Deriving the derivatives of the loss classes and testing markdown conversion of equations #

In [1]:
import sympy as sym
import numpy as np

## 1. Negative binomial loss class ##

### 1.1 Log-likelihood function

In [2]:
x, mu, k = sym.symbols('x mu k')

The probability mass function (PMF) of the negative binomial distribution is ${p}(x; \mu,k) = \frac{\Gamma \left(k+x\right)}{\Gamma \left(k\right)x!}\left(\frac{k}{k+\mu}\right)^{k}\left(\frac{\mu}{k+\mu}\right)^{x}$.
This definition of the negative binomial distribution is often referred to as negative binomial 2. The parameterisation takes the mean (usually written $\mu$, but $\hat{y}$ in pygom, as we are looking at a prediction) and $k$ (an overdispersion parameter). The variance is $\mu+\frac{\mu^2}{k}$. Some notation uses $\alpha$ instead, where $k=\alpha^{-1}$.
See Bolker, B. M. (2008). Negative Binomial. In Ecological Models in R (pp. 124–126). Princeton University Press.
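As a quick numerical sanity check of this parameterisation, the sketch below evaluates the PMF on a grid and compares its mean and variance with $\mu$ and $\mu+\frac{\mu^2}{k}$. The helper `nb2_pmf` and the test values are illustrative assumptions, not part of pygom. The comparison with `scipy.stats.nbinom` relies on the mapping `n = k`, `p = k/(k+mu)`.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import nbinom

def nb2_pmf(x, mu, k):
    """NB2 pmf, evaluated through logs for stability (hypothetical helper)."""
    log_p = (gammaln(k + x) - gammaln(k) - gammaln(x + 1)
             + k * np.log(k / (k + mu))
             + x * np.log(mu / (k + mu)))
    return np.exp(log_p)

mu_val, k_val = 4.0, 2.5            # arbitrary illustrative values
x_grid = np.arange(0, 400)          # the tail beyond 400 is negligible here

p = nb2_pmf(x_grid, mu_val, k_val)
mean = np.sum(x_grid * p)
var = np.sum((x_grid - mean) ** 2 * p)

# scipy's nbinom(n, p) coincides with NB2 when n = k and p = k / (k + mu)
scipy_p = nbinom.pmf(x_grid, k_val, k_val / (k_val + mu_val))

print(mean, var, mu_val + mu_val**2 / k_val)
```

The printed mean and variance should agree with $\mu$ and $\mu+\frac{\mu^2}{k}$ up to floating point error.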

In [3]:
nbpmf = (sym.gamma(k+x)/(sym.gamma(k)*sym.factorial(x)))*(k/(k+mu))**k*(mu/(k+mu))**x
nbpmf

(k/(k + mu))**k*(mu/(k + mu))**x*gamma(k + x)/(factorial(x)*gamma(k))

Because this PMF contains gamma functions and a factorial, it is easier to take the log of each factor and sum the results than to log the PMF as a single object (otherwise you end up with infinities).
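The overflow problem can be seen directly in floating point. The sketch below, with arbitrarily chosen values of `k_val` and `x_val`, compares evaluating the gamma function and then taking its log against `scipy.special.gammaln`, which computes the log directly.

```python
import numpy as np
from scipy.special import gamma, gammaln

k_val, x_val = 2.0, 200.0                  # illustrative values only

direct = gamma(k_val + x_val)              # Gamma(202) = 201! exceeds the double range
log_of_direct = np.log(direct)             # so its log is infinite as well
stable = gammaln(k_val + x_val)            # log-gamma computed without overflow

# For an integer argument, ln Gamma(n) is the sum of ln(1), ..., ln(n - 1)
reference = np.sum(np.log(np.arange(1, 202)))

print(direct, log_of_direct, stable, reference)
```

This is why the NumPy implementation works with `gammaln` terms rather than logging the PMF as a whole.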

In [4]:
nbpmf.args

((k/(k + mu))**k, (mu/(k + mu))**x, 1/factorial(x), 1/gamma(k), gamma(k + x))

In [5]:
logpmf_p1= k*(sym.ln(k)-sym.ln(k+mu))
logpmf_p2= x*(sym.ln(mu)-sym.ln(k+mu))
logpmf_p3= -sym.ln(sym.factorial(x))
logpmf_p4= -sym.ln(sym.gamma(k))
logpmf_p5= sym.ln(sym.gamma(k+x))
logpmf = logpmf_p1+logpmf_p2+logpmf_p3+logpmf_p4+logpmf_p5
logpmf

k*(log(k) - log(k + mu)) + x*(log(mu) - log(k + mu)) - log(factorial(x)) - log(gamma(k)) + log(gamma(k + x))

In [6]:
logpmf.args

(-log(factorial(x)),
 -log(gamma(k)),
 k*(log(k) - log(k + mu)),
 x*(log(mu) - log(k + mu)),
 log(gamma(k + x)))

In [7]:
from scipy.special import gammaln
def nb2logpmf(x, mu, k):
    r'''
    The log probability mass function (pmf) of the Negative Binomial 2 distribution.

    Parameters
    ----------
    x: array like observation.
    mu: mean or prediction.
    k: overdispersion parameter (variance = mean(1+mean/k)). Note some notation uses $\alpha$, ($k=\alpha^{-1}$).
    See Bolker, B. M. (2008). Negative Binomial. In Ecological Models in R (pp. 124–126). Princeton University Press.

    Returns
    -------
    log pmf:
    :math:`\ln({p}(x; \mu,k)) = \ln(\frac{\Gamma \left(k+x\right)}{\Gamma \left(k\right)x!}(\frac{k}{k+\mu})^{k}(\frac{\mu}{k+\mu})^{x})`

    '''
    # note that we input k, the overdispersion parameter, here
    # gammaln(x+1) is ln(x!), which avoids overflow for large x


    logpmf_p1= -gammaln(x+1) 
    logpmf_p2= -gammaln(k)
    logpmf_p3= k*(np.log(k) - np.log(k + mu)) 
    logpmf_p4= x*(np.log(mu) - np.log(k + mu))
    logpmf_p5= gammaln(k+x)
    logpmf = logpmf_p1+logpmf_p2+logpmf_p3+logpmf_p4+logpmf_p5
    return logpmf

Our loss function is the negative of the log-likelihood above.

In [8]:
negloglikli=-logpmf

1st derivative of -Loglikelihood of negative binomial loss with respect to $\mu$.

In [9]:
nbfirstderv= sym.diff(negloglikli,mu).simplify()
nbfirstderv

k*(mu - x)/(mu*(k + mu))

1st derivative of -Loglikelihood of negative binomial loss with respect to $\mu$ ($\hat{y}$ in pygom), where $x$ is the observation: 
$\frac{k(\mu-x)}{\mu(k + \mu)}$
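One way to gain confidence in the symbolic result is to compare it against a central finite difference of the loss. The sketch below does this with NumPy. The function names `nb2_negloglik` and `nb2_negloglik_dmu`, the test point and the step size `h` are all assumptions made for illustration.

```python
import numpy as np
from scipy.special import gammaln

def nb2_negloglik(x, mu, k):
    """Negative log-likelihood of NB2 for one observation (hypothetical helper)."""
    return -(gammaln(k + x) - gammaln(k) - gammaln(x + 1)
             + k * (np.log(k) - np.log(k + mu))
             + x * (np.log(mu) - np.log(k + mu)))

def nb2_negloglik_dmu(x, mu, k):
    """Closed-form first derivative with respect to mu, as derived above."""
    return k * (mu - x) / (mu * (k + mu))

x_val, mu_val, k_val = 7.0, 4.0, 2.5   # arbitrary test point
h = 1e-6                               # finite-difference step

finite_diff = (nb2_negloglik(x_val, mu_val + h, k_val)
               - nb2_negloglik(x_val, mu_val - h, k_val)) / (2 * h)
closed_form = nb2_negloglik_dmu(x_val, mu_val, k_val)

print(finite_diff, closed_form)
```

Note that the closed form is zero at $\mu = x$, so the loss for a single observation is stationary when the prediction equals the observation.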

In [10]:
nbsecderv = sym.diff(nbfirstderv,mu).simplify()
nbsecderv.simplify()

-k*(-mu*(k + mu) + mu*(mu - x) + (k + mu)*(mu - x))/(mu**2*(k + mu)**2)

2nd derivative of -Loglikelihood of negative binomial loss with respect to $\mu$ ($\hat{y}$ in pygom): 
$\frac{k\left(\mu(k + \mu) + \mu(x -\mu) + (k + \mu)(x - \mu)\right)}{\mu^{2}(k + \mu)^{2}}$

In [11]:
nbsecderv.args

(k, mu**(-2), (k + mu)**(-2), mu*(k + mu) - mu*(mu - x) - (k + mu)*(mu - x))
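The second derivative can also be written more compactly. Since the first derivative equals $\frac{k+x}{k+\mu}-\frac{x}{\mu}$, differentiating again suggests the form $\frac{x}{\mu^{2}}-\frac{k+x}{(k+\mu)^{2}}$. The sketch below checks this with SymPy. Declaring the symbols with `positive=True` is an added assumption (the notebook uses unconstrained symbols), and `compact` is just an illustrative name.

```python
import sympy as sym

# positive=True is an assumption made here; it is not required for the check
x, mu, k = sym.symbols('x mu k', positive=True)

first = k * (mu - x) / (mu * (k + mu))            # first derivative from above
second = sym.diff(first, mu)                      # second derivative
compact = x / mu**2 - (k + x) / (k + mu)**2       # candidate compact form

difference = sym.simplify(second - compact)       # should reduce to 0

# Evaluated at x = mu, the curvature reduces to k / (mu * (k + mu))
at_mean = sym.simplify(second.subs(x, mu) - k / (mu * (k + mu)))

print(difference, at_mean)
```

At $x=\mu$ the curvature is $\frac{k}{\mu(k+\mu)}$, which is the reciprocal of the variance $\mu+\frac{\mu^2}{k}$.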

## 2. Gamma loss class in terms of mean and shape ##

In [12]:
a, s, x, mu = sym.symbols('a s x mu')

The probability density function (PDF) of the gamma distribution, with shape $a$ and scale $s$, is $\frac{1}{s^a\Gamma(a)}x^{a-1}e^{-x/s}$. However, we need this in terms of the mean (here $\mu$). Since the mean of the gamma distribution is $\mu = as$, we can substitute in $s=\frac{\mu}{a}$ to get our likelihood function. But let's start with a log transformation of the PDF.

See Bolker, B. M. (2008). Gamma. In Ecological Models in R (pp. 131–133). Princeton University Press.

In [13]:
pdf_gamma = 1/(s**a*sym.gamma(a))*(x**(a-1)*sym.E**(-x/s))
pdf_gamma

s**(-a)*x**(a - 1)*exp(-x/s)/gamma(a)

In [14]:
pdf_gamma.args

(s**(-a), x**(a - 1), 1/gamma(a), exp(-x/s))

In [15]:
log_pdf_gamma_p1 = -a*sym.ln(s)
log_pdf_gamma_p2 = (a-1)*sym.ln(x)
log_pdf_gamma_p3 = -sym.ln(sym.gamma(a))
log_pdf_gamma_p4 = -x/s
log_pdf_gamma= log_pdf_gamma_p1+log_pdf_gamma_p2+log_pdf_gamma_p3+log_pdf_gamma_p4
log_pdf_gamma

-a*log(s) + (a - 1)*log(x) - log(gamma(a)) - x/s

In [16]:
s_in_terms_mu_a = mu/a
log_pdf_mu_a_gamma = log_pdf_gamma.subs(s,s_in_terms_mu_a) 
log_pdf_mu_a_gamma

-a*log(mu/a) - a*x/mu + (a - 1)*log(x) - log(gamma(a))

In [17]:
log_pdf_mu_a_gamma.args

(-log(gamma(a)), (a - 1)*log(x), -a*log(mu/a), -a*x/mu)

In [18]:
sym.print_latex(log_pdf_mu_a_gamma)

- a \log{\left(\frac{\mu}{a} \right)} - \frac{a x}{\mu} + \left(a - 1\right) \log{\left(x \right)} - \log{\left(\Gamma\left(a\right) \right)}
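The substituted expression can be checked against SciPy's gamma distribution, which is parameterised by shape and scale. The sketch below assumes the mapping `scale = mu / a`. The function name `gamma_logpdf_mean_shape` and the test values are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma as gamma_dist

def gamma_logpdf_mean_shape(x, mu, a):
    """Log pdf of the gamma distribution in terms of mean mu and shape a."""
    return (-a * np.log(mu / a) - a * x / mu
            + (a - 1) * np.log(x) - gammaln(a))

x_obs = np.array([0.5, 1.0, 3.0, 10.0])   # arbitrary positive observations
mu_val, a_val = 3.0, 2.0                  # arbitrary mean and shape

ours = gamma_logpdf_mean_shape(x_obs, mu_val, a_val)
reference = gamma_dist.logpdf(x_obs, a_val, scale=mu_val / a_val)
dist_mean = gamma_dist.mean(a_val, scale=mu_val / a_val)

print(ours, reference, dist_mean)
```

The two log densities should match, and the distribution mean returned by SciPy should equal `mu_val`.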


In [19]:
from scipy.special import gammaln
def gamma_mu_shape_logpdf(x, mu, shape):
    r'''
    The log probability density function (pdf) of the gamma distribution in terms of mean and shape.

    Parameters
    ----------
    x: array like observation.
    mu: mean or prediction.
    shape: shape parameter (a in the derivation above).
    See Bolker, B. M. (2008). Gamma. In Ecological Models in R (pp. 131–133). Princeton University Press.


    Returns
    -------
    log pdf, :math:`\ln({p}(x; \mu,a)) = - a \log{\left(\frac{\mu}{a} \right)} - \frac{a x}{\mu} + \left(a - 1\right) \log{\left(x \right)} - \log{\left(\Gamma\left(a\right) \right)}`

    '''

    logpdf_p1= -gammaln(shape)
    logpdf_p2= (shape - 1)*np.log(x)
    logpdf_p3= -shape*np.log(mu/shape)
    logpdf_p4= -shape*x/mu
    logpdf = logpdf_p1+logpdf_p2+logpdf_p3+logpdf_p4
    return logpdf

In [20]:
negloglikli_gamma_mu_a = -log_pdf_mu_a_gamma
negloglikli_gamma_mu_a

a*log(mu/a) + a*x/mu - (a - 1)*log(x) + log(gamma(a))

1st derivative of -Loglikelihood (gamma loss) with respect to $\mu$.

In [21]:
gammafirstderv= sym.diff(negloglikli_gamma_mu_a,mu)
display(gammafirstderv,gammafirstderv.simplify())

a/mu - a*x/mu**2

a*(mu - x)/mu**2

In [22]:
sym.print_latex(gammafirstderv.simplify())

\frac{a \left(\mu - x\right)}{\mu^{2}}


2nd derivative of -Loglikelihood (gamma loss) with respect to $\mu$.

In [23]:
gammasecderv = sym.diff(gammafirstderv,mu).simplify()
display(gammasecderv,gammasecderv.simplify())

a*(-mu + 2*x)/mu**3

a*(-mu + 2*x)/mu**3

In [24]:
sym.print_latex(gammasecderv)

\frac{a \left(- \mu + 2 x\right)}{\mu^{3}}
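For use outside SymPy, the two gamma-loss derivatives can be written as plain NumPy functions. The sketch below is one possible translation, with hypothetical function names and arbitrary test values. It checks the second derivative against a central finite difference of the first.

```python
import numpy as np

def gamma_negloglik_dmu(x, mu, a):
    """First derivative of the gamma loss with respect to mu: a(mu - x)/mu^2."""
    return a * (mu - x) / mu**2

def gamma_negloglik_d2mu(x, mu, a):
    """Second derivative of the gamma loss with respect to mu: a(2x - mu)/mu^3."""
    return a * (2 * x - mu) / mu**3

x_val, a_val = 3.0, 2.0                      # arbitrary observation and shape
mu_grid = np.array([1.0, 3.0, 5.0, 7.0])     # predictions on both sides of 2x
h = 1e-5                                     # finite-difference step

finite_diff = (gamma_negloglik_dmu(x_val, mu_grid + h, a_val)
               - gamma_negloglik_dmu(x_val, mu_grid - h, a_val)) / (2 * h)
curvature = gamma_negloglik_d2mu(x_val, mu_grid, a_val)

print(finite_diff, curvature)
```

The curvature is positive only for $\mu < 2x$, so the loss for a single observation is not convex in $\mu$ over the whole positive range. This may matter for optimisers that rely on the second derivative.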
