# QMCPy for Logistic Regression
This notebook gives examples of how to use QMCPy for logistic regression.

In [1]:
from qmcpy import *
from numpy import *
from qmcpy.integrand.LR import LR

Suppose we want to fit a logistic regression model to a dataset on graduate school admissions.


For $m$ observations with predictors $s_{ij}$ and binary responses $t_i \in \{0,1\}$, the likelihood function for logistic regression, $f:\mathbb{R}^{d+1} \to [0,\infty)$, takes the form
\begin{align*}
f(\boldsymbol{x}) & = \prod_{i = 1}^m 
\left(\frac{\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij} \right)}
{1+\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij}\right)}\right)^{t_i}
\left(1-\frac{\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij} \right)}{1+\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij}\right)}\right)^{1-t_i} \\
& = \prod_{i = 1}^m 
\left(\frac{\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij} \right)}
{1+\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij}\right)}\right)^{t_i}
\left(\frac{1}{1+\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij}\right)}\right)^{1-t_i} \\
& = \prod_{i = 1}^m 
\left(\frac{\left[\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij} \right) \right]^{t_i} }
{1+\exp\left(x_0 + \sum_{j = 1}^d x_j s_{ij}\right)}\right).
\end{align*}

To use logistic regression, the response vector t must contain only 0's and 1's. The remaining columns of the dataset form the predictor matrix S, with one row per observation.

So, let t = array([0, 1, 1, 1, 0, 1, 1, 0, 1, 0]).
Also, let S = 
array([
[380, 3.61, 3],
[660, 3.67, 3],
[800, 4, 1],
[640, 3.19, 4],
[520, 2.93, 4],
[760, 3, 2],
[560, 2.98, 1],
[400, 3.08, 2],
[540, 3.39, 3],
[700, 3.92, 2]])
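
As a quick check of the likelihood formula, the following sketch evaluates $f(\boldsymbol{x})$ on the ten observations listed above using only the Python standard library. The helper names (`softplus`, `log_likelihood`, `likelihood_product`) and the coefficient vector `x_test` are illustrative and not part of QMCPy. Working in log space is a choice made here to avoid overflow when the linear predictor is large.

```python
import math

# Responses and predictors copied from the ten rows listed above.
t = [0, 1, 1, 1, 0, 1, 1, 0, 1, 0]
S = [
    [380, 3.61, 3], [660, 3.67, 3], [800, 4.00, 1], [640, 3.19, 4],
    [520, 2.93, 4], [760, 3.00, 2], [560, 2.98, 1], [400, 3.08, 2],
    [540, 3.39, 3], [700, 3.92, 2],
]

def linear_predictor(x, s_row):
    """z_i = x_0 + sum_j x_j * s_ij."""
    return x[0] + sum(x_j * s_ij for x_j, s_ij in zip(x[1:], s_row))

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def log_likelihood(x, S, t):
    """Log of the last line of the derivation: sum_i [t_i * z_i - log(1 + exp(z_i))]."""
    return sum(t_i * linear_predictor(x, s_row) - softplus(linear_predictor(x, s_row))
               for s_row, t_i in zip(S, t))

def likelihood_product(x, S, t):
    """First line of the derivation, evaluated directly (can overflow for large z)."""
    f = 1.0
    for s_row, t_i in zip(S, t):
        z = linear_predictor(x, s_row)
        p = math.exp(z) / (1.0 + math.exp(z))
        f *= p ** t_i * (1.0 - p) ** (1 - t_i)
    return f

x_zero = [0.0, 0.0, 0.0, 0.0]      # every factor equals 1/2
x_test = [-1.0, 0.001, 0.2, -0.3]  # arbitrary illustrative coefficients

print(math.exp(log_likelihood(x_zero, S, t)))  # equals (1/2)**10 up to rounding
print(likelihood_product(x_test, S, t), math.exp(log_likelihood(x_test, S, t)))
```

With all coefficients zero, each of the ten factors is 1/2, so the likelihood is $2^{-10}$. The two forms agree for `x_test`, which is consistent with the simplification in the derivation.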

In [1]:
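n = 10  # number of observations to use
data = genfromtxt('binary.csv', dtype=float, delimiter=',', skip_header=True)
S = data[:n, 1:]  # predictors: every column after the first
t = data[:n, 0]   # binary responses: first column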


Now that we have our matrices, we define the dimensions. The model has one coefficient per column of S plus an intercept, so we take the number of columns of S, add 1, and set that equal to r. We also set dim = r + 1, which is used when indexing the integrand outputs below.

In [13]:
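no, dim_s = S.shape  # no: number of observations, dim_s: number of predictors
r = dim_s + 1        # number of coefficients (predictors plus intercept)
dim = r + 1          # upper index used by bound_fun below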

Now we set up the integration that finds the coefficients of the logistic regression, using the code below.

In [14]:
lr = LR(Sobol(dim_s+1,seed=8), s_matrix = S, t = t, r = r, prior_variance=[1,1e-4,1,1])
if r==0: raise Exception('require r>0')
qmcclt = CubQMCCLT(lr,
    abs_tol = 0,
    rel_tol = .25,
    n_init = 256,
    n_max = 2 ** 30,
    inflate = 1.2,
    alpha = 0.01,
    replications = 16,
    error_fun = lambda sv,abs_tol,rel_tol: maximum(abs_tol,abs(sv)*rel_tol),
    bound_fun = lambda phvl, phvh: (
        minimum.reduce([phvl[1:dim]/phvl[0],phvl[1:dim]/phvh[0],phvh[1:dim]/phvl[0],phvh[1:dim]/phvh[0]]),
        maximum.reduce([phvl[1:dim]/phvl[0],phvl[1:dim]/phvh[0],phvh[1:dim]/phvl[0],phvh[1:dim]/phvh[0]]),
        sign(phvl[0])!=sign(phvh[0])),
    dependency = lambda flags_comb: hstack((flags_comb.any(),flags_comb)))
solution,data = qmcclt.integrate()
print(data)

MeanVarDataRep (AccumulateData Object)
    solution        [-0.164  0.007 -0.66  -0.411]
    indv_error_bound [5.689e-06 5.075e-06 1.399e-07 2.111e-05 1.052e-05]
    ci_low          [ 1.883e-04 -3.829e-05  1.248e-06 -1.547e-04 -9.313e-05]
    ci_high         [ 1.997e-04 -2.814e-05  1.528e-06 -1.125e-04 -7.209e-05]
    ci_comb_low     [-0.203  0.006 -0.821 -0.494]
    ci_comb_high    [-0.141  0.008 -0.563 -0.361]
    solution_comb   [-0.164  0.007 -0.66  -0.411]
    flags_comb      [False False False False]
    flags_indv      [False False False False False]
    n_total         2^(16)
    n               [4096. 4096.  512.  512.  512.]
    replications    2^(4)
    time_integrate  0.257
CubQMCCLT (StoppingCriterion Object)
    inflate         1.200
    alpha           0.010
    abs_tol         0
    rel_tol         2^(-2)
    n_init          2^(8)
    n_max           2^(30)
LR (Integrand Object)
Gaussian (TrueMeasure Object)
    mean            0
    covariance      [1.e+00 1.e-04 1.e+0
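
The `bound_fun` passed to `CubQMCCLT` above combines the confidence interval of the first integrand output (index 0) with the intervals of the remaining outputs to bound their quotients. A minimal pure-Python sketch of that interval division follows. `ratio_bounds` is a hypothetical helper written for illustration, not a QMCPy function, and the inputs are the first two entries of `ci_low` and `ci_high` from the printed output.

```python
def sign(v):
    """Return -1, 0, or 1, like numpy.sign for a scalar."""
    return (v > 0) - (v < 0)

def ratio_bounds(num_low, num_high, den_low, den_high):
    """Bound num/den for num in [num_low, num_high] and den in [den_low, den_high].

    Returns (low, high, invalid). As in bound_fun, the result is flagged
    invalid when the denominator interval does not have a single sign.
    """
    if sign(den_low) != sign(den_high) or den_low == 0:
        return float("-inf"), float("inf"), True
    quotients = [
        num_low / den_low, num_low / den_high,
        num_high / den_low, num_high / den_high,
    ]
    return min(quotients), max(quotients), False

# Denominator interval: ci_low[0], ci_high[0]; numerator interval: ci_low[1], ci_high[1].
low, high, invalid = ratio_bounds(-3.829e-05, -2.814e-05, 1.883e-04, 1.997e-04)
print(round(low, 3), round(high, 3), invalid)
```

Rounded to three decimals, the result is -0.203 and -0.141, which agrees with the first entries of `ci_comb_low` and `ci_comb_high` in the output above.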

## Sample Problem 2

$y = \int_{[a,b]^d} \lVert x \rVert_2^2 \, dx, \quad \text{Lebesgue Measure}$

$\phantom{y} = \prod_{i=1}^d (b_i-a_i)\int_{[a,b]^d} \lVert x \rVert_2^2 \, \left[ \prod_{i=1}^d (b_i-a_i)\right]^{-1} dx, \quad \text{Uniform Measure}$
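
The identity above says that the Lebesgue integral equals the volume of the box times the mean of the integrand under the uniform measure. The sketch below checks this numerically for the box with lower corner (1, 2) and upper corner (2, 4) used in this problem. A deterministic midpoint grid stands in for the low-discrepancy points; the grid size `n` and the helper names are choices made here for illustration and are not part of QMCPy.

```python
import math

a = [1.0, 2.0]
b = [2.0, 4.0]
n = 200  # grid points per dimension (illustrative choice)

def midpoints(lo, hi, n):
    """Midpoints of n equal subintervals of [lo, hi]."""
    h = (hi - lo) / n
    return [lo + (k + 0.5) * h for k in range(n)]

def squared_norm(x1, x2):
    return x1 ** 2 + x2 ** 2

def uniform_mean_on_grid(g, a, b, n):
    """Average of g over an evenly spaced grid, approximating E[g(X)] for X uniform on the box."""
    xs = midpoints(a[0], b[0], n)
    ys = midpoints(a[1], b[1], n)
    return sum(g(x, y) for x in xs for y in ys) / (n * n)

volume = math.prod(bi - ai for ai, bi in zip(a, b))
y_est = volume * uniform_mean_on_grid(squared_norm, a, b, n)
print("volume = %.1f, y ~ %.5f" % (volume, y_est))
```

The estimate should be close to the exact value 70/3, the same quantity that both the Lebesgue and the Uniform formulations in this problem target.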

In [5]:
abs_tol = .001
dim = 2
a = array([1.,2.])
b = array([2.,4.])
true_value = ((a[0]**3-b[0]**3)*(a[1]-b[1])+(a[0]-b[0])*(a[1]**3-b[1]**3))/3
print('Answer = %.5f'%true_value)

Answer = 23.33333


In [6]:
# Lebesgue Measure
integrand = CustomFun(
    true_measure = Lebesgue(Uniform(Sobol(dim, seed=7), lower_bound=a, upper_bound=b)), 
    g = lambda x: (x**2).sum(1))
solution,data = CubQMCCLT(integrand, abs_tol=abs_tol).integrate()
print('y = %.5f'%solution)
error = abs((solution-true_value))
if error>abs_tol:
    raise Exception("Not within error tolerance")

y = 23.33326


In [7]:
# Uniform Measure
integrand = CustomFun(
    true_measure = Uniform(Sobol(dim, seed=17), lower_bound=a, upper_bound=b),
    g = lambda x: (b-a).prod()*(x**2).sum(1))
solution,data = CubQMCCLT(integrand, abs_tol=abs_tol).integrate()
print('y = %.5f'%solution)
error = abs((solution-true_value))
if error>abs_tol:
    raise Exception("Not within error tolerance")

y = 23.33340


## Sample Problem 3
An integral that cannot be expressed in terms of standard mathematical functions:
$$y = \int_{[a,b]} \frac{\sin{x}}{\log{x}} dx, \:\: \mbox{Lebesgue Measure}$$

Mathematica Code: `Integrate[Sin[x]/Log[x], {x,a,b}]`
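
As an independent sanity check on the reference value used for $a = 3$, $b = 5$, the integral can also be approximated with composite Simpson's rule using the standard library alone. This is a different technique from the lattice cubature used in this notebook; the function name `simpson` and the panel count are illustrative choices.

```python
import math

def simpson(f, lo, hi, panels=1000):
    """Composite Simpson's rule on [lo, hi]; panels must be even."""
    if panels % 2:
        raise ValueError("panels must be even")
    h = (hi - lo) / panels
    total = f(lo) + f(hi)
    for k in range(1, panels):
        # Odd interior nodes get weight 4, even interior nodes get weight 2.
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

y_check = simpson(lambda x: math.sin(x) / math.log(x), 3, 5)
print("y ~ %.5f" % y_check)
```

The integrand is smooth on $[3, 5]$ because $\log x$ stays away from zero there, so Simpson's rule is expected to land close to the reference value of -0.87961.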


In [8]:
abs_tol = .0001
dim = 1
a = 3
b = 5
true_value = -0.87961 

In [9]:
# Lebesgue Measure
integrand = CustomFun(
    true_measure = Lebesgue(Uniform(Lattice(dim, randomize=True, seed=7),a,b)), 
    g = lambda x: (sin(x)/log(x)).sum(1))
solution,data = CubQMCLatticeG(integrand, abs_tol=abs_tol).integrate()
print('y = %.3f'%solution)
error = abs((solution-true_value))
if error>abs_tol:
    raise Exception("Not within error tolerance")

y = -0.880


## Sample Problem 4
Integral over $\mathbb{R}^d$
$$y = \int_{\mathbb{R}^2} e^{-||x||_2^2} dx$$
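
One way to read an integral over $\mathbb{R}^2$ with respect to Lebesgue measure is as an expectation under a Gaussian density $\rho$: $\int g(x)\,dx = \mathbb{E}[g(X)/\rho(X)]$ with $X \sim N(0, I)$. The plain Monte Carlo sketch below illustrates that change of measure with the standard library only. It is an assumption that this mirrors what `Lebesgue(Gaussian(...))` does internally, and the sample size and seed are arbitrary.

```python
import math
import random

def g(x1, x2):
    """Integrand exp(-||x||_2^2)."""
    return math.exp(-(x1 ** 2 + x2 ** 2))

def standard_normal_density(x1, x2):
    """Density of N(0, I) in two dimensions."""
    return math.exp(-0.5 * (x1 ** 2 + x2 ** 2)) / (2 * math.pi)

rng = random.Random(7)  # fixed seed so the run is repeatable
n = 20000               # illustrative sample size
total = 0.0
for _ in range(n):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # Dividing by the sampling density converts the Gaussian expectation
    # back into an integral with respect to Lebesgue measure.
    total += g(x1, x2) / standard_normal_density(x1, x2)
y_mc = total / n
print("y ~ %.3f (pi = %.3f)" % (y_mc, math.pi))
```

Plain Monte Carlo of this size is only accurate to a couple of decimal places, which is in line with the loose tolerance `abs_tol = .1` used for this problem.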

In [10]:
abs_tol = .1
dim = 2
true_value = pi

In [11]:
integrand = CustomFun(
    true_measure = Lebesgue(Gaussian(Lattice(dim,seed=7))),
    g = lambda x: exp(-x**2).prod(1))
solution,data = CubQMCLatticeG(integrand,abs_tol=abs_tol).integrate()
print('y = %.3f'%solution)
error = abs((solution-true_value))
if error>abs_tol:
    raise Exception("Not within error tolerance")

y = 3.141
