# Question 1
For Standard Cauchy Distributions: $\pi(x)=\frac{1}{\pi(x^{2}+1)}$\
For Standard Normal Distributions: $g(x)=e^{\frac{-x^{2}}{2}}\frac{1}{\sqrt{2\pi}}$
$$
\frac{\pi(x)}{g(x)}=\frac{\sqrt{2\pi}e^{\frac{-x^{2}}{2}}}{\pi(x^{2}+1)}\
\frac{\pi(x)}{g(x)}=\frac{\sqrt{2}e^{\frac{-x^{2}}{2}}}{\sqrt{\pi}(x^{2}+1)}\
$$
$$sup\frac{\pi(x)}{g(x)}=\sqrt{\frac{2}{\pi}}$$
Also, since we are estimating the mean, then $h(x)=1$ implying that $Var(h(x))=0$.\
Since this is finite, using Theorem 2 of W4L11, we obtain that the variance of the estimator will be finite.

# Question 2

## Question 2.1
Yes, the weighted importance estimator should also have a finite variance.The reasoning is as follows:\
Since $sup\frac{f(x)}{g(x)}<\infty$, we can say that $sup\frac{f(x)}{\tilde{g(x)}}<\infty$, as $\tilde{g(x)} \propto g(x)$ and the proportionality constant is known to be finite.

## Question 2.2
I feel that the benefit of Importance Sampling means that in each iteration, we get an acceptable sample, however for Accept-Reject, this is not true. We generally expect to get an acceptable sample in more than one iteration.\
This implies that Importance Sampling is faster to perform.

# Question 3

Let us define the following:
$$ 
\tilde{\pi}(x)=e^{\frac{-x^{2}}{2}}\prod_{i=1}^{n}(1+\frac{(y_{i}-x)^{2}}{v})^{-(v+1)/2}
$$

In [1]:
using Distributions
using Random
Random.seed!(1)
function sampleY(v,n)
    global i=0
    global Y=[]
    global v=5
    d=TDist(v)
    while(i<n)
        global i=i+1
        y=rand(d,1)
        append!(Y,y)
    end
return Y
end

function PI(Y,x,v,n)
    global m=exp(-x*x/2)
    global i1=0
    while(i1<n)
        global i1=i1+1
        global m=m*(1+(Y[i1]-x)^2/v)^(-(v+1)/2)
    end
    return m
end

function normalcdf(x)
    a=exp((-x*x/2))/((2*pi)^(0.5))
    return a
end

normalcdf (generic function with 1 method)

In [2]:
function firstMoment(n,v,num)
    Y=sampleY(v,n)
    global i3=0
    global top=0
    global bot=0
    global flag=0
    while (flag==0)
#         println(i3)
        x=randn()
        g=normalcdf(x)
        p=PI(Y,x,v,n)
#         println("p:",p,"g:",g)
        top=top+(x*p)/g
        bot=bot+p/g
        i3=i3+1
        if(i3>=num)
            global flag=1
            break
        end
    end
    return top/bot
end

function secondMoment(n,v,num)
    Y=sampleY(v,n)
    global i2=0
    global top2=0
    global bot2=0
    while (i2<num)
        i2=i2+1
        x=randn()
        g=normalcdf(x)
        p=PI(Y,x,v,n)
        top2=top2+x*x*p/g
        bot2=bot2+p/g
    end
    return top2/bot2
end

secondMoment (generic function with 1 method)

In [3]:
println("For v=5")
expectation=firstMoment(50,5,10000)
println("Expectation:",expectation)
println("Variance:",expectation^2-secondMoment(50,5,10000))

For v=5
Expectation:0.3195498376470879
Variance:0.048214074599329894


In [4]:
println("For v=1")
expectation=firstMoment(50,1,10000)
println("Expectation:",expectation)
println("Variance:",expectation^2-secondMoment(50,1,10000))

For v=1
Expectation:0.04382631588330899
Variance:-0.18224589091390006


In [5]:
println("For v=2")
expectation=firstMoment(50,2,10000)
println("Expectation:",expectation)
println("Variance:",(expectation)^2-secondMoment(50,2,10000))

For v=2
Expectation:0.20644879379356426
Variance:0.00898394357852978


We can observe that variance is finite

# Question 4

Knowing the following:
$$
\pi (\lambda | D) \propto \pi(D|\lambda)\pi(\lambda)\\
\pi(\lambda)=\frac{\lambda ^{\alpha -1}e^{-\beta\lambda}\beta^{\alpha}}{\Gamma{\alpha}}\\
\pi(D|\lambda)=\prod_{i=1}^{n}\frac{\lambda^{Y_{i}}e^{-\lambda}}{Y_{i}!}\\
\pi (\lambda | D) \propto \lambda ^{\alpha -1}e^{-\beta\lambda}*\prod_{i=1}^{n}\frac{\lambda^{Y_{i}}e^{-\lambda}}{Y_{i}!}\\
\pi (\lambda | D) \propto \lambda ^{\alpha +\sum_{i=1}^{n}Y_{i}-1}e^{-\beta\lambda}*\prod_{i=1}^{n}\frac{e^{-\lambda}}{Y_{i}!}\\
\pi (\lambda | D) \propto \lambda ^{\alpha +\sum_{i=1}^{n}Y_{i}-1}e^{-\beta\lambda-n\lambda}*\prod_{i=1}^{n}\frac{1}{Y_{i}!}\\
\pi (\lambda | D) \propto \lambda ^{\alpha +\sum_{i=1}^{n}Y_{i}-1}e^{-\beta\lambda-n\lambda}\\
$$
$\pi (\lambda | D)$ represents a Gamma Distribution: $Gamma(\alpha +\sum_{i=1}^{n}Y_{i},\beta+n)$
This becomes our posterior for $\lambda$.
