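```python
import numpy as np
import pandas as pd

# Six consecutive daily timestamps starting on 2013-01-01, used as the index
dates = pd.date_range('20130101', periods=6)
# 6x4 standard-normal samples in columns A-D (no seed, so values change per run)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df  # in a notebook, the last expression is displayed as a table
```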

|            | A         | B         | C         | D        |
|------------|-----------|-----------|-----------|----------|
| 2013-01-01 | 0.385192  | -1.572812 | -0.427965 | 1.589609 |
| 2013-01-02 | 1.035777  | -0.428113 | 0.210538  | 0.926154 |
| 2013-01-03 | -0.540016 | -0.458651 | 0.405547  | 1.109699 |
| 2013-01-04 | 0.141192  | -0.9041   | -2.229195 | 2.381928 |
| 2013-01-05 | 0.143568  | 0.762386  | 0.070902  | 1.329507 |
| 2013-01-06 | 0.417421  | 1.304776  | 1.054625  | 0.087306 |


<details class="tip">
    <summary>Extra: Proof of decomposition</summary>
    <p><br>First, let's recall conditional probability,<br>
    $$P\left (A|B\right ) = \frac{P\left (A, B\right )}{P\left (B\right )}$$
    This holds because, once $B$ is known to have occurred, the sample space for $A$ shrinks to $B$.
    Rearranging terms:<br>
    $$P\left (A, B\right ) = P\left (A|B\right )*P\left (B\right )$$
    This equation is called the chain rule of probability. Let's generalize it to Bayesian Networks. Number the nodes so that every node's parents come before it, i.e. parents lie above their children (a topological ordering).<br>
    $$\begin{aligned}
    P\left (X_1, X_2, X_3, \ldots, X_n\right ) &= P\left (X_n, X_{n-1}, X_{n-2}, \ldots, X_1\right )\\
    &= P\left (X_n|X_{n-1}, X_{n-2}, X_{n-3}, \ldots, X_1\right ) \cdot P\left (X_{n-1}, X_{n-2}, X_{n-3}, \ldots, X_1\right ) && \text{(chain rule)}\\
    &= P\left (X_n|X_{n-1}, X_{n-2}, X_{n-3}, \ldots, X_1\right ) \cdot P\left (X_{n-1}|X_{n-2}, X_{n-3}, X_{n-4}, \ldots, X_1\right ) \cdot P\left (X_{n-2}, X_{n-3}, X_{n-4}, \ldots, X_1\right )
    \end{aligned}$$
    Applying the chain rule repeatedly gives<br>
    $$P\left (\bigcap_{i=1}^{n}X_i\right ) = \prod_{i=1}^{n} P\left (X_i | \bigcap_{j=1}^{i-1}X_j\right )$$
    Keep the above equation in mind and recall the Markov property. For intuition, let's reuse the <a href="#bayesian-network-example">Bayesian Network Example</a>. If we know that the student scored very good <strong>grades</strong>, then the student is highly likely to get an <strong>acceptance letter</strong> from the university, no matter how <strong>difficult</strong> the class was, how <strong>intelligent</strong> the student was, or what his/her <strong>SAT</strong> score was. The key point is that once a node's parents are <strong>observed</strong>, its <strong>non-descendants</strong> have no further influence on it. Because every node comes after its parents in our ordering, $X_1, \ldots, X_{i-1}$ contains $Par\left (X_i\right )$ and otherwise only non-descendants of $X_i$, so $P\left (X_i | \bigcap_{j=1}^{i-1}X_j\right ) = P\left (X_i | Par\left (X_i\right )\right )$. The equation therefore becomes<br>
    $$P\left (\bigcap_{i=1}^{n}X_i\right ) = \prod_{i=1}^{n} P\left (X_i | Par\left (X_i\right )\right )$$
    Bingo: with the above equation, we have proved the <strong>Factorization Theorem</strong> for Bayesian Networks.
    </p>
</details>
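
As a quick numerical sanity check of the factorization, the sketch below builds a full joint distribution as the product of per-node conditionals and confirms that, once the grade is observed, difficulty, intelligence and SAT score no longer change the probability of the letter. It assumes the common student-network structure Difficulty → Grade ← Intelligence → SAT and Grade → Letter, makes every variable binary, and uses made-up CPT values; the helpers `bern`, `joint` and `p_letter` are hypothetical names for this illustration only.

```python
import itertools

# Node indices in the assumed ordering (parents always come before children).
DIFF, INTEL, GRADE, SAT, LETTER = range(5)

# Hypothetical CPTs with made-up numbers; every variable is binary (0/1).
P_D1 = 0.4                         # P(D=1): the class is difficult
P_I1 = 0.3                         # P(I=1): the student is intelligent
P_G1 = {(0, 0): 0.3, (0, 1): 0.9,  # P(G=1 | D, I): good grade
        (1, 0): 0.05, (1, 1): 0.5}
P_S1 = {0: 0.05, 1: 0.8}           # P(S=1 | I): high SAT score
P_L1 = {0: 0.1, 1: 0.9}            # P(L=1 | G): acceptance letter


def bern(p_one, value):
    """Probability of a binary value, given P(value = 1)."""
    return p_one if value == 1 else 1.0 - p_one


def joint(x):
    """Factorized joint: the product of P(X | Par(X)) over all five nodes."""
    return (bern(P_D1, x[DIFF]) * bern(P_I1, x[INTEL])
            * bern(P_G1[(x[DIFF], x[INTEL])], x[GRADE])
            * bern(P_S1[x[INTEL]], x[SAT])
            * bern(P_L1[x[GRADE]], x[LETTER]))


# Full joint table over all 2**5 assignments (d, i, g, s, l).
table = {x: joint(x) for x in itertools.product((0, 1), repeat=5)}
print(sum(table.values()))  # approximately 1.0: the product is a valid distribution


def p_letter(evidence):
    """P(L=1 | evidence) by brute-force summation; evidence maps node index -> value."""
    rows = [x for x in table if all(x[k] == v for k, v in evidence.items())]
    numerator = sum(table[x] for x in rows if x[LETTER] == 1)
    return numerator / sum(table[x] for x in rows)


# Markov property: given G, the non-descendants D, I and S do not change P(L=1).
print(p_letter({GRADE: 1}))                           # approximately 0.9
print(p_letter({GRADE: 1, DIFF: 1, INTEL: 0, SAT: 1}))  # approximately 0.9 as well
```

Enumerating the full joint table is exponential in the number of nodes, so this brute-force check is only practical for tiny networks like this one; the point of the factorization is that the network itself only needs the small per-node tables.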

## mkdocs-material things

<div class="admonition note">
    <p class="admonition-title">Note</p>
    <p>
        The KL divergence between two distributions is non-negative and equals zero only when the distributions are identical, so the more similar two distributions are, the smaller their KL divergence, and vice versa. In Variational Inference, the whole idea is to <strong>minimize</strong> the KL divergence $\mathrm{KL}\left(q(\theta) \,\Vert\, p(\theta|D)\right)$ so that our approximating distribution $q(\theta)$ becomes similar to the posterior $p(\theta|D)$.
    </p>
</div>
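
A minimal NumPy sketch of the note above, assuming discrete distributions given as probability vectors: the KL divergence is zero for identical distributions and grows as they drift apart. The helper `kl_divergence` is our own, not a library function, and the vectors `p`, `q_close` and `q_far` are made-up examples. The last line also shows that matching entropy is not the same as matching distributions: a permuted copy of `p` has exactly the same entropy but a clearly positive divergence.

```python
import numpy as np


def kl_divergence(q, p):
    """KL(q || p) = sum_i q_i * log(q_i / p_i) for discrete distributions.

    Assumes p_i > 0 wherever q_i > 0; terms with q_i = 0 contribute 0.
    """
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))


# Made-up example distributions over three outcomes.
p = np.array([0.5, 0.3, 0.2])
q_close = np.array([0.45, 0.35, 0.2])
q_far = np.array([0.1, 0.1, 0.8])

print(kl_divergence(p, p))        # 0.0: identical distributions
print(kl_divergence(q_close, p))  # small: q_close is similar to p
print(kl_divergence(q_far, p))    # much larger: q_far is very different

# p[::-1] is a permutation of p, so it has exactly the same entropy,
# yet its KL divergence from p is clearly positive.
print(kl_divergence(p[::-1], p))
```

Note that KL divergence is not symmetric: `kl_divergence(q, p)` and `kl_divergence(p, q)` generally differ, which is why the direction $q \,\Vert\, p$ matters in Variational Inference.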

<details class="note">
<summary>Empty Class in C++</summary>
<div class="tabbed-set tabbed-alternate" data-tabs="1:3"><input checked="checked" id="__tabbed_1_1" name="__tabbed_1" type="radio" /><input id="__tabbed_1_2" name="__tabbed_1" type="radio" /><input id="__tabbed_1_3" name="__tabbed_1" type="radio" /><div class="tabbed-labels"><label for="__tabbed_1_1">Question</label><label for="__tabbed_1_2">Answer</label><label for="__tabbed_1_3">Comment</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
<p>What does the compiler create for an empty class in C++?</p>
</div>
<div class="tabbed-block">
<p>An empty class with no virtual functions typically occupies 1 byte; an empty class with virtual functions typically occupies 4 or 8 bytes, depending on whether the machine is 32-bit or 64-bit.</p>
<p>The compiler allocates 1 byte for every object of an empty class so that each object has a unique address. The minimum size of an object is therefore 1 byte.</p>
<p>For a class without any data members but with at least one virtual function, each object occupies 4 bytes (on 32-bit machines) or 8 bytes (on 64-bit machines), because the compiler adds a hidden virtual pointer (vptr) to every object; it points to the virtual table (vtable) that all objects of the class share. Hence an empty class with virtual functions is actually no longer empty.</p>
</div>
<div class="tabbed-block"></div>
</div>
</div>
</details>