# Background knowledge represented by a Bayesian network

I am travelling to the airport from Bangkok city centre (Siam Square). The weather in Bangkok is unpredictable: it may rain, or it may stay sunny all day. About 60% of the time the day is sunny, and about 40% of the time it rains. Depending on the weather, the bus can arrive at the bus stop late, on time, or early, which in turn affects when I arrive at the airport. On a sunny day, the bus tends to arrive at the bus stop on time or ahead of schedule, and I get to the airport on time. On a rainy day, the bus is more likely to arrive at the bus stop late, which makes me late for the airport.

In [1]:
#Import libraries

import numpy as np
from pgmpy.factors.discrete import TabularCPD
from pgmpy.models import BayesianModel

### We define the network structure, named transport_model, using BayesianModel

In [2]:
transport_model = BayesianModel([('rainy day', 'bus'),
                              ('sunny day', 'bus'),
                              ('bus', 'arrive at airport')])
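
The edge list above fixes how the joint distribution factorises. As a worked equation, using R, S, B and A as shorthand for 'rainy day', 'sunny day', 'bus' and 'arrive at airport' (the single-letter labels are introduced here only for brevity and do not appear in the code):

$$
P(R, S, B, A) = P(R)\,P(S)\,P(B \mid S, R)\,P(A \mid B)
$$

Each factor on the right corresponds to one CPD table that the model needs: two root tables, one table for 'bus' with two parents, and one table for 'arrive at airport' with a single parent.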



### TabularCPD defines the conditional probability distribution table (cpd table). We need these tables for each node. After defining them, we add them all to the model.

In [3]:
rainy_cpd = TabularCPD(
    variable = 'rainy day',
    variable_card = 2,   # cardinality
    values = [[0.4], [0.6]])  # ['yes', 'no']

In [4]:
sunny_cpd = TabularCPD(
    variable = 'sunny day',
    variable_card = 2,   # cardinality
    values = [[0.7], [0.3]])  # ['yes', 'no']

In [5]:
arrive_at_airport_cpd = TabularCPD(
    variable = 'arrive at airport',
    variable_card = 2,
    values = [[.7, .4, .2],
              [.3, .6, .8]],
    evidence = ['bus'],
    evidence_card = [3])

In [6]:
bus_cpd = TabularCPD(
    variable = 'bus',
    variable_card = 3,
    values = [[.3, .7, .3, .5],
              [.5, .2, .4, .3],
              [.2, .1, .3, .2],],
    evidence = ['sunny day', 'rainy day'],
    evidence_card = [2,2])
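
The four columns of `bus_cpd` are easy to misread, so here is a small sketch in plain Python that pairs each column with the parent configuration it belongs to. It assumes the column order in which the last entry of `evidence` varies fastest, which is consistent with the query results printed later in this notebook. The names `parent_configs` and `column_for` are illustrative only.

```python
from itertools import product

# Values and parent order copied from bus_cpd above.
evidence = ['sunny day', 'rainy day']
evidence_card = [2, 2]
bus_values = [[0.3, 0.7, 0.3, 0.5],
              [0.5, 0.2, 0.4, 0.3],
              [0.2, 0.1, 0.3, 0.2]]

# Enumerate parent configurations with the last parent varying fastest.
parent_configs = list(product(*(range(card) for card in evidence_card)))

column_for = {}
for j, states in enumerate(parent_configs):
    # Column j holds the distribution of 'bus' for this parent configuration.
    column_for[states] = [row[j] for row in bus_values]
    print(dict(zip(evidence, states)), column_for[states])
```

Read this way, the second column `[0.7, 0.2, 0.1]` is the distribution of 'bus' when 'sunny day' is in state 0 and 'rainy day' is in state 1.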

In [7]:
transport_model.add_cpds(rainy_cpd, sunny_cpd, bus_cpd, arrive_at_airport_cpd)

In [8]:
# Checking if the cpds are valid for the model
transport_model.check_model()

True
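
One of the conditions behind a `True` result is that every column of every CPD is a valid probability distribution. The sketch below repeats that single check by hand with NumPy on the two conditional tables. It is not pgmpy's own implementation, and `columns_sum_to_one` is a hypothetical helper written for this note.

```python
import numpy as np

# Values copied from bus_cpd and arrive_at_airport_cpd above.
# Rows are states of the child variable; columns are parent configurations.
bus_values = np.array([[0.3, 0.7, 0.3, 0.5],
                       [0.5, 0.2, 0.4, 0.3],
                       [0.2, 0.1, 0.3, 0.2]])
airport_values = np.array([[0.7, 0.4, 0.2],
                           [0.3, 0.6, 0.8]])

def columns_sum_to_one(values, tol=1e-8):
    """Hypothetical helper: True if every column is non-negative and sums to 1."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(values >= 0) and np.allclose(values.sum(axis=0), 1.0, atol=tol))

bus_ok = columns_sum_to_one(bus_values)
airport_ok = columns_sum_to_one(airport_values)
print(bus_ok, airport_ok)
```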

In [9]:
# Viewing nodes of the model
transport_model.nodes()

NodeView(('rainy day', 'bus', 'sunny day', 'arrive at airport'))

In [10]:
transport_model.get_cpds()

[<TabularCPD representing P(rainy day:2) at 0x7fd5f0176fa0>,
 <TabularCPD representing P(sunny day:2) at 0x7fd5f0176b50>,
 <TabularCPD representing P(bus:3 | sunny day:2, rainy day:2) at 0x7fd5f0193940>,
 <TabularCPD representing P(arrive at airport:2 | bus:3) at 0x7fd5ea478af0>]

In [11]:
print("Nodes: ", transport_model.nodes())
print("Edges: ", transport_model.edges())

Nodes:  ['rainy day', 'bus', 'sunny day', 'arrive at airport']
Edges:  [('rainy day', 'bus'), ('bus', 'arrive at airport'), ('sunny day', 'bus')]

In [12]:
transport_model.active_trail_nodes('rainy day')

{'rainy day': {'arrive at airport', 'bus', 'rainy day'}}

In [13]:
# Checking the local independencies of a node

transport_model.local_independencies('rainy day')

(rainy day ⟂ sunny day)

In [14]:
transport_model.get_independencies()

(arrive at airport ⟂ sunny day, rainy day | bus)
(arrive at airport ⟂ rainy day | sunny day, bus)
(arrive at airport ⟂ sunny day | rainy day, bus)
(sunny day ⟂ rainy day)
(sunny day ⟂ arrive at airport | bus)
(sunny day ⟂ arrive at airport | rainy day, bus)
(rainy day ⟂ sunny day)
(rainy day ⟂ arrive at airport | bus)
(rainy day ⟂ arrive at airport | sunny day, bus)
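
The statements above can be checked numerically. The following sketch rebuilds the full joint distribution with NumPy from the CPD values defined earlier and tests one of the listed independencies, (arrive at airport ⟂ rainy day | bus). It assumes the `bus_cpd` column order in which 'rainy day' varies fastest; the array names are illustrative.

```python
import numpy as np

# Values copied from the CPDs defined above.
p_rainy = np.array([0.4, 0.6])
p_sunny = np.array([0.7, 0.3])
# bus_cpd columns follow (sunny, rainy) = (0,0), (0,1), (1,0), (1,1),
# so reshaping gives axes (bus, sunny, rainy).
p_bus = np.array([[0.3, 0.7, 0.3, 0.5],
                  [0.5, 0.2, 0.4, 0.3],
                  [0.2, 0.1, 0.3, 0.2]]).reshape(3, 2, 2)
p_airport = np.array([[0.7, 0.4, 0.2],
                      [0.3, 0.6, 0.8]])  # axes (airport, bus)

# joint[r, s, b, a] = P(r) * P(s) * P(b | s, r) * P(a | b)
joint = np.einsum('r,s,bsr,ab->rsba', p_rainy, p_sunny, p_bus, p_airport)

# (arrive at airport ⟂ rainy day | bus): P(a | r, b) must not depend on r.
p_rba = joint.sum(axis=1)                               # axes (rainy, bus, airport)
p_a_given_rb = p_rba / p_rba.sum(axis=2, keepdims=True)
independent_given_bus = bool(np.allclose(p_a_given_rb[0], p_a_given_rb[1]))

# Without conditioning on bus, the same two variables are dependent.
p_ra = joint.sum(axis=(1, 2))                           # axes (rainy, airport)
p_a_given_r = p_ra / p_ra.sum(axis=1, keepdims=True)
independent_marginally = bool(np.allclose(p_a_given_r[0], p_a_given_r[1]))

print(independent_given_bus, independent_marginally)
```

The first flag confirms that 'bus' blocks the path from 'rainy day' to 'arrive at airport'; the second shows that the path is active when 'bus' is unobserved, which matches the active trail reported for 'rainy day'.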

### Part 2: Querying the Bayesian network

In [15]:
from pgmpy.inference import VariableElimination

In [16]:
transport_infer = VariableElimination(transport_model)

### 1. Inference using hard evidence

In [17]:
# Query 1: Distribution over the bus states, given 'rainy day' = 0
# (state 0 is 'yes' under the ['yes', 'no'] ordering used in rainy_cpd)

q = transport_infer.query(variables=['bus'], evidence={'rainy day':0})
print(q)

# Follow-up to Query 1: the same distribution, given 'rainy day' = 1 ('no')

q = transport_infer.query(variables=['bus'], evidence={'rainy day':1})
print(q)

# Query 2: Joint distribution of 'bus' and 'arrive at airport', given 'rainy day' = 1
q = transport_infer.query(variables=['bus', 'arrive at airport'], evidence={'rainy day':1})
print(q)

# Query 3: Separate (not joint) distributions of 'bus' and 'arrive at airport', given 'rainy day' = 0
q = transport_infer.query(variables=['bus', 'arrive at airport'], evidence={'rainy day':0}, joint=False)
#print(q)
for factor in q.values():
    print(factor)

+--------+------------+
| bus    |   phi(bus) |
+========+============+
| bus(0) |     0.3000 |
+--------+------------+
| bus(1) |     0.4700 |
+--------+------------+
| bus(2) |     0.2300 |
+--------+------------+


+--------+------------+
| bus    |   phi(bus) |
+========+============+
| bus(0) |     0.6400 |
+--------+------------+
| bus(1) |     0.2300 |
+--------+------------+
| bus(2) |     0.1300 |
+--------+------------+


+--------+----------------------+------------------------------+
| bus    | arrive at airport    |   phi(bus,arrive at airport) |
+========+======================+==============================+
| bus(0) | arrive at airport(0) |                       0.4480 |
+--------+----------------------+------------------------------+
| bus(0) | arrive at airport(1) |                       0.1920 |
+--------+----------------------+------------------------------+
| bus(1) | arrive at airport(0) |                       0.0920 |
+--------+----------------------+------------------------------+
| bus(1) | arrive at airport(1) |                       0.1380 |
+--------+----------------------+------------------------------+
| bus(2) | arrive at airport(0) |                       0.0260 |
+--------+----------------------+------------------------------+
| bus(2) | arrive at airport(1) |                       0.1040 |
+--------+----------------------+------------------------------+


+--------+------------+
| bus    |   phi(bus) |
+========+============+
| bus(0) |     0.3000 |
+--------+------------+
| bus(1) |     0.4700 |
+--------+------------+
| bus(2) |     0.2300 |
+--------+------------+
+----------------------+--------------------------+
| arrive at airport    |   phi(arrive at airport) |
+======================+==========================+
| arrive at airport(0) |                   0.4440 |
+----------------------+--------------------------+
| arrive at airport(1) |                   0.5560 |
+----------------------+--------------------------+
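
The numbers in these tables can be reproduced by hand, which helps to see what variable elimination is doing. The sketch below uses NumPy only: it sums out the unobserved 'sunny day' node and then multiplies by the 'arrive at airport' table. It assumes the `bus_cpd` column order in which 'rainy day' varies fastest, and the variable names are illustrative.

```python
import numpy as np

# Values copied from the CPDs defined above.
p_sunny = np.array([0.7, 0.3])
p_bus = np.array([[0.3, 0.7, 0.3, 0.5],
                  [0.5, 0.2, 0.4, 0.3],
                  [0.2, 0.1, 0.3, 0.2]]).reshape(3, 2, 2)  # axes (bus, sunny, rainy)
p_airport = np.array([[0.7, 0.4, 0.2],
                      [0.3, 0.6, 0.8]])                    # axes (airport, bus)

# 'sunny day' is unobserved and independent of 'rainy day', so it is summed out
# with its prior: P(b | r) = sum_s P(s) * P(b | s, r)
p_bus_given_rainy = np.einsum('s,bsr->br', p_sunny, p_bus)

query1 = p_bus_given_rainy[:, 0]     # distribution of bus given rainy day = 0
followup = p_bus_given_rainy[:, 1]   # distribution of bus given rainy day = 1

# Query 2: P(b, a | r = 1) = P(b | r = 1) * P(a | b), axes (bus, airport)
query2 = followup[:, None] * p_airport.T

# Query 3, second factor: P(a | r = 0) = sum_b P(a | b) * P(b | r = 0)
query3_airport = p_airport @ query1

print(np.round(query1, 4), np.round(followup, 4))
print(np.round(query2, 4))
print(np.round(query3_airport, 4))
```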


In [18]:
# Computing the MAP state of 'bus', given 'rainy day' = 0
q = transport_infer.map_query(variables=['bus'], evidence={'rainy day':0})
print(q)

# Computing the joint MAP state of 'bus' and 'arrive at airport', given 'rainy day' = 1
q = transport_infer.map_query(variables=['bus', 'arrive at airport'], evidence={'rainy day':1})
print(q)

{'bus': 1}


{'bus': 0, 'arrive at airport': 0}
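
A MAP query returns the single most probable assignment rather than a full table. As a rough cross-check, the sketch below takes the argmax of the same posterior tables computed by hand with NumPy. It illustrates the idea only and is not how pgmpy computes the result internally; the variable names are illustrative.

```python
import numpy as np

# Values copied from the CPDs defined above.
p_sunny = np.array([0.7, 0.3])
p_bus = np.array([[0.3, 0.7, 0.3, 0.5],
                  [0.5, 0.2, 0.4, 0.3],
                  [0.2, 0.1, 0.3, 0.2]]).reshape(3, 2, 2)  # axes (bus, sunny, rainy)
p_airport = np.array([[0.7, 0.4, 0.2],
                      [0.3, 0.6, 0.8]])                    # axes (airport, bus)

# P(b | r), with the unobserved 'sunny day' node summed out.
p_bus_given_rainy = np.einsum('s,bsr->br', p_sunny, p_bus)

# MAP state of bus given rainy day = 0: the largest entry of the posterior.
map_bus = {'bus': int(np.argmax(p_bus_given_rainy[:, 0]))}

# Joint MAP of (bus, arrive at airport) given rainy day = 1:
# the largest cell of the joint posterior table, axes (bus, airport).
joint_ba = p_bus_given_rainy[:, 1][:, None] * p_airport.T
b_star, a_star = np.unravel_index(np.argmax(joint_ba), joint_ba.shape)
map_joint = {'bus': int(b_star), 'arrive at airport': int(a_star)}

print(map_bus)
print(map_joint)
```

The joint MAP is taken over the joint table as a whole. In general it need not coincide with picking the most probable state of each variable separately.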


### 2. Inference using virtual evidence

Virtual evidence refers to Pearl's (1988) idea of interpreting uncertain evidence on a set of events as hard evidence on a virtual event that depends only on this set of events.

In [19]:
# Query with hard evidence 'rainy day' = 1 and virtual evidence on 'arrive at airport' = [0.45, 0.55]

airport_not_late_virt_evidence = TabularCPD(variable='arrive at airport', variable_card=2, values=[[0.45], [0.55]])
q = transport_infer.query(variables=['bus'], evidence={'rainy day':1}, virtual_evidence=[airport_not_late_virt_evidence])
print(q)

# Query with hard evidence 'rainy day' = 1 and virtual evidence on 'arrive at airport' = [0.45, 0.3]
# (unnormalised weights; no virtual evidence is supplied for 'bus')

airport_not_late_virt_evidence = TabularCPD(variable='arrive at airport', variable_card=2, values=[[0.45], [0.3]])
q = transport_infer.query(variables=['bus'], evidence={'rainy day':1}, virtual_evidence=[airport_not_late_virt_evidence])
print(q)

+--------+------------+
| bus    |   phi(bus) |
+========+============+
| bus(0) |     0.6226 |
+--------+------------+
| bus(1) |     0.2377 |
+--------+------------+
| bus(2) |     0.1396 |
+--------+------------+


+--------+------------+
| bus    |   phi(bus) |
+========+============+
| bus(0) |     0.6734 |
+--------+------------+
| bus(1) |     0.2151 |
+--------+------------+
| bus(2) |     0.1115 |
+--------+------------+
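
Both tables can be reproduced by treating the virtual evidence as a likelihood weight on each state of 'arrive at airport'. The NumPy sketch below does this by hand, starting from the hard-evidence posterior of 'bus' given 'rainy day' = 1. It is a sketch of the calculation, not of pgmpy's internals, and `posterior_with_virtual_evidence` is a hypothetical helper.

```python
import numpy as np

# Distribution of bus given rainy day = 1, taken from the hard-evidence query.
prior_bus = np.array([0.64, 0.23, 0.13])
# Values copied from arrive_at_airport_cpd, axes (airport, bus).
p_airport = np.array([[0.7, 0.4, 0.2],
                      [0.3, 0.6, 0.8]])

def posterior_with_virtual_evidence(prior, cpd, weights):
    """Hypothetical helper: weight each child state, sum it out, renormalise."""
    # likelihood[b] = sum_a weights[a] * P(a | b)
    likelihood = np.asarray(weights, dtype=float) @ cpd
    unnormalised = prior * likelihood
    return unnormalised / unnormalised.sum()

first = posterior_with_virtual_evidence(prior_bus, p_airport, [0.45, 0.55])
second = posterior_with_virtual_evidence(prior_bus, p_airport, [0.45, 0.30])
rescaled = posterior_with_virtual_evidence(prior_bus, p_airport, [0.6, 0.4])

print(np.round(first, 4))
print(np.round(second, 4))
```

Because the result is renormalised, only the ratio between the weights matters: `[0.45, 0.30]` and `[0.6, 0.4]` give the same posterior, so `rescaled` equals `second`.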


In [20]:
print(transport_model.get_cpds('arrive at airport'))

+----------------------+--------+--------+--------+
| bus                  | bus(0) | bus(1) | bus(2) |
+----------------------+--------+--------+--------+
| arrive at airport(0) | 0.7    | 0.4    | 0.2    |
+----------------------+--------+--------+--------+
| arrive at airport(1) | 0.3    | 0.6    | 0.8    |
+----------------------+--------+--------+--------+
