# Some useful constructions

I list a few constructions used in my solution.  I'd suggest skimming them to see what's available and returning if the need arises.  Don't feel obligated to use any of the following; as long as you pass the asserts, you're all set :)

### You can iterate row by row through a dataframe

Using [pd.DataFrame.iterrows()](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html)

In [1]:
import pandas as pd

df = pd.DataFrame({'a': [1, 2],
                   'b': [3, 4]})
df

|   | a | b |
|---|---|---|
| 0 | 1 | 3 |
| 1 | 2 | 4 |


In [2]:
for idx, row in df.iterrows():
    print(row)

a    1
b    3
Name: 0, dtype: int64
a    2
b    4
Name: 1, dtype: int64
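
Each `row` yielded by `iterrows()` is a `pd.Series`, so a single value can be read by column name, and `row.to_dict()` turns the whole row into a plain dictionary. A small sketch (the names `total` and `row_dicts` are made up for this illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2],
                   'b': [3, 4]})

total = 0
row_dicts = []
for idx, row in df.iterrows():
    # read individual values by column name
    total += row['a'] + row['b']
    # convert the whole row to a dictionary of {column: value}
    row_dicts.append(row.to_dict())

print(total)
print(row_dicts)
```

The dictionary form is handy when the goal is to build new rows from old ones, as in the next section.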


### Building a `pd.DataFrame` row by row

In [3]:
import pandas as pd

# make a list of dictionaries (each will be a new row)
row_list = list()
for name in ['ringo', 'john', 'paul', 'george']:
    new_row = {'name': name, 'length of name': len(name), 'first char': name[0]}
    row_list.append(new_row)
    
pd.DataFrame(row_list)

|   | name   | length of name | first char |
|---|--------|----------------|------------|
| 0 | ringo  | 5              | r          |
| 1 | john   | 4              | j          |
| 2 | paul   | 4              | p          |
| 3 | george | 6              | g          |


### Copying something

Helpful for making sure that dictionaries like the `new_row` ones above are separate objects ... otherwise modifying one row will also modify the others!

In [4]:
from copy import copy

a = {'key': 'value', 1: 2}
b = copy(a)

# notice that a and b are separate objects: modifying b does not modify a
b['key'] = 'VALUE'

a, b

({'key': 'value', 1: 2}, {'key': 'VALUE', 1: 2})
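
To see why the copy matters, here is a sketch of what can go wrong when one dictionary is reused for every row (the names `template`, `shared_rows`, and `copied_rows` are invented for this example). Note that `copy` makes a shallow copy: that is enough for dictionaries of strings and numbers, while nested lists or dictionaries would call for `copy.deepcopy`.

```python
from copy import copy

template = {'name': None}

# without copy: every list entry is the SAME dictionary object
shared_rows = []
for name in ['ringo', 'john']:
    row = template          # no new object is created here
    row['name'] = name
    shared_rows.append(row)

# with copy: each list entry is its own dictionary
copied_rows = []
for name in ['ringo', 'john']:
    row = copy(template)
    row['name'] = name
    copied_rows.append(row)

print(shared_rows)   # both entries show the last name written
print(copied_rows)   # each entry keeps its own name
```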

### Check if one dictionary is included in another

Python's `set` objects are wonderful for testing whether one collection is included in another.

In [5]:
a = {1, 2, 3}

# does set a contain all these values?
a.issuperset([1, 2, 3, 4])

False

In [6]:
# is set a included in all these values?
a.issubset([1, 2, 3, 4])

True

In [7]:
a_dict = {'a': 1, 'b': 2}
a_super_dict = {'a': 1, 'b': 2, 'c': 3}

# build a set of (key, value) tuples
set(a_dict.items())

{('a', 1), ('b', 2)}

In [8]:
# is a_dict contained within a_super_dict?
set(a_dict.items()).issubset(a_super_dict.items())

True
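
One caveat: the set-based check needs every value to be hashable (a dictionary holding a list as a value raises a `TypeError` inside `set(...)`). A hedged alternative is to compare key by key; `is_sub_dict` below is a hypothetical helper name, not something from the assignment:

```python
def is_sub_dict(small, big):
    """Return True if every (key, value) pair of small also appears in big."""
    return all(key in big and big[key] == value
               for key, value in small.items())


a_dict = {'a': 1, 'b': 2}
a_super_dict = {'a': 1, 'b': 2, 'c': 3}

print(is_sub_dict(a_dict, a_super_dict))           # True
print(is_sub_dict({'a': 999}, a_super_dict))       # False (value differs)
print(is_sub_dict({'x': [1, 2]}, {'x': [1, 2]}))   # True, list values are fine here
```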

### Merging two dictionaries together

In [9]:
dict0 = {'a': 1, 'b': 2}
dict1 = {'c': 3}

# add all dict1 key value pairs into dict0
dict0.update(dict1)

dict0

{'a': 1, 'b': 2, 'c': 3}

In [10]:
dict0 = {'a': 1, 'b': 2}
dict1 = {'c': 3, 'a': 'a whole new value!'}

# notice: intersecting keys are overwritten (taking values from dict1)
dict0.update(dict1)

dict0

{'a': 'a whole new value!', 'b': 2, 'c': 3}

In [11]:
# a shorthand (Python 3.9+): builds a new merged dict rather than modifying dict0
dict0 | dict1

{'a': 'a whole new value!', 'b': 2, 'c': 3}
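
One difference worth keeping in mind: `.update()` changes `dict0` in place (and returns `None`), while `|` builds a new dictionary and leaves both inputs alone. Where `|` is unavailable, `{**dict0, **dict1}` is an equivalent spelling. A short sketch (`merged`, `merged_unpack`, and `result` are names chosen for this example):

```python
dict0 = {'a': 1, 'b': 2}
dict1 = {'c': 3, 'a': 'a whole new value!'}

# both of these build a NEW dictionary; on shared keys the right-hand side wins
merged = dict0 | dict1
merged_unpack = {**dict0, **dict1}

print(merged)
print(dict0)    # unchanged: {'a': 1, 'b': 2}

# update() modifies dict0 in place and returns None
result = dict0.update(dict1)
print(result)   # None
print(dict0)    # now equal to merged
```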

# HW 7 Pseudocode hint

See how much progress you can make on your own without this hint. (I tried to "hide" it in this extra file to dissuade you from reading it right away. However, if you find yourself stuck or not having fun anymore, read the pseudocode hint below.)

Your `BayesNet` class has two methods:
- `BayesNet.add_prior_node()`
- `BayesNet.add_conditional_node()`

Let's focus on the more challenging one, `.add_conditional_node()`, as the other's behavior will be covered in the discussion.

## Studying the interface

Before adding our first conditional node, `bayes_net.df_joint` looks like:

| prob | Cloudy |
|------|--------|
| 0.5  | cloudy |
| 0.5  | clear  |

at which point we add the conditional node via:

```python
# add rain conditional prob
cond_prob_rain = \
    ConditionalProb(target='Rain',
                    condition_list=['Cloudy'],
                    cond_prob_dict={('cloudy',): {'rain': .8, 'no rain': .2},
                                    ('clear',): {'rain': .2, 'no rain': .8}})
bayes_net.add_conditional_node(cond_prob_rain)
```
so that, afterwards, `bayes_net.df_joint` looks like:

| prob | Cloudy | Rain    |
|------|--------|---------|
| 0.4  | cloudy | rain    |
| 0.1  | cloudy | no rain |
| 0.1  | clear  | rain    |
| 0.4  | clear  | no rain |

## Implementation pseudocode

Notice that the first row in the initial dataframe 
- prob=.5, Cloudy=cloudy

ends up generating a corresponding row in the output for every outcome of the new target random variable (rain, no rain):
- prob=.4, Cloudy=cloudy, Rain=rain
- prob=.1, Cloudy=cloudy, Rain=no rain
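
These numbers come from the product rule, P(Cloudy, Rain) = P(Cloudy) × P(Rain | Cloudy). As a sanity check on that single row, here is a sketch using plain dictionaries rather than the `BayesNet` class (`old_row` and `new_rows` are names chosen just for this illustration):

```python
old_row = {'prob': 0.5, 'Cloudy': 'cloudy'}
cond_prob = {'rain': .8, 'no rain': .2}   # P(Rain | Cloudy=cloudy)

new_rows = []
for outcome, prob in cond_prob.items():
    # one output row per outcome of the new variable
    new_rows.append({'prob': old_row['prob'] * prob,
                     'Cloudy': old_row['Cloudy'],
                     'Rain': outcome})

for row in new_rows:
    print(row)
```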

So rough pseudocode for `.add_conditional_node()` is:

```python
for every row in data_frame:
    # 1. find the conditional probability appropriate for this row
    #    (e.g. when the initial row had Cloudy=cloudy, the conditional
    #    probability of rain was cond_prob = {'rain': .8, 'no rain': .2})

    for outcome, prob in cond_prob.items():
        # 2. build a new row in the output dataframe for every outcome in
        #    cond_prob (incorporating both outcome and prob in the new row)
```

### `.add_prior_node()` ...
works similarly, except that its probability distribution is the same for every row (the new variable is independent of all existing random variables)
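
As an illustration of "the same for every row", suppose a hypothetical prior variable `Sprinkler` (not part of the assignment) with P(on)=.3 and P(off)=.7 were added to the two-row Cloudy table above. Every existing row is multiplied by the same distribution. This is only a sketch with plain lists of dictionaries, not the `BayesNet` interface:

```python
rows = [{'prob': 0.5, 'Cloudy': 'cloudy'},
        {'prob': 0.5, 'Cloudy': 'clear'}]
prior = {'on': .3, 'off': .7}   # hypothetical prior, identical for every row

new_rows = []
for row in rows:
    for outcome, prob in prior.items():
        new_row = dict(row)   # fresh copy so rows stay separate objects
        new_row['prob'] = row['prob'] * prob
        new_row['Sprinkler'] = outcome
        new_rows.append(new_row)

# the joint probabilities should still sum to (approximately) 1
total = sum(r['prob'] for r in new_rows)
print(len(new_rows), total)
```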