# LECTURE 5 NOTES

## Content
* introduction to decision making and decision theory
* decisions under ignorance
* decisions with probabilities

## Objectives

* exposure to decision making and the factors influencing decisions
* introduction to decision theory concepts
* basic modeling for decisions with uncertainty (maximax, minimax, maximin algorithms)
* basic modeling for decisions with probabilities (expected value)

## DECISION MAKING

In thinking about decision making, we will set up a context in which to frame all subsequent discussion on the topic.

A common decision-making context involves a decision maker, or _actor_, and _nature_.  The context could involve one or more actors against nature or, as in a game, one or more actors against each other.  However, we'll restrict our discussion to a single actor and nature.

There are various states of nature, or _events_, that may influence the behavior of the actor, and various _outcomes_ are possible depending on the modeled events that could occur.  Each outcome has a perceived value to the actor.  The choices the actor makes represent the _acts_ of the decision maker.

<!-- rational actor theory reference -->
We assume that the decision maker is a _rational actor_; that is, decisions are made consistently based on preference, and within the framework of _reason_ and _fact_.  While reason and fact are central to the decision-making process of the rational actor, we also assume the actor will explore all possible choices and apply preferences to their decisions based on the _desired_ outcome (e.g. minimize loss, minimize risk, etc.).

### FACTORS INFLUENCING DECISION MAKING

In extending the framework for decision making, we have to explore core factors that influence decisions. 

<!-- Utility 
_Preferences_ 
* preferences -->
* **outcomes** : the results of the various events of nature on the actor
* **utility** : a (typically numeric) measure of value for a particular event 
* **loss or cost** : the inverse of utility, or the measure of the cost of an event; for example the "cost" associated with purchasing an umbrella may be compared to its utility; "loss" can also refer to what one may lose if they do not make a specific decision
<!--* cost :--> 
* **risk** : the probability associated with an event happening; for example the _risk_ of it raining tomorrow is equivalent to asking "what is the probability it will rain tomorrow"



### HIGH-LEVEL PROCESSES

At the most basic level, a decision process can be partitioned into 8 steps.  Most processes will begin with an identification of the _decision makers_ (actors) and _stakeholders_ (which may or may not be the actors).  For example, in the context of managing an investment portfolio, the ultimate decision makers might be the investment managers, while the stakeholders will be the investors who are placing money into the fund with the expectation of the highest possible return on their investment.  The basic process for decision making is given below:

![Basic decision making process.](./assets/decision_flow.png)




There are four concepts typically used to frame the decision context:

* **acts** - _actions taken by the decision maker_
* **events** - _the states of nature that may affect the decision maker; these states are not under the control of the decision maker (e.g. the decision maker cannot control the weather)_
* **outcomes** - _the consequence or effect of both nature and acts_
* **payoff** - _the actual or perceived value the decision maker places on the outcomes_

Crucial to this framework is the concept that there are consequences of _acts_ that have value to the decision maker.


### PAYOFF TABLE
One way to model this problem is with a _payoff table_. Such a payoff table can be modeled with $a$ denoting the decision (**action**) and $n$ the state of nature (**events**).  Let $p_{a_i,n_j}$ be the payoff for each of the acts $a_i$ and states of nature $n_j$. For now we will assume payoff represents the preference of the decision maker given the alternative decisions and the states of nature.  A payoff table might look like this:

| $^n / _a$ | $n_1$ | $n_2$ | $\ldots$ | $n_m$ |
|:--------:|:-----:|:-----:|:-----:|:-----:|
| $a_1$ | $p_{a_1,n_1}$ | $p_{a_1,n_2}$ | $\ldots$ | $p_{a_1,n_m}$ |
| $a_2$ | $p_{a_2,n_1}$ | $p_{a_2,n_2}$ | $\ldots$ | $p_{a_2,n_m}$ |
| $\ldots$ | $\ldots$ | $\ldots$ | $\ddots$ | $\ldots$ |
| $a_n$ | $p_{a_n,n_1}$ | $p_{a_n,n_2}$ | $\ldots$ | $p_{a_n,n_m}$ |




#### EXAMPLE

Let's explore a simple example. Suppose it is to snow tomorrow.  You can wear snow boots, but you find them to be hot, bulky and a burden, thus you don't  usually _like_ to wear boots.  This problem can be modeled thus,

* acts (choices) 
    * wear boots
    * don't wear boots
* events
    * snow
    * no snow
* outcomes
    * wear boots, snow
    * wear boots, no snow
    * don't wear boots, snow
    * don't wear boots, no snow 

So far the set up of the problem would look like this in the payoff table:

|         | snow | no snow |
|:-------:|:----:|:-------:|
|**boots**    |$p_{b, s}$|$p_{b, \neg s}$|
|**no boots** |$p_{\neg b, s}$|$p_{\neg b, \neg s}$|

How do we make a decision?  What criterion will we use to make the _right_ decision?  _For now_ we will say that we want to make the **decision with the highest payoff**, even though, as we shall see later, this may or may not be the desired strategy in every case, depending on preferences, risk tolerances, assumptions about the state of nature, etc.

Let's assume that for now the payoff can be numerically modeled as _an integer representing the decision maker's satisfaction or perceived benefit_, where a negative value represents  a negative benefit for the decision maker, while a positive value, positive benefit. Though these numbers are modeled as integers, they do not have to be, as in the case where payoffs are monetary. 

* payoff
    * $p_{b,s} = 0$; equivalent to when outcome = wear boots, snow.  You are satisfied that your feet are warm and dry, even if you don't like to wear boots.
    * $p_{b,\neg s} = -10$; equivalent to when outcome = wear boots, no snow.  You are inconveniently forced to wear something that is uncomfortable and not needed, thus you are _not at all satisfied that day_.
    * $p_{\neg b, s} = -5$; equivalent to when outcome = don't wear boots, snow.  You are inconvenienced because your feet are cold and maybe wet because it snowed and you failed to wear your boots, thus you are not satisfied, but you still place some value on not having to wear boots at all!
    * $p_{\neg b, \neg s} = 5$; equivalent to when outcome = don't wear boots, no snow.  You are very satisfied since it didn't snow and you didn't wear your boots.

The final payoff table will look like this:

|         | snow | no snow |
|:-------:|:----:|:-------:|
|**boots**    | $0$  | $-10$ |
|**no boots** | $-5$ | $5$ |
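
As a rough sketch, the payoff table above can be held in a nested dictionary keyed first by act and then by event. The names `payoffs` and `payoff`, and the string keys, are illustrative choices and not notation from these notes.

```python
# Payoff table for the boots example: payoffs[act][event].
# The variable name and the string keys are illustrative assumptions.
payoffs = {
    "boots":    {"snow": 0,  "no snow": -10},
    "no boots": {"snow": -5, "no snow": 5},
}


def payoff(act, event):
    """Look up the payoff p_{a,n} for one act and one state of nature."""
    return payoffs[act][event]


acts = sorted(payoffs)               # ['boots', 'no boots']
events = sorted(payoffs["boots"])    # ['no snow', 'snow']
```

Keeping acts as the outer key mirrors the rows of the table, so each decision rule can work on one row at a time.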

Let's assume also (for now at least), that we do not know the likelihood of snow or no snow.  In this state, we say we are making a _decision under ignorance_.  How do we decide to wear boots or not?

### DECISIONS UNDER IGNORANCE

When we say we do not know the probability or likelihood of snow, we are forced into making what is called a _decision under ignorance_; that is, we have **no information** about the likelihoods of the states of nature that could influence our decision.

There are four common strategies to consider when making decisions under ignorance.  They are enumerated in the table below:

| decision rule | strategy characterization | explanation |
|:-------------:|:---------------------:|---------------------------------------|
|maximin        | pessimism             |**best of the worst case** |
|maximax        | extreme optimism      |**best of the best case**  |
|minimax        | caution               |**least of the best case** |
|minimax regret | caution               |**least of the worst-case regrets** |

Mathematically, these decisions can be written with just $\mathrm{min}$, $\mathrm{max}$, $\mathrm{argmin}$ and $\mathrm{argmax}$.

| decision rule | binary action formalism |
|:-------------:|:--------------------:|
|maximin        | $$\mathrm{argmax}_a \min_n p_{a, n}$$ |
|maximax        | $$\mathrm{argmax}_a \max_n p_{a, n}$$ |
|minimax        | $$\mathrm{argmin}_a \max_n p_{a, n}$$ |
|minimax regret | $$\mathrm{argmin}_a \max_n \hat{p}_{a, n}$$  |

Note: $\hat{p}$ is the regret calculation described in the next section.

We can now return to our example scenario and decide what we should do under the various rules.

**MAXIMIN**
With **maximin** we take the best of the worst-case payoffs; that is, for each possible decision we find its worst payoff, then pick the best of that set. Given the payoff table:

|         | snow | no snow | **maximin strategy** |
|:-------:|:----:|:-------:|:-----:|
|**boots**    | $0$  | $-10$ | $-10$ |
|**no boots** | $-5$ | $5$ | $\bf{-5}$ |

Using the maximin strategy, the best of the worst case is $-5$, thus we would *not wear boots*. Notice that this strategy does not protect your feet even though it assumes a pessimistic stance in the face of ignorance about the snow forecast.
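
The maximin calculation above can be sketched in a few lines of plain Python. This is a minimal illustration only; the `payoffs` dictionary layout and the function name `maximin` are assumptions made for this sketch.

```python
# Payoffs for the boots example: payoffs[act][event] (illustrative layout).
payoffs = {
    "boots":    {"snow": 0,  "no snow": -10},
    "no boots": {"snow": -5, "no snow": 5},
}


def maximin(payoffs):
    """Return (act, worst_payoff) for the act whose worst payoff is largest."""
    # Worst-case payoff of each act (the row minimum).
    worst = {act: min(row.values()) for act, row in payoffs.items()}
    # Best of the worst cases.
    best_act = max(worst, key=worst.get)
    return best_act, worst[best_act]


print(maximin(payoffs))  # ('no boots', -5)
```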

**MAXIMAX**
With the **maximax** strategy, we take a wildly optimistic, opportunistic and more blindly risky posture to decision making.  In fact, we will be naïvely positive about any of the realistic possibilities. In this case, we are *not going to wear boots*: we value satisfaction and expect the best of the weather.


|         | snow | no snow | **maximax strategy** |
|:-------:|:----:|:-------:|:-----:|
|**boots**    | $0$  | $-10$ | $0$ |
|**no boots** | $-5$ | $5$ | $\bf{5}$ |

Under maximax, we would **not wear boots**.

**MINIMAX**
In order to be a bit more cautious, but also hold _some_ hope for good weather, we could employ the minimax strategy.  Here we take the worst of the best, holding out for the possibility that it might snow, but trying also to be optimistic that it won't.

|         | snow | no snow | **minimax strategy** |
|:-------:|:----:|:-------:|:-----:|
|**boots**    | $0$  | $-10$ | $\bf{0}$ |
|**no boots** | $-5$ | $5$ | $5$ |

Thus under minimax, we would **wear boots**.
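
Maximax and minimax, as defined in these notes, both start from the best-case payoff of each act and differ only in whether the largest or the smallest of those values is chosen. The sketch below makes that contrast explicit; the helper name `best_case` and the dictionary layout are assumptions for illustration.

```python
# Payoffs for the boots example: payoffs[act][event] (illustrative layout).
payoffs = {
    "boots":    {"snow": 0,  "no snow": -10},
    "no boots": {"snow": -5, "no snow": 5},
}


def best_case(payoffs):
    """Map each act to its largest (best-case) payoff, i.e. the row maximum."""
    return {act: max(row.values()) for act, row in payoffs.items()}


def maximax(payoffs):
    """Best of the best case: argmax over acts of the row maximum."""
    best = best_case(payoffs)
    act = max(best, key=best.get)
    return act, best[act]


def minimax(payoffs):
    """Least of the best case (this document's minimax): argmin of the row maximum."""
    best = best_case(payoffs)
    act = min(best, key=best.get)
    return act, best[act]


print(maximax(payoffs))  # ('no boots', 5)
print(minimax(payoffs))  # ('boots', 0)
```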

### LOSS / REGRET TABLE

Computing the loss table requires that we take the **payoff table** and, for each state of nature, compute the difference between the _maximum_ payoff for that state and each of the payoffs in it. Building a loss (regret) table allows one to calculate losses for the _minimax regret_ strategy.  Minimax regret looks for the least of the worst-case regrets, where regret measures what you lose by taking the action under consideration instead of the best possible action for the state of nature that occurs.  Concretely, if the maximum payoff over all actions for a particular state of nature is $100$ and the payoff for a specific action is $25$, then the regret is $75$.  On the other hand, if the payoff is $-25$, the regret is $125$.  Thus, we'd construct the regret table:


| $^n / _a$ | $n_1$ | $n_2$ | $\ldots$ | $n_m$ |
|:--------:|:-----:|:-----:|:-----:|:-----:|
| $a_1$ | $\max_a p_{a,n_1} - p_{a_1,n_1}$ | $\max_a p_{a,n_2} - p_{a_1,n_2}$ | $\ldots$ | $\max_a p_{a,n_m} - p_{a_1,n_m}$ |
| $a_2$ | $\max_a p_{a,n_1} - p_{a_2,n_1}$ | $\max_a p_{a,n_2} - p_{a_2,n_2}$ | $\ldots$ | $\max_a p_{a,n_m} - p_{a_2,n_m}$ |
| $\ldots$ | $\ldots$ | $\ldots$ | $\ddots$ | $\ldots$ |
| $a_n$ | $\max_a p_{a,n_1} - p_{a_n,n_1}$ | $\max_a p_{a,n_2} - p_{a_n,n_2}$ | $\ldots$ | $\max_a p_{a,n_m} - p_{a_n,n_m}$ |
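
A minimal sketch of the regret calculation, applied to the boots payoff table from the earlier example. The function name `regret_table` and the nested-dictionary layout are assumptions for illustration; the rule itself (best payoff for each state of nature minus the payoff of each act) is the one described above.

```python
# Payoffs for the boots example: payoffs[act][event] (illustrative layout).
payoffs = {
    "boots":    {"snow": 0,  "no snow": -10},
    "no boots": {"snow": -5, "no snow": 5},
}


def regret_table(payoffs):
    """regret[act][event] = (best payoff for that event) - payoffs[act][event]."""
    events = list(next(iter(payoffs.values())))
    # Best achievable payoff for each state of nature (the column maximum).
    best = {e: max(row[e] for row in payoffs.values()) for e in events}
    return {
        act: {e: best[e] - row[e] for e in events}
        for act, row in payoffs.items()
    }


regret = regret_table(payoffs)
print(regret["boots"]["no snow"])   # 15
print(regret["no boots"]["snow"])   # 5
```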



**MINIMAX REGRET**
We mentioned above the _minimax_ strategy,  and now that we know about regrets, we introduce the _minimax regret_ strategy which is computed not from the payoff table, but from the loss/regret table. Like _minimax_, we are still cautious, and also hold _some_ hope for good weather,  but look at the _loss_ or _regret_ as computed above instead of the _payoff_.

Thus the regret table looks like:



|         | snow | no snow | minimax regret |
|:-------:|:----:|:-------:|:--------------:|
|**boots**    | $0$  | $15$ | $15$ |
|**no boots** | $5$ | $0$ | $\bf{5}$ |

The decision would be to **not wear boots**: the maximum regret is $15$ for boots and $5$ for no boots, and minimax regret picks the minimum of these maximum regrets.

**EXAMPLE 2**

We've shown a simple example, but let's bring this into a more significant exploration of the concepts with the regret table.  Let's consider the following example (inspired by [https://people.richland.edu/james/summer02/m160/decision.html](https://people.richland.edu/james/summer02/m160/decision.html)):

>Zed and Adrian run a small bicycle shop called "Z to A Bicycles". They must order bicycles for the coming season. Orders for the bicycles must be placed in quantities of twenty (20). The cost per bicycle is \$70 if they order 20, \$67 if they order 40, \$65 if they order 60, and \$64 if they order 80. The bicycles will be sold for \$100 each. Any bicycles left over at the end of the season can be sold (for certain) at \$45 each. If Zed and Adrian run out of bicycles during the season, then they will suffer a loss of "goodwill" among their customers. They estimate this goodwill loss to be \$5 per customer who was unable to buy a bicycle. 



| $^a / _n$   | Buy 20 | Buy 40 | Buy 60 | Buy 80 |
|:----------------:|:-----------:|:-----------:|:----------:|:-----------:|
|Demand 10|$1000-1400+450=\bf{50}$ | $1000-2680+1350=-330$ | $-650$ | $-970$ |
|Demand 30 |$550$ |$\bf{770}$ |$450$ |$130$ |
|Demand 50 |$450$ |$1270$ |$\bf{1550}$ |$1230$ |
|Demand 70 |$350$ |$1170$ |$2050$ |$\bf{2330}$ |


Note that the maximum payoff at each demand level is indicated in **bold**.
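
As a worked form of the table entries, each payoff can be written in one line. The symbols $b$ (bicycles bought), $d$ (demand), and $c_b$ (unit cost when ordering $b$ bicycles) are shorthand assumed here, not notation used elsewhere in these notes:

\begin{equation}
p_{b,d} = 100 \min(b, d) - c_b \, b + 45 \max(b - d, 0) - 5 \max(d - b, 0)
\end{equation}

For example, with $b = 20$ and $d = 30$ we get $100 \times 20 - 70 \times 20 + 0 - 5 \times 10 = 550$, which matches the table.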

Building the regret table yields:

| $^a/_n$   | Buy 20 | Buy 40 | Buy 60 | Buy 80 |
|:--:|:------:|:------:|:------:|:------:|
|Demand 10 |$\bf{50}-50=0$            |$380$|$700$|$1020$|
|Demand 30 |$770-550=220$   | $770-770=0$        |$320$ |$640$ |
|Demand 50 |$1550-450=1100$ |$280$ | $1550-1550=0$         |$320$ |
|Demand 70 |$2330-350=1980$  |$1160$ |$280$ |$2330-2330=0$      |


**CHOOSING THE BEST ACTION**

* Buying 60 bicycles is the best strategy under _minimax regret_.  Why? 
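
One way to check this is to take the worst-case regret of each action from the regret table and pick the smallest. The sketch below copies the regret values from the table above into a dictionary; the layout and the name `minimax_regret` are assumptions for illustration.

```python
# Regret table from the notes, stored as regret[action][demand].
regret = {
    "Buy 20": {10: 0,    30: 220, 50: 1100, 70: 1980},
    "Buy 40": {10: 380,  30: 0,   50: 280,  70: 1160},
    "Buy 60": {10: 700,  30: 320, 50: 0,    70: 280},
    "Buy 80": {10: 1020, 30: 640, 50: 320,  70: 0},
}


def minimax_regret(regret):
    """Return (best_action, worst_regrets): minimise the maximum regret."""
    # Worst-case regret of each action across all demand levels.
    worst = {act: max(row.values()) for act, row in regret.items()}
    best_action = min(worst, key=worst.get)
    return best_action, worst


best_action, worst = minimax_regret(regret)
print(best_action)  # Buy 60
```

The worst-case regrets are 1980, 1160, 700 and 1020 for buying 20, 40, 60 and 80 bicycles, so buying 60 has the smallest maximum regret.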

## DECISIONS UNDER RISK

_Decisions under risk_ are made when there are known probabilities of the state of nature.  For example, we have the forecast for snow tomorrow and this information will inform our decision to wear boots or not.

In order to model such decisions, we need to understand **Expected Return** (also known as **Expected Value**), denoted
$\mathit{ER}$.  Intuitively, expected return is the sum, over the states of nature, of the probability of each state times the payoff for a given decision.  That is, we need to know the probabilities of the states of nature (probability of snow or no snow) and the payoffs for each state of nature for a given decision or act (wear boots, don't wear boots). The expected return for decision $a_i$ is then

\begin{equation}
E[a_i] = \sum_{j=1}^m p_{a_i, n_j} \times \Pr(n_j) 
\end{equation}

where $\Pr(n_j)$ is the probability of the $j$th state of nature.

Back to our weather example.  Let's say our weather app tells us the probability of snow tomorrow is $0.3$; thus the expected return of wearing boots is $\mathit{ER}_{\mathit{boots}} = 0 \times 0.3 + (-10) \times 0.7 = -7$.  Similarly, the expected return of not wearing boots is $\mathit{ER}_{\neg \mathit{boots}} = -5 \times 0.3 + 5 \times 0.7 = 2$.  If we use the strategy of _maximum expected return_, we would choose to **not wear boots**.


|         | snow $(0.3)$| no snow $(0.7)$ | **maximum expected return strategy** |
|:-------:|:----:|:-------:|:--------------------:|
|**boots**    | $0$  | $-10$ | $0 \times 0.3 + -10 \times 0.7 = -7$ |
|**no boots** | $-5$ | $5$ | $-5 \times 0.3 + 5 \times 0.7 = \bf{2}$ |
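
The expected-return calculation in the table above can be sketched as follows. The function name `expected_return` and the dictionary layout are assumptions for illustration; the probabilities are the ones quoted from the weather app.

```python
# Payoffs for the boots example: payoffs[act][event] (illustrative layout).
payoffs = {
    "boots":    {"snow": 0,  "no snow": -10},
    "no boots": {"snow": -5, "no snow": 5},
}
# Probabilities of the states of nature, as given in the example.
probs = {"snow": 0.3, "no snow": 0.7}


def expected_return(payoff_row, probs):
    """Sum of payoff times probability over the states of nature."""
    return sum(payoff_row[event] * probs[event] for event in probs)


# Expected return of each act, then the act with the largest one.
er = {act: expected_return(row, probs) for act, row in payoffs.items()}
best = max(er, key=er.get)
print(best)  # no boots
```

Here `er` holds roughly $-7$ for boots and $2$ for no boots (up to floating-point rounding), in agreement with the table.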


What about our bicycle purchasing example?  Let's assume that Zed and Adrian take a look at the macro-economic indicators and estimate that the demands for $10$, $30$, $50$, or $70$ bicycles this season have probabilities of $0.2$, $0.4$, $0.3$, and $0.1$, respectively.

* Buying 40 bicycles is the recommended action under risk using _maximum expected value_.  Why?  Compute this for yourself (don't look at the answer before you try).

Computing the _maximum expected value_ is then just the `argmax` of the expected value:

\begin{equation} 
\mathrm{argmax}_{a_i}\:E[a_i] .
\end{equation}
 


## SUPPLEMENTAL

While we can do a lot of these computations by hand and with tables, we can also program them and build reusable components that will be valuable later.

### PREREQUISITES
You need a few things to get this code to run (but nothing unusual):

* [Python 2.7.13](http://python.org/)
* Anaconda for Python 2.7 [see download](https://continuum.io/download)
    * we will be using [Pandas](https://) which is packaged with Anaconda
   

### CODE IMPLEMENTATION

In [5]:
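def compute_payoff(cost, demand, buy):
    # cost: purchase cost per bicycle; demand: bicycles requested; buy: bicycles ordered
    if demand > buy:
        # sold out: every bicycle sells, each unmet customer costs $5 of goodwill
        goodwill = 5*(demand-buy)
        leftover = 0
        sales = buy*100
    else:
        # demand is met: unsold bicycles are cleared at $45 each
        goodwill = 0
        leftover = 45*(buy-demand)
        sales = demand*100

    total_cost = cost*buy

    return sales - total_cost + leftover - goodwill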




Now that we have the payoff computed, we will move to build the payoff table given the demand and actions scenario.  We'll use the `actions` list to encode the tuple for `(number_of_bicycles, purchase_cost_per_bicycle)`.

In [3]:
import pandas as pd 

demands = [10,30,50,70]
actions = [(20,70),(40,67),(60,65),(80,64)]

And to build a table of the results:

In [34]:
print("| action/nature |  | | | |")
print("|:-------------:|--|--|--|--|")
rows = []
for d in demands:
    # one table row per demand level, one payoff per (buy, cost) action
    row = [compute_payoff(c, d, b) for b, c in actions]
    print("|Demand {} ".format(d) + "".join("| {}  ".format(p) for p in row) + "|")
    rows.append(row)

df = pd.DataFrame(rows)  # rows = demand levels, columns = actions

| action/nature |  | | | |
|:-------------:|--|--|--|--|
|Demand 10 | 50  | -330  | -650  | -970  |
|Demand 30 | 550  | 770  | 450  | 130  |
|Demand 50 | 450  | 1270  | 1550  | 1230  |
|Demand 70 | 350  | 1170  | 2050  | 2330  |


### CHOOSING THE BEST STRATEGY
A **maximax** strategy will choose to purchase 80 bicycles.  Why?

A **maximin** strategy will choose to purchase 20. Why?

### LET'S EXPLORE THE REGRET TABLE

In [7]:
df # show the dataframe

|   | 0 | 1 | 2 | 3 |
|:-:|:-:|:-:|:-:|:-:|
| **0** | 50 | -330 | -650 | -970 |
| **1** | 550 | 770 | 450 | 130 |
| **2** | 450 | 1270 | 1550 | 1230 |
| **3** | 350 | 1170 | 2050 | 2330 |


In [29]:
df_regret = df.copy() # copy the payoff table, we will be turning it into a regret table later

In [8]:
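def maximax(df):
    col_max = []
    for i, col in df.items():    # each column holds the payoffs of one action
        amax = col.idxmax()      # row label (demand level) of that action's best payoff
        col_max.append((i, df.loc[amax, i]))
    return max(col_max, key=lambda t: t[1])  # (action index, payoff)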

In [10]:
maximax(df)

(3, 2330)

In [25]:
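def maximin(df):
    col_min = []
    for i, col in df.items():    # each column holds the payoffs of one action
        amin = col.idxmin()      # row label (demand level) of that action's worst payoff
        col_min.append((i, df.loc[amin, i]))
    return max(col_min, key=lambda t: t[1])  # (action index, payoff)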

In [26]:
maximin(df)

(0, 50)

Great!  Now we can create a regret table. As the algorithm shows, for each demand level (row) we take the maximum payoff and subtract each payoff in that row from it.

In [30]:
for i in range(len(df_regret)):  # one row per demand level
    df_regret.iloc[i] = df_regret.iloc[i].max() - df_regret.iloc[i]

In [31]:
df_regret

|   | 0 | 1 | 2 | 3 |
|:-:|:-:|:-:|:-:|:-:|
| **0** | 0 | 380 | 700 | 1020 |
| **1** | 220 | 0 | 320 | 640 |
| **2** | 1100 | 280 | 0 | 320 |
| **3** | 1980 | 1160 | 280 | 0 |


In [32]:
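def minimax(df):
    col_max = []
    for i, col in df.items():    # each column holds the values of one action
        amax = col.idxmax()      # row label of that action's largest value
        col_max.append((i, df.loc[amax, i]))
    return min(col_max, key=lambda t: t[1])  # (action index, value)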

In [33]:
minimax(df_regret) # now we can compute the minimax regret

(2, 700)

### CHALLENGE

* Write the Python code independently for the maximum expected value.