# Lecture Notes

## Rule-based Learning of simple concepts

> Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.

### Find-S Algorithm

The Find-S algorithm finds the most specific hypothesis that fits all positive examples of the training set.

**Example**

Consider the following data set about which seeds are considered poisonous:

| Color | Toughness | Fungus | Appearance | Poisonous |
| --- | --- | --- | --- | --- |
| Green | Hard | No | Wrinkled | Yes |
| Green | Hard | Yes | Smooth | No |
| Brown | Soft | No | Wrinkled | No |
| Orange | Hard | No | Wrinkled | Yes |
| Green | Hard | Yes | Wrinkled | Yes |
| Orange | Hard | No | Wrinkled | Yes |

First, we consider the hypothesis to be the most specific hypothesis:

$ \theta = (\bot, \bot, \bot, \bot) $

Example one is positive, and our initial hypothesis is too specific to cover it, so we generalize each attribute just enough to match the example.

$ \theta = (Green, Hard, No, Wrinkled) $

Examples two and three are not positive examples of our target concept $ c $, so we ignore them and our hypothesis remains unchanged.

Example four is positive, so we have to consider it. We compare each attribute of the current example with the current hypothesis and if any mismatch is found, we replace it with a general case "$ ? $".

$ \theta = (?, Hard, No, Wrinkled) $

If we keep repeating this step for every example in the training set, we get our output hypothesis vector:

$ \theta = (?, Hard, ?, Wrinkled) $
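The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `find_s` and the use of `None` for $ \bot $ are assumptions made for this example.

```python
def find_s(examples):
    """Return the most specific hypothesis covering every positive example.

    `examples` is a list of (attributes, label) pairs. None stands for the
    empty constraint (the bottom symbol in the notes), "?" accepts any value.
    """
    n_attributes = len(examples[0][0])
    hypothesis = [None] * n_attributes
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        for i, value in enumerate(attributes):
            if hypothesis[i] is None:
                hypothesis[i] = value   # first positive example: copy it
            elif hypothesis[i] != value:
                hypothesis[i] = "?"     # mismatch: generalize this attribute
    return tuple(hypothesis)


# The seed data set from the table above (True = poisonous).
seeds = [
    (("Green", "Hard", "No", "Wrinkled"), True),
    (("Green", "Hard", "Yes", "Smooth"), False),
    (("Brown", "Soft", "No", "Wrinkled"), False),
    (("Orange", "Hard", "No", "Wrinkled"), True),
    (("Green", "Hard", "Yes", "Wrinkled"), True),
    (("Orange", "Hard", "No", "Wrinkled"), True),
]

print(find_s(seeds))  # ('?', 'Hard', '?', 'Wrinkled')
```

Example five is the one that turns the `Fungus` attribute into `?`; example six changes nothing.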

### Candidate Elimination Algorithm

The Candidate Elimination Algorithm (CEA) is a supervised technique for learning concepts from data.

In concept learning, we have a dataset of objects labelled either as positive or negative, where positives are members of the target concept and negatives are not. The goal is to formulate a proper concept function of this data using the CEA.


**Example** 

| Sky | AirTemp | Humidity | Wind | Water | Forecast | *EnjoySport* |
| --- | --- | --- | --- | --- | --- | --- |
| Sunny | Warm | Normal | Strong | Warm | Same | Yes |
| Sunny | Warm | High | Strong | Warm | Same | Yes |
| Rainy | Cold | High | Strong | Warm | Change | No |
| Sunny | Warm | High | Strong | Cool | Change | Yes |

We want a boolean function that holds true for all examples in which *EnjoySport* = Yes.
We call the hypothesis space $ H $ the set of all candidate hypotheses that the chosen representation can express.

There may be multiple hypotheses that cover all positive examples in our data, but we only want those that are also consistent with the negative ones, i.e. that label every negative example as negative.

All the functions consistent with positive and negative objects (meaning they classify them correctly) constitute the version space of the target concept.
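To make "consistent" concrete, here is a small sketch that checks two hypotheses against the *EnjoySport* table. The helper names `covers` and `consistent_with_all` are chosen for this example only; a hypothesis is assumed to be a tuple of attribute constraints where `"?"` accepts any value.

```python
def covers(h, x):
    """True if hypothesis h labels object x as positive."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def consistent_with_all(h, data):
    """True if h classifies every labelled object in data correctly."""
    return all(covers(h, x) == label for x, label in data)


# The EnjoySport table from above (True = Yes).
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

sunny_days = ("Sunny", "?", "?", "?", "?", "?")
windy_days = ("?", "?", "?", "Strong", "?", "?")

print(consistent_with_all(sunny_days, data))  # True
print(consistent_with_all(windy_days, data))  # False
```

`windy_days` covers all three positives but also covers the negative object, so it is not part of the version space.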

**Partial Ordering**

The ordering the CEA uses is induced by the relation "more general than or equally general as" ($ \geq $).

The most general hypothesis is the one that labels all examples as positive. Similarly, the most specific one is the one that always returns $ false $. 
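For conjunctive hypotheses the relation can be tested attribute by attribute. The sketch below assumes `"?"` for "any value" and `None` for $ \bot $; the function name `more_general_or_equal` is hypothetical.

```python
def more_general_or_equal(h1, h2):
    """True if h1 >= h2, i.e. h1 covers every object that h2 covers."""
    if any(c is None for c in h2):
        return True  # h2 covers nothing, so every hypothesis is >= h2
    return all(a == "?" or a == b for a, b in zip(h1, h2))


most_general = ("?",) * 6
most_specific = (None,) * 6
h_sky = ("Sunny", "?", "?", "?", "?", "?")
h_temp = ("?", "Warm", "?", "?", "?", "?")

print(more_general_or_equal(most_general, h_sky))   # True
print(more_general_or_equal(h_sky, most_specific))  # True
print(more_general_or_equal(h_sky, h_temp))         # False
print(more_general_or_equal(h_temp, h_sky))         # False
```

The last two lines show why the ordering is only partial: neither `h_sky` nor `h_temp` is more general than the other.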

**Finding the Version Space**

CEA processes the labelled objects one by one. If necessary, it specializes $ G $ and generalizes $ S $ so that $ V(G, S) $ (the version space whose general and specific boundaries are $ G $ and $ S $) correctly classifies all the objects processed so far.

```
Data: D (a dataset of objects labelled as positive or negative)
Result: V (version space of hypotheses consistent with D)
Initialize G to the set containing the most general hypothesis
Initialize S to the set containing the most specific hypothesis

for each object x in D:
    if x is a positive object:
        Remove from G any hypothesis inconsistent with x

        for each hypothesis s in S that is inconsistent with x:
            Remove s from S

            Add to S all minimal generalizations h of s such that h is consistent with x
            and for some member g of G it holds that g >= h

            Remove from S any hypothesis that's more general than another hypothesis in S

    else:
        Remove from S any hypothesis inconsistent with x

        for each hypothesis g in G that is inconsistent with x:
            Remove g from G

            Add to G all the minimal specializations h of g such that h is consistent with x
            and for some member s of S it holds that h >= s

            Remove from G any hypothesis that's less general than another hypothesis in G

Return V as V(G, S)
```
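A minimal Python sketch of candidate elimination for conjunctive hypotheses follows. It rests on several assumptions that are not part of the notes: `None` encodes $ \bot $, `"?"` encodes "any value", the attribute domains contain only the values that appear in these notes, and all function names are made up for this example.

```python
def covers(h, x):
    """True if hypothesis h labels object x as positive."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(h1, h2):
    """True if h1 covers every object that h2 covers."""
    if any(c is None for c in h2):  # h2 covers nothing
        return True
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c is None else (c if c == v else "?")
                 for c, v in zip(s, x))


def min_specializations(g, x, domains):
    """All minimal specializations of g that no longer cover x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for value in domains[i] if value != x[i]]


def candidate_elimination(data, domains):
    G = {("?",) * len(domains)}
    S = {(None,) * len(domains)}
    for x, positive in data:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            # keep only the most specific members
            S = {s for s in new_S if not any(
                s != t and more_general_or_equal(s, t) for t in new_S)}
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                candidates = min_specializations(g, x, domains) if covers(g, x) else [g]
                new_G.update(h for h in candidates
                             if any(more_general_or_equal(h, s) for s in S))
            # keep only the most general members
            G = {g for g in new_G if not any(
                g != t and more_general_or_equal(t, g) for t in new_G)}
    return S, G


# Assumed domains: only the values that occur in these notes.
domains = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(data, domains)
print("S =", S)
print("G =", G)
```

For conjunctive hypotheses each member of $ S $ has exactly one minimal generalization per object, which is why `min_generalization` returns a single tuple rather than a list.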

#### Example

**Initialization**

CEA initializes $ G $ to $ G_0 = \{\langle ?, ?, ?, ?, ?, ?\rangle\} $ and $ S $ to $ S_0 = \{ \langle \bot, \bot, \bot, \bot, \bot, \bot \rangle\} $.
The space bounded by $ S_0 $ and $ G_0 $, $ V(G_0, S_0) $, is the whole hypothesis space $ H $.

**Processing the first positive object**

The only hypothesis in $ G $ is $ \langle ?, ?, ?, ?, ?, ? \rangle $. 
This is consistent with the first object $ \langle Sunny, Warm, Normal, Strong, Warm, Same \rangle $, so $ G_1 = G_0 $ stays the same.

However, $ \langle \bot, \bot, \bot, \bot, \bot, \bot \rangle $ as the only hypothesis in $ S $ is not
consistent with the object, so we remove it.
The minimal generalization of $ \langle \bot, \bot, \bot, \bot, \bot, \bot \rangle $ that still covers the object $ \langle Sunny, Warm, Normal, Strong, Warm, Same \rangle $ is precisely the hypothesis $ \langle Sunny, Warm, Normal, Strong, Warm, Same \rangle $. 
It is also less general than the hypothesis in $ G_1 $, so the new specific boundary is now
$ S_1 = \{\langle Sunny, Warm, Normal, Strong, Warm, Same \rangle\}. $

**Processing the second positive object**

Again, the hypothesis in $ G $ is consistent with the positive example $ \langle Sunny, Warm, High, Strong, Warm, Same \rangle $, so no changes to the general boundary are necessary: $ G_2 = G_1 $.

The hypothesis $ \langle Sunny, Warm, Normal, Strong, Warm, Same \rangle $ from $ S_1 $ is overly
specific, so we remove it from the boundary. 
Its minimal generalization that covers the second positive object is $ \langle Sunny, Warm, ?, Strong, Warm, Same \rangle $, which is less general than $ \langle ?, ?, ?, ?, ?, ? \rangle $ from $ G_2 $, so $ S_2 = \{\langle Sunny, Warm, ?, Strong, Warm, Same \rangle\} $.
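The two generalization steps on the specific boundary can be reproduced with a small helper. This is only a sketch: `min_generalization` is a name chosen for this example, and `None` is assumed to stand for $ \bot $.

```python
def min_generalization(s, x):
    """Generalize s just enough to cover the positive object x."""
    result = []
    for constraint, value in zip(s, x):
        if constraint is None:
            result.append(value)       # bottom: adopt the observed value
        elif constraint == value:
            result.append(constraint)  # already matches: keep it
        else:
            result.append("?")         # conflict: accept any value
    return tuple(result)


x1 = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
x2 = ("Sunny", "Warm", "High", "Strong", "Warm", "Same")

s0 = (None,) * 6
s1 = min_generalization(s0, x1)
s2 = min_generalization(s1, x2)
print(s1)  # ('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')
print(s2)  # ('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')
```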

**Processing the first negative object**

The third example is negative, so we enter the `else`-block within the main loop.

Since $ S_2 $ is consistent with the example $ \langle Rainy, Cold, High, Strong, Warm, Change \rangle $ (it correctly labels it as negative), we don't change it, so $ S_3 = S_2 $.

We can see that $ \langle ?, ?, ?, ?, ?, ? \rangle $ is too general and remove it from the boundary.
Next, we can see that there are six minimal specializations of the removed hypothesis that are consistent with the example:
$ \langle Sunny, ?, ?, ?, ?, ? \rangle, \langle ?, Warm, ?, ?, ?, ? \rangle, \langle ?, ?, Normal, ?, ?, ? \rangle, \langle ?, ?, ?, Weak, ?, ? \rangle, \langle ?, ?, ?, ?, Cool, ? \rangle, \langle ?, ?, ?, ?, ?, Same \rangle $

> To specialize a hypothesis so that it correctly classifies a negative object, we replace one $ ? $ at a time with a value other than the one the object has for that attribute.
So, we have $ Sunny $ instead of $ Rainy $, $ Warm $ instead of $ Cold $ and so on.

However, only $ \langle Sunny, ?, ?, ?, ?, ? \rangle $, $ \langle ?, Warm, ?, ?, ?, ? \rangle $ and $ \langle ?, ?, ?, ?, ?, Same \rangle $ are more general than the hypothesis $ \langle Sunny, Warm, ?, Strong, Warm, Same \rangle $ we have in $ S_3 $, so these three form the new general boundary $ G_3 $.

A hypothesis that isn't more general than *at least one* hypothesis in the specific boundary doesn't classify all the positive examples as positive. That's why we keep only the minimal specializations that are more general than or equally general as ($ \geq $) the hypothesis in $ S_3 $.
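The specialization step for this negative object can be sketched as follows. The attribute domains are an assumption: they contain only the values that occur in these notes, which is what yields exactly six candidates. The helper names are hypothetical.

```python
def more_general_or_equal(h1, h2):
    """True if h1 covers every object that h2 covers (h2 has no bottom)."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def min_specializations(g, x, domains):
    """Replace one "?" at a time with a value that differs from x's value."""
    result = []
    for i, constraint in enumerate(g):
        if constraint != "?":
            continue
        for value in domains[i]:
            if value != x[i]:
                result.append(g[:i] + (value,) + g[i + 1:])
    return result


# Assumed attribute domains: only the values that appear in the notes.
domains = [
    ("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]
negative = ("Rainy", "Cold", "High", "Strong", "Warm", "Change")
s3 = ("Sunny", "Warm", "?", "Strong", "Warm", "Same")

candidates = min_specializations(("?",) * 6, negative, domains)
g3 = [h for h in candidates if more_general_or_equal(h, s3)]

print(len(candidates))  # 6
for h in g3:
    print(h)
```

Only the specializations on `Sky`, `AirTemp` and `Forecast` survive the filter, because the other three would reject at least one positive example seen so far.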

**Processing the third positive object**

Since the hypothesis $ \langle ?, ?, ?, ?, ?, Same \rangle $ in the general boundary is inconsistent with the positive object $ \langle Sunny, Warm, High, Strong, Cool, Change \rangle $, we remove it and get $ G_4 = \{\langle Sunny, ?, ?, ?, ?, ? \rangle, \langle ?, Warm, ?, ?, ?, ? \rangle\} $.

The object $ \langle Sunny, Warm, High, Strong, Cool, Change \rangle $ shows that $ S_3 $ is overly specific, so we generalize the hypothesis to $ \langle Sunny, Warm, ?, Strong, ?, ? \rangle $ and get the new boundary $ S_4 = \{\langle Sunny, Warm, ?, Strong, ?, ? \rangle\} $.

Since all examples have been processed, the algorithm returns the version space $ V(G_4, S_4) $.
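The two boundaries describe the version space compactly; its members are all hypotheses that lie between them in the partial ordering. The sketch below enumerates them for the final boundaries of this example. It assumes that no hypothesis in the specific boundary contains $ \bot $, and the function names are made up for this illustration.

```python
from itertools import product


def more_general_or_equal(h1, h2):
    """True if h1 covers every object that h2 covers (h2 has no bottom)."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))


def version_space(S, G):
    """All hypotheses h with s <= h <= g for some s in S and g in G."""
    members = set()
    for s in S:
        # Every h >= s keeps a constraint of s or relaxes it to "?".
        options = [(c,) if c == "?" else (c, "?") for c in s]
        for h in product(*options):
            if any(more_general_or_equal(g, h) for g in G):
                members.add(h)
    return members


S4 = {("Sunny", "Warm", "?", "Strong", "?", "?")}
G4 = {("Sunny", "?", "?", "?", "?", "?"), ("?", "Warm", "?", "?", "?", "?")}

members = version_space(S4, G4)
print(len(members))  # 6
for h in sorted(members):
    print(h)
```

Besides the three boundary hypotheses, the version space contains three hypotheses strictly between them, each constraining two of `Sky`, `AirTemp` and `Wind`.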

### Inductive Bias

> The policy by which a [learning] algorithm generalizes from observed training examples to classify unseen instances is its inductive bias.
> Inductive bias is the set of assumptions that, together with the training data, deductively justify the classification by the learner to future instances.

* A learning algorithm without inductive bias has no directive to classify unseen examples. Put another way, it cannot *generalize*.
* A learning algorithm without inductive bias can only *memorize*
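As a toy illustration of the second point, consider a learner that makes no assumptions at all and simply stores what it has seen. The name `rote_learner` and the choice of returning `None` for "no decision possible" are assumptions made for this sketch.

```python
def rote_learner(training_data):
    """Memorize the labelled examples; assume nothing beyond them."""
    memory = dict(training_data)

    def classify(x):
        # None means there is no basis for a decision: x was never observed.
        return memory.get(x)

    return classify


training_data = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
]
classify = rote_learner(training_data)

seen = ("Sunny", "Warm", "Normal", "Strong", "Warm", "Same")
unseen = ("Sunny", "Warm", "High", "Strong", "Cool", "Change")

print(classify(seen))    # True
print(classify(unseen))  # None
```

Find-S and the CEA, by contrast, assume that the target concept can be expressed as a conjunction of attribute constraints, and that assumption is what lets them label objects they have never seen.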