# Case-based Reasoning System

<h5>Yefferson A. Marín Cantero</h5>
Artificial intelligence course <br>
Systems Engineering Program <br>
Universidad Tecnológica de Bolívar <br>
Cartagena de Indias, D.T. y C - Bolívar <br>
1p-2020

## Definitions
**Case-based reasoning** (CBR) is a paradigm of artificial intelligence and cognitive science that models the reasoning process as primarily memory-based. Case-based reasoning systems solve new problems by retrieving stored ‘cases’ that describe similar prior problem-solving episodes and adapting their solutions to fit new needs.

Case-based reasoning has been [formalized](https://www.idi.ntnu.no/emner/tdt4171/papers/AamodtPlaza94.pdf) for purposes of computer reasoning as a four-step process:

- **1. Retrieve:** Given a target problem, retrieve from memory cases relevant to solving it. A case consists of a problem, its solution, and, typically, annotations about how the solution was derived. 
- **2. Reuse:** Map the solution from the previous case to the target problem. This may involve adapting the solution as needed to fit the new situation. 
- **3. Revise:** Having mapped the previous solution to the target situation, test the new solution in the real world (or a simulation) and, if necessary, revise. 
- **4. Retain:** After the solution has been successfully adapted to the target problem, store the resulting experience as a new case in memory. 
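
The four steps above can be sketched in a few lines of plain Python. This is only an illustrative skeleton, not the implementation used in this notebook: the `Case` class, the `solve` function and the `is_acceptable` callback are hypothetical names, and a simple Hamming distance (number of differing attributes) stands in for the similarity measure.

```python
from dataclasses import dataclass


@dataclass
class Case:
    problem: tuple   # attribute values describing the problem
    solution: str    # solution stored for that problem


def hamming(a, b):
    """Count the positions where two attribute tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def solve(memory, problem, is_acceptable=lambda problem, solution: True):
    # 1. Retrieve: find the stored case closest to the new problem.
    nearest = min(memory, key=lambda case: hamming(case.problem, problem))
    # 2. Reuse: copy the retrieved solution (no adaptation in this sketch).
    proposed = nearest.solution
    # 3. Revise: check the proposal with a caller-supplied test.
    if not is_acceptable(problem, proposed):
        return None
    # 4. Retain: store the solved problem as a new case.
    memory.append(Case(problem, proposed))
    return proposed


memory = [
    Case(("Sunny", "Hot", "High", False), "No"),
    Case(("Rainy", "Cool", "Normal", False), "Yes"),
]
answer = solve(memory, ("Rainy", "Mild", "Normal", False))
print(answer, len(memory))  # Yes 3
```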

## Introduction
This implementation has the following requirements:

* [Python](https://python.org)
* [Pandas](https://pandas.pydata.org/)

It takes the **library cases** stored in [input/library.csv](input/library.csv) and applies case-based reasoning to the test **problem cases** stored in [input/cases.csv](input/cases.csv).

The purpose is to design a system that matches the test cases against the library cases in order to find the most appropriate solution. 

## Steps
### 1. Retrieve
#### Library and problem cases
First, we retrieve the base cases, that is, the library of relevant cases with their data (used in the computation) and their solutions. We also retrieve the test problem cases.

### 2. Reuse
#### One-hot encoding
Machine learning algorithms generally cannot work with categorical data directly; it must first be converted to numbers. One technique for this is one-hot encoding, which represents categorical variables as binary vectors.

We transform the library and problem cases in two steps: first, the categorical values are mapped to integer values; then, each integer value is represented as a binary vector that is all zeros except at the index of the integer, which is marked with a 1.
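
To make the two steps concrete, here is a minimal hand-written sketch in plain Python. The `one_hot` helper is a hypothetical name used only for illustration, and the sorted ordering of the categories is an assumption of this sketch.

```python
def one_hot(values):
    """Return (categories, vectors) for a list of categorical values."""
    # Step 1: map each distinct category to an integer index.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    # Step 2: build an all-zero vector with a single 1 at that index.
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot(["Sunny", "Overcast", "Rainy", "Sunny"])
print(categories)  # ['Overcast', 'Rainy', 'Sunny']
print(vectors)     # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```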

#### Mahalanobis distance
Having encoded both the library and the problem cases, we need a method to compare their similarity. We use the Mahalanobis distance, an effective multivariate distance metric that measures the distance between a point (P) and a distribution (D). It has useful applications in multivariate anomaly detection, classification on highly imbalanced datasets, and one-class classification. 

Mahalanobis distance is widely used in cluster analysis and classification techniques, as a multi-dimensional generalization of the idea of measuring how many standard deviations away P is from the mean of D.

$$ D(\overrightarrow{u}, \overrightarrow{v}) = \sqrt{(\overrightarrow{u}-\overrightarrow{v})V^{-1}(\overrightarrow{u}-\overrightarrow{v})^T} $$

Where $\overrightarrow{u}$ and $\overrightarrow{v}$ are row vectors (arrays) and $V^{-1}$ is the inverse of the covariance matrix $V$.
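
The formula can be sketched with NumPy as follows. NumPy is not listed in the requirements above (it is installed as a pandas dependency), and the `mahalanobis` helper and the toy data are assumptions made for illustration. Note that one-hot columns belonging to the same attribute always sum to 1, so their covariance matrix is singular; this sketch therefore uses the pseudo-inverse `np.linalg.pinv` in place of a plain inverse.

```python
import numpy as np


def mahalanobis(u, v, inv_cov):
    """Mahalanobis distance between vectors u and v, given V^{-1}."""
    delta = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    squared = float(delta @ inv_cov @ delta)
    # Guard against tiny negative values caused by rounding.
    return max(squared, 0.0) ** 0.5


# Toy numeric data: rows are observations, columns are features.
data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
cov = np.cov(data, rowvar=False)

# Assumption: pseudo-inverse, so the code also works for singular matrices.
inv_cov = np.linalg.pinv(cov)

print(mahalanobis(data[0], data[1], inv_cov))

# With the identity matrix the result reduces to the Euclidean distance.
print(mahalanobis([0, 0], [3, 4], np.eye(2)))  # 5.0
```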

### 3. Revise
After comparing the distances, we take the solution of the library case with the shortest distance (highest similarity) to each problem case.
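
A minimal sketch of this selection step is shown below. The `nearest_case` function, the four-column one-hot layout and the toy solutions are hypothetical; the pseudo-inverse is again an assumption made because one-hot data yields a singular covariance matrix.

```python
import numpy as np


def nearest_case(library_features, library_solutions, problem):
    """Return (index, solution, distance) of the closest library case."""
    features = np.asarray(library_features, dtype=float)
    inv_cov = np.linalg.pinv(np.cov(features, rowvar=False))
    deltas = features - np.asarray(problem, dtype=float)
    # Quadratic form delta @ inv_cov @ delta, computed for every row at once.
    squared = np.einsum("ij,jk,ik->i", deltas, inv_cov, deltas)
    distances = np.sqrt(np.clip(squared, 0.0, None))
    best = int(np.argmin(distances))
    return best, library_solutions[best], float(distances[best])


# Hypothetical one-hot columns: Outlook_Sunny, Outlook_Rainy, Windy_False, Windy_True
library_features = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
library_solutions = ["No", "No", "Yes", "Yes"]

best, solution, distance = nearest_case(library_features, library_solutions, [0, 1, 1, 0])
print(best, solution, distance)
```

Here the problem case is identical to library case 2, so that case is selected with a distance of zero.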

### 4. Retain 
We store each solved problem case as a new case in the base case library.
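
One way to sketch this step with pandas is to append the solved case as a new row with `pd.concat`. The small `library` frame and the `solved` case below are hypothetical values chosen only to illustrate the mechanics.

```python
import pandas as pd

# Hypothetical two-row library with the same columns as the case library.
library = pd.DataFrame({
    "Outlook": ["Sunny", "Rainy"],
    "Temperature": ["Hot", "Cool"],
    "Humidity": ["High", "Normal"],
    "Windy": [False, False],
    "Play": ["No", "Yes"],
})

# Hypothetical problem case together with the solution found for it.
solved = {"Outlook": "Overcast", "Temperature": "Cool",
          "Humidity": "High", "Windy": False, "Play": "Yes"}

# Retain: append the solved case as a new row and renumber the index.
library = pd.concat([library, pd.DataFrame([solved])], ignore_index=True)
print(len(library))  # 3
```

Writing the updated frame back with `library.to_csv(...)` would make the new case available in later runs.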

## Implementation
First, we import the required libraries:

In [1]:
import pandas as pd

### Library and problem cases 
Then we load the **library cases** stored in [input/library.csv](input/library.csv) and the test **problem cases** stored in [input/cases.csv](input/cases.csv).

In [2]:
# Read
library = pd.read_csv('input/library.csv')

# Show
library

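| | Outlook | Temperature | Humidity | Windy | Play |
|---|---|---|---|---|---|
| 0 | Sunny | Hot | High | False | No |
| 1 | Sunny | Hot | High | True | No |
| 2 | Overcast | Hot | High | False | Yes |
| 3 | Rainy | Mild | High | False | Yes |
| 4 | Rainy | Cool | Normal | False | Yes |
| 5 | Rainy | Cool | Normal | True | No |
| 6 | Overcast | Cool | Normal | True | Yes |
| 7 | Sunny | Mild | High | False | No |
| 8 | Sunny | Cool | Normal | False | Yes |
| 9 | Rainy | Mild | Normal | False | Yes |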


In [3]:
# Read
cases = pd.read_csv('input/cases.csv')

# Show
cases

| | Outlook | Temperature | Humidity | Windy |
|---|---|---|---|---|
| 0 | Sunny | Mild | Normal | False |
| 1 | Rainy | Cool | Normal | False |
| 2 | Overcast | Cool | High | False |
| 3 | Sunny | Cool | High | True |
| 4 | Rainy | Hot | High | True |
| 5 | Rainy | Cool | High | True |


We can check the data type of each column:

In [4]:
library.dtypes

Outlook         object
 Temperature    object
 Humidity       object
 Windy          object
 Play           object
dtype: object

In [5]:
cases.dtypes

Outlook         object
 Temperature    object
 Humidity       object
 Windy          object
dtype: object
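
The leading spaces in the column names above, and the `object` dtype of `Windy`, suggest that the CSV files may contain a space after each comma. This is only a guess from the output shown here. If that is the case, the `skipinitialspace` parameter of `pd.read_csv` strips those spaces on load; the inline CSV text below is a hypothetical sample, not the content of the actual files.

```python
import io

import pandas as pd

# Hypothetical CSV text with a space after every comma.
text = "Outlook, Temperature, Humidity, Windy\nSunny, Hot, High, False\nRainy, Cool, Normal, True\n"

raw = pd.read_csv(io.StringIO(text))
clean = pd.read_csv(io.StringIO(text), skipinitialspace=True)

print(list(raw.columns))    # column names keep the leading space
print(list(clean.columns))  # ['Outlook', 'Temperature', 'Humidity', 'Windy']
print(clean.dtypes)
```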

### One-hot encoding
As verified above, every column is categorical (dtype `object`), so we convert the data with one-hot encoding, using the pandas [get_dummies()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html) function:

In [6]:
# Encode
library_encoded = pd.get_dummies(library)

# Show
library_encoded

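| | Outlook_Overcast | Outlook_Rainy | Outlook_Sunny | Temperature_ Cool | Temperature_ Hot | Temperature_ Mild | Humidity_ High | Humidity_ Normal | Windy_ False | Windy_ True | Play_ No | Play_ Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 |
| 4 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 5 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
| 7 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
| 8 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 |
| 9 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |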


In [7]:
# Encode
cases_encoded = pd.get_dummies(cases)

# Show
cases_encoded

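| | Outlook_Overcast | Outlook_Rainy | Outlook_Sunny | Temperature_ Cool | Temperature_ Hot | Temperature_ Mild | Humidity_ High | Humidity_ Normal | Windy_ False | Windy_ True |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |
| 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |
| 4 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 5 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 |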
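
Here both encoded frames happen to share the same feature columns, but encoding the two files separately does not guarantee that: a category that never appears in the problem cases produces no column for it. A hedged sketch of one way to align them is `DataFrame.reindex` with `fill_value=0`; the tiny frames below are hypothetical and use their own variable names so that they do not clash with the notebook's.

```python
import pandas as pd

toy_library = pd.DataFrame({"Outlook": ["Sunny", "Overcast", "Rainy"]})
toy_cases = pd.DataFrame({"Outlook": ["Sunny", "Rainy"]})  # no "Overcast" here

toy_library_encoded = pd.get_dummies(toy_library, dtype=int)
toy_cases_encoded = pd.get_dummies(toy_cases, dtype=int)

# Give the cases the same columns, in the same order, as the library;
# columns missing from the cases are filled with 0.
aligned = toy_cases_encoded.reindex(columns=toy_library_encoded.columns, fill_value=0)

print(list(toy_cases_encoded.columns))  # ['Outlook_Rainy', 'Outlook_Sunny']
print(list(aligned.columns))            # ['Outlook_Overcast', 'Outlook_Rainy', 'Outlook_Sunny']
```

When the library also contains solution columns such as `Play`, those would need to be dropped from the column list before aligning.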
