# tuuli CBR Notebook

## Step 1: Generate cases

This notebook lets us test our case-based reasoning (CBR) stack with [clood](https://github.com/RGU-Computing/clood). We create a new project, add cases, and retrieve the cases most similar to a set of search criteria, e.g. predicted birth date and nutrition preference.

In [123]:
import pandas as pd
import numpy as np
import random
import requests
import json
import uuid

Define a few example persons. Todo: Should we use trimesters instead of predicted birth dates? The advantage of predicted birth dates is that clood has a built-in similarity function for dates.
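
If we switch to trimesters, they could be derived from the predicted birth date. Below is a minimal sketch, assuming the usual obstetric convention: a due date lies 280 days (40 weeks) after the first day of the last menstrual period, the second trimester starts at week 14 and the third at week 28. `trimester_on` is a hypothetical helper, not part of clood.

```python
from datetime import date

# Assumption: a due date is set 280 days (40 weeks) after the first day
# of the last menstrual period.
PREGNANCY_DAYS = 280


def trimester_on(predicted_birth_date: str, on: date) -> int:
    """Return the trimester (1-3) on the day `on` for an ISO due date string."""
    due = date.fromisoformat(predicted_birth_date)
    gestational_days = PREGNANCY_DAYS - (due - on).days
    if gestational_days < 0:
        raise ValueError("reference date lies before the pregnancy started")
    weeks = gestational_days // 7
    # Assumed convention: weeks 0-13 -> 1st, 14-27 -> 2nd, 28+ -> 3rd trimester
    if weeks < 14:
        return 1
    if weeks < 28:
        return 2
    return 3
```

For example, `trimester_on('2023-09-03', date(2023, 1, 1))` gives 1 (5 weeks). The drawback is that the trimester depends on the reference date and would have to be recomputed for every query, whereas clood's Nearest Date similarity compares the stored dates directly.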

In [204]:
persons = [
    {'age': 24, 'predicted_birth_date': '2023-06-03', 'nutrition_preference': 'vegan'},
    {'age': 24, 'predicted_birth_date': '2023-09-03', 'nutrition_preference': 'vegetarian'},
    {'age': 24, 'predicted_birth_date': '2023-09-03', 'nutrition_preference': 'vegan'},
    {'age': 27, 'predicted_birth_date': '2023-03-01', 'nutrition_preference': 'flexitarian'}
]

A nugget flow is the order in which nuggets are delivered to the user, so '1,2,3,4' means that the user received nuggets 1, 2, 3 and 4 in that order. The flows are currently stored as strings, and we might need to define an appropriate similarity function for them.
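
One candidate for that similarity function is a normalized edit distance over the nugget IDs. A minimal sketch, assuming flows are stored as produced by `str(list)` (e.g. `"[1, 2, 3, 4]"`, which is also valid JSON); `parse_flow` and `flow_similarity` are hypothetical helpers, not clood similarity measures.

```python
import json


def parse_flow(flow: str) -> list[int]:
    """Turn a stored flow string such as "[1, 2, 3, 4]" back into a list."""
    return json.loads(flow)


def flow_similarity(a: list[int], b: list[int]) -> float:
    """1 - Levenshtein distance / length of the longer flow, in [0, 1]."""
    # At the start of row i, prev[j] is the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,            # delete x
                curr[j - 1] + 1,        # insert y
                prev[j - 1] + (x != y), # substitute (free if equal)
            ))
        prev = curr
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - prev[-1] / longest
```

With this measure, a partial flow `"[4,3,2]"` scores 0.75 against `"[4, 3, 2, 1]"` (one missing nugget out of four), while a fully reversed flow scores 0. Every insertion, deletion or substitution costs the same, so it treats nugget IDs as symbols rather than text.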

In [205]:
nugget_flows = [
    [1,2,3,4],
    [4,3,2,1],
    [2,3,4,1]
]

Next, we combine each person with each nugget flow, so the number of example cases is the number of persons multiplied by the number of nugget flows.

In [206]:
cases = []
for p in persons:
    for f in nugget_flows:
        n = p.copy()
        n['nugget_flow'] = str(f)
        cases.append(n)
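
The nested loop above builds the Cartesian product of persons and flows. The same cases can be produced with `itertools.product`; the sketch below uses hypothetical `example_*` names so it does not overwrite the notebook's `persons`, `nugget_flows` and `cases`.

```python
from itertools import product

# Toy inputs with the same shape as `persons` and `nugget_flows` above
example_persons = [
    {'age': 24, 'predicted_birth_date': '2023-06-03', 'nutrition_preference': 'vegan'},
    {'age': 27, 'predicted_birth_date': '2023-03-01', 'nutrition_preference': 'flexitarian'},
]
example_flows = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 3, 4, 1]]

# product() yields (person, flow) pairs in the same order as the nested loop;
# {**p, ...} builds a new dict, so the person dicts are not modified
example_cases = [
    {**p, 'nugget_flow': str(f)} for p, f in product(example_persons, example_flows)
]
```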

Let's have a look at the example cases we have generated:

In [207]:
df = pd.DataFrame.from_records(cases)
df

|   | age | predicted_birth_date | nutrition_preference | nugget_flow |
|---|-----|----------------------|----------------------|-------------|
| 0 | 24 | 2023-06-03 | vegan | [1, 2, 3, 4] |
| 1 | 24 | 2023-06-03 | vegan | [4, 3, 2, 1] |
| 2 | 24 | 2023-06-03 | vegan | [2, 3, 4, 1] |
| 3 | 24 | 2023-09-03 | vegetarian | [1, 2, 3, 4] |
| 4 | 24 | 2023-09-03 | vegetarian | [4, 3, 2, 1] |
| 5 | 24 | 2023-09-03 | vegetarian | [2, 3, 4, 1] |
| 6 | 24 | 2023-09-03 | vegan | [1, 2, 3, 4] |
| 7 | 24 | 2023-09-03 | vegan | [4, 3, 2, 1] |
| 8 | 24 | 2023-09-03 | vegan | [2, 3, 4, 1] |
| 9 | 27 | 2023-03-01 | flexitarian | [1, 2, 3, 4] |


## Step 2: Set up clood

Create a new clood project. A random UUID serves as the project name, so every run creates a fresh project.

In [208]:
base_url = 'http://93.90.192.115:3000/dev/'
project_name = str(uuid.uuid4())

In [209]:
url = base_url + 'project'
data = {"retainDuplicateCases": False, "name": project_name}
request = requests.post(url, json=data)
request.raise_for_status()  # fail early if the project could not be created

In [210]:
project_id = request.json()['id__']  # id assigned by clood
project_casebase = project_id + "_casebase"

Define the case attributes with their types, similarity measures and weights.

In [211]:
url = base_url + 'project/' + project_id
attributes = {
  "name": project_name,
  "casebase": project_casebase,
  "attributes": [
    {
      "similarity": "Nearest Number",
      "name": "age",
      "weight": 1,
      "type": "Integer",
    },
    {
      "similarity": "Nearest Date",
      "name": "predicted_birth_date",
      "weight": 1,
      "type": "Date",
    },
    {
      "similarity": "BM25",
      "name": "nutrition_preference",
      "weight": 1,
      "type": "String",
    },
    {
      "similarity": "Semantic USE",
      "name": "nugget_flow",
      "weight": 1,
      "type": "String",
    }
  ],
  "hasCasebase": True,
  "description": "CBR testing...",
  "id__": project_id
}
request = requests.put(url, json=attributes)
request.raise_for_status()  # fail early if the attribute definitions were rejected
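
Before uploading, a quick local check can catch cases that do not match the attribute definitions (a missing field, or an age stored as a string). A minimal sketch; `invalid_fields`, `is_iso_date` and the type-check mapping are hypothetical and only cover the three attribute types used in this notebook.

```python
from datetime import date


def is_iso_date(value) -> bool:
    """True for date strings like '2023-06-03'."""
    if not isinstance(value, str):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Assumed mapping from clood attribute type to a Python-side check
TYPE_CHECKS = {
    "Integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "String": lambda v: isinstance(v, str),
    "Date": is_iso_date,
}


def invalid_fields(cases: list[dict], attribute_defs: list[dict]) -> list[tuple[int, str]]:
    """Return (case index, attribute name) pairs that are missing or mistyped."""
    problems = []
    for i, case in enumerate(cases):
        for attr in attribute_defs:
            name, check = attr["name"], TYPE_CHECKS[attr["type"]]
            if name not in case or not check(case[name]):
                problems.append((i, name))
    return problems
```

In the notebook, `invalid_fields(cases, attributes["attributes"])` should return an empty list for the generated cases.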


Add cases

In [212]:
url = base_url + 'case/' + project_id + '/list'
request = requests.post(url, json=cases)  # add all example cases in one request
print(request)

<Response [201]>


## Step 3: Retrieve the recommended and most similar cases

We set some search criteria and let CBR select the most similar cases. You can experiment with this, for example by changing the nugget_flow (i.e. the previously shown nuggets), the age and/or the predicted birth date.

In [220]:
url = base_url + 'retrieve'
data = {
    "data": [
        {
            "name": "age",
            "type": "Integer",
            "similarity": "Nearest Number",
            "weight": 1,
            "unknown": False,
            "strategy": "Best Match",
            "value": 25
        },
        {
            "name": "predicted_birth_date",
            "type": "Date",
            "similarity": "Nearest Date",
            "weight": 1,
            "unknown": False,
            "strategy": "Best Match",
            "value": "2023-09-02"
        },
        {
            "name": "nutrition_preference",
            "type": "String",
            "similarity": "BM25",
            "weight": 5,
            "unknown": False,
            "strategy": "Best Match",
            "value": "vegetarian"
        },
        {
            "name": "nugget_flow",
            "type": "String",
            "similarity": "Semantic USE",
            "weight": 1,
            "unknown": True,
            "strategy": "Best Match",
            "value": "[4,3,2]"
        }
    ],
    "topk": 5,
    "globalSim": "Weighted Sum",
    "explanation": True,
    # Same project definition as in Step 2, plus the duplicate-case setting
    "project": {**attributes, "retainDuplicateCases": False}
}
request = requests.post(url, json = data)
print(request)

<Response [200]>


Recommended case

In [219]:
pd.DataFrame(json.loads(request.text)['recommended'], index=[0])

|   | age | predicted_birth_date | nutrition_preference | nugget_flow |
|---|-----|----------------------|----------------------|-------------|
| 0 | 25 | 2023-09-02 | meat | [4, 3, 2, 1] |


The most similar cases

In [215]:
pd.DataFrame(json.loads(request.text)['bestK'])

|   | age | predicted_birth_date | nutrition_preference | nugget_flow | score__ | match_explanation |
|---|-----|----------------------|----------------------|-------------|---------|-------------------|
| 0 | 24 | 2023-09-03 | vegan | [1, 2, 3, 4] | 5.464733 | [{'field': 'age', 'similarity': 0.999}, {'fiel... |
| 1 | 24 | 2023-09-03 | vegan | [4, 3, 2, 1] | 5.464733 | [{'field': 'age', 'similarity': 0.999}, {'fiel... |
| 2 | 24 | 2023-09-03 | vegan | [2, 3, 4, 1] | 5.464733 | [{'field': 'age', 'similarity': 0.999}, {'fiel... |
| 3 | 24 | 2023-06-03 | vegan | [1, 2, 3, 4] | 5.464487 | [{'field': 'age', 'similarity': 0.999}, {'fiel... |
| 4 | 24 | 2023-06-03 | vegan | [4, 3, 2, 1] | 5.464487 | [{'field': 'age', 'similarity': 0.999}, {'fiel... |
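
The `score__` column comes from the Weighted Sum global similarity selected via `globalSim`. As an illustration only, a weighted sum multiplies each attribute's local similarity by its weight and adds the results, optionally dividing by the total weight. Whether clood normalizes, and how it scales BM25 scores, is not shown in this notebook, so this sketch is not expected to reproduce `score__`; `weighted_sum` is a hypothetical helper.

```python
def weighted_sum(similarities: dict[str, float], weights: dict[str, float],
                 normalise: bool = False) -> float:
    """Combine local similarities into one global score: sum of w_i * s_i.

    Attributes without a local similarity (e.g. unknown ones) add nothing to
    the sum; with normalise=True the sum is divided by the sum of all weights.
    """
    total = sum(weights[name] * sim for name, sim in similarities.items())
    if normalise:
        total /= sum(weights.values())
    return total
```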


In [203]:
pd.DataFrame(json.loads(request.text)['bestK'])['match_explanation'].values

array([list([{'field': 'age', 'similarity': 0.999}, {'field': 'predicted_birth_date', 'similarity': 0.99999726}]),
       list([{'field': 'age', 'similarity': 0.999}, {'field': 'predicted_birth_date', 'similarity': 0.99999726}]),
       list([{'field': 'age', 'similarity': 0.999}, {'field': 'predicted_birth_date', 'similarity': 0.99999726}]),
       list([{'field': 'age', 'similarity': 0.999}, {'field': 'predicted_birth_date', 'similarity': 0.9997506}]),
       list([{'field': 'age', 'similarity': 0.999}, {'field': 'predicted_birth_date', 'similarity': 0.9997506}])],
      dtype=object)
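
The explanations are easier to compare when flattened into one column per field. A minimal sketch that takes the `bestK` list (case dicts whose `match_explanation` is a list of `{'field': ..., 'similarity': ...}` entries, as printed above); `explanation_table` is a hypothetical helper.

```python
import pandas as pd


def explanation_table(best_k: list[dict]) -> pd.DataFrame:
    """One row per retrieved case (by rank), one column per explained field."""
    rows = [
        {"rank": rank, "field": item["field"], "similarity": item["similarity"]}
        for rank, case in enumerate(best_k)
        for item in case.get("match_explanation", [])
    ]
    if not rows:
        return pd.DataFrame()
    # pivot turns the long (rank, field, similarity) rows into a wide table
    return pd.DataFrame(rows).pivot(index="rank", columns="field", values="similarity")
```

In the notebook this would be called as `explanation_table(request.json()['bestK'])`.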