In [18]:
from CBR.casebase import *

small_sets = False
size = 300 if small_sets else -1

# Path to the (processed) CSV file.
# Uncomment one of the alternatives below to try a different data set.
# csv = "data/mushrooms.csv"
csv = "data/admission.csv"
# csv = "data/tort.csv"
# csv = "data/welfare.csv"

# Load the case base, with dimension orders determined by logistic regression.
CB = casebase(
    csv,
    verb=True, # The verbose mode prints the dimension order information. 
    method='logreg',
    size=size, # Truncates the size of the resulting case base to 'size', 
               # or uses the full size of the csv if size == -1. 
    shuffle=True,
    )

# 'analyze' is a convenience function that bundles the other analysis functions;
# check the casebase.py file to see (or change) what it does.
CB.analyze()


Printing dimension orders.
GRE Score: Ascending (0.47)
TOEFL Score: Ascending (0.61)
University Rating: Descending (-0.15)
SOP: Descending (-0.36)
LOR : Ascending (0.74)
CGPA: Ascending (1.77)
Research: Ascending (0.02)
Computing the forcing relation on cases.


100%|██████████| 500/500 [00:00<00:00, 2385.86it/s]


The consistency is 456 / 500 = 91.2%.
Of the 39 cases with label 0 there are 8 inconsistent ones.
Of the 461 cases with label 1 there are 36 inconsistent ones.
Computing best citability information.


100%|██████████| 500/500 [00:02<00:00, 237.25it/s]

[PR] Mean and std. of best citability is 50.5 ± 100.2.
Top forcing landmark with outcome 0:
-------------------------------
d                  c[d].value
-----------------  ------------
GRE Score          316
TOEFL Score        105
University Rating  2
SOP                2.5
LOR                2.5
CGPA               8.2
Research           1
-------------------------------
Outcome: 0.0
-------------------------------
Top forcing landmark with outcome 1:
-------------------------------
d                  c[d].value
-----------------  ------------
GRE Score          304
TOEFL Score        103
University Rating  5
SOP                5
LOR                3
CGPA               7.92
Research           0
-------------------------------
Outcome: 1.0
-------------------------------

Number of landmarks for outcome 0: 23
Number of landmarks for outcome 1: 67

Number of ordinary (aka trivial) cases for either class:
For outcome 0: 39 - 23 = 16
For outcome 1: 461 - 67 = 394
Removals to obtain cons.:
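
The dimension orders printed at the top of this output can be read as a sign rule on the logistic regression coefficients: every positive coefficient is reported as Ascending and every negative one as Descending. The sketch below reproduces that reading from the printed coefficients alone. `dimension_order` is a hypothetical helper, not part of `CBR.casebase`, and the sign rule is an assumption inferred from the output rather than taken from the library source.

```python
# Coefficients copied from the verbose output above (rounded to two decimals).
coefficients = {
    "GRE Score": 0.47,
    "TOEFL Score": 0.61,
    "University Rating": -0.15,
    "SOP": -0.36,
    "LOR": 0.74,
    "CGPA": 1.77,
    "Research": 0.02,
}


def dimension_order(coef):
    """Hypothetical helper: map the sign of a coefficient to an order label."""
    return "Ascending" if coef > 0 else "Descending"


orders = {name: dimension_order(coef) for name, coef in coefficients.items()}
for name, coef in coefficients.items():
    print(f"{name}: {orders[name]} ({coef:.2f})")
```

Under this reading, a coefficient close to zero (such as 0.02 for Research) still yields an order, so the magnitude in parentheses is worth checking before relying on a dimension's direction.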




In [19]:
# Load two cases from the case base (indices 1 and 7) into variables a and b.
a = CB[1]
b = CB[7]

# Print a comparison between the cases a and b.
# This shows for each dimension d the values a(d) and b(d),
# the relation between them (so a(d) < b(d), or a(d) > b(d), etc).
CB.compare(a, b)

# Dimensions on which b is not at least as good as a are called 'relevant differences'.
# These can be computed with the .diff method of cases.
# Note that these are also marked (with an X) by the 'compare' function.
relevant_differences = a.diff(b)
print(list(relevant_differences))

Dimension              a   R   b       F
-----------------  -----  ---  -----  ---
GRE Score          310.0   >   305.0   X
TOEFL Score         99.0   <   107.0
University Rating    2.0   =   2.0
SOP                  1.5   >   2.5     X
LOR                  2.0   <   2.5
CGPA                 7.3   <   8.42
Research             0.0   =   0.0
Label                1.0       1.0
['GRE Score', 'SOP']
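
To make the marked rows concrete, here is a minimal, self-contained sketch of how the relevant differences could be computed from plain dictionaries. It is not the library's `.diff` implementation. It assumes that an Ascending dimension favours outcome 1 as its value grows and a Descending dimension favours outcome 1 as its value shrinks; `ORDERS` and `relevant_differences` are hypothetical names, and the values are copied from the comparison table above.

```python
# +1: higher values favour outcome 1 (Ascending).
# -1: lower values favour outcome 1 (Descending).
ORDERS = {
    "GRE Score": 1,
    "TOEFL Score": 1,
    "University Rating": -1,
    "SOP": -1,
    "LOR": 1,
    "CGPA": 1,
    "Research": 1,
}


def relevant_differences(a, b, outcome, orders=ORDERS):
    """Dimensions on which b is not at least as good as a for `outcome`."""
    sign = 1 if outcome == 1 else -1
    return [
        dim
        for dim, direction in orders.items()
        if sign * direction * (b[dim] - a[dim]) < 0
    ]


a = {"GRE Score": 310.0, "TOEFL Score": 99.0, "University Rating": 2.0,
     "SOP": 1.5, "LOR": 2.0, "CGPA": 7.3, "Research": 0.0}
b = {"GRE Score": 305.0, "TOEFL Score": 107.0, "University Rating": 2.0,
     "SOP": 2.5, "LOR": 2.5, "CGPA": 8.42, "Research": 0.0}

# a has label 1, so differences are judged with respect to outcome 1.
print(relevant_differences(a, b, outcome=1))  # ['GRE Score', 'SOP']
```

Dimensions with equal values (University Rating, Research) are not reported, which matches the empty F column for those rows.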


In [20]:
# Recall: if c has outcome s, then c forces the outcome of d for s iff D(c, d) = {}.
# This means we can use the .diff method to check whether some case in CB forces the outcome of a given case.
a = CB[3] # change to CB[0] for an example of a case that was not forced. 
force = False
for b in CB:
    if not a == b:
        if list(b.diff(a)) == []:
            force = True
            print("The outcome of the input case a was forced by a case b in the case base:")
            CB.compare(b, a)
            break

if not force:
    print("The outcome of case a was not forced by CB.")

The outcome of the input case a was forced by a case b in the case base:
Dimension              a   R   b       F
-----------------  -----  ---  -----  ---
GRE Score          315.0   >   314.0
TOEFL Score        105.0   =   105.0
University Rating    2.0   =   2.0
SOP                  2.0   >   2.5
LOR                  2.5   >   2.0
CGPA                7.65   >   7.64
Research             0.0   =   0.0
Label                0.0       1.0
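
The forcing condition D(c, d) = {} can also be checked by hand on the two cases printed above. The following is a sketch under the same assumption about dimension orders as the printed verbose output (Ascending favours outcome 1 as values grow, Descending as they shrink). `forces` and `ORDERS` are hypothetical names, not functions from `CBR.casebase`.

```python
# +1 for Ascending dimensions, -1 for Descending ones.
ORDERS = {
    "GRE Score": 1,
    "TOEFL Score": 1,
    "University Rating": -1,
    "SOP": -1,
    "LOR": 1,
    "CGPA": 1,
    "Research": 1,
}


def forces(c, c_outcome, d, orders=ORDERS):
    """True if c, decided for c_outcome, forces that outcome on d.

    This holds when d is at least as good as c for c_outcome on every
    dimension, i.e. when the set of relevant differences is empty.
    """
    sign = 1 if c_outcome == 1 else -1
    return all(
        sign * direction * (d[dim] - c[dim]) >= 0
        for dim, direction in orders.items()
    )


# Left column of the table above (label 0) and right column (label 1).
c = {"GRE Score": 315.0, "TOEFL Score": 105.0, "University Rating": 2.0,
     "SOP": 2.0, "LOR": 2.5, "CGPA": 7.65, "Research": 0.0}
d = {"GRE Score": 314.0, "TOEFL Score": 105.0, "University Rating": 2.0,
     "SOP": 2.5, "LOR": 2.0, "CGPA": 7.64, "Research": 0.0}

print(forces(c, 0, d))  # True
```

In this pair the forcing case has label 0 while the forced case has label 1, so the forced outcome disagrees with the recorded one. That is presumably the kind of conflict the consistency figure reported by `analyze` counts.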


In [21]:
# Compute the size of the case base, and make a roughly 80/20 train/test split.
CB_size = len(list(CB.inds))
split = CB_size - (CB_size // 5)
CB_train = CB[:split]
CB_test = CB[split:]
print(f"Total size of CB: {CB_size}")
print(f"Taking first {split} for train, and {CB_size - split} for test.")

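# Check whether a case is forced by another case in the train split.
# Here a is the first case of the train split; use CB_test[0] to check
# the first case of the test split instead.
force = False  # reset the flag left over from the previous cell
a = CB_train[0]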
for b in CB_train:
    if not a == b:
        if list(b.diff(a)) == []:
            force = True
            print("The outcome of the input case a was forced by a case b in the training case base:")
            print(a)
            print(b)
            break

Total size of CB: 500
Taking first 400 for train, and 100 for test.
The outcome of the input case a was forced by a case b in the training case base:
-------------------------------
d                  c[d].value
-----------------  ------------
GRE Score          335
TOEFL Score        117
University Rating  5
SOP                5
LOR                5
CGPA               9.56
Research           1
-------------------------------
Outcome: 1.0
-------------------------------
-------------------------------
d                  c[d].value
-----------------  ------------
GRE Score          304
TOEFL Score        103
University Rating  5
SOP                5
LOR                3
CGPA               7.92
Research           0
-------------------------------
Outcome: 1.0
-------------------------------
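
The split arithmetic and the search for a forcing case can be sketched without the library. The example below is a toy: cases are `(value, outcome)` pairs on a single Ascending dimension, and `split_index`, `first_forcing_case` and `toy_forces` are hypothetical helpers introduced only for illustration.

```python
def split_index(n, folds=5):
    """Index separating the first ~80% of n items from the rest (folds=5)."""
    return n - n // folds


def first_forcing_case(target, candidates, forces):
    """Return the first candidate that forces the target, or None."""
    for candidate in candidates:
        if candidate != target and forces(candidate, target):
            return candidate
    return None


def toy_forces(c, d):
    """Toy forcing on one Ascending dimension; cases are (value, outcome)."""
    value_c, outcome_c = c
    value_d, _ = d
    # For outcome 1 a higher value is at least as good; for outcome 0 a lower one.
    return value_d >= value_c if outcome_c == 1 else value_d <= value_c


split = split_index(500)
print(split, 500 - split)  # 400 100

train = [(300, 0), (320, 1)]  # toy training cases, not from the data set
held_out = (325, 1)           # toy held-out case
print(first_forcing_case(held_out, train, toy_forces))  # (320, 1)
```

Passing the forcing test in as a function keeps the search loop independent of how cases are represented, so the same loop could be pointed at a held-out case from `CB_test` with a predicate built on `.diff`.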
