# **Simulating Color Blindness**

---

#### **Description**


#### **Contents**

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.colors

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

### **Loading the Data**

The model will be trained to take in the RGB values of the ***True color*** and produce the RGB values for what it would look like to someone with ***Protanopia***, ***Deuteranopia***, or ***Tritanopia***.

So we separate the **true data** (the model's input features) from the **other RGB data** (the targets the model should predict).

In [3]:
df = pd.read_csv("color_blind_RGB.csv")

true_data = df[['true_red','true_green','true_blue']].copy()
other_data = df.drop(columns=['true_red', 'true_green', 'true_blue'])

### **Split, Train, Test**

Possible sklearn models to use:
* LinearRegression
* DecisionTreeRegressor
* RandomForestRegressor


Feel free to break up `other_data` and train your model on one type of color vision deficiency at a time. Make note of whether this affects the performance.

In [4]:
X_train, X_test, y_train, y_test = train_test_split(true_data, other_data, test_size=0.2, random_state=42)

reg = LinearRegression()
reg.fit(X_train, y_train)
pred = reg.predict(X_test)

print("R2: " + str(metrics.r2_score(y_test, pred)))
print("MSE: " + str(metrics.mean_squared_error(y_test, pred)))
print("MAE: " + str(metrics.mean_absolute_error(y_test, pred)))

R2: 0.952213879916472
MSE: 236.63232163653115
MAE: 10.624655616329685
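
One way to act on the earlier suggestion of training on one deficiency type at a time is sketched below. Because the contents of `color_blind_RGB.csv` are not reproduced here, the sketch builds a synthetic stand-in (`demo_df`, with made-up channel mixing) that only mimics the `prot_red`, `deut_green`, ... column naming pattern. The helper name `r2_per_deficiency` is also hypothetical, and the scores it prints say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real CSV (hypothetical values, NOT the actual data)
rng = np.random.default_rng(0)
true_cols = ['true_red', 'true_green', 'true_blue']
demo_df = pd.DataFrame(rng.integers(0, 256, size=(500, 3)), columns=true_cols)
for prefix in ['prot', 'deut', 'trit']:
    mix = rng.random((3, 3))                  # made-up channel mixing weights
    mix /= mix.sum(axis=1, keepdims=True)     # each output channel is a weighted average
    mixed = demo_df[true_cols].to_numpy() @ mix.T + rng.normal(0, 5, size=(500, 3))
    for i, channel in enumerate(['red', 'green', 'blue']):
        demo_df[f'{prefix}_{channel}'] = mixed[:, i]

def r2_per_deficiency(frame, prefixes=('prot', 'deut', 'trit')):
    """Fit one LinearRegression per deficiency type and return its test R2."""
    X = frame[true_cols]
    scores = {}
    for prefix in prefixes:
        # Select only the three target columns for this deficiency type
        y = frame[[c for c in frame.columns if c.startswith(prefix + '_')]]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42)
        model = LinearRegression().fit(X_train, y_train)
        scores[prefix] = r2_score(y_test, model.predict(X_test))
    return scores

scores = r2_per_deficiency(demo_df)
for prefix, score in scores.items():
    print(prefix, "R2:", score)
```

`LinearRegression` fits each output column independently, so splitting the targets should not change its predictions, only how the score is averaged. Tree-based models share splits across all outputs, so their results may differ when trained on one type at a time.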


### **View Results**

---

##### **Step 1:** Helper Functions

The following functions will help format the output table to be visually digestible:

* `make_hex_col()`: Combines the red, green, and blue columns of a vision type into one hexadecimal code that will be used to color the background of a table cell. Takes a dataframe object and color vision type *(true, prot, deut, trit)* as parameters and returns a list of all the RGB color samples as HEX codes, ready to be added to a dataframe as a new column. This allows the `output_cell_styling` function to show us the exact color of a sample. <br />
To learn more about the process of converting RGB to HEX, see this [infographic](https://drive.google.com/file/d/17SjN9rsOv57Y7V1nECDT1R4lGu2yLHq7/view?usp=sharing).

* `text_color_from_ratio()`: Takes a hex code as a parameter and uses standards set by [WCAG](https://www.w3.org/TR/WCAG21/#contrast-minimum) to determine whether text on top of this hex color should be **black** or **white** for visibility. This function computes the [relative luminance](https://www.w3.org/TR/WCAG21/#dfn-relative-luminance) of the input color and, from it, the [contrast ratio](https://www.w3.org/TR/WCAG21/#dfn-contrast-ratio) of the input color to white. If that ratio is at least 4.5, the WCAG minimum for normal text, white text is used; otherwise black text is used. To learn more about the math involved, visit any of the linked webpages.

* `output_cell_styling()`: Takes a feature column as a parameter and returns a list of styling instructions for each column value as strings. These styling instructions are CSS ([cascading style sheets](https://www.w3schools.com/css/css_intro.asp)) ***attribute:value*** pairs. The primary attributes to consider are:
    - **Background Color** `background-color` - Accepts hex codes (ex: #012D9C) and some standard [color names](https://developer.mozilla.org/en-US/docs/Web/CSS/named-color) (ex: 'red'). At a glance, the background should show us exactly what the sample color looks like *(hex code)* and the colors of the predicted/actual labels *(word name)*.
    - **Text Color** `color` - Accepts hex codes (ex: #012D9C) and some standard [color names](https://developer.mozilla.org/en-US/docs/Web/CSS/named-color) (ex: 'red'). We want to manipulate the color of the text so that it can be properly seen/read against the background color.
    
    When this function is passed to `df.style.apply()`, it is called once on each column of `df`.
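
The two calculations described above can be sketched in plain Python, without pandas or matplotlib. This is a minimal illustration that follows the WCAG 2.1 definitions linked earlier. The function names (`clamp_channel`, `rgb_to_hex`, `relative_luminance`, `contrast_ratio`) are made up for this example and are not the notebook's helpers.

```python
def clamp_channel(value):
    """Truncate to an int and clamp into the valid 0-255 range."""
    return min(255, max(0, int(value)))

def rgb_to_hex(r, g, b):
    """Format three channels as a #RRGGBB hex code (two hex digits each)."""
    return '#%02X%02X%02X' % (clamp_channel(r), clamp_channel(g), clamp_channel(b))

def relative_luminance(r, g, b):
    """WCAG 2.1 relative luminance for 8-bit sRGB channels."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b)

def contrast_ratio(lum_a, lum_b):
    """WCAG contrast ratio between two relative luminances (ranges from 1 to 21)."""
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)

print(rgb_to_hex(163, 245, 72))      # in-range values are formatted directly
print(rgb_to_hex(-12.7, 300, 0))     # out-of-range values are clamped first

white = relative_luminance(255, 255, 255)
navy = relative_luminance(9, 67, 92)
print(contrast_ratio(white, navy))   # a dark background gives a high ratio against white
```

A background whose contrast ratio against white is at least 4.5 is dark enough for white text. Lighter backgrounds are better served by black text.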

In [5]:
# Combine the RGB features into one hex code for each sample using string formatting
# The model may have predicted some values outside the 0-255 range
#   - Any negative RGB value is set to 0
#   - Any RGB value above 255 is set to 255
# Return all hex codes as a list so it can be added as a new column to the dataframe
def make_hex_col(frame, cv_type):
    hex_vals = []
    for _, row in frame.iterrows():
        # Truncate each channel to an int, then clamp it into the 0-255 range
        r, g, b = (min(255, max(0, int(row[cv_type + '_' + channel]))) for channel in ('red', 'green', 'blue'))
        hex_vals.append('#%02X%02X%02X' % (r, g, b))
    return hex_vals

# Revert color parameter (hex or label name) to RGB for contrast ratio analysis
# Dark colors need white text and light colors need black text for appropriate visibility
# Learn more about this process by visiting the resources linked above
def text_color_from_ratio(hex_code):
    rgb = matplotlib.colors.to_rgb(hex_code.lower())
    # Relative luminance: linearize each sRGB channel, then weight it by the WCAG coefficients
    lum  = (rgb[0] / 12.92 if rgb[0] <= 0.03928 else ( (rgb[0] + 0.055) / 1.055) ** 2.4) * 0.2126
    lum += (rgb[1] / 12.92 if rgb[1] <= 0.03928 else ( (rgb[1] + 0.055) / 1.055) ** 2.4) * 0.7152
    lum += (rgb[2] / 12.92 if rgb[2] <= 0.03928 else ( (rgb[2] + 0.055) / 1.055) ** 2.4) * 0.0722
    # WCAG contrast ratio of white (luminance 1.0) against this color: (L_lighter + 0.05) / (L_darker + 0.05)
    ratio = 1.05 / (lum + 0.05)
    return 'white' if ratio >= 4.5 else 'black'

# For styling hex columns: center-align text, set bg color to hex, color text for visibility
def output_cell_styling(column):
    return ['text-align: center; color: ' + text_color_from_ratio(val) + '; background-color: ' + val.lower() for val in column]

##### **Step 2:** Initial Results Comparison

Compile the actual and predicted colors for all vision types into a single dataframe for easy side-by-side comparison. Each group of three RGB columns (true, actual, or predicted) is collapsed into one column of hex codes. In total, there should be 7 columns: 1 for true colors and 2 (actual and predicted) for each of the three color vision deficiency types.

However, before calling `make_hex_col()` on any of the predicted data, `pred` must be formatted as a dataframe with the same column names as `y_test`.

From here, you could opt to skip steps 3 & 4 and simply run this line of code to see the results without more advanced styling:

```python
results_df.style.apply(output_cell_styling)
```

In [6]:
pred_df = pd.DataFrame(pred, columns=y_test.columns)

results_df = pd.DataFrame()
results_df['true'] = make_hex_col(X_test, 'true')
results_df['prot actual'] = make_hex_col(y_test, 'prot')
results_df['prot pred'] = make_hex_col(pred_df, 'prot')
results_df['deut actual'] = make_hex_col(y_test, 'deut')
results_df['deut pred'] = make_hex_col(pred_df, 'deut')
results_df['trit actual'] = make_hex_col(y_test, 'trit')
results_df['trit pred'] = make_hex_col(pred_df, 'trit')

results_df

| | true | prot actual | prot pred | deut actual | deut pred | trit actual | trit pred |
|---|---|---|---|---|---|---|---|
| 0 | #A3F548 | #FADE44 | #FFD850 | #FFD8A5 | #FFCE76 | #BDE4F6 | #C7D8E9 |
| 1 | #10C704 | #C1AA00 | #C49600 | #D8A12E | #CA8D2E | #5CB7C6 | #3FA8B6 |
| 2 | #43BDED | #A0ADE3 | #8FABD9 | #98ADF2 | #8EABE5 | #2BC1D0 | #43C5D4 |
| 3 | #AA91A9 | #9496AB | #92A1AD | #9F93A6 | #A09FA7 | #A7929D | #A699A4 |
| 4 | #09435C | #394058 | #30474F | #33405E | #33474E | #00474D | #0C4F55 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1006 | #BE83DC | #7996EA | #7DA0DF | #8497D4 | #8CA0CC | #B1909B | #AD96A1 |
| 1007 | #52703D | #72673D | #76713D | #7D6343 | #816D45 | #5C6972 | #606E76 |
| 1008 | #873969 | #4A5579 | #455D70 | #585462 | #555C5A | #824348 | #7D4A4F |
| 1009 | #A73DE2 | #006EE5 | #346FE3 | #0074CA | #4072BC | #8F6870 | #876067 |


##### **Step 3:** Prepare Results for Styling

In the end, we want our output table to have two header rows: 
* the top row will separate the columns into the 4 color vision types
* the second row will denote the "Actual" and "Predicted" columns for each type

To do this, we need to add a placeholder column to the results dataframe to serve as the "Predicted" column for the True color type. `pd.MultiIndex.from_product` pairs each of the 4 vision types with both "Actual" and "Predicted", which gives 8 column labels, so the dataframe needs 8 columns rather than 7 before the new labels can be assigned. Don't worry! This placeholder column is hidden in Step 4.

In [7]:
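# Placeholder "Predicted" column for the True color type (hidden in Step 4)
results_df.insert(1, 'extra', len(pred)*['#000000'], allow_duplicates=True)
# Two-level header: vision type on top, Actual/Predicted underneath
results_df.columns = pd.MultiIndex.from_product([['True', 'Protanopia', 'Deuteranopia', 'Tritanopia'],['Actual', 'Predicted']])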
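
As a small illustration of why the placeholder column is needed, the sketch below builds the same `MultiIndex` on its own and lists the labels it generates. The variable name `cols` is only for this example.

```python
import pandas as pd

# Every vision type is paired with both sub-headers, in the order given
cols = pd.MultiIndex.from_product(
    [['True', 'Protanopia', 'Deuteranopia', 'Tritanopia'], ['Actual', 'Predicted']]
)

print(len(cols))  # 8 labels: 4 vision types x 2 sub-headers
for label in cols:
    print(label)
```

The second label, `('True', 'Predicted')`, has no real data behind it, which is why a filler column sits at position 1.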

##### **Step 4:** Style and Show Results

Apply `output_cell_styling` to every column, hide the placeholder `('True', 'Predicted')` column, center the header text, and draw a left border before each deficiency type's "Actual" column to visually separate the groups.



In [8]:
results_df.style.apply(output_cell_styling).hide([('True', 'Predicted')], axis="columns") \
    .set_table_styles([{'selector': 'th', 'props': 'text-align: center;'}], overwrite=False) \
    .set_table_styles({(x, 'Actual'): [{'selector': 'td, th', 'props': 'border-left: 4px solid #d5d5d5;'}] for x in ['Protanopia', 'Deuteranopia', 'Tritanopia']}, overwrite=False)

| | True | Protanopia | Protanopia | Deuteranopia | Deuteranopia | Tritanopia | Tritanopia |
|---|---|---|---|---|---|---|---|
| | **Actual** | **Actual** | **Predicted** | **Actual** | **Predicted** | **Actual** | **Predicted** |
| 0 | #A3F548 | #FADE44 | #FFD850 | #FFD8A5 | #FFCE76 | #BDE4F6 | #C7D8E9 |
| 1 | #10C704 | #C1AA00 | #C49600 | #D8A12E | #CA8D2E | #5CB7C6 | #3FA8B6 |
| 2 | #43BDED | #A0ADE3 | #8FABD9 | #98ADF2 | #8EABE5 | #2BC1D0 | #43C5D4 |
| 3 | #AA91A9 | #9496AB | #92A1AD | #9F93A6 | #A09FA7 | #A7929D | #A699A4 |
| 4 | #09435C | #394058 | #30474F | #33405E | #33474E | #00474D | #0C4F55 |
| 5 | #52A980 | #A19778 | #9B9A78 | #AD9286 | #A2968A | #62A1AE | #60A5B1 |
| 6 | #B1AA2E | #BAA531 | #C9AA3D | #D09C37 | #DFA24D | #BB9DA9 | #C89AA5 |
| 7 | #D282E5 | #7B9CF9 | #7FA5EB | #8A9DDB | #8FA5D3 | #C5919D | #BE97A2 |
| 8 | #35D667 | #CEBA5E | #C4AF5B | #E3B173 | #C8A882 | #66C8D8 | #53C4D3 |
| 9 | #D32248 | #726D62 | #4B5D5F | #86683E | #665B3C | #D12D2E | #C63437 |
