# Results Analysis

This Notebook analyses and visualises the results of the experiments. It compares the model results and the human survey results to the ground truth, and evaluates how each is affected by different visual features.

## Expected File Structure

The code expects a “ResNet” folder at the same level as the Notebook, containing a file called “Model Test Results.json” with the results from the neural network. It also expects a “Human Classification Test” folder, again at the same level as the Notebook, containing a “Simple Human Results.json” file with the majority results of the human survey takers. Finally, there should be a “Visual Features” folder at the same level as the Notebook, containing the following files, which store how characters are classified according to different visual features: “Age.json”, “Apparent Gender.json”, “Hair Colour.json”, and “i2v.json”.
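As a quick sanity check before running the Notebook, the layout described above could be verified with a small helper. This is only a sketch: the function name `find_missing_files` and the constant `EXPECTED_FILES` are hypothetical, and the paths are copied from the description in this section.

```python
from pathlib import Path

# Paths taken from the description above, relative to the Notebook's folder.
EXPECTED_FILES = [
    "ResNet/Model Test Results.json",
    "Human Classification Test/Simple Human Results.json",
    "Visual Features/Age.json",
    "Visual Features/Apparent Gender.json",
    "Visual Features/Hair Colour.json",
    "Visual Features/i2v.json",
]


def find_missing_files(base_dir="."):
    """Return the expected files that do not exist under base_dir."""
    base = Path(base_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


missing = find_missing_files()
if missing:
    print("Missing files:", missing)
else:
    print("All expected files found.")
```

Returning the list of missing files, rather than raising on the first one, makes it easier to fix the whole layout in one pass.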

## Cell Group Independence

The Cells in this Notebook are divided into groups marked by bold titles such as the one above. Since most Cell Groups are responsible for evaluating the impact of a single visual feature on the results, they are designed to be independent of each other, so that, when analysing the impact of a single feature, the user does not need to run the entire Notebook.

All Cell Groups require both the “Loading Results” and the “Class Distribution” groups to be run beforehand. Some Cell Groups also depend on other groups; these are listed under “Special Requirements” at the top of the group.

## Loading Results

This cell group is responsible for loading the model and human survey results, as well as structuring them in dictionaries as detailed in the README file, which will be used in the entire Notebook.

In [None]:
# Loading probabilistic model results and generating a simple version from them

import json

with open("../ResNet/Model Test Results.json", "r") as modelFile:
    modelDict = json.load(modelFile)
simpleModelDict = {}

for role in modelDict:
    simpleModelDict[role] = {}
    for character in modelDict[role]:
        if modelDict[role][character][0] > modelDict[role][character][1]:
            simpleModelDict[role][character] = "Main"
        else:
            simpleModelDict[role][character] = "Supporting"
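The simplification step can be illustrated on toy data. The sketch below assumes, based on the comparison in the loading cell, that each character maps to a two-element list holding the probability of "Main" followed by the probability of "Supporting". The character names and the `simplify` helper are hypothetical.

```python
# Hypothetical miniature of the probabilistic results:
# true role -> character -> [P(Main), P(Supporting)]
toy_model = {
    "Main": {"character_a": [0.9, 0.1], "character_b": [0.3, 0.7]},
    "Supporting": {"character_c": [0.2, 0.8]},
}


def simplify(model_results):
    """Collapse each probability pair into a 'Main' or 'Supporting' label."""
    simple = {}
    for role, characters in model_results.items():
        simple[role] = {
            # Ties fall to "Supporting", matching the strict ">" comparison.
            name: "Main" if probs[0] > probs[1] else "Supporting"
            for name, probs in characters.items()
        }
    return simple


toy_simple = simplify(toy_model)
print(toy_simple)
```

Note that the outer keys remain the true roles, so a character listed under "Main" can carry the label "Supporting" when the model misclassifies it.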

In [None]:
# Loading simple human results

with open("Human Classification Test/Simple Human Results.json", "r") as humanFile:
    simpleHumanDict = json.load(humanFile)

## Class Distribution

This Cell Group calculates and visualises how the data is distributed between the two classes, according to all three classification methods.

Some variables and lists initialised here will be used in other Cell Groups.

In [None]:
# Creating the dictionaries to store the class distribution

totalCharacters = 0
count = {"Truth": {}, "Model": {}, "Human": {}}
count["Truth"] = {"Main": 0, "Supporting": 0}
count["Model"] = {"Main": 0, "Supporting": 0}
count["Human"] = {"Main": 0, "Supporting": 0}

for role in modelDict:
    totalCharacters += len(modelDict[role])
    count["Truth"][role] = len(modelDict[role])
    for character in modelDict[role]:
        count["Model"][simpleModelDict[role][character]] += 1
        count["Human"][simpleHumanDict[role][character]] += 1
        
percentages = {"Truth": {}, "Model": {}, "Human": {}}
for group in count:
    for role in count[group]:
        percentages[group][role] = (count[group][role]/totalCharacters)*100
print(percentages)

In [None]:
# Bar graph to visualise the class distribution

import matplotlib.pyplot as plt

keys = list(percentages.keys())
mainResults = []
for group in percentages:
    mainResults.append(percentages[group]["Main"])
supportingResults = []
for group in percentages:
    supportingResults.append(percentages[group]["Supporting"])

axes = plt.gca()
axes.set_ylim([0,65])
plt.bar(keys, mainResults, -0.2, color="b", label="Main", align="edge")
for index, value in enumerate(mainResults):
    plt.text(index-0.2, value-9, round(value, 2), color="w")
plt.bar(keys, supportingResults, 0.2, color="grey", label="Supporting", align="edge")
for index, value in enumerate(supportingResults):
    plt.text(index, value-9, round(value, 2), color="black")

plt.xlabel("Method of classification")
plt.ylabel("Percentage of total characters")
plt.legend(loc='best')
plt.show()

## Different answers between humans and the model

This Cell calculates how often the model and the survey takers agreed. It also reports how often both were right when they agreed, and how often the model was right when they disagreed, both overall and per class.

In [None]:
sameAnswer = 0
bothRight = 0
modelRight = 0
sameAnswerRole = {"Main": 0, "Supporting": 0}
bothRightRole = {"Main": 0, "Supporting": 0}
modelRightRole = {"Main": 0, "Supporting": 0}
totalCharactersRole = {"Main": 0, "Supporting": 0}

for role in simpleHumanDict:
    totalCharactersRole[role] = len(simpleHumanDict[role])
    for character in simpleHumanDict[role]:
        if simpleHumanDict[role][character] == simpleModelDict[role][character]:
            sameAnswer += 1
            sameAnswerRole[role] += 1
            if simpleHumanDict[role][character] == role:
                bothRight += 1
                bothRightRole[role] += 1
        else:
            if simpleModelDict[role][character] == role:
                modelRight += 1
                modelRightRole[role] += 1
                
print("Same Answer: "+str((sameAnswer/totalCharacters)*100)+"%")
print("Both Right: "+str((bothRight/sameAnswer)*100)+"%")
print("Model Right: "+str((modelRight/(totalCharacters-sameAnswer))*100)+"%")
for role in totalCharactersRole:
    print(role)
    print("Same Answer: "+str((sameAnswerRole[role]/totalCharactersRole[role])*100)+"%")
    print("Both Right: "+str((bothRightRole[role]/sameAnswerRole[role])*100)+"%")
    print("Model Right: "+str((modelRightRole[role]/(totalCharactersRole[role]-sameAnswerRole[role]))*100)+"%")
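The three statistics can be checked by hand on a small example. The sketch below is not the Notebook's own code: `agreement_stats` is a hypothetical helper that takes a flat list of `(true role, human answer, model answer)` tuples and, unlike the cell above, returns `None` instead of raising when a denominator is zero.

```python
def agreement_stats(answers):
    """answers: list of (true_role, human_answer, model_answer) tuples."""
    total = len(answers)
    same = sum(1 for _, human, model in answers if human == model)
    both_right = sum(1 for truth, human, model in answers
                     if human == model == truth)
    model_right = sum(1 for truth, human, model in answers
                      if human != model and model == truth)

    def pct(part, whole):
        # Guard against empty denominators instead of raising ZeroDivisionError.
        return (part / whole) * 100 if whole else None

    return {
        "same_answer": pct(same, total),                # share of all characters
        "both_right": pct(both_right, same),            # share of agreements
        "model_right": pct(model_right, total - same),  # share of disagreements
    }


toy = [
    ("Main", "Main", "Main"),              # agree, both right
    ("Main", "Supporting", "Supporting"),  # agree, both wrong
    ("Supporting", "Main", "Supporting"),  # disagree, model right
    ("Supporting", "Supporting", "Main"),  # disagree, humans right
]
stats = agreement_stats(toy)
print(stats)
```

Because there are only two classes, exactly one side is right whenever the two disagree, so the share of disagreements won by the humans is 100 minus the "model right" percentage.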

## Age

In this Cell Group we evaluate how Age impacts a character’s classification.

In [None]:
# Loading the age dictionary

with open("Visual Features/Age.json", "r") as ageFile:
    ageDict = json.load(ageFile)

In [None]:
# Determining the age distribution per class, according to each classification method

trueAgeCount = {}
for role in ageDict:
    trueAgeCount[role] = {"young": 0, "old": 0}
    for character in ageDict[role]:
        trueAgeCount[role][ageDict[role][character]] += 1
        
modelAgeCount = {}
for role in ageDict:
    modelAgeCount[role] = {"young": 0, "old": 0}
for role in ageDict:
    for character in ageDict[role]:
        modelAgeCount[simpleModelDict[role][character]][ageDict[role][character]] += 1
        
humanAgeCount = {}
for role in ageDict:
    humanAgeCount[role] = {"young": 0, "old": 0}
for role in ageDict:
    for character in ageDict[role]:
        humanAgeCount[simpleHumanDict[role][character]][ageDict[role][character]] += 1
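The model and human counts follow the same pattern: each character's feature value is tallied under the class it was assigned, not under its true role. A generic version of that pattern might look like the sketch below, where `count_feature_by_class` and the toy dictionaries are hypothetical and `collections.Counter` replaces the explicit zero initialisation.

```python
from collections import Counter


def count_feature_by_class(feature, assigned):
    """Count feature values per assigned class.

    feature:  true role -> character -> feature value (e.g. "young" or "old")
    assigned: true role -> character -> class given by the model or the humans
    """
    counts = {}
    for role, characters in feature.items():
        for name, value in characters.items():
            assigned_class = assigned[role][name]
            counts.setdefault(assigned_class, Counter())[value] += 1
    return counts


toy_age = {"Main": {"a": "young", "b": "old"}, "Supporting": {"c": "young"}}
toy_assigned = {"Main": {"a": "Main", "b": "Supporting"},
                "Supporting": {"c": "Supporting"}}
toy_counts = count_feature_by_class(toy_age, toy_assigned)
print(toy_counts)
```

A `Counter` returns 0 for values it has never seen, so a lookup such as `toy_counts["Main"]["old"]` works even though no old character was assigned to "Main" in the toy data.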

In [None]:
# Visualising the overall age distribution

totalOld = trueAgeCount["Main"]["old"] + trueAgeCount["Supporting"]["old"]
totalYoung = trueAgeCount["Main"]["young"] + trueAgeCount["Supporting"]["young"]

values = [(totalYoung/totalCharacters)*100, (totalOld/totalCharacters)*100]

plt.pie(values, labels=["Young", "Old"], autopct='%1.2f%%', colors=["#B63CEC", "#ECA93C"])
plt.show()

In [None]:
# Visualising the class distribution for old characters

mainOld = [(trueAgeCount["Main"]["old"]/totalOld)*100, 
           (modelAgeCount["Main"]["old"]/totalOld)*100, 
           (humanAgeCount["Main"]["old"]/totalOld)*100]
supportingOld = [(trueAgeCount["Supporting"]["old"]/totalOld)*100, 
                 (modelAgeCount["Supporting"]["old"]/totalOld)*100, 
                 (humanAgeCount["Supporting"]["old"]/totalOld)*100]

axes = plt.gca()
axes.set_ylim([0,100])
plt.bar(keys, mainOld, -0.2, color="b", label="Main", align="edge")
for index, value in enumerate(mainOld):
    plt.text(index-0.2, value-9, round(value, 2), color="w")
plt.bar(keys, supportingOld, 0.2, color="grey", label="Supporting", align="edge")
for index, value in enumerate(supportingOld):
    plt.text(index, value-9, round(value, 2), color="black")

plt.xlabel("Method of classification")
plt.ylabel("Percentage of old characters")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising the class distribution for young characters

mainYoung = [(trueAgeCount["Main"]["young"]/totalYoung)*100,
             (modelAgeCount["Main"]["young"]/totalYoung)*100,
             (humanAgeCount["Main"]["young"]/totalYoung)*100]
supportingYoung = [(trueAgeCount["Supporting"]["young"]/totalYoung)*100,
                   (modelAgeCount["Supporting"]["young"]/totalYoung)*100,
                   (humanAgeCount["Supporting"]["young"]/totalYoung)*100]

axes = plt.gca()
axes.set_ylim([0,100])
plt.bar(keys, mainYoung, -0.2, color="#0000ff", label="Main", align="edge")
for index, value in enumerate(mainYoung):
    plt.text(index-0.2, value-9, round(value, 2), color="w")
plt.bar(keys, supportingYoung, 0.2, color="grey", label="Supporting", align="edge")
for index, value in enumerate(supportingYoung):
    plt.text(index, value-9, round(value, 2), color="black")

plt.xlabel("Method of classification")
plt.ylabel("Percentage of young characters")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising how old characters compare to the average, when it comes to class distribution

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainOld):
    plt.text(index-0.18, value-14, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-14, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainOld, -0.2, color="#0000ff", label="Old Main", align="edge")
    
label = "Total Supporting"
labelOld = "Old Supporting"
for index, value in enumerate(supportingOld):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-14, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelOld, align="edge")
    plt.text(index+0.21, value-14, round(value, 2), color="black", size="x-small")
    label = None
    labelOld = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising how young characters compare to the average, when it comes to class distribution

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainYoung):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-9, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainYoung, -0.2, color="#0000ff", label="Young Main", align="edge")
    
label = "Total Supporting"
labelYoung = "Young Supporting"
for index, value in enumerate(supportingYoung):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-9, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelYoung, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelYoung = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

## Apparent Gender

In this Cell Group we evaluate how Gender impacts a character’s classification.

In [None]:
# Loading the gender dictionary

with open("Visual Features/Apparent Gender.json", "r") as genderFile:
    genderDict = json.load(genderFile)

In [None]:
# Determining the gender distribution per class, according to each classification method

trueGenderCount = {}
for role in genderDict:
    trueGenderCount[role] = {"man": 0, "woman": 0}
    for character in genderDict[role]:
        trueGenderCount[role][genderDict[role][character]] += 1
        
modelGenderCount = {}
for role in genderDict:
    modelGenderCount[role] = {"man": 0, "woman": 0}
for role in genderDict:
    for character in genderDict[role]:
        modelGenderCount[simpleModelDict[role][character]][genderDict[role][character]] += 1
        
humanGenderCount = {}
for role in genderDict:
    humanGenderCount[role] = {"man": 0, "woman": 0}
for role in genderDict:
    for character in genderDict[role]:
        humanGenderCount[simpleHumanDict[role][character]][genderDict[role][character]] += 1

In [None]:
# Visualising the overall gender distribution

totalMan = trueGenderCount["Main"]["man"] + trueGenderCount["Supporting"]["man"]
totalWoman = trueGenderCount["Main"]["woman"] + trueGenderCount["Supporting"]["woman"]

values = [(totalMan/totalCharacters)*100, (totalWoman/totalCharacters)*100]

plt.pie(values, labels=["Man", "Woman"], autopct='%1.2f%%', colors=["#B63CEC", "#ECA93C"])
plt.show()

In [None]:
# Visualising how male characters compare to the average, when it comes to class distribution

mainMan = [(trueGenderCount["Main"]["man"]/totalMan)*100, 
           (modelGenderCount["Main"]["man"]/totalMan)*100, 
           (humanGenderCount["Main"]["man"]/totalMan)*100]
supportingMan = [(trueGenderCount["Supporting"]["man"]/totalMan)*100, 
                 (modelGenderCount["Supporting"]["man"]/totalMan)*100, 
                 (humanGenderCount["Supporting"]["man"]/totalMan)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainMan):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-9, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainMan, -0.2, color="#0000ff", label="Male Main", align="edge")
    
label = "Total Supporting"
labelMale = "Male Supporting"
for index, value in enumerate(supportingMan):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-9, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelMale, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelMale = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising how female characters compare to the average, when it comes to class distribution

mainWoman = [(trueGenderCount["Main"]["woman"]/totalWoman)*100,
             (modelGenderCount["Main"]["woman"]/totalWoman)*100,
             (humanGenderCount["Main"]["woman"]/totalWoman)*100]
supportingWoman = [(trueGenderCount["Supporting"]["woman"]/totalWoman)*100,
                   (modelGenderCount["Supporting"]["woman"]/totalWoman)*100,
                   (humanGenderCount["Supporting"]["woman"]/totalWoman)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainWoman):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-9, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainWoman, -0.2, color="#0000ff", label="Female Main", align="edge")
    
label = "Total Supporting"
labelFemale = "Female Supporting"
for index, value in enumerate(supportingWoman):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-9, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelFemale, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelFemale = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

## Hair Colour

In this Cell Group we evaluate how Hair Colour impacts a character’s classification.

In [None]:
# Loading the hair colour dictionary

with open("Visual Features/Hair Colour.json", "r") as hairColourFile:
    hairColourDict = json.load(hairColourFile)

In [None]:
# Determining the hair colour distribution per class, according to each classification method

trueHairColourCount = {}
for role in hairColourDict:
    trueHairColourCount[role] = {}
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        # Start at zero the first time a colour is seen
        trueHairColourCount[role][colour] = trueHairColourCount[role].get(colour, 0) + 1

modelHairColourCount = {}
for role in hairColourDict:
    modelHairColourCount[role] = {}
for role in hairColourDict:
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        roleCounts = modelHairColourCount[simpleModelDict[role][character]]
        roleCounts[colour] = roleCounts.get(colour, 0) + 1

humanHairColourCount = {}
for role in hairColourDict:
    humanHairColourCount[role] = {}
for role in hairColourDict:
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        roleCounts = humanHairColourCount[simpleHumanDict[role][character]]
        roleCounts[colour] = roleCounts.get(colour, 0) + 1

In [None]:
# Calculating and visualising the overall hair colour distribution across both classes combined

totalHairColourCount = {}
for colour in trueHairColourCount["Main"]:
    totalHairColourCount[colour] = trueHairColourCount["Main"][colour] + trueHairColourCount["Supporting"][colour]
    
hairColourPercentages = {}
for colour in totalHairColourCount:
    hairColourPercentages[colour] = (totalHairColourCount[colour]/totalCharacters)*100
    
axes = plt.gca()
axes.set_ylim([0,30])

xs = []
for colour in list(hairColourPercentages.keys()):
    xs.append(colour)
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

for index, key in enumerate(hairColourPercentages):
    value = hairColourPercentages[key]
    plt.text(index-0.35, value-2, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of characters")
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of characters classified as main,
# according to the ground truth

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainResults[0])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainResults[0]+2, round(mainResults[0], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total characters classified as Main")
for index, key in enumerate(trueHairColourCount["Main"]):
    value = trueHairColourCount["Main"][key]/(trueHairColourCount["Main"][key]+trueHairColourCount["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of characters classified as Main according to the Truth")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of characters classified as main,
# according to the model

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainResults[1])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainResults[1]+2, round(mainResults[1], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total characters classified as Main")
for index, key in enumerate(xs):
    value = modelHairColourCount["Main"][key]/(modelHairColourCount["Main"][key]+modelHairColourCount["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of characters classified as Main according to the Model")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of characters classified as main,
# according to the human survey

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainResults[2])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainResults[2]+2, round(mainResults[2], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total characters classified as Main")
for index, key in enumerate(xs):
    value = humanHairColourCount["Main"][key]/(humanHairColourCount["Main"][key]+humanHairColourCount["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of characters classified as Main according to Humans")
plt.legend(loc='best')
plt.show()
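The three plots above share one calculation: for each hair colour, the number of characters classified as Main divided by all characters with that colour. A minimal sketch of that step is shown below. The helper name `main_share_by_colour` and the toy counts are hypothetical, and, unlike the cells above, the sketch treats a colour that is absent from one class as a count of zero rather than raising a `KeyError`.

```python
def main_share_by_colour(colour_counts):
    """colour_counts: {"Main": {colour: n}, "Supporting": {colour: n}}.

    Returns {colour: percentage of that colour's characters classified as Main}.
    """
    main_counts = colour_counts.get("Main", {})
    supporting_counts = colour_counts.get("Supporting", {})
    shares = {}
    for colour in sorted(set(main_counts) | set(supporting_counts)):
        main = main_counts.get(colour, 0)
        supporting = supporting_counts.get(colour, 0)
        total = main + supporting
        if total:  # skip colours whose recorded counts are all zero
            shares[colour] = main / total * 100
    return shares


toy_counts = {
    "Main": {"black": 3, "blonde": 1},
    "Supporting": {"black": 1, "pink": 2},
}
toy_shares = main_share_by_colour(toy_counts)
print(toy_shares)
```

The same helper could be applied to the truth, model, or human count dictionaries, since all three use the class as the outer key and the hair colour as the inner key.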

## Hair Colour x Gender

This Cell Group evaluates how the impact of hair colour on the results varies with apparent gender.

**Special Requirements:** "Apparent Gender" group and "Hair Colour" group

In [None]:
# Determining the hair colour distribution per gender per class, according to each classification method

trueHairColourCountGender = {"man": {}, "woman": {}}
for role in hairColourDict:
    trueHairColourCountGender["man"][role] = {}
    trueHairColourCountGender["woman"][role] = {}
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        roleCounts = trueHairColourCountGender[genderDict[role][character]][role]
        # Start at zero the first time a colour is seen
        roleCounts[colour] = roleCounts.get(colour, 0) + 1

modelHairColourCountGender = {"man": {}, "woman": {}}
for role in hairColourDict:
    modelHairColourCountGender["man"][role] = {}
    modelHairColourCountGender["woman"][role] = {}
for role in hairColourDict:
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        roleCounts = modelHairColourCountGender[genderDict[role][character]][simpleModelDict[role][character]]
        roleCounts[colour] = roleCounts.get(colour, 0) + 1

humanHairColourCountGender = {"man": {}, "woman": {}}
for role in hairColourDict:
    humanHairColourCountGender["man"][role] = {}
    humanHairColourCountGender["woman"][role] = {}
for role in hairColourDict:
    for character in hairColourDict[role]:
        colour = hairColourDict[role][character]
        roleCounts = humanHairColourCountGender[genderDict[role][character]][simpleHumanDict[role][character]]
        roleCounts[colour] = roleCounts.get(colour, 0) + 1

### Male Characters

In [None]:
# Visualising, for each hair colour, the percentage of male characters classified as main,
# according to the ground truth

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainMan[0])

colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainMan[0]+2, round(mainMan[0], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total male characters classified as Main")
for index, key in enumerate(xs):
    value = trueHairColourCountGender["man"]["Main"][key]/(trueHairColourCountGender["man"]["Main"][key]+trueHairColourCountGender["man"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of male characters classified as Main according to the Truth")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of male characters classified as main,
# according to the model

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainMan[1])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainMan[1]+2, round(mainMan[1], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total male characters classified as Main")
for index, key in enumerate(xs):
    value = modelHairColourCountGender["man"]["Main"][key]/(modelHairColourCountGender["man"]["Main"][key]+modelHairColourCountGender["man"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of male characters classified as Main according to the Model")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of male characters classified as main,
# according to the human survey

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainMan[2])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainMan[2]+2, round(mainMan[2], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total male characters classified as Main")
for index, key in enumerate(xs):
    value = humanHairColourCountGender["man"]["Main"][key]/(humanHairColourCountGender["man"]["Main"][key]+humanHairColourCountGender["man"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of male characters classified as Main according to the Humans")
plt.legend(loc='best')
plt.show()

### Female Characters

In [None]:
# Visualising, for each hair colour, the percentage of female characters classified as main,
# according to the ground truth

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainWoman[0])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainWoman[0]+2, round(mainWoman[0], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total female characters classified as Main")
for index, key in enumerate(xs):
    value = trueHairColourCountGender["woman"]["Main"][key]/(trueHairColourCountGender["woman"]["Main"][key]+trueHairColourCountGender["woman"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of female characters classified as Main according to the Truth")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of female characters classified as main,
# according to the model

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainWoman[1])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainWoman[1]+2, round(mainWoman[1], 2), color="#159100", size="x-small")
plt.plot(xs, ys, "--", color="#159100", label="Percentage of total female characters classified as Main")
for index, key in enumerate(xs):
    value = modelHairColourCountGender["woman"]["Main"][key]/(modelHairColourCountGender["woman"]["Main"][key]+modelHairColourCountGender["woman"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of female characters classified as Main according to the Model")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for each hair colour, the percentage of female characters classified as main,
# according to the human survey

axes = plt.gca()
axes.set_ylim([0,100])

xs = []
ys = []
for colour in list(trueHairColourCount["Main"].keys()):
    xs.append(colour)
    ys.append(mainWoman[2])
    
colours = ["#000000", "#9706FF", "#EAF0F6", "#F7EA0B",
           "#40C704", "#AF6E04", "#0158F0", "#FE67F9", "#FF193F",
           "#FF9619", "#AFAFAF"]

textColours = ["white", "black", "black", "black",
               "white", "white", "white", "black", "white", 
               "black", "black"]

plt.text(len(xs)-1.5, mainWoman[2]+2, round(mainWoman[2], 2), color="#159100", size="x-small")
plt.plot(xs, ys, linestyle="--", color="#159100", label="Percentage of total female characters classified as Main")
for index, key in enumerate(xs):
    value = humanHairColourCountGender["woman"]["Main"][key]/(humanHairColourCountGender["woman"]["Main"][key]+humanHairColourCountGender["woman"]["Supporting"][key])*100
    plt.text(index-0.35, value-11, round(value, 2), color=textColours[index], size="x-small")
    plt.bar(xs[index], value, 0.7, color=colours[index], align="center")
    

plt.xlabel("Hair Colour")
plt.ylabel("Percentage of female characters classified as Main according to the Humans")
plt.legend(loc='best')
plt.show()

## Overall Hair Colour Differences

This Cell Group is meant to help visualise the impact each Hair Colour has on the results.

**Special Requirements:** "Hair Colour" group
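
As a rough illustration of the quantity plotted in this group: for each classification method, the share of characters with a given hair colour that are classified as Main is compared with the overall share of Main characters, and the relative difference is reported as a percentage. The sketch below reproduces that arithmetic on made-up counts. The function name "relativeDifferenceFromMean" and the numbers are illustrative assumptions, not part of the Notebook.

```python
def relativeDifferenceFromMean(mainCount, supportingCount, overallMainPercentage):
    """Relative difference (in %) between a group's Main rate and the overall Main rate."""
    groupMainPercentage = mainCount / (mainCount + supportingCount) * 100
    return (groupMainPercentage / overallMainPercentage - 1) * 100

# Made-up example: 30 of 50 characters in the group are Main (60%),
# while 50% of all characters are Main, so the group sits about 20% above the mean.
exampleDifference = relativeDifferenceFromMean(30, 20, 50.0)
print(round(exampleDifference, 2))
```

A positive value means the hair colour is over-represented among Main characters for that method, a negative value means it is under-represented.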

In [None]:
# Method to calculate and plot the impact of a given hair colour on a character's chances of being main
# The method can be changed to calculate the differences for a specific gender,
# just be sure to swap the "count" dicts and the "mainResults" list

import math

def plotDifferenceFromMean(hairColour, colours, textColours):
    global keys
    global mainResults
    global trueHairColourCount
    global modelHairColourCount
    global humanHairColourCount
    
    axes = plt.gca()
    
    values = []
    value = trueHairColourCount["Main"][hairColour]/(trueHairColourCount["Main"][hairColour]+trueHairColourCount["Supporting"][hairColour])*100
    value = ((value/mainResults[0]) - 1)*100
    values.append(value)
    value = modelHairColourCount["Main"][hairColour]/(modelHairColourCount["Main"][hairColour]+modelHairColourCount["Supporting"][hairColour])*100
    value = ((value/mainResults[1]) - 1)*100
    values.append(value)
    value = humanHairColourCount["Main"][hairColour]/(humanHairColourCount["Main"][hairColour]+humanHairColourCount["Supporting"][hairColour])*100
    value = ((value/mainResults[2]) - 1)*100
    values.append(value)
    # Symmetric y-axis around zero, sized to the largest absolute difference
    maxValue = max(math.fabs(value) for value in values)
    axes.set_ylim([-1*(maxValue+2), maxValue+2])
    
    plt.plot(keys, [0, 0, 0], color="black")
    
    textY = values[0] + (0.7*maxValue/10)
    if values[0] < 0:
        textY = values[0] - (0.5*maxValue/10)
    plt.text(0-0.05, textY, round(values[0], 2), color=textColours[0], size="small")
    plt.bar(keys[0], values[0], 0.3, color=colours[0], align="center")
    
    textY = values[1] + (0.7*maxValue/10)
    if values[1] < 0:
        textY = values[1] - (0.5*maxValue/10)
    plt.text(1-0.05, textY, round(values[1], 2), color=textColours[1], size="small")
    plt.bar(keys[1], values[1], 0.3, color=colours[1], align="center")
    
    textY = values[2] + (0.5*maxValue/10)
    if values[2] < 0:
        textY = values[2] - (0.7*maxValue/10)
    plt.text(2-0.05, textY, round(values[2], 2), color=textColours[2], size="small")
    plt.bar(keys[2], values[2], 0.3, color=colours[2], align="center")
    
    plt.xlabel("Method of classification")
    plt.ylabel("Difference from the mean value of Main characters")
    plt.title(hairColour+" hair")
    plt.show()

The next cells run the "plotDifferenceFromMean" method for each hair colour.
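
If running one cell per colour is not needed, the same calls could be driven from a single loop. The sketch below is one possible way to do that, assuming the hex codes used in the individual calls. The names "hairColourHex" and "plotAllHairColours" are hypothetical helpers, and the plotting function is passed in as an argument so that the sketch does not depend on the Notebook state.

```python
# Hex codes copied from the individual calls in this Cell Group
hairColourHex = {
    "black": "black", "purple": "#9706FF", "white": "#EAF0F6", "blonde": "#F7EA0B",
    "green": "#40C704", "brown": "#AF6E04", "blue": "#0158F0", "pink": "#FE67F9",
    "red": "#FF193F", "orange": "#FF9619", "gray": "#AFAFAF",
}

def plotAllHairColours(plotFunction):
    """Call plotFunction once per hair colour, with one bar colour per classification method."""
    for hairColour, hexCode in hairColourHex.items():
        plotFunction(hairColour, [hexCode] * 3, ["black"] * 3)
```

In the Notebook this would be invoked as plotAllHairColours(plotDifferenceFromMean), after the cell defining that method has been run.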

In [None]:
plotDifferenceFromMean("black", ["black", "black", "black"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("purple", ["#9706FF", "#9706FF", "#9706FF"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("white", ["#EAF0F6", "#EAF0F6", "#EAF0F6"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("blonde", ["#F7EA0B", "#F7EA0B", "#F7EA0B"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("green", ["#40C704", "#40C704", "#40C704"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("brown", ["#AF6E04", "#AF6E04", "#AF6E04"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("blue", ["#0158F0", "#0158F0", "#0158F0"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("pink", ["#FE67F9", "#FE67F9", "#FE67F9"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("red", ["#FF193F", "#FF193F", "#FF193F"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("orange", ["#FF9619", "#FF9619", "#FF9619"], ["black", "black", "black"])

In [None]:
plotDifferenceFromMean("gray", ["#AFAFAF", "#AFAFAF", "#AFAFAF"], ["black", "black", "black"])

# I2V Features

All Cell Groups that follow use features extracted with I2V and therefore require the i2vDict to be loaded, which happens in the next Cell.
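
Once the i2vDict is loaded, a quick sanity check along the following lines may help catch incomplete data before the later cells fail with a KeyError. This is only a sketch: the nesting (role, then character, then tag) and the list of tags are inferred from how the cells below index the dictionary, while "findMissingTags" and the toy data are hypothetical.

```python
# Tags read by the Cell Groups that follow (taken from the code in this Notebook)
requiredTags = ["short hair", "long hair", "very long hair", "closed eyes"]

def findMissingTags(i2vDict, requiredTags):
    """Return (role, character, tag) triples for every required tag that is absent."""
    missing = []
    for role in i2vDict:
        for character in i2vDict[role]:
            for tag in requiredTags:
                if tag not in i2vDict[role][character]:
                    missing.append((role, character, tag))
    return missing

# Hypothetical toy data, only to show the assumed nesting: role -> character -> tag -> score
toyI2vDict = {
    "Main": {"characterA": {"short hair": 0.7, "long hair": 0.1,
                            "very long hair": 0.05, "closed eyes": 0.01}},
    "Supporting": {"characterB": {"short hair": 0.2, "long hair": 0.6,
                                  "closed eyes": 0.02}},
}
missingTags = findMissingTags(toyI2vDict, requiredTags)
print(missingTags)
```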

In [None]:
# Using a context manager so the file is closed after loading
with open("Visual Features/i2v.json", "r") as i2vFile:
    i2vDict = json.load(i2vFile)

## Hair Length

In this Cell Group we evaluate how Hair Length impacts a character’s classification.
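
The cells below reduce the I2V tags to a binary label: a character counts as short haired only when its "short hair" score exceeds the sum of its "long hair" and "very long hair" scores, and as long haired otherwise. A minimal sketch of that rule as a standalone helper follows. The name "classifyHairLength" and the example scores are assumptions made for illustration.

```python
def classifyHairLength(tags):
    """Label a character "short hair" or "long hair" from its I2V tag scores."""
    if tags["short hair"] > tags["long hair"] + tags["very long hair"]:
        return "short hair"
    return "long hair"

# Made-up scores: 0.6 > 0.2 + 0.1, so the first character is short haired
print(classifyHairLength({"short hair": 0.6, "long hair": 0.2, "very long hair": 0.1}))
# 0.3 is not greater than 0.2 + 0.2, so the second character is long haired
print(classifyHairLength({"short hair": 0.3, "long hair": 0.2, "very long hair": 0.2}))
```

Note that, with a strict comparison, ties fall into the "long hair" class.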

In [None]:
# Determining the hair length distribution per class, according to each classification method

trueHairLengthCount = {}
for role in i2vDict:
    trueHairLengthCount[role] = {}
    for character in i2vDict[role]:
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if trueHairLengthCount[role].get(hairLength) is None:
            trueHairLengthCount[role][hairLength] = 0
        trueHairLengthCount[role][hairLength] += 1

modelHairLengthCount = {}
for role in i2vDict:
    modelHairLengthCount[role] = {} 
for role in i2vDict:
    for character in i2vDict[role]:
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if modelHairLengthCount[simpleModelDict[role][character]].get(hairLength) is None:
            modelHairLengthCount[simpleModelDict[role][character]][hairLength] = 0
        modelHairLengthCount[simpleModelDict[role][character]][hairLength] += 1
        
humanHairLengthCount = {}
for role in i2vDict:
    humanHairLengthCount[role] = {}    
for role in i2vDict:
    for character in i2vDict[role]:
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if humanHairLengthCount[simpleHumanDict[role][character]].get(hairLength) is None:
            humanHairLengthCount[simpleHumanDict[role][character]][hairLength] = 0
        humanHairLengthCount[simpleHumanDict[role][character]][hairLength] += 1

In [None]:
# Visualising the overall hair length distribution

totalShort = trueHairLengthCount["Main"]["short hair"] + trueHairLengthCount["Supporting"]["short hair"]
totalLong = trueHairLengthCount["Main"]["long hair"] + trueHairLengthCount["Supporting"]["long hair"]

values = [(totalShort/totalCharacters)*100, (totalLong/totalCharacters)*100]

plt.pie(values, labels=["Short Hair", "Long Hair"], autopct='%1.2f%%', colors=["#B63CEC", "#ECA93C"])
plt.show()

In [None]:
# Visualising how short haired characters compare to the average, when it comes to class distribution

mainShort = [(trueHairLengthCount["Main"]["short hair"]/totalShort)*100, 
           (modelHairLengthCount["Main"]["short hair"]/totalShort)*100, 
           (humanHairLengthCount["Main"]["short hair"]/totalShort)*100]
supportingShort = [(trueHairLengthCount["Supporting"]["short hair"]/totalShort)*100, 
                 (modelHairLengthCount["Supporting"]["short hair"]/totalShort)*100, 
                 (humanHairLengthCount["Supporting"]["short hair"]/totalShort)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainShort):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-9, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainShort, -0.2, color="#0000ff", label="Short Hair Main", align="edge")
    
label = "Total Supporting"
labelShort = "Short Hair Supporting"
for index, value in enumerate(supportingShort):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-9, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelShort, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelShort = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising how long haired characters compare to the average, when it comes to class distribution

mainLong = [(trueHairLengthCount["Main"]["long hair"]/totalLong)*100, 
           (modelHairLengthCount["Main"]["long hair"]/totalLong)*100, 
           (humanHairLengthCount["Main"]["long hair"]/totalLong)*100]
supportingLong = [(trueHairLengthCount["Supporting"]["long hair"]/totalLong)*100, 
                 (modelHairLengthCount["Supporting"]["long hair"]/totalLong)*100, 
                 (humanHairLengthCount["Supporting"]["long hair"]/totalLong)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main"
for index, value in enumerate(mainLong):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainResults[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainResults[index]-9, round(mainResults[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainLong, -0.2, color="#0000ff", label="Long Hair Main", align="edge")
    
label = "Total Supporting"
labelLong = "Long Hair Supporting"
for index, value in enumerate(supportingLong):
    plt.bar(index, supportingResults[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingResults[index]-9, round(supportingResults[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelLong, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelLong = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of characters")
plt.legend(loc='best')
plt.show()

## Hair Length x Gender

This Cell Group evaluates how the impact of hair length on the results changes with apparent gender.

**Special Requirements:** "Apparent Gender" group and "Hair Length" group
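
The counting in this group repeats the same loop for each classification method and each gender. One way to express it once is sketched below, under the assumption that the result dictionaries share the role -> character nesting of the i2vDict. The helper "countHairLength", its parameters, and the toy data are hypothetical and not part of the Notebook.

```python
from collections import Counter

def countHairLength(i2vDict, labelDict=None, genderDict=None, gender=None):
    """Count short/long haired characters per assigned class.

    labelDict=None uses the true role; otherwise labelDict[role][character]
    gives the class assigned by a classification method.
    gender=None keeps every character; otherwise only that gender is counted.
    """
    counts = {role: Counter() for role in i2vDict}
    for role in i2vDict:
        for character, tags in i2vDict[role].items():
            if gender is not None and genderDict[role][character] != gender:
                continue
            isShort = tags["short hair"] > tags["long hair"] + tags["very long hair"]
            hairLength = "short hair" if isShort else "long hair"
            assignedRole = role if labelDict is None else labelDict[role][character]
            counts[assignedRole][hairLength] += 1
    return counts

# Hypothetical toy data
shortTags = {"short hair": 0.8, "long hair": 0.1, "very long hair": 0.0}
longTags = {"short hair": 0.1, "long hair": 0.7, "very long hair": 0.1}
toyI2v = {"Main": {"a": shortTags, "b": longTags}, "Supporting": {"c": shortTags}}
toyGender = {"Main": {"a": "man", "b": "woman"}, "Supporting": {"c": "man"}}
toyModel = {"Main": {"a": "Supporting", "b": "Main"}, "Supporting": {"c": "Supporting"}}

trueMale = countHairLength(toyI2v, None, toyGender, "man")
modelMale = countHairLength(toyI2v, toyModel, toyGender, "man")
print(trueMale["Main"]["short hair"], modelMale["Supporting"]["short hair"])
```

Because a Counter returns 0 for missing keys, a class with no long haired (or short haired) characters does not raise a KeyError when it is read later.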

### Male Characters

In [None]:
# Determining the hair length distribution per class for male characters, 
# according to each classification method

trueHairLengthCountMale = {}
for role in i2vDict:
    trueHairLengthCountMale[role] = {}
    for character in i2vDict[role]:
        if genderDict[role][character] != "man":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if trueHairLengthCountMale[role].get(hairLength) is None:
            trueHairLengthCountMale[role][hairLength] = 0
        trueHairLengthCountMale[role][hairLength] += 1

modelHairLengthCountMale = {}
for role in i2vDict:
    modelHairLengthCountMale[role] = {}   
for role in i2vDict:
    for character in i2vDict[role]:
        if genderDict[role][character] != "man":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if modelHairLengthCountMale[simpleModelDict[role][character]].get(hairLength) is None:
            modelHairLengthCountMale[simpleModelDict[role][character]][hairLength] = 0
        modelHairLengthCountMale[simpleModelDict[role][character]][hairLength] += 1
        
humanHairLengthCountMale = {}
for role in i2vDict:
    humanHairLengthCountMale[role] = {}    
for role in i2vDict:
    for character in i2vDict[role]:
        if genderDict[role][character] != "man":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if humanHairLengthCountMale[simpleHumanDict[role][character]].get(hairLength) is None:
            humanHairLengthCountMale[simpleHumanDict[role][character]][hairLength] = 0
        humanHairLengthCountMale[simpleHumanDict[role][character]][hairLength] += 1

In [None]:
# Visualising the overall hair length distribution for male characters

totalShortMale = trueHairLengthCountMale["Main"]["short hair"] + trueHairLengthCountMale["Supporting"]["short hair"]
totalLongMale = trueHairLengthCountMale["Main"]["long hair"] + trueHairLengthCountMale["Supporting"]["long hair"]

values = [(totalShortMale/totalMan)*100, (totalLongMale/totalMan)*100]

plt.pie(values, labels=["Short Hair Male", "Long Hair Male"], autopct='%1.2f%%', colors=["#B63CEC", "#ECA93C"])
plt.show()

In [None]:
# Visualising, for male characters, how long haired characters compare to the average, 
# when it comes to class distribution

mainLongMale = [(trueHairLengthCountMale["Main"]["long hair"]/totalLongMale)*100, 
           (modelHairLengthCountMale["Main"]["long hair"]/totalLongMale)*100, 
           (humanHairLengthCountMale["Main"]["long hair"]/totalLongMale)*100]
supportingLongMale = [(trueHairLengthCountMale["Supporting"]["long hair"]/totalLongMale)*100, 
                 (modelHairLengthCountMale["Supporting"]["long hair"]/totalLongMale)*100, 
                 (humanHairLengthCountMale["Supporting"]["long hair"]/totalLongMale)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main Male"
for index, value in enumerate(mainLongMale):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainMan[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainMan[index]-9, round(mainMan[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainLongMale, -0.2, color="#0000ff", label="Long Hair Main Male", align="edge")
    
label = "Total Supporting Male"
labelLong = "Long Hair Supporting Male"
for index, value in enumerate(supportingLongMale):
    plt.bar(index, supportingMan[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingMan[index]-9, round(supportingMan[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelLong, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelLong = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of male characters")
plt.legend(loc='best')
plt.show()

### Female Characters

In [None]:
# Determining the hair length distribution per class for female characters, 
# according to each classification method

trueHairLengthCountFemale = {}
for role in i2vDict:
    trueHairLengthCountFemale[role] = {}
    for character in i2vDict[role]:
        if genderDict[role][character] != "woman":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if trueHairLengthCountFemale[role].get(hairLength) is None:
            trueHairLengthCountFemale[role][hairLength] = 0
        trueHairLengthCountFemale[role][hairLength] += 1

modelHairLengthCountFemale = {}
for role in i2vDict:
    modelHairLengthCountFemale[role] = {}   
for role in i2vDict:
    for character in i2vDict[role]:
        if genderDict[role][character] != "woman":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if modelHairLengthCountFemale[simpleModelDict[role][character]].get(hairLength) is None:
            modelHairLengthCountFemale[simpleModelDict[role][character]][hairLength] = 0
        modelHairLengthCountFemale[simpleModelDict[role][character]][hairLength] += 1
        
humanHairLengthCountFemale = {}
for role in i2vDict:
    humanHairLengthCountFemale[role] = {}   
for role in i2vDict:
    for character in i2vDict[role]:
        if genderDict[role][character] != "woman":
            continue
        hairLength = ""
        if i2vDict[role][character]["short hair"] > i2vDict[role][character]["long hair"] + i2vDict[role][character]["very long hair"]:
            hairLength = "short hair"
        else:
            hairLength = "long hair"
        if humanHairLengthCountFemale[simpleHumanDict[role][character]].get(hairLength) is None:
            humanHairLengthCountFemale[simpleHumanDict[role][character]][hairLength] = 0
        humanHairLengthCountFemale[simpleHumanDict[role][character]][hairLength] += 1

In [None]:
# Visualising the overall hair length distribution for female characters

totalShortFemale = trueHairLengthCountFemale["Main"]["short hair"] + trueHairLengthCountFemale["Supporting"]["short hair"]
totalLongFemale = trueHairLengthCountFemale["Main"]["long hair"] + trueHairLengthCountFemale["Supporting"]["long hair"]

values = [(totalShortFemale/totalWoman)*100, (totalLongFemale/totalWoman)*100]

plt.pie(values, labels=["Short Hair Female", "Long Hair Female"], autopct='%1.2f%%', colors=["#B63CEC", "#ECA93C"])
plt.show()

In [None]:
# Visualising, for female characters, how long haired characters compare to the average, 
# when it comes to class distribution

mainLongFemale = [(trueHairLengthCountFemale["Main"]["long hair"]/totalLongFemale)*100, 
           (modelHairLengthCountFemale["Main"]["long hair"]/totalLongFemale)*100, 
           (humanHairLengthCountFemale["Main"]["long hair"]/totalLongFemale)*100]
supportingLongFemale = [(trueHairLengthCountFemale["Supporting"]["long hair"]/totalLongFemale)*100, 
                 (modelHairLengthCountFemale["Supporting"]["long hair"]/totalLongFemale)*100, 
                 (humanHairLengthCountFemale["Supporting"]["long hair"]/totalLongFemale)*100]

axes = plt.gca()
axes.set_ylim([0,100])

label = "Total Main Female"
for index, value in enumerate(mainLongFemale):
    plt.text(index-0.18, value-9, round(value, 2), color="w", size="x-small")
    plt.bar(index-0.2, mainWoman[index], -0.2, color="#33aaff", label=label, align="edge")
    plt.text(index-0.38, mainWoman[index]-9, round(mainWoman[index], 2), color="black", size="x-small")
    label = None
    
plt.bar(keys, mainLongFemale, -0.2, color="#0000ff", label="Long Hair Main Female", align="edge")
    
label = "Total Supporting Female"
labelLong = "Long Hair Supporting Female"
for index, value in enumerate(supportingLongFemale):
    plt.bar(index, supportingWoman[index], 0.2, color="#d5d5d5", label=label, align="edge")
    plt.text(index+0.02, supportingWoman[index]-9, round(supportingWoman[index], 2), color="black", size="x-small")
    plt.bar(index+0.2, value, 0.2, color="grey", label=labelLong, align="edge")
    plt.text(index+0.21, value-9, round(value, 2), color="black", size="x-small")
    label = None
    labelLong = None

plt.xlabel("Method of classification")
plt.ylabel("Percentage of female characters")
plt.legend(loc='best')
plt.show()

## Eye Size

In this Cell Group we evaluate how Eye Size impacts a character’s classification.
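
Eye size is not measured directly here: the cells below use the I2V "closed eyes" score as a proxy, so a higher mean indicates smaller eyes. The following sketch shows the grouping and averaging step on toy data. The helper "meanClosedEyesByClass" and the example scores are assumptions for illustration only.

```python
import statistics

def meanClosedEyesByClass(i2vDict, labelDict=None):
    """Mean "closed eyes" score per class; labelDict=None groups by the true role."""
    scores = {role: [] for role in i2vDict}
    for role in i2vDict:
        for character, tags in i2vDict[role].items():
            assignedRole = role if labelDict is None else labelDict[role][character]
            scores[assignedRole].append(tags["closed eyes"])
    return {role: statistics.mean(values) for role, values in scores.items()}

# Hypothetical toy data
toyI2v = {
    "Main": {"a": {"closed eyes": 0.01}, "b": {"closed eyes": 0.03}},
    "Supporting": {"c": {"closed eyes": 0.02}},
}
toyModel = {"Main": {"a": "Main", "b": "Supporting"}, "Supporting": {"c": "Supporting"}}

trueMeans = meanClosedEyesByClass(toyI2v)
modelMeans = meanClosedEyesByClass(toyI2v, toyModel)
print(trueMeans, modelMeans)
```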

In [None]:
# Determining the eye size distribution per class, according to each classification method

trueEyeSizeDict = {}
for role in i2vDict:
    trueEyeSizeDict[role] = []
    for character in i2vDict[role]:
        trueEyeSizeDict[role].append(i2vDict[role][character]["closed eyes"])
        
modelEyeSizeDict = {}
for role in i2vDict:
    modelEyeSizeDict[role] = []    
for role in i2vDict:
    for character in i2vDict[role]:
        modelEyeSizeDict[simpleModelDict[role][character]].append(i2vDict[role][character]["closed eyes"])
        
humanEyeSizeDict = {}
for role in i2vDict:
    humanEyeSizeDict[role] = []    
for role in i2vDict:
    for character in i2vDict[role]:
        humanEyeSizeDict[simpleHumanDict[role][character]].append(i2vDict[role][character]["closed eyes"])

In [None]:
# Visualising the differences in eye size between both classes, according to each classification method
# Note that, since we are measuring the probability of a character having closed eyes,
# higher values mean smaller eyes

import statistics

mainEyeSize = [statistics.mean(trueEyeSizeDict["Main"]), 
           statistics.mean(modelEyeSizeDict["Main"]), 
           statistics.mean(humanEyeSizeDict["Main"])]
supportingEyeSize = [statistics.mean(trueEyeSizeDict["Supporting"]), 
               statistics.mean(modelEyeSizeDict["Supporting"]), 
               statistics.mean(humanEyeSizeDict["Supporting"])]

axes = plt.gca()
axes.set_ylim([0,0.05])

plt.bar(keys, mainEyeSize, -0.2, color="#0000ff", label="Main", align="edge")
for index, value in enumerate(mainEyeSize):
    plt.text(index-0.22, value, round(value, 5), color="black", size="x-small")
    
plt.bar(keys, supportingEyeSize, 0.2, color="grey", label="Supporting", align="edge")
for index, value in enumerate(supportingEyeSize):
    plt.text(index, value, round(value, 5), color="black", size="x-small")

plt.xlabel("Method of classification")
plt.ylabel('Mean "closed eyes" probability (higher = smaller eyes)')
plt.legend(loc='best')
plt.show()

## Eye Size x Gender

This Cell Group evaluates how the impact of eye size on the results changes with apparent gender.

**Special Requirements:** "Apparent Gender" group and "Eye Size" group
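
When splitting by both gender and class, some combinations may contain no characters, in which case statistics.mean raises a StatisticsError. The sketch below groups the "closed eyes" scores by (gender, class) pairs and only averages the groups that actually occur. The helper name and toy data are hypothetical, and the dictionaries are assumed to share the role -> character nesting of the i2vDict.

```python
import statistics

def meanClosedEyesByGenderAndClass(i2vDict, genderDict, labelDict=None):
    """Mean "closed eyes" score for every (gender, class) pair that occurs in the data."""
    scores = {}
    for role in i2vDict:
        for character, tags in i2vDict[role].items():
            assignedRole = role if labelDict is None else labelDict[role][character]
            key = (genderDict[role][character], assignedRole)
            # setdefault creates the list on first use, so no group is ever empty
            scores.setdefault(key, []).append(tags["closed eyes"])
    return {key: statistics.mean(values) for key, values in scores.items()}

# Hypothetical toy data
toyI2v = {
    "Main": {"a": {"closed eyes": 0.01}, "b": {"closed eyes": 0.03}},
    "Supporting": {"c": {"closed eyes": 0.02}},
}
toyGender = {"Main": {"a": "man", "b": "woman"}, "Supporting": {"c": "man"}}

genderMeans = meanClosedEyesByGenderAndClass(toyI2v, toyGender)
# Pairs that never occur are simply absent, so .get returns None for them
print(genderMeans.get(("woman", "Supporting")))
```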

In [None]:
# Determining the eye size distribution per gender per class, according to each classification method

trueEyeSizeDictGender = {"man": {}, "woman": {}}
for role in i2vDict:
    trueEyeSizeDictGender["man"][role] = []
    trueEyeSizeDictGender["woman"][role] = []
    for character in i2vDict[role]:
        trueEyeSizeDictGender[genderDict[role][character]][role].append(i2vDict[role][character]["closed eyes"])
        
modelEyeSizeDictGender = {"man": {}, "woman": {}}
for role in i2vDict:
    modelEyeSizeDictGender["man"][role] = []
    modelEyeSizeDictGender["woman"][role] = []   
for role in i2vDict:
    for character in i2vDict[role]:
        modelEyeSizeDictGender[genderDict[role][character]][simpleModelDict[role][character]].append(i2vDict[role][character]["closed eyes"])
        
humanEyeSizeDictGender = {"man": {}, "woman": {}}
for role in i2vDict:
    humanEyeSizeDictGender["man"][role] = []
    humanEyeSizeDictGender["woman"][role] = []    
for role in i2vDict:
    for character in i2vDict[role]:
        humanEyeSizeDictGender[genderDict[role][character]][simpleHumanDict[role][character]].append(i2vDict[role][character]["closed eyes"])

In [None]:
# Visualising, for female characters, 
# the differences in eye size between both classes, according to each classification method
# Note that, since we are measuring the probability of a character having closed eyes,
# higher values mean smaller eyes

mainEyeSizeFemale = [statistics.mean(trueEyeSizeDictGender["woman"]["Main"]), 
           statistics.mean(modelEyeSizeDictGender["woman"]["Main"]), 
           statistics.mean(humanEyeSizeDictGender["woman"]["Main"])]
supportingEyeSizeFemale = [statistics.mean(trueEyeSizeDictGender["woman"]["Supporting"]), 
               statistics.mean(modelEyeSizeDictGender["woman"]["Supporting"]), 
               statistics.mean(humanEyeSizeDictGender["woman"]["Supporting"])]

axes = plt.gca()
axes.set_ylim([0,0.05])

plt.bar(keys, mainEyeSizeFemale, -0.2, color="#0000ff", label="Female Main", align="edge")
for index, value in enumerate(mainEyeSizeFemale):
    plt.text(index-0.22, value, round(value, 5), color="black", size="x-small")
    
plt.bar(keys, supportingEyeSizeFemale, 0.2, color="grey", label="Female Supporting", align="edge")
for index, value in enumerate(supportingEyeSizeFemale):
    plt.text(index, value, round(value, 5), color="black", size="x-small")

plt.xlabel("Method of classification")
plt.ylabel('Mean "closed eyes" probability (higher = smaller eyes)')
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising, for male characters, 
# the differences in eye size between both classes, according to each classification method
# Note that, since we are measuring the probability of a character having closed eyes,
# higher values mean smaller eyes

mainEyeSizeMale = [statistics.mean(trueEyeSizeDictGender["man"]["Main"]), 
           statistics.mean(modelEyeSizeDictGender["man"]["Main"]), 
           statistics.mean(humanEyeSizeDictGender["man"]["Main"])]
supportingEyeSizeMale = [statistics.mean(trueEyeSizeDictGender["man"]["Supporting"]), 
               statistics.mean(modelEyeSizeDictGender["man"]["Supporting"]), 
               statistics.mean(humanEyeSizeDictGender["man"]["Supporting"])]

axes = plt.gca()
axes.set_ylim([0,0.05])

plt.bar(keys, mainEyeSizeMale, -0.2, color="#0000ff", label="Male Main", align="edge")
for index, value in enumerate(mainEyeSizeMale):
    plt.text(index-0.22, value, round(value, 5), color="black", size="x-small")
    
plt.bar(keys, supportingEyeSizeMale, 0.2, color="grey", label="Male Supporting", align="edge")
for index, value in enumerate(supportingEyeSizeMale):
    plt.text(index, value, round(value, 5), color="black", size="x-small")

plt.xlabel("Method of classification")
plt.ylabel('Mean "closed eyes" probability (higher = smaller eyes)')
plt.legend(loc='best')
plt.show()

In [None]:
# Visualising how male and female main characters compare when it comes to eye size, 
# according to each classification method
# Note that, since we are measuring the probability of a character having closed eyes,
# higher values mean smaller eyes

mainEyeSizeMale = [statistics.mean(trueEyeSizeDictGender["man"]["Main"]), 
           statistics.mean(modelEyeSizeDictGender["man"]["Main"]), 
           statistics.mean(humanEyeSizeDictGender["man"]["Main"])]
mainEyeSizeFemale = [statistics.mean(trueEyeSizeDictGender["woman"]["Main"]), 
           statistics.mean(modelEyeSizeDictGender["woman"]["Main"]), 
           statistics.mean(humanEyeSizeDictGender["woman"]["Main"])]

axes = plt.gca()
axes.set_ylim([0,0.05])

plt.bar(keys, mainEyeSizeMale, -0.2, color="purple", label="Male Main", align="edge")
for index, value in enumerate(mainEyeSizeMale):
    plt.text(index-0.22, value, round(value, 5), color="black", size="x-small")
    
plt.bar(keys, mainEyeSizeFemale, 0.2, color="orange", label="Female Main", align="edge")
for index, value in enumerate(mainEyeSizeFemale):
    plt.text(index, value, round(value, 5), color="black", size="x-small")

plt.xlabel("Method of classification")
plt.ylabel('Mean "closed eyes" probability (higher = smaller eyes)')
plt.legend(loc='best')
plt.show()