<a href="https://colab.research.google.com/github/thiagodonizetti/IHC21---eye-tracker-analyses/blob/main/COLAB_IHC_21_Fixations_analyses.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fixation analyses
## Dataset
**Description:** Dataset containing the number of fixations per Area of Interest (AOI), captured using eye tracking. The data come from a field study performed at @CRECI. The study aims at supporting personalization features for people with Computer Anxiety (PwCA).

**Goal:** Identify UI elements related to different levels of Computer Anxiety (CA) that may be acting as distractors and making it difficult for users to perform the suggested tasks.

### Loading the dataset

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plot
import imblearn

from scipy import stats
from scipy.stats import levene, mannwhitneyu, shapiro, ttest_ind

from sklearn import preprocessing
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from imblearn.over_sampling import RandomOverSampler

# Colab-only helper used to upload the spreadsheet into the runtime
from google.colab import files
#from ipyfilechooser import FileChooser

# Upload the data file from the local machine
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
      name=fn, length=len(uploaded[fn])))
  


### Reading data

In [None]:
df = pd.read_excel('eyetracker.xlsx', sheet_name='data')


### Shapiro-Wilk test for normality.

In [None]:
df2 = df.drop(columns=['CLASS'])
for col in df2:
    print ('\n',col)
    swtest, p_col = shapiro( df2[col] )
    print( 'Shapiro-Wilk (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print('Sample looks Gaussian (fail to reject H0)')
    else:
        print('Sample does not look Gaussian (reject H0)')


 p
Shapiro-Wilk ( p ) p-value: 0.67812
Sample looks Gaussian (fail to reject H0)

 CARS
Shapiro-Wilk ( CARS ) p-value: 0.18381
Sample looks Gaussian (fail to reject H0)

 H-Search Box
Shapiro-Wilk ( H-Search Box ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 H-Top Menu
Shapiro-Wilk ( H-Top Menu ) p-value: 0.00001
Sample does not look Gaussian (reject H0)

 H-Carousel
Shapiro-Wilk ( H-Carousel ) p-value: 0.00003
Sample does not look Gaussian (reject H0)

 H-Content
Shapiro-Wilk ( H-Content ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 C-Side Menu
Shapiro-Wilk ( C-Side Menu ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 C-Search Box
Shapiro-Wilk ( C-Search Box ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 C-Top Menu
Shapiro-Wilk ( C-Top Menu ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 C-Content
Shapiro-Wilk ( C-Content ) p-value: 0.00000
Sample does not look Gaussian (reject H0)

 A-Search Box
Shapiro-W

**Shapiro-Wilk test results**

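Results show that the sample does not look Gaussian for any of the AOIs tested. In the output above, H0 is retained only for the `p` and `CARS` columns, which are not AOI columns.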
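
As a compact alternative to reading the printed output column by column, the same Shapiro-Wilk check can be collected into one table. This is a minimal sketch: `shapiro_table` is a hypothetical helper and `demo` is synthetic data, not the study data.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def shapiro_table(frame, alpha=0.05):
    """Run Shapiro-Wilk on every column and return one row per column."""
    rows = []
    for col in frame.columns:
        stat, p_value = shapiro(frame[col].dropna())
        rows.append({
            "column": col,
            "W": float(stat),
            "p_value": float(p_value),
            # same decision rule as the loop above: p > alpha keeps H0
            "looks_gaussian": bool(p_value > alpha),
        })
    return pd.DataFrame(rows).set_index("column")


# Synthetic stand-in for the fixation data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "normal_like": rng.normal(50, 10, size=40),
    "count_like": rng.poisson(0.5, size=40),  # small counts, heavily skewed
})

result = shapiro_table(demo)
print(result)
```

With the real data, `shapiro_table(df2)` would give one row per AOI, which can be sorted or filtered by `p_value`.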

### **Separate groups of CA.**

In [None]:
# Split participants by CARS score:
#   No CA: CARS < 34, Low CA: 34 <= CARS < 47, High CA: CARS >= 47
df_high = df2[df2['CARS'] >= 47]
df_no = df2[df2['CARS'] < 34]
df_low = df2[(df2['CARS'] >= 34) & (df2['CARS'] < 47)]
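
The same split can be written as a single labelled column, which makes the cut-offs explicit and keeps all participants in one dataframe. This is a minimal sketch: the cut-offs 34 and 47 come from the cell above, while `add_ca_group`, the `CA_group` column, and the label names are hypothetical.

```python
import numpy as np
import pandas as pd

# Bin edges taken from the filters above; right=False makes the bins
# [-inf, 34), [34, 47), [47, inf)
CARS_BINS = [-np.inf, 34, 47, np.inf]
CARS_LABELS = ["no", "low", "high"]


def add_ca_group(frame, score_col="CARS"):
    """Return a copy of `frame` with a categorical CA_group column."""
    out = frame.copy()
    out["CA_group"] = pd.cut(
        out[score_col], bins=CARS_BINS, labels=CARS_LABELS, right=False
    )
    return out


# Synthetic scores chosen to sit on and around the cut-offs
demo = pd.DataFrame({"CARS": [20, 33, 34, 46, 47, 60]})
grouped = add_ca_group(demo)
print(grouped)
```

A group can then be selected with, for example, `grouped[grouped["CA_group"] == "high"]`.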

### Shapiro-Wilk test for normality (HIGH CA).


In [None]:
for col in df_high:
    print ('\n',col)
    # test the High CA subset (df_high), not the full dataframe
    swtest, p_col = shapiro( df_high[col] )
    print( 'Shapiro-Wilk (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print('Sample looks Gaussian (fail to reject H0)')
    else:
        print('Sample does not look Gaussian (reject H0)')

**Shapiro-Wilk test results**

Results show that the HIGH CA sample does not look Gaussian for any of the AOIs tested.

### Shapiro-Wilk test for normality (LOW CA).

In [None]:
for col in df_low:
    print ('\n',col)
    # test the Low CA subset (df_low), not the full dataframe
    swtest, p_col = shapiro( df_low[col] )
    print( 'Shapiro-Wilk (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print('Sample looks Gaussian (fail to reject H0)')
    else:
        print('Sample does not look Gaussian (reject H0)')

**Shapiro-Wilk test results**

Results show that the LOW CA sample does not look Gaussian for any of the AOIs tested.

### Shapiro-Wilk test for normality (No CA).


In [None]:
for col in df_no:
    print ('\n',col)
    # test the No CA subset (df_no), not the full dataframe
    swtest, p_col = shapiro( df_no[col] )
    print( 'Shapiro-Wilk (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print('Sample looks Gaussian (fail to reject H0)')
    else:
        print('Sample does not look Gaussian (reject H0)')

**Shapiro-Wilk test results**

Results show that the NO CA sample does not look Gaussian for any of the AOIs tested.

### Levene test for equality of variances between classes


In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the variance between the three classes for the same AOI column:
    ltest, p_col = levene(df_high[col], df_low[col], df_no[col])
    
    print( 'Levene (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print('Sample seems to have the same variance (fail to reject H0)')
    else:
        print('Sample does not seem to have the same variance (reject H0)')


 p
Levene ( p ) p-value: 0.39036
Sample seems to have the same variance (fail to reject H0)

 CARS
Levene ( CARS ) p-value: 0.71668
Sample seems to have the same variance (fail to reject H0)

 H-Search Box
Levene ( H-Search Box ) p-value: 0.83344
Sample seems to have the same variance (fail to reject H0)

 H-Top Menu
Levene ( H-Top Menu ) p-value: 0.85114
Sample seems to have the same variance (fail to reject H0)

 H-Carousel
Levene ( H-Carousel ) p-value: 0.48343
Sample seems to have the same variance (fail to reject H0)

 H-Content
Levene ( H-Content ) p-value: 0.26674
Sample seems to have the same variance (fail to reject H0)

 C-Side Menu
Levene ( C-Side Menu ) p-value: 0.36678
Sample seems to have the same variance (fail to reject H0)

 C-Search Box
Levene ( C-Search Box ) p-value: 0.39633
Sample seems to have the same variance (fail to reject H0)

 C-Top Menu
Levene ( C-Top Menu ) p-value: 0.03743
Sample does not seem to have the same variance (reject H0)

 C-Content
Levene ( C-

**Levene test results**

Levene test results show that the three classes do not seem to have the same variance for:


*   C-Top Menu with *p-value = 0.03743*
*   M-Routes with *p-value = 0.00601*
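
SciPy's `levene` uses `center='median'` by default (the Brown-Forsythe variant), which is generally considered the more robust choice for skewed data. The sketch below contrasts it with `center='mean'` on synthetic data where one group has a much larger spread; the arrays are hypothetical and unrelated to the study data.

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)

# Two groups with the same spread and one group with a much larger spread
group_a = rng.normal(0, 1, size=50)
group_b = rng.normal(0, 1, size=50)
group_c = rng.normal(0, 10, size=50)

# Default: deviations from the group median (Brown-Forsythe variant)
stat_median, p_median = levene(group_a, group_b, group_c, center="median")

# Classic Levene: deviations from the group mean
stat_mean, p_mean = levene(group_a, group_b, group_c, center="mean")

print("center='median' p-value: {:.5f}".format(p_median))
print("center='mean'   p-value: {:.5f}".format(p_mean))
```

Both variants should reject equal variances here; on real, skewed counts the two can disagree, so it helps to state which one was used.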



### Kruskal-Wallis H-test


In [None]:


for col in df2:
    print ('\n',col)
    
    #compare the distribution between the three classes for the same AOI column:
    ktest, p_col = stats.kruskal(df_high[col], df_low[col], df_no[col])
    
    print( 'kruskal (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
kruskal ( p ) p-value: 0.13731


 CARS
kruskal ( CARS ) p-value: 0.00001
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
kruskal ( H-Search Box ) p-value: 0.90599


 H-Top Menu
kruskal ( H-Top Menu ) p-value: 0.52909


 H-Carousel
kruskal ( H-Carousel ) p-value: 0.37048


 H-Content
kruskal ( H-Content ) p-value: 0.30300


 C-Side Menu
kruskal ( C-Side Menu ) p-value: 0.42768


 C-Search Box
kruskal ( C-Search Box ) p-value: 0.33492


 C-Top Menu
kruskal ( C-Top Menu ) p-value: 0.00633
Sample does not seem to have the same distribution (reject H0)

 C-Content
kruskal ( C-Content ) p-value: 0.10733


 A-Search Box
kruskal ( A-Search Box ) p-value: 0.37737


 A-Top Menu
kruskal ( A-Top Menu ) p-value: 0.88359


 A-Description
kruskal ( A-Description ) p-value: 0.14202


 A-Card
kruskal ( A-Card ) p-value: 0.19452


 A-Show Map
kruskal ( A-Show Map ) p-value: 0.31140


 U-Search Box
kruskal ( U-Search Box ) p-value: 0.49364


 U-Top Menu
kruskal ( U-Top M

**Kruskal-Wallis H Test results**

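The Kruskal-Wallis result for the three classes shows that, among the AOIs, the samples do not seem to have the same distribution only for **C-Top Menu**, with *p-value = 0.00633*. `CARS` also differs, as expected, because it is the score used to define the classes.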
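
To make clear what the test measures, the H statistic can be recomputed from pooled ranks and compared with `scipy.stats.kruskal`. This is a minimal sketch on synthetic data: `kruskal_by_hand` is a hypothetical helper, and it omits the tie correction, so it only matches SciPy when all values are distinct (real fixation counts contain ties, for which SciPy applies a correction).

```python
import numpy as np
from scipy.stats import chi2, kruskal, rankdata


def kruskal_by_hand(*samples):
    """Kruskal-Wallis H without tie correction (all values must be distinct)."""
    pooled = np.concatenate(samples)
    ranks = rankdata(pooled)  # ranks 1..N over the pooled data
    n_total = pooled.size

    # Sum of (rank sum)^2 / group size over the groups
    weighted = 0.0
    start = 0
    for sample in samples:
        stop = start + len(sample)
        rank_sum = ranks[start:stop].sum()
        weighted += rank_sum ** 2 / len(sample)
        start = stop

    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # H is compared against a chi-square distribution with k - 1 degrees of freedom
    p_value = chi2.sf(h, df=len(samples) - 1)
    return h, p_value


# Synthetic continuous data, so there are no ties
rng = np.random.default_rng(2)
high = rng.normal(1.0, 1.0, size=12)
low = rng.normal(0.0, 1.0, size=15)
no = rng.normal(0.2, 1.0, size=10)

h_manual, p_manual = kruskal_by_hand(high, low, no)
h_scipy, p_scipy = kruskal(high, low, no)
print("manual: H={:.5f} p={:.5f}".format(h_manual, p_manual))
print("scipy:  H={:.5f} p={:.5f}".format(h_scipy, p_scipy))
```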

### **Kruskal-Wallis H Test** for HIGH CA and Low CA

In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between High CA and Low CA classes for the same AOI column:
    ktest, p_col = stats.kruskal(df_high[col], df_low[col])
    
    print( 'kruskal (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
kruskal ( p ) p-value: 0.10532


 CARS
kruskal ( CARS ) p-value: 0.00010
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
kruskal ( H-Search Box ) p-value: 0.95902


 H-Top Menu
kruskal ( H-Top Menu ) p-value: 0.28833


 H-Carousel
kruskal ( H-Carousel ) p-value: 0.22944


 H-Content
kruskal ( H-Content ) p-value: 0.11844


 C-Side Menu
kruskal ( C-Side Menu ) p-value: 0.17578


 C-Search Box
kruskal ( C-Search Box ) p-value: 0.12854


 C-Top Menu
kruskal ( C-Top Menu ) p-value: 0.00233
Sample does not seem to have the same distribution (reject H0)

 C-Content
kruskal ( C-Content ) p-value: 0.03970
Sample does not seem to have the same distribution (reject H0)

 A-Search Box
kruskal ( A-Search Box ) p-value: 0.94485


 A-Top Menu
kruskal ( A-Top Menu ) p-value: 0.77735


 A-Description
kruskal ( A-Description ) p-value: 0.08672


 A-Card
kruskal ( A-Card ) p-value: 0.06059


 A-Show Map
kruskal ( A-Show Map ) p-value: 0.92948


 U-Search Box
kruskal ( U

Kruskal-Wallis H test results for **High CA and Low CA** show that the samples do not seem to have the same distribution for:

*   C-Top Menu with *p-value = 0.00233*
*   C-Content with *p-value = 0.03970*

### **Kruskal-Wallis H Test** for High CA and No CA

In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between High CA and No CA classes for the same AOI column:
    ktest, p_col = stats.kruskal(df_high[col], df_no[col])
    
    print( 'kruskal (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
kruskal ( p ) p-value: 0.75126


 CARS
kruskal ( CARS ) p-value: 0.00048
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
kruskal ( H-Search Box ) p-value: 0.70911


 H-Top Menu
kruskal ( H-Top Menu ) p-value: 0.46480


 H-Carousel
kruskal ( H-Carousel ) p-value: 0.25214


 H-Content
kruskal ( H-Content ) p-value: 0.48319


 C-Side Menu
kruskal ( C-Side Menu ) p-value: 1.00000


 C-Search Box
kruskal ( C-Search Box ) p-value: 0.21000


 C-Top Menu
kruskal ( C-Top Menu ) p-value: 0.06627


 C-Content
kruskal ( C-Content ) p-value: 0.92642


 A-Search Box
kruskal ( A-Search Box ) p-value: 0.23605


 A-Top Menu
kruskal ( A-Top Menu ) p-value: 0.89116


 A-Description
kruskal ( A-Description ) p-value: 0.10720


 A-Card
kruskal ( A-Card ) p-value: 0.40965


 A-Show Map
kruskal ( A-Show Map ) p-value: 0.23365


 U-Search Box
kruskal ( U-Search Box ) p-value: 0.21000


 U-Top Menu
kruskal ( U-Top Menu ) p-value: 0.83437


 U-Information
kruskal ( U-Informatio

Kruskal-Wallis H test results for **High CA and No CA** show that the samples seem to have the same distribution for every AOI.

### **Kruskal-Wallis H** Test for Low CA and No CA

In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between Low CA and No CA classes for the same AOI column:
    ktest, p_col = stats.kruskal(df_low[col], df_no[col])
    
    print( 'kruskal (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
kruskal ( p ) p-value: 0.07898


 CARS
kruskal ( CARS ) p-value: 0.00061
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
kruskal ( H-Search Box ) p-value: 0.69405


 H-Top Menu
kruskal ( H-Top Menu ) p-value: 0.73141


 H-Carousel
kruskal ( H-Carousel ) p-value: 0.69592


 H-Content
kruskal ( H-Content ) p-value: 0.53615


 C-Side Menu
kruskal ( C-Side Menu ) p-value: 0.45837


 C-Search Box
kruskal ( C-Search Box ) p-value: 0.82557


 C-Top Menu
kruskal ( C-Top Menu ) p-value: 0.13707


 C-Content
kruskal ( C-Content ) p-value: 0.15552


 A-Search Box
kruskal ( A-Search Box ) p-value: 0.27115


 A-Top Menu
kruskal ( A-Top Menu ) p-value: 0.58984


 A-Description
kruskal ( A-Description ) p-value: 0.80643


 A-Card
kruskal ( A-Card ) p-value: 0.58892


 A-Show Map
kruskal ( A-Show Map ) p-value: 0.15454


 U-Search Box
kruskal ( U-Search Box ) p-value: 0.86168


 U-Top Menu
kruskal ( U-Top Menu ) p-value: 0.51207


 U-Information
kruskal ( U-Informatio

Kruskal-Wallis H test results for **Low CA and No CA** show that the samples seem to have the same distribution for every AOI.
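
The three pairwise cells above repeat the same loop with different dataframes. A possible way to run all pairs in one pass and get a single results table is sketched below. `pairwise_kruskal` and the synthetic `groups` are hypothetical; with the notebook's data the call would be something like `pairwise_kruskal({"high": df_high, "low": df_low, "no": df_no}, df2.columns)`.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import kruskal


def pairwise_kruskal(groups, columns, alpha=0.05):
    """Kruskal-Wallis H test for every pair of groups and every column."""
    rows = []
    for (name_a, frame_a), (name_b, frame_b) in combinations(groups.items(), 2):
        for col in columns:
            stat, p_value = kruskal(frame_a[col], frame_b[col])
            rows.append({
                "pair": "{} x {}".format(name_a, name_b),
                "column": col,
                "H": float(stat),
                "p_value": float(p_value),
                # same rule as the cells above: reject H0 unless p > alpha
                "reject_H0": bool(p_value <= alpha),
            })
    return pd.DataFrame(rows)


# Synthetic groups (not the study data): "shifted" differs only for "high"
rng = np.random.default_rng(3)
groups = {
    "high": pd.DataFrame({"same": rng.normal(0, 1, 20), "shifted": rng.normal(5, 1, 20)}),
    "low": pd.DataFrame({"same": rng.normal(0, 1, 20), "shifted": rng.normal(0, 1, 20)}),
    "no": pd.DataFrame({"same": rng.normal(0, 1, 20), "shifted": rng.normal(0, 1, 20)}),
}

table = pairwise_kruskal(groups, ["same", "shifted"])
print(table)
```

Keeping the pair name in the table makes it easy to list only the rejected comparisons with `table[table["reject_H0"]]`.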


---



### Mann-Whitney U Test




*   **High CA x Low CA**



In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between High CA and Low CA classes for the same AOI column:
    mtest, p_col = stats.mannwhitneyu(df_high[col], df_low[col])
    
    print( 'Mann-Whitney U Test (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
Mann-Whitney U Test ( p ) p-value: 0.05655


 CARS
Mann-Whitney U Test ( CARS ) p-value: 0.00006
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
Mann-Whitney U Test ( H-Search Box ) p-value: 0.50000


 H-Top Menu
Mann-Whitney U Test ( H-Top Menu ) p-value: 0.15235


 H-Carousel
Mann-Whitney U Test ( H-Carousel ) p-value: 0.12172


 H-Content
Mann-Whitney U Test ( H-Content ) p-value: 0.06397


 C-Side Menu
Mann-Whitney U Test ( C-Side Menu ) p-value: 0.09371


 C-Search Box
Mann-Whitney U Test ( C-Search Box ) p-value: 0.07342


 C-Top Menu
Mann-Whitney U Test ( C-Top Menu ) p-value: 0.00132
Sample does not seem to have the same distribution (reject H0)

 C-Content
Mann-Whitney U Test ( C-Content ) p-value: 0.02162
Sample does not seem to have the same distribution (reject H0)

 A-Search Box
Mann-Whitney U Test ( A-Search Box ) p-value: 0.50000


 A-Top Menu
Mann-Whitney U Test ( A-Top Menu ) p-value: 0.40229


 A-Description
Mann-Whitney U Test ( A-De

**Mann-Whitney U Test** results for **High CA x Low CA** show that the samples do not seem to have the same distribution for:

*   C-Top Menu with *p-value: 0.00132*
*   C-Content with *p-value: 0.02162*
*   A-Description with *p-value: 0.04674*
*   A-Card with *p-value: 0.03281*
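
The p-values printed above look like one-sided values: for example, H-Search Box gives 0.50000 here against 0.95902 in the Kruskal-Wallis comparison of the same two groups. This would be consistent with SciPy releases before 1.7, where `mannwhitneyu` returned a one-sided p-value unless `alternative` was given; treat that version detail as an assumption to check against the installed SciPy. The sketch below, on synthetic data, passes `alternative` explicitly so the result does not depend on the library default.

```python
import numpy as np
from scipy.stats import mannwhitneyu

# Synthetic samples (not the study data)
rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=30)
y = rng.normal(0.5, 1.0, size=30)

# Two-sided: the distributions differ in either direction
res_two_sided = mannwhitneyu(x, y, alternative="two-sided")

# One-sided alternatives: x tends to be smaller / larger than y
res_less = mannwhitneyu(x, y, alternative="less")
res_greater = mannwhitneyu(x, y, alternative="greater")

print("two-sided p-value: {:.5f}".format(res_two_sided.pvalue))
print("less      p-value: {:.5f}".format(res_less.pvalue))
print("greater   p-value: {:.5f}".format(res_greater.pvalue))
```

When no direction is hypothesised in advance, the two-sided p-value is the one comparable with the Kruskal-Wallis results for the same pair of groups.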


---

*   **High CA x No CA**


In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between High CA and No CA classes for the same AOI column:
    mtest, p_col = stats.mannwhitneyu(df_high[col], df_no[col])
    
    print( 'Mann-Whitney U Test (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
Mann-Whitney U Test ( p ) p-value: 0.39293


 CARS
Mann-Whitney U Test ( CARS ) p-value: 0.00028
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
Mann-Whitney U Test ( H-Search Box ) p-value: 0.37795


 H-Top Menu
Mann-Whitney U Test ( H-Top Menu ) p-value: 0.24658


 H-Carousel
Mann-Whitney U Test ( H-Carousel ) p-value: 0.13581


 H-Content
Mann-Whitney U Test ( H-Content ) p-value: 0.25874


 C-Side Menu
Mann-Whitney U Test ( C-Side Menu ) p-value: 0.48138


 C-Search Box
Mann-Whitney U Test ( C-Search Box ) p-value: 0.12723


 C-Top Menu
Mann-Whitney U Test ( C-Top Menu ) p-value: 0.03733
Sample does not seem to have the same distribution (reject H0)

 C-Content
Mann-Whitney U Test ( C-Content ) p-value: 0.48159


 A-Search Box
Mann-Whitney U Test ( A-Search Box ) p-value: 0.13238


 A-Top Menu
Mann-Whitney U Test ( A-Top Menu ) p-value: 0.46366


 A-Description
Mann-Whitney U Test ( A-Description ) p-value: 0.05881


 A-Card
Mann-Whitney U Test ( A

**Mann-Whitney U Test** results for **High CA x No CA** show that the samples do not seem to have the same distribution for:

*  C-Top Menu with *p-value: 0.03733*
*  M-Input Address with *p-value: 0.03675*
*  M-Map with *p-value: 0.04549*
*  R-Tabs with *p-value: 0.04082*
*  R-Content with *p-value: 0.04082*


---



* **Low CA x No CA**

In [None]:
for col in df2:
    print ('\n',col)
    
    #compare the distribution between Low CA and No CA classes for the same AOI column:
    mtest, p_col = stats.mannwhitneyu(df_low[col], df_no[col])
    
    print( 'Mann-Whitney U Test (', col, ') p-value: {:.5f}'.format( p_col ) ) 
    alpha = 0.05
    if p_col > alpha:
        print("")#print('Sample seems to have the same distribution (fail to reject H0)')
    else:
        print('Sample does not seem to have the same distribution (reject H0)')


 p
Mann-Whitney U Test ( p ) p-value: 0.04383
Sample does not seem to have the same distribution (reject H0)

 CARS
Mann-Whitney U Test ( CARS ) p-value: 0.00036
Sample does not seem to have the same distribution (reject H0)

 H-Search Box
Mann-Whitney U Test ( H-Search Box ) p-value: 0.37153


 H-Top Menu
Mann-Whitney U Test ( H-Top Menu ) p-value: 0.38430


 H-Carousel
Mann-Whitney U Test ( H-Carousel ) p-value: 0.36618


 H-Content
Mann-Whitney U Test ( H-Content ) p-value: 0.28533


 C-Side Menu
Mann-Whitney U Test ( C-Side Menu ) p-value: 0.24444


 C-Search Box
Mann-Whitney U Test ( C-Search Box ) p-value: 0.44160


 C-Top Menu
Mann-Whitney U Test ( C-Top Menu ) p-value: 0.07532


 C-Content
Mann-Whitney U Test ( C-Content ) p-value: 0.08514


 A-Search Box
Mann-Whitney U Test ( A-Search Box ) p-value: 0.15220


 A-Top Menu
Mann-Whitney U Test ( A-Top Menu ) p-value: 0.31205


 A-Description
Mann-Whitney U Test ( A-Description ) p-value: 0.42230


 A-Card
Mann-Whitney U Test ( A

**Mann-Whitney U Test** results for **Low CA x No CA** show that the samples do not seem to have the same distribution for:

*  M-Routes with *p-value: 0.04324*
*  M-Map Address with *p-value: 0.03839*


---

