<a href="https://colab.research.google.com/github/yilewang/TVB_Demo/blob/master/The_contrast_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a demo for contrast analysis. Contrast analysis is a statistical tool for trends analysis. The basic idea is based on ANOVA but it will allow research to customize contrast weights for different groups. This demo will only focus on posterior test of the contrast analysis (Scheffe's Test). The detailed info could be seen at Dr. Abdi's paper here: https://personal.utdallas.edu/~herve/abdi-contrasts2010-pretty.pdf

In [1]:
# !/usr/bin/python

import numpy as np
import scipy.stats
import pandas as pd
"""
The contrast analysis used for group comparison

Author: Yile Wang
Date: 08/17/2021
"""

'\nThe contrast analysis used for group comparison\n\nAuthor: Yile Wang\nDate: 08/17/2021\n'

In [2]:
def contrast_analysis(datatable, contrast, group_variable = "group", col = "gamma"):
    """ 
    Arg: 
        Pandas DataFrame with all the features and groups info
    Return: 
        The contrast analysis results
    
    For this dataset, it should contain four groups, SNC, NC, MCI, AD;


    """

    # the number of cases for each group
    num_group = len(contrast)
    num_cases = datatable.groupby([group_variable], sort=False).count().iloc[:,0].to_numpy()

    F_table = pd.DataFrame(columns=['features','F_value', 'P_value'])
    mean_array = np.zeros(num_group)
    var_array = np.zeros(num_group)


    # mean calculation
    mean_array = datatable.groupby([group_variable], sort=False).mean().loc[:,col].to_numpy()
    meanNcontrast = np.dot(mean_array, contrast)
    contrast2 = np.square(contrast)

    # variance calculation
    var_array = datatable.groupby([group_variable], sort=False).var().loc[:,col].to_numpy()
    denominator = sum(num_cases) - num_group
    # degree of freedom of the each case
    num_cases_df = num_cases -1

    # compute the sum of squares & mean sum of squares 
    SSE = np.dot(var_array, num_cases_df)
    MSE = SSE/denominator
    tmp_ms_contrast = sum(contrast2/num_cases)

    # compute the MS contrast
    MS_contrast = (meanNcontrast**2) / tmp_ms_contrast
    F_value = MS_contrast/MSE

    # alpha = 0.05
    F_critical = scipy.stats.f.ppf(q=1-0.05, dfn=1, dfd=denominator)

    # for posterior contrast, using scheffe test
    scheffe = F_critical * (num_group-1)
    if F_value >= scheffe:
        p = 0.05
    else:
        p = 'NA'

    print(f"The {col} contrast has F_value {F_value}, and the F_critical Scheffe's Test is {scheffe}")
    _tmp_dict = pd.DataFrame.from_dict([{'features':col,'F_value':F_value, 'P_value':p}])
    F_table = pd.concat([F_table, _tmp_dict], ignore_index=True)
    return F_table

In [3]:
# The data set should be a pandas Datafram, and the groups info should be specificed as a column called 'groups'
# e.x.

#G_table = pd.read_excel('./test.xlsx')


G_table = pd.read_excel("/home/yat-lok/workspace/data4project/lateralization/gc1sec_res/gc1sec_res.xlsx")

def left_right_interact_binder(G_table, group_variable, var1, var2, name1, name2, default_name = "value"):
    df = pd.DataFrame()
    df = pd.concat([df, G_table.loc[:,group_variable], G_table.loc[:,group_variable]], ignore_index=True)
    _tmp_ga = list(G_table.loc[:,var1])
    _tmp_ga.extend(G_table.loc[:,var2].to_list())
    df = pd.concat([df, pd.DataFrame([*_tmp_ga])], axis=1, ignore_index=True)
    _tmp_left = pd.Series([name1]).repeat(74)
    _tmp_right = pd.Series([name2]).repeat(74)
    df = pd.concat([df, pd.DataFrame([*_tmp_left, *_tmp_right])], axis=1, ignore_index=True)
    df.columns = [group_variable, default_name, "tmp"]
    df['U'] = df.loc[:,group_variable].astype(str).str.cat(df.tmp.astype(str), sep='.')
    df = df[['U', default_name]]
    return df


In [10]:
# contrast = [-3, -1, 1, 3] #linear trend
# contrast2 = [1,-1,-1,1] #quadratic trend
# contrast3 = [-1,3,-3,1] #poly
# contrast4 = [0,-2, 0,2]
name_list = ["gamma frequency", "theta frequency", "gamma amplitude","theta amplitude", "PAC", "PLV"]
for ind, i in enumerate(range(2, len(G_table.columns)-1, 2)):
    df = left_right_interact_binder(G_table, "group", str(G_table.columns[i]), str(G_table.columns[i+1]), "left", "right",default_name=name_list[ind])
    #contrast8 = [-1,-1,-1,-1,1,1,1,1]
    contrast9 = [-2,3,2,-3,-2,3,2,-3] # 
    F_table = contrast_analysis(df, contrast9, group_variable="U", col=name_list[ind])
    print(F_table)

The gamma frequency contrast has F_value 7.407485750373516, and the F_critical Scheffe's Test is 27.36118987286973
          features   F_value P_value
0  gamma frequency  7.407486      NA
The theta frequency contrast has F_value 1.790567208139677, and the F_critical Scheffe's Test is 27.36118987286973
          features   F_value P_value
0  theta frequency  1.790567      NA
The gamma amplitude contrast has F_value 1.5803941268911523, and the F_critical Scheffe's Test is 27.36118987286973
          features   F_value P_value
0  gamma amplitude  1.580394      NA
The theta amplitude contrast has F_value 2.102344434083729, and the F_critical Scheffe's Test is 27.36118987286973
          features   F_value P_value
0  theta amplitude  2.102344      NA
The PAC contrast has F_value 4.548097465665645, and the F_critical Scheffe's Test is 27.36118987286973
  features   F_value P_value
0      PAC  4.548097      NA
The PLV contrast has F_value 4.037167196048991, and the F_critical Scheffe's Test 