<a href="https://colab.research.google.com/github/yilewang/TVB_Demo/blob/master/The_contrast_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is a demo for contrast analysis. Contrast analysis is a statistical tool for trends analysis. The basic idea is based on ANOVA but it will allow research to customize contrast weights for different groups. This demo will only focus on posterior test of the contrast analysis (Scheffe's Test). The detailed info could be seen at Dr. Abdi's paper here: https://personal.utdallas.edu/~herve/abdi-contrasts2010-pretty.pdf

In [1]:
# !/usr/bin/python

import numpy as np
import scipy.stats
import pandas as pd
"""
The contrast analysis used for group comparison

Author: Yile Wang
Date: 08/17/2021
"""

'\nThe contrast analysis used for group comparison\n\nAuthor: Yile Wang\nDate: 08/17/2021\n'

In [2]:
def contrast_analysis(datatable, contrast):
    """ 
    Arg: 
        Pandas DataFrame with all the features and groups info
    Return: 
        The contrast analysis results
    
    For this dataset, it should contain four groups, SNC, NC, MCI, AD;


    """

    # the number of cases for each group
    num_group = len(contrast)
    num_cases = datatable.groupby(['groups']).count().iloc[:,0].to_numpy()

    F_table = pd.DataFrame(columns=['features','F_value', 'P_value'])
    mean_array = np.zeros(num_group)
    var_array = np.zeros(num_group)

    for col in datatable.columns[3:]:

        # mean calculation
        mean_array = datatable.groupby(['groups']).mean().loc[:,col].to_numpy()
        meanNcontrast = np.dot(mean_array, contrast)
        contrast2 = np.square(contrast)

        # variance calculation
        var_array = datatable.groupby(['groups']).var().loc[:,col].to_numpy()
        denominator = sum(num_cases) - num_group
        # degree of freedom of the each case
        num_cases_df = num_cases -1

        # compute the sum of squares & mean sum of squares 
        SSE = np.dot(var_array, num_cases_df)
        MSE = SSE/denominator
        tmp_ms_contrast = sum(contrast2/num_cases)

        # compute the MS contrast
        MS_contrast = (meanNcontrast**2) / tmp_ms_contrast
        F_value = MS_contrast/MSE

        # alpha = 0.05
        F_critical = scipy.stats.f.ppf(q=1-0.05, dfn=1, dfd=denominator)

        # for posterior contrast, using scheffe test
        scheffe = F_critical * (num_group-1)
        if F_value >= scheffe:
            p = 0.05
        else:
            p = 'NA'

        print(f"The {col} contrast has F_value {F_value}, and the F_critical Scheffe's Test is {scheffe}")
        F_table = F_table.append({'features':col,'F_value':F_value, 'P_value':p}, ignore_index=True)
    return F_table

In [7]:
# The data set should be a pandas Datafram, and the groups info should be specificed as a column called 'groups'
# e.x.

G_table = pd.read_excel('./test.xlsx')

contrast = [-3, -1, 1, 3] #linear trend
contrast2 = [1,-1,-1,1] #quadratic trend
contrast3 = [-1,3,-3,1] #poly
F_table = contrast_analysis(G_table, contrast)
print(F_table)

The Gc contrast has F_value 8.78447083957519, and the F_critical Scheffe's Test is 11.933338178430578
The Gmax contrast has F_value 0.5379871992845026, and the F_critical Scheffe's Test is 11.933338178430578
The Go-Gc contrast has F_value 26.075328449124477, and the F_critical Scheffe's Test is 11.933338178430578
The Gmax-Gc contrast has F_value 1.5465756599534892, and the F_critical Scheffe's Test is 11.933338178430578
  features    F_value P_value
0       Gc   8.784471      NA
1     Gmax   0.537987      NA
2    Go-Gc  26.075328    0.05
3  Gmax-Gc   1.546576      NA
