# Incorporation of Individual Fairness in Advocate Recommendation

Individual fairness of advocates is measured using consistency, i.e. the average difference in ranking between an individual (advocate) and its k-nearest neighbours. For advocate recommendation, a realistic assessment of fairness compares an advocate only with advocates belonging to the same area (or set of areas) as the case.
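
To make the definition concrete, here is a minimal sketch of consistency on a toy dataset. The helper name `toy_consistency` and the toy inputs are hypothetical, and brute-force NumPy distances are used here only for brevity in place of a nearest-neighbour index. Each individual's consistency is taken as one minus the absolute gap between its own score and the mean score of its k nearest neighbours.

```python
import numpy as np


def toy_consistency(features, scores, k):
    """Consistency per individual: 1 - |own score - mean score of k nearest neighbours|.

    Hypothetical helper for illustration only.
    """
    features = np.asarray(features, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Pairwise Euclidean distances between all individuals
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # An individual must not count as its own neighbour
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    neighbour_mean = scores[neighbours].mean(axis=1)
    return 1.0 - np.abs(scores - neighbour_mean)


# Two tight pairs of individuals: (0, 1) and (2, 3)
toy_scores = toy_consistency([[0], [1], [10], [11]], [0.9, 0.7, 0.2, 0.2], k=1)
```

In this toy case the second pair receives identical scores and is therefore perfectly consistent, while the first pair is penalised for the 0.2 gap between two near-identical individuals.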

In [47]:
import pandas as pd

Creating a function for consistency measurement. For each case, the function requires as inputs:
- the activation values for each advocate
- the advocate attribute set
- the area/act/chapter/section information of the case

For each query, compute the consistency score of each advocate with respect to its k-nearest neighbours. The fairness value of an advocate is its consistency score averaged across all queries.
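
One possible way to restrict the comparison to advocates from the case's areas is sketched below. It assumes that an advocate "belongs" to an area when the corresponding `<AREA>_CASE_NUM` column is greater than zero; that membership rule, the helper name `advocates_in_areas`, and the toy frame are assumptions for illustration and not part of the original pipeline.

```python
import pandas as pd


def advocates_in_areas(attr_df, case_areas):
    """Keep advocates with at least one case in any of the given areas.

    Hypothetical helper: membership is assumed to mean <AREA>_CASE_NUM > 0.
    """
    cols = [f"{area}_CASE_NUM" for area in case_areas
            if f"{area}_CASE_NUM" in attr_df.columns]
    if not cols:
        # None of the areas are present in the attribute set
        return attr_df.iloc[0:0]
    mask = (attr_df[cols] > 0).any(axis=1)
    return attr_df[mask]


# Toy attribute frame using the same column naming as the attribute dataset
toy_attr_df = pd.DataFrame(
    {"TAX LAW_CASE_NUM": [8, 1, 0], "TRADE LAW_CASE_NUM": [4, 0, 0]},
    index=["adv_a", "adv_b", "adv_c"],
)
eligible = advocates_in_areas(toy_attr_df, ["TAX LAW"])
```

The filtered frame could then be used as the attribute set for the nearest-neighbour search, so that neighbours are drawn only from advocates relevant to the case.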

## Importing the attribute dataset

In [3]:
adv_attr_df_path = "/home/workboots/Datasets/DHC/variations/v5/adv_info/train/area_act_chapter_section_info/adv_areas_attr_df.csv"

In [4]:
adv_attr_df = pd.read_csv(adv_attr_df_path, index_col=0, sep="\t")

In [5]:
adv_attr_df

Unnamed: 0,TOTAL_CASE_NUM,WIN_RATIO,ADMINISTRATIVE LAW_CASE_NUM,ADMINISTRATIVE LAW_WIN_RATIO,AGRICULTURE LAW_CASE_NUM,AGRICULTURE LAW_WIN_RATIO,AVIATION LAW_CASE_NUM,AVIATION LAW_WIN_RATIO,BANKING AND FINANCE LAW_CASE_NUM,BANKING AND FINANCE LAW_WIN_RATIO,...,RTI LAW_CASE_NUM,RTI LAW_WIN_RATIO,TAX LAW_CASE_NUM,TAX LAW_WIN_RATIO,TELECOMMUNICATION LAW_CASE_NUM,TELECOMMUNICATION LAW_WIN_RATIO,TRADE LAW_CASE_NUM,TRADE LAW_WIN_RATIO,TRANSPORTATION LAW_CASE_NUM,TRANSPORTATION LAW_WIN_RATIO
ASChanhiok,129,0.728682,0,0.0,1,1.0,3,0.666667,3,1.0,...,1,1.0,8,0.5,2,0.5,4,1.0,3,0.666667
MeeraBhatia,17,0.764706,0,0.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,1,0.0,0,0.0,0,0.0,1,1.000000
OPSaxena,37,0.648649,0,0.0,0,0.0,0,0.000000,3,1.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0,0.000000
NehaKapoor,17,0.823529,0,0.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,0,0.0,0,0.0,1,1.0,0,0.000000
NWaziri,17,0.647059,1,1.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,1,1.0,0,0.0,0,0.0,0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
AkhileshKumar,9,0.555556,0,0.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0,0.000000
MalvikaTrivedi,8,0.500000,0,0.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,0,0.0,0,0.0,1,1.0,0,0.000000
DilipSingh,10,0.700000,0,0.0,0,0.0,0,0.000000,1,1.0,...,0,0.0,1,1.0,0,0.0,0,0.0,0,0.000000
NoorAnand,6,0.833333,0,0.0,0,0.0,0,0.000000,0,0.0,...,0,0.0,0,0.0,0,0.0,0,0.0,0,0.000000


## Setting up NearestNeighbors

In [6]:
from sklearn.neighbors import NearestNeighbors

In [7]:
nn = NearestNeighbors(n_neighbors=20)
nn.fit(adv_attr_df)

In [8]:
distances, indices = nn.kneighbors(adv_attr_df)

In [10]:
indices

array([[   0,  750,  426, ...,  106,  231,   82],
       [   1, 1149, 1035, ...,  162,  511, 1700],
       [   2,  188,  174, ...,  457, 1482,  261],
       ...,
       [1997, 1407, 1561, ...,  695,  884, 1792],
       [1998, 1888, 1787, ..., 1999, 1820,  540],
       [1999, 1998,  202, ..., 1646,  337,  782]])

## Test pipeline

In [61]:
import numpy as np
import scipy.stats as ss
from collections import defaultdict
from tqdm import tqdm

Creating a random set of activation values

In [12]:
activation_vals = np.random.uniform(0, 1, 2000)
activation_vals = {adv_attr_df.index[i]: v for i, v in enumerate(activation_vals)}

In [82]:
activation_vals = {
    i: {adv_attr_df.index[j]: v for j, v in enumerate(np.random.uniform(0, 1, 2000))} for i in range(100)}

Creating ranking function

In [14]:
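def rank_vals(values):
    # Rank on (1 - activation) so that the highest activation gets rank 1;
    # "ordinal" gives tied values distinct ranks in order of occurrence.
    ranks = ss.rankdata([1.0 - v for v in values.values()], method="ordinal")
    idx = list(values.keys())
    rank_dict = {idx[i]: v for i, v in enumerate(ranks)}
    return rank_dict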

In [15]:
ranks = rank_vals(activation_vals)
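
As a quick check of the ranking convention, the sketch below ranks a small hand-made dictionary. The helper name `rank_descending` and the advocate keys are hypothetical; the call to `scipy.stats.rankdata` with `method="ordinal"` mirrors the cell above.

```python
import scipy.stats as ss


def rank_descending(values):
    """Map each key to its rank, where rank 1 is the highest activation.

    Hypothetical helper for illustration only.
    """
    keys = list(values.keys())
    # Ranking (1 - value) ascending is the same as ranking value descending
    ranks = ss.rankdata([1.0 - v for v in values.values()], method="ordinal")
    return {k: int(r) for k, r in zip(keys, ranks)}


toy_ranks = rank_descending({"adv_a": 0.9, "adv_b": 0.1, "adv_c": 0.5})
```

With `method="ordinal"`, tied activations still receive distinct ranks, assigned in the order in which they appear in the input.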

In [83]:
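def consistency(
    activation_vals_dict,
    attr_df,
    n_neighbors=20):
    """Return each advocate's consistency averaged over all queries."""
    overall_scores = defaultdict(list)
    features = attr_df.to_numpy()
    # Ask for one extra neighbour: when querying the training points, each
    # point normally comes back as its own first neighbour.
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(features)
    _, indices = nn.kneighbors(features)
    indices = indices[:, 1:]  # drop the self-match in column 0
    indices = {k: v for k, v in zip(attr_df.index, indices)}
    for activation_vals in tqdm(activation_vals_dict.values()):
        # Neighbour indices are row positions in attr_df, so the keys of each
        # query's dict must follow the row order of attr_df.
        idx = list(activation_vals.keys())
        for individual in activation_vals:
            # Mean activation of the neighbours (was hard-coded as 1/20)
            neighbour_mean = sum(activation_vals[idx[x]] for x in indices[individual]) / n_neighbors
            score = abs(activation_vals[individual] - neighbour_mean)
            overall_scores[individual].append(score)
    c_values = {}
    for individual, scores in overall_scores.items():
        # Consistency is 1 minus the mean absolute gap across queries
        c_values[individual] = 1 - sum(scores) / len(scores)
    return c_values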
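
The same averaging over queries can also be written with NumPy array operations. The following is a sketch under stated assumptions, not the notebook's implementation: the name `consistency_from_neighbours` is hypothetical, activations are assumed to be stacked into an array of shape `(n_queries, n_individuals)`, and neighbour indices are assumed to be precomputed with the self-match already removed.

```python
import numpy as np


def consistency_from_neighbours(scores, neighbours):
    """Average consistency per individual over all queries.

    scores:     array of shape (n_queries, n_individuals)
    neighbours: int array of shape (n_individuals, k), self-match excluded
    Hypothetical helper for illustration only.
    """
    scores = np.asarray(scores, dtype=float)
    neighbours = np.asarray(neighbours)
    # scores[:, neighbours] has shape (n_queries, n_individuals, k)
    neighbour_mean = scores[:, neighbours].mean(axis=2)
    gaps = np.abs(scores - neighbour_mean)
    # Average the gaps over queries, then convert to consistency
    return 1.0 - gaps.mean(axis=0)


toy_neighbours = [[1], [0], [3], [2]]
toy_activations = [
    [0.9, 0.7, 0.2, 0.2],  # query 0
    [0.5, 0.5, 0.4, 0.0],  # query 1
]
toy_c_values = consistency_from_neighbours(toy_activations, toy_neighbours)
```

This form avoids the per-advocate Python loop, which may matter when the number of queries or advocates grows well beyond the 100 queries and 2000 advocates used in the test above.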

In [84]:
score = consistency(activation_vals, adv_attr_df)

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 125.13it/s]


In [85]:
score

{'ASChanhiok': 0.7386369058537428,
 'MeeraBhatia': 0.7391943554711489,
 'OPSaxena': 0.7304177803066736,
 'NehaKapoor': 0.7397547384643847,
 'NWaziri': 0.7291775069903993,
 'RajeevMRoy': 0.7274016902823888,
 'JaideepMalik': 0.7160765653696772,
 'HarshPrabhakar': 0.7598698310705064,
 'SMuralidhar': 0.7507707857651419,
 'SarfrazAhmad': 0.7585925424143694,
 'SanjayJai': 0.7459993483451601,
 'SuryakantSingla': 0.7532036052013911,
 'AmitMahjan': 0.7270059862784967,
 'RaviPrakash': 0.7362964037688604,
 'VinayGupta': 0.7742395491078868,
 'AkhiSibal': 0.7461511536173251,
 'SaurabhKirpal': 0.7218439442785332,
 'GuruKrishnaKumar': 0.7323387074490167,
 'GauravSharma': 0.7475051895999051,
 'SandipSethi': 0.7583955898487018,
 'JayantMehta': 0.7497003519903898,
 'GarimaSachdeva': 0.7411949864185882,
 'AnuGupta': 0.7315434713356599,
 'AviralTiwari': 0.7471040513292835,
 'MohanRao': 0.7558045226462606,
 'NirajSingh': 0.7539527479269859,
 'RamanKapur': 0.750491741575191,
 'RajeevVirmani': 0.762540094294