In [2]:
import numpy as np
import pandas as pd

# Part I



Approach:

1. Set critical parameters:
- Total number of students in applicant pool: 100
- Number of students in advantaged pool: 50
- Number of students in disadvantaged pool: 50
- Probability a student has x_i = 1 (for i in {1, 2}) if from the advantaged group: 2/3
- Probability a student has x_i = 1 (for i in {1, 2}) if from the disadvantaged group: 1/3

2. Initialize student pool based on above parameters.
- Note: numpy.random.random() generates random numbers in [0, 1), which lets us model simple probabilities by comparing each draw to a threshold.

3. For each of the four scenarios, generate table of student f-value rankings based on x_1 and x_2 for a given student, and include information on whether they are from the advantaged group (A) or disadvantaged group (D).
- Scenario 1: Use both x_1 and x_2 information but not group information
- Scenario 2: Do not use x_2 and do not use group membership information.
- Scenario 3: Do not use x_2, but use group membership information.
- Scenario 4: Use x_2 for students in D but do not use group membership information otherwise.

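4. Calculate the average f-value for the top 5/18 of the students in the pool for each scenario (efficiency of algorithm). With 100 students this is about 27.8 students: the first admission cell below uses a target of 27, while the Scenario 3 and Scenario 4 cells use round(100*5/18) = 28.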

5. Assess the proportion of admitted students in A and D and compare to the proportion of students in the applicant pool from A and D (equity of algorithm).

6. Repeat above analysis (steps 2-5) for a large number of iterations to check if the algorithm converges to the theoretical expectation.
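
Steps 2 through 6 can be sketched in a compact, vectorized form. This is only an illustration, not the notebook's implementation: the helper names `simulate_pool` and `run_once`, the seed, the number of iterations, and the random tie-breaking rule are all assumptions. It ranks by f using both features and no group information, and admits k = 28 students, i.e. round(100*5/18).

```python
import numpy as np

def simulate_pool(rng, n_per_group=50, p_a=2/3, p_d=1/3):
    """Draw x_1 and x_2 for n_per_group students in A followed by n_per_group in D."""
    p = np.repeat([p_a, p_d], n_per_group)        # per-student P(x_i = 1)
    x = rng.random((2, 2 * n_per_group)) < p      # row 0 is x_1, row 1 is x_2
    g = np.repeat(["A", "D"], n_per_group)
    return x[0].astype(int), x[1].astype(int), g

def run_once(rng, k=28):
    """One iteration: build a pool, admit the top k by f, report the metrics."""
    x1, x2, g = simulate_pool(rng)
    f = x1 * x2
    # Adding a random value in [0, 0.5) keeps f = 1 ahead of f = 0
    # while breaking ties at random (an assumption of this sketch).
    order = np.argsort(-(f + 0.5 * rng.random(f.size)))
    admitted = order[:k]
    efficiency = f[admitted].mean()               # average f among admits
    share_a = np.mean(g[admitted] == "A")         # share of admits from A
    return f.mean(), efficiency, share_a

rng = np.random.default_rng(0)
results = np.array([run_once(rng) for _ in range(2000)])
mean_f, mean_eff, mean_share_a = results.mean(axis=0)
print(f"mean fraction with f = 1: {mean_f:.3f} (theory: {5/18:.3f})")
print(f"mean efficiency: {mean_eff:.3f}, mean share of A among admits: {mean_share_a:.3f}")
```

Averaging over many pools is what lets the simulated fraction of students with f = 1 be compared with the theoretical value of 5/18.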

In [30]:
## Initialize student pool based on above parameters
def disadvantaged_random():
    val = np.random.random()
    if (val) >= 0.67:
        return 1
    else:
        return 0

def advantaged_random():
    val = np.random.random()
    if (val) >= 0.34:
        return 1
    else: 
        return 0

d_x1 = []
d_x2 = []
a_x1 = []
a_x2 = []
for x in range(0,50):
    d_x1.append(disadvantaged_random())
    d_x2.append(disadvantaged_random())
    a_x1.append(advantaged_random())
    a_x2.append(advantaged_random())




In [31]:
x_1 = a_x1 + d_x1
x_2 = a_x2 + d_x2
g = ['A' for x in range(0,50)] + ['B' for x in range(0,50)]
f = []


students_df = pd.DataFrame({"x_1":x_1,
                            "x_2":x_2,
                            "g":g })

In [32]:
students_df["f"] = students_df.apply(lambda x: 1  if (x["x_1"]==1 and x["x_2"]==1) else 0, axis=1)

In [33]:
students_df.sort_values(["f"], ascending=False)

Unnamed: 0,x_1,x_2,g,f
30,1,1,A,1
26,1,1,A,1
38,1,1,A,1
48,1,1,A,1
72,1,1,B,1
...,...,...,...,...
39,0,0,A,0
36,1,0,A,0
35,0,0,A,0
34,0,1,A,0


In [34]:
# fraction of students with (1,1)
np.mean(students_df['f'] == 1)

0.26
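
The observed fraction can be checked against the value implied by the Part I parameters. The short calculation below assumes that x_1 and x_2 are drawn independently, as they are in the generation cell, and it uses the exact probabilities 2/3 and 1/3 rather than the 0.34 and 0.67 thresholds in the code.

```python
from fractions import Fraction

p_a, p_d = Fraction(2, 3), Fraction(1, 3)   # P(x_i = 1) in groups A and D
share_a = share_d = Fraction(1, 2)          # 50 of the 100 students in each group

p_f_a = p_a ** 2        # P(f = 1 | A), with x_1 and x_2 independent
p_f_d = p_d ** 2        # P(f = 1 | D)
p_f = share_a * p_f_a + share_d * p_f_d     # P(f = 1) over the whole pool

print(p_f_a, p_f_d, p_f)
```

The pooled probability is 5/18, roughly 0.278, which is where the "top 5/18" admission fraction in the approach comes from; a single pool of 100 students will fluctuate around that value.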

In [78]:
# dataframe of students with (1,1)
true_f_df = students_df[students_df['f'] == 1]

# dataframe of students without (1,1) who are A
false_f_df_a = students_df[(students_df['f'] == 0) & (students_df['g'] == 'A') & ((students_df['x_1'] == 1) | (students_df['x_2'] == 1))]

# dataframe of students without (1,1) who are B
# (each comparison needs its own parentheses because & binds tighter than ==)
false_f_df_d = students_df[(students_df['f'] == 0) & (students_df['g'] == 'B') & ((students_df['x_1'] == 1) | (students_df['x_2'] == 1))]

# number of students with (1,1) and target goal
potential_admits = len(true_f_df)
target = 27

# final list of admitted students
final_list = []

# if 'f == 1' is less than target
if (potential_admits < target):
    final_list = true_f_df
    count = target - potential_admits
    while (count > 0):
        final_list = pd.concat([final_list, false_f_df_a.sample(n=2), false_f_df_d.sample(n=1)])
        count -= 3
        potential_admits += 3
        print(len(final_list))
# use `and` here: `&` binds tighter than `>` and would compare against (target & len(...))
if potential_admits > target and len(final_list) > 0:
    final_list = final_list.sample(n=target)
    print(len(final_list))
if potential_admits > target and len(final_list) == 0:
    final_list = true_f_df.sample(n=target)
    print(len(final_list))
if (potential_admits == target) :
    final_list = true_f_df

final_list

29
27


Unnamed: 0,x_1,x_2,g,f
45,1,1,A,1
97,1,1,B,1
49,0,1,A,0
30,1,1,A,1
48,1,1,A,1
17,1,1,A,1
44,1,1,A,1
27,1,1,A,1
59,1,0,B,0
42,1,1,A,1


In [94]:
# average f
print(f"The average F score of students admitted is {round(np.mean(final_list['f']), 3)}")

# proportion of A and B applicants who were admitted
a_equity = final_list['g'].value_counts()['A'] / students_df['g'].value_counts()['A']
b_equity = final_list['g'].value_counts()['B'] / students_df['g'].value_counts()['B']

print(f"The equity of A admitted is {a_equity}. \nThe equity of B admitted is {b_equity}")


The average F score of students admitted is 0.889
The equity of A admitted is 0.42. 
The equity of B admitted is 0.12
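
The per-group admit rates can also be computed with a small helper that does not raise a KeyError when a group has no admits. This is a sketch only: the function name `admit_rates` and the toy data are made up for illustration and do not come from the notebook.

```python
import pandas as pd

def admit_rates(admitted: pd.DataFrame, pool: pd.DataFrame, group_col: str = "g") -> pd.Series:
    """Fraction of each group's applicants that were admitted (0.0 if none were)."""
    pool_counts = pool[group_col].value_counts()
    admitted_counts = (
        admitted[group_col]
        .value_counts()
        .reindex(pool_counts.index, fill_value=0)   # groups with no admits count as 0
    )
    return admitted_counts / pool_counts

# Toy data, for illustration only
pool = pd.DataFrame({"g": ["A"] * 4 + ["B"] * 4})
admitted = pool.iloc[[0, 1, 2]]      # three A students and no B students
rates = admit_rates(admitted, pool)
print(rates)
```

Indexing `value_counts()` directly with a group label, as the cell above does, fails if that group happens to have zero admits in a given random draw; reindexing against the pool avoids that.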


In [203]:
# Scenario 3 (no x_2, group membership used)

def disadvantaged_random():
    val = np.random.random()
    if (val) >= 0.67:
        return 1
    else:
        return 0

def advantaged_random():
    val = np.random.random()
    if (val) >= 0.34:
        return 1
    else: 
        return 0

def print_equity(num_A,num_D):
    print('assessing equity: ')
    print('% of students from A: ',100*(num_A/(num_A+num_D)))
    print('% of students from D: ',100*(num_D/(num_A+num_D)))

    return 50-100*(num_A/(num_A+num_D))

d_x1 = []
d_x2 = []
a_x1 = []
a_x2 = []
for x in range(0,50):
    d_x1.append(disadvantaged_random())
    d_x2.append(disadvantaged_random())
    a_x1.append(advantaged_random())
    a_x2.append(advantaged_random())

x_1 = a_x1 + d_x1
x_2 = a_x2 + d_x2
g = ['A' for x in range(0,50)] + ['D' for x in range(0,50)]
f = []

students_df = pd.DataFrame({"x_1":x_1,
                            "x_2":x_2,
                            "g":g })
students_df["f"] = students_df.apply(lambda x: 1  if (x["x_1"]==1 and x["x_2"]==1) else 0, axis=1)

num_admitted_students = round(100*5/18)

# dataframe of students with x_1 = 1
true_f_df = students_df[students_df['x_1'] == 1]
sorted_students_df = true_f_df.sort_values(["f"], ascending=False)
# display(sorted_students_df)

# number of students with x_1 = 1 and target goal
potential_admits = len(sorted_students_df)
target = num_admitted_students

# final list of admitted students
final_list = []

# use `and` here: `&` binds tighter than `>` and `==`
if potential_admits > target and len(final_list) == 0:
    final_list = sorted_students_df.iloc[:target]
    print(len(final_list))
if (potential_admits == target) :
    final_list = sorted_students_df

display(final_list)
# average f
print(f"The average F score of students admitted is {round(np.mean(final_list['f']), 3)}")

# proportion of A and D applicants who were admitted
a_equity = final_list['g'].value_counts()['A'] / students_df['g'].value_counts()['A']
d_equity = final_list['g'].value_counts()['D'] / students_df['g'].value_counts()['D']

print(f"The equity of A admitted is {a_equity}. \nThe equity of D admitted is {d_equity} \nBecause we filter on x_1 only, more students pass the initial round, and choosing among them at random would give a high level of equity. Ranking them by f instead gives very high efficiency, but equity for D drops sharply.")


28


Unnamed: 0,x_1,x_2,g,f
97,1,1,D,1
25,1,1,A,1
81,1,1,D,1
79,1,1,D,1
32,1,1,A,1
33,1,1,A,1
35,1,1,A,1
36,1,1,A,1
1,1,1,A,1
78,1,1,D,1


The average F score of students admitted is 1.0
The equity of A admitted is 0.42. 
The equity of D admitted is 0.14 
Because we filter on x_1 only, more students pass the initial round, and choosing among them at random would give a high level of equity. Ranking them by f instead gives very high efficiency, but equity for D drops sharply.
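
The Scenario 3 cell above sorts by f, which depends on x_2. One alternative reading of "do not use x_2, but use group membership" is to score each student by the expected value of f given x_1 and group, E[f | x_1, g] = x_1 * P(x_2 = 1 | g). The sketch below illustrates that reading only; it is an assumption, not the method used in this notebook, and the names `P_X2` and `scenario3_score` are hypothetical.

```python
import pandas as pd

# P(x_2 = 1 | group), taken from the Part I parameters
P_X2 = {"A": 2/3, "D": 1/3}

def scenario3_score(df: pd.DataFrame) -> pd.Series:
    """Expected f when x_2 is hidden but the group is known."""
    return df["x_1"] * df["g"].map(P_X2)

# Toy pool with one student of each (x_1, group) combination
toy = pd.DataFrame({"x_1": [1, 1, 0, 0], "g": ["A", "D", "A", "D"]})
toy["score"] = scenario3_score(toy)
ranked = toy.sort_values("score", ascending=False, kind="stable")
print(ranked)
```

Under this scoring, A students with x_1 = 1 rank ahead of D students with x_1 = 1, and everyone with x_1 = 0 scores zero, so using group information this way would favor A before any x_2 value is observed.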


In [234]:
# Scenario 4 (x_2 used for students in D, no group information otherwise)

def disadvantaged_random():
    val = np.random.random()
    if (val) >= 0.67:
        return 1
    else:
        return 0

def advantaged_random():
    val = np.random.random()
    if (val) >= 0.34:
        return 1
    else: 
        return 0

def print_equity(num_A,num_D):
    print('assessing equity: ')
    print('% of students from A: ',100*(num_A/(num_A+num_D)))
    print('% of students from D: ',100*(num_D/(num_A+num_D)))

    return 50-100*(num_A/(num_A+num_D))

d_x1 = []
d_x2 = []
a_x1 = []
a_x2 = []
for x in range(0,50):
    d_x1.append(disadvantaged_random())
    d_x2.append(disadvantaged_random())
    a_x1.append(advantaged_random())
    a_x2.append(advantaged_random())

x_1 = a_x1 + d_x1
x_2 = a_x2 + d_x2
g = ['A' for x in range(0,50)] + ['D' for x in range(0,50)]
f = []

students_df = pd.DataFrame({"x_1":x_1,
                            "x_2":x_2,
                            "g":g })

students_df["f"] = students_df.apply(lambda x: 1  if (x["x_1"]==1 and x["x_2"]==1) else 0, axis=1)

num_admitted_students = round(100*5/18)

# dataframe of D students with (1, 1)
# (combine both conditions in one mask instead of chaining two boolean indexers)
true_f_df = students_df[(students_df['f'] == 1) & (students_df['g'] == "D")]
print("Number of D with (1,1):  ", len(true_f_df))

# dataframe of the remaining students with x_1 = 1 (any x_2)
half_f_df = students_df.drop(true_f_df.index)
half_f_df = half_f_df[half_f_df['x_1'] == 1]
sorted_students_df = half_f_df.sort_values(["f"], ascending=False)

# target goal
target = num_admitted_students

# final list of admitted students
final_list = []

## add D students first that have (1,1) ##
final_list = true_f_df
potential_admits = len(half_f_df)  # edit size of students left 
target = target - len(true_f_df)  # edit size of students to add

# if (potential_admits < target): -> won't happen
if (potential_admits > target) :
    final_list = pd.concat([final_list, sorted_students_df[:][0:target]])
# if (potential_admits > target & len(final_list) == 0) : -> won't happen
if (potential_admits == target) :
    final_list = pd.concat([final_list, sorted_students_df])

display(final_list)
# average f
print(f"The average F score of students admitted is {round(np.mean(final_list['f']), 3)}")

# proportion of A and D applicants who were admitted
a_equity = final_list['g'].value_counts()['A'] / students_df['g'].value_counts()['A']
d_equity = final_list['g'].value_counts()['D'] / students_df['g'].value_counts()['D']

print(f"The equity of A admitted is {a_equity}. \nThe equity of D admitted is {d_equity} \nBecause the remaining students only need x_1 = 1 rather than f = 1, more students pass the initial round, giving high efficiency and a higher level of equity.")


Number of D with (1,1):   6


Unnamed: 0,x_1,x_2,g,f
56,1,1,D,1
57,1,1,D,1
59,1,1,D,1
65,1,1,D,1
71,1,1,D,1
88,1,1,D,1
0,1,1,A,1
27,1,1,A,1
42,1,1,A,1
41,1,1,A,1


The average F score of students admitted is 0.821
The equity of A admitted is 0.36. 
The equity of D admitted is 0.2 
Because the remaining students only need x_1 = 1 rather than f = 1, more students pass the initial round, giving high efficiency and a higher level of equity.
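
The scenario cells take the first `target` rows after sorting, so which of the tied students are admitted depends on the sort and the row order rather than on an explicit random draw. A possible helper that breaks ties uniformly at random is sketched below; the name `admit_top_k`, the seed, and the toy data are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def admit_top_k(df: pd.DataFrame, score_col: str, k: int, rng: np.random.Generator) -> pd.DataFrame:
    """Return the k highest-scoring rows, breaking ties uniformly at random."""
    shuffled = df.sample(frac=1, random_state=rng)     # put rows in random order
    # A stable sort keeps the shuffled order among rows with equal scores
    ranked = shuffled.sort_values(score_col, ascending=False, kind="stable")
    return ranked.head(k)

rng = np.random.default_rng(1)
toy = pd.DataFrame({"f": [1, 0, 1, 0, 0, 1], "g": list("AAADDD")})
admits = admit_top_k(toy, "f", k=4, rng=rng)
print(admits)
```

With three students at f = 1 and k = 4, all three are always admitted and the fourth seat goes to a randomly chosen student with f = 0.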


# Part II

Approach:
1. Define how the college admittance status and group membership of each Generation 1 student impacts the group membership of each Generation 2 student.
- Note: Assume asexual reproduction, each Generation 1 student produces 1 Generation 2 student that is influenced by their parent.

2. Define parameters for x_1 and x_2 probabilities for subsequent generations as influenced by group membership that is influenced by the previous generation's group membership and admittance to college.

3. Use the algorithm from Part I to simulate college admissions process for the next generation based on the previous generation's group membership and college admission status.

4. Repeat step 3 for 100 to 1000 generations.

5. Assess the social mobility (change in proportion of groups A and D over time) in all four scenarios.

6. Perform sensitivity analysis by varying the parameters of the model and then assessing the social mobility.
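
The generational loop in steps 1 through 5 might be sketched as follows. Everything about the inheritance rule is an assumption: the transition probabilities in `P_CHILD_A` are placeholder numbers invented for this sketch, and the admission rule (top 5/18 by f with random tie-breaking) is just one of the Part I variants. The function name `next_generation` is hypothetical too.

```python
import numpy as np

# HYPOTHETICAL probabilities P(child in A | parent group, parent admitted).
# These values are placeholders and are not taken from the notebook.
P_CHILD_A = {("A", True): 0.9, ("A", False): 0.6,
             ("D", True): 0.7, ("D", False): 0.1}
P_X = {"A": 2/3, "D": 1/3}      # P(x_i = 1 | group), as in Part I

def next_generation(groups, rng, admit_frac=5/18):
    """Admit the top admit_frac by f, then draw one child group per parent."""
    n = groups.size
    p = np.where(groups == "A", P_X["A"], P_X["D"])
    f = (rng.random(n) < p) & (rng.random(n) < p)       # f = 1 only if x_1 = x_2 = 1
    order = np.argsort(-(f + 0.5 * rng.random(n)))      # f = 1 first, ties broken at random
    admitted = np.zeros(n, dtype=bool)
    admitted[order[:round(n * admit_frac)]] = True
    p_a = np.array([P_CHILD_A[(g, a)]
                    for g, a in zip(groups.tolist(), admitted.tolist())])
    return np.where(rng.random(n) < p_a, "A", "D")

rng = np.random.default_rng(0)
groups = np.repeat(["A", "D"], 50)                      # Generation 1: 50 A and 50 D
share_a = [float(np.mean(groups == "A"))]
for _ in range(100):
    groups = next_generation(groups, rng)
    share_a.append(float(np.mean(groups == "A")))
print(f"share of A: start {share_a[0]:.2f}, after 100 generations {share_a[-1]:.2f}")
```

Tracking `share_a` over generations gives the social-mobility curve from step 5, and rerunning the loop with different values in `P_CHILD_A` or `P_X` is one way to approach the sensitivity analysis in step 6.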