<a href="https://colab.research.google.com/github/tatiana-ka/sample-size-simulator-soc-science/blob/main/Simulator2defineNcoders.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Simulation to define the number of coders to hire from a panel**

Quite often in social science, researchers need to obtain an "objective" evaluation of something subjective, for example the liking, attractiveness, understandability, or creativity of a text. To create a stable measure of something this subjective, a common practice is to have each instance (each text or picture) evaluated by a number of independent people. These people are frequently recruited from crowdsourcing platforms and panels such as Amazon MTurk or Prolific. The idea is that if enough people try to objectively evaluate something subjective, you get a decent objective measure building on the wisdom of the crowd.

To put it bluntly, if 10 people evaluate the creativity of a product (ideally a product they themselves might use), that is more objective than one expert evaluation. After all, if these people are the potential beneficiaries, their evaluation is more important anyway.

Usually the number of evaluations per instance needed for a balanced, objective measure is defined based on prior literature, common sense, or both. The tricky part, however, can be defining the sample size needed to reach at least the minimum required number of evaluations for each text.

For example: a researcher needs each text to be evaluated at least 10 times to create an objective evaluation measure. The researcher gives each crowdsourcing worker 5 texts to evaluate, and these texts are randomly pulled from a list of N texts. Let's say N = 100. The next worker gets another 5 texts to evaluate. They could be completely different from the ones the first worker evaluated, or both workers could have evaluated 1 text in common while the rest did not overlap. In the latter case, after 2 workers we end up with 8 texts that were evaluated once, 1 text that was evaluated twice, and 91 texts that have not been evaluated yet.
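
The two-worker example can be reproduced with a few lines of NumPy. This is only an illustrative sketch: the names `n_texts`, `texts_pp`, `worker_1` and `worker_2` and the seed are assumptions made for the demo, not part of the simulation function defined later.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily, only for reproducibility
n_texts, texts_pp = 100, 5      # 100 texts in the list, 5 texts per worker

# each worker gets 5 distinct texts, drawn independently of the other worker
worker_1 = rng.choice(n_texts, size=texts_pp, replace=False)
worker_2 = rng.choice(n_texts, size=texts_pp, replace=False)

# number of evaluations per text after two workers (texts never shown stay at 0)
counts = np.bincount(np.concatenate([worker_1, worker_2]), minlength=n_texts)
overlap = np.intersect1d(worker_1, worker_2)

print("texts seen by both workers:", overlap)
print("texts evaluated once:", np.sum(counts == 1))
print("texts evaluated twice:", np.sum(counts == 2))
print("texts not evaluated yet:", np.sum(counts == 0))
```

Whether the two workers overlap depends on the random draw, so the printed numbers change with the seed.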

Many survey platforms allow for this kind of setup: you upload your file with 100 texts and pipe the texts into the interface so that each participant sees 1 random text from the list of 100. Then you loop this action a few times, in this example 5 times.

Since many platforms offer that, it's no problem to set it up. However, setting a limit so that a text gets removed from the presentation list once it reaches 10 evaluations is more complicated. If you know how to set up this limit easily, don't bother reading on.

Those who do not know how to limit the maximum number of evaluations per text have to rely on randomness. That means that some texts will be evaluated more than 10 times and others fewer. It's all random. Having more than 10 evaluations is usually not a problem methodologically; it only makes the measure less noisy. The tricky part is to figure out how many people to recruit so that each text has a high probability of getting at least the 10 required evaluations.
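
For a single text, this randomness can be worked out without simulation: every worker is shown a given text with probability `ideas_pp / n_ideas`, independently of the other workers, so the number of evaluations of that one text follows a binomial distribution. The sketch below uses only the standard library; the helper name `prob_fewer_than` is an assumption introduced for illustration. It does not replace the simulation, because the counts of different texts are not independent of each other, but multiplying the single-text probability by the number of texts gives a rough upper bound (union bound) on the chance that any text falls short.

```python
from math import comb

def prob_fewer_than(required, n_coders, n_ideas, ideas_pp):
    """P(one given text gets fewer than `required` evaluations), binomial model."""
    p = ideas_pp / n_ideas  # chance that a single worker is shown this text
    return sum(comb(n_coders, k) * p**k * (1 - p)**(n_coders - k)
               for k in range(required))

for n in (200, 300, 400, 500):
    p_one = prob_fewer_than(10, n, 100, 5)
    bound_any = min(1.0, 100 * p_one)  # union bound over the 100 texts
    print(f"{n} workers: mean evaluations per text = {n * 5 / 100:.0f}, "
          f"P(one text < 10) = {p_one:.4f}, bound for any text = {bound_any:.4f}")
```

With 200 workers the average text gets exactly 10 evaluations, yet a large share of texts still end up below 10, which is why recruiting just enough workers to reach the average is not sufficient.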

Of course, one could just let the evaluation run and recruit participants until each idea gets 10 evaluations. But that will probably not work very well. Usually you need to define the number of workers/participants in advance (when you post the crowdsourcing task). More importantly, quite often you need to preregister the study and give a reasonable explanation of how you defined your sample size. Here "I'm just gonna let it roll and see" will not fly.

The following code provides a function that helps to define this sample size.
In the function you need to enter:
* the number of texts ```n_ideas``` that need evaluation,
* the number of ideas each worker needs to evaluate ```ideas_pp```,
* the total number of workers you plan to recruit ```n_coders```,
* since it is a simulation of this crowdsourcing evaluation task, the number of times you want to run the simulation ```simulation_loops```. The default is 1; I recommend 10,000.

The function then returns the minimum and maximum number of times an idea was evaluated in each simulation run. From that you can calculate the probability that at least one text is evaluated fewer than the required number of times, which is the share of runs whose minimum falls below the requirement. If we run the simulation 10,000 times, this probability should be quite stable. In the social sciences the p-value is still quite a big thing, so going for a sample size where this probability is below 5% is a decent choice, because it is defensible.

The function requires the number of workers as an input, which is odd, since that is exactly what we are trying to find. So the function needs to be plugged into a loop and run for each conceivable sample size.

The following code walks through these steps. 

In [1]:
# load necessary packages
import numpy as np
import pandas as pd

np.random.seed(42) # fix the seed so that the results are reproducible

**Simulation function**
  
* Creates IDs for n_ideas
* Randomly draws n = ideas_pp ideas
* Repeats n_coders times
* Repeats the whole loop simulation_loops times
* Calculates the minimum and maximum number of times an idea is evaluated in each run



In [2]:
# n_ideas: number of ideas (texts) in the sample
# n_coders: number of coders (workers)
# ideas_pp: number of ideas each coder evaluates
# simulation_loops: number of times to run the simulation

def sumulate_eval_number(n_ideas, n_coders, ideas_pp, simulation_loops = 1):
  count_min = np.empty(simulation_loops)
  count_max = np.empty(simulation_loops)

  for i in range(simulation_loops):
    count = np.zeros(n_ideas, dtype = int) # evaluations per idea; ideas that are never drawn stay at 0
    for _ in range(n_coders): # simulate a coding study
      instance_draws = np.random.choice(n_ideas, size = ideas_pp, replace = False) # random sample each coder gets
      count[instance_draws] += 1 # draws are unique within a coder, so each drawn idea gains one evaluation

    count_min[i] = np.min(count)
    count_max[i] = np.max(count)

  return count_min, count_max
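

Calling `np.random.choice` once per coder can get slow when the function is run 10,000 times for many sample sizes. One possible way to speed it up, shown here as a sketch and not as the notebook's own method, is to draw the texts for all coders of one run at once. The function name `simulate_min_max_fast` and the `seed` argument are assumptions introduced for this example.

```python
import numpy as np

def simulate_min_max_fast(n_ideas, n_coders, ideas_pp, simulation_loops=1, seed=None):
    """Vectorized sketch: min and max evaluations per idea in each simulated study."""
    rng = np.random.default_rng(seed)
    count_min = np.empty(simulation_loops, dtype=int)
    count_max = np.empty(simulation_loops, dtype=int)
    for i in range(simulation_loops):
        # sorting random numbers row by row gives every coder a random ordering of
        # the ideas; the first ideas_pp columns are a sample without replacement
        draws = rng.random((n_coders, n_ideas)).argsort(axis=1)[:, :ideas_pp]
        # evaluations per idea; minlength keeps ideas that were never drawn at 0
        counts = np.bincount(draws.ravel(), minlength=n_ideas)
        count_min[i] = counts.min()
        count_max[i] = counts.max()
    return count_min, count_max

cmin, cmax = simulate_min_max_fast(n_ideas=100, n_coders=500, ideas_pp=5,
                                   simulation_loops=200, seed=42)
print("share of runs with at least one text below 10 evaluations:", np.mean(cmin < 10))
```

The sketch uses its own random generator, so its results will not match the seeded output of the function above number for number; only the estimated probabilities should be similar.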

**Simulate for:** 
* a range of N from 400 up to (but not including) 700 crowdsourcing workers, in steps of 10
* with 100 texts to evaluate
* with 5 texts per person
* each text should be evaluated at least 10 times
* create 10000 simulations

*Depending on how wide the range of worker numbers is, the simulation might take a while to run. If the search range is broad, it might be a good idea to set simulation_loops to 1000 first to get a rough estimate. Don't forget to rerun with 10000 simulations afterwards, since estimates from 1000 runs can be a bit unstable.*
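
How unstable an estimate is can be gauged with the usual standard error of a simulated proportion, sqrt(p(1 - p) / simulation_loops). The sketch below evaluates it at the 5% threshold used here; the helper name `mc_standard_error` is an assumption made for this example.

```python
from math import sqrt

def mc_standard_error(p, simulation_loops):
    """Standard error of a probability estimated from independent simulation runs."""
    return sqrt(p * (1 - p) / simulation_loops)

for loops in (1000, 10000):
    se = mc_standard_error(0.05, loops)
    low, high = 0.05 - 1.96 * se, 0.05 + 1.96 * se  # approximate 95% interval
    print(f"{loops} runs: standard error = {se:.4f}, "
          f"approx. 95% interval = [{low:.4f}, {high:.4f}]")
```

With 1000 runs, a true probability of 5% can easily be estimated as anything between roughly 3.6% and 6.4%, which is too wide to decide whether a sample size clears the 5% threshold. With 10,000 runs the interval shrinks to roughly 4.6% to 5.4%.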

In [6]:
cmin_p = []
coders = np.arange(400,700,10) #define range of possible N of workers (400, 410, ..., 690)

for num in coders:
  cmin, cmax = sumulate_eval_number(n_ideas = 100,n_coders = num,ideas_pp = 5, simulation_loops= 10000) #define the parameters (except N of workers)
  # 10 stands for minimum number of evaluations per instance
  # note: `<=` also counts runs in which the least-evaluated text got exactly 10 evaluations,
  # so the estimate is conservative; use `cmin < 10` for strictly fewer than 10
  cmin_p.append(np.sum(cmin <=10)/len(cmin))

df = pd.DataFrame({'N_workers': coders, 'Probability': cmin_p}) #pack the results nicely in a table
df #print the table


| | N_workers | Probability |
|---|---|---|
| 0 | 400 | 0.6261 |
| 1 | 410 | 0.5208 |
| 2 | 420 | 0.4173 |
| 3 | 430 | 0.3277 |
| 4 | 440 | 0.2635 |
| 5 | 450 | 0.1937 |
| 6 | 460 | 0.1496 |
| 7 | 470 | 0.1124 |
| 8 | 480 | 0.0834 |
| 9 | 490 | 0.0601 |


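Based on this output, we would need 500 crowdsourcing workers. The table above shows only the rows up to 490 workers, where the probability is still above 5%. With 500 workers, the probability that at least one text ends up with too few ratings is 4.56%, the first value below the 5% threshold.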
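
Reading the smallest sufficient sample size off the table can also be done in code. The sketch below rebuilds a small table from the values reported above (the ten rows shown plus the 4.56% quoted for 500 workers); the variable names `threshold` and `smallest_n` are assumptions for illustration. In the notebook you would apply the same two lines to the full `df`.

```python
import pandas as pd

# values copied from the output above, plus the 4.56% reported for 500 workers
df = pd.DataFrame({
    "N_workers": [400, 410, 420, 430, 440, 450, 460, 470, 480, 490, 500],
    "Probability": [0.6261, 0.5208, 0.4173, 0.3277, 0.2635, 0.1937,
                    0.1496, 0.1124, 0.0834, 0.0601, 0.0456],
})

threshold = 0.05  # acceptable risk that some text gets too few evaluations
enough = df[df["Probability"] < threshold]
smallest_n = int(enough["N_workers"].min()) if not enough.empty else None
print(smallest_n)  # 500
```

If no sample size in the searched range meets the threshold, `smallest_n` is `None`, which signals that the range of worker numbers needs to be extended.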