# Task cost analysis
---

# Summary of available data

Reference file with the previously used inputs:

    Dataset-TaskDesign02-proc-fragments-pairs
    
Data available for each node (activity):

    process_id
    node_id
    node_label
    node_image

    
Total original pairs: **989**

Total processes: **4**

Total nodes: **76**

Total fragment images (1 per node): **76**

Unique labels (node names): **65**

Truly unique labels (case-insensitive): **59**

### Proc82

Total nodes: **13**

Label list:

    Apply Online
	Send Online Protocol
	Evaluate
	Acceptance
	Pay for Aptitude test
	Send letter of rejection
	Rank Students according to GPA and the test results
	Check Documents
	Documents received
	Send Documents by Post
	Keep in the Applicant pool
	Take Aptitude test
	Invite to an aptitude test

### Proc83

Total nodes: **23**

Label list:

    check documents
	less than 16 cp in mathematics
	send letter of rejection
	sufficient cp in mathematics
	take oral exam
	evaluate
	wait for bachelor's certificate
	send bachelor's certificate
	send letter of provisional acceptance
	rejected
	check certificate
	provisional acceptance cancelled
	accepted provisionally
	send letter of acceptance
	provisional acceptance confirmed
	average grade is less than good
	average grade is good or better
	accepted
	certificate received
	certificate received
	documents received
	apply online
	send documents by post


### Proc84

Total nodes: **23**

Label list:

    check application in time
	send application
	complete and in time?
	check application complete
	complete application
	receive application
	fill out application form
	german?
	add certificate of german language
	set additional requirements
	hand application over to examining board
	check if bachelor is sufficient
	add certificate of bachelor degree
	invite for talk
	receive rejection
	check if bachelors-grade within top 85%
	send rejection
	immatriculate
	receive acceptance
	send acceptance
	rank with other applicants
	document
	talk to applicant


### Proc89

Total nodes: **17**

Label list:

    documents received
	rejected
	send letter of rejection
	send letter of acceptance
	rejected
	check bachelor's degree
	bridging courses > 30 cp
	conduct interview
	check documents
	go to interview
	send interview invitation
	forward documents
	documents received
	evaluate
	bridging courses < 30 cp
	apply online
	send documents by post
    
## Original pairs

### Proc82 and Proc83

Total possible pairs: **299** (13 * 23)

Total actual pairs (in the file): **299**

### Proc82 and Proc84

Total possible pairs: **299** (13 * 23)

Total actual pairs (in the file): **299**

### Proc83 and Proc89

Total possible pairs: **391** (23 * 17)

Total actual pairs (in the file): **391**
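
The pair counts above are plain Cartesian products of the node counts. A minimal sketch that recomputes them, assuming only the node totals listed in the summary (the names `node_counts`, `original_pairs` and `possible_pairs` are illustrative and not part of the original notebook):

```python
# Node counts per process, as listed in the summary of available data
node_counts = {"Proc82": 13, "Proc83": 23, "Proc84": 23, "Proc89": 17}

# Process pairs that appear in the original input file
original_pairs = [("Proc82", "Proc83"), ("Proc82", "Proc84"), ("Proc83", "Proc89")]

def possible_pairs(first, second):
    """Every node of `first` paired with every node of `second`."""
    return node_counts[first] * node_counts[second]

pair_counts = {pair: possible_pairs(*pair) for pair in original_pairs}
total_pairs = sum(pair_counts.values())

for (first, second), count in pair_counts.items():
    print(f"{first}-{second}: {count}")
print(f"Total: {total_pairs}")
```

The total matches the 989 original pairs reported in the summary.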

    
## Overlaps

(Case-insensitive. The 82-83 and 82-89 overlaps are identical.)

### Proc82 and Proc83

Total overlapping nodes: **6**

List of overlapping nodes:

    evaluate
    check documents
    documents received
    send documents by post
    send letter of rejection
    apply online

    
### Proc82 and Proc84

Total overlapping nodes: **0**

List of overlapping nodes: **N/A**

### Proc82 and Proc89

Total overlapping nodes: **6**

List of overlapping nodes:

    evaluate
    check documents
    documents received
    send documents by post
    send letter of rejection
    apply online
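
The case-insensitive overlap can be reproduced with a set intersection over normalized labels. The sketch below does this for Proc82 and Proc89, using the label lists from this document; the helper `normalized` is a hypothetical name, and lower-casing plus whitespace stripping is an assumption about how the comparison was originally done.

```python
# Labels copied from the Proc82 and Proc89 lists in this document
proc82_labels = [
    "Apply Online", "Send Online Protocol", "Evaluate", "Acceptance",
    "Pay for Aptitude test", "Send letter of rejection",
    "Rank Students according to GPA and the test results",
    "Check Documents", "Documents received", "Send Documents by Post",
    "Keep in the Applicant pool", "Take Aptitude test",
    "Invite to an aptitude test",
]
proc89_labels = [
    "documents received", "rejected", "send letter of rejection",
    "send letter of acceptance", "rejected", "check bachelor's degree",
    "bridging courses > 30 cp", "conduct interview", "check documents",
    "go to interview", "send interview invitation", "forward documents",
    "documents received", "evaluate", "bridging courses < 30 cp",
    "apply online", "send documents by post",
]

def normalized(labels):
    """Lower-case and strip labels so the comparison ignores case (assumed rule)."""
    return {label.strip().lower() for label in labels}

# Labels that occur in both processes once case is ignored
overlap = normalized(proc82_labels) & normalized(proc89_labels)

print(len(overlap))
print(sorted(overlap))
```

Using sets also collapses labels repeated within a process (such as `rejected` in Proc89), so each overlapping label is counted once.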

    
### Proc83 and Proc84

Total overlapping nodes: **0**

List of overlapping nodes: **N/A**

### Proc83 and Proc89

Total overlapping nodes: **8**

List of overlapping nodes:

    evaluate
    check documents
    documents received
    rejected
    send documents by post
    send letter of rejection
    send letter of acceptance
    apply online
    
### Proc84 and Proc89

Total overlapping nodes: **0**

List of overlapping nodes: **N/A**



## Crowdflower costs

References
- https://success.crowdflower.com/hc/en-us/articles/202703165-Get-Results-Job-Costs
- https://success.crowdflower.com/hc/en-us/articles/217741663-Guide-to-Pay-Page
- https://success.crowdflower.com/hc/en-us/articles/201855719-Guide-to-Basic-Job-Settings-Page

### Cost estimate

Crowdflower gives the formula:

    Estimated job cost = (Judgments per row * (Pages of work * Price per page)) + buffer + transaction fee

In practice, a 27.5% fee must be added to `Price per page`.

The `Transaction fee` is 20% of the total contributors cost, excluding the buffer.

Therefore:

    Contributors judgment cost = Judgments per row * Pages of work * Price per page * 1.275

    Estimated job cost = Contributors judgment cost * 1.20 + buffer

The value of `Pages of work` depends on whether test questions are used.

When test questions are in use, one is inserted on each page, so each page holds one fewer data row.

Therefore:

    Pages of work (no test) = ceiling(Total rows / Rows per page)

    Pages of work (test) = ceiling(Total rows / (Rows per page - 1))
    

In [2]:
import math

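def contributors_cost(judgments_per_row, pages_of_work, price_per_page, fee_per_page = 0.275):
    # Amount paid for the contributors' judgments, including the per-page fee
    return judgments_per_row * pages_of_work * price_per_page * (1 + fee_per_page)

def estimated_job_cost(contributors_cost, buffer = 0, cf_fee = 0.2):
    # The transaction fee applies to the contributors cost only, not to the buffer
    return contributors_cost * (1 + cf_fee) + buffer

def pages_of_work(tot_rows, rows_per_page, use_tests = False):
    # With test questions, one slot on every page is taken by a test question
    eff_rows_per_page = rows_per_page if not use_tests else rows_per_page - 1
    if eff_rows_per_page < 1:
        raise ValueError("rows_per_page leaves no room for data rows")
    return math.ceil(tot_rows / eff_rows_per_page)

def price_per_row(price_per_page, rows_per_page, use_tests = False):
    # Price of a single data row, excluding fees
    eff_rows_per_page = rows_per_page if not use_tests else rows_per_page - 1
    if eff_rows_per_page < 1:
        raise ValueError("rows_per_page leaves no room for data rows")
    return price_per_page / eff_rows_per_page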
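
All the estimates in this notebook are computed without test questions. As a rough illustration of how enabling them changes the page count, the self-contained sketch below applies the two `Pages of work` formulas to a hypothetical job of 100 rows with 5 rows per page. The names `pages_needed` and `job_cost` are illustrative stand-ins, and the fee percentages are the ones quoted in this section.

```python
import math

def pages_needed(total_rows, rows_per_page, use_tests=False):
    """Pages of work; with test questions one slot per page holds a test question."""
    data_rows_per_page = rows_per_page - 1 if use_tests else rows_per_page
    return math.ceil(total_rows / data_rows_per_page)

def job_cost(judgments_per_row, pages, price_per_page,
             page_fee=0.275, cf_fee=0.2, buffer=0):
    """Estimated job cost using the fee percentages quoted in this section."""
    contributors = judgments_per_row * pages * price_per_page * (1 + page_fee)
    return contributors * (1 + cf_fee) + buffer

pages_plain = pages_needed(100, 5)                  # ceil(100 / 5) = 20
pages_tests = pages_needed(100, 5, use_tests=True)  # ceil(100 / 4) = 25

print(pages_plain, job_cost(3, pages_plain, 0.1))
print(pages_tests, job_cost(3, pages_tests, 0.1))
```

With the same price per page, the extra pages needed for test questions raise the estimate proportionally (25 pages instead of 20 in this hypothetical case).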

In [3]:
# Formatting helpers
def fmtfl(fl):
    """
    Format a float to 2 decimal digits
    """
    return "{:.2f}".format(fl)

def fmtbold(s):
    """
    Make a string bold
    """
    return f"\033[1m{s}\033[0m"

In [4]:
def print_task_cost(task_name, judgments_per_row, rows, rows_per_page, price_per_page, 
                    use_tests = False, fee_per_page = 0.275, cf_fee = 0.2, buffer = 0):
    """
    Use the task settings to calculate and print the task costs.
    """
    _pow = pages_of_work(rows, rows_per_page, use_tests)
    
    print(f"{fmtbold(task_name)} costs with:")
    print(f"  {judgments_per_row} judgments per row")
    print(f"  {_pow} pages of work")
    print(f"    from {rows} rows and {rows_per_page} rows per page")
    if use_tests:
        print(f"    USING test questions (1 test question per page)")
    else:
        print(f"    no test questions used")
    
    _ppr = price_per_row(price_per_page, rows_per_page, use_tests)
    print(f"  ${price_per_page} price per page (${fmtfl(_ppr)} price per row)")
    
    if fee_per_page == 0.275 and cf_fee == 0.2 and buffer == 0:
        print(f"  Default fees and no buffer")
    else:
        print(f"  CUSTOM fees")
        print(f"    fee per page: {fee_per_page:.1%}")
        print(f"    crowdflower's fee: {cf_fee:.1%}")
        print(f"  BUFFER: ${buffer}")
    
    # Calculate and print costs
    cc = contributors_cost(judgments_per_row, _pow, price_per_page, fee_per_page)
    ejc = estimated_job_cost(cc, buffer, cf_fee)
    cc_out = f"Contributors cost:  ${fmtfl(cc)}"
    ejc_out = f"Estimated job cost: ${fmtfl(ejc)}"
    print(f"{fmtbold(cc_out)}")
    print(f"{fmtbold(ejc_out)}")

In [5]:
# Testing the output
# Not a real cost
"""
Example Task costs with:
  3 judgments per row
  40 pages of work
    from 200 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $15.30
Estimated job cost: $18.36
"""
print_task_cost("Example Task", judgments_per_row = 3, rows = 200, rows_per_page = 5, price_per_page = 0.1)

Example Task costs with:
  3 judgments per row
  40 pages of work
    from 200 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $15.30
Estimated job cost: $18.36


## Design 1 - Similarity Between Activities

This design is based on ContextOne, and the same 989 original pairs can be reused as input.

#### Design 1; TEST RUN; 100

Test run of Design 1 with 100 pairs.

In [6]:
print_task_cost("Design1-Test-100", judgments_per_row = 3, rows = 100, rows_per_page = 5, price_per_page = 0.1)

Design1-Test-100 costs with:
  3 judgments per row
  20 pages of work
    from 100 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $7.65
Estimated job cost: $9.18


#### Design 1; TEST RUN; 299

Test run of Design 1 with 299 pairs (the size of the 82-83 or 82-84 process pair).

In [7]:
print_task_cost("Design1-Test-299", judgments_per_row = 3, rows = 299, rows_per_page = 5, price_per_page = 0.1)

Design1-Test-299 costs with:
  3 judgments per row
  60 pages of work
    from 299 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $22.95
Estimated job cost: $27.54


#### Design 1; TEST RUN; 391

Test run of Design 1 with 391 pairs (the size of the 83-89 process pair).

In [8]:
print_task_cost("Design1-Test-391", judgments_per_row = 3, rows = 391, rows_per_page = 5, price_per_page = 0.1)

Design1-Test-391 costs with:
  3 judgments per row
  79 pages of work
    from 391 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $30.22
Estimated job cost: $36.26


#### Design 1; FULL RUN; 989

Full run of Design 1 with the 989 original pairs.

In [9]:
print_task_cost("Design1-Full-989", judgments_per_row = 3, rows = 989, rows_per_page = 5, price_per_page = 0.1)

Design1-Full-989 costs with:
  3 judgments per row
  198 pages of work
    from 989 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $75.73
Estimated job cost: $90.88


## Design 2 - Activity Enrichment Through Keywords

This design can take as input rows of the form

    | process_id | node_id | node_label | node_img |

extracted from the old data.

Each row refers to a node (activity) to which the workers must assign a list of keywords in the task.

There are 76 distinct nodes, each with its own image, so there will be 76 rows.

In [10]:
print_task_cost("Design2-Full-76", judgments_per_row = 3, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76 costs with:
  3 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $6.12
Estimated job cost: $7.34


The actual launch of the test with the following configuration produced the results below.

    print_task_cost("Design2-Full-76", 
                    judgments_per_row = 3, rows = 76, 
                    rows_per_page = 5, price_per_page = 0.1)
                    

Trusted judgments: **228** (judgments accepted and included in the output)

Untrusted judgments: **45** (judgments rejected while the experiment was running and removed from the output)

Total judgments: **273** (trusted + untrusted; this is the total number of paid judgments)

Cost (as reported by CF): **$6.84**
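
As a quick sanity check on these figures, the sketch below only recombines the numbers reported above with the estimate printed for `Design2-Full-76`. It does not attempt to reproduce how CF actually billed the job, and the variable names are illustrative.

```python
trusted_judgments = 228
untrusted_judgments = 45
reported_cost = 6.84        # USD, as reported by CF
estimated_job_cost = 7.34   # USD, estimate printed above for Design2-Full-76

total_judgments = trusted_judgments + untrusted_judgments

# 3 judgments per row * 76 rows should equal the number of trusted judgments
expected_trusted = 3 * 76

# Reported cost spread over the trusted judgments only
cost_per_trusted = reported_cost / trusted_judgments

# Gap between the estimate and the reported cost
difference = estimated_job_cost - reported_cost

print(total_judgments, expected_trusted)
print(f"{cost_per_trusted:.4f}")
print(f"{difference:.2f}")
```

The reported cost works out to $0.03 per trusted judgment and is $0.50 below the estimate.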

Since there are few rows, the required judgments per row could be increased to collect more keyword lists for each activity.

In [11]:
# 4 Judgments per row
print_task_cost("Design2-Full-76-4jdg", judgments_per_row = 4, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76-4jdg costs with:
  4 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $8.16
Estimated job cost: $9.79


In [12]:
# 5 Judgments per row
print_task_cost("Design2-Full-76-5jdg", judgments_per_row = 5, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76-5jdg costs with:
  5 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $10.20
Estimated job cost: $12.24


In [13]:
# 6 Judgments per row
print_task_cost("Design2-Full-76-6jdg", judgments_per_row = 6, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76-6jdg costs with:
  6 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $12.24
Estimated job cost: $14.69


In [14]:
# 7 Judgments per row
print_task_cost("Design2-Full-76-7jdg", judgments_per_row = 7, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76-7jdg costs with:
  7 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $14.28
Estimated job cost: $17.14


In [15]:
# 10 Judgments per row
print_task_cost("Design2-Full-76-10jdg", judgments_per_row = 10, rows = 76, rows_per_page = 5, price_per_page = 0.1)

Design2-Full-76-10jdg costs with:
  10 judgments per row
  16 pages of work
    from 76 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $20.40
Estimated job cost: $24.48


## Design 4 - Search Query Feedback

If node labels are used as queries, this design can take as input rows in the format

    | query_process_id | query_node_id | query_node_label | ...(continued below)...
    | result_proc_id | result_group_id | result_group_img | result_group_labels |

    query_process_id: id of the process the query is extracted from
    query_node_id: id of the node the query is extracted from
    query_node_label: the query is the label of a node
    result_proc_id: id of the process returned as the query result
    result_group_id: id of the group of nodes contained in the result
                     (nodes are no longer compared one at a time, but as a set)
    result_group_img: query result, i.e. the image of the process fragment containing the nodes
    result_group_labels: labels of the activities contained in the image that represents the query result
    
Grouping (subjectively) the nodes already present in the old input gives:

Proc 82 groups: **4**

Proc 83 groups: **6**

Proc 84 groups: **6**

Proc 89 groups: **5**

Considering the original pairs (82-83, 82-84, 83-89) gives the row counts below.

The order of the processes in a pair matters, because the labels are taken from the first process and the groups from the second.

**First case**

Rows 82-83: **78** (13 labels of 82 * 6 groups of 83)

Rows 82-84: **78** (13 * 6)

Rows 83-89: **115** (23 * 5)

Total rows, first case: **271** (78 + 78 + 115)

**Second case**

Rows 83-82: **92** (23 labels of 83 * 4 groups of 82)

Rows 84-82: **92** (23 * 4)

Rows 89-83: **102** (17 * 6)

Total rows, second case: **286** (92 + 92 + 102)

Total rows, first and second case: **557** (271 + 286)

The experiment uses all the rows of the first case (271) plus the rows of the 83-82 pair from the second case (92).

**Total experiment rows:** **363** (271 + 92)
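
The row counts above follow from multiplying the number of labels in the query process by the number of groups in the result process. A small sketch that recomputes them from the counts given in this document (the names `label_counts`, `group_counts` and `rows_for` are illustrative):

```python
# Labels (nodes) and subjectively defined groups per process, from this document
label_counts = {82: 13, 83: 23, 84: 23, 89: 17}
group_counts = {82: 4, 83: 6, 84: 6, 89: 5}

def rows_for(query_proc, result_proc):
    """Rows for one ordered pair: labels of the query process times groups of the result process."""
    return label_counts[query_proc] * group_counts[result_proc]

# First case: original pair order; second case: the same pairs reversed
first_case = [(82, 83), (82, 84), (83, 89)]
second_case = [(result, query) for query, result in first_case]

first_total = sum(rows_for(q, r) for q, r in first_case)
second_total = sum(rows_for(q, r) for q, r in second_case)

# Experiment: all first-case rows plus the 83-82 pair from the second case
experiment_rows = first_total + rows_for(83, 82)

print(first_total, second_total, first_total + second_total, experiment_rows)
```

Because the function takes an ordered pair, swapping the arguments shows directly why the order of the processes matters: `rows_for(82, 83)` and `rows_for(83, 82)` give different counts.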

In [16]:
print_task_cost("Design4-Test-100", judgments_per_row = 3, rows = 100, rows_per_page = 5, price_per_page = 0.1)

Design4-Test-100 costs with:
  3 judgments per row
  20 pages of work
    from 100 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $7.65
Estimated job cost: $9.18


In [47]:
print_task_cost("Design4-Test-35", judgments_per_row = 2, rows = 80, rows_per_page = 5, price_per_page = 0.1)

Design4-Test-35 costs with:
  2 judgments per row
  16 pages of work
    from 80 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $4.08
Estimated job cost: $4.90


In [16]:
print_task_cost("Design4-Test-100", judgments_per_row = 2, rows = 90, rows_per_page = 5, price_per_page = 0.1)

Design4-Test-100 costs with:
  2 judgments per row
  18 pages of work
    from 90 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $4.59
Estimated job cost: $5.51


In [17]:
print_task_cost("Design4-Full-271", judgments_per_row = 3, rows = 271, rows_per_page = 5, price_per_page = 0.1)

Design4-Full-271 costs with:
  3 judgments per row
  55 pages of work
    from 271 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $21.04
Estimated job cost: $25.24


In [18]:
print_task_cost("Design4-Full-286", judgments_per_row = 3, rows = 286, rows_per_page = 5, price_per_page = 0.1)

Design4-Full-286 costs with:
  3 judgments per row
  58 pages of work
    from 286 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $22.19
Estimated job cost: $26.62


In [19]:
print_task_cost("Design4-Full-271+286", judgments_per_row = 3, rows = 271+286, rows_per_page = 5, price_per_page = 0.1)

Design4-Full-271+286 costs with:
  3 judgments per row
  112 pages of work
    from 557 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $42.84
Estimated job cost: $51.41


In [20]:
print_task_cost("Design4-Full-Final-271+92", judgments_per_row = 3, rows = 271+92, rows_per_page = 5, price_per_page = 0.1)

Design4-Full-Final-271+92 costs with:
  3 judgments per row
  73 pages of work
    from 363 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $27.92
Estimated job cost: $33.51


## Design 7 - One To Many With Labels

This design asks the worker to assess the similarities between two process fragments and then, given an activity contained in the first fragment, to decide which activities of the second fragment are a match and which are part-of, respectively.

In practice, whereas Design 1 uses a 1:1 approach, comparing every possible pair of activities, this design uses a 1:n approach, comparing one activity with the other activities contained in a process fragment.
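
To make the difference between the two approaches concrete, the sketch below compares the number of rows each one needs for the original process pairs. It relies on the node and group counts reported earlier in this document; the function names are illustrative.

```python
# Counts taken from this document
label_counts = {82: 13, 83: 23, 84: 23, 89: 17}   # activities per process
group_counts = {82: 4, 83: 6, 84: 6, 89: 5}       # fragment groups per process

def rows_one_to_one(first, second):
    """1:1 approach: every activity of `first` against every activity of `second`."""
    return label_counts[first] * label_counts[second]

def rows_one_to_many(first, second):
    """1:n approach: every activity of `first` against every group of `second`."""
    return label_counts[first] * group_counts[second]

for first, second in [(82, 83), (82, 84), (83, 89)]:
    pairs = rows_one_to_one(first, second)
    rows = rows_one_to_many(first, second)
    print(f"{first}-{second}: {pairs} rows (1:1) vs {rows} rows (1:n)")
```

Fewer rows mean fewer pages of work, at the price of asking the worker to judge several activities within a single row.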

The content of a row is the same as in Design 4.

Both the number of rows and the costs are the same as for Design 4.

In [21]:
print_task_cost("Design7-Test-100", judgments_per_row = 3, rows = 100, rows_per_page = 5, price_per_page = 0.1)

Design7-Test-100 costs with:
  3 judgments per row
  20 pages of work
    from 100 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $7.65
Estimated job cost: $9.18


In [22]:
print_task_cost("Design7-Full-271", judgments_per_row = 3, rows = 271, rows_per_page = 5, price_per_page = 0.1)

Design7-Full-271 costs with:
  3 judgments per row
  55 pages of work
    from 271 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $21.04
Estimated job cost: $25.24


In [23]:
print_task_cost("Design7-Full-271+92", judgments_per_row = 3, rows = 271+92, rows_per_page = 5, price_per_page = 0.1)

Design7-Full-271+92 costs with:
  3 judgments per row
  73 pages of work
    from 363 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $27.92
Estimated job cost: $33.51


## Design 8 - Search With Context

This design is a middle ground between Design 4 and Design 7.

In addition to the plain text query, the context (the image of the process fragment) of the activity from which the query is extracted is also shown.

At the same time, the number and type of questions are in line with those in Design 4.

The content of a row is the same as in Design 4.

Both the number of rows and the costs are the same as for Design 4.

In [24]:
print_task_cost("Design8-Test-100", judgments_per_row = 3, rows = 100, rows_per_page = 5, price_per_page = 0.1)

Design8-Test-100 costs with:
  3 judgments per row
  20 pages of work
    from 100 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $7.65
Estimated job cost: $9.18


In [25]:
print_task_cost("Design8-Full-271+92", judgments_per_row = 3, rows = 271+92, rows_per_page = 5, price_per_page = 0.1)

Design8-Full-271+92 costs with:
  3 judgments per row
  73 pages of work
    from 363 rows and 5 rows per page
    no test questions used
  $0.1 price per page ($0.02 price per row)
  Default fees and no buffer
Contributors cost:  $27.92
Estimated job cost: $33.51
