## Correlation Game

The following code generates random dots in a square.

Partners take turns filling in 'X' and 'Y' in the second cell ('Code Cell 1').

Your job is to guess how strongly the data are correlated, as measured by Pearson's correlation coefficient.

In each iteration:

1. Fill in 'X' and 'Y' in 'Code Cell 1'.
2. Run 'Code Cell 1' and observe the plot.
3. Each player writes down a guess for Pearson's correlation coefficient.
4. Run 'Code Cell 2' to get the actual value and write this down.
5. Compare the difference between each guess and the actual value. The closest player gets a point.
6. Repeat steps 1-5 to play a best-of-five match.
7. The partner with the most points wins.
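
Step 5 can be sketched in a few lines of plain Python. This is only an illustration: the helper name `closest_players` and the sample numbers are made up for the example and are not part of the notebook.

```python
def closest_players(guesses, actual):
    """Return the 1-based numbers of the players whose guess is nearest to `actual`."""
    # Absolute difference between each guess and the true coefficient.
    differences = [abs(guess - actual) for guess in guesses]
    best = min(differences)
    # More than one player is returned if there is a tie.
    return [i + 1 for i, diff in enumerate(differences) if diff == best]


# Hypothetical round: player 1 guessed 0.5, player 2 guessed 0.7, actual value 0.62.
round_winners = closest_players([0.5, 0.7], 0.62)
print(round_winners)
```

Here player 2 is closer, so player 2 would score the point for this round.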

Code Cell 0:

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
import scipy as sp

import random as rn

# Import from scipy.stats directly; the scipy.stats.stats path is deprecated.
from scipy.stats import pearsonr
from scipy.stats import spearmanr

# This line makes the plots display directly in the notebook, rather than in a separate window
%matplotlib inline

Code Cell 1:

In [None]:
# Create an empty DataFrame to hold the X and Y coordinates.
number_grid = pd.DataFrame()

# RANDOMISING THE COORDINATES:

# How many coordinates shall we generate...
# ... and what should the max and min values be?

number_of_coords = 5
min_val = 0
max_val = 100

# Generate the values using list comprehensions.
# rn.random() returns a float in the range [0, 1), which is scaled to [min_val, max_val).
# You could also use rn.randint, as we saw last week.

x_vals = [min_val + (max_val - min_val)*rn.random() for i in range(number_of_coords)]
y_vals = [min_val + (max_val - min_val)*rn.random() for i in range(number_of_coords)]

number_grid['X'] = x_vals
number_grid['Y'] = y_vals

number_grid.plot(x='X',y='Y',kind='scatter')

In [None]:
# STORING GUESSES:

number_of_players = 2 # This code would work for any number of players.

guesses_DF = pd.DataFrame()

player_guesses = [0.5, 0.7] # Replace these with your own guesses (one per player).

guesses_DF['Player'] = range(1,number_of_players + 1)
guesses_DF['Guess'] = player_guesses

guesses_DF

Code Cell 2:

In [None]:
Y = number_grid['Y']
X = number_grid['X']

# The following function computes the best fitting regression line and basic statistics.
slope, intercept, r_value, p_value, std_err = sp.stats.linregress(X, Y) 

plt.plot(X,Y,'b.') # Plot the data.
plt.plot(X, X*slope + intercept, 'r') # Plot the regression line. 

plt.xlim([min(X)-5,max(X)+5]) 
plt.ylim([min(Y)-5,max(Y)+5]) 

# SWITCH TO SPEARMAN:

# We will need to store the correlation to check who has won:
corr_coeff = spearmanr(X,Y)[0]

print("Spearman correlation:", corr_coeff)


In [None]:
# CALCULATING THE DIFFERENCES

# Add a column reporting the differences between guess and solution.

guesses_DF['Difference'] = abs(guesses_DF['Guess'] - corr_coeff)

# You could even sort the dataframe from best guess to worst:

guesses_DF.sort_values(by='Difference')


## Exercise 2.1

Adapt the code so that:

a) The coordinates are randomised.

b) Spearman's rank correlation coefficient is used instead of the Pearson coefficient.
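
To see what changes in part b), it may help to recall that Spearman's coefficient is Pearson's coefficient applied to the ranks of the data rather than to the raw values. The sketch below uses only the standard library; the helper names `pearson`, `ranks` and `spearman` are illustrative, and the ranking assumes there are no tied values (which `scipy.stats.spearmanr` would handle with averaged ranks).

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson's correlation coefficient, computed from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank 1 for the smallest value, 2 for the next, and so on (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # y = x squared: always increasing, but not a straight line

print(pearson(xs, ys))   # a little below 1
print(spearman(xs, ys))  # 1.0, because the ranks agree perfectly
```

The two coefficients differ whenever the relationship is monotonic but not linear, which is worth bearing in mind when guessing from a scatter plot.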

## Exercise 2.2

Adapt the code so that:

a) You can type in your guesses to be stored by the program.

b) The differences between each player's guess and the solution are calculated and displayed.
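
One possible approach to part a) is to read the guesses as a single comma-separated string and convert it to numbers. The helper `parse_guesses` below is hypothetical, and the string is hard-coded so the sketch runs on its own; in the notebook it could instead come from `input()` (or `raw_input()` on Python 2).

```python
def parse_guesses(text):
    """Turn a comma-separated string such as '0.5, -0.2' into a list of floats."""
    return [float(part) for part in text.split(',')]


# In the notebook this string could be typed in, for example:
# typed = input("Enter the guesses, separated by commas: ")
typed = "0.5, -0.2"

player_guesses = parse_guesses(typed)
print(player_guesses)  # [0.5, -0.2]
```

The resulting list can be assigned to the 'Guess' column in the same way as the hard-coded `player_guesses`.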

___

HINT:

The function abs( ) returns the absolute value of a number.

That is, it gives the size of a number, ignoring its sign.

For example, abs(7) = 7 and abs(-7) = 7.

abs( ) can be useful for getting the positive difference between two numbers.
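
A small illustration with made-up values for the guess and the actual coefficient:

```python
guess = 0.5
actual = -0.25

# The order of subtraction does not matter once abs( ) is applied.
difference = abs(guess - actual)
print(difference)            # 0.75
print(abs(actual - guess))   # 0.75
```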

___
