In [77]:
# Initialize Otter
import otter
grader = otter.Notebook("Lab_2_functions.ipynb")

# Lab 2: Statistical analysis of data using numpy

Lab slides: https://docs.google.com/presentation/d/1ykwwcQ0onMvAjUxfJmKl9tbo-rJPdB5pRwDEmpDsd-g/edit?usp=sharing

For this lab the goal is to write functions to pull out one data channel and print out statistics for failed versus successful picks. This will involve using your function from the lecture activity to get just the data you want, then another function to do the statistics (essentially the **calc_stats** function from the lecture activity). 

Written properly, you only need one function to do stats for the entire data channel, just the succesful ones, or just the unsuccessful ones. For any data channel. In the homework you'll use these functions to do this for all of the data and write it back out.

In [78]:
# Libraries that we need to import - numpy and json (for loading the description file)
import numpy as np
import json as json

### Reading in data

TODO Copy over code to do the following:
- read in the numerical data and pull out the numerical data and the successful/unsuccessful data
- create a boolean index variable for the successful picks

Note: For this lab I'm going to "hard-wire" all of the numbers (number of time steps, number of data dimensions, etc) to make testing easier. When you move this code over to the homework you'll replace all of the hard-wired numbers with the variables you're calculating in homework 1.


In [79]:
all_data = np.loadtxt('Data/proxy_pick_data.csv',dtype=float,delimiter=",")
pick_data = np.loadtxt('Data/proxy_pick_data.csv',dtype=float,delimiter=",")
pick_channel_data =pick_data[:,:-1]
pick_successful = pick_data[:,-1]
b_successful = np.array(pick_successful,dtype = bool)

# Hard-wiring these values for the testing code
n_timesteps = 40
n_picks = 660
n_total_dims = 33


In [80]:
grader.check("get_data")

## Doing the slice

Get the data for one of the channels. 

TODO: Copy over your function **get_channel_data** from lecture activity 2. Note: if your code did not handle doing 1, 2, or 3 dimensions, now you'll need it to. 


In [81]:
# This reads in the json data
try:
    with open("Data/proxy_data_description.json", "r") as fp:
        pick_data_description = json.load(fp)
except FileNotFoundError:
    print(f"The file was not found; check that the data directory is in the current one and the file is in that directory")


In [82]:
# TODO: Copy get_channel_data over to here
def get_channel_data(all_data, n_picks, start_index, n_time_steps, n_total_dims, n_dims):
    """ Get the data for just one channel (eg, wrist torque)
    @param all_data - the pick_channel_data numpy array
    @param n_picks - number of picks (number of rows in all_data)
    @param start_index - where to start getting data from 
    @param n_time_steps - number of time steps
    @param n_total_dims - what the skip value is - the total number of channels
    @param n_dims - total number of dimensions to use (1, 2, or 3)
    @return Return array should be n_picks X (n_timesteps * n_dims)"""

    # TODO Your slice code goes here. Note that I kept most of the variable names the same, so you should only have
    #  to change the wrist torque specific ones
   
    channel_data = np.zeros((n_picks, n_time_steps * n_dims))
    channel_data[0:n_picks,0:n_time_steps] = pick_channel_data[:, start_index::n_total_dims]
    if n_dims >= 2:
        channel_data[:, 1::3] = pick_channel_data[:, start_index+1::n_total_dims]
        channel_data[:, 2::3] = pick_channel_data[:, start_index+2::n_total_dims]

# this should be changed to a for loop but is working currently

    return channel_data

In [83]:
# Test 1 - the wrist torque data using hard-wired values
wrist_torque_start_index = 3
n_dims_wrist_torque = 3

wrist_torque_data = get_channel_data(pick_channel_data, 
                                     n_picks=n_picks, 
                                     start_index=wrist_torque_start_index,
                                     n_time_steps=n_timesteps,
                                     n_total_dims=n_total_dims,
                                     n_dims=n_dims_wrist_torque)

In [84]:
# SELF TESTS
# Feel free to copy over the asserts from lecture activity 2 to debug the above

In [85]:
# Tests for Motor effort finger 1
motor_effort_f1_start_index = 14
n_dims_motor_effort_f1 = 1
motor_effort_f1_data = get_channel_data(pick_channel_data, 
                                        n_picks=n_picks, 
                                        start_index=motor_effort_f1_start_index,
                                        n_time_steps=n_timesteps,
                                        n_total_dims=n_total_dims,
                                        n_dims=n_dims_motor_effort_f1)

In [86]:
# Check size and first, last element
assert(motor_effort_f1_data.shape == (n_picks, n_timesteps * n_dims_motor_effort_f1))
assert(np.isclose(motor_effort_f1_data[0, 0], 0.0))
assert(np.isclose(motor_effort_f1_data[-1, -1], 51.11))

In [87]:
print(pick_channel_data.shape[0])
grader.check("check_slice")

660


## Compute stats: Write a function to calculate the four stats

This is a variation on what you did in lab 1; in this case, we're going to do it with two functions. The first calculates the stats and returns the dictionary (**calc_stats**) the second does the **for** loop to make one dictionary for each dimension in the data.

- Step 1 [this problem] - do the **calc_stats** function
- Step 2 [next problem] - do the loop to calculate the stats for each x,y,z channel

In [88]:
def calc_stats(data):
    """Calculate min, max, mean and standard deviation for the array and put in a dictionary
    @param data a numpy array
    @return a dictionary"""

    # Use keys Min, Max, Mean, and SD
    my_dict = {"Min" : np.min(data),
              "Max" : np.max(data),
              "Mean" : np.mean(data), 
              "SD": np.std(data) }
    return my_dict

In [89]:
# Test the function with known data
test_data = np.linspace(0, 1, 10)
ret_dict = calc_stats(test_data)

assert(np.isclose(ret_dict["Min"], 0.0))
assert(np.isclose(ret_dict["Max"], 1.0))
assert(np.isclose(ret_dict["Mean"], 0.5))
assert(np.isclose(ret_dict["SD"], 0.319, atol=0.01))

In [90]:
grader.check("stats_channel")

### Now do the second half - 

This function calculates the stats for an entire channel of the data, and stores the result in a list of dictionaries

In [91]:
def calc_stats_for_channel(data, n_dims):
    """ Calculate the stats for a channel
    @param data - an n_picks X n_timesteps * n_dims size rray
    @param n_dims - 1, 2, or 3 (just x, or x,y, and z)
    @return A list of dictionaries. The list is the lenght of n_dims"""

    stats_list = []
    # TODO Copy in your for loop from the statistics problem in Lab 1
    # - You do NOT need to get the data out from pick data - it's done for you
    # - You DO need to slice the data into the x,y,z channels
    # - You need to loop n_dims times
    # - Don't forget to return the array
    x_slice = data[:, 0::n_dims]
    if n_dims >= 2:
        y_slice = data[:, 1::n_dims]
        z_slice = data[:, 2::n_dims]

        print(x_slice.shape)

    if n_dims == 1:
        all_slices = np.array([x_slice])
    else:
        all_slices = np.array([x_slice,y_slice,z_slice])
    print(all_slices.shape)

    for i in range(all_slices.shape[0]):
        my_dict = {"Min" : np.min(all_slices[i,:,:]),
                "Max" : np.max(all_slices[i,:,:]),
                "Mean" : np.mean(all_slices[i,:,:]),
                "SD": np.std(all_slices[i,:,:]) }
        stats_list.append(my_dict)
    print(stats_list)
    return stats_list



In [92]:
# SCRATCH CELL
# If you're having trouble, try setting n_dims to 1 and use test_data for the data input

In [93]:
# Testing with known data - make a fake data set with 5 picks, 4 time steps, and x, y, z data
#  
test_stats = np.zeros((5, 4 * 3))
# Set the x data to be ones
test_stats[:, 0::3] = np.ones((5, 4))
# Set the y data to be twos
test_stats[:, 1::3] = np.ones((5, 4)) * 2
# Set the z data to be threes
test_stats[:, 2::3] = np.ones((5, 4)) * 3

# Now get the actual stats
ret_stats_array = calc_stats_for_channel(test_stats, n_dims=3)

# Check the mean result for x, y, and z - should be 1, 2, and 3 respectively
assert(ret_stats_array[0]["Mean"] == 1.0)
assert(ret_stats_array[1]["Mean"] == 2.0)
assert(ret_stats_array[2]["Mean"] == 3.0)

(5, 4)
(3, 5, 4)
[{'Min': 1.0, 'Max': 1.0, 'Mean': 1.0, 'SD': 0.0}, {'Min': 2.0, 'Max': 2.0, 'Mean': 2.0, 'SD': 0.0}, {'Min': 3.0, 'Max': 3.0, 'Mean': 3.0, 'SD': 0.0}]


In [94]:
# this should work - you can check the result against the values in Data/HW1_check_results.json
ret_stats_wrist_torque = calc_stats_for_channel(wrist_torque_data, n_dims_wrist_torque)

(660, 40)
(3, 660, 40)
[{'Min': -0.995878292, 'Max': 1.070451089, 'Mean': -0.08034154168640152, 'SD': 0.16572799217195466}, {'Min': -1.24642742, 'Max': 0.607428456, 'Mean': -0.08235645442053031, 'SD': 0.1825844421076967}, {'Min': -0.62552044, 'Max': 0.340460618, 'Mean': 0.01800652827314394, 'SD': 0.12296693121622884}]


In [95]:
# As should this
res_stats_motor_effort_f1 = calc_stats_for_channel(motor_effort_f1_data, n_dims_motor_effort_f1)

(1, 660, 40)
[{'Min': -330.8699951, 'Max': 174.8500061, 'Mean': 35.32253659580265, 'SD': 33.617106097552536}]


In [96]:
grader.check("loop_data_calc_stats")

## Boolean slicing to get successful versus unsuccessful statistics out

Use the functions you just wrote - plus the boolean index you made at the beginning - to get out the min and max z values for successful versus unsuccessful picks. 

For this problem I have written code that is *incorrect*. You know the functions themselves are correct - you just tested them. The following bits of code have something wrong with either the way the function is called OR with the way the results are gotten back.


In [97]:
# Use b_successful to pick out the rows that are successful. Send all column data for the selected rows.
#   Wrist torque data has 3 dimensions (x,y,z)
#   There's two errors here - one that actually will create incorrect results, one that just *happens* to work
#   correctly, although it doesn't do what the first sentance says...
ret_wrist_torque_successful = calc_stats_for_channel(wrist_torque_data[b_successful,:], n_dims=3)

print(wrist_torque_data.shape)
print(b_successful.shape)

# The minimum should be in the third (last) element in the list, the "min" key
z_min_successful = ret_wrist_torque_successful[2]["Min"]
z_max_successful = ret_wrist_torque_successful[2]["Max"]

# Use b_successful NOT true to pick out the picks that are successful.
#  This generates a weird error - it's because not does not work over a numpy array. Try b_successful == False instead
ret_wrist_torque_unsuccessful = calc_stats_for_channel(wrist_torque_data[b_successful==False,:], n_dims=3)

# The minimum should be in the third (last) element in the list, the "min" key
z_min_unsuccessful = ret_wrist_torque_unsuccessful[2]["Min"]
# Why copying and pasting and changing variable names can cause problems...
z_max_unsuccessful = ret_wrist_torque_unsuccessful[2]["Max"]



print(f"Successful: Minimum {ret_wrist_torque_successful} and maximum {z_max_successful} value of wrist torque z channel")
print(f"Unsuccessful: Minimum {z_min_unsuccessful} and maximum {z_max_unsuccessful} value of wrist torque z channel")

(355, 40)
(3, 355, 40)
[{'Min': -0.995878292, 'Max': 0.632657241, 'Mean': -0.07780834972429576, 'SD': 0.15833816521660998}, {'Min': -1.24642742, 'Max': 0.607428456, 'Mean': -0.0796942166728169, 'SD': 0.1669326700600358}, {'Min': -0.293665094, 'Max': 0.340460618, 'Mean': 0.010519443599788734, 'SD': 0.12105133264820692}]
(660, 120)
(660,)
(305, 40)
(3, 305, 40)
[{'Min': -0.780959042, 'Max': 1.070451089, 'Mean': -0.08329001101934426, 'SD': 0.17388785665493053}, {'Min': -0.701354968, 'Max': 0.506712789, 'Mean': -0.08545512458590164, 'SD': 0.19921496676357384}, {'Min': -0.62552044, 'Max': 0.326538637, 'Mean': 0.026721003876557375, 'SD': 0.12459433689364202}]
Successful: Minimum [{'Min': -0.995878292, 'Max': 0.632657241, 'Mean': -0.07780834972429576, 'SD': 0.15833816521660998}, {'Min': -1.24642742, 'Max': 0.607428456, 'Mean': -0.0796942166728169, 'SD': 0.1669326700600358}, {'Min': -0.293665094, 'Max': 0.340460618, 'Mean': 0.010519443599788734, 'SD': 0.12105133264820692}] and maximum 0.340460

In [98]:
grader.check("boolean_slicing")

## Optional/Extra credit: print out all of the indices where the maximum value for the successful pick was reached

See the tutorial on **np.where**

TODO: Use **np.where** to pick out the row, col pair that has the maximum z value of the successful pick

Partial credit for picking out any row, col that has the maximum z value in **wrist_torque_data**, full extra credit for only printing out the row, col indices of successful picks with that *z* value.

In [99]:
# Use np.where to get out the indices. You can use == OR np.isclose() here; either works. In general, use .isclose for 
#  floating point comparisons.
# Append the row number of any matches to this list
all_rows_with_max = []


# Look at JUST the z values in wrist_torque_data

for c in range (len(ret_wrist_torque_successful.columns)):
        for r in range (len(ret_wrist_torque_successful)):
            if ret_wrist_torque_successful[r,c]==ret_wrist_torque_successful[2]["Max"]:
                print(f"row: {r}, time step:{c}")
# Pseudo code - see tutorial for exact format
# for all row, column in all_indices_from_where
#.   if this is row is successful 
#.      print(f"Row: {r}, Time step: {c}")



AttributeError: 'list' object has no attribute 'columns'

In [None]:
grader.check("optional_where")

## Hours and collaborators
Required for every assignment - fill out before you hand-in.

Listing names and websites helps you to document who you worked with and what internet help you received in the case of any plagiarism issues. You should list names of anyone (in class or not) who has substantially helped you with an assignment - or anyone you have *helped*. You do not need to list TAs.

Listing hours helps us track if the assignments are too long.

In [100]:

# List of names (creates a set)
worked_with_names = {"none"}
# List of URLS TCW3 (creates a set)
websites = {"googled 'how to apply a 1d boolean to a 2d array'"}
# Approximate number of hours, including lab/in-class time
hours = 1.5

In [101]:
grader.check("hours_collaborators")

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Submit just the .ipynb file to Gradescope (Lab 2 functions). You do not need to submit the data files. Don't change the provided variable names or autograding will fail. Look at the Gradescope grading rubric for code-quality checks.

In [102]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False, run_tests=True)

Running your submission against local test cases...




RuntimeError: c:\Users\yeasshhhh\anaconda3\Lib\site-packages\zmq\_future.py:679: RuntimeWarning: Proactor event loop does not implement add_reader family of methods required for zmq. Registering an additional selector thread for add_reader support via tornado. Use `asyncio.set_event_loop_policy(WindowsSelectorEventLoopPolicy())` to avoid this warning.
  self._get_loop()
