<a href="https://colab.research.google.com/github/tmckim/materials-sp24-colab/blob/main/lec_demos/lec05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Before you start - Save this notebook!

When you open a new Colab notebook from WebCampus (as you hopefully did for this one), you cannot save changes. So it's best to store a copy of the notebook in your personal Drive with `"File > Save a copy in drive..."` **before** you do anything else.

The file will open in a new tab in your web browser and is automatically named something like "**Copy of lec05.ipynb**". You can rename it to just the title of the assignment, "**lec05.ipynb**". Make sure you keep an informative name (like the name of the assignment) so that you know which files to submit back to WebCampus for grading! More instructions on this are at the end of the notebook.


**Where does the notebook get saved in Google Drive?**

By default, the notebook will be copied to a folder called “Colab Notebooks” at the root (home directory) of your Google Drive. If you use this for other courses or personal code notebooks, I recommend creating a folder for this course and then moving the assignments AFTER you have completed them. <br>

I also recommend giving the folder where you save your notebooks a different name than the folder we create below, which stores the resources you need each time you work through a course notebook: any data files you will need, links to the images that appear in the notebook, and the files associated with the autograder for answer checking.<br>
You should select a name other than '**NS499-DataSci-course-materials**'. <br>
This folder gets overwritten with each assignment you work on in the course, so you should **NOT** store your notebooks in this folder that we use for course materials! <br><br>For example, you could create a folder called 'NS499-**notebooks**' or something along those lines. 

__________

### We will now do the setup steps as separate cells to help with issues finding files in Google Drive/Colab. <br> If you restart Colab, you must rerun all **5** steps in the cells below!

In [None]:
# Step 1
# Setup and add files needed to access gdrive
from google.colab import drive                                   # these lines mount your gdrive to access the files we import below
drive.mount('/content/gdrive', force_remount=True)

In [None]:
# Step 2
# Change directory to the correct location in gdrive (modified way to do this from before)
import os
os.chdir('/content/gdrive/MyDrive/NS499-DataSci-course-materials/')

In [None]:
# Step 3
# Remove the files that were previously there- we will replace with all the old + new ones for this assignment
!rm -r materials-sp24-colab                                        

In [None]:
# Step 4
# These lines clone (copy) all the files you will need from where I store the code+data for the course (github)
# Second part of the code copies the files to this location and folder in your own gdrive
!git clone https://github.com/tmckim/materials-sp24-colab '/content/gdrive/My Drive/NS499-DataSci-course-materials/materials-sp24-colab/'

In [None]:
# Step 5
# Change directory into the folder where the resources for this assignment are stored in gdrive (modified way from before)
os.chdir('/content/gdrive/MyDrive/NS499-DataSci-course-materials/materials-sp24-colab/lec_demos/')

In [None]:
# Import packages and other things needed
# Don't change this cell; Just run this cell
# If you restart colab, make sure to run this cell again after the first ones above^

from datascience import *
import numpy as np

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use("fivethirtyeight")

#### Today's Lecture

In today's lecture, you'll learn how to:

1. review and work with arrays
2. access elements in arrays
3. use `np.arange`
4. work with data from tables
5. create tables from scratch (compared to loading in tables from csv files)

## Arrays ##

In [None]:
# An array of 4 numbers
first_four = make_array(1, 2, 3, 4)

In [None]:
# Show the array
first_four

In [None]:
# Add to the array
first_four + 1

In [None]:
# The original array is unchanged, just like when we call show/select/drop on a Table
first_four
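
As the comment above notes, `first_four + 1` builds a new array rather than changing `first_four`. Here is a minimal sketch of keeping that result by assigning it to a name. It assumes `np.array` as a stand-in for `make_array` so the block runs without the `datascience` package, and `plus_one` is a made-up name for illustration.

```python
import numpy as np

# np.array stands in for make_array (assumption: only so this runs on its own)
first_four = np.array([1, 2, 3, 4])

# Arithmetic builds a new array; assign it to a name to keep it
plus_one = first_four + 1   # plus_one is a hypothetical name

print(first_four)  # [1 2 3 4]  (unchanged)
print(plus_one)    # [2 3 4 5]
```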

In [None]:
# A new array
next_four = make_array(5, 6, 7, 8)

In [None]:
# A third array
only_three = make_array(5, 6, 7)

In [None]:
# This line will cause an error - why?
first_four + only_three

In [None]:
# Show this array again
next_four

In [None]:
# Adding the array of numbers
sum(next_four)

In [None]:
# Another way to do this
next_four.sum()

In [None]:
# Using numpy to average the array
np.average(next_four)

In [None]:
# Using numpy to take the mean of the array
np.mean(next_four)
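
For a plain array like this one, `np.average` and `np.mean` return the same number: the sum divided by the length. A small sketch that checks this by hand, assuming `np.array` in place of `make_array` so it runs on its own (`by_hand` is a made-up name):

```python
import numpy as np

# np.array stands in for make_array (assumption: only so this runs on its own)
next_four = np.array([5, 6, 7, 8])

# The mean computed by hand: total divided by number of elements
by_hand = sum(next_four) / len(next_four)   # by_hand is a hypothetical name

print(by_hand)                            # 6.5
print(np.mean(next_four) == by_hand)      # True
print(np.average(next_four) == by_hand)   # True
```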

In [None]:
# Use the len function to determine the length
len(next_four)

In [None]:
# Arrays also have a member variable array_name.size that contains the size of the array
# Note, size here isn't a function so it doesn't require () after it
next_four.size

### Accessing Elements

In [None]:
# Show array again as reminder
next_four

In [None]:
# How to index an array
next_four.item(0)

In [None]:
# What item are we indexing here?
next_four.item(1)

**Bonus!** This is called **array indexing**. There is a shorter "equivalent" syntax that people often use. However, for this class you only need to know about `.item()`.<br><br>
`array[ INDEX ]`

In [None]:
# Shorter and more compact
next_four[1]
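
A short side-by-side sketch of the two ways to access an element, assuming `np.array` in place of `make_array` so it runs on its own (the variable names are made up for illustration). Both give the same value; one difference is that `.item()` hands back a plain Python number, while square brackets hand back a NumPy number.

```python
import numpy as np

# np.array stands in for make_array (assumption: only so this runs on its own)
next_four = np.array([5, 6, 7, 8])

# Index 1 is the SECOND element, because counting starts at 0
second_by_item = next_four.item(1)
second_by_bracket = next_four[1]

print(second_by_item, second_by_bracket)   # 6 6
print(type(second_by_item))                # <class 'int'>
print(second_by_item == second_by_bracket) # True
```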

## Ranges ##

We use ranges to make arrays of number sequences easily. The numpy `np.arange(start, stop, step)` function produces an array starting at `start` and ending *before* `stop`, in increments of `step`.

In [None]:
# how we previously learned to make an array
make_array(0, 1, 2, 3, 4, 5, 6)

In [None]:
# Use numpy to make a range- what numbers will we get?
np.arange(0,7,1)

In [None]:
# Can we write it shorter? Default setting for step size
np.arange(0,7)

In [None]:
# Even shorter- assumes we start from 0
np.arange(7)

In [None]:
# Anything else you want me to try?
np.arange(0, 20, 2)
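
Two more cases of the `start`, `stop`, `step` rule from the top of this section, as a quick sketch (the variable names are made up for illustration). In both, `stop` itself is left out of the result.

```python
import numpy as np

# Count by threes: 2, 5, 8. The next value would be 11, but stop is excluded
by_threes = np.arange(2, 11, 3)
print(by_threes)    # [2 5 8]

# A negative step counts down; stop (0) is still excluded
countdown = np.arange(5, 0, -1)
print(countdown)    # [5 4 3 2 1]

# The even numbers from 0 up to (not including) 20: ten values
print(len(np.arange(0, 20, 2)))   # 10
```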

## Columns of Tables are Arrays ##

### A more interesting table

In [None]:
# Originally from https://github.com/erikgregorywebb/datasets/blob/master/nba-salaries.csv
# salary is in millions of dollars
# Create a table named nba from the data file using Table.read_table
nba = Table.read_table('nba_salaries.csv')
nba

In [None]:
# Create a table called point_guards by finding where the position column is labeled PG (point guard) during the 2020 season
point_guards = nba.where('position', 'PG').where('season', 2020)

In [None]:
# Show the table
point_guards

In [None]:
# Drop (remove) the position column
point_guards.drop('position')

In [None]:
# Show the table
point_guards

In [None]:
# Further remove columns from the table
point_guards = point_guards.drop('rank', 'position', 'season')

In [None]:
# Show the first 10 rows of the table
point_guards.show(10)

In [None]:
# Combine table functions: first sort by salary and then show the first 10 rows
# What is the default sorting direction? Hint: what is the default for the optional argument, descending?
point_guards.sort('salary').show(10)

In [None]:
# The line below will cause a ValueError: the order of combining methods matters!
nba.drop('position').where('position', 'PG')

In [None]:
# Create a table of only the data from the 2020 season
nba = Table.read_table('nba_salaries.csv').where('season', 2020).drop('season')
nba.show(5)

In [None]:
# Create a table of Golden State Warriors data
warriors = nba.where('team', 'Golden State Warriors')
warriors.show(5)

In [None]:
# Show the salary column
warriors.select('salary')

In [None]:
# What type is the salary column?
type(warriors.select('salary'))

In [None]:
# Access the numbers in the salary column
warriors.column('salary')

In [None]:
# Average the numbers in the salary column
np.average(warriors.column('salary'))

In [None]:
# Another way to do this
warriors.column('salary').mean()

In [None]:
# Why doesn't this work?
np.average(warriors.select('salary'))

In [None]:
# Create a table with Phoenix Suns data
suns = nba.where('team', 'Phoenix Suns')

In [None]:
# Subtract averages from the salary columns to compare the two teams
np.average(warriors.column('salary')) - np.average(suns.column('salary'))

In [None]:
# How many rows in the warriors table?
warriors.num_rows

In [None]:
# How many rows in the suns table?
suns.num_rows

# Ways to Create a Table #

## Creating a Table from Scratch ##

In [None]:
# Make an array of streets
streets = make_array('Virginia', 'Evans', 'Sierra', 'McCarran')
streets

In [None]:
# Create an empty table
empty_table = Table()
empty_table

In [None]:
# Create a table using the array we made above (streets)
campus_streets = Table().with_column('Streets', streets)
campus_streets

In [None]:
# Add a column to the table
campus_streets.with_column('Blocks from campus', np.arange(4))
campus_streets

In [None]:
# Add a column to the table- what's the difference from the above code?
campus_streets = campus_streets.with_column('Blocks from campus', np.arange(4))
campus_streets

In [None]:
# Create a table with multiple columns in one step
Table().with_columns(
    'Streets', streets,
    'Blocks from campus', np.arange(4)
)

### Saving
Remember to save your notebook before closing.
Choose **Save** from the **File** menu (and make sure you've already saved a copy in your Drive).