# CS 5A Lab 2 Concepts

This notebook is here for you to explore and practice 
various operations on tables.  It uses the file `data/courses.csv`.

You don't need to turn this in; it's just here as a reference for
you to practice and learn from.

You are encouraged to go through the notebook and run every cell.

As you do, try to predict, *before* you press shift-enter, what the cell
will compute.  When you are right, it shows that you are learning how to
do data science with the `datascience` tools!

When you are wrong, it shows that there is still something you need to learn,
so make a note of it.

You can also use this as a reference for hints in solving the lab problems.

In [None]:
%load_ext jupyter_ai_magics

In [1]:
from datascience import *
import numpy as np

## datascience table

In [3]:
courses = Table.read_table("data/courses.csv")
courses

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150
32,Object Oriented Design,Nabeel Nasir,MW,155
40,Foundations of Computer Science,Maryam Majedi,MW,145


In [4]:
type(courses)

datascience.tables.Table

## Lots of things you can do with tables

In [5]:
courses.show(3)

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150


In [6]:
courses.sort("Size", descending=False)

Course Number,Course Title,Instructor,Days,Size
40,Foundations of Computer Science,Maryam Majedi,MW,145
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150
32,Object Oriented Design,Nabeel Nasir,MW,155
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218


In [7]:
courses.sort("Size", descending=True)

Course Number,Course Title,Instructor,Days,Size
8,Intro to Computer Science,Diba Mirza,TR,218
5,Intro to Data Science,Avani Tanna,MW,175
32,Object Oriented Design,Nabeel Nasir,MW,155
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150
40,Foundations of Computer Science,Maryam Majedi,MW,145


In [8]:
courses.labels

('Course Number', 'Course Title', 'Instructor', 'Days', 'Size')

In [9]:
courses.num_rows

7

In [10]:
courses.num_columns

5

### Look at columns

In [11]:
# one column
courses.select("Course Title")

Course Title
Intro to Data Science
Intro to Computer Science
Intermediate Python
Problem Solving I
Problem Solving II
Object Oriented Design
Foundations of Computer Science


In [12]:
# multiple columns
courses.select("Course Title", "Instructor")

Course Title,Instructor
Intro to Data Science,Avani Tanna
Intro to Computer Science,Diba Mirza
Intermediate Python,Richert Wang
Problem Solving I,Ziad Matni
Problem Solving II,Maryam Majedi
Object Oriented Design,Nabeel Nasir
Foundations of Computer Science,Maryam Majedi


### Look at rows

In [13]:
# one row
courses.take(0)

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175


In [14]:
# multiple rows
courses.take(2, 1)

Course Number,Course Title,Instructor,Days,Size
9,Intermediate Python,Richert Wang,TR,150
8,Intro to Computer Science,Diba Mirza,TR,218


### Get column (as table) vs get column (as numpy array)

In [15]:
# select returns a table (type datascience.tables.Table)
courses.select("Course Title")

Course Title
Intro to Data Science
Intro to Computer Science
Intermediate Python
Problem Solving I
Problem Solving II
Object Oriented Design
Foundations of Computer Science


In [16]:
# column returns an array (type numpy.ndarray)
courses.column("Course Title")

array(['Intro to Data Science', 'Intro to Computer Science',
       'Intermediate Python', 'Problem Solving I', 'Problem Solving II',
       'Object Oriented Design', 'Foundations of Computer Science'],
      dtype='<U31')

### Look at a cell - combo of column and row

In [17]:
# take from a table gives a table
courses.select("Course Number").take(1)

Course Number
8


In [18]:
# take from a numpy array gives a number
courses.column("Course Number").take(1)

8

### Filtering

In [19]:
courses

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150
32,Object Oriented Design,Nabeel Nasir,MW,155
40,Foundations of Computer Science,Maryam Majedi,MW,145


In [20]:
courses.where("Course Number", 24)

Course Number,Course Title,Instructor,Days,Size
24,Problem Solving II,Maryam Majedi,TR,150


In [21]:
courses.where("Course Number", are.below(24))

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150


In [22]:
courses.where("Course Number", are.below_or_equal_to(24))

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150


### Grouping

In [23]:
courses.group("Days")

Days,count
MW,3
TR,4


### Chaining functions

### What if I want courses below or equal to 24 and only on Mondays and Wednesdays?

In [24]:
courses.where("Course Number", are.below_or_equal_to(24)).where("Days", "MW")

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175


#### You can chain almost anything together...

### What if I want to number the courses 1, 2, 3, ...

In [25]:
courses

Course Number,Course Title,Instructor,Days,Size
5,Intro to Data Science,Avani Tanna,MW,175
8,Intro to Computer Science,Diba Mirza,TR,218
9,Intermediate Python,Richert Wang,TR,150
16,Problem Solving I,Ziad Matni,TR,150
24,Problem Solving II,Maryam Majedi,TR,150
32,Object Oriented Design,Nabeel Nasir,MW,155
40,Foundations of Computer Science,Maryam Majedi,MW,145


In [26]:
ordered_column = np.arange(1, 8)
ordered_column

array([1, 2, 3, 4, 5, 6, 7])

In [27]:
courses = courses.with_columns("My Course Number", ordered_column) 
courses

Course Number,Course Title,Instructor,Days,Size,My Course Number
5,Intro to Data Science,Avani Tanna,MW,175,1
8,Intro to Computer Science,Diba Mirza,TR,218,2
9,Intermediate Python,Richert Wang,TR,150,3
16,Problem Solving I,Ziad Matni,TR,150,4
24,Problem Solving II,Maryam Majedi,TR,150,5
32,Object Oriented Design,Nabeel Nasir,MW,155,6
40,Foundations of Computer Science,Maryam Majedi,MW,145,7


### Useful fact for challenge question

In [28]:
# you can multiply numpy arrays by constants, and add them

In [29]:
ordered_column

array([1, 2, 3, 4, 5, 6, 7])

In [30]:
two_times = 2*ordered_column
two_times

array([ 2,  4,  6,  8, 10, 12, 14])

In [31]:
ordered_column + two_times

array([ 3,  6,  9, 12, 15, 18, 21])
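
These operations work element by element, so you can combine them in a single expression. The sketch below uses only numpy; the names `combined` and `shifted` are made up for this example.

```python
import numpy as np

ordered_column = np.arange(1, 8)    # array([1, 2, 3, 4, 5, 6, 7])
two_times = 2 * ordered_column      # each element doubled

# Adding two arrays of the same length adds matching elements,
# so this is the same as multiplying each element by 3.
combined = ordered_column + two_times
print(np.array_equal(combined, 3 * ordered_column))

# You can also add a constant to every element.
shifted = 3 * ordered_column + 1
print(shifted)
```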