In [1]:
# Click into this cell and press shift-enter before using this notebook.
# This line loads the ability to use %%ai in your file
%load_ext jupyter_ai_magics
# These lines import the Python modules we commonly use in CMPSC 5A
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots

# Lecture 4, CMPSC 5A, F25 (Tue 10/07)

# Notes to self 
* Check font size
* Check that you are sharing the screen on the zoom session
* Ask staff to help monitor the zoom chat
* Remind students to run the top cell

## Where are we in the reading?

Previously, I asked you to read:
* [Chapter 1: What is Data Science](https://inferentialthinking.com/chapters/01/what-is-data-science.html)
* [Chapter 3: Programming in Python](https://inferentialthinking.com/chapters/03/programming-in-python.html)

For today, you were asked to read:
* [Chapter 2 (Causality and Experiments)](https://inferentialthinking.com/chapters/02/causality-and-experiments.html)
* [Section 4.1 (Numbers)](https://inferentialthinking.com/chapters/04/1/Numbers.html)
* [Section 4.2 (Strings)](https://inferentialthinking.com/chapters/04/2/Strings.html)

For this coming Thursday, please also read:
* [Section 4.3 (Comparisons)](https://inferentialthinking.com/chapters/04/3/Comparison.html)

So we'll have completed Chapters 1-4 by the end of the week.

For next week:
* [Chapter 5 (Sequences)](https://inferentialthinking.com/chapters/05/Sequences.html)

## Preparing for quiz1

By now almost all of you are registered for CMPSC 5A in PrairieTest at:
* <https://us.prairietest.com>

Please verify that you are registered by clicking that link now.

Most, but not all of you, have made an appointment at the testing center to take Quiz 1.  You do that at the same link:
* <https://us.prairietest.com>

You should see appointments for:
* Friday October 10: 9, 10:30, noon, 1:30, 3
* Monday October 13: 9 (full), 10:30, noon, 1:30, 3

Please make an appointment *right now* if you haven't yet.

## Practicing for Quiz 1

As previously announced, you can practice for Quiz 1 as many times as you like here.

* <https://us.prairielearn.com/pl/course_instance/194488/assessment/2596222>
* NOTE: the version above is *slightly different* in format from the one we sent out earlier, so please do check it out.

We are actually going to take another version of this quiz right now, in class, in PrairieLearn, though only for an ic/hwk grade.
* Today's [ic04](https://us.prairielearn.com/pl/course_instance/194488/assessment/2595272) is worth about 1% of your course grade or less
* The quiz on Friday/Monday is worth about 2.5% of your course grade, and the one
  scheduled for Friday Oct 24 and Monday Oct 27 will be about the same.
* The midterm and final will each be worth about 15% of your course grade.

The content on this quiz is intentionally *super easy*.  You do need to understand the `**` operator and the `%` operator in Python, but apart from that, it's just basic arithmetic.    The point is to get used to the PrairieLearn environment, and the UCSB Testing Center while it's still *low stakes*.
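
As a quick refresher on those two operators, here is a minimal sketch (the variable names are made up for illustration): `**` raises a number to a power, and `%` gives the remainder after division.

```python
# ** is exponentiation: 2 raised to the 5th power
power = 2 ** 5

# % is the remainder after division: 17 divided by 5 is 3 with 2 left over
remainder = 17 % 5

# ** is evaluated before *, so this is 2 * (3 ** 2), not (2 * 3) ** 2
mixed = 2 * 3 ** 2

# A number is even exactly when dividing by 2 leaves remainder 0
is_even = 10 % 2 == 0

print(power, remainder, mixed, is_even)
```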

Please use the opportunity to get used to the system while the content is still relatively easy.  It *will* get harder. 

## Section 3.4: Introduction To Tables  (Review from last time)


In your text, [Section 3.4: Introduction to Tables](https://inferentialthinking.com/chapters/03/4/Introduction_to_Tables.html) covers *Tables*, which are fundamental to *this* course in Data Science.

### `Table()` is specific to `datascience`

Note that in this course we use the Python `datascience` library that was developed at UC Berkeley for the course *Data 8*, and goes along with the textbook we are using in this course.

In that library, the `Table()` data type is the main *abstraction* that is used for a table of data: it's something like a spreadsheet with rows and columns.

Other data science libraries exist, and they use different abstractions.  I'm not going to derail things too much by talking about those, but I'll briefly mention the other libraries and what they use instead of `Table()`:

| Library        | Table Abstraction |
|----------------|-------------------|
| `datascience`  | `Table`           |
| `pandas`       | `DataFrame`       |
| `numpy`        | `ndarray`         |
| `polars`       | `DataFrame`       |
| `dask`         | `DataFrame`       |
| `pyspark`      | `DataFrame`       |

So when we talk about a `Table()`, that's a specific concept in the `datascience` library, but it's also a pretty generic *idea*.

Let's look at how to create an empty Table. An empty table is useful because it can be extended to contain new rows and columns.

In [None]:
# create an empty Table
Table()

If you get a `NameError` on `Table()`, remember that you have to shift-enter the
top cell in your notebook at the start of the session so that you run
the `from datascience import *` command.

Without that, `Table()` is undefined.



#### with_columns
The `with_columns` method on a table constructs a new table with additional labeled columns. 

In [None]:
# add one column
Table().with_columns('Number of petals', make_array(8, 34, 5))

In [None]:
# add two or more columns
Table().with_columns(
    'Number of petals', make_array(8, 34, 5),
    'Name', make_array('lotus', 'sunflower', 'rose')
)

We can give it a name and further extend it!

In [None]:
flowers = Table().with_columns(
    'Number of petals', make_array(8, 34, 5),
    'Name', make_array('lotus', 'sunflower', 'rose')
)

flowers.with_columns(
    'Color', make_array('pink', 'yellow', 'red')
)

Note that the code above created a new table with three columns and displayed it, but the original `flowers` variable remains *unchanged*
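
The same pattern shows up elsewhere in Python. As an analogy only (this sketch uses a plain string, as in Section 4.2, not a `Table`), a method can return a new value and leave the original alone; to keep the result, you assign it to a name. The name `shouted` is made up for this example.

```python
name = 'lotus'

# upper() builds a NEW string; the value stored in name is not modified
name.upper()

# To keep the result, give it a name of its own
shouted = name.upper()

print(name)     # the original string
print(shouted)  # the new uppercase string
```

By the same reasoning, writing `flowers = flowers.with_columns(...)` is how you would keep the extended table under the name `flowers`.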

In [None]:
flowers

We can find the number of rows in `flowers` with the `.num_rows` property 



In [None]:
flowers.num_rows

Note that `num_rows` is technically a *property* and not a *method*, because we don't put `()` after the name `num_rows`.  See what happens if we try: 

In [None]:
flowers.num_rows()
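
To see why the parentheses cause trouble, here is a minimal sketch with a made-up class, `TinyTable` (a hypothetical stand-in, not how `datascience` actually implements `Table`). Because `num_rows` is a property, `t.num_rows` is already an integer, and putting `()` after it asks Python to call that integer as if it were a function.

```python
class TinyTable:
    """Hypothetical stand-in for a table; NOT the datascience implementation."""

    def __init__(self, names):
        self._names = names

    @property
    def num_rows(self):
        # A property is read without parentheses: t.num_rows
        return len(self._names)


t = TinyTable(['lotus', 'sunflower', 'rose'])
count = t.num_rows      # an ordinary int

try:
    t.num_rows()        # tries to call the int itself, which is an error
    error_name = None
except TypeError as err:
    error_name = type(err).__name__
    print(error_name, '-', err)
```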

Also note that `len(flowers)` gives us the number of columns, not the number of rows.

In [None]:
len(flowers)

In [None]:
flowers.num_columns
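
One way to make sense of `len(flowers)` counting columns, as an analogy rather than a description of the library's internals: picture a table as a dictionary that maps each column label to that column's values. `len` of a dictionary counts its keys, which here are the columns. The name `flowers_as_dict` is made up for this sketch.

```python
# A plain-Python picture of the flowers table: label -> column values
flowers_as_dict = {
    'Number of petals': [8, 34, 5],
    'Name': ['lotus', 'sunflower', 'rose'],
}

# len() of a dictionary counts its keys, i.e. the columns
column_count = len(flowers_as_dict)

# The number of rows is the length of any one column
row_count = len(flowers_as_dict['Name'])

print(column_count, row_count)
```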

#### read_table

If you have a CSV file of existing data, you can read that data into a `Table` object from the `datascience` library.

We will use the `Table` method `read_table` to read a CSV (comma-separated values) file below. 
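
To make "comma-separated values" concrete, here is a small sketch that uses Python's built-in `csv` module on a made-up three-row file holding the flowers data from earlier. It is not what `Table.read_table` does internally; it only shows the file layout: the first line holds the column labels, and every later line is one row.

```python
import csv
import io

# The text of a tiny CSV file (made up for this example)
csv_text = """Number of petals,Name
8,lotus
34,sunflower
5,rose
"""

# io.StringIO lets us read the string as if it were an open file
reader = csv.reader(io.StringIO(csv_text))
labels = next(reader)   # first line: the column labels
rows = list(reader)     # remaining lines: one list per row

print(labels)
print(len(rows), 'rows')
print(rows[0])          # the csv module returns every value as a string
```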

In [None]:
movies = Table.read_table("data/movies_by_year_with_ticket_price.csv")
movies

Here's another CSV file from the UCSB Gaucho Sports Analytics club (remember they made a presentation last week?)

Note that one of the first questions we'll have about this data is: what do all of the fields mean?   It's pretty clear that each row represents one pitch in one game, and some of the other fields are clear too, but many are a little mysterious.  For some, we don't know the units (for example, is the ZoneSpeed in miles per hour or some other unit? Are angles in degrees or radians?).

I'll try to find some of this out.

In [2]:
pitches = Table.read_table("data/UCSB_baseball_1_24_25.csv")
pitches

PitchNo,Date,Time,PAofInning,PitchofPA,Pitcher,PitcherId,PitcherThrows,PitcherTeam,Batter,BatterId,BatterSide,BatterTeam,PitcherSet,Inning,Top/Bottom,Outs,Balls,Strikes,TaggedPitchType,AutoPitchType,PitchCall,KorBB,TaggedHitType,PlayResult,OutsOnPlay,RunsScored,Notes,RelSpeed,VertRelAngle,HorzRelAngle,SpinRate,SpinAxis,Tilt,RelHeight,RelSide,Extension,VertBreak,InducedVertBreak,HorzBreak,PlateLocHeight,PlateLocSide,ZoneSpeed,VertApprAngle,HorzApprAngle,ZoneTime,ExitSpeed,Angle,Direction,HitSpinRate,PositionAt110X,PositionAt110Y,PositionAt110Z,Distance,LastTrackedDistance,Bearing,HangTime,pfxx,pfxz,x0,y0,z0,vx0,vy0,vz0,ax0,ay0,az0,HomeTeam,AwayTeam,Stadium,Level,League,GameID,PitchUID,EffectiveVelo,MaxHeight,MeasuredDuration,SpeedDrop,PitchLastMeasuredX,PitchLastMeasuredY,PitchLastMeasuredZ,ContactPositionX,ContactPositionY,ContactPositionZ,GameUID,UTCDate,UTCTime,LocalDateTime,UTCDateTime,AutoHitType,System,HomeTeamForeignID,AwayTeamForeignID,GameForeignID,Catcher,CatcherId,CatcherThrows,CatcherTeam,PlayID,PitchTrajectoryXc0,PitchTrajectoryXc1,PitchTrajectoryXc2,PitchTrajectoryYc0,PitchTrajectoryYc1,PitchTrajectoryYc2,PitchTrajectoryZc0,PitchTrajectoryZc1,PitchTrajectoryZc2,HitSpinAxis,HitTrajectoryXc0,HitTrajectoryXc1,HitTrajectoryXc2,HitTrajectoryXc3,HitTrajectoryXc4,HitTrajectoryXc5,HitTrajectoryXc6,HitTrajectoryXc7,HitTrajectoryXc8,HitTrajectoryYc0,HitTrajectoryYc1,HitTrajectoryYc2,HitTrajectoryYc3,HitTrajectoryYc4,HitTrajectoryYc5,HitTrajectoryYc6,HitTrajectoryYc7,HitTrajectoryYc8,HitTrajectoryZc0,HitTrajectoryZc1,HitTrajectoryZc2,HitTrajectoryZc3,HitTrajectoryZc4,HitTrajectoryZc5,HitTrajectoryZc6,HitTrajectoryZc7,HitTrajectoryZc8,ThrowSpeed,PopTime,ExchangeTime,TimeToBase,CatchPositionX,CatchPositionY,CatchPositionZ,ThrowPositionX,ThrowPositionY,ThrowPositionZ,BasePositionX,BasePositionY,BasePositionZ,ThrowTrajectoryXc0,ThrowTrajectoryXc1,ThrowTrajectoryXc2,ThrowTrajectoryYc0,ThrowTrajectoryYc1,ThrowTrajectoryYc2,ThrowTrajectoryZc0,ThrowTrajectoryZc1,ThrowTrajectoryZc2,PitchReleaseConfidence,PitchLocationConfidence,PitchMovementConfidence,HitLaunchConfidence,HitLandingConfidence,CatcherThrowCatchConfidence,CatcherThrowReleaseConfidence,CatcherThrowLocationConfidence
1,1/24/25,00:49.0,1,1,"Hoover, Chase",800623,Left,SAN_GAU,"Sebring, Jonah",1,Right,GAU_ALU,Undefined,1,Top,0,0,0,Fastball,Four-Seam,BallCalled,Undefined,Undefined,Undefined,0,0,,91.0084,-1.09813,-0.222841,2135.84,174.636,11:45,6.06336,-0.55963,6.53684,-11.3192,22.2808,-1.97106,4.11275,-0.92828,81.3317,-3.38021,-0.58091,0.417197,,,,,,,,,,,,0.93815,12.3245,0.57572,50,5.9815,0.59712,-132.007,-2.70767,1.61797,34.0238,-10.9187,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,eb74ff60-daaf-11ef-b64c-2ba247267c32,90.3469,,,9.67669,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,00:49.0,2025-01-24T16:00:49.0163182-08:00,2025-01-25T00:00:49.0163182Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,398d403b-67e0-48a0-bb44-e2c7e5001ccc,53.9538,-133.022,17.0119,6.05743,-2.38188,-5.45937,0.55862,0.54884,0.80899,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
2,1/24/25,01:03.4,1,2,"Hoover, Chase",800623,Left,SAN_GAU,"Sebring, Jonah",1,Right,GAU_ALU,Undefined,1,Top,0,1,0,Fastball,Four-Seam,StrikeCalled,Undefined,Undefined,Undefined,0,0,,92.3685,-3.50465,1.71863,2181.56,147.65,11:00,6.01495,-0.47514,6.38037,-13.5201,19.2783,-11.4254,1.65875,0.15424,83.0622,-6.14811,-0.350586,0.41219,,,,,,,,,,,,6.46831,11.5307,0.35602,50,5.7556,-3.52596,-133.765,-8.50702,11.5041,33.8016,-11.6662,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,f40dc5d0-daaf-11ef-b64c-2ba247267c32,91.4442,,,9.30632,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,01:03.4,2025-01-24T16:01:03.4172910-08:00,2025-01-25T00:01:03.4172910Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,9d6f4556-5bb0-4fec-bc38-8db0a13c59f8,54.1133,-134.8,16.9008,6.01072,-8.14967,-5.8331,0.46942,-3.87835,5.75207,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
3,1/24/25,01:19.9,1,3,"Hoover, Chase",800623,Left,SAN_GAU,"Sebring, Jonah",1,Right,GAU_ALU,Undefined,1,Top,0,1,1,Fastball,Four-Seam,StrikeCalled,Undefined,Undefined,Undefined,0,0,,92.3818,-2.72257,2.01215,2188.68,152.9,11:00,6.14297,-0.37914,6.24929,-12.6188,20.3023,-9.75394,2.57715,0.66444,82.9749,-5.20597,0.250861,0.412961,,,,,,,,,,,,5.66973,11.737,0.23391,50,5.93392,-4.27578,-133.818,-6.64709,10.0943,33.7663,-11.2777,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,fde72290-daaf-11ef-b64c-2ba247267c32,91.2737,,,9.40686,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,01:19.9,2025-01-24T16:01:19.9131118-08:00,2025-01-25T00:01:19.9131118Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,aa58cf17-1fb5-4fa9-8dad-846ea05d34aa,54.2439,-134.885,16.8831,6.13826,-6.29085,-5.63886,0.37401,-4.59464,5.04717,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
4,1/24/25,01:36.0,1,4,"Hoover, Chase",800623,Left,SAN_GAU,"Sebring, Jonah",1,Right,GAU_ALU,Undefined,1,Top,0,1,2,Fastball,Four-Seam,BallCalled,Undefined,Undefined,Undefined,0,0,,91.3319,-1.88778,2.40633,2192.25,164.566,11:30,5.93813,-0.4836,6.53088,-14.0982,19.2538,-4.9678,3.02945,1.31105,81.9114,-4.66019,1.50505,0.415654,,,,,,,,,,,,3.37855,11.0349,0.31832,50,5.80041,-5.35426,-132.368,-4.67373,5.87509,33.4922,-12.985,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,07767e00-dab0-11ef-b64c-2ba247267c32,90.6822,,,9.42045,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,01:36.0,2025-01-24T16:01:36.0105966-08:00,2025-01-25T00:01:36.0105966Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,3bee5cfa-d0a1-40f9-9516-3902dc395943,53.9613,-133.366,16.7461,5.93398,-4.2866,-6.49252,0.48056,-5.52942,2.93754,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
5,1/24/25,01:52.3,1,5,"Hoover, Chase",800623,Left,SAN_GAU,"Sebring, Jonah",1,Right,GAU_ALU,Undefined,1,Top,0,2,2,Fastball,Four-Seam,StrikeSwinging,Strikeout,Undefined,Undefined,0,0,,92.0337,-1.48845,0.583075,2270.69,153.097,11:00,6.13509,-0.35335,5.94452,-11.9449,21.6826,-10.3432,3.7587,-0.67443,82.243,-3.85621,-1.27501,0.417368,,,,,,,,,,,,5.42044,11.9123,0.31229,50,6.00969,-0.88196,-133.339,-3.72011,9.54182,34.6098,-11.2044,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,112f0c50-dab0-11ef-b64c-2ba247267c32,90.3099,,,9.79065,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,01:52.3,2025-01-24T16:01:52.3162094-08:00,2025-01-25T00:01:52.3162094Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,f4c9e2a0-266f-4c6c-88ac-ed9431cd1fc1,54.5468,-134.514,17.3049,6.12953,-3.33973,-5.6022,0.34773,-1.2059,4.77091,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
6,1/24/25,02:21.6,2,1,"Hoover, Chase",800623,Left,SAN_GAU,"Oakley, Nick",2,Left,GAU_ALU,Undefined,1,Top,1,0,0,Fastball,Four-Seam,StrikeCalled,Undefined,Undefined,Undefined,0,0,,91.6672,-1.62129,1.30692,2240.39,162.384,11:30,6.10438,-0.28967,6.33736,-13.8946,19.3327,-5.74906,3.45301,0.43472,82.3911,-4.34377,0.266632,0.414876,,,,,,,,,,,,3.38339,10.8954,0.19693,50,5.97942,-2.78575,-132.943,-4.08457,5.9508,33.0795,-13.0108,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,22a3e3c0-dab0-11ef-b64c-2ba247267c32,90.8522,,,9.27603,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,02:21.6,2025-01-24T16:02:21.5804398-08:00,2025-01-25T00:02:21.5804398Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,50c2d71e-4e40-4440-9fad-c4c0762559f0,54.1552,-133.973,16.5398,6.10029,-3.67948,-6.50538,0.28655,-2.97103,2.9754,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
7,1/24/25,02:33.9,2,2,"Hoover, Chase",800623,Left,SAN_GAU,"Oakley, Nick",2,Left,GAU_ALU,Undefined,1,Top,1,0,1,Fastball,Four-Seam,BallCalled,Undefined,Undefined,Undefined,0,0,,91.7695,-5.02291,1.0898,2218.23,157.244,11:15,6.02778,-0.43011,6.57719,-10.178,23.0964,-9.12764,0.56347,-0.19183,82.1026,-7.07607,-0.569432,0.41517,,,,,,,,,,,,5.14414,14.2334,0.35868,50,5.67761,-2.14889,-132.655,-11.8117,8.93969,34.9198,-7.43867,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,29ef90c0-dab0-11ef-b64c-2ba247267c32,90.788,,,9.66692,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,02:33.9,2025-01-24T16:02:33.8502638-08:00,2025-01-25T00:02:33.8502638Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,4afb0bfc-8d5d-4e2d-ae5d-f48cdb35f956,53.9164,-133.682,17.4599,6.02176,-11.5929,-3.71933,0.42575,-2.41179,4.46985,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
8,1/24/25,02:57.8,2,3,"Hoover, Chase",800623,Left,SAN_GAU,"Oakley, Nick",2,Left,GAU_ALU,Undefined,1,Top,1,1,1,Cutter,Slider,BallCalled,Undefined,Undefined,Undefined,0,0,,85.981,1.3554,0.414612,2438.81,217.007,1:15,6.03032,-0.69555,6.06847,-32.3823,5.59205,3.21768,4.58632,-0.04373,77.6466,-4.70405,0.994061,0.443524,,,,,,,,,,,,-1.50386,2.89217,0.66189,50,6.11856,-1.03726,-124.741,1.89346,-2.33967,28.041,-27.6744,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,38406be0-dab0-11ef-b64c-2ba247267c32,84.9841,,,8.33436,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,02:57.8,2025-01-24T16:02:57.8562542-08:00,2025-01-25T00:02:57.8562542Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,307d7d1f-ea8f-4b1b-a5f7-0a3b19de4c18,54.4247,-125.731,14.0205,6.03439,2.87123,-13.8372,0.69707,-0.9546,-1.16983,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
9,1/24/25,03:15.3,2,4,"Hoover, Chase",800623,Left,SAN_GAU,"Oakley, Nick",2,Left,GAU_ALU,Undefined,1,Top,1,2,1,Fastball,Four-Seam,StrikeSwinging,Undefined,Undefined,Undefined,0,0,,93.0939,-2.52921,0.910805,2198.64,172.146,11:45,6.21281,-0.25043,6.15999,-7.22122,25.2711,-3.31744,3.2728,0.31458,83.5225,-4.03902,0.312646,0.410263,,,,,,,,,,,,1.99012,14.1402,0.18285,50,6.01651,-1.99533,-134.898,-6.00466,3.60193,34.214,-6.58166,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,429f2540-dab0-11ef-b64c-2ba247267c32,91.8739,,,9.57145,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,03:15.3,2025-01-24T16:03:15.2787950-08:00,2025-01-25T00:03:15.2787950Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,c4066ff9-1722-43cc-bd15-759ce3d78be3,54.3322,-135.992,17.107,6.2052,-5.79414,-3.29083,0.24852,-2.11054,1.80097,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,
10,1/24/25,03:30.1,2,5,"Hoover, Chase",800623,Left,SAN_GAU,"Oakley, Nick",2,Left,GAU_ALU,Undefined,1,Top,1,2,2,Fastball,Four-Seam,BallCalled,Undefined,Undefined,Undefined,0,0,,92.2586,-4.73537,0.811818,2251.24,160.968,11:15,6.11764,-0.17624,6.27986,-10.8762,22.4603,-7.28083,0.83636,-0.03469,82.3763,-6.90777,-0.504462,0.415557,,,,,,,,,,,,4.04882,13.7004,0.11964,50,5.76212,-1.57683,-133.325,-11.2451,7.09517,35.7227,-8.16546,SAN_GAU,SAN_GAU,CaesarUyesaka,TeamExclusive,Team,20250124-CaesarUyesaka-Private-2,4b778cc0-dab0-11ef-b64c-2ba247267c32,90.7033,,,9.88225,,,,,,,e2d5b440-508a-48cd-9e58-715d78e5423d,1/25/25,03:30.1,2025-01-24T16:03:30.1110766-08:00,2025-01-25T00:03:30.1110766Z,,v3,471258,471258,,"Fernandez, Ian",10002000.0,Right,SAN_GAU,5481b27f-a806-4231-8f43-c41c1a771a21,54.2133,-134.45,17.8613,6.11194,-10.9881,-4.08273,0.17278,-1.80011,3.54759,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,High,High,High,,,,,


In [None]:
pitches.labels

In [None]:
pitches.num_columns

In [None]:
pitches.num_rows

## Section 3.4: Introduction To Tables  (*new material this lecture*)

### Four things to know for lab tomorrow
Let's go over four things that I want you to know about Tables;
you'll be asked about these tomorrow in your discussion section.

It's open book / open note, so you may refer to this while doing the
exercise; we'll also have this available for your reference.  So, for now,
you don't need to memorize this.   But you do need to know where to find it, and
how to apply this "summary" in a real coding situation.

| What you want to do | How to do it |
|-|-|
| Find out the number of columns in table `t` | `t.num_columns` |
| Find out the number of rows in table `t` | `t.num_rows` |
| List the names of the columns in table `t` | `t.labels` |
| Show just the first n rows of table `t` | `t.show(n)` where `n` is an integer |


#### num_columns/num_rows
The property `num_columns` gives the number of columns in the table, and `num_rows` gives the number of rows.

In [None]:
movies.num_columns

Note that in the table above, it says `t.num_columns`, while here, we have `movies.num_columns`.

The idea is that you put the name of the variable that holds the `Table` object in place of `t`.

Earlier in this notebook, we defined a `Table` called `flowers`.  So we can find the number of columns via:

In [None]:
flowers.num_columns

In [None]:
flowers

We can also get the number of rows in a table `t` with `t.num_rows`:

In [None]:
flowers.num_rows

In [None]:
movies.num_rows

Note that `num_rows` is not the same as `rows`, which does something else entirely:

In [None]:
flowers.rows

We can see that plain old `.rows` gives us a value that is of a different type.  We won't need to investigate that further right now, but be aware that `.num_rows` and `.rows` are different.

Check out these various `type` expressions:

In [None]:
type(flowers.rows)

In [None]:
type(flowers.num_rows)

In [None]:
type(flowers)

### `t.labels`

To see the column names in Table `t`, we use `t.labels`, like this:


In [None]:
movies.labels

In [None]:
flowers.labels

### `t.show(n)`

To show just the first `n` rows of a table, we can use `t.show(n)`, like this:

In [None]:
flowers.show(1)

In [None]:
flowers.show(2)

In [None]:
movies.show(5)

## More on Tables

#### Accessing data in a column
We can use a column’s label to access the array of data in the column as shown below. 

In [None]:
movies.column("#1 Movie")

## Asking AI about `datascience` table methods

There are a variety of *methods* we can use on tables; these include 
* `show`
* `select`
* `drop`
* `sort`
* `where`

and many others.

Let's ask ChatGPT for help:

In [None]:
%%ai openai-chat:gpt-3.5-turbo
Can you give me a brief summary of some of the most useful methods that
we can use on a Table() from the Python datascience library?

That was somewhat helpful.  But let's change our prompt and see if we can come up with something more useful.

In [None]:
%%ai openai-chat:gpt-3.5-turbo
Can you give me a brief summary of some of the most useful methods that
we can use on a Table() from the Python datascience library, with examples
that use the `movies` table defined in this notebook?

That's more useful.  But it would be even better if we had 
a brief explanation with each one.

In [None]:
%%ai openai-chat:gpt-3.5-turbo
Can you repeat that, but add a brief explanation 
along with each example that would help a beginner understand
what each method does, and anything else they should know?

## Trying out the methods that ChatGPT suggested.

In the cells above, you see where we asked ChatGPT for some methods
that we can use on a table object.

Note that the answers may vary each time you run the AI queries.  So the examples below are taken from one particular instance of the answers given by ChatGPT.  

Let's try each of them out, and see if they work, or don't work the way that ChatGPT suggested they would.

As we'll see, **ChatGPT doesn't always get it right!**


### `.show()`

This method displays the contents of the table in a format that is easily readable for the user.

```
movies.show()
```

In [None]:
movies.show() # This should work

In [None]:
movies.show(3)  # show just the first 3 rows

### `.select()`

Select method allows you to specify and display only the columns of interest from the table.

```
selected_columns = movies.select('Title', 'Genre')
selected_columns.show()
```


In [None]:
selected_columns = movies.select('Title', 'Genre') # This doesn't work
selected_columns.show()

In [None]:
movies.labels

In [None]:
selected_columns = movies.select('Year', 'Total Gross') # This works
selected_columns.show(3)
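
Since suggested column names may not exist, it can help to check them against the table's real labels before running the code. Here is a sketch of that check in plain Python. The tuple `known_labels` lists only the labels that appear in this notebook (the actual file may have more); in the notebook itself you could use `movies.labels` in its place. The names `known_labels`, `suggested`, and `missing` are made up for this example.

```python
# Labels this notebook uses from the movies table (assumed; the file may have others)
known_labels = ('Year', 'Total Gross', 'Number of Movies',
                '#1 Movie', 'Average Ticket Price')

# Column names that appeared in the AI-generated examples
suggested = ['Title', 'Genre', 'Year']

# Keep only the suggestions that are NOT real labels
missing = [label for label in suggested if label not in known_labels]

print('Not in the table:', missing)
```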

### `.sort()`

Sort method arranges the rows of the table in ascending order based on the specified column.

```
sorted_table = movies.sort('Year')
sorted_table.show(10)
```

In [None]:
sorted_table = movies.sort('Year')
sorted_table.show(10)

In [None]:
sorted_table = movies.sort('Number of Movies')
sorted_table.show(10)

### Digression: looking up the documentation 

On the Canvas site for the course, we have two links that are really helpful when dealing with instances of the `Table` object from the Python `datascience` module:

* The [Datascience Package Documentation](https://www.data8.org/datascience/) at <https://www.data8.org/datascience/>
* The [Datascience Python Reference (cheat sheet)](https://www.data8.org/sp22/python-reference.html) from the Spring 2022 Data 8 course at Berkeley at <https://www.data8.org/sp22/python-reference.html>

If we look up the `sort` method there, we can discover that we can use `descending=True` to sort biggest to smallest, like this:

In [None]:
movies.sort('Year', descending=True).show(5)

In [None]:
movies.sort('Year').show(5)
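
As a plain-Python analogy (not the library's implementation), the built-in `sorted` function has a `reverse=True` option that plays the same role as `descending=True`. The rows below are the flowers data from earlier, written as (petals, name) pairs; the variable and function names are made up for this sketch.

```python
# Each row is a (number of petals, name) pair
rows = [(8, 'lotus'), (34, 'sunflower'), (5, 'rose')]

def petals(row):
    """Sort key: the first entry of the row."""
    return row[0]

# Smallest to largest, like flowers.sort('Number of petals')
ascending = sorted(rows, key=petals)

# Largest to smallest, like adding descending=True
descending = sorted(rows, key=petals, reverse=True)

print(ascending)
print(descending)
```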

### `.where()`

Where method filters the rows of the table based on a specified condition.

```
filtered_rows = movies.where('Director', are.equal_to('Christopher Nolan'))
filtered_rows.show(5)
```

In [None]:
# This is an AI hallucination!  There is no Director field in this table!
filtered_rows = movies.where('Director', are.equal_to('Christopher Nolan'))
filtered_rows.show(5)

In [None]:
# This, on the other hand, works
filtered_rows = movies.where('#1 Movie', are.containing('Star Wars'))
filtered_rows.show()

We can see from the results above that when a Star Wars movie comes out, it is often the #1 movie in that year (by whatever criteria are being used in this dataset!)  Though, not always: Star Wars II, Attack of the Clones, is conspicuously absent.

### How do you use `where`?

Note: when using `where`, there is a collection of methods that you can use to specify which records you want.   These are called *where predicates*, and they are documented here:
* <https://www.data8.org/datascience/reference-nb/datascience-reference.html#Table.where-Predicates>

We will go over them all at a later date, but for now, here's a quick list of them:

```
    are.equal_to(...)
    are.above(...)
    are.above_or_equal_to(...)
    are.below(...)
    are.below_or_equal_to(...)
    are.between(...)
    are.between_or_equal_to(...)
    are.contained_in(...)
    are.containing(...)
    are.strictly_between(...)
```
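
One way to think about a predicate is as a small yes/no test that gets applied to every value in a column. The sketch below mimics three of them with plain Python functions; the helper names (`above`, `containing`, `between`, `where`) are hypothetical stand-ins written for this example, not the `datascience` code. The `between` helper includes the lower bound and excludes the upper bound, which is my understanding of how `are.between` is documented; check the reference above before relying on that.

```python
def above(threshold):
    """Build a test that is True for values greater than threshold."""
    return lambda value: value > threshold

def containing(text):
    """Build a test that is True for strings that contain text."""
    return lambda value: text in value

def between(low, high):
    """Build a test that is True when low <= value < high (assumed bounds)."""
    return lambda value: low <= value < high

def where(rows, position, test):
    """Keep the rows whose entry at the given position passes the test."""
    return [row for row in rows if test(row[position])]

# The flowers data as (petals, name) rows
rows = [(8, 'lotus'), (34, 'sunflower'), (5, 'rose')]

many_petals = where(rows, 0, above(6))
has_flower = where(rows, 1, containing('flower'))
mid_range = where(rows, 0, between(5, 8))

print(many_petals)
print(has_flower)
print(mid_range)
```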

### `.group()`

Group method groups the rows of the table based on a specified column, aggregating the data within each group.

```
grouped_table = movies.group('Director')
grouped_table.show()
```

In [None]:
# There is no "Director" column... we already know that!
grouped_table = movies.group('Director')
grouped_table.show()

As it turns out, the movies table is not a good table to use as an example for the following methods:
* group
* join
* pivot
* barh

So we are just going to skip over those for now, and return to them later; for the most part, they really don't come up until Chapters 7 and 8 anyway.

### `.scatter()`

Scatter method creates a scatter plot using the data from the specified columns in the table.

```
movies.scatter('Year', 'Revenue (Millions)')
```

In [None]:
movies.scatter('Year', 'Revenue (Millions)') # Another hallucination!

ChatGPT is making stuff up again; we asked it to use the `movies` table from our notebook, but it just sort of "guessed" at what the column names would be.

But the bottom line is that `scatter` will make an (x,y) plot where:
* there is one point for every row in the table
* the first label we specify gives the x values
* the second label we specify gives the y values

Here are some examples:


In [None]:
movies.labels  # let's see the column names (labels) that DO exist

In [None]:
movies.show(2) # let's see the first two rows

In [None]:
movies.scatter('Year','Average Ticket Price')

In [None]:
movies.scatter('Average Ticket Price','Total Gross')

In [None]:
movies.scatter('Year','Total Gross')

In [None]:
movies.scatter('Year','Number of Movies')

## If there's extra time

To ask ChatGPT a question inside our notebook, we can create a code cell and put this in it:

```
%%ai openai-chat:gpt-3.5-turbo
```

Let's try asking it for some data about the UCs.

In [None]:
%%ai openai-chat:gpt-3.5-turbo
Please give me a table of the University of California campuses, with the name of each, the year it was founded, the number of undergrad students, the number of grad students, and the number of total faculty.
    

## Building a Data Science table with this data

We can build a data science table with this data by rewriting it like this:


In [None]:
Table().with_columns( 'Campus Name', make_array("UC Berkeley", "UC Davis"))

We can add the Year founded column like this:

In [None]:
ucs = Table().with_columns( 'Campus Name', make_array("UC Berkeley", "UC Davis"),
                            'Year Founded', make_array(1868, 1905) )
ucs

Adding more columns works the way you'd expect:

In [None]:
ucs = Table().with_columns( 'Campus Name', make_array("UC Berkeley", "UC Davis"),
                            'Year Founded', make_array(1868, 1905),
                            'Undergrads', make_array(31853, 30986) )
ucs

Can the AI put this together for us?   Let's see:

In [None]:
%%ai openai-chat:gpt-3.5-turbo
Please give me a table of the University of California campuses, with the name of each, the year it was founded, the number of undergrad students, the number of grad students, and the number of total faculty.

Please give me the table with ordinary table formatting, but then please also give me the code to build a table using
the Python datascience module, following this example:

ucs = Table().with_columns( 'Campus Name', make_array("UC Berkeley", "UC Davis"),
                            'Year Founded', make_array(1868, 1905),
                            'Undergrads', make_array(31853, 30986) )
ucs

In [None]:


ucs = Table().with_columns(
        'Campus Name', make_array("UC Berkeley", "UC Davis", "UC Irvine", "UC Los Angeles", "UC Merced", "UC Riverside", "UC San Diego", "UC San Francisco", "UC Santa Barbara", "UC Santa Cruz"),
        'Year Founded', make_array(1868, 1905, 1965, 1919, 2005, 1954, 1960, 1873, 1944, 1965),
        'Undergrad Students', make_array(31853, 30986, 29588, 31577, 7442, 20640, 30285, 0, 21685, 17052),
        'Grad Students', make_array(11922, 7029, 6549, 16526, 1030, 4315, 8825, 3738, 3603, 1601),
        'Total Faculty', make_array(2383, 3256, 1526, 4010, 192, 1449, 3594, 1310, 1413, 596)
)
ucs

We'll do a lot more with this table in the classes to come!

Let's review the "cheat sheet" of things you can do with a Table built with the Python datascience module:
* <https://www.data8.org/sp22/python-reference.html>

If there's still time, we can try some of those things on this table.  You are encouraged to explore that on your own, after class as well.