In [None]:
# Click into this cell and press shift-enter before using this notebook.
# This line loads the ability to use %%ai in your file
%load_ext jupyter_ai_magics
# These lines import the Python modules we commonly use in CMPSC 5A
from datascience import *
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plots

# Lecture 9, CMPSC 5A, F25

10/23/2025, Thursday of Week 4

# Notes to self 
* Check font size
* Check that you are sharing the screen on the zoom session
* Ask staff to help monitor the zoom chat
* Remind students to run the top cell

## Where are we in the reading?

You should have already read:
* [Chapter 1: What is Data Science](https://inferentialthinking.com/chapters/01/what-is-data-science.html)
* [Chapter 2 (Causality and Experiments)](https://inferentialthinking.com/chapters/02/causality-and-experiments.html)
* [Chapter 3: Programming in Python](https://inferentialthinking.com/chapters/03/programming-in-python.html)
* [Chapter 4: Data Types](https://inferentialthinking.com/chapters/04/Data_Types.html)
* [Chapter 5 (Sequences)](https://inferentialthinking.com/chapters/05/Sequences.html)
* [Chapter 6 (Tables)](https://inferentialthinking.com/chapters/06/Tables.html)
* [Chapter 7 (Visualizations)](https://inferentialthinking.com/chapters/07/Visualization.html)
* [Chapter 8 (Functions and Tables: Intro)](https://inferentialthinking.com/chapters/08/Functions_and_Tables.html)
* [Section 8.1 (Applying a Function to a Column)](https://inferentialthinking.com/chapters/08/1/Applying_a_Function_to_a_Column.html)


## Upcoming Quiz, Midterm

As a reminder, here are the dates for the next quiz and the next midterm:

| Item | % of grade | Dates | Week(s) |
|-|-|-|-|
|Quiz2 | approx 2.5% | Friday Oct 24, and Monday Oct 27 | Week 4/5|
|Exam1 (midterm) | 15% |Friday Oct 31, and Monday Nov 3 | Week 5/6|
|Exam2 (final*) | 15% | Mon Dec 8, Tue Dec 9, Wed Dec 10 | Finals Week|

\*Also, during the official final exam slot, Tue Dec 9, 4-7pm, times tbd

## Register now:

* Go to <https://us.prairietest.com>
* Find `CMPSC 5A`
* Find `Exams available for reservation`
* Quiz2 and Exam1 are available for making reservations *now*.
* Please make your reservations early.

**Please watch your email over the next few days** for information about practice quizzes.


# ic12-10.23

* <https://www.gradescope.com/courses/1150376/assignments/7036408>

## Reminder to practice for Quiz2

PracticeQuiz2 at <https://us.prairielearn.com> has five questions on it.

* Then, Friday Oct 24/Monday Oct 27, Quiz 2 will be structured the same
  * The problems are the same; you'll just get different variants
  * Just one attempt per problem.
  * So make them count!
  * Quiz 2 will count the same as a lab; about 2.5% of your course grade.
* Then, Friday Oct 31/Monday Nov 3, Exam 1 (Midterm) will include all of these questions, plus more
  * There will be a second practice quiz next week for the additional question types for the Midterm
  * The midterm will have at least 10 question types (including the five from Quiz2), but no more than 20.
  * The midterm is 15% of your course grade.

So: practice, practice, practice!

## Tables we'll use in today's lecture

Let's load these two tables first, since the examples we'll review below need them

In [None]:
courseTable = Table.read_table("data/ucsb-s25-courses.csv")
movies = Table.read_table("data/movies_by_year_with_ticket_price.csv")

## Review of lecture08

* We reviewed a handy function for randomly sampling a table

In [None]:
def randomSample(myTable, size):
    n = myTable.num_rows 
    allRowIndexes = np.arange(n) 
    randomIndexes = \
      np.random.choice(
        allRowIndexes, size, replace=False
      )
    randomRows = myTable.take(randomIndexes)
    return randomRows

* We reviewed how we can break the COURSEID field from the courses table apart into its different parts using functions such as these:

In [None]:
def courseIdToDept(courseId):
    return courseId[0:8].strip()

def courseIdToCourseNum(courseId):
    return courseId[8:13].strip()

def courseIdToSuffix(courseId):
    return courseId[14:].strip()
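To see what the slices in these functions do, here is a small sketch on a made-up fixed-width string. The value of `sample_id` is an assumption for illustration (8 characters of department followed by 5 characters of course number); it is not copied from the course data.

```python
# Made-up id, NOT from the real table: 8 characters for the
# department, then 5 characters for the course number.
sample_id = "CMPSC   " + "  130"

dept_part = sample_id[0:8]    # characters 0 through 7, padding included
num_part = sample_id[8:13]    # characters 8 through 12, padding included

# repr() shows the quotes, so the padding spaces are visible
print(repr(dept_part), "->", repr(dept_part.strip()))
print(repr(num_part), "->", repr(num_part.strip()))
```

The slice keeps the padding spaces; `.strip()` is what removes them.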

We reviewed how we can use a for loop and a call to the print function 
with a Python format string to demonstrate these functions working:

In [None]:
courseIds = randomSample(courseTable,5).column("COURSEID")

for courseId in courseIds:
   dept = courseIdToDept(courseId)
   courseNum = courseIdToCourseNum(courseId)
   suffix = courseIdToSuffix(courseId)
   print(f"courseId={courseId:15} dept={dept:9} courseNum={courseNum:6} suffix={suffix:3}")


* We reviewed how to add a column to a table using `apply`
  * Option 1: `table.apply(function, "Column Name")`
  * Option 2: `table.apply(function, 0 )   # 0 is a column number `
  * This makes a new `numpy.ndarray` that we can add to a table using `with_column`

Here's an example of using apply to add columns to a table

In [None]:
departments = courseTable.apply(courseIdToDept, 0)
courseNums = courseTable.apply(courseIdToCourseNum, 0)
suffixes = courseTable.apply(courseIdToSuffix, 0)
courses = courseTable.with_columns("dept",departments,"courseNum",courseNums,"suffix",suffixes)

In [None]:
courses

* We reviewed using the `where` method to create smaller tables from a larger one by choosing specific rows based on criteria

In [None]:
csCourses = courses.where("dept",are.equal_to("CMPSC"))
pstatCourses = courses.where("dept",are.equal_to("PSTAT"))
artHistory = courses.where("dept",are.equal_to("ARTHI"))

In [None]:
csCourses

In [None]:
pstatCourses

* We reviewed bar graphs, where you first group data to find counts:

In [None]:
desired_departments = make_array("CMPSC", "PSTAT", "MATH")
coursesSubset = courses.where("dept", are.contained_in(desired_departments))
coursesSubset.num_rows
selectedCoursesGroupedByDepartment = coursesSubset.group("dept")
selectedCoursesGroupedByDepartment

Then make a bar graph of the resulting counts:

In [None]:
selectedCoursesGroupedByDepartment.barh("dept")

* We mentioned that `help(are)` can be used to look up the *where predicates*

* We reviewed what a *predicate* is (a function that returns True or False)

* We reviewed how to use the where method, for example: `table.where("column", are.equal_to(value))`

* We talked about *data cleaning*, where we remove data that
  * Contains errors
  * Is inconsistent
  * DOES NOT accurately give us an answer to the question(s) we are asking (note the typo in last class's notes)
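As a reminder of what a predicate looks like outside of a table, here is a minimal sketch in plain Python. The function name `is_big`, the threshold of 100, and the enrollment numbers are all made up for illustration.

```python
def is_big(enrollment):
    """Predicate: True when enrollment is 100 or more (made-up threshold)."""
    return enrollment >= 100

enrollments = [250, 12, 100, 99]   # made-up values

# Keep only the values for which the predicate returns True
big_ones = []
for e in enrollments:
    if is_big(e):
        big_ones.append(e)

print(big_ones)
```

The `where` method does the same kind of thing with table rows: it keeps the rows for which the predicate is True.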


## Some functions from last time

Last time, we defined a few functions we'll need again today, so let's define those again now

In [None]:
def is_undergrad(courseid):
    """ returns true if course is undergrad, otherwise returns false """
    return courseid[8]==' ' or courseid[8]=='1'

def isGraduate(courseid):
    """ True if courseid[8] isn't a space and isn't a 1 (so it must be a graduate course) """
    return courseid[8] != ' ' and courseid[8] != '1'

def is_lecture(sectionNum):
    # if section number is divisible by 100, it's a lecture
    return sectionNum % 100 == 0

def tenOrMoreStudents(enrollment):
    return enrollment >= 10

Once we have these functions, we can use them like this:

In [None]:
courses.where("COURSEID", is_undergrad)

Now we can pull out all of the undergraduate courses and all of the graduate courses:


In [None]:
undergrad_courses = courses.where("COURSEID", is_undergrad)
graduate_courses = courses.where("COURSEID", isGraduate)

In [None]:
print(f"courses.num_rows = {courses.num_rows}")
print(f"undergrad_courses.num_rows = {undergrad_courses.num_rows}")
print(f"graduate_courses.num_rows = {graduate_courses.num_rows}")
print(f"undergrad_courses.num_rows + graduate_courses.num_rows = {undergrad_courses.num_rows + graduate_courses.num_rows}")

This gives us much more confidence that our functions are calculating correctly!
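The idea behind this check is that every course should satisfy exactly one of the two predicates. Here is a small sketch of the same check on a handful of made-up ids (only the character at index 8 matters; these strings are assumptions, not rows from the real table).

```python
def is_undergrad(courseid):
    return courseid[8] == ' ' or courseid[8] == '1'

def isGraduate(courseid):
    return courseid[8] != ' ' and courseid[8] != '1'

# Made-up ids: 8 characters of department, then the course number
sample_ids = [
    "CMPSC   " + "  8",
    "CMPSC   " + "130A",
    "CMPSC   " + "270",
    "PSTAT   " + "501",
]

num_undergrad = 0
num_grad = 0
for cid in sample_ids:
    if is_undergrad(cid):
        num_undergrad = num_undergrad + 1
    if isGraduate(cid):
        num_grad = num_grad + 1

# If the predicates are true opposites, the two counts add up to the total
print(num_undergrad, num_grad, len(sample_ids))
```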

We can also check random rows from our `graduate_courses` table:

In [None]:
randomSample(graduate_courses, 5)

Now we can apply all of these predicates to our dataset to get the courses
we want to count:
* Only undergrad courses
* Only lectures
* Only courses with 10 or more students

In [None]:
coursesToCount = courses \
    .where("COURSEID", is_undergrad) \
    .where("SECTION", is_lecture) \
    .where("ENROLLED", tenOrMoreStudents)
coursesToCount

# Using `group` for things other than counting

The most basic use of the `group` function is to group rows in a table based on a particular column, 
and then count up the values.

However, we can take it a step further using functions such as `sum`, `min`, `max`, and others; basically any function that can take a `numpy.ndarray` as its argument.

First, we'll show what this looks like.  Then we'll dive in a bit deeper.

Here's: `coursesToCount.group("dept",sum)`


In [None]:
coursesToCount.group("dept",sum)

What we get back is the 'sum' function applied to each column of data.
* The numbers in `ENROLLED sum` show the total of the enrollment
* The numbers in `MAXENROLL sum` show the total of the max enrollment, i.e. how big the class could be if full
* The numbers in `QUARTER sum` and `SECTION sum` are not really meaningful; they are adding up numbers that are really just identifiers.  We might have done well to exclude those columns before doing our grouping.
* Lecture sum shows us the number of rows for which that field is True.  Since that's every row of data, this actually shows us the same as the counts from the original `.group()` call.
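To build some intuition for grouping with a function, here is a rough plain-Python sketch of the idea: collect the values for each department, then apply the function to each collection. The rows are made up, and this is only an illustration of the concept, not how the `datascience` library implements `group`.

```python
# Made-up (dept, enrolled) rows
rows = [("ECON", 300), ("CMPSC", 120), ("ECON", 150), ("CMPSC", 80)]

# Step 1: collect the enrollment values for each department
values_by_dept = {}
for dept, enrolled in rows:
    if dept not in values_by_dept:
        values_by_dept[dept] = []
    values_by_dept[dept].append(enrolled)

# Step 2: apply a function (sum or max) to each department's values
sums = {}
maxes = {}
for dept in values_by_dept:
    sums[dept] = sum(values_by_dept[dept])
    maxes[dept] = max(values_by_dept[dept])

print(sums)
print(maxes)
```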

So really, a better way to use `group` with the `sum` function is to first use `select` so that we only have columns on which 
finding a sum is meaningful.  
    
For example, consider this table:

In [None]:
coursesToCount.select("COURSEID", "dept","ENROLLED","MAXENROLL")

If we take this table, and group by dept and take the sum, we get this:

In [None]:
coursesToCount.select("COURSEID", "dept","ENROLLED","MAXENROLL").group("dept",sum)

You can see that the COURSEID sum column is blank, because it doesn't make any sense to find the "sum" of the courseid values.   So, in the next example, we'll omit it.  Here is how we can find the subject areas with the most students enrolled.

In [None]:
coursesToCount \
    .select("dept","ENROLLED","MAXENROLL") \
    .group("dept",sum).sort("ENROLLED sum",descending=True)

We can also use the `max` function to determine which subject area has a single class with the largest enrollment.  This is applying the `max` function across all of the ENROLLED values in each group by subject area.

In [None]:
coursesToCount \
    .select("dept","ENROLLED","MAXENROLL") \
    .group("dept",max).sort("ENROLLED max",descending=True)

Let's give this table a name with an assignment statement

In [None]:
largestCourseSizes = coursesToCount \
    .select("dept","ENROLLED","MAXENROLL") \
    .group("dept",max).sort("ENROLLED max",descending=True)

In [None]:
largestCourseSizes

Suppose we wanted to add to this table the actual name of the course that is the largest one in each subject area.

First, here's an *incorrect* answer:

In [None]:
largestCourseSizes = coursesToCount \
    .select("COURSEID", "dept","ENROLLED","MAXENROLL") \
    .group("dept",max).sort("ENROLLED max",descending=True)
largestCourseSizes

We might think this is saying that ECON 199RA-2 is the largest class offered by ECON. But that is not the case. 

Instead, `ECON 199RA-2` is the `max` of the `COURSEID` column *lexicographically*, i.e. the one that would occur last if we sorted by this column!  That's not really meaningful if we are looking to identify the largest course (by enrollment) taught by ECON.
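Here is a small sketch of that pitfall in plain Python. The course ids and enrollments are made up for illustration; the point is only that `max` on strings compares them character by character.

```python
course_ids = ["ECON 1", "ECON 10A", "ECON 199RA"]   # made-up ids
enrollments = [450, 300, 12]                        # made-up enrollments

# max on strings: the one that would come last in sorted order
last_alphabetically = max(course_ids)

# What we actually want: the id in the same position as the largest enrollment
position_of_largest = enrollments.index(max(enrollments))
largest_course = course_ids[position_of_largest]

print(last_alphabetically)   # ECON 199RA
print(largest_course)        # ECON 1
```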

Instead, we can write a function that takes the `dept` as a parameter, and returns as its value the course id of the largest course in that department.  Then we can use `apply` to make a new `numpy.ndarray` of those values, and then we can add that column to the table.

Here's how that looks. First let's do it by hand for just `ECON`


Here are all of the ECON courses, in descending order by enrollment:

In [None]:
coursesToCount.where("dept", are.equal_to("ECON")).sort("ENROLLED",descending=True).show(5)

What we want is the `COURSEID` field from the top row.  So that's this:


In [None]:
coursesToCount.where("dept", are.equal_to("ECON")).sort("ENROLLED",descending=True).column('COURSEID').item(0)

**This doesn't work if there are ties.**  If several courses tie for the largest enrollment, this approach pulls out only one of them.  Maybe we'll come back later and show how we'd solve the problem if there were ties.  But it's a good start!

Let's turn this into a function.  Instead of hard coding `"ECON"`, we replace this with a parameter.

In [None]:
def deptToLargestCourse(dept):
    return coursesToCount.where("dept", are.equal_to(dept)).sort("ENROLLED",descending=True).column('COURSEID').item(0)

Now let's test this on a few departments:

In [None]:
deptToLargestCourse("ECON")

In [None]:
deptToLargestCourse("CMPSC")

In [None]:
deptToLargestCourse("CLASS")

Seems to be working.  Let's try the apply function now.

In [None]:
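# Build the grouped table first, then add the new column to it,
# so the apply runs on the same table we are adding the column to.
largestCourseSizes = coursesToCount \
    .select("dept","ENROLLED") \
    .group("dept",max).sort("ENROLLED max",descending=True)
largestCourseSizes = largestCourseSizes \
    .with_column("Largest Enrollment Course", largestCourseSizes.apply(deptToLargestCourse, "dept"))
largestCourseSizes.show()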
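Regarding the caveat about ties mentioned earlier, one possible approach is to find the largest enrollment first, and then keep *every* course that has that enrollment. Here is a minimal sketch of that idea in plain Python with made-up data; it is not the approach from lecture, just one way it could be done.

```python
course_ids = ["ECON 1", "ECON 2", "ECON 10A"]   # made-up ids
enrollments = [300, 300, 150]                   # made-up; note the tie at 300

largest = max(enrollments)

# Keep every course whose enrollment equals the largest value
tied_for_largest = []
for i in range(len(course_ids)):
    if enrollments[i] == largest:
        tied_for_largest.append(course_ids[i])

print(tied_for_largest)
```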

## Bar Graphs

We can now make some bar graphs with this data.

Here's a bar graph of the number of courses offered by each department:


In [None]:
counts = coursesToCount.group("dept")
counts

In [None]:
counts.barh("dept") # Bar graph of count, by department

Or we can make a bar graph by enrollment:

In [None]:
enrollment = coursesToCount.select("dept","ENROLLED").group("dept",sum).sort("ENROLLED sum",descending=True)
enrollment.show(10)

In [None]:
enrollment.barh("dept","ENROLLED sum")


## The rest of lecture

* A weird bug with `.show()`
* `print` vs. `return`
* `list` vs. `numpy.ndarray`
* Some slides about group, pivot and join
* Some time to work on ic13 and ic14


## A weird bug!

Here's a weird bug that someone had with their code during office hours.

It came up in the context of lab02, so I've changed some of the details to not give away answers, but it's pretty similar.



In [None]:
courseTable = Table.read_table("data/ucsb-s25-courses.csv")

In [None]:
def courseIdToDept(courseId):
    return courseId[0:8].strip()

def courseIdToCourseNum(courseId):
    return courseId[8:13].strip()

def courseIdToSuffix(courseId):
    return courseId[14:].strip()

def isLecture(sectionNum):
    # if section number is divisible by 100, it's a lecture
    return sectionNum % 100 == 0

In [None]:
courses = courseTable.with_columns(
    "dept", courseTable.apply(courseIdToDept, 0),
    "courseNum", courseTable.apply(courseIdToCourseNum, 0),
    "suffix", courseTable.apply(courseIdToSuffix, 0),
    "isLecture", courseTable.apply(isLecture, "SECTION")
)


In [None]:
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed")).show(4) 

In [None]:
csLectures.group("INSTRUCTOR")

In [None]:
csLectures

Wait what?  What happened?

Here's what we had before:

In [None]:
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed")).show(4) 

In [None]:
csLectures

Why?!!  This leads us to explore "print" vs. "return"

## Print vs. Return

Let's look at two function definitions:

In [None]:
def addNumbers(a, b):
    return a + b

def sumNumbers(a, b):
    print(a + b)

Suppose we use call expressions with each of these:

In [None]:
addNumbers(3, 5)

In [None]:
sumNumbers(3, 5)

It looks like they are the same!  But they are not!

One way to see the difference is what happens when we assign the result.  Let's look at the values, and the types we get back:

In [None]:
result1 = addNumbers(3,5)
result1

In [None]:
type(result1)

In [None]:
result2 = sumNumbers(3,5)

In [None]:
result2

In [None]:
type(result2)

Let's summarize that so we don't have to keep scrolling up and down to see the results:

<table>
    <thead>
        <tr>
            <th>With Return</th>
            <th>With Print</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td markdown="1">
<pre>
def addNumbers(a, b):            
    return a + b
</pre>
            </td>
            <td markdown="1">
<pre>
def sumNumbers(a, b):         
    print(a + b)
</pre>
            </td>
        </tr>
        <tr>
            <td>return doesn't require ( ) </td>
            <td>print requires ( ) </td>
        </tr>
          <tr>
            <td>Assigning to variable <em>works</em>; the sum is <em>returned</em></td>
            <td>Assigning to a variable <em>doesn't work</em>; <code>None</code> is returned</td>
        </tr>
    </tbody>
</table>

## *Every* function returns a value, but it's sometimes `None`



One way to understand this more easily is if you memorize this rule:
* If a function encounters a `return` statement, the expression after the `return` is what we *get* as the *result* of the function call, i.e. the value we can assign to a variable.
* If a function *has no `return`* statement, then at the end, there is an implied statement like this:
  ```
  return None
  ```


As a result, this is what happens:
<table>
    <thead>
        <tr>
            <th>If you write this function:</th>
            <th>What you get is actually this function:</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td markdown="1">
<pre>
def sumNumbers(a, b):         
    print(a + b)
</pre>
            </td>
            <td markdown="1">
<pre>
def sumNumbers(a, b):         
    print(a + b)
    return None
</pre>
            </td>
        </tr>
        <tr>
            <td>A function with no explicit <code>return</code> statement... </td>
            <td>... turns into one with <code>return None</code> as the last line </td>
        </tr>
    </tbody>
</table>


That doesn't mean that functions that `return None` are bad, or useless!  

It just means we have to be aware of this behavior.
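A familiar example from plain Python: the list method `.sort()` does useful work (it reorders the list in place) but returns `None`, while the built-in function `sorted` returns a new list. A short sketch, with made-up numbers:

```python
nums = [3, 1, 2]

# .sort() changes the list itself and returns None
result_of_sort = nums.sort()
print(nums)              # [1, 2, 3]
print(result_of_sort)    # None

# sorted() leaves the original alone and returns a new list
original = [3, 1, 2]
new_list = sorted(original)
print(original)          # [3, 1, 2]
print(new_list)          # [1, 2, 3]
```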

    



### The `.show()` function of the `Table()` object returns `None`

That's why this doesn't do what we want:

In [None]:
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed")).show(4) 
csLectures.barh("COURSEID","ENROLLED")

## What's the fix?

The fix is to remove the `.show(4)` method from the end of the chain.

We can chain the `.where(...)` method calls because each of them *returns* a new Table object.

But `show(4)` returns `None`, which is why we get the error `'NoneType' object has no attribute 'barh'`

Here's that code where we remove the `show(4)` from the end of the chain, and
put it on a separate line.

Now the bar graph works!

In [None]:
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed"))
csLectures.show(4)
csLectures.barh("COURSEID","ENROLLED")
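To see why chaining works for some methods and not others, here is a sketch using a tiny made-up class. `MiniTable` is hypothetical and exists only for this illustration; it is not the `Table` class from the `datascience` module. Its `where` method returns a new `MiniTable`, while its `show` method only prints.

```python
class MiniTable:
    """Hypothetical stand-in for a table; NOT the datascience Table class."""

    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, predicate):
        # Returns a NEW MiniTable, so another method call can follow
        kept = []
        for r in self.rows:
            if predicate(r):
                kept.append(r)
        return MiniTable(kept)

    def show(self, n):
        # Only prints; there is no return statement, so the result is None
        for r in self.rows[:n]:
            print(r)

def is_even(x):
    return x % 2 == 0

def bigger_than_two(x):
    return x > 2

t = MiniTable([1, 2, 3, 4, 5, 6])

chained = t.where(is_even).where(bigger_than_two)          # a MiniTable
oops = t.where(is_even).where(bigger_than_two).show(1)     # prints 4; oops is None

print(type(chained))
print(oops)
```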

# What if I wanted a bar graph of just the first four courses?

So we had this code, and it worked, but what we got was:

* A display of the top four rows in the table
* A bar graph of ALL of the courses in the table

In [None]:
coursesToCount.labels

In [None]:
csLectures = coursesToCount \
   .where("dept",are.equal_to("CMPSC")) \
   .where("SECTION",is_lecture) \
   .where("STATUS",are.not_equal_to("Closed"))
csLectures.show(4)
csLectures.barh("COURSEID","ENROLLED")


What if we wanted a bar graph of just the first four rows in the table?

That's where we need to understand the difference between `show(4)` and `take([0,1,2,3])`

* `.show(4)` means "show me the first four rows in the table, but leave the table unchanged".
  - It's similar to `print` in that it just *displays* something on the *screen* but that's *all* it does.
  - And, after it displays, we get *nothing* back that we can assign or work with.
* `take([0,1,2,3])` by contrast, *gives us something back*: a new table with the rows 0, 1, 2 and 3.
  - Remember that the row numbers start at 0, not at 1.
  - So, the row numbers of the first four rows are 0,1,2,3

Here's what that looks like; notice a few things:
* Note that we *can* chain the take to the end of the chain of where's
* Note that we can assign the result to the variable `csLectures`
* And notice that when we use that variable with the `barh`, we get only the first four courses.

In [None]:
csLectures = coursesToCount \
   .where("dept",are.equal_to("CMPSC")) \
   .where("SECTION", is_lecture) \
   .where("STATUS",are.not_equal_to("Closed")) \
   .take([0,1,2,3])
csLectures.barh("COURSEID","ENROLLED")

## We often use `np.arange(...)` instead of a list of specific numbers

In this code, we used `[0,1,2,3]`, a list of four integers, to specify the row numbers.

#### But, it's more common to use `np.arange(4)` which gives us this automatically:

In [None]:
np.arange(4)

Compare these two ways of specifying *take the first four rows*:

<table>
    <thead>
        <tr>
            <th><code>take()</code> argument is <code>[0,1,2,3]</code></th>
            <th><code>take()</code> argument is <code>np.arange(4)</code></th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td >
<pre>
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed")) \
   .take([0,1,2,3])
csLectures.barh("COURSEID","ENROLLED")
</pre>
            </td>
            <td>
<pre>
csLectures = courses \
   .where("dept",are.equal_to("CMPSC")) \
   .where("isLecture",are.equal_to(True)) \
   .where("STATUS",are.not_equal_to("Closed")) \
   .take(np.arange(4))
csLectures.barh("COURSEID","ENROLLED")
</pre>
</td>
</tr>
</tbody>
</table>



Though these both do the same thing:
* The `np.arange(4)` is preferred because it gets right to the point: we want the first four rows.
* Coding is about more than just getting the right result; it's about writing the code in a way that in the future, both *you* and *other people* can read the code and understand it.
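As a quick check that the two forms name the same row numbers, here is a small sketch. It also shows why `np.arange` gets more convenient as the number of rows grows (the value 50 is just an arbitrary example).

```python
import numpy as np

rows_as_list = [0, 1, 2, 3]
rows_as_array = np.arange(4)

print(rows_as_array)
print(list(rows_as_array) == rows_as_list)   # same numbers, in the same order

# For the first 50 rows, typing out the list by hand would be tedious
first_fifty = np.arange(50)
print(len(first_fifty), first_fifty[0], first_fifty[49])
```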


## Why it is important to be able to understand the code

It's understandable that at first, you may just be focused on *getting the result* and not on whether code is readable/understandable.

But, readability/understandability are important for many reasons; here are two:
* _Reliability_: If people will be relying on your data, tables, and visualizations (charts and graphs), you want to be sure they are accurate. That means you need to be able to read the code to be *sure* it's working as intended.
* _Reuse_: You, or others, may need to do similar data analysis in the future, and you may want to look back at code you've already written to use it as a model for some new code.   If the code is hard to understand, that will be harder to accomplish.

### Back to print vs return



One of the advantages of functions that return values is that we can use the values they return in a new expression.

For example:

In [None]:
def addNumbers(a, b):
    return a + b

# Multiplies 2+3 times 3+4
# That is 5 times 7, giving 35
x = addNumbers(2,3) * addNumbers(3,4)
x

BUT this doesn't work if you try it with functions that just *print* the sum:

In [None]:
def sumNumbers(a, b):
    print(a + b)

# This does *not* multiply 2+3 times 3+4
# It prints 5 and prints 7
# But each part returns None and you can't multiply None * None
x = sumNumbers(2,3) * sumNumbers(3,4)
x
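If you want to see the error message without stopping the notebook, one option is to wrap the expression in `try`/`except`. This is just a sketch for exploring the error; `try`/`except` is not something we have covered in this course yet.

```python
def sumNumbers(a, b):
    print(a + b)

try:
    x = sumNumbers(2, 3) * sumNumbers(3, 4)
except TypeError as err:
    # Both calls print their sums, then the multiplication of None * None fails
    message = str(err)
    print("TypeError:", message)
```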

### Functions that return values can be nested:

In [None]:
x   = addNumbers(addNumbers(1,2), addNumbers(3,4))
# x = addNumbers(        3      ,       7        )
# x =                     10
x

## Lists vs. Arrays

Let's compare two ways of grouping things in Python: lists and arrays

| Lists | Arrays |
|-------|--------|
| Built in to Python | Requires `import numpy as np` | 
| Create with `[2, 3, 4]` |  Create with `make_array(2, 3, 4)` |
| `type` returns `list`      | `type` returns `np.ndarray` |
| Can contain things of different types | Must all be the same type; will be coerced to string if you try different types |



In [None]:
# Example of lists

things = ["UCSB", True, 1944, 10.0]
type(things)

In [None]:
print(f"type(things[0])={type(things[0])}")
print(f"type(things[1])={type(things[1])}")
print(f"type(things[2])={type(things[2])}")
print(f"type(things[3])={type(things[3])}")

In [None]:
nums = [4, 6, 23, 9]
nums

In [None]:
schools = ["UCSB", "UCSD", "UCLA"]
schools

In [None]:
# Examples of arrays

schools_array = make_array("UCSB", "UCSD", "UCLA")
type(schools_array)

In [None]:
# If you try to make a numpy.ndarray with things of different types
# they all become strings
things_array = make_array("UCSB", True, 1944, 10.0)
things_array

In [None]:
print(f"type(things_array[0])={type(things_array[0])}")
print(f"type(things_array[1])={type(things_array[1])}")
print(f"type(things_array[2])={type(things_array[2])}")
print(f"type(things_array[3])={type(things_array[3])}")

## What lists and arrays have in common

You can use either one to make a new column in a table. 

You can even mix and match:


In [None]:
abbreviations = ["UCSB","UCSD","UCLA"]
official_towns = make_array("Santa Barbara", "San Diego", "Los Angeles")
real_towns = ["Goleta", "La Jolla", "Westwood" ]

schools = Table().with_columns("Abbreviations",abbreviations,"Official",official_towns,"Real",real_towns)
schools

The main advantage of a numpy.ndarray is the math you can do with it

In [None]:
# This works
distances_in_miles = make_array(40, 28, 104)
times_in_minutes = make_array(60, 75, 98)
times_in_hours = times_in_minutes/60
times_in_hours

In [None]:
speeds_in_miles_per_hour = distances_in_miles / times_in_hours
speeds_in_miles_per_hour

### Divide a numpy.ndarray by a scalar

As you can see, we can treat numpy.ndarrays as *vectors*, i.e. a one-dimensional list of numbers.

A single number like `60` above is called a *scalar*.

When we divide a numpy.ndarray (vector) by a scalar, it "scales" the numbers like this:

```python
times_in_hours = times_in_minutes/60
```

So each element in `times_in_hours` is one of the elements of `times_in_minutes` divided by 60.
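To make "each element divided by 60" concrete, here is a sketch that does the same division two ways: once on a numpy array, and once by hand on each number. It uses `np.array` directly rather than `make_array` so that it runs with only numpy installed.

```python
import numpy as np

times_in_minutes = np.array([60, 75, 98])

# One division on the whole array...
times_in_hours = times_in_minutes / 60

# ...gives the same numbers as dividing each element separately
by_hand = [60 / 60, 75 / 60, 98 / 60]

print(times_in_hours)
print(by_hand)
```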

This also works with addition, subtraction and multiplication:

In [None]:
nums = make_array(1, 2, 4)
bigger_nums = nums + 1
bigger_nums

In [None]:
smaller_nums = nums - 1
smaller_nums

In [None]:
much_bigger_nums = nums * 100
much_bigger_nums

### These same tricks do NOT work on plain old python lists

In [None]:
nums_list = [1, 2, 4]
bigger_nums_list = nums_list + 1
bigger_nums_list

In [None]:
# This works, but it doesn't do what you think it does
# (at least not if you expect to get back [3, 6, 12])
nums_list = [1, 2, 4]
bigger_nums_list = nums_list * 3
bigger_nums_list

That's right, multiplying a list by an integer just repeats the list multiple times.

So, lists are fine, but for doing math and calculations, `numpy.ndarrays` are often more convenient.
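If you do need element-by-element math on a plain list, you have to visit each element yourself. Here is a sketch of one way to do that with a `for` loop, next to what `*` does on a list; the variable names are made up for this example.

```python
nums_list = [1, 2, 4]

# Multiplying a list by an integer repeats the list
repeated = nums_list * 3
print(repeated)      # [1, 2, 4, 1, 2, 4, 1, 2, 4]

# To triple each element, we have to loop over the list ourselves
tripled = []
for n in nums_list:
    tripled.append(n * 3)
print(tripled)       # [3, 6, 12]

# Same idea for adding 1 to each element
plus_one = []
for n in nums_list:
    plus_one.append(n + 1)
print(plus_one)      # [2, 3, 5]
```

With a `numpy.ndarray`, each of those loops collapses to a single expression such as `nums * 3` or `nums + 1`.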