<a href="https://colab.research.google.com/github/veyselberk88/Data-Science-Tools-and-Ecosystem/blob/main/lec05.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Lecture 05: Sequences

Associated Textbook Sections: [5.0, 5.1, 5.2, 5.3](https://ccsf-math-108.github.io/textbook/chapters/05/Sequences.html)

---

## Overview

* [Sequences](#Sequences)
* [Beyond the Python Library](#Beyond-the-Python-Library)
* [Arrays](#Arrays)
* [Indexing Sequences](#Indexing-Sequences)
* [NumPy](#NumPy)

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np

---

## Sequences

A sequence data type represents an ordered collection of items.

<img src="./sequence_blocks.png"  alt="A sequence visualized as a collection of blocks" width = 40%>


---

### Built-In Sequence Data Types

There are several built-in sequence data types in Python such as:
*  Strings (`str`):
    * A text sequence of characters
    * `"data science"`
*  Lists (`list`):
    * A sequence of items that can mix data types
    *  `['a', 1, max]`
*  Ranges (`range`):
    * A sequence of numbers
    * `range(10)`
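
As a minimal sketch of what these three types have in common, the loop below applies the same operations (`len`, first item, last item) to each of them. The variable names are chosen for illustration only.

```python
# Three built-in sequence types, each holding ordered items.
a_string = "data science"
a_list = ["a", 1, max]
a_range = range(10)

for seq in (a_string, a_list, a_range):
    # len counts the items; index 0 is the first item and index -1 the last.
    print(type(seq).__name__, len(seq), seq[0], seq[-1])
```

Because all three are sequences, the same code works on each without knowing which type it was given.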

---

### Demo: Built-In Sequence Types

Create a string and notice that its length includes the blank space.

In [None]:
a_string = "data science"
a_string

'data science'

In [None]:
len(a_string)

12

---

Create a list, notice that it contains a mixture of data types, and check its length.

In [None]:
a_list = ["a",2,max]
a_list

['a', 2, <function max>]

In [None]:
len(a_list)

3

In [None]:
type(a_list)

list

---

Create a range from 0 up to (but not including) 10.

In [None]:
a_range = range(10)
a_range

range(0, 10)

In [None]:
list(a_range)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
len(a_range)

10

---

Convert the range to a list to see the items.

In [None]:
...

---

## Beyond the Python Library

### Modules, Libraries, and Packages

* Modules, libraries, and packages are collections of Python code
    * Python has a built-in module called `math` that contains a collection of mathematical functions and values
    * `datascience` is a package written by staff at UC Berkeley for this course
    * `Matplotlib` is a standard data visualization library
* The `import` command is a way to load modules, libraries, and packages into the coding environment
    * `from datascience import *` imports everything from the `datascience` package
    * `import numpy as np` imports the NumPy package and provides it with the common shorter name (alias) `np`
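
The import forms described above can be sketched as follows. This example uses only the built-in `math` module and NumPy; the variable names are illustrative assumptions.

```python
import math            # load the built-in math module
import numpy as np     # load NumPy under the alias np

# Names inside a module are reached with dot notation.
circle_area = math.pi * 3 ** 2   # area of a circle with radius 3
root = math.sqrt(16)

# The alias np stands in for the full name numpy.
evens = np.arange(0, 10, 2)

print(circle_area, root, evens)
```

With `import math`, every name keeps its `math.` prefix, which makes it clear where a function comes from. `from ... import *` drops the prefix, which is convenient in a notebook but can hide a name's origin.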

---

## Arrays

---

### Arrays

* An array (`numpy.ndarray`) is a sequence type from the `NumPy` package
* All elements in an array have the same data type
* Math operations work on each element of the array separately
* When two arrays of the same length are added, the elements are added position by position
* Arrays are generally more efficient than lists for numerical work on large datasets
* We'll frequently use arrays instead of lists in this course
* You can make an array with the `make_array` function from the `datascience` package
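
To illustrate how arrays differ from lists, here is a small sketch. It uses `np.array` as a stand-in for `make_array` (an assumption made so the example runs without the `datascience` package); for these inputs the resulting arrays should behave the same way.

```python
import numpy as np

# np.array is used here in place of make_array (assumption, see above).
a_list = [1, 2, 3]
an_array = np.array([1, 2, 3])

list_times_two = a_list * 2      # repeats the list: [1, 2, 3, 1, 2, 3]
array_times_two = an_array * 2   # doubles each element: array([2, 4, 6])

# Mixing an int and a float gives an array in which every element is a float.
mixed = np.array([1, 2.5])

print(list_times_two, array_times_two, mixed.dtype)
```

The same `*` operator means repetition for a list and element-by-element multiplication for an array, which is one reason arrays are the more natural choice for numerical data.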

---

### Demo: Arrays

Create an array using `make_array`.

In [None]:
my_array = make_array(1,2,3,4)
my_array

array([1, 2, 3, 4])

In [None]:
type(my_array)


numpy.ndarray

---

Explore using array arithmetic and notice that the operations create a new array.

In [None]:
my_array*2

array([2, 4, 6, 8])

In [None]:
my_array**2

array([ 1,  4,  9, 16])

In [None]:
my_array+1

array([2, 3, 4, 5])

In [None]:
# my_array is unchanged
my_array

array([1, 2, 3, 4])

---

Use functions such as `len` and `sum` on an array.

In [None]:
len(my_array)


4

In [None]:
# Built-in sum
sum(my_array)

10

In [None]:
# NumPy sum - optimized for arrays
np.sum(my_array)

10

---

Array arithmetic follows some rules. For example, you can add two arrays only if they have the same length (or one of them has length 1).

In [None]:
another = make_array(10,20,30,40)

In [None]:
my_array+another

array([11, 22, 33, 44])

In [None]:
yet_another = make_array(100,200,300)

In [None]:
# A ValueError
my_array + yet_another

ValueError: operands could not be broadcast together with shapes (4,) (3,) 

In [None]:
np.sum(another)/len(another)

25.0
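
The length rule shown above can be explored a little further. The sketch below is an illustration, not part of the original demo: it catches the `ValueError` instead of letting it stop the cell, shows that a single number or a length-1 array is stretched to match, and computes the same mean with `np.mean`. It uses `np.array` in place of `make_array` as an assumption.

```python
import numpy as np

four = np.array([1, 2, 3, 4])
three = np.array([100, 200, 300])
another = np.array([10, 20, 30, 40])

# Lengths 4 and 3 cannot be matched up, so NumPy raises a ValueError.
try:
    four + three
except ValueError as err:
    message = str(err)

# A single number, or an array of length 1, is applied to every element.
shifted = four + 10
also_shifted = four + np.array([10])

# np.mean gives the same result as np.sum(another) / len(another).
mean_value = np.mean(another)

print(message, shifted, also_shifted, mean_value)
```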

---

You can make an array with non-numeric data types.

In [None]:
tunas_array = make_array("bluefin","albacore","jim")
tunas_array

array(['bluefin', 'albacore', 'jim'],
      dtype='<U8')

---

## Indexing Sequences

---

### Indexes

<img src="./sequence_blocks_with_indices.png"  alt="A sequence visualized as a collection of blocks with the index values included" width = 45%>

Sequence data types have a variety of ways to access each item in the sequence:
* A standard way: Use the item's position number in the sequence (index) with bracket `[...]` notation
    * Indices start with `0`
* An array-specific way used in MATH 108: items in a NumPy array can also be accessed with the `.item()` method
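
As a small sketch of how index positions line up, the loop below checks that counting from the end with a negative index reaches the same item as counting from the front. The variable names are illustrative.

```python
city = "San Francisco"
n = len(city)

# For k from 1 to n, the index -k names the same item as the index n - k.
for k in range(1, n + 1):
    assert city[-k] == city[n - k]

# Index 0 is the first item; index -1 (equivalently n - 1) is the last.
first, last = city[0], city[-1]
print(first, last)
```

Valid indices therefore run from `0` to `n - 1` (or from `-n` to `-1`); anything outside that range raises an `IndexError`.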

---

### Demo: Indexing

Get the first character from a string.

In [None]:
another_string = "San Francisco"
another_string

'San Francisco'

In [None]:
another_string[0]

'S'

---

Get the last character from a string.

In [None]:
another_string[-1]

'o'

In [None]:
# An IndexError
another_string[13]

IndexError: string index out of range

In [None]:
another_string[12]

'o'

In [None]:
# Another way to access the last item in a sequence.
another_string[-1]

'o'

In [None]:
# And another way ...
num_items = len(another_string)
another_string[num_items-1]

'o'

---

Access items in an array using the `.item()` method and bracket notation.

In [None]:
an_array = make_array('eats', 'shoots', 'leaves')
an_array

array(['eats', 'shoots', 'leaves'],
      dtype='<U6')

In [None]:
an_array[0]

'eats'

In [None]:
an_array.item(0)

'eats'

In [None]:
an_array[-1]

'leaves'

In [None]:
an_array.item(-1)

'leaves'

---

Be careful! Check your data types, because bracket notation and `.item()` can return different types for the same element.

In [None]:
another_array = make_array(10, 20, 30)
another_array

array([10, 20, 30])

In [None]:
type(another_array[0])

numpy.int64

In [None]:
type(another_array.item(0))

int
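
One place where this difference can matter is code that expects a built-in `int`. The sketch below is an illustration using the standard `json` module and `np.array` in place of `make_array` (an assumption); it is not part of the original demo.

```python
import json
import numpy as np

counts = np.array([10, 20, 30])
by_brackets = counts[0]       # a NumPy integer scalar
by_item = counts.item(0)      # a built-in Python int

# Only the value returned by .item() is an instance of the built-in int.
brackets_is_int = isinstance(by_brackets, int)
item_is_int = isinstance(by_item, int)

# The json module does not know how to encode NumPy integer scalars.
try:
    json.dumps({"count": by_brackets})
    json_accepts_numpy = True
except TypeError:
    json_accepts_numpy = False

encoded = json.dumps({"count": by_item})
print(brackets_is_int, item_is_int, json_accepts_numpy, encoded)
```

For arithmetic the two values usually behave alike, so the distinction tends to show up only when a value is handed to code outside NumPy.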

---

## NumPy

---

### NumPy Functions

* NumPy is a Python library with a collection of tools optimized to work with arrays
* In our reference material, array tools commonly start with `np.`
* One function, `np.arange`, is especially helpful for generating arrays of evenly spaced numbers
    * General command: `np.arange(start, stop, step)`
    * The `start` and `step` values default to 0 and 1, respectively; `stop` is not included in the result
    * `np.arange(5)` creates the array `array([0, 1, 2, 3, 4])`
    * `np.arange(1, 5)` creates the array `array([1, 2, 3, 4])`
    * `np.arange(1, 5, 2)` creates the array `array([1, 3])`
    * `np.arange` is not the same as `range`: it returns a NumPy array, while `range` returns a `range` object
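
To make the contrast with `range` concrete, here is a minimal sketch. The variable names are illustrative, and the `try`/`except` blocks are only there so that the example runs to the end.

```python
import numpy as np

from_range = range(1, 5)        # a range object
from_arange = np.arange(1, 5)   # a NumPy array

# Arithmetic works element by element on the array ...
doubled = from_arange * 2

# ... but a range object does not support multiplication at all.
try:
    from_range * 2
    range_supports_multiply = True
except TypeError:
    range_supports_multiply = False

# np.arange accepts a non-integer step; range accepts only integers.
quarters = np.arange(0, 1, 0.25)
try:
    range(0, 1, 0.25)
    range_accepts_float_step = True
except TypeError:
    range_accepts_float_step = False

print(doubled, quarters, range_supports_multiply, range_accepts_float_step)
```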

---

### Demo: NumPy Functions

Generate arrays with `arange`.

In [None]:
np.arange(0,200,30)


array([  0,  30,  60,  90, 120, 150, 180])

In [None]:
type(np.arange(100))


numpy.ndarray

In [None]:
my_array = np.arange(0,200,30)
my_array

array([  0,  30,  60,  90, 120, 150, 180])

---

Demonstrate a few NumPy functions, attributes, and methods.

In [None]:
np.average(my_array)

90.0

In [None]:
np.diff(my_array)

array([30, 30, 30, 30, 30, 30])

In [None]:
np.cumsum(my_array)

array([  0,  30,  90, 180, 300, 450, 630])

In [None]:
# an array attribute
my_array.size

7

In [None]:
# an array method
my_array.min()

0

---

## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and is derived from <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a>, offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>