<a href="https://colab.research.google.com/github/veyselberk88/Data-Science-Tools-and-Ecosystem/blob/main/lec06.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="./ccsf.png" alt="CCSF Logo" width=200px style="margin:0px -5px">

# Lecture 06: Tables

Associated Textbook Sections: [3.4](https://ccsf-math-108.github.io/textbook/chapters/03/4/Introduction_to_Tables.html)

---

## Overview

* [Tables](#Tables)
* [Attributes and Properties](#Attributes-and-Properties)
* [Some Table Methods](#Some-Table-Methods)

---

## Set Up the Notebook

In [None]:
from datascience import *
import numpy as np

---

## Tables

---

### Early Beginnings

<a href="https://academic.oup.com/book/4975/chapter-abstract/147431903" title="Tables and tabular formatting in Sumer, Babylonia, and Assyria, 2500 bce–50 ce"><img src="./Shuruppag_data_table.jpeg" alt="The world’s oldest datable mathematical table, from Shuruppag, c. 2600 BCE.  The first two columns contain identical lengths in descending order from 600 to 60 rods (c. 3600–360 m) and the final column contains the square area of their product" width=40%></a>

Ancient Mesopotamia (modern-day Iraq):
* Sumer, Babylonia, and Assyria had clay tablets from around 2600-1600 BCE that provide examples of some of the earliest recorded numerical tables
* The tablets demonstrate their proficiency in recording mathematical and astronomical data

The image above shows the world's oldest datable mathematical table (on record), from the Sumerian city of Shuruppag, c. 2600 BCE. The first two columns contain identical lengths in descending order from 600 to 60 rods (c. 3600-360 m), and the final column contains their product, the area of the corresponding square.

---

### Table Structure

* The `datascience` library contains a data type called a `Table`.
* A `Table` is a sequence of labeled columns
* Each row represents one individual
* Data within a column represents one attribute of the individuals
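
As a rough mental model only (plain Python, not how `datascience` stores a `Table` internally), the structure above can be sketched as a dictionary that maps each label to a list of values. The names `columns`, `labels`, and `rows` are illustrative assumptions; the values are three rows from the states data used in this lecture.

```python
# Plain-Python sketch of a table: each label maps to one column of values.
columns = {
    'State': ['Alaska', 'Texas', 'California'],
    'Total Area (sq mi)': [665384.0, 268596.0, 163695.0],
}

# The labels are the column names, in order.
labels = tuple(columns)

# A row collects the values at the same position in every column,
# so each row describes one individual (here, one state).
rows = list(zip(*columns.values()))

print(labels)
print(rows[0])
```

Reading down one list gives a single attribute for every state, while reading across one row gives every attribute for a single state.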

<img src="./table_structure.png" alt="A table with the columns and rows indicated." width=50%>

---

### Loading Data

* Data analysis usually includes connecting to various data sources
* We focus on loading data from CSV files
    * Comma Separated Values
    * `.csv` extension
* The `Table.read_table` function will load the contents of a CSV into the notebook as a `Table`.
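
To make the CSV format concrete, here is a minimal sketch that parses a small inline sample with Python's standard `csv` module. It is not how `Table.read_table` is implemented; the variable names are assumptions, and the sample lines are three rows from the states data.

```python
import csv
import io

# A small inline sample in CSV format (three rows from the states data).
csv_text = """State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384.0,570641.0,94743.1
Texas,268596.0,261232.0,7364.75
California,163695.0,155779.0,7915.52
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # the first line holds the column labels
records = list(reader)  # each remaining line is one row, split at the commas

print(header)
print(records[0])  # values stay as text until they are converted
```

The first line of the file supplies the column labels, and every later line supplies one row.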

---

### Demo: Loading Data

<a href="https://upload.wikimedia.org/wikipedia/commons/b/bb/US_States_by_Total_Area.svg" title="US States by Total Area"><img src="US_States_by_Total_Area.svg" alt="US states by area" width=40%></a>

Import the data in `states_area.csv`, which contains land and water area data sourced from [Wikipedia](https://en.wikipedia.org/wiki/List_of_U.S._states_and_territories_by_area).

In [None]:
states = Table.read_table('states_area.csv')
states

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384.0,570641.0,94743.1
Texas,268596.0,261232.0,7364.75
California,163695.0,155779.0,7915.52
Montana,147040.0,145546.0,1493.91
New Mexico,121590.0,121298.0,292.15
Arizona,113990.0,113594.0,396.22
Nevada,110572.0,109781.0,790.65
Colorado,104094.0,103642.0,451.78
Oregon,98378.5,95988.0,2390.53
Wyoming,97813.0,97093.1,719.87


In [None]:
type(states)

datascience.tables.Table

---

## Attributes and Properties

A `Table` stores information about itself that we can access using dot notation. For example:
* `t.labels` - the labels of a table called `t`
* `t.num_columns` - the number of columns in `t`
* `t.num_rows` - the number of rows in `t`
* `t.rows` - a collection of all the rows in `t`

---

### Demo: Attributes and Properties

You can access the attributes of a table using dot notation, for example `labels`, `num_columns`, `num_rows`, and `rows`.

In [None]:
states.labels

('State', 'Total Area (sq mi)', 'Land Area (sq mi)', 'Water Area (sq mi)')

In [None]:
states.num_columns

4

In [None]:
states.num_rows

50

In [None]:
states.rows

Rows(State      | Total Area (sq mi) | Land Area (sq mi) | Water Area (sq mi)
Alaska     | 665384             | 570641            | 94743.1
Texas      | 268596             | 261232            | 7364.75
California | 163695             | 155779            | 7915.52
Montana    | 147040             | 145546            | 1493.91
New Mexico | 121590             | 121298            | 292.15
Arizona    | 113990             | 113594            | 396.22
Nevada     | 110572             | 109781            | 790.65
Colorado   | 104094             | 103642            | 451.78
Oregon     | 98378.5            | 95988             | 2390.53
Wyoming    | 97813              | 97093.1           | 719.87
... (40 rows omitted))

---

In [None]:
states.rows[15]  # index 15 is the 16th row, because indexing starts at 0

Row(State='Nebraska', Total Area (sq mi)=77347.809999999998, Land Area (sq mi)=76824.169999999998, Water Area (sq mi)=523.63999999999999)
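
Python counts positions from 0, so `states.rows[15]` is the sixteenth row of the table rather than the fifteenth. A minimal plain-Python sketch, where the list `positions` is a hypothetical stand-in for the 50 rows:

```python
# Positions start at 0, so index 15 refers to the 16th item.
positions = list(range(1, 51))  # stand-ins for rows 1 through 50

first = positions[0]       # the first item
sixteenth = positions[15]  # the sixteenth item

print(first, sixteenth)
```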

## Some Table Methods

There is a collection of methods (functions) associated with every `Table` created. For example:
* `t.show(n)` - displays the first `n` rows of a table called `t`
* `t.select(label)` - constructs a new table with just the specified columns
* `t.drop(label)` - constructs a new table in which the specified columns are omitted
* `t.sort(label)` - constructs a new table with rows sorted by the specified column
* `t.where(label, condition)` - constructs a new table with just the rows that match the condition
    * Initially, the `condition` will be built from predicates such as `are.above` and `are.equal_to`.
* More can be found in the [`datascience` documentation](https://datascience.readthedocs.io/en/master/tables.html)
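
Conceptually, a condition can be thought of as a function that takes one value and answers `True` or `False`, and filtering keeps only the entries for which the answer is `True`. The sketch below illustrates that idea in plain Python. The helper `above` is a hypothetical stand-in, not the `datascience` implementation of `are.above`, and the land areas are taken from the states data.

```python
def above(threshold):
    """Hypothetical stand-in for a predicate such as are.above."""
    return lambda value: value > threshold

# Land areas (sq mi) for four states from the lecture data.
land_area = {
    'Alaska': 570641.0,
    'Texas': 261232.0,
    'Oregon': 95988.0,
    'Wyoming': 97093.1,
}

condition = above(100_000)

# Build a new dictionary with only the entries that satisfy the condition;
# the original dictionary is left unchanged.
large = {state: area for state, area in land_area.items() if condition(area)}

print(large)
```

As with the table methods listed above, the filtering step produces a new collection instead of modifying the original.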

---

### Demo: `show`

Explore the `show` table method.

In [None]:
states.show(5)

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384,570641,94743.1
Texas,268596,261232,7364.75
California,163695,155779,7915.52
Montana,147040,145546,1493.91
New Mexico,121590,121298,292.15


In [None]:
states.show(3)

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384,570641,94743.1
Texas,268596,261232,7364.75
California,163695,155779,7915.52


In [None]:
# show displays rows but returns None, so the result is not a Table
states_show_3 = states.show(3)
type(states_show_3)
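
A function that only displays something, and has no `return` statement, gives back `None` in Python. The sketch below shows the same behavior with a hypothetical helper named `display_first`; it is an analogy in plain Python, not the `datascience` source code.

```python
def display_first(rows, n):
    """Hypothetical helper: prints the first n rows and returns nothing."""
    for row in rows[:n]:
        print(row)

result = display_first(['Alaska', 'Texas', 'California', 'Montana'], 3)

print(type(result))  # <class 'NoneType'>
```

This is why a display-only call is best placed at the end of a chain of methods: nothing further can be done with its result.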

---

### Demo: `select`

Use the `select` table method to select columns by column labels and column indexes.

In [None]:
y = states.select('State')
y

State
Alaska
Texas
California
Montana
New Mexico
Arizona
Nevada
Colorado
Oregon
Wyoming


In [None]:
states.select(0, 1)

State,Total Area (sq mi)
Alaska,665384.0
Texas,268596.0
California,163695.0
Montana,147040.0
New Mexico,121590.0
Arizona,113990.0
Nevada,110572.0
Colorado,104094.0
Oregon,98378.5
Wyoming,97813.0


In [None]:
# Raises a NameError: State is missing its quotes, so Python looks for a variable named State.
# states.select(State, 'Total Area (sq mi)')

In [None]:
# Raises a ValueError: 'Total Area' does not match any column label exactly.
# states.select('State', 'Total Area')

---

### Demo: `drop`

Use the `drop` table method to drop columns by name and by index.

In [None]:
states

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384.0,570641.0,94743.1
Texas,268596.0,261232.0,7364.75
California,163695.0,155779.0,7915.52
Montana,147040.0,145546.0,1493.91
New Mexico,121590.0,121298.0,292.15
Arizona,113990.0,113594.0,396.22
Nevada,110572.0,109781.0,790.65
Colorado,104094.0,103642.0,451.78
Oregon,98378.5,95988.0,2390.53
Wyoming,97813.0,97093.1,719.87


In [None]:
states.drop('Land Area (sq mi)')

State,Total Area (sq mi),Water Area (sq mi)
Alaska,665384.0,94743.1
Texas,268596.0,7364.75
California,163695.0,7915.52
Montana,147040.0,1493.91
New Mexico,121590.0,292.15
Arizona,113990.0,396.22
Nevada,110572.0,790.65
Colorado,104094.0,451.78
Oregon,98378.5,2390.53
Wyoming,97813.0,719.87


---

### Demo: `sort`

Use the `sort` table method to sort the data in the table by a certain column.

In [None]:
states.sort('Total Area (sq mi)')

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Rhode Island,1544.89,1033.81,511.07
Delaware,2488.72,1948.54,540.18
Connecticut,5543.41,4842.36,701.06
New Jersey,8722.58,7354.22,1368.36
New Hampshire,9349.16,8952.65,396.51
Vermont,9616.36,9216.66,399.71
Massachusetts,10554.4,7800.06,2754.33
Hawaii,10931.7,6422.63,4509.09
Maryland,12405.9,9707.24,2698.69
West Virginia,24230.0,24038.2,191.83


In [None]:
states.sort('Total Area (sq mi)').show(3)

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Rhode Island,1544.89,1033.81,511.07
Delaware,2488.72,1948.54,540.18
Connecticut,5543.41,4842.36,701.06


In [None]:
states.sort('Total Area (sq mi)', descending=True).show(5)

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384,570641,94743.1
Texas,268596,261232,7364.75
California,163695,155779,7915.52
Montana,147040,145546,1493.91
New Mexico,121590,121298,292.15


In [None]:
states.sort('Total Area (sq mi)', descending=False).show(3)

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Rhode Island,1544.89,1033.81,511.07
Delaware,2488.72,1948.54,540.18
Connecticut,5543.41,4842.36,701.06


In [None]:
#help(states.sort)
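
For comparison, Python's built-in `sorted` function follows the same pattern: it builds a new sorted collection, and its `reverse=True` argument plays a role similar to `descending=True`. This is a plain-Python analogy with assumed variable names, using three total areas from the states data.

```python
# (state, total area in sq mi) pairs from the lecture data, in no particular order.
total_area = [('Texas', 268596.0), ('Rhode Island', 1544.89), ('Alaska', 665384.0)]

# sorted builds a new list; the original list keeps its order.
ascending = sorted(total_area, key=lambda pair: pair[1])
descending = sorted(total_area, key=lambda pair: pair[1], reverse=True)

print(ascending[0][0])   # smallest total area first
print(descending[0][0])  # largest total area first
print(total_area[0][0])  # the original order is unchanged
```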

---

### Demo: `where`

Use the `where` table method to filter the data in the table.

In [None]:
# Nothing Filtered
states

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384.0,570641.0,94743.1
Texas,268596.0,261232.0,7364.75
California,163695.0,155779.0,7915.52
Montana,147040.0,145546.0,1493.91
New Mexico,121590.0,121298.0,292.15
Arizona,113990.0,113594.0,396.22
Nevada,110572.0,109781.0,790.65
Colorado,104094.0,103642.0,451.78
Oregon,98378.5,95988.0,2390.53
Wyoming,97813.0,97093.1,719.87


In [None]:
states.where('State', are.equal_to('California'))

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
California,163695,155779,7915.52


In [None]:
states.where('Land Area (sq mi)', are.above(100_000))

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
Alaska,665384,570641,94743.1
Texas,268596,261232,7364.75
California,163695,155779,7915.52
Montana,147040,145546,1493.91
New Mexico,121590,121298,292.15
Arizona,113990,113594,396.22
Nevada,110572,109781,790.65
Colorado,104094,103642,451.78


In [None]:
states.where('State', are.containing('New'))

State,Total Area (sq mi),Land Area (sq mi),Water Area (sq mi)
New Mexico,121590.0,121298.0,292.15
New York,54555.0,47126.4,7428.58
New Hampshire,9349.16,8952.65,396.51
New Jersey,8722.58,7354.22,1368.36


---

## Attribution

This content is licensed under the <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)</a> and is derived from the <a href="https://www.data8.org/">Data 8: The Foundations of Data Science</a> course offered by the University of California, Berkeley.

<img src="./by-nc-sa.png" width=100px>