In [None]:
from datascience import *
import numpy as np

# Arrays

An array has to have elements of the same type.

In [None]:
my_arr = make_array(42.5, 5.11, 777)
my_arr

If we create an array from values of mixed types, NumPy automatically converts them all to a single type. If the values include even a single string, every element of the resulting array becomes a string.

In [None]:
my_arr = make_array(42, 5, 11, '8')
my_arr
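
To see the conversion directly, we can inspect the array's `dtype`. This is a minimal sketch that uses `np.array` as a stand-in for `make_array` (an assumption made so the example depends only on `numpy`).

```python
import numpy as np

# np.array stands in for make_array here (an assumption for this sketch).
numbers = np.array([42.5, 5.11, 777])   # the integer 777 is stored as a float
mixed = np.array([42, 5, 11, '8'])      # one string turns every element into a string

print(numbers.dtype)          # a floating-point type
print(mixed.dtype)            # a Unicode string type
print(mixed[0] + mixed[3])    # string concatenation, not numeric addition
```

Because every element of `mixed` is a string, `+` glues the pieces together instead of adding numbers.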

Adding a single number to an array adds it to every element in that array. An array that contains a single element behaves like that single value when it is added to another array. We can verify this by creating an array of integers and another array with a single integer.

In [None]:
arr2 = make_array(1, 5, 10, 0)
arr2

In [None]:
5 + arr2

In [None]:
arr1 = make_array(5)

In [None]:
arr1 + arr2
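
The two additions above should give the same result. A short check, again with `np.array` standing in for `make_array` (an assumption for this sketch):

```python
import numpy as np

arr2 = np.array([1, 5, 10, 0])
arr1 = np.array([5])            # an array with a single element

scalar_sum = 5 + arr2           # the number is added to every element
array_sum = arr1 + arr2         # the one-element array behaves like the number 5

print(scalar_sum)
print(array_sum)

# True when both results have the same shape and the same values
same = np.array_equal(scalar_sum, array_sum)
print(same)
```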

# Tables

`Table()` returns an empty table. Since we rarely want just an empty table, we usually combine it with the `with_columns` method to add column names ("labels") and their corresponding values to the table.

In [None]:
empty_tbl = Table()
empty_tbl

In [None]:
type(empty_tbl)  # the type is supposed to be datascience.tables.Table

We can either use `Table()` again to create a new empty table or we can start with an empty table that we just stored in `empty_tbl`. Notice that if we use an existing table, we do not need to use the `Table()` function.

In [None]:
Table().with_columns('Tasks', make_array("Finish INT5 lab", "Do the dishes", "Buy snacks"))

In [None]:
empty_tbl

Let's save this table into `my_todo` and add a couple more columns to it to keep track of the tasks' priorities and whether we've done them. Notice that we can use either `empty_tbl`, which we already created, or `Table()` to start from an empty table.

In [None]:
my_todo = Table().with_columns(
                    'Tasks', make_array("Finish INT5 lab", "Do the dishes", "Buy snacks"), # first column
                    'Priority', make_array(5, 3, 2), # second column
                    'Done', make_array(0, 1, 0) # third column
)

In [None]:
my_todo

Alternatively, we could have created the arrays separately and then added them to the table.

In [None]:
tasks = ...
tasks

For each task above, let's add its priority: the higher the number, the more important the task.

In [None]:
priority_values = ...
priority_values

Let's indicate whether a task is done (`1`) or not (`0`).

In [None]:
doneness = ...
doneness

The table below looks just like the one we've created above.

In [None]:
empty_tbl.with_columns(
                     # first column
                     # second column
                     # third column
)

We also could have added the columns one at a time, instead of all at once. Notice that we'll need to save the intermediate tables to be able to add more columns to them later. We'll start with the empty table `empty_tbl`, but we also could have directly used the command that created it (`Table()`).

In [None]:
empty_tbl # empty table we created above

In [None]:
todo_tasks = ... # first column
todo_tasks

We are now going to add `priority_values` to the new table we just created (`todo_tasks`). Notice that we are adding the new column to `todo_tasks`, not to the empty table `empty_tbl`.

In [None]:
todo_tasks_pr = ... # second column
todo_tasks_pr

If we had used just `empty_tbl` or `Table()` with the above command, we would have been adding a new column to an **empty table** and would have gotten back a table with just one column.

In [None]:
empty_tbl.with_columns('Priority', priority_values) 
# same as
Table().with_columns('Priority', priority_values) # preferred, since it is more explicit

Now, let's add the last column to our table to indicate whether a task is done (`1`) or not (`0`).

In [None]:
todo_all = ... # third column
todo_all

Since the commands work with the result of the previous commands (i.e., adding to the table that was previously created), we could also chain them together to achieve the same effect but without the need for the names of intermediate tables.

In [None]:
Table().with_columns('Tasks', tasks) #.with_columns('Priority', priority_values)

Note that `.num_rows`, `.num_columns`, and `.labels` are the only commands that we use that do not have `()` after them.

In [None]:
todo_all.num_rows

In [None]:
todo_all.labels

The `relabeled` method leaves the original table intact and returns a new table with the specified column relabeled to a new name. To keep that table, store it under a new variable name.

In [None]:
todo_all.relabeled('Priority', 'Stars')

In [None]:
todo_all.labels

In [None]:
todo = ...
todo

In [None]:
# task_stars = todo.select("Task")  # uncomment to see what happens if you misspell the column name
task_stars = todo.select("Tasks", "Stars")
task_stars

Average priority of the tasks on my list.

In [None]:
sum( todo.column("Stars") ) / todo.num_rows

Instead of manually computing the average, we can use the `numpy` function `average` or `mean`. We need to make sure we prefix the function with `np.` and give it **an array** as an input.

In [None]:
np.mean(todo.column("Stars"))
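
As a quick sanity check, the manual computation and the two `numpy` functions should agree. The sketch below uses a plain array holding the same priority values as the table (5, 3, 2) in place of `todo.column("Stars")`; that substitution is an assumption made to keep the example self-contained.

```python
import numpy as np

# Stand-in for todo.column("Stars"): the priority values used earlier
stars = np.array([5, 3, 2])

manual = sum(stars) / len(stars)   # add everything up, divide by the count
with_mean = np.mean(stars)
with_average = np.average(stars)   # unweighted, so it matches np.mean

print(manual, with_mean, with_average)
```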

In [None]:
np.average(...)

In [None]:
todo.column("Stars").max()

In [None]:
max(todo.column("Stars"))

Select the top 2 most important tasks... They are not in any order in the table, so we first should sort them.

Let's put the most important tasks at the top of the table.

In [None]:
todo.sort("Stars")

In [None]:
todo_sorted = ...
todo_sorted

In [None]:
todo_sorted.take(...)
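
By default `sort` orders rows from smallest to largest; it also accepts `descending=True` to put the largest values first. The idea behind "sort, then take the first two rows" can be sketched in plain `numpy`, with arrays standing in for the table's columns (an assumption for this illustration, not how the table is implemented):

```python
import numpy as np

# Stand-ins for the "Tasks" and "Stars" columns
tasks = np.array(["Finish INT5 lab", "Do the dishes", "Buy snacks"])
stars = np.array([5, 3, 2])

# argsort gives the row positions from smallest to largest value;
# reversing it puts the most important task first
order = np.argsort(stars)[::-1]

top_two = tasks[order][:2]   # first two rows after sorting
print(top_two)
```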

Notice that a task we have already finished is still on the list. What if we want to get only the most important unfinished tasks? We'll need to exclude the "done" tasks first, then take the first two rows from the _sorted_ table.

In [None]:
unfinished_tasks = todo.where("Done", are.equal_to(0))
unfinished_tasks
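
The order of the two steps matters: filter first, then sort and take. A plain `numpy` sketch of the same idea, with arrays standing in for the table's columns (an assumption for this illustration):

```python
import numpy as np

# Stand-ins for the "Tasks", "Stars", and "Done" columns
tasks = np.array(["Finish INT5 lab", "Do the dishes", "Buy snacks"])
stars = np.array([5, 3, 2])
done = np.array([0, 1, 0])

# Step 1: keep only the rows where the task is not done
not_done = done == 0
open_tasks = tasks[not_done]
open_stars = stars[not_done]

# Step 2: sort the remaining rows by priority, highest first, and take two
order = np.argsort(open_stars)[::-1]
top_unfinished = open_tasks[order][:2]
print(top_unfinished)
```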

What good is a to-do list if we cannot add more tasks to it?

In [None]:
new_task = ['Practice table functions', 4, 0]
todo.with_row(new_task)

Perhaps we decide that the priority column is unnecessary because everything we add to the list is important. We can remove that column with the `drop` method, which needs to know which column you want to drop.

In [None]:
todo.drop("Stars")
# todo.drop("Stars", "Done") # Can we drop more than one column?

As usual, the command doesn't modify the original table, so we need to save the resulting table with a new name if we want to keep using it.

In [None]:
todo

# Reading a table from a CSV file

In [None]:
minard = Table.read_table("data/minard.csv")  # read_table is called on the Table class itself

In [None]:
minard

In [None]:
minard.select('Survivors') # returns a TABLE

In [None]:
minard.column('Survivors') # returns an ARRAY, since everything inside the column has to have the same type

Now, you shouldn't be surprised to see that the line below results in an error.

In [None]:
sum(minard.select('Survivors'))/minard.num_rows 

In [None]:
sum(minard.column('Survivors'))/minard.num_rows

# Visualizing Data

In [None]:
import matplotlib
matplotlib.use('Agg')  # the `warn` argument was removed in newer matplotlib releases
%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')

Let's look at the table and sort it by latitudes. How big is the difference between them?

In [None]:
minard.sort("Latitude")

Let's get all the latitudes into an array.

In [None]:
lat = minard.column("Latitude")
lat

In [None]:
min_lat = lat.min()
# same as min(lat)
min_lat

In [None]:
max_lat = lat.max()
# same as max(lat)
max_lat

In [None]:
max_lat - min_lat
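
The difference between the largest and smallest value is the range of the array, and `numpy` can compute it in one call with `np.ptp` ("peak to peak"). The values below are made up for illustration and are not taken from `minard.csv`.

```python
import numpy as np

# Hypothetical latitude-like values, NOT the real Minard data
values = np.array([54.0, 55.5, 54.5])

by_hand = values.max() - values.min()   # same steps as the cells above
with_ptp = np.ptp(values)               # largest value minus smallest value

print(by_hand, with_ptp)
```

The same call works on the longitudes, which makes it easy to compare the two ranges.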

Do the same for the Longitude to see which range is greater.

We can also visualize them on a scatterplot, showing us the number of survivors at each recorded latitude.

In [None]:
minard.scatter("Latitude", "Survivors")

In [None]:
minard.scatter("Longitude", "Survivors")