# INFO 3350/6350

## Lecture 02(a): Python refresher

## To do:

* Section on Friday (required)
  * Introductions, setup, and HW 01
* Readings for next week (see schedule on GitHub). Expect to spend several hours on these every week.
* Remaining from lecture 01:
  * Introductions
  * How to ask good questions, and where to ask them

## The Jupyter notebook system

First, activate your virtual environment, then start `jupyter lab`. *In JupyterLab*, open the notebook(s) you want.

![Jupyter system schematic](https://docs.jupyter.org/en/latest/_images/notebook_components.png)

[Source](https://docs.jupyter.org/en/latest/projects/architecture/content-architecture.html)

## Environments and data types

Underlying every computer program is an *environment*, which maps variable names to values.

This environment starts out empty. We can add to it by defining a variable.

In [1]:
x = 3
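
One way to picture the environment is as a dictionary from names to values. The sketch below assumes a standalone Python session rather than this notebook, and the name `env` is introduced only for illustration.

```python
# Define a variable, as in the cell above
x = 3

# locals() returns the current environment as a dict-like mapping
# (`env` is an illustrative name, not part of the lecture notebook)
env = locals()

print('x' in env)   # True: the name 'x' is now defined
print(env['x'])     # 3: the value bound to that name
```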

## Magics

Magics are special commands that work **in notebooks** (more precisely, in IPython), but **not in Python in general**. Magics always start with `%`.

What's defined in our environment?

In [2]:
# list variable names
%who

x	 


In [3]:
# list variable names/types/values
%whos

Variable   Type    Data/Info
----------------------------
x          int     3


FYI, in pure Python, you'd use `dir()` or `locals()` ...

In [4]:
locals()

{'__name__': '__main__',
 '__doc__': 'Automatically created module for IPython interactive environment',
 '__package__': None,
 '__loader__': None,
 '__spec__': None,
 '__builtin__': <module 'builtins' (built-in)>,
 '__builtins__': <module 'builtins' (built-in)>,
 '_ih': ['',
  'x = 3',
  "# list variable names\nget_ipython().run_line_magic('who', '')",
  "# list variable names/types/values\nget_ipython().run_line_magic('whos', '')",
  'locals()'],
 '_oh': {},
 '_dh': [PosixPath('/Users/mwilkens/Documents/Code/info3350-s22/lectures')],
 'In': ['',
  'x = 3',
  "# list variable names\nget_ipython().run_line_magic('who', '')",
  "# list variable names/types/values\nget_ipython().run_line_magic('whos', '')",
  'locals()'],
 'Out': {},
 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x107b491b0>>,
 'exit': <IPython.core.autocall.ZMQExitAutocall at 0x107b4ba60>,
 'quit': <IPython.core.autocall.ZMQExitAutocall at 0x107b4ba60>,
 

A single percent sign (`%`) applies the magic to a single line. A double percent sign (`%%`) applies the magic to the whole cell.

Try some other magics, especially ones useful for dev and debugging: `%%time`, `%%prun`

In [5]:
%%time
import time
for i in range(3):
    print('hello, world!')
    time.sleep(1)

hello, world!
hello, world!
hello, world!
CPU times: user 5.59 ms, sys: 2.3 ms, total: 7.88 ms
Wall time: 3.01 s
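
The gap between CPU time and wall time above comes from `time.sleep`, which waits without doing any work. Outside a notebook, where magics aren't available, a rough equivalent can be sketched with the standard library's `time.perf_counter` (wall clock) and `time.process_time` (CPU time). The variable names and the shorter sleep are choices made for this sketch.

```python
import time

wall_start = time.perf_counter()   # wall-clock timer
cpu_start = time.process_time()    # CPU time used by this process

for _ in range(3):
    print('hello, world!')
    time.sleep(0.1)                # sleeping uses wall time, almost no CPU

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f'CPU time:  {cpu_elapsed:.4f} s')
print(f'Wall time: {wall_elapsed:.4f} s')
```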


In [6]:
%whos

Variable   Type      Data/Info
------------------------------
i          int       2
time       module    <module 'time' (built-in)>
x          int       3


## More variables, more types

In [7]:
# examine the type of a literal or a variable
type((1,2))

tuple

In [8]:
type(3)

int

In [9]:
type(3.)

float

In [10]:
type(x)

int

In [11]:
type('shakespeare')

str

In [12]:
# a dictionary
book_dict =  {
    'author': ['Shakespeare', 'Morrison', 'Bolaño'],
    'title': ['King Lear', 'Beloved', '2666'],
    'year': [1606, 1987, 2004],
    'words': [10000, 100000, 300000]
}
type(book_dict)

dict

In [13]:
# retrieve dictionary content by key
book_dict['author']

['Shakespeare', 'Morrison', 'Bolaño']

Note that the expression above evaluates to a list, so you can operate on `book_dict['author']` *as a list*. This tends to confuse people.

In [14]:
# index into returned object
book_dict['author'][1]

'Morrison'

Strings are indexable, too ...

In [15]:
book_dict['author'][1][2]

'r'
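
Chained indexing like this is easier to read when each step gets its own name. The following sketch unpacks the expression above one step at a time; the intermediate names (`authors`, `second_author`, `third_letter`) are illustrative.

```python
book_dict = {
    'author': ['Shakespeare', 'Morrison', 'Bolaño'],
    'title': ['King Lear', 'Beloved', '2666'],
    'year': [1606, 1987, 2004],
    'words': [10000, 100000, 300000]
}

authors = book_dict['author']     # dict lookup by key -> list
second_author = authors[1]        # list index -> str
third_letter = second_author[2]   # string index -> one-character str

print(type(authors).__name__, type(second_author).__name__)  # list str
print(third_letter)                                          # r

# Because the lists are parallel, zip() pairs up each book's fields
for author, title in zip(book_dict['author'], book_dict['title']):
    print(f'{author}: {title}')
```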

In [16]:
%whos

Variable    Type      Data/Info
-------------------------------
book_dict   dict      n=4
i           int       2
time        module    <module 'time' (built-in)>
x           int       3


In [17]:
y = 5

In [18]:
%whos

Variable    Type      Data/Info
-------------------------------
book_dict   dict      n=4
i           int       2
time        module    <module 'time' (built-in)>
x           int       3
y           int       5


## What-ifs: cell order, division

In [19]:
# true division always returns a float,
# even when both operands are ints
x/y

0.6

In [20]:
# floor (integer) division
x//y

0

In [21]:
999//1000

0

In [22]:
# modulo
# returns the remainder following division
999%500

499
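
Floor division and modulo are two halves of the same operation: for integers `a` and nonzero `b`, `a == (a // b) * b + a % b`. A short sketch (the names `a`, `b`, `q`, and `r` are illustrative) that also shows the built-in `divmod` and the behavior with a negative operand:

```python
a, b = 999, 500

q = a // b              # floor division: 1
r = a % b               # remainder: 499
print(q, r)             # 1 499
print(divmod(a, b))     # (1, 499): both results at once
print(q * b + r == a)   # True

# Floor division rounds toward negative infinity, not toward zero
print(-7 // 2)          # -4
print(-7 % 2)           # 1
```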

Try running the **first** `%whos` cell again. What's the output?

Note that you can execute code cells out of the order in which they are written. This will return the result of running that code *on the current contents of the environment*, not the environment as it existed (or would exist) had you run all the cells in order.

Out-of-order cell execution is *really* convenient when you're developing your code. It lets you experiment and easily observe the effects of small changes. But it can also get you in trouble, because you might be operating on data (or using functions) that are inconsistent with what a straight read of the code would suggest. If you run into seemingly insoluble problems in your notebook, try selecting Kernel -> Restart Kernel and Run All Cells ... from the menu. This guarantees that the machine state matches the visible order of the cells.
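
To see how repeated or out-of-order execution changes results, here is a small simulation. It is only a sketch: each "cell" is a string of code run with `exec` against a dictionary standing in for the notebook environment, and the names `env`, `cell_1`, and `cell_2` are hypothetical.

```python
env = {}                        # stand-in for the notebook environment

cell_1 = "total = 10"
cell_2 = "total = total * 2"

# Reading top to bottom suggests: run cell 1, then cell 2 -> total == 20
exec(cell_1, env)
exec(cell_2, env)
print(env['total'])             # 20

# Re-running cell 2 alone acts on the *current* environment
exec(cell_2, env)
print(env['total'])             # 40

# "Restart and run all": start from an empty environment, run in order
env = {}
for cell in (cell_1, cell_2):
    exec(cell, env)
print(env['total'])             # 20
```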

In [23]:
%whos

Variable    Type      Data/Info
-------------------------------
book_dict   dict      n=4
i           int       2
time        module    <module 'time' (built-in)>
x           int       3
y           int       5


## Operating on lists

In [24]:
book_dict

{'author': ['Shakespeare', 'Morrison', 'Bolaño'],
 'title': ['King Lear', 'Beloved', '2666'],
 'year': [1606, 1987, 2004],
 'words': [10000, 100000, 300000]}

In [25]:
# Python lists can contain arbitrary types
python_list_int = [1, 2, 3]
python_list_mix = [1, 'g', (3.7, 2)]

In [26]:
# multiply a list by an int: repeats the list, doesn't multiply its elements
python_list_mix * 3

[1, 'g', (3.7, 2), 1, 'g', (3.7, 2), 1, 'g', (3.7, 2)]

In [27]:
# multiply a list by an int
python_list_int * 3

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [28]:
# for loop
result = []
for i in python_list_int:
    result.append(i*3)
result

[3, 6, 9]

A **list comprehension** is a neat one-liner to replace an explicit `for` loop.

In [29]:
# list comprehension
[i*3 for i in python_list_int]

[3, 6, 9]
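
A comprehension can also filter with a trailing `if`, and the same pattern builds dictionaries. A brief sketch using the list defined above (the result names are illustrative):

```python
python_list_int = [1, 2, 3]

# keep only odd values, then triple them
odd_tripled = [i * 3 for i in python_list_int if i % 2 == 1]
print(odd_tripled)        # [3, 9]

# dict comprehension: map each value to its triple
tripled_by_value = {i: i * 3 for i in python_list_int}
print(tripled_by_value)   # {1: 3, 2: 6, 3: 9}
```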

## NumPy

NumPy (or numpy, because lazy) is a library for optimized mathematical operations in Python. We'll use it on occasion (it's fast and sometimes very convenient, especially for matrix operations), though we'll lean more heavily on Pandas (see below), which generally wraps Numpy in a lot of syntactic convenience.

In [30]:
import numpy as np
numpy_array = np.array([1, 2, 3])
numpy_array

array([1, 2, 3])

In [31]:
type(numpy_array)

numpy.ndarray

In [32]:
numpy_array[0]

1

A NumPy array is similar to a list (ordered, iterable), but it's optimized for math. We can perform *vectorized*, *broadcast* operations on NumPy arrays: a single operation is applied to every element of the input array, with no explicit Python loop.

In [33]:
# array * int
numpy_array * 1000000

array([1000000, 2000000, 3000000])
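
The contrast with plain lists is worth seeing side by side. In this sketch, the same `*` operator repeats a list but multiplies an array elementwise; the second array, `offsets`, is made up for illustration.

```python
import numpy as np

python_list_int = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list_int * 2)     # [1, 2, 3, 1, 2, 3] (repetition)
print(numpy_array * 2)         # [2 4 6] (elementwise multiplication)

# Broadcasting: the scalar 10 is applied to every element
print(numpy_array + 10)        # [11 12 13]

# Two arrays of the same shape combine element by element
offsets = np.array([100, 200, 300])
print(numpy_array + offsets)   # [101 202 303]
```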

Two important differences between numpy arrays and Python lists, from the [docs](https://numpy.org/doc/stable/user/whatisnumpy.html):

> * NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). **Changing the size of an ndarray will create a new array and delete the original.**
>   * [Note: This means that *appending* to numpy arrays is generally *very* slow. *Modifying* the elements of an array without changing its length is fast. You really want to preallocate your numpy arrays.]
> * The elements in a NumPy array are all required to be of the same data type, and thus will be the same size in memory. The exception: one can have arrays of (Python, including NumPy) objects, thereby allowing for arrays of different sized elements.
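
Both points can be checked directly. The sketch below uses `np.append`, which returns a new array rather than growing the original, and shows NumPy choosing a single common type for mixed input. The variable names are illustrative.

```python
import numpy as np

original = np.array([1, 2, 3])
extended = np.append(original, 4)   # builds and returns a new array

print(original.shape)   # (3,) the original is unchanged
print(extended.shape)   # (4,)

# Mixed ints and floats are stored as a single float type
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)          # float64

# Mixing numbers and strings stores everything as strings
mixed_text = np.array([1, 'g'])
print(mixed_text.dtype.kind)        # U (unicode string)
print(mixed_text[0] == '1')         # True
```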

It would be nice to have the speed of numpy with an overlay of convenience. So ... Pandas!

You can pre-allocate a numpy array using `zeros`, `ones`, `empty`, or one of a few other numpy functions.

In [34]:
# allocate an array
np.zeros(5)

array([0., 0., 0., 0., 0.])

In [35]:
# allocate an array, then set an element to a value
data_array = np.zeros(5)
data_array[0] = 100
data_array

array([100.,   0.,   0.,   0.,   0.])
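
Putting preallocation to work: a minimal sketch that fills an array in a loop instead of appending to it. The task (recording title lengths) and the names `titles` and `title_lengths` are assumptions chosen for illustration.

```python
import numpy as np

titles = ['King Lear', 'Beloved', '2666']

# allocate once, with an integer dtype (np.zeros defaults to float)
title_lengths = np.zeros(len(titles), dtype=int)

# fill in place; the array never changes size
for i, title in enumerate(titles):
    title_lengths[i] = len(title)

print(title_lengths)   # [9 7 4]
```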