# Agenda

1. Data structures
    - Built-in data structures, and how they work
    - Alternatives to these in the standard library
    - The `collections` module -- variations on dictionaries
2. Functions
3. Functional programming
4. Modules and packages
5. Objects
6. Iterators and generators
7. Decorators
8. Concurrency

# gitautopush

It's a package on PyPI.

You can install it with

```
pip install gitautopush
```

Then just run it in a directory:

```
gitautopush .
```

Every minute or so, it checks if any files have changed. If so, it runs `git commit -a` and then `git push`.

# Data structures -- built-in data structures

# `None`

There are no variable declarations in Python! 

In [1]:
# in assignment, the right side runs before the left side

x = 100   # Python evaluates 100 first, then creates the name x and makes it refer to that value
type(x)

int

In [2]:
x = 'abcd'  # x already exists, but now we assign 'abcd' to it
type(x)

str

In [3]:
x = None
type(x)

NoneType

In [4]:
# how can I check if x is None?

if x == None:   # unfortunately, this works! -- this code is un-Pythonic
    print('Yes, it is None!')

Yes, it is None!


# Comparing with `None`

`None` is a singleton -- no matter how many instances you think you've created, there's only one instance of `None` in all of Python.

PEP 8 tells us that if we're comparing with a singleton, we should not use `==`. Rather, we should use `is`.

- `==` asks: Are the two objects equivalent in value?
- `is` asks: Are these two objects the same object?

`is` compares the object IDs.
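Here's a minimal sketch of the PEP 8 style, using a hypothetical `describe` function. Note that `is None` also keeps "falsy" values like `0` from being mistaken for `None`, which `if not value` would not:

```python
def describe(value):
    # PEP 8: compare with the None singleton using `is`, not `==`
    if value is None:
        return 'no value'
    return f'value: {value!r}'

print(describe(None))   # no value
print(describe(0))      # value: 0
```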


In [6]:
id(x)  # in CPython, this number is the address in memory of the object that x refers to

4481073152

In [7]:
y = None
id(y)

4481073152

In [8]:
# since id(x) and id(y) are the same, they must be two names referring to the same object
# thus, they are not only ==, but they are is:

x is y

True

In [9]:
x = [10, 20, 30]  # I've created a list
y = [10, 20, 30]  # I've created another list

# both of these lists have the same values. But they are *not* the same list in memory.
# they're two separate objects, two instances of "list"

# in the case of a singleton object, there is only one instance.
# each time you try to create a new instance, you just get back that same, existing one.

type(None)

NoneType

In [10]:
x = type(None)()   # trying to create a new instance of NoneType
y = type(None)()   # trying to create another instance of NoneType

In [11]:
id(x)

4481073152

In [12]:
id(y)

4481073152

In [13]:
x is y

True

# `None` vs. `NaN` vs. other similar things



In [14]:
import numpy as np
import pandas as pd

In [15]:
type(np.nan)  # "not a number"

float

In [17]:
type(np.NaN)   # np.NaN was an alias for np.nan; it was removed in NumPy 2.0, so prefer np.nan

float

In [18]:
# can I compare None to itself?
None == None

True

In [19]:
None is None

True

In [23]:
# are these objects at the same location in memory -- i.e., are they exactly the same object?
np.nan is np.nan

True

In [24]:
# is np.nan's value equal to itself...

# nan is a float that isn't equal to itself!
np.nan == np.nan

False

In [27]:
# this is new in pandas (1.0+), and it's meant to (eventually) take over from np.nan for missing values
# it works with the new pandas string types and other types
pd.NA

<NA>

In [28]:
# np.isnan checks whether each element is nan, *without* using ==

a = np.array([10, 20, 30, np.nan, 50, 60])
a

array([10., 20., 30., nan, 50., 60.])

In [29]:
a[a==np.nan]   # finds nothing, because nan != nan -- even the nan that's in a

array([], dtype=float64)

In [30]:
np.isnan(a)

array([False, False, False,  True, False, False])

In [31]:
a[np.isnan(a)]

array([nan])

In [32]:
a[~np.isnan(a)]

array([10., 20., 30., 50., 60.])
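The same idea works without NumPy. Here's a minimal sketch with a plain list of floats, using the standard library's `math.isnan` (the single-value counterpart of `np.isnan`):

```python
import math

values = [10.0, 20.0, float('nan'), 50.0]

# nan is never equal to anything, including itself, so == can't find it
found = [v for v in values if v == float('nan')]
print(found)      # []

# math.isnan checks a single float without using ==
cleaned = [v for v in values if not math.isnan(v)]
print(cleaned)    # [10.0, 20.0, 50.0]
```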

In [33]:
x = 100
y = 100

x == y  

True

In [34]:
x is y

True

In [35]:
x = 100_000   # underscores in numeric literals are ignored; they're just for readability!
y = 100_000

x == y

True

In [36]:
# "x is y" means the same as "id(x) == id(y)"
x is y     # this is asking: are x and y the same object in memory

False

# What's going on?

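CPython (the standard implementation) assumes we'll use small integers a lot. So instead of creating a new integer object each time we need a small number (from -5 to 256, inclusive), it reuses objects it allocated in advance. 100 is in that range, so `x is y` was `True`; 100,000 isn't, so `x` and `y` each got their own object.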
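To see this cache on its own, here's a minimal sketch that builds the numbers with `int()` while the program runs, so the compiler can't merge identical literals. The identity results are a CPython implementation detail, not a language guarantee:

```python
import platform

# int() creates these values at runtime, one call at a time
small_a = int('256')
small_b = int('256')
big_a = int('257')
big_b = int('257')

print(platform.python_implementation())
print(small_a == small_b, small_a is small_b)   # True True on CPython: 256 is cached
print(big_a == big_b, big_a is big_b)           # True False on CPython: 257 isn't
```

The lesson is the same as with `None`: use `==` to compare numbers, and save `is` for singletons.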

In [37]:
id(x)

5197268848

In [38]:
id(y)

5197268784

In [39]:
# what about "is" and strings?

x = 'abcd'
y = 'abcd'

x == y

True

In [40]:
x is y

True

In [41]:
x = 'abcd' * 100_000
y = 'abcd' * 100_000

x == y 

True

In [42]:
x is y

False

In [43]:
x = 'a.cd'
y = 'a.cd'

x == y

True

In [44]:
# we're not asking if x's value and y's value are the same
# we're asking: are x and y referring to the same object in memory?
x is y

False

# Python doesn't really have variables!

When you assign `x = 'abcd'` at the top level of a module, Python takes the string value `'abcd'` and stores it in a dictionary (the one you can see with `globals()`). What's the key in that dictionary? The string `'x'`!

So Python needs the string `'x'` each time we store to or retrieve from the variable `x`. If it created a new one every time, that would be a lot of strings.

So Python doesn't. Rather, it *interns* (caches and reuses) many strings, especially the string literals in our code that are:

- Short enough (the exact limit is a CPython implementation detail, and it has changed between versions)
- Made up only of characters that are valid in identifiers (variable/function names): letters, digits, and `_`
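Strings built while the program runs aren't interned automatically, but we can ask for interning with `sys.intern`. Here's a minimal sketch; the `''.join` calls are just one way to build equal strings at runtime, and the first `is` result is CPython behavior rather than a guarantee:

```python
import sys

# built at runtime, so each join produces its own string object
a = ''.join(['a', '.', 'c', 'd'])
b = ''.join(['a', '.', 'c', 'd'])
print(a == b, a is b)    # True False (on CPython)

# sys.intern returns one canonical object for all equal strings
a = sys.intern(a)
b = sys.intern(b)
print(a == b, a is b)    # True True
```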


In [45]:
x = [10, 20, 30]
y = x

x is y

True

In [46]:
import copy
y = copy.copy(x)  # this copies only the top layer

y

[10, 20, 30]

In [47]:
x is y

False

In [48]:
import copy
y = copy.deepcopy(x)  # this copies all layers, all the way down

y

[10, 20, 30]

In [49]:
x = 100
y = x

x = 200
y  # what's the value of y?

100

In [50]:
import copy

x = [[100, 200], [300, 400], [500, 600]]
y = copy.copy(x)

x[0][1] = '!'
y

[[100, '!'], [300, 400], [500, 600]]

In [51]:
id(x)

5197955904

In [52]:
id(y)

5189378560

In [53]:
id(x[0])

5197625408

In [54]:
id(y[0])

5197625408

In [55]:
y = x  # both x and y are referring to the same object
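To compare the two kinds of copies side by side, here's a minimal sketch using the same nested list as above. A shallow copy shares the inner lists with the original; a deep copy gets its own:

```python
import copy

x = [[100, 200], [300, 400], [500, 600]]
shallow = copy.copy(x)       # new outer list, same inner lists
deep = copy.deepcopy(x)      # new outer list, and new inner lists too

x[0][1] = '!'

print(shallow[0])                             # [100, '!'] -- shares x[0]
print(deep[0])                                # [100, 200] -- has its own copy
print(shallow[0] is x[0], deep[0] is x[0])    # True False
```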

# Numbers!

In [57]:
# how much memory does an integer take up in Python?

import sys

x = 0
sys.getsizeof(x)  # how big, in *bytes*, is x?

24

In [58]:
x = 1
sys.getsizeof(x)

28

In [59]:
# what's the largest int we can have in Python?
# it's limited by how much memory is in your system.

x = 100_000_000_000_000
x

100000000000000

In [60]:
x = x ** 12345

In [62]:
sys.getsizeof(x)

76576

In [63]:
# floats?

x = 1.0
type(x)

float

In [64]:
0.1 + 0.2

0.30000000000000004

In [65]:
1/3  # cannot be represented by a terminating decimal

0.3333333333333333

# How can we solve this problem?

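1. Use NumPy or pandas, whose C-implemented floats let you choose a larger number of bits (e.g., `np.float128`, which is platform-dependent: often 80-bit extended precision, and missing entirely on some platforms). More bits shrink the error, often below what gets printed, but don't eliminate it.
2. Use integers: for example, store money as a whole number of cents (or thousandths of a cent) rather than as fractional dollars.
3. Use Python's `round` function. For example: `round(0.1 + 0.2, 2)`.
4. Use the `Decimal` class in the `decimal` module, which does decimal (base-10) floating-point arithmetic, in the spirit of BCD (binary-coded decimal).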
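Option 2 in practice: a minimal sketch, assuming we count money in whole cents. The `format_cents` helper is hypothetical, just for display:

```python
def format_cents(cents):
    # turn a non-negative integer number of cents into a dollar string
    dollars, remainder = divmod(cents, 100)
    return f'${dollars}.{remainder:02d}'

prices_in_cents = [10, 20]       # $0.10 and $0.20, stored exactly as ints
total = sum(prices_in_cents)

print(total == 30)               # True -- integer arithmetic is exact
print(format_cents(total))       # $0.30
print(0.1 + 0.2 == 0.3)          # False -- the float version
```

We only convert to a dollar string at the very end, for display; all of the arithmetic stays in integers.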

In [67]:
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float16)

In [68]:
a

array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=float16)

In [69]:
a + 0.1

array([0.2   , 0.2998, 0.4   , 0.5   , 0.6   ], dtype=float16)

In [70]:
a = np.array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=np.float128)   # platform-dependent extended precision; the error shrinks, but doesn't disappear -- it's just too small for the default printout

In [71]:
a + 0.1

array([0.2, 0.3, 0.4, 0.5, 0.6], dtype=float128)

In [72]:
round(0.1 + 0.2, 2)

0.3

In [74]:
from decimal import Decimal

# advantage: exact decimal arithmetic, with configurable precision
# disadvantage: slower and more memory-hungry than float

x = Decimal('0.1')  # creating a new Decimal object based on the string '0.1'
y = Decimal('0.2')  # same, with '0.2'

x + y

Decimal('0.3')

In [None]:
# in BCD, each decimal digit gets its own 4 bits, so 01 would be stored as 0b0000 0b0001

In [75]:
# don't do this! Decimal(0.1) faithfully copies the float's binary approximation of 0.1, error and all

x = Decimal(0.1) 
y = Decimal(0.2) 

x + y

Decimal('0.3000000000000000166533453694')

In [76]:
x

Decimal('0.1000000000000000055511151231257827021181583404541015625')

In [77]:
y

Decimal('0.200000000000000011102230246251565404236316680908203125')

In [78]:
a

array([0.1, 0.2, 0.3, 0.4, 0.5], dtype=float128)

In [79]:
np.float   # just an alias for the built-in float; deprecated in NumPy 1.20, and removed in 1.24

Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.float


float

In [81]:
np.float128(0.1) + np.float128(0.2)   # the error is still there: 0.1 and 0.2 were 64-bit floats before being converted

0.30000000000000001665

# Strings

Strings are immutable, and contain Unicode.

In [83]:
s = ''
type(s)

str

In [84]:
sys.getsizeof(s)

49

In [85]:
s = 'a'
sys.getsizeof(s)

50

# Strings are sequences

Strings, lists, and tuples are all sequences in Python. This means that they all:

- Get the length with `len`
- Get an individual item with `[i]`
- Get a slice back with `[start:stop]`, which goes up to but not including `stop`
- Iterate with a `for` loop
- Have an `.index` method to find the first index with that value
- Have a `.count` method to find how many times an element appears

The differences are:
- Strings contain only strings: each element is a one-character string, because Python has no separate character type
- Lists and tuples can contain anything
- Strings and tuples are immutable; lists are mutable
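Since all three types share these operations, one function can work on any of them. Here's a minimal sketch; `summarize` is a hypothetical name:

```python
def summarize(seq):
    # uses only operations that every sequence supports
    return (len(seq),             # length
            seq[1],               # a single item
            seq[1:3],             # a slice: indexes 1 and 2, not 3
            seq.index(seq[2]),    # first index of a value
            seq.count(seq[0]))    # how many times a value appears

print(summarize('abcde'))                 # (5, 'b', 'bc', 2, 1)
print(summarize([10, 20, 30, 40, 50]))    # (5, 20, [20, 30], 2, 1)
print(summarize((10, 20, 30, 40, 50)))    # (5, 20, (20, 30), 2, 1)
print(type('abcde'[0]))                   # <class 'str'> -- no char type
```

Notice that each slice comes back as the same type as the sequence it came from.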

In [86]:
s = 'abcd'
s += 'efgh'  # have I changed the string? 

In [88]:
s  # s changed, but it now refers to a new, different string

'abcdefgh'

In [89]:
s = 'abcd'
print(id(s))

s += 'efgh'  # have I changed the string? 
print(id(s))

5214131696
5214071728


In [90]:
# immutable ≠ constant: the string object can't change, but the name s can be reassigned

In [91]:
s[0]

'a'

In [92]:
s[0] = '!'

TypeError: 'str' object does not support item assignment

In [93]:
s = 'aBcDeFgH'

s.lower()  # gives me a new string back

'abcdefgh'

In [94]:
s

'aBcDeFgH'

In [95]:
s.capitalize()

'Abcdefgh'

In [97]:
s.swapcase()  # why does this method exist?

'AbCdEfGh'

In [98]:
# since strings cannot change, and string methods return new strings, we can chain our methods:

s = '   abCDef   '
s.strip().lower()

'abcdef'

In [99]:
s

'   abCDef   '

In [100]:
# the most interesting new way to create strings is the f-string (format string)

x = 100
y = [10, 20, 30]
z = {'a':1, 'b':2}

# in the curly braces, we can have any Python expression
# the value of the expression is turned into a string and added to the resulting f-string
s = f'x = {x}, y = {y}, z = {z}'
print(s)

x = 100, y = [10, 20, 30], z = {'a': 1, 'b': 2}


In [101]:
# as of Python 3.8, we can say this:
s = f'{x=}, {y=}, {z=}'
print(s)

x=100, y=[10, 20, 30], z={'a': 1, 'b': 2}


In [103]:
s = f'{len(y)=}'
print(s)

len(y)=3


# Exercise: Firstlast

1. Write a function, `firstlast`, that takes any sequence (string, list, or tuple) and returns a two-element structure of the same type. 
2. The output's first element will be the first element from the input, and the final element will be the last element from the input.
3. If there's only one element in the input, it'll be repeated.

Examples:

```python
firstlast('abcde')            # 'ae'
firstlast([10, 20, 30, 40])   # [10, 40]
firstlast((100, 200, 300))    # (100, 300)
firstlast([10])               # [10, 10]

```

In [112]:
# duck typing 
# I don't care what type I have -- I care what it does!
# so long as (in this example) it implements the API for sequences, including slices, I'm fine

def firstlast(seq):
    return seq[:1] + seq[-1:]

print(firstlast('abcde'))            # 'ae'
print(firstlast([10, 20, 30, 40]))   # [10, 40]
print(firstlast((100, 200, 300)))    # (100, 300)
print(firstlast([10]))               # [10, 10]


ae
[10, 40]
(100, 300)
[10, 10]


In [106]:
s = 'abcde'

s[0]

'a'

In [107]:
s[1:3]  # slice, returns the same type as I started with

'bc'

In [108]:
mylist = [10, 20, 30, 40, 50]
mylist[1:3]

[20, 30]

In [109]:
mylist[:1]

[10]

In [110]:
mylist[-1:]

[50]

In [111]:
mylist[:1] + mylist[-1:]

[10, 50]

In [113]:
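# the non-duck-typing alternative: one branch per type -- longer, and it fails for other sequence types
def firstlast(seq):
    if isinstance(seq, list):
        return [seq[0], seq[-1]]
    elif isinstance(seq, tuple):
        return (seq[0], seq[-1])
    return seq[0] + seq[-1]   # assume it's a string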

# Lists vs. tuples

People often think that because lists are mutable and tuples are immutable, we should choose between them based on whether we want to change our data.

That's *not* the Python tradition, which says:

- use lists for sequences of the *same* type
- use tuples for sequences of *different* types, and for records/structs/etc.
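Here's a minimal sketch of that tradition, with made-up data: a list holds many values of one type, while each tuple is a fixed-size record whose positions mean different things:

```python
# a list: any number of values, all the same type
temperatures = [22.5, 23.1, 19.8, 21.0]        # hypothetical readings
average = sum(temperatures) / len(temperatures)

# tuples: records of (first, last, shoesize); 'Jane Doe' is made up
people = [('Reuven', 'Lerner', 46),
          ('Jane', 'Doe', 38)]

for first, last, shoesize in people:           # unpack each record by position
    print(f'{first} {last}: shoe size {shoesize}')

print(f'average temperature: {average:.1f}')   # average temperature: 21.6
```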

In [114]:
# behind the scenes here, there's an array of pointers to Python (integer) objects
mylist = [10, 20, 30]

In [115]:
mylist.append(40)  # do we need to reallocate the entire array?

In [116]:
mylist = []
for i in range(30):
    print(f'{i}:\t{len(mylist)=}\t{sys.getsizeof(mylist)=}')
    mylist.append(i)

0:	len(mylist)=0	sys.getsizeof(mylist)=56
1:	len(mylist)=1	sys.getsizeof(mylist)=88
2:	len(mylist)=2	sys.getsizeof(mylist)=88
3:	len(mylist)=3	sys.getsizeof(mylist)=88
4:	len(mylist)=4	sys.getsizeof(mylist)=88
5:	len(mylist)=5	sys.getsizeof(mylist)=120
6:	len(mylist)=6	sys.getsizeof(mylist)=120
7:	len(mylist)=7	sys.getsizeof(mylist)=120
8:	len(mylist)=8	sys.getsizeof(mylist)=120
9:	len(mylist)=9	sys.getsizeof(mylist)=184
10:	len(mylist)=10	sys.getsizeof(mylist)=184
11:	len(mylist)=11	sys.getsizeof(mylist)=184
12:	len(mylist)=12	sys.getsizeof(mylist)=184
13:	len(mylist)=13	sys.getsizeof(mylist)=184
14:	len(mylist)=14	sys.getsizeof(mylist)=184
15:	len(mylist)=15	sys.getsizeof(mylist)=184
16:	len(mylist)=16	sys.getsizeof(mylist)=184
17:	len(mylist)=17	sys.getsizeof(mylist)=248
18:	len(mylist)=18	sys.getsizeof(mylist)=248
19:	len(mylist)=19	sys.getsizeof(mylist)=248
20:	len(mylist)=20	sys.getsizeof(mylist)=248
21:	len(mylist)=21	sys.getsizeof(mylist)=248
22:	len(mylist)=22	sys.getsizeof(my
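The output above suggests that CPython over-allocates: the list grows its array in jumps, so most appends don't need a reallocation. Here's a minimal sketch that counts those jumps over 1,000 appends; the exact sizes and growth pattern are CPython implementation details that vary between versions:

```python
import sys

# record each length at which an append made the list's buffer grow
growth_points = []
mylist = []
previous_size = sys.getsizeof(mylist)

for i in range(1000):
    mylist.append(i)
    size = sys.getsizeof(mylist)
    if size != previous_size:
        growth_points.append(len(mylist))
        previous_size = size

print(len(growth_points))    # far fewer than 1,000
print(growth_points[:8])
```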

In [118]:
# how much memory does my list take up?
mylist = [10, 20, 30]
sys.getsizeof(mylist)

120

In [119]:
mylist[0] = 'abcdefghij' * 10_000_000

In [120]:
# this only checks the size of the list's data structure!
# it doesn't include the sizes of the elements!
sys.getsizeof(mylist)

120
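To include the elements, we have to ask about each one ourselves. Here's a minimal sketch with a hypothetical `shallow_total_size` helper; it only goes one level deep, and it counts shared objects (such as cached small ints) as if the list owned them. It uses a smaller string than above, to keep memory use low:

```python
import sys

def shallow_total_size(seq):
    # the container itself, plus each object it points to (one level only)
    return sys.getsizeof(seq) + sum(sys.getsizeof(item) for item in seq)

mylist = [10, 20, 30]
list_only_before = sys.getsizeof(mylist)
total_before = shallow_total_size(mylist)

mylist[0] = 'abcdefghij' * 100_000      # a 1,000,000-character string
list_only_after = sys.getsizeof(mylist)
total_after = shallow_total_size(mylist)

print(list_only_before, list_only_after)    # the list itself doesn't grow
print(total_before, total_after)            # but the total does, by ~1 MB
```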

In [121]:
t = (10, 20, 30)
type(t)

tuple

In [122]:
t = (10, 20)
type(t)

tuple

In [127]:
t = (10,)  # it's the comma, not the parentheses, that makes this a single-element tuple
type(t)

tuple

In [124]:
t = ()
type(t)

tuple

In [125]:
# we use () for lots of things in Python, including calling functions and priority of operations

4 + 5 * 6

34

In [126]:
# what if I want to give priority to 4+5?
(4+5) * 6

54

In [128]:
(4+5,) * 6

(9, 9, 9, 9, 9, 9)

In [129]:
# can we have mutable data in a tuple?
t = ([10, 20, 30],
    [40, 50, 60])

In [130]:
t

([10, 20, 30], [40, 50, 60])

In [131]:
# can I modify those lists? YES
t[0].append(31)
t

([10, 20, 30, 31], [40, 50, 60])

In [132]:
# what happens here?

# += runs the list's __iadd__ method, which (a) extends the list in place,
# and then (b) assigns the result back to t[0] -- which fails, because t is a tuple
t[0] += [32, 33, 34]

TypeError: 'tuple' object does not support item assignment

In [133]:
t   # the list was extended anyway, even though the assignment raised an error!

([10, 20, 30, 31, 32, 33, 34], [40, 50, 60])
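If we want to grow a list that lives inside a tuple, calling a method on the list avoids the surprise. Here's a minimal sketch: `extend` mutates the list, and there's no assignment back into the tuple, so no `TypeError` is raised:

```python
t = ([10, 20, 30], [40, 50, 60])

# mutate the list directly; the tuple still holds the very same list object
t[0].extend([32, 33, 34])
print(t)    # ([10, 20, 30, 32, 33, 34], [40, 50, 60])
```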

# Next up

- Named tuples
- Dictionaries
     - How they work
     - Methods + Practice
     - Alternative dicts
     
15-minute break: return at 11:05

In [134]:
# person record
p = ('Reuven', 'Lerner', 46)

p[0]

'Reuven'

In [135]:
p[1]

'Lerner'

In [136]:
p[2]

46

In [138]:
# named tuples give us ... tuples with names!
from collections import namedtuple

# namedtuple takes two arguments:
# - a string with the name of the class we want to create
# - a list of strings, the attributes we want on instances of that class
# it returns a new class, which we assign to a variable
#  (typically going to be the same name as arg #1)

Person = namedtuple('Person', ['first', 'last', 'shoesize'])

In [140]:
type(Person)  # see? it's a class, with a type of type

type

In [141]:
Person.__name__  # what is its string name?

'Person'

In [142]:
p = Person('Reuven', 'Lerner', 46)
p[0]

'Reuven'

In [143]:
p[1]

'Lerner'

In [144]:
p[2]

46

In [145]:
p.first

'Reuven'

In [146]:
p.last

'Lerner'

In [147]:
p.shoesize

46

In [148]:
Person.__bases__ # what does it inherit from?

(tuple,)

In [149]:
p.first = 'newfirstname'

AttributeError: can't set attribute

In [151]:
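# a leading _ usually means: private! watch out! don't use it!
# but namedtuple's methods start with _ only to avoid clashing with field names; _replace is public and documented
# _replace returns a *new* named tuple, based on the existing one (p)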
p._replace(first='newfirstname')

Person(first='newfirstname', last='Lerner', shoesize=46)
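Because a named tuple is still a tuple, `_replace` leaves the original alone, and everything that works on tuples still works. A minimal sketch:

```python
from collections import namedtuple

Person = namedtuple('Person', ['first', 'last', 'shoesize'])
p = Person('Reuven', 'Lerner', 46)

p2 = p._replace(first='newfirstname')   # a brand-new Person object
print(p)     # Person(first='Reuven', last='Lerner', shoesize=46)
print(p2)    # Person(first='newfirstname', last='Lerner', shoesize=46)

first, last, shoesize = p               # unpacking works, as with any tuple
print(p == ('Reuven', 'Lerner', 46))    # True -- compared like a plain tuple
```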

# Dictionaries

Dicts are the most important data structure in Python. The language itself uses them everywhere:

- Our global variables are kept in a dict, which we can see by running `globals()`
- In a function body, we can see the local variables in a dict, `locals()`
- Modules store their attributes (their global names) in a dict, `module.__dict__`
- Classes store their attributes, including methods, in a dict, `__dict__`
- Most objects store their attributes in a dict, `__dict__`

Keys in a dict are:
- Hashable (in practice, this usually means immutable)
- Unique

Values in a dict are:
- Anything at all
- Can be repeated

You can get from the keys to the values, but not vice versa.
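Here's a minimal sketch of those rules in action: repeated values are fine, a repeated key replaces the old value, unhashable keys are rejected, and going from a value back to its keys means searching every pair:

```python
d = {'a': 1, 'b': 1}          # values can repeat
d['a'] = 100                  # keys are unique, so this replaces the old value
print(d)                      # {'a': 100, 'b': 1}

d[(1, 2)] = 'tuple key'       # a tuple of ints is hashable, so it can be a key

list_key_rejected = False
try:
    d[[1, 2]] = 'list key'    # a list is mutable, and thus not hashable
except TypeError as e:
    list_key_rejected = True
    print('TypeError:', e)

# value -> keys: no shortcut, we have to check every key-value pair
keys_with_value_1 = [k for k, v in d.items() if v == 1]
print(keys_with_value_1)      # ['b']
```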

In [152]:
d = {'a':1, 'b':2, 'c':3}
type(d)

dict

In [153]:
len(d)  # how many key-value pairs?

3

In [154]:
d['a']

1

In [155]:
d['b']

2

In [156]:
d['c']

3

In [157]:
d['q']

KeyError: 'q'

In [158]:
# search in a dict's keys with 'in'
'a' in d  # O(1)

True

In [161]:
# this works, but it's redundant: in Python 3, d.keys() is a view, so this is also O(1)
# just use 'a' in d
'a' in d.keys()

True

In [160]:
d.keys()

dict_keys(['a', 'b', 'c'])

In [162]:
d = {'a':1, 'b':2, 'c':3}

while True:
    k = input('Enter a key: ').strip()
    
    if not k:   # empty strings in boolean context are False
        break
        
    elif k in d:
        print(f'd[{k}] is {d[k]}')
        
    else:
        print(f'{k} is not a key in d')

Enter a key: a
d[a] is 1
Enter a key: b
d[b] is 2
Enter a key: c
d[c] is 3
Enter a key: q
q is not a key in d
Enter a key: 


In [None]:
# a shorter form of those last 4 lines is the "get" method
# it's just like [], but returns None if the key does