# 5) More data types in Python

Related text:

- https://jakevdp.github.io/WhirlwindTourOfPython/index.html
- https://jakevdp.github.io/PythonDataScienceHandbook/

## Past and Present Topics

Last time we covered the syntax for operations ranging from simple to more complex math, how variables in Python point to places in memory, and some properties of lists. Any questions?

Today, we will finish our discussion of data types in Python. First, a few more operators that are handy when working with lists.

## Identity and Membership Operators

In addition to ``and``, ``or``, and ``not``, Python has prose-like operators to check for identity and membership.
They are the following:

| Operator      | Description                                       |
|---------------|---------------------------------------------------|
| ``a is b``    | True if ``a`` and ``b`` are identical objects     |
| ``a is not b``| True if ``a`` and ``b`` are not identical objects |
| ``a in b``    | True if ``a`` is a member of ``b``                |
| ``a not in b``| True if ``a`` is not a member of ``b``            |

### Identity Operators: "``is``" and "``is not``"

The identity operators, "``is``" and "``is not``", check for *object identity*.
Object identity is different from equality, as we can see here:

In [1]:
a = [1, 2, 3]
b = [1, 2, 3]

In [2]:
a == b

True

In [3]:
a is b

False

In [4]:
a is not b

True

What do identical objects look like? 

In [5]:
a = [1, 2, 3]
b = a
print("b is a:", b is a)
c = a.copy()
print("c is a:", c is a)
d = 4
e = 4
print("d is e:", d is e)
e = 4.0
print("e is d:", e is d)
d = 4.0
print("d is e:", d is e)
print("d == e:", d == e)
d = e
print("d is e:", d is e)

b is a: True
c is a: False
d is e: True
e is d: False
d is e: False
d == e: True
d is e: True


The difference between the two list cases is that in the first assignments of ``a`` and ``b`` (cells 1 to 4), ``a`` and ``b`` point to different objects that happen to hold equal values, while in the second (``b = a``) they point to the same object. As we previously discussed, Python variables are pointers. The ``is`` operator checks whether two variables point to the same container (object), not whether the containers hold equal contents. Note that ``d is e`` being ``True`` for two separately assigned 4s is an implementation detail (CPython caches small integers), so do not rely on it.

As previously mentioned, if a copy is made of a collection (like a list), changes to one will not affect the other; if both names point to the same object, a change made through one name is visible through the other.

Be careful not to use ``is`` when you really want ``==``. Use ``is`` only when you want to check whether two variables point to the same object.
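
To make the copy-versus-same-object point concrete, here is a minimal sketch. The names `original`, `alias`, and `shallow` are made up for this illustration.

```python
original = [1, 2, 3]
alias = original            # a second name for the same list object
shallow = original.copy()   # a new list object with equal contents

alias.append(4)             # changes the one object that both names share

print("original:", original)
print("shallow: ", shallow)
print("alias is original:  ", alias is original)
print("shallow is original:", shallow is original)
```

Appending through `alias` also changes what `original` shows, because there is only one list. The copy is left untouched.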

### Membership operators
Membership operators check for membership within compound objects.
So, for example, we can write:

In [6]:
1 in [1, 2, 3]

True

In [7]:
2 not in [1, 2, 3]

False

Using these built-in operators will typically be *much faster* than writing your own loop to check membership!
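
For comparison, here is roughly what a hand-written membership check looks like. The function name `contains_with_loop` is hypothetical; it exists only to show that ``in`` gives the same answer with far less code.

```python
def contains_with_loop(items, target):
    """Hand-written membership test, for comparison with `in` only."""
    for item in items:
        if item == target:
            return True
    return False


values = [1, 2, 3]
print(contains_with_loop(values, 2), 2 in values)
print(contains_with_loop(values, 7), 7 in values)
```

If you want to measure the speed difference yourself, the standard-library `timeit` module is one way to do it.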

## Strings

As previously discussed, strings (type is "str") can be defined with either single or double quotes. If you don't use either, Python will assume you are referencing a variable and throw an error if the variable has not been defined.

In [8]:
message = "what do you like?"
response = 'spam'
print(type(message), type(response))

<class 'str'> <class 'str'>


### Some commonalities with lists: 

For example, we've used the function `len` on lists.

In [9]:
a_list = [1, 2, 3]
b_list = [4, 5]
question = "what do you like?"
response = 'spam'
print("Length and type of 'a_list' are: {} and {}".format(len(a_list), type(a_list)))
print("Length and type of 'question' are: {} and {}".format(len(question), type(question)))   

Length and type of 'a_list' are: 3 and <class 'list'>
Length and type of 'question' are: 17 and <class 'str'>


Both can also be combined (added together, also called concatenation) with `+`:

In [10]:
c_list = a_list + b_list
q_and_a = question + ' ' + response
print("'a_list' + 'b_list' yields:", c_list)
print("'question' + ' ' + 'response' yields:", q_and_a)

'a_list' + 'b_list' yields: [1, 2, 3, 4, 5]
'question' + ' ' + 'response' yields: what do you like? spam


Parts of each can be accessed (called *indexing*, or *slicing* when you ask for a range) in the same way:

In [11]:
print(c_list[2:4])
print(q_and_a[2:4])

[3, 4]
at


#### A side note on indexing 

The book [A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/index.html) has a nice graphic to help picture list/string indexing. A slice returns the items that sit between the two index positions shown below; leaving out the number before or after the `:` means "from the beginning" or "through the end":

![List Indexing Figure](images/lect05_list_indexing.png)
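
A few slices written out may help connect the picture to code. This is a small sketch using made-up variables `letters` and `numbers`; the same syntax works on both strings and lists.

```python
letters = "python"
numbers = [10, 20, 30, 40, 50]

print(letters[0], numbers[0])       # first item
print(letters[-1], numbers[-1])     # last item (negative indexes count from the end)
print(letters[:2], numbers[:2])     # from the start up to, but not including, index 2
print(letters[2:], numbers[2:])     # from index 2 through the end
print(letters[1:4], numbers[1:4])   # indexes 1, 2, and 3
print(letters[::2], numbers[::2])   # every second item
```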

### Back to string and list commonalities:
Both can be checked in the same way to see if they contain a particular object:

In [12]:
print("Is '2' in c_list?  is 'at'?", 2 in c_list, 'at' in c_list)
# One difference: you can look for anything in a list, but only strings in strings
# Thus, below, I will get an error if I search for an int or float  
#    so instead of checking for the int 2, I look for the string '2'
print("Is '2' in q_and_a? is 'at'?", '2' in q_and_a, 'at' in q_and_a) 

Is '2' in c_list?  is 'at'? True False
Is '2' in q_and_a? is 'at'? False True


And both can be multiplied!

In [13]:
print(a_list * 3)
print(response * 3)

[1, 2, 3, 1, 2, 3, 1, 2, 3]
spamspamspam


### Some more string-specific actions

Some built-in methods of strings include:

In [14]:
print(response.upper())
print(q_and_a.capitalize())
q_and_a = q_and_a.replace('spam', "tofu")
print(q_and_a)
print(response, q_and_a)

SPAM
What do you like? spam
what do you like? tofu
spam what do you like? tofu


In [15]:
q_a_split = q_and_a.split()
print(q_a_split, type(q_a_split))

['what', 'do', 'you', 'like?', 'tofu'] <class 'list'>


In [16]:
q_a_split_alt = q_and_a.split("?")
print(q_a_split_alt)

['what do you like', ' tofu']


Strings' built-in method `strip()` will (by default) remove leading and trailing whitespace.

In [17]:
q_a_split_alt[1] = q_a_split_alt[1].strip()
print(q_a_split_alt)

['what do you like', 'tofu']


There are [many more built-in operations for strings](https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str) (and [also this link](https://docs.python.org/3/library/string.html)) that you are not responsible for knowing for this class. One that you are responsible for is one you have already seen: the `format` method!

You can also use regular expressions to give you much more functionality. You'll need to import the [re library](https://docs.python.org/3/library/re.html):

In [18]:
import re
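
As a small taste of what `re` adds, here is a sketch that redoes the earlier split and replace steps with patterns. The patterns are just examples; `re.findall`, `re.split`, and `re.sub` are standard functions of the module.

```python
import re

q_and_a = "what do you like? tofu"

# \w+ matches runs of letters, digits, or underscores, so punctuation is dropped
words = re.findall(r"\w+", q_and_a)
print(words)

# Split on "?" followed by any amount of whitespace (no strip() needed afterwards)
parts = re.split(r"\?\s*", q_and_a)
print(parts)

# Replace the whole word "tofu"; \b marks a word boundary
swapped = re.sub(r"\btofu\b", "spam", q_and_a)
print(swapped)
```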

## A note on printing and floating-point precision

One thing to be aware of with floating-point arithmetic is that its precision is limited, which can make equality tests fail unexpectedly. For example:

In [19]:
0.1 + 0.2 == 0.3

False

In [20]:
print("0.1 = {0:.17f}".format(0.1))
print("0.2 = {0:.17f}".format(0.2))
print("0.3 = {0:.17f}".format(0.3))

0.1 = 0.10000000000000001
0.2 = 0.20000000000000001
0.3 = 0.29999999999999999


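Which values are stored exactly? Floats are stored in binary, so only values that can be written as a finite sum of powers of 2 (short enough to fit in the float's 53 significant bits) are stored exactly, such as: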

$$
1/8 = 0\cdot 2^{-1} + 0\cdot 2^{-2} + 1\cdot 2^{-3}
$$

The value $0.125 = 0.001_2$ happens to be one number which both binary and decimal notation can represent in a finite number of digits.

Most decimal fractions, such as 0.1, have no finite binary representation. This is similar to how base-10 notation requires an infinite number of digits to represent:
$$
1 / 3 = 0.333333333\cdots
$$
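
One way to see which floats are exact is to convert them to `Decimal` or `Fraction` from the standard library, which reveals the value that is actually stored. This is an optional sketch, not something covered elsewhere in these notes.

```python
from decimal import Decimal
from fractions import Fraction

# Converting a float (not a string) exposes the exact binary value it holds
print(Decimal(0.125))   # a power of 2, so it is stored exactly
print(Decimal(0.1))     # the nearest representable value, not exactly 0.1

print(Fraction(0.125) == Fraction(1, 8))
print(Fraction(0.1) == Fraction(1, 10))
```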

The best way to deal with this is to always keep in mind that floating-point arithmetic is approximate, and *never* rely on exact equality tests with floating-point values. NumPy has a solution for this!

In [21]:
import numpy as np

x = 0.1 + 0.2
print(x, x == 0.3)
print(np.isclose(x, 0.3))

0.30000000000000004 False
True
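
If NumPy is not available, the standard library offers a similar check for single values, `math.isclose`. A brief sketch, with tolerance values chosen only for illustration:

```python
import math

x = 0.1 + 0.2
close_to_target = math.isclose(x, 0.3)   # default relative tolerance is 1e-09
print(close_to_target)

# Near zero a relative tolerance cannot help, so pass an absolute tolerance
tiny = 1e-12
print(math.isclose(tiny, 0.0))
print(math.isclose(tiny, 0.0, abs_tol=1e-9))
```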


### A few more printing options

We'll continue to introduce printing options in future class sessions.

In [22]:
a = 0.1 + 0.2
b = 0.3
print(a, b)
print("a = {}".format(a, b))
print("a = {}, b = {}".format(a, b))
print("a = {0:}, b = {1:}, a = {0:}".format(a, b))

0.30000000000000004 0.3
a = 0.30000000000000004
a = 0.30000000000000004, b = 0.3
a = 0.30000000000000004, b = 0.3, a = 0.30000000000000004


In [23]:
print("a = {:8.2f}, b = {:8.2f}".format(a, b))
print("a = {:2.2e}, b = {:2.2e}".format(a, b))

a =     0.30, b =     0.30
a = 3.00e-01, b = 3.00e-01
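
The same format specifications also work inside f-strings (available since Python 3.6), which some people find easier to read. This sketch reuses the values of `a` and `b` from above; `fixed` and `sci` are names made up here.

```python
a = 0.1 + 0.2
b = 0.3

# The part after the colon is the same mini-language used by str.format
fixed = f"a = {a:8.2f}, b = {b:8.2f}"
sci = f"a = {a:2.2e}, b = {b:2.2e}"
print(fixed)
print(sci)
```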


## The None Type

The value `None` (whose type is `NoneType`) is useful as a placeholder: assign it to a name up front, then check later whether a real value was ever assigned.

In [24]:
var = None
if 2 > 4:
    var = 'all I know about math is wrong'
print(type(var))

<class 'NoneType'>


What do the following values become when converted to booleans?

In [25]:
print(bool(var))
print(bool(0))
print(bool(2018))

False
False
True


In [26]:
print(bool(a_list))
print(bool(q_and_a))
print(bool(""))
print(bool([]))

True
True
False
False


Thus, basically any variable can be used as the condition in an `if` statement:

In [27]:
# First, let's remind ourselves about these variables
print(var, type(var))
print(a_list, type(a_list))

None <class 'NoneType'>
[1, 2, 3] <class 'list'>


In [28]:
if var:
    var = "Dow"
else:
    var = "NCRC"
print("var =", var)
if a_list:
    a_list = "Dow"
else:
    a_list = "NCRC"
print("a_list =", a_list)

var = NCRC
a_list = Dow
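
One caution follows from the boolean conversions above: `if var:` cannot tell `None` apart from other values that convert to `False`, such as `0` or an empty string. When the question is "was anything assigned?", comparing with `is None` is the usual approach. A minimal sketch, with a hypothetical helper `describe`:

```python
def describe(value=None):
    """Report whether a value was supplied and how it converts to a boolean."""
    if value is None:
        return "nothing assigned"
    if not value:
        return "assigned, but converts to False: {!r}".format(value)
    return "assigned and converts to True: {!r}".format(value)


for candidate in (None, 0, "", [], 2018, "spam"):
    print(describe(candidate))
```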


We'll hold off on a very important type of object, numpy arrays, until we talk more about math operations with numpy.

## More types of collections: sets and dicts

So far, we've covered basic types (such as ``int``, ``float``, ``bool``, and ``str``) and two types of collections (containers for objects): lists and tuples.

We've discussed that the syntax used in defining the collection determines the collection type:

| Type Name | Example                   |Description                            |
|-----------|---------------------------|---------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection          |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | Mapping from unique keys to values    |

Let's now discuss these last two types of collections.

### Sets

Sets are very much like sets in math.

In [29]:
primes = {7, 3, 2, 5, 7}
print(primes, "len =", len(primes))
odds = {9, 7, 1, 5, 3, 7, 9}
print(odds, "len =", len(odds))

{2, 3, 5, 7} len = 4
{1, 3, 5, 7, 9} len = 5


In [30]:
# there are two ways of specifying a union
p_o_union = primes | odds
print(p_o_union == primes.union(odds))
print(p_o_union)

True
{1, 2, 3, 5, 7, 9}


In [31]:
# and there are two ways of specifying an intersection
p_o_inter = primes & odds
print(p_o_inter == primes.intersection(odds) == odds.intersection(primes))
print(p_o_inter)

True
{3, 5, 7}


Many more operations with sets (that you are not responsible to know) can be found in the [Python documentation](https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset).
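
As an optional illustration of two of those extra operations, here is a sketch of set difference and symmetric difference, plus the common trick of using a set to remove duplicates from a list. The list `with_repeats` is made up for this example.

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

only_primes = primes - odds    # difference: in primes but not in odds
in_just_one = primes ^ odds    # symmetric difference: in exactly one of the two
print(only_primes, in_just_one)

# Converting a list to a set drops repeated values
with_repeats = [3, 1, 3, 2, 1]
unique_sorted = sorted(set(with_repeats))
print(unique_sorted)
```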

### Dictionaries

With lists, you access items by their index. What if you want to access them by something meaningful, like a student's unique name? Use a dict!

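Note: since Python 3.7, dicts keep their keys in insertion order as a language guarantee, so they are no longer "unordered" in the strict sense. In older versions, any preserved order was a coincidence and should not be depended on (see also below).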

In [32]:
classmates = {'avinav': 'Avin',
             'chechung': 'Carolina',
             'minjcha': ' Minjeong', 
             }
print(classmates)

{'avinav': 'Avin', 'chechung': 'Carolina', 'minjcha': ' Minjeong'}


In [33]:
# How to add items to your dictionary
classmates['lyrivera'] = 'Luis'
# Print the 'keys' which are the handles to access the associated 'values'
print(classmates.keys())
print(classmates.values())
# like so:
print(classmates['avinav'])

dict_keys(['avinav', 'chechung', 'minjcha', 'lyrivera'])
dict_values(['Avin', 'Carolina', ' Minjeong', 'Luis'])
Avin


In [34]:
print('Avin' in classmates.values())

True


As you might already expect, keys must be unique but values can repeat.
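
A short sketch of what "keys must be unique" means in practice, along with two dict methods that are handy here: `get` and `items`. The key `'cchung2'` and the replacement name are invented for the example.

```python
classmates = {'avinav': 'Avin', 'chechung': 'Carolina'}

# Assigning to an existing key replaces its value instead of adding a second entry
classmates['avinav'] = 'Avinav'
print(len(classmates), classmates['avinav'])

# Two different keys may share the same value
classmates['cchung2'] = 'Carolina'

# get() returns a fallback instead of raising KeyError for a missing key
print(classmates.get('nobody', 'not enrolled'))

# items() yields (key, value) pairs
for uniqname, first_name in classmates.items():
    print(uniqname, '->', first_name)
```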

Lists in lists, or dicts in lists, or dicts in dicts, can be hard concepts to think about, but are very useful. What if I want the dictionary to allow me to track several properties of each student? A dict in a dict works well.

In [35]:
che_696 = {'avinav': {'first': 'Avin', 'last': 'Vijay', 'program': 'ChE'},
          'chechung': {'first': 'Carolina', 'last': 'Chung', 'program': 'Biomed'},
          'minjcha': {'first': ' Minjeong', 'last': 'Cha', 'program': 'Mat Sci'},
          'lyrivera': {'first': 'Luis', 'last': 'Rivera-Rivera', 'program': 'ChE'},
          }
print(che_696['chechung']['program'])

Biomed


This way of setting up the dict of dicts is onerous, but we could automatically make one by reading in a file that had columns for each category and a row per person. We'll get to that, but not today.
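
As a preview only, here is one possible sketch of that idea using the standard-library `csv` module. The text in `roster_text` stands in for a file on disk, and its column names (`uniqname`, `first`, `last`, `program`) are assumptions made for this example.

```python
import csv
import io

# Stand-in for a real file: a header row, then one row per person
roster_text = """uniqname,first,last,program
avinav,Avin,Vijay,ChE
chechung,Carolina,Chung,Biomed
"""

roster = {}
for row in csv.DictReader(io.StringIO(roster_text)):
    uniqname = row.pop('uniqname')   # use this column as the outer key
    roster[uniqname] = dict(row)     # the remaining columns become the inner dict

print(roster['chechung']['program'])
```

With a real file, `open(...)` would take the place of `io.StringIO(roster_text)`.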

### More specialized data structures

There is a built-in [collections module](https://docs.python.org/3/library/collections.html) that has even more types. Two I've found useful:

- ``collections.defaultdict``: Like a dictionary, but unspecified keys have a user-specified default value
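- ``collections.OrderedDict``: Like a dictionary, but the order of keys is maintained (plain dicts also keep insertion order since Python 3.7; ``OrderedDict`` adds order-aware extras such as ``move_to_end``)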

In [36]:
import collections

ordered_class = collections.OrderedDict()  # call the class to create an empty ordered dict
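
To show both types in action, here is a minimal sketch. The list `programs` and the keys used are invented for the example; `defaultdict(int)` and `OrderedDict.move_to_end` are part of the standard `collections` module.

```python
import collections

programs = ['ChE', 'Biomed', 'Mat Sci', 'ChE']

# int() returns 0, so a key seen for the first time starts counting from zero
counts = collections.defaultdict(int)
for program in programs:
    counts[program] += 1
print(dict(counts))

# OrderedDict remembers insertion order and can reorder its keys
ordered = collections.OrderedDict()
ordered['first'] = 1
ordered['second'] = 2
ordered.move_to_end('first')
print(list(ordered))
```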

## Some material I intentionally skipped

*FYI*: Do you use complex numbers? Read about them [here](https://jakevdp.github.io/WhirlwindTourOfPython/05-built-in-scalar-types.html) (use the search option to skip to Complex Numbers).

*Next up:* Programming with an IDE!

Before Monday, try installing [IntelliJ using this link](https://www.jetbrains.com/student).