# Session 2: Python Data Types and Methods

In this session, we explore Python's basic data types and the methods associated with each type.  In this and subsequent notebooks, we draw on material from various sources, including Jean Mark Gawron's book "Python for Social Science", available here: 

http://www-rohan.sdsu.edu/~gawron/python_for_ss/course_core/book_draft/index.html

And the Python 3 documentation, here:
https://docs.python.org/3/index.html

## Numeric Data Types in Python

We have already seen some of the basic interactions with numbers in Python.  The two main numeric types are int and float.  In Python 2 there were two versions of integers (int and long), but these have been unified in Python 3.

In [1]:
# Integers are the simplest numeric type
type(12)

int

In [2]:
# Floats (floating point numbers) can represent fractional values
type(12.0000000000001)

float

In [3]:
# We can assign values to a variable to reuse them
x = 12.0000000000000000000001
y = 12
print(x)
print(y)

12.0
12


Why not use floats all the time?  They can represent fractional values, after all.  There are a couple of reasons.  One is that floats have limited precision, which complicates certain things, like comparing numbers to see if they are equal.

In [4]:
# Test whether x is equal to y.  
x == y

True

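**Try the above again after changing the number of decimal places in the original x...**

A float stores only about 15 to 17 significant decimal digits, so the literal 12.0000000000000000000001 is rounded to exactly 12.0 when it is stored, and the comparison with 12 returns True.  Python does not apply any tolerance when it evaluates ==.  With fewer decimal places (for example 12.0000000000001), the stored value differs from 12 and the comparison returns False.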
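To make this concrete, here is a minimal sketch showing that == on floats is an exact comparison of the stored values, and that math.isclose (covered later in this session) is the tool to reach for when a tolerance is wanted. The variable names w and total are illustrative and not part of the original notebook.

```python
import math

# The extra digits are lost when the literal is stored as a float
x = 12.0000000000000000000001
print(x == 12)                   # True: x is stored as exactly 12.0

# With fewer decimal places the stored value really differs from 12
w = 12.0000000000001
print(w == 12)                   # False

# Rounding error can make mathematically equal results compare unequal
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: equal within the default relative tolerance
```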

A second reason is storage.  In typed storage such as database columns or numeric arrays, a 64-bit float can take more space than a compact integer type (for example 8, 16, or 32 bits).  This is not a problem for a single value, but if you were working with really large databases, it adds up, and could cause you to run out of memory or disk space if you used float as the type for all your numeric data.

You can **cast** a number to convert it to a specified type, like converting from float to int.  Note that int() discards the fractional part rather than rounding:


In [5]:
print(x)
y = int(x)
print(y)

12.0
12


In [6]:
float(y)

12.0
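The sketch below contrasts int() with a few related ways of turning a float into an integer. The variable name value is illustrative; math.floor and round are standard Python, but they are not used in the original notebook at this point.

```python
import math

value = 12.9

print(int(value))          # 12: the fractional part is dropped
print(int(-value))         # -12: truncation is toward zero
print(math.floor(-value))  # -13: floor always rounds down
print(round(value))        # 13: rounds to the nearest integer
print(round(2.5))          # 2: ties go to the nearest even integer
```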

### Built-in operations for numeric data types

Let's review some of the built-in operators in Python that apply to numeric data types:

In [8]:
x = 200
y = 12

# Summing two values
x + y

212

In [9]:
# Subtracting
x - y

188

In [10]:
# Multiplying
x * y

2400

In [11]:
# Dividing
x / y

16.666666666666668

In [12]:
# Integer division -- floored quotient
x // y

16

In [13]:
# Remainder of x / y
x % y

8
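Floor division and remainder are two halves of the same operation, and the built-in divmod function returns both at once. The following is a small sketch using the same x and y as above; the names quotient and remainder are illustrative, and the two-name assignment relies on tuple unpacking, which this session has not introduced yet.

```python
x = 200
y = 12

# divmod returns the floored quotient and the remainder together
quotient, remainder = divmod(x, y)
print(quotient, remainder)            # 16 8
print(quotient * y + remainder == x)  # True: the two results reconstruct x

# With a negative operand, // rounds toward negative infinity,
# and % returns a remainder with the same sign as the divisor
print(-x // y)   # -17
print(-x % y)    # 4
```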

In [14]:
# Flipping the sign
y = -x
y

-200

In [15]:
# Works in the other direction as well
-y

200

In [16]:
# Raising x to the power of y
x = 10
y = 5
x ** y

100000

### Importing Additional Functions from the Math Library

In addition to the built-in functions and operators above, many more are available in the math library.  It is part of Python's standard library, so it is always available, but you have to import it before you can use its functions.  A few examples follow.

In [19]:
import math
math.sqrt(x)

3.1622776601683795

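You can see the full list of functions available in the math library by typing the name of the library and a dot, then pressing Tab.  Press Tab rather than running the cell: executing the incomplete expression just raises a SyntaxError, as shown here.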

In [22]:
math.

SyntaxError: invalid syntax (<ipython-input-22-6994845579ab>, line 1)

And you can get more documentation on a specific function by asking for it:

In [24]:
math.log?

Docstring:
log(x[, base])

Return the logarithm of x to the given base.
If the base not specified, returns the natural logarithm (base e) of x.
Type:      builtin_function_or_method


In [25]:
math.log(x)

2.302585092994046

In [26]:
# What happens if we take the log of a number with a value of 0?
x = 0
math.log(x)

ValueError: math domain error

In [27]:
# We could add a 1 to x to avoid this problem
math.log(x+1)

0.0

In [30]:
# Or we could use one of the other log functions in math that does this and avoids returning an error
math.log1p(x)

0.0
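Avoiding the error at zero is only part of the reason log1p exists. A minimal sketch, assuming a very small illustrative value named tiny: adding 1 to it loses the value entirely to rounding, so math.log(1 + tiny) returns 0.0, while math.log1p(tiny) keeps the precision.

```python
import math

tiny = 1e-20

print(1 + tiny == 1.0)      # True: tiny is lost when added to 1
print(math.log(1 + tiny))   # 0.0
print(math.log1p(tiny))     # approximately 1e-20, the accurate answer
```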

In [38]:
math.log1p?

Docstring:
log1p(x)

Return the natural logarithm of 1+x (base e).
The result is computed in a way which is accurate for x near zero.
Type:      builtin_function_or_method


In [31]:
# A common problem is division where the denominator has a value of zero
y / x

ZeroDivisionError: division by zero

In [32]:
y / (x+1)

5.0

In [35]:
# Comparing two values to see if they are approximately the same, within some tolerance:
x = 12.1
z = 12.2
math.isclose(x,z, rel_tol=.001)

False
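The result depends entirely on the tolerance chosen. The sketch below reuses x and z from the cell above and varies rel_tol and abs_tol; the specific tolerance values are illustrative choices, not recommendations.

```python
import math

x = 12.1
z = 12.2

# rel_tol scales with the larger of the two values
print(math.isclose(x, z, rel_tol=0.001))  # False: allowed gap is about 0.0122
print(math.isclose(x, z, rel_tol=0.01))   # True: allowed gap is about 0.122

# abs_tol is a fixed allowed gap, regardless of magnitude
print(math.isclose(x, z, abs_tol=0.2))    # True: the gap of about 0.1 is under 0.2

# A relative tolerance alone can never match a nonzero value against zero
print(math.isclose(1e-12, 0.0))                # False
print(math.isclose(1e-12, 0.0, abs_tol=1e-9))  # True
```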

In [37]:
math.isclose?

Docstring:
isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) -> bool

Determine whether two floating point numbers are close in value.

   rel_tol
       maximum difference for being considered "close", relative to the
       magnitude of the input values
    abs_tol
       maximum difference for being considered "close", regardless of the
       magnitude of the input values

Return True if a is close in value to b, and False otherwise.

For the values to be considered close, the difference between them
must be smaller than at least one of the tolerances.

-inf, inf and NaN behave similarly to the IEEE 754 Standard.  That
is, NaN is not close to anything, even itself.  inf and -inf are
only close to themselves.
Type:      builtin_function_or_method


In [39]:
# Of course you can put several operations together to compute things, like a quadratic expression.  
# We will see how to do this on a set of numbers a bit later.
a = 2
b = 3
c = 4
y = a + b*x + c*x**2 
y

623.9399999999999
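The long tail of digits in that result comes from float rounding error, not from the arithmetic itself. When the goal is a readable number, one option is to round or format the value for display, as in this small sketch. The f-string syntax is standard Python but has not been introduced in this session.

```python
a = 2
b = 3
c = 4
x = 12.1

y = a + b*x + c*x**2
print(y)            # may show a long tail of digits from binary rounding error
print(round(y, 2))  # 623.94: rounded to two decimal places
print(f"{y:.2f}")   # 623.94: formatted as text with two decimal places
```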

## Strings

Strings are just text, like in the introductory "Hello World!" example.  

Let's explore some methods that operate on them, and explain an important distinction between data types.  Let's review quickly what we already know about strings.  We can assign any string to a variable like we would assign an integer or a float to a variable:

In [40]:
# Try this first just with a text string assigned to a variable
a = CP255

NameError: name 'CP255' is not defined

In [41]:
# The string needs to be in quotes for this variable assignment to work
a = "CP255"
type(a)

str

In [42]:
# The quotes can be single or double, but have to match, 
# or you will get an error as Python can't find the end of the string.
a = 'CP255"

SyntaxError: EOL while scanning string literal (<ipython-input-42-ecad0abe4292>, line 3)

What if you need to create a string that has multiple lines?  There are two ways to create such a string.  The first uses triple quotes.

In [43]:
X = """
  The Zen of Python:
  
  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.
"""

print(X)


  The Zen of Python:
  
  Beautiful is better than ugly.
  Explicit is better than implicit.
  Simple is better than complex.
  Complex is better than complicated.



In [44]:
# The second way uses \n to insert the line endings
X = "\n   Beautiful is better than ugly.\n   Explicit is better than implicit.\n   Simple is better than complex.\n   Complex is better than complicated."
print(X)


   Beautiful is better than ugly.
   Explicit is better than implicit.
   Simple is better than complex.
   Complex is better than complicated.


In [45]:
# Notice that the string object X actually has \n line endings as part of it. 
# The print function renders each \n as a line break rather than showing the escape sequence.
# But if you just type X, the notebook displays its representation, which shows the escapes:
X

'\n   Beautiful is better than ugly.\n   Explicit is better than implicit.\n   Simple is better than complex.\n   Complex is better than complicated.'

### Indexing and Slicing Strings

We can get individual elements of a string (characters) by using indexes, which refer to positions within the string.  

**Notice that counting in Python starts from zero -- essentially all counters are offsets from the first position. This can take a bit of getting used to -- think of it like the way building floors in Europe generally start with zero.  The first floor in Europe would be a second floor in the U.S.**

In [46]:
a[0]

'C'

We can use slicing to extract a range, or a specific section of a string, beginning from any position and ending at any position.  

Python uses a syntax that separates the starting and ending index positions with a colon.  The slice includes the character at the starting index and goes up to, but does not include, the ending index.  If we leave out the starting index, the slice begins at the start of the string; if we leave out the ending index, it runs to the end.  A slice that starts beyond the end of the string returns an empty string rather than an error.  Some examples should make this clearer: 

In [47]:
a[1:5]

'P255'

In [48]:
a[:5]

'CP255'

In [49]:
a[8:]

''
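Indexes can also count from the end of the string, and a slice can take a third value, the step. A brief sketch using the same string as above (the variable a is redefined here so the example stands alone):

```python
a = "CP255"

print(a[-1])     # 5: negative indexes count back from the end
print(a[-3:])    # 255: the last three characters
print(a[::2])    # C25: every second character
print(a[::-1])   # 552PC: a step of -1 walks the string backwards
print(a[2:100])  # 255: slice bounds past the end are clipped

# Unlike a slice, a single index past the end is an error:
# a[8] would raise IndexError
```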

### Working with Strings

In [50]:
# A variable that refers to a string is still an object; evaluating it in a cell displays its representation
a

'CP255'

The print function works with strings the same way as with numbers, and omits the quotes:

In [51]:
print(a)

CP255


In [52]:
a = 'This is CP255!'

We can find the length of a string using the built-in len function

In [53]:
len(a)

14


Related to indexing, here is a string function to look up a specific substring within a string, and return its index, or position:

In [54]:
str.find(a, 'T')

0

Let's see what other string functions are available, using tab completion after str.:

In [None]:
str.

Some of these function names are pretty self-explanatory, like 'capitalize', but others are less so.  As usual, you can look up some quick help on any of those functions:

In [55]:
str.expandtabs?

Docstring:
S.expandtabs(tabsize=8) -> str

Return a copy of S where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
Type:      method_descriptor


Note that since we assigned a string to a variable, a, that variable is now an object of type string, and it has access to the string methods directly:

In [56]:
print(a)
a.find('T')

This is CP255!


0

We can check whether a string contains a character or substring:

In [57]:
'R' in a

False

We can remove specific characters from the beginning and end of a string with the strip method:

In [58]:
a.strip('!')

'This is CP255'
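It is worth seeing that strip only works on the two ends of a string. In the sketch below, the strings s and t are made up for illustration; characters in the middle are left alone, and replace is the method to use when every occurrence should go.

```python
s = "!!This is CP255!!"
t = "Wow! This is CP255!"

print(s.strip('!'))        # This is CP255: removed from both ends
print(s.lstrip('!'))       # This is CP255!!: removed from the left end only
print(s.rstrip('!'))       # !!This is CP255: removed from the right end only

print(t.strip('!'))        # Wow! This is CP255: the inner '!' is untouched
print(t.replace('!', ''))  # Wow This is CP255: every '!' is removed
```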

To remove any leading and trailing whitespace from a string, use the strip method with no argument:

In [59]:
b = ' ' + a
print(b)
print(b.strip())

 This is CP255!
This is CP255!


It is often helpful to put several operations together on one line, chaining them.  Going from left to right, we first take the characters from index 8 to the end of the string, then strip the '!' from that result, and then convert the result to uppercase:

In [60]:
a[8:].strip('!').upper()

'CP255'

Another handy method capitalizes the first letter of each word and lowercases the rest:

In [61]:
a.title()

'This Is Cp255!'

Note that we cannot assign a new letter to part of the string by its index location.  This is because in Python, strings are an **immutable** data type.  As we will see shortly, other data types like lists are **mutable**.

In [62]:
a[0] = 't'

TypeError: 'str' object does not support item assignment

There is a method that returns a new string with values replaced, however.  The original string is left unchanged:

In [63]:
print(a)
print(a.replace('!', '?'))

This is CP255!
This is CP255?
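Because strings are immutable, any "change" really means building a new string. A short sketch of two ways to do that, using slicing with concatenation and the optional count argument of replace (the name lowered is illustrative):

```python
a = 'This is CP255!'

# Build a new string from a replacement character plus the rest of the original
lowered = 't' + a[1:]
print(lowered)   # this is CP255!
print(a)         # This is CP255!: the original is unchanged

# replace matches substrings anywhere, so 'is' is also found inside 'This';
# the third argument limits how many occurrences are replaced
print(a.replace('is', 'IS', 1))  # ThIS is CP255!
```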


## Converting between string and numeric types

In [64]:
rent = '2500'
type(rent)

str

Let's say we have a string object that contains numeric values and we want to do mathematical operations on it.  What happens?

In [65]:
rent*2

'25002500'

In [66]:
rent*1.5

TypeError: can't multiply sequence by non-int of type 'float'

If we need to do mathematical operations, we really need to convert this string object to a numeric type -- either an integer or a float.

In [67]:
rent_int = int(rent)
type(rent_int)

int

In [68]:
rent_int * 2

5000

In [69]:
rent_float = float(rent)
rent_float

2500.0

Recall that you can also convert an integer to a float by a mathematical operation that involves a floating point component so that the result is forced to type float:

In [70]:
rent_flt = rent_int * 1.5
rent_flt

3750.0

But notice that the int function won't convert a string that looks like a floating point number:

In [71]:
rent_i = int('2500.0')

ValueError: invalid literal for int() with base 10: '2500.0'

But you can do this if you first convert to float and then convert to int:

In [72]:
rent_i = int(float('2500.0'))
print(rent_i)
type(rent_i)

2500


int
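Real data often needs a little cleaning before it converts. The sketch below assumes a hypothetical input string, price_text, with a thousands separator; it is not from the original notebook. It also shows that int() drops the fractional part while round() goes to the nearest integer.

```python
price_text = '2,500.75'   # hypothetical value, e.g. read from a file

# float('2,500.75') would raise ValueError, so remove the comma first
cleaned = price_text.replace(',', '')
price = float(cleaned)

print(price)         # 2500.75
print(int(price))    # 2500: the fractional part is dropped
print(round(price))  # 2501: rounded to the nearest integer
```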

Of course, you sometimes may need to convert data from numeric to string type.  It works the same way:

In [73]:
rent_str = str(rent_int)
rent_str

'2500'
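Converting to a string is most often needed when building text for display. A minimal sketch, assuming the same rent value as above; the f-string form is standard Python but has not been introduced in this session, and the label text is made up.

```python
rent_int = 2500

# Concatenation requires both sides to be strings
label = 'Rent: ' + str(rent_int)
print(label)                        # Rent: 2500
# 'Rent: ' + rent_int would raise TypeError

# An f-string converts and formats in one step
print(f"Rent: ${rent_int:,}")       # Rent: $2,500
print(f"Half: {rent_int / 2:.1f}")  # Half: 1250.0
```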

## Lists

You can think of strings as an ordered list of characters.  In Python, **lists** are another basic data type. Lists can contain any kind of object: strings, integers, floats, and others -- in any combination.  A list is written as a sequence of items separated by commas and enclosed in square brackets.  

### Creating Lists

We can create an empty list, and add elements to it:

In [91]:
mylist = []
mylist.append('this')

In [92]:
mylist

['this']

Notice that we can add lists, like we can add strings, to concatenate them:

In [93]:
# Besides using append as above, we can use + to add a list to a list, in this case we are adding a list with 1 item
mylist = mylist + ['that']

# We can also insert items in a specified location in a list
mylist.insert(1, 'and')

In [94]:
mylist

['this', 'and', 'that']
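One common surprise is what happens when append is given a list. The sketch below compares append with the extend method on illustrative copies named nested and flat; extend behaves like the + concatenation shown above, but modifies the list in place.

```python
words = ['this', 'and', 'that']

# append adds its argument as a single item, even if that item is a list
nested = words.copy()
nested.append(['the', 'other'])
print(nested)       # ['this', 'and', 'that', ['the', 'other']]
print(len(nested))  # 4

# extend adds each item of its argument separately
flat = words.copy()
flat.extend(['the', 'other'])
print(flat)         # ['this', 'and', 'that', 'the', 'other']
print(flat == words + ['the', 'other'])  # True
```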

We can also convert a string that might be a sentence, or a line of data, to a list, so we can work with its elements more easily:

In [95]:
print('a = ', a)
b = str.split(a)
print('b = ', b)

a =  This is CP255!
b =  ['This', 'is', 'CP255!']


In [None]:
# And recalling that a is a string object, we can use the split function directly on a 
a.split()
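For a line of data, the separator is often a comma rather than a space. A small sketch, assuming a made-up line of comma-separated values; split takes the separator as an argument, and every field comes back as a string.

```python
line = 'Berkeley,CA,2500'   # hypothetical line of data

fields = line.split(',')
print(fields)        # ['Berkeley', 'CA', '2500']

# Fields are strings, so convert before doing arithmetic
rent = int(fields[2])
print(rent * 2)      # 5000

# With no argument, split treats any run of whitespace as one separator
print('a   b \t c'.split())  # ['a', 'b', 'c']
```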

### Indexing Lists

Note that indexing works for lists like it does for strings.  And if you have a list of strings, you can index into both in a nested way.

In [96]:
# What is the content of the first item in the list?
mylist[0]

'this'

In [97]:
# What is the content of the last item in the list? We can use the index value -1 to get the last item
mylist[-1]

'that'

To get a range of values from a list, use a slice of the index values: [0:2] gets the first and second entries, since the range goes up to, but does not include, the index after the colon.

In [98]:
mylist[0:2]

['this', 'and']

In [99]:
# How would we find the first character of the second word in our list?
mylist[1][0]

'a'

### Working with Lists

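What methods are available for list objects?  Type list. and press Tab to see them.  Running the incomplete expression as a cell raises a SyntaxError, as shown here: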

In [100]:
list.

SyntaxError: invalid syntax (<ipython-input-100-bf5ecf468415>, line 1)

In [101]:
# Find out the length of a list using len
len(mylist)

3

In [106]:
# Count the number of times a character occurs in the string a (lists have a count method too)
a.count('5')

2

You can check whether a list contains an item, just as we did with strings.

In [107]:
'this' in mylist

True

In [None]:
# Delete the 3rd item in the list (remember it is indexed from 0). Let's make a copy of the list first,
# since del deletes in place. Plain assignment (shortlist = mylist) would not copy; it only adds a second name.
shortlist = mylist.copy()
del shortlist[2]
shortlist
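Assigning a list to a second name does not copy it: both names refer to the same list object, so an in-place change through one name shows up through the other. The sketch below illustrates the difference with made-up names (original, alias, duplicate); the copy method makes an independent list.

```python
original = ['this', 'and', 'that']

alias = original             # a second name for the same list
duplicate = original.copy()  # a new, independent list

del alias[2]

print(original)   # ['this', 'and']: changed through the alias
print(duplicate)  # ['this', 'and', 'that']: unaffected

print(alias is original)      # True: same object
print(duplicate is original)  # False: different object
```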

Remember that strings are immutable and we were unable to directly substitute a value of a character based on its index position?  Well, **lists are mutable**, and it does work to replace a value directly by its index value:

In [110]:
b[2] = 'mutable!'
b

['This', 'is', 'mutable!']

We can also join the list of strings back into a single string, inserting a space between each element:

In [111]:
c = str.join(' ',b)
c

'This is mutable!'
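The same operation is more commonly written as a method call on the separator string. A brief sketch, with b redefined so the example stands alone; the values in amounts are made up. Note that join only accepts strings, so numbers need converting first.

```python
b = ['This', 'is', 'mutable!']

print(' '.join(b))   # This is mutable!
print('-'.join(b))   # This-is-mutable!

# split and join undo each other when the separator is a single space
print(' '.join(b).split() == b)   # True

# join requires strings; convert numbers with str first
amounts = [str(2500), str(3000)]
print(', '.join(amounts))         # 2500, 3000
```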

We can reverse the order of the items in a list. Notice that this is an in-place operation.  Try it twice: the second call restores the original order, which is what the output here shows.

In [113]:
b.reverse()
b

['This', 'is', 'mutable!']
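When the original order needs to be kept, a reversed copy can be made with a slice instead. A minimal sketch, with b redefined so the example stands alone and backwards as an illustrative name:

```python
b = ['This', 'is', 'mutable!']

# A slice with step -1 builds a new, reversed list
backwards = b[::-1]
print(backwards)  # ['mutable!', 'is', 'This']
print(b)          # ['This', 'is', 'mutable!']: the original is unchanged

# reverse() changes the list itself
b.reverse()
print(b == backwards)  # True
```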

We can use the sort method to order the list in place.  Let's try it with a list of numbers first.

In [114]:
nums = [1, 3, 4, 5, 8, 6]
nums.sort()
nums

[1, 3, 4, 5, 6, 8]

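And now with a list of words.  Strings are compared character by character using their Unicode code points, so uppercase letters sort before all lowercase letters.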

In [115]:
words = ['A', 'big', 'apple', 'pie']
words.sort()
print(words)

['A', 'apple', 'big', 'pie']
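Two variations are often useful: the built-in sorted function returns a new sorted list and leaves the original alone, and a key argument controls what is compared. The sketch below uses a made-up word list with mixed case to show the effect; key=str.lower is one way to get a case-insensitive order.

```python
words = ['pie', 'Big', 'apple', 'A']

# Default order: uppercase letters come before lowercase ones
print(sorted(words))                 # ['A', 'Big', 'apple', 'pie']

# Compare lowercased versions of each word instead
print(sorted(words, key=str.lower))  # ['A', 'apple', 'Big', 'pie']

# Descending order
print(sorted(words, reverse=True))   # ['pie', 'apple', 'Big', 'A']

# sorted does not modify the list it is given
print(words)                         # ['pie', 'Big', 'apple', 'A']
```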


## Tuples

Tuples are like lists, but are **immutable**.  The syntax is similar except tuples use parentheses instead of square brackets.

In [116]:
d = ('a', 'b', 'c')
print(d)

('a', 'b', 'c')


In [117]:
d[2] = 'z'

TypeError: 'tuple' object does not support item assignment

See?  It really is immutable.  You'll get a TypeError if you try to modify it.  Use tuples when you don't want the contents to be modified.
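Immutability does not stop us from reading a tuple or building new ones from it. A short sketch of a few common tuple operations; the names first, second, third, e, and as_list are illustrative.

```python
d = ('a', 'b', 'c')

# Unpacking assigns each item to its own name
first, second, third = d
print(second)          # b

# Concatenation builds a new tuple; note the trailing comma
# needed to write a tuple with a single item
e = d + ('z',)
print(e)               # ('a', 'b', 'c', 'z')
print(d)               # ('a', 'b', 'c'): the original is unchanged

print(type(('z',)))    # <class 'tuple'>
print(type(('z')))     # <class 'str'>: parentheses alone do not make a tuple

# To change an item, convert to a list and back
as_list = list(d)
as_list[2] = 'z'
print(tuple(as_list))  # ('a', 'b', 'z')
```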

## Dictionaries

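Dictionaries are a very handy data type that can be used to manage data you need to look up by a key.  A dictionary is a collection of key-value pairs, with each key separated from its value by a colon.  Since Python 3.7, dictionaries preserve the order in which keys were inserted; the outputs in this notebook come from an older version in which the order was arbitrary, so the order you see may differ.  They are much more general than the word : definition kind of pairing, since the value can be many different kinds of objects.  A dictionary is written with curly braces containing comma-separated key: value pairs. 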

### Creating Dictionaries

In [118]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
print(antonyms)

{'fast': 'slow', 'hot': 'cold', 'good': 'bad'}


A second way to do it is by converting lists.  This is a convenient thing to do with real data that comes from files, compared to the simple data we are using here.  The zip function is a bit advanced -- we will come back to it later when we talk about loops and iterables.  For now, just understand that it creates an iterable (think list) of tuples, containing the paired entries from the Keys and Values lists.

In [119]:
Keys = ['hot', 'fast', 'good']
Values = ['cold', 'slow', 'bad']
antonyms2 = dict(zip(Keys,Values))
print(antonyms2)

{'fast': 'slow', 'hot': 'cold', 'good': 'bad'}
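To see what zip actually produces, we can convert its result to a list before passing it to dict. A minimal sketch using the same Keys and Values lists; the name pairs is illustrative.

```python
Keys = ['hot', 'fast', 'good']
Values = ['cold', 'slow', 'bad']

# zip pairs up items by position; list() makes the pairs visible
pairs = list(zip(Keys, Values))
print(pairs)              # [('hot', 'cold'), ('fast', 'slow'), ('good', 'bad')]

# dict accepts any sequence of (key, value) pairs
antonyms2 = dict(pairs)
print(antonyms2['fast'])  # slow
```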


### Working with Dictionaries

As usual, find the functions available for this class by using its name, dot, and tab:

In [None]:
dict.

We can retrieve the value of any dictionary entry by its key:

In [121]:
antonyms['hot']

'cold'
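Looking up a key that is not in the dictionary raises a KeyError. The sketch below shows two ways to handle that case: testing membership with in, and using the get method, which returns None or a chosen default instead of raising an error. The key 'tall' and the default 'unknown' are made up for illustration.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

# Membership tests look at the keys
print('hot' in antonyms)    # True
print('tall' in antonyms)   # False

# get returns None, or a default of your choice, for a missing key
print(antonyms.get('hot'))              # cold
print(antonyms.get('tall'))             # None
print(antonyms.get('tall', 'unknown'))  # unknown

# antonyms['tall'] would raise KeyError
```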

We can get the length, keys, and values of a dictionary:

In [122]:
len(antonyms)

3

To see all the keys in a dictionary, use the keys function:

In [123]:
print(antonyms.keys())

dict_keys(['fast', 'hot', 'good'])


The same thing works to get the values:

In [124]:
print(antonyms.values())

dict_values(['slow', 'cold', 'bad'])


Dictionaries are mutable:

In [125]:
antonyms['fast'] = 'gorge'

In [126]:
antonyms

{'fast': 'gorge', 'good': 'bad', 'hot': 'cold'}
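Mutability also covers adding and removing entries. A brief sketch, starting from a fresh copy of the dictionary; the 'tall' entry is made up, and the items method (not shown above) returns the key-value pairs together.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

# Assigning to a new key adds an entry
antonyms['tall'] = 'short'

# del removes an entry by its key
del antonyms['good']

print(len(antonyms))             # 3
print(sorted(antonyms.keys()))   # ['fast', 'hot', 'tall']

# items gives (key, value) pairs; on Python 3.7+ they come in insertion order
print(list(antonyms.items()))
```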

## Exercise

Time to practice a bit with what we have covered so far.

In [204]:
s = 'Now is the time for all good men to come to the aid of their country!'

Turn the string above into 'all good countrymen' using the minimum amount of code, using the methods covered so far.  A couple of lines of code should do the trick.

In [205]:
li = str.split(s)
s = li[5] + " " + li[6] + " " + li[-1].strip("!") + li[7]

In [206]:
s

'all good countrymen'