# Session 2: Data Types -- Strings, Lists and Dictionaries

In this session, we explore basic data types in Python and the methods that operate on them, so that we can begin to manipulate data.  We will explore:

* Data types and methods that act on them:
    * Strings
    * Lists
    * Tuples
    * Dictionaries
* Introduction to Files: reading and writing data to disk files

## Strings

Strings are just text, like in the introductory "Hello World!" example.  

Let's explore some methods that operate on strings and explain an important distinction between data types.  First, a quick review of what we already know: we can assign any string to a variable, just as we would assign an integer or a float to a variable:

In [1]:
a = "This is CP255!"

In [2]:
a

'This is CP255!'

Notice that you get the contents without the quotes if you use the print function:

In [3]:
print(a)

This is CP255!


We can find the length of a string object:

In [4]:
len(a)

14

We can get individual elements of a string (characters) by using indexes, which point to positions within the string.  Notice that counting in Python starts from zero: every index is an offset from the first position, so the first character is at index 0.  This can take a bit of getting used to.  Think of it like the way building floors in Europe generally start with zero: the first floor in Europe would be the second floor in the U.S.

In [5]:
a[0]

'T'

We can also use indexing to extract a range, or slice, of a string, beginning at any position and ending at any position.

Python's slicing syntax separates the starting and ending index positions with a colon.  The slice includes the character at the starting position but stops just before the ending position.  If we leave out the first index, the slice runs from the beginning of the string up to (but not including) the second index; if we leave out the second index, it runs from the first index to the end.  Some examples should make this clearer:

In [6]:
a[1:5]

'his '

In [7]:
a[:5]

'This '

In [8]:
a[8:]

'CP255!'
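A few more slicing cases, as a small sketch using the same string a: the end index is exclusive, negative indexes count back from the end, and two slices split at the same index rebuild the original string.

```python
a = "This is CP255!"

# The end index is exclusive: a[1:5] covers positions 1, 2, 3 and 4
print(a[1:5] == a[1] + a[2] + a[3] + a[4])   # True
print(len(a[1:5]))                           # 4 (end minus start)

# Negative indexes count back from the end, so -1 is the last character
print(a[-1])     # !
print(a[-6:])    # CP255!

# Splitting at any index and rejoining the two pieces gives back the original
print(a[:8] + a[8:] == a)   # True
```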

We can also remove specific characters from the beginning and end of a string with the strip method:

In [9]:
a.strip('!')

'This is CP255'
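Note that strip only looks at the ends of the string.  Here is a short sketch (the string shout is made up for illustration) contrasting strip with its one-sided relatives lstrip and rstrip, and with replace, which removes a character everywhere:

```python
shout = "!!CP!255!!"

# strip() removes the character from both ends only, never from the middle
print(shout.strip('!'))    # CP!255
# lstrip() and rstrip() work on just the left or just the right end
print(shout.lstrip('!'))   # CP!255!!
print(shout.rstrip('!'))   # !!CP!255
# With no argument, strip() removes leading and trailing whitespace
print("  CP255  ".strip())   # CP255
# To remove a character everywhere, replace it with an empty string
print(shout.replace('!', ''))   # CP255
```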

Another helpful thing you can do is chain methods, like so:

In [10]:
a[8:].strip('!')

'CP255'

That worked because the result of the first operation (extracting a slice based on the index range) was a string object, so another string method could be applied to it.
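The same reasoning tells us when chaining will fail.  In this sketch, every step in the first chain returns a string, while len returns an int, which has no string methods:

```python
a = "This is CP255!"

# Each step returns a new string, so the next string method can be applied to it
print(a[8:].strip('!').lower())   # cp255

# len() returns an int, and an int has no strip() method
n = len(a[8:])
try:
    n.strip()
except AttributeError:
    print(type(n), 'has no strip() method')   # <class 'int'> has no strip() method
```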

A for loop can nicely iterate over the characters in a string, without having to manually increment a counter to do so:

In [11]:
for item in a:
    print(item)

T
h
i
s
 
i
s
 
C
P
2
5
5
!
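As a small sketch of putting such a loop to work, here we count the digits in a.  When you also need each character's position, the built-in enumerate function supplies it, so you still don't have to manage a counter yourself:

```python
a = "This is CP255!"

# Count the digits by looping over the characters
digit_count = 0
for ch in a:
    if ch.isdigit():
        digit_count += 1
print(digit_count)   # 3

# enumerate() yields each index together with its character
for pos, ch in enumerate(a):
    if ch == 'C':
        print(pos)   # 8
```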


Note that we cannot assign a new letter to part of the string.  This is because in Python, strings are an **immutable** data type.

In [12]:
a[0] = 't'

TypeError: 'str' object does not support item assignment

As we will see shortly, other data types like lists are **mutable**.
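Because strings are immutable, the way to "change" one is to build a new string and, if you like, assign it to a variable.  A minimal sketch:

```python
a = "This is CP255!"

# Build a new string from pieces instead of changing the old one
b = 't' + a[1:]
print(b)   # this is CP255!
print(a)   # This is CP255!  (the original is unchanged)

# replace() also returns a new string and leaves the original alone
c = a.replace('CP255', 'CP256')
print(c)   # This is CP256!
```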

### Using Classes and Methods by Importing them

Some functions are already programmed and stored in pre-built classes.  A few, such as the str class for strings, are built into Python and need no import; many others become available once we import the module that contains them.  Reusing existing classes like this is a central idea of object-oriented programming: we can use someone else's classes and the functions within them, which we call methods when they are defined inside a class, or we can write our own.  Let's use a method of the built-in str class now; we will learn to write our own classes later in the semester.

In [13]:
a.find('r')

-1

In [14]:
print(str.find(a, 'T'))

0


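Notice that the find method returns the location (index value) of the first occurrence of the string it is searching for, and returns -1 when that string is not present, as with 'r' above.  The two calls are written in different forms: a.find('r') calls the method on the object a, while str.find(a, 'T') calls it through the class, passing the string object a explicitly as the first argument and 'T' as the second.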
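A few related tools, sketched below: find accepts an optional starting position, the in operator simply answers yes or no, and the index method behaves like find except that it raises a ValueError instead of returning -1.

```python
a = "This is CP255!"

print(a.find('is'))      # 2  (the 'is' inside 'This')
print(a.find('is', 3))   # 5  (start searching at index 3)
print('r' in a)          # False

# index() works like find() but raises ValueError when nothing is found
try:
    a.index('r')
except ValueError:
    print("'r' not found")
```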

## Converting between string and numeric types

Let's say we have a string object that contains numeric data and we want to do mathematical operations on it.  What happens?

In [15]:
rent = '2500'
type(rent)

str

In [16]:
rent*1.5

TypeError: can't multiply sequence by non-int of type 'float'

In [17]:
rent*2

'25002500'
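So for strings, + means concatenation and * with an integer means repetition, which is why rent*2 produced '25002500' rather than 5000.  This sketch contrasts that with real arithmetic, using the int conversion introduced next:

```python
rent = '2500'

print(rent + rent)    # 25002500      (+ concatenates strings)
print(rent * 3)       # 250025002500  (* with an int repeats a string)

# Mixing a str and an int in + is an error
try:
    rent + 100
except TypeError:
    print('cannot add a str and an int')

print(int(rent) + 100)   # 2600
```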

If we need to do mathematical operations, we really need to convert this string object to a numeric type -- either an integer or a float.

In [18]:
rent_int = int(rent)
type(rent_int)

int

In [19]:
rent_int * 2

5000

In [20]:
rent_float = float(rent)
print(rent_float)
type(rent_float)

2500.0


float

Recall that an arithmetic operation involving a float always produces a float, so multiplying an integer by a floating point number also converts the result to type float:

In [21]:
rent_flt = rent_int * 1.5
print(rent_flt)
type(rent_flt)

3750.0


float

But notice that the int function won't convert a string that looks like a floating point number:

In [22]:
rent_i = int('2500.0')

ValueError: invalid literal for int() with base 10: '2500.0'

But you can do this if you first convert to float and then convert to int:

In [23]:
rent_i = int(float('2500.0'))
print(rent_i)
type(rent_i)

2500


int
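One thing to watch with int(float(...)): int drops the fractional part (it truncates toward zero) rather than rounding.  If you want the nearest whole number, round is the tool, as this sketch shows:

```python
# int() truncates toward zero rather than rounding to the nearest integer
print(int(float('2500.7')))    # 2500
print(int(float('-3.7')))      # -3

# round() with no second argument returns the nearest int;
# exact halves go to the nearest even number, e.g. round(2.5) is 2
print(round(float('2500.7')))  # 2501
print(round(2.5))              # 2
```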

Of course, you may sometimes need to convert data from a numeric type to a string.  The str function works the same way:

In [24]:
rent_str = str(rent_int)
rent_str

'2500'
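When the goal is readable output, f-strings convert and format a number in one step.  A small sketch (the rent_sqft value is borrowed from the first data row shown later in this session):

```python
rent_int = 2500
rent_sqft = 3.9759036144578315

print(str(rent_int) + ' per month')   # 2500 per month
# ':,' adds thousands separators, ':.2f' rounds to two decimal places
print(f'${rent_int:,} per month')     # $2,500 per month
print(f'{rent_sqft:.2f} per sq ft')   # 3.98 per sq ft
```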

## Lists

You can think of a string as an ordered list of characters.  In Python, **lists** are another basic data type.  Lists can contain any kind of object: strings, integers, floats, and others, in any combination.  To write a list, separate its elements with commas and enclose them in square brackets.  We can create an empty list, and add elements to it:

In [25]:
mylist = []
mylist.append('this')

In [26]:
mylist

['this']

Notice that we can add lists, like we can add strings, to concatenate them:

In [27]:
mylist = mylist + ['and', 'that']

In [28]:
mylist

['this', 'and', 'that']
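There are three common ways to grow a list, and they differ in a way that often trips people up.  In this sketch, append adds exactly one element (even if that element is itself a list), extend adds each element of another list, and + builds a brand-new list:

```python
mylist = ['this']
mylist.append('and')          # append() adds exactly one element
mylist.extend(['that', 42])   # extend() adds each element of another list
print(mylist)                 # ['this', 'and', 'that', 42]

nested = ['this']
nested.append(['and', 'that'])   # the whole list becomes a single element
print(nested)                    # ['this', ['and', 'that']]
print(len(nested))               # 2

# + builds a new list and leaves the original unchanged
combined = mylist + ['more']
print(len(mylist), len(combined))   # 4 5
```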

And as we said, lists are mutable.  Let's replace the last value in the list.  We could use index value 2 (there are three elements), or the index value -1, which always refers to the last element.

In [29]:
mylist[-1] = 'those'

In [30]:
mylist

['this', 'and', 'those']
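One consequence of mutability: assigning a list to a second variable does not copy it; both names refer to the same list object.  A minimal illustration, using the copy method when an independent list is wanted:

```python
mylist = ['this', 'and', 'those']

alias = mylist            # a second name for the same list object
alias[0] = 'these'
print(mylist)             # ['these', 'and', 'those']

independent = mylist.copy()   # copy() makes a separate (shallow) copy
independent[0] = 'this'
print(mylist[0], independent[0])   # these this
```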

In [31]:
len(mylist)

3

We can also delete an element by its index with the del statement:

In [32]:
del mylist[2]

In [33]:
mylist

['this', 'and']
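Besides del, lists have two removal methods, sketched here: pop removes an element by position and hands it back to you, while remove deletes the first element equal to a given value.

```python
mylist = ['this', 'and', 'those']

last = mylist.pop()      # remove and return the last element
print(last, mylist)      # those ['this', 'and']

first = mylist.pop(0)    # pop() also accepts an index
print(first, mylist)     # this ['and']

mylist.remove('and')     # remove the first element equal to 'and'
print(mylist)            # []
```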

We can also convert a string, such as a sentence or a line of data, to a list, so that we can work with its elements more easily:

In [34]:
a.split()

['This', 'is', 'CP255!']

In [35]:
b = str.split(a)

In [36]:
b

['This', 'is', 'CP255!']

In [37]:
b[2] = 'great!'

We can then join the list of strings back together into a single string, inserting a space between each element:

In [38]:
c = str.join(' ',b)

In [39]:
c

'This is great!'
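Split also takes a delimiter, which is how you might break a comma-separated line of data into fields.  This sketch uses a shortened, made-up line in the style of the rents data; its last part shows why a plain split(',') breaks down when a field itself contains a comma inside quotes, which is one reason to use the csv module for real files:

```python
line = '3,financial district,3300.0,1.0'
fields = line.split(',')
print(fields)          # ['3', 'financial district', '3300.0', '1.0']

price = float(fields[2])
print(price)           # 3300.0

print('|'.join(fields))   # 3|financial district|3300.0|1.0

# A quoted field containing a comma gets split in the wrong place
tricky = '3,"Beautiful, Upscale Condo",3300.0'
print(tricky.split(','))  # ['3', '"Beautiful', ' Upscale Condo"', '3300.0']
```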

## Tuples

Tuples are like lists, but are **immutable**.  The syntax is similar except tuples use parentheses instead of square brackets.

In [44]:
d = ('a', 'b', 'c')
print(d)

('a', 'b', 'c')


In [45]:
d[2] = 'z'

TypeError: 'tuple' object does not support item assignment

See?  It really is immutable: you just get a traceback if you try.  Use tuples when you want to make sure the values cannot be modified.
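A few everyday tuple idioms, sketched with illustrative coordinates: unpacking a tuple into separate variables, the trailing comma a one-element tuple needs, and converting to a list when you do need to change a value.

```python
point = (37.80, -122.40)     # illustrative (latitude, longitude) pair
lat, lon = point             # unpack the tuple into two variables
print(lat, lon)              # 37.8 -122.4

single = ('a',)              # a one-element tuple needs a trailing comma
print(type(single), type(('a')))   # <class 'tuple'> <class 'str'>

as_list = list(point)        # convert to a list if you need to change values
as_list[0] = 37.75
print(as_list, point)        # [37.75, -122.4] (37.8, -122.4)
```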

## Dictionaries

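Dictionaries are a very handy data type for managing data you need to look up by a key.  A dictionary holds key-value pairs, with each key separated from its value by a colon.  They are much more general than the word: definition kind of pairing, since the value can be many different kinds of objects.  The syntax identifies a dictionary with curly braces enclosing comma-separated key: value pairs.  Since Python 3.7, dictionaries keep the order in which keys were inserted, but older versions (such as the one that produced the output below) made no such guarantee, so look values up by key rather than relying on position.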

In [47]:
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}
print(antonyms)

{'hot': 'cold', 'good': 'bad', 'fast': 'slow'}


In [48]:
antonyms['hot']

'cold'
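Looking up a key that isn't there with square brackets raises a KeyError.  This sketch shows two safer alternatives: the in operator, which checks the keys, and the get method, which returns None (or a default you supply) for a missing key.

```python
antonyms = {'hot': 'cold', 'fast': 'slow', 'good': 'bad'}

print('hot' in antonyms)           # True  ('in' checks the keys)
print('cold' in antonyms)          # False (values are not checked)
print(antonyms.get('tall'))        # None  (missing key, no error)
print(antonyms.get('tall', '?'))   # ?

try:
    antonyms['tall']               # square brackets raise an error for a missing key
except KeyError as err:
    print('KeyError:', err)        # KeyError: 'tall'
```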

We can get the length, keys, and values of a dictionary:

In [49]:
len(antonyms)

3

In [50]:
print(antonyms.keys())

dict_keys(['hot', 'good', 'fast'])


In [51]:
print(antonyms.values())

dict_values(['cold', 'bad', 'slow'])


Dictionaries are mutable:

In [52]:
antonyms['fast'] = 'gorge'

In [53]:
antonyms

{'fast': 'gorge', 'good': 'bad', 'hot': 'cold'}
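Because dictionaries are mutable and you can add new keys at any time, they work well as counters.  Here is one sketch that counts listings per neighborhood; the short neighborhoods list is made up for illustration, whereas in practice you would fill it from the neighborhood column of the data file.

```python
# Illustrative values only; in practice these would come from the data file
neighborhoods = ['financial district', 'castro / upper market', 'financial district']

counts = {}
for hood in neighborhoods:
    # get() returns 0 the first time a neighborhood is seen, then we add 1
    counts[hood] = counts.get(hood, 0) + 1

# items() gives each key together with its value
for hood, n in counts.items():
    print(hood, n)
```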

## Reading Data from a File

Now that we have learned about some basic data types and methods that operate on them, we add another tool: working with files in Python.  Let's see how to open and step through the rows of a file we have used previously: rents_cat.csv.

Python's standard library includes a csv module that provides convenience functions for working with csv-formatted files.  Here is how we can read and print the first three rows of the file (the header row and the first two data rows):

In [55]:
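import csv
# newline='' is what the csv module documentation recommends when opening files
with open('Data/rents_cat.csv', 'r', newline='') as csvfile:
    i = 0
    itemreader = csv.reader(csvfile)
    for row in itemreader:
        i = i + 1
        if i < 4:   # print only the first three rows
            print(row)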

['', 'neighborhood', 'title', 'price', 'bedrooms', 'pid', 'longitude', 'subregion', 'link', 'latitude', 'sqft', 'month', 'day', 'year', 'blockfips', 'countyfips', 'countyname', 'price_sqft', 'price_sqft_cat']
['3', 'financial district', '*NEW* Beautiful, Upscale Condo in Historic Jackson Square', '3300.0', '1.0', '4067393707', '-122.39974699999999', 'SF', '/sfc/apa/4067393707.html', '37.798108', '830.0', 'Sep', '18', '2013', '60750105002005', '6075', 'San Francisco', '3.9759036144578315', '4']
['6', 'castro / upper market', 'remodeled 1bd/1ba', '2950.0', '1.0', '4076891344', '-122.44033400000001', 'SF', '/sfc/apa/4076891344.html', '37.757405', '900.0', 'Sep', '18', '2013', '60750204022001', '6075', 'San Francisco', '3.2777777777777777', '4']
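The csv module also provides DictReader, which uses the header row as keys so that you can refer to columns by name instead of remembering that price is position 3.  To keep this sketch self-contained, it reads a small in-memory sample with the first four columns of the two rows above instead of the real file; with the file you would pass the object returned by open('Data/rents_cat.csv', newline='') instead.  Note that the quoted title containing a comma is handled correctly.

```python
import csv
import io

# A small in-memory stand-in for the first four columns of rents_cat.csv
sample = io.StringIO(
    ',neighborhood,title,price\n'
    '3,financial district,"*NEW* Beautiful, Upscale Condo in Historic Jackson Square",3300.0\n'
    '6,castro / upper market,remodeled 1bd/1ba,2950.0\n'
)

reader = csv.DictReader(sample)   # uses the header row as the keys
prices = []
for row in reader:
    # Columns are looked up by name instead of by position
    print(row['neighborhood'], row['price'])
    prices.append(float(row['price']))

print(prices)   # [3300.0, 2950.0]
```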


Often you will need to skip the header row so that you read only the lines of data below it.  Calling next() on the reader once advances it past the header:

In [56]:
import csv
with open('Data/rents_cat.csv', 'r', newline='') as csvfile:
    i = 0
    itemreader = csv.reader(csvfile)
    next(itemreader, None)  # skip the headers
    for row in itemreader:
        i = i + 1
        if i < 4:   # print only the first three data rows
            print(row)

['3', 'financial district', '*NEW* Beautiful, Upscale Condo in Historic Jackson Square', '3300.0', '1.0', '4067393707', '-122.39974699999999', 'SF', '/sfc/apa/4067393707.html', '37.798108', '830.0', 'Sep', '18', '2013', '60750105002005', '6075', 'San Francisco', '3.9759036144578315', '4']
['6', 'castro / upper market', 'remodeled 1bd/1ba', '2950.0', '1.0', '4076891344', '-122.44033400000001', 'SF', '/sfc/apa/4076891344.html', '37.757405', '900.0', 'Sep', '18', '2013', '60750204022001', '6075', 'San Francisco', '3.2777777777777777', '4']
['7', 'sunset / parkside', 'Panoramic Ocean View 2 Bedroom House', '3500.0', '2.0', '4060191270', '-122.48358200000001', 'SF', '/sfc/apa/4060191270.html', '37.74831500000001', '1400.0', 'Sep', '18', '2013', '60750328013002', '6075', 'San Francisco', '2.5', '2']
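The loops above keep counting through every row of the file even after printing the first three.  For a large file, itertools.islice can take just the first few rows and stop there.  A sketch on a small made-up sample (the id and price values are illustrative only):

```python
import csv
import io
from itertools import islice

# Illustrative stand-in for a csv file
sample = io.StringIO('id,price\n3,3300.0\n6,2950.0\n7,3500.0\n9,2100.0\n')

reader = csv.reader(sample)
header = next(reader)                   # skip (and keep) the header row
first_rows = list(islice(reader, 3))    # take at most 3 data rows, then stop
print(header)       # ['id', 'price']
print(first_rows)   # [['3', '3300.0'], ['6', '2950.0'], ['7', '3500.0']]
```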


Remember that you can get help either inline in IPython, by adding a question mark after a name, or by searching online for the more detailed official documentation.

In [57]:
csv.reader?

## Exercises

Time to practice a bit with what we have covered so far.

In [58]:
s = 'Now is the time for all good men to come to the aid of their country!'

1. Turn the string above into 'all good countrymen' using the methods covered so far and as little code as possible.  One or two short lines of code should do the trick.

2. Modify the code we just used to read the rents_cat.csv file to create variables for neighborhood, rent (the price column), sqft, rent_sqft (the price_sqft column), bedrooms, year, and month, and make the numeric variables integers.  Print these for the first 5 rows of data in the file.

3. Then try computing the average rent and the average rent_sqft across all the records in the file.

In [59]:
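import csv
# Using 'with' makes sure the file is closed when we are done with it
with open('Data/rents_cat.csv', 'r', newline='') as csvfile:
    itemreader = csv.reader(csvfile)
    next(itemreader, None)  # skip the headers
    i = 0                   # number of data rows read
    total = 0.0             # running sum of rent (the price column)
    total_rent_sqft = 0.0   # running sum of rent per square foot (price_sqft)
    for row in itemreader:
        i = i + 1
        total = total + float(row[3])
        total_rent_sqft = total_rent_sqft + float(row[17])
# Average rent and average rent per square foot
print(total / i, total_rent_sqft / i)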

2722.7133695115936 2.51892280247634
