# Overview

https://developers.google.com/edu/python

# Python Set Up

In [1]:
!python google-python-exercises/hello.py Google

Hello Google


In [2]:
!cat google-python-exercises/hello.py

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""A tiny Python program to check that Python is working.
Try running this program from the command line like this:
  python hello.py
  python hello.py Alice
That should print:
  Hello World -or- Hello Alice
Try changing the 'Hello' to 'Howdy' and run again.
Once you have that working, you're ready for class -- you can edit
and run Python code; now you just need to learn Python!
"""

import sys

# Define a main() function that prints a little greeting.
def main():
  # Get the name from the command line, using 'World' as a fallback.
  if len(sys.argv) >= 2:
    name = sys.argv[1]
  else:
    name = 'World'
  print('Hello', name)

# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
  main()

# Python Intro

**"Python is a dynamic, interpreted (bytecode-compiled) language. There are no type declarations of variables, parameters, functions, or methods in source code. This makes the code short and flexible, and you lose the compile-time type checking of the source code. Python tracks the types of all values at runtime and flags code that does not make sense as it runs."**

### Imports, Command-line arguments, and len()

"A Python module can be run directly (as above, with ```python hello.py Google```), or it can be imported and used by some other module. When a Python file is run directly, the special variable ```__name__``` is set to ```"__main__"```."

**"Therefore, it's common to have the boilerplate ```if __name__ == '__main__':``` shown above call a main() function when the module is run directly, but not when the module is imported by some other module (in that case the file acts as a library of functions)."**

In [3]:
!python google-python-exercises/helloworld.py Google

Hello there Google
Command-line arguments are:  2
Command-line arguments list:  ['google-python-exercises/helloworld.py', 'Google']
First argument (sys 0) is:  google-python-exercises/helloworld.py
Second argument (sys 1) is:  Google


In [4]:
!cat google-python-exercises/helloworld.py

#!/usr/bin/env python

# import modules used here -- sys is a very standard one
import sys

# Gather our code in a main() function
def main():
    print('Hello there', sys.argv[1])
    print('Command-line arguments are: ', len(sys.argv))
    print('Command-line arguments list: ', sys.argv)
    print('First argument (sys 0) is: ', sys.argv[0])
    print('Second argument (sys 1) is: ', sys.argv[1])
    # Command line args are in sys.argv[1], sys.argv[2] ...
    # sys.argv[0] is the script name itself and can be ignored

# Standard boilerplate to call the main() function to begin
# the program.
if __name__ == '__main__':
    main()
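helloworld.py reads sys.argv[1] unconditionally, so running it with no argument raises IndexError. Below is a minimal sketch of one way to guard against that. It assumes a hypothetical greeting() helper that takes the argv list as a parameter, which also makes the logic easy to call from another module without touching sys.argv.

```python
import sys


def greeting(argv):
    """Returns a greeting built from an argv-style list (argv[0] is the script name)."""
    # Real arguments start at argv[1]; fall back to 'World' instead of
    # letting argv[1] raise IndexError when no argument was given.
    if len(argv) >= 2:
        name = argv[1]
    else:
        name = 'World'
    return 'Hello there ' + name


def main():
    print(greeting(sys.argv))


# main() runs only when this file is executed directly, not when it is imported.
if __name__ == '__main__':
    main()
```

Called from other code, greeting(['helloworld.py']) returns 'Hello there World' rather than crashing.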

### User-defined Functions

"The ```def``` keyword defines the function with its parameters within parentheses and its code indented. The first line of a function can be a documentation string ("docstring") that describes what the function does. The docstring can be a single line, or a multi-line description as in the example below. (Yes, those are "triple quotes"; they let a string literal span several lines.) Variables defined in the function are local to that function, so the "result" in the function below is separate from a "result" variable in another function. The return statement can take an argument, in which case that is the value returned to the caller."

"At run time, functions must be defined by the execution of a "def" before they are called. It's typical to def a main() function towards the bottom of the file with the functions it calls above it."

***The entry-point function does not have to be named main(); repeat.py below uses execute().***



In [5]:
def repeat(s, exclaim):
    """
    Returns the string 's' repeated 3 times.
    If exclaim is true, add exclamation marks.
    """
    result = s*3
    if exclaim:
        result = result + '!!!'
    return result

In [6]:
print(repeat('Yay', False))
print(repeat('Yay', True))

YayYayYay
YayYayYay!!!
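To see the two claims about functions in action (the docstring is stored on the function object, and a variable assigned inside a function is local to it), here is a small sketch. The module-level `result` variable is an illustrative addition, not part of the course code.

```python
result = 'module-level value'


def repeat(s, exclaim):
    """Returns the string 's' repeated 3 times.

    If exclaim is true, add exclamation marks.
    """
    result = s * 3  # local to repeat(); the module-level result is untouched
    if exclaim:
        result = result + '!!!'
    return result


print(repeat('Yay', True))              # YayYayYay!!!
print(result)                           # module-level value
print(repeat.__doc__.splitlines()[0])   # Returns the string 's' repeated 3 times.
```

help(repeat) displays the same docstring.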


In [7]:
!python google-python-exercises/repeat.py

YayYayYay
YayYayYay!!!


In [8]:
!cat google-python-exercises/repeat.py

# Defines a "repeat" function that takes 2 arguments.
def repeat(s, exclaim):
    """
    Returns the string 's' repeated 3 times.
    If exclaim is true, add exclamation marks.
    """

    result = s + s + s # can also use "s * 3" which is faster (Why?)
    if exclaim:
        result = result + '!!!'
    return result

def execute():
    print(repeat('Yay', False))
    print(repeat('Yay', True))

if __name__ == '__main__':
    execute()

### Indentation

"Avoid using TABs as they greatly complicate the indentation scheme (not to mention TABs may mean different things on different platforms). Set your editor to insert spaces instead of TABs for Python code."

"A common question beginners ask is, "How many spaces should I indent?" According to the official Python style guide (PEP 8), you should indent with 4 spaces. (Fun fact: Google's internal style guideline dictates indenting by 2 spaces!)"

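Python 3 enforces the advice about tabs: when the indentation of a block mixes tabs and spaces so that its meaning depends on the tab width, compilation fails. A small sketch that feeds such source to the built-in compile(); the source string is made up for illustration.

```python
# The first body line is indented with a tab, the second with 8 spaces.
# Whether those line up depends on the tab width, so Python refuses to guess.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, '<demo>', 'exec')
    outcome = 'compiled'
except IndentationError as err:  # TabError is a subclass of IndentationError
    outcome = type(err).__name__

print(outcome)  # TabError
```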
### Code Checked at Runtime

"In guido.py below, the if-statement contains an obvious error, where the repeat() function is accidentally typed in as repeeeet(). The funny thing in Python ... this code compiles and runs fine so long as the name at runtime is not 'Guido'. Only when a run actually tries to execute the repeeeet() will it notice that there is no such function and raise an error. This just means that when you first run a Python program, some of the first errors you see will be simple typos like this. This is one area where languages with a more verbose type system, like Java, have an advantage ... they can catch such errors at compile time (but of course you have to maintain all that type information ... it's a tradeoff)."

In [9]:
!python google-python-exercises/guido.py Google

GoogleGoogle


In [10]:
!python google-python-exercises/guido.py Guido

Traceback (most recent call last):
  File "google-python-exercises/guido.py", line 17, in <module>
    main()
  File "google-python-exercises/guido.py", line 12, in main
    print(repeeeet(sys.argv[1]) + '!!!')
NameError: name 'repeeeet' is not defined


In [11]:
!cat google-python-exercises/guido.py

# Defines a "repeat" function that takes 1 argument.
import sys

def repeat(name):
    """
    Returns the string 'name' repeated 2 times.
    """
    return name*2

def main():
    if sys.argv[1] == 'Guido':
        print(repeeeet(sys.argv[1]) + '!!!')
    else:
        print(repeat(sys.argv[1]))

if __name__ == '__main__':
    main()
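The same behaviour can be reproduced without a separate file. In this sketch (the function name `check` is made up), the typo sits in a branch that runs only for one input, so the function is defined without complaint and fails only when that branch executes.

```python
def repeat(name):
    return name * 2


def check(name):
    if name == 'Guido':
        return repeeeet(name) + '!!!'  # typo: no function called repeeeet exists
    return repeat(name)


print(check('Google'))  # GoogleGoogle

try:
    check('Guido')
except NameError as err:
    print('NameError:', err)  # raised only now, when the bad line actually runs
```

Linters such as pyflakes or pylint can flag undefined names like this before the code is ever run.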

### Variable Names

"As far as actual naming goes, some languages prefer underscored_parts for variable names made up of "more than one word," but other languages prefer camelCasing. In general, Python prefers the underscore method but guides developers to defer to camelCasing if integrating into existing Python code that already uses that style."

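### More on Modules and their Namespaces

"There are many modules and packages which are bundled with a standard installation of the Python interpreter, so you don't have to do anything extra to use them. These are collectively known as the "Python Standard Library." Commonly used modules/packages include: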

**sys** — access to exit(), argv, stdin, stdout, ...

**re** — regular expressions

**os** — operating system interface, file system

You can find the documentation of all the Standard Library modules and packages at https://docs.python.org/3/library/. "
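A short sketch of the three modules listed above. Each `import` binds a module name, and the module's contents are then reached through that namespace (`os.path.splitext`, `re.findall`); `from ... import` copies a single name into the current namespace instead. The input strings are made up.

```python
import os
import re
import sys
from os.path import splitext  # binds just the name splitext here

# sys: information about the running interpreter and its arguments
print(sys.version_info.major)                  # 3
print(isinstance(sys.argv, list))              # True

# os: operating-system interface; os.path handles file names
print(os.path.splitext('hello.py'))            # ('hello', '.py')
print(splitext('string1.py')[1])               # .py

# re: regular expressions
print(re.findall(r'\d+', 'pi is about 3.14'))  # ['3', '14']
```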

### Online help, help(), and dir()

https://developers.google.com/edu/python/introduction#online-help,-help,-and-dir
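The linked page describes help() and dir(). As a quick sketch of what they give you inside a script or notebook:

```python
# dir() returns a sorted list of the attribute names an object defines.
str_names = dir(str)
print('upper' in str_names)   # True
print('split' in str_names)   # True

# Drop the double-underscore names to see the public methods.
public = [name for name in str_names if not name.startswith('_')]
print(public[:5])

# help(len) prints documentation; the same docstring is available as len.__doc__.
print(len.__doc__)
```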

# Strings

In [12]:
s = 'hi'
print(s[1])
print(len(s))
print(s + ' there')
pi = 3.14
text = 'The value of pi is ' + str(pi)
print(text)

i
2
hi there
The value of pi is 3.14


"A "raw" string literal is prefixed by an **'r'** and passes all the chars through without special treatment of backslashes, so r'x\nx' evaluates to the length-4 string 'x\nx'. A 'u' prefix marks a unicode string literal, but in Python 3 every str is already unicode, so the prefix is accepted only for compatibility with Python 2 and changes nothing (Python has lots of other unicode support features; see the i18n section below)."

In [13]:
raw = r'this\t\n and that'
print(raw)

multi = """It was the best of times. 
It was the worst of times."""
print(multi)

this\t\n and that
It was the best of times. 
It was the worst of times.
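The length-4 claim about r'x\nx' can be checked directly. This sketch compares a normal and a raw literal; raw strings are mostly used for regular expression patterns, where backslashes are common.

```python
normal = 'x\nx'  # \n is one newline character
raw = r'x\nx'    # backslash and 'n' kept as two separate characters

print(len(normal))       # 3
print(len(raw))          # 4
print(raw == 'x\\nx')    # True: in a normal literal, \\ means one backslash
```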


### String Methods

"A method is like a function, but it runs "on" an object."

"Here are some of the most common string methods:

- s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
- s.strip() -- returns a string with whitespace removed from the start and end
- s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
- s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
- s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found
- s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
- s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.
- s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> 'aaa---bbb---ccc'"

https://docs.python.org/3/library/stdtypes.html#string-methods
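A sketch exercising the methods listed above on made-up inputs. Note that every method returns a new string: strings are immutable, so `s` itself never changes.

```python
s = '  Hello World  '
stripped = s.strip()

print(stripped)                              # Hello World
print(stripped.lower())                      # hello world
print(stripped.startswith('Hello'))          # True
print(stripped.find('World'))                # 6
print(stripped.find('xyz'))                  # -1
print(stripped.replace('World', 'Python'))   # Hello Python
print('aaa,bbb,ccc'.split(','))              # ['aaa', 'bbb', 'ccc']
print(' one  two\tthree\n'.split())          # ['one', 'two', 'three']
print('---'.join(['aaa', 'bbb', 'ccc']))     # aaa---bbb---ccc
print('2010'.isdigit(), 'abc'.isalpha())     # True True
print(s == '  Hello World  ')                # True: s itself is unchanged
```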

### String Slices

"The "slice" syntax is a handy way to refer to sub-parts of sequences -- typically strings and lists. The slice s[start:end] is the elements beginning at start and extending up to but not including end."

"Suppose we have s = "Hello"

- s[1:4] is 'ell' -- chars starting at index 1 and extending up to but not including index 4
- s[1:] is 'ello' -- omitting either index defaults to the start or end of the string
- s[:] is 'Hello' -- omitting both always gives us a copy of the whole thing (this is the pythonic way to copy a sequence like a string or list)
- s[1:100] is 'ello' -- an index that is too big is truncated down to the string length"

"As an alternative, Python uses negative numbers to give easy access to the chars at the end of the string: s[-1] is the last char 'o', s[-2] is 'l' the next-to-last char, and so on. Negative index numbers count back from the end of the string:

- s[-1] is 'o' -- last char (1st from the end)
- s[-4] is 'e' -- 4th from the end
- s[:-3] is 'He' -- going up to but not including the last 3 chars.
- s[-3:] is 'llo' -- starting with the 3rd char from the end and extending to the end of the string."

"It is a neat truism of slices that for any index n, ```s[:n] + s[n:] == s```."

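Every slice listed above can be checked in one go, along with the truism, which holds even for negative and out-of-range values of n:

```python
s = 'Hello'

print(s[1:4], s[1:], s[:], s[1:100])   # ell ello Hello ello
print(s[-1], s[-4], s[:-3], s[-3:])    # o e He llo

# s[:n] + s[n:] == s for every n, including negative and too-big ones.
print(all(s[:n] + s[n:] == s for n in range(-10, 11)))  # True
```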
### String %

"Python has a printf()-like facility to put together a string. The % operator takes a printf-type format string on the left (**%d int, %s string, %f/%g floating point**), and the matching values in a tuple on the right (a tuple is made of values separated by commas, typically grouped inside parentheses)."

In [14]:
text = "%d little pigs come out, or I'll %s, and I'll %s, and I'll blow your %s down." % (3, 'huff', 'puff', 'house')
print(text)

3 little pigs come out, or I'll huff, and I'll puff, and I'll blow your house down.


"The above line is kind of long -- suppose you want to break it into separate lines. To fix this, enclose the whole expression in an outer set of parentheses -- then the expression is allowed to span multiple lines. This code-across-lines technique works with the various grouping constructs detailed below: ( ), [ ], { }."

In [15]:
text = (
"%d little pigs come out, or I'll %s, and I'll %s, and I'll blow your %s down."
% (3, 'huff', 'puff', 'house'))
print(text)

3 little pigs come out, or I'll huff, and I'll puff, and I'll blow your house down.


"Python automatically concatenates adjacent string literals, so to make this line even shorter, we can cut the format string into chunks on separate lines."

In [16]:
text = (
"%d little pigs come out, "
"or I'll %s, and I'll %s, "
"and I'll blow your %s down."
% (3, 'huff', 'puff', 'house'))
print(text)

3 little pigs come out, or I'll huff, and I'll puff, and I'll blow your house down.
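The cells above only use %d and %s. A short sketch of the floating-point codes and a few other details of the % operator (values made up):

```python
print('%d items' % 42)          # 42 items  (a single value needs no tuple)
print('%.2f' % 3.14159)         # 3.14  (.2 means two digits after the point)
print('%g %g' % (0.5, 3.0))     # 0.5 3  (%g drops trailing zeros)
print('[%5d]' % 42)             # [   42]  (minimum width 5, right-aligned)
print('%s' % ((1, 2),))         # (1, 2)  (a tuple value must be wrapped in a 1-tuple)

try:
    '%s' % (1, 2)  # two values but only one %s
except TypeError as err:
    print('TypeError:', err)
```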


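### i18n Strings (Unicode)

"In Python 3, regular str strings *are* unicode (sequences of characters, not bytes); encoded raw bytes have a separate type, bytes, written with a **'b'** prefix. The **'u'** prefix on a string literal is a Python 2 holdover that is still accepted but changes nothing."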

In [17]:
ustring = u'A unicode \u018e string \xf1'
print(ustring)

A unicode Ǝ string ñ


"In Python 3 there is a single text type, str: the separate Python 2 "unicode" type and its "basestring" superclass no longer exist, so libraries such as regular expressions simply work on str.

To convert a unicode string to bytes with an encoding such as **'utf-8'**, call the ustring.encode('utf-8') method on the unicode string. Going the other direction, the str(s, encoding) function (or the equivalent s.decode(encoding) method) converts encoded plain bytes to a unicode string."

In [18]:
s = ustring.encode('utf-8')
print(s) ## bytes of utf-8 encoding
t = str(s, 'utf-8')
print(t) ## convert bytes back to a unicode string
t == ustring

b'A unicode \xc6\x8e string \xc3\xb1'
A unicode Ǝ string ñ


True
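A sketch of the str/bytes distinction using the same ustring. Characters outside ASCII take more than one byte in UTF-8, so the encoded form is longer than the text.

```python
ustring = 'A unicode \u018e string \xf1'

print(type(ustring) is str, type(u'x') is str)  # True True: the u prefix changes nothing
data = ustring.encode('utf-8')
print(type(data).__name__)                      # bytes
print(len(ustring), len(data))                  # 20 22: Ǝ and ñ take 2 bytes each
print(data.decode('utf-8') == ustring)          # True, same as str(data, 'utf-8')
```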

### If Statement

"Python uses the colon (:) and indentation/whitespace to group statements."

In [19]:
speed = 77
mood = 'good'

if speed >= 80:
    print('License and registration please')
if mood == 'terrible' or speed >= 100:
    print('You have the right to remain silent.')
elif mood == 'bad' or speed >= 90:
    print("I'm going to have to write you a ticket.")
    write_ticket()  # hypothetical helper, not defined in this notebook; this branch does not run when speed is 77
else:
    print("Let's try to keep it under 80 ok?")

Let's try to keep it under 80 ok?


"If the code is short, you can put the code on the same line after ":"."

In [20]:
if speed >= 80: print('You are so busted')
else: print('Have a nice day')

Have a nice day


## Exercise: string1.py

In [21]:
# A. donuts
# Given an int count of a number of donuts, return a string
# of the form 'Number of donuts: <count>', where <count> is the number
# passed in. However, if the count is 10 or more, then use the word 'many'
# instead of the actual count.
# So donuts(5) returns 'Number of donuts: 5'
# and donuts(23) returns 'Number of donuts: many'
def donuts(count):
  if count < 10:
    return 'Number of donuts: %d' % (count)
  else:
    return 'Number of donuts: many'

In [22]:
print(donuts(5))
print(donuts(23))

Number of donuts: 5
Number of donuts: many


In [23]:
# B. both_ends
# Given a string s, return a string made of the first 2
# and the last 2 chars of the original string,
# so 'spring' yields 'spng'. However, if the string length
# is less than 2, return instead the empty string.
def both_ends(s):
  if len(s) >= 2:
    return s[:2] + s[-2:]
  else:
    return ''

In [24]:
print(both_ends('spring'))
print(both_ends('s'))

spng



In [25]:
# C. fix_start
# Given a string s, return a string
# where all occurrences of its first char have
# been changed to '*', except do not change
# the first char itself.
# e.g. 'babble' yields 'ba**le'
# Assume that the string is length 1 or more.
# Hint: s.replace(stra, strb) returns a version of string s
# where all instances of stra have been replaced by strb.
def fix_start(s):
  first_char = s[0]
  right_chars = s[1:]
  replacer = right_chars.replace(first_char, '*')
  return first_char + replacer

In [26]:
fix_start('babble')

'ba**le'

In [27]:
# D. MixUp
# Given strings a and b, return a single string with a and b separated
# by a space '<a> <b>', except swap the first 2 chars of each string.
# e.g.
#   'mix', 'pod' -> 'pox mid'
#   'dog', 'dinner' -> 'dig donner'
# Assume a and b are length 2 or more.
def mix_up(a, b):
  a_swapped = b[:2] + a[2:]
  b_swapped = a[:2] + b[2:]
  return a_swapped + ' ' + b_swapped

In [28]:
mix_up('mix', 'pod')

'pox mid'

In [29]:
!python google-python-exercises/basic/string1.py

donuts
 OK  got: 'Number of donuts: 4' expected: 'Number of donuts: 4'
 OK  got: 'Number of donuts: 9' expected: 'Number of donuts: 9'
 OK  got: 'Number of donuts: many' expected: 'Number of donuts: many'
 OK  got: 'Number of donuts: many' expected: 'Number of donuts: many'
both_ends
 OK  got: 'spng' expected: 'spng'
 OK  got: 'Helo' expected: 'Helo'
 OK  got: '' expected: ''
 OK  got: 'xyyz' expected: 'xyyz'
fix_start
 OK  got: 'ba**le' expected: 'ba**le'
 OK  got: 'a*rdv*rk' expected: 'a*rdv*rk'
 OK  got: 'goo*le' expected: 'goo*le'
 OK  got: 'donut' expected: 'donut'
mix_up
 OK  got: 'pox mid' expected: 'pox mid'
 OK  got: 'dig donner' expected: 'dig donner'
 OK  got: 'spash gnort' expected: 'spash gnort'
 OK  got: 'fizzy perm' expected: 'fizzy perm'


In [30]:
!cat google-python-exercises/basic/string1.py

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

# Basic string exercises
# Fill in the code for the functions below. main() is already set up
# to call the functions with a few different inputs,
# printing 'OK' when each function is correct.
# The starter code for each function includes a 'return'
# which is just a placeholder for your code.
# It's ok if you do not complete all the functions, and there
# are some additional functions to try in string2.py.


# A. donuts
# Given an int count of a number of donuts, return a string
# of the form 'Number of donuts: <count>', where <count> is the number
# passed in. However, if the count is 10 or more, then use the word 'many'
# instead of the actual count.
# So donuts(5) returns 'Number of donuts: 5'
# and donuts(23) returns 'Number of donuts: many'

## Exercise: string2.py

In [31]:
# D. verbing
# Given a string, if its length is at least 3,
# add 'ing' to its end.
# Unless it already ends in 'ing', in which case
# add 'ly' instead.
# If the string length is less than 3, leave it unchanged.
# Return the resulting string.
def verbing(s):
  if len(s) >= 3:
    if s[-3:] != 'ing':
        s = s + 'ing'
    else:
        s = s + 'ly'
  return s

In [32]:
verbing('string')

'stringly'

In [33]:
# E. not_bad
# Given a string, find the first appearance of the
# substring 'not' and 'bad'. If the 'bad' follows
# the 'not', replace the whole 'not'...'bad' substring
# with 'good'.
# Return the resulting string.
# So 'This dinner is not that bad!' yields:
# This dinner is good!
def not_bad(s):
  n = s.find('not')
  b = s.find('bad')
  if n != -1 and b != -1 and b > n:
    s = s[:n] + 'good' + s[b+3:]
  return s

In [34]:
not_bad('This dinner is not that bad!')

'This dinner is good!'

In [35]:
# F. front_back
# Consider dividing a string into two halves.
# If the length is even, the front and back halves are the same length.
# If the length is odd, we'll say that the extra char goes in the front half.
# e.g. 'abcde', the front half is 'abc', the back half 'de'.
# Given 2 strings, a and b, return a string of the form
#  a-front + b-front + a-back + b-back
def front_back(a, b):
  # The extra char of an odd-length string goes in the front half, so the
  # split point is len/2 rounded up; (n + 1) // 2 does that with integer division.
  a_middle = (len(a) + 1) // 2
  b_middle = (len(b) + 1) // 2
  return a[:a_middle] + b[:b_middle] + a[a_middle:] + b[b_middle:]

In [36]:
front_back('abcd', 'xy')

'abxcdy'

In [37]:
!python google-python-exercises/basic/string2.py

verbing
 OK  got: 'hailing' expected: 'hailing'
 OK  got: 'swimingly' expected: 'swimingly'
 OK  got: 'do' expected: 'do'
not_bad
 OK  got: 'This movie is good' expected: 'This movie is good'
 OK  got: 'This dinner is good!' expected: 'This dinner is good!'
 OK  got: 'This tea is not hot' expected: 'This tea is not hot'
 OK  got: "It's bad yet not" expected: "It's bad yet not"
front_back
 OK  got: 'abxcdy' expected: 'abxcdy'
 OK  got: 'abcxydez' expected: 'abcxydez'
 OK  got: 'KitDontenut' expected: 'KitDontenut'


In [38]:
!cat google-python-exercises/basic/string2.py

#!/usr/bin/python2.4 -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

# Additional basic string exercises

# D. verbing
# Given a string, if its length is at least 3,
# add 'ing' to its end.
# Unless it already ends in 'ing', in which case
# add 'ly' instead.
# If the string length is less than 3, leave it unchanged.
# Return the resulting string.
def verbing(s):
  if len(s) >= 3:
    if s[-3:] != 'ing':
        s = s + 'ing'
    else:
        s = s + 'ly'
  return s


# E. not_bad
# Given a string, find the first appearance of the
# substring 'not' and 'bad'. If the 'bad' follows
# the 'not', replace the whole 'not'...'bad' substring
# with 'good'.
# Return the resulting string.
# So 'This dinner is not that bad!' yields:
# This dinner is good!
def not_bad(s):
  n = s.find('not')
  b = s.find('bad')

# Lists

In [39]:
colors = ['red', 'blue', 'green']
print(colors[0])
print(colors[2])
print(len(colors))

red
green
3


"Assignment with an = on lists does not make a copy. Instead, assignment makes the two variables point to the one list in memory."

In [40]:
b = colors

"A slice with both indices omitted, ```colors[:]```, makes a copy."

In [41]:
a = colors[:]
a[0] = 'purple'
a == b

False
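The cells above create `b` but never show it sharing the list with `colors`. A sketch that makes the difference between an alias and a copy visible:

```python
colors = ['red', 'blue', 'green']
b = colors         # b is another name for the same list object
a = colors[:]      # a is a new list holding the same elements

b.append('yellow')
print(colors)                    # ['red', 'blue', 'green', 'yellow']
print(a)                         # ['red', 'blue', 'green']
print(b is colors, a is colors)  # True False
```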

### FOR and IN

"The *for* construct -- ```for var in list``` -- is an easy way to look at each element in a list (or other collection). Do not add or remove from the list during iteration."

In [42]:
squares = [1, 4, 9, 16]
total = 0  # not 'sum', which would shadow the built-in sum() function
for num in squares:
  total += num
print(total)

30
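Why not add or remove elements during iteration? Removing items from the list being looped over shifts the remaining items left, so the loop skips one. A sketch of the problem and two safer patterns:

```python
buggy = [1, 2, 2, 3, 4]
for n in buggy:
    if n % 2 == 0:
        buggy.remove(n)
print(buggy)  # [1, 2, 3]: one 2 was skipped and survived

# Loop over a copy (nums[:]) while modifying the original...
nums = [1, 2, 2, 3, 4]
for n in nums[:]:
    if n % 2 == 0:
        nums.remove(n)
print(nums)   # [1, 3]

# ...or, often clearer, build a new list instead.
odds = []
for n in [1, 2, 2, 3, 4]:
    if n % 2 != 0:
        odds.append(n)
print(odds)   # [1, 3]
```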


"The *in* construct on its own is an easy way to test if an element appears in a list (or other collection) -- ```value in collection``` -- tests if the value is in the collection, returning True/False."

In [43]:
names = ['larry', 'curly', 'moe']  # not 'list', which would shadow the built-in list type
if 'curly' in names:
  print('yay')

yay


### Range

"The **range(n)** function yields the numbers 0, 1, ... n-1, and range(a, b) returns a, a+1, ... b-1 -- up to but not including the last number. The combination of the for-loop and the range() function allows you to build a traditional numeric for loop."

In [44]:
for i in range(10):
  print(i)

0
1
2
3
4
5
6
7
8
9


In [45]:
for i in range(1, 3):
  print(i)

1
2


### While Loop

"The above for/in loops solve the common case of iterating over every element in a list, but the while loop gives you total control over the index numbers. Here's a while loop which accesses every 3rd element in a list."

In [46]:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
i = 0
while i < len(a):
  print(a[i])
  i = i + 3

0
3
6
9
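For this particular pattern, range() also accepts an optional third argument, the step, and slices accept one too (a[start:end:step]). A sketch showing both giving the same result as the while loop:

```python
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# range(start, stop, step): visit indexes 0, 3, 6, 9
every_third = []
for i in range(0, len(a), 3):
    every_third.append(a[i])
print(every_third)            # [0, 3, 6, 9]

print(a[::3])                 # [0, 3, 6, 9]
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]: a negative step counts down
```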


### List Methods

"Here are some other common list methods.

- list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
- list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
- list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
- list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
- list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present)
- list.sort() -- sorts the list in place (does not return it). (The sorted() function shown later is preferred.)
- list.reverse() -- reverses the list in place (does not return it)
- list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append())."

"Notice that these are *methods* on a list object, while len() is a function that takes the list (or string or whatever) as an argument."

In [47]:
l = ['larry', 'curly', 'moe']
l.append('shemp')
l.insert(0, 'xxx')
l.extend(['yyy', 'zzz'])
print(l)
print(l.index('curly'))

l.remove('curly')
l.pop(1)
print(l)

['xxx', 'larry', 'curly', 'moe', 'shemp', 'yyy', 'zzz']
2
['xxx', 'moe', 'shemp', 'yyy', 'zzz']


"Common error: note that the above methods do not *return* the modified list, they just modify the original list."

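The "common error" in practice: assigning the result of a mutating method gives None. sorted(), by contrast, is a function that returns a new list.

```python
names = ['larry', 'curly', 'moe']
result = names.append('shemp')
print(result)        # None: append() changed names but returns nothing useful
print(names)         # ['larry', 'curly', 'moe', 'shemp']

ordered = sorted(names)
print(ordered)       # ['curly', 'larry', 'moe', 'shemp']
print(names.sort())  # None: sort() works in place
print(names)         # ['curly', 'larry', 'moe', 'shemp']
```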
### List Build Up

"One common pattern is to start a list as the empty list **[]**, then use append() or extend() to add elements to it."

In [48]:
l = []
l.append('a')
l.append('b')
print(l)

['a', 'b']


### List Slices

In [49]:
l = ['a', 'b', 'c', 'd']
print(l[1:-1])
l[0:2] = ['z']  # replace the 2 elements 'a', 'b' with the single element 'z'
print(l)

['b', 'c']
['z', 'c', 'd']


# Sorting

"The **sorted()** function can be customized through optional arguments. The sorted() optional argument reverse=True, e.g. ```sorted(list, reverse=True)```, makes it sort backwards in **descending** (high to low) order; **ascending** (low to high) order is default."

In [50]:
strs = ['aa', 'BB', 'zz', 'CC']
nums = [1, 2, 3, 4]
print(sorted(strs))
print(sorted(strs, reverse=True))
print(sorted(nums))
print(sorted(nums, reverse=True))

['BB', 'CC', 'aa', 'zz']
['zz', 'aa', 'CC', 'BB']
[1, 2, 3, 4]
[4, 3, 2, 1]


### Custom Sorting With key=

"sorted() takes an optional **"key="** specifying a "key" function that transforms each element before comparison. The key function takes in 1 value and returns 1 value, and the returned "proxy" value is used for the comparisons within the sort."

In [51]:
strs = ['ccc', 'aaaa', 'd', 'bb']
print(sorted(strs, key=len))
print(sorted(strs, key=str.lower))

['d', 'bb', 'ccc', 'aaaa']
['aaaa', 'bb', 'ccc', 'd']


"To use *key=* custom sorting, remember that you provide a function that takes one value and returns the proxy value to guide the sorting."

In [52]:
strs = ['xc', 'zb', 'yd' ,'wa']

def MyFn(s):
  return s[-1]

print(sorted(strs, key=MyFn))

['wa', 'zb', 'xc', 'yd']
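A key function can also return a tuple to sort on several criteria at once, since tuples compare element by element. A sketch with a hypothetical `by_len_then_alpha` key, plus a demonstration that Python's sort is stable (items with equal keys keep their original order):

```python
words = ['bb', 'a', 'ccc', 'aa', 'b']


def by_len_then_alpha(s):
    # Length first, then alphabetical order to break ties.
    return (len(s), s)


print(sorted(words, key=by_len_then_alpha))  # ['a', 'b', 'aa', 'bb', 'ccc']

# With key=len alone, 'bb' stays ahead of 'aa' because it came first.
print(sorted(words, key=len))                # ['a', 'b', 'bb', 'aa', 'ccc']
```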


# Tuples

"A *tuple* is a fixed size grouping of elements, such as an **(x, y)** co-ordinate. Tuples are like lists, except they are immutable and do not change size. Tuples play a sort of "struct" role in Python -- a convenient way to pass around a little logical, fixed size bundle of values."

In [53]:
t = (1, 2, 'hi')  # not 'tuple', which would shadow the built-in tuple type
print(len(t))
print(t[2])
#t[2] = 'bye' # NO, tuples cannot be changed
t = (1, 2, 'bye') # this works: t now names a new tuple
print(t)

3
hi
(1, 2, 'bye')


"Size-1 tuple - the comma is necessary to distinguish the tuple from the ordinary case of putting an expression in parentheses."

In [54]:
t = ('hi',)
print(t)

('hi',)


In [55]:
(x, y, z) = (42, 13, "hike")
print(z)

hike
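A sketch of the "struct" role in practice: changing a tuple raises TypeError, unpacking pulls the values back out, and returning a tuple is how a function returns several values. The `min_max` helper is hypothetical.

```python
point = (3, 4)
try:
    point[0] = 10
except TypeError:
    print('tuples cannot be changed')

x, y = point      # unpacking; parentheses on the left are optional
x, y = y, x       # swap via a temporary tuple
print(x, y)       # 4 3


def min_max(nums):
    # Returning a tuple returns "several values" at once.
    return min(nums), max(nums)


low, high = min_max([5, 2, 9])
print(low, high)  # 2 9
```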


# List Comprehensions

"A *list comprehension* is a compact way to write an expression that expands to a whole list."

In [56]:
nums = [1, 2, 3, 4]
squares = [ n * n for n in nums ]

"The syntax is ```[ expr for var in list ]``` -- the ```for var in list``` looks like a regular for-loop, but without the colon (:). The expr to its left is evaluated once for each element to give the values for the new list. Here is an example with strings, where each string is changed to upper case with '!!!' appended:"

In [57]:
strs = ['hello', 'and', 'goodbye']
shouting = [ s.upper() + '!!!' for s in strs ]
print(shouting)

['HELLO!!!', 'AND!!!', 'GOODBYE!!!']


In [58]:
nums = [2, 8, 1, 6]
small = [ n for n in nums if n <= 2 ]
print(small)

fruits = ['apple', 'cherry', 'banana', 'lemon']
afruits = [ s.upper() for s in fruits if 'a' in s ]
print(afruits)

[2, 1]
['APPLE', 'BANANA']
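The squares comprehension from In [56] never printed its result. This sketch compares it with the equivalent for-loop and combines the expression with an if-filter.

```python
nums = [1, 2, 3, 4]

# The loop version of [n * n for n in nums]
squares_loop = []
for n in nums:
    squares_loop.append(n * n)

squares_comp = [n * n for n in nums]
print(squares_loop == squares_comp)          # True
print(squares_comp)                          # [1, 4, 9, 16]

# Expression and filter together: squares of the even numbers only
print([n * n for n in nums if n % 2 == 0])   # [4, 16]
```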


## Exercise: list1.py

In [59]:
# A. match_ends
# Given a list of strings, return the count of the number of
# strings where the string length is 2 or more and the first
# and last chars of the string are the same.
# Note: python does not have a ++ operator, but += works.
def match_ends(words):
  count = 0
  for i in words:
    if len(i) >= 2 and i[0] == i[-1]:
      count = count + 1
  return count

In [60]:
match_ends(['aaa', 'bbb', 'ccc', 'abc'])

3

In [61]:
# B. front_x
# Given a list of strings, return a list with the strings
# in sorted order, except group all the strings that begin with 'x' first.
# e.g. ['mix', 'xyz', 'apple', 'xanadu', 'aardvark'] yields
# ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
# Hint: this can be done by making 2 lists and sorting each of them
# before combining them.
def front_x(words):
  x_list = []
  other_list = []
  for i in words:
    if i.startswith('x'):
      x_list.append(i)
    else:
      other_list.append(i)
  return sorted(x_list) + sorted(other_list)

In [62]:
front_x(['mix', 'xyz', 'apple', 'xanadu', 'aardvark'])

['xanadu', 'xyz', 'aardvark', 'apple', 'mix']

In [63]:
# C. sort_last
# Given a list of non-empty tuples, return a list sorted in increasing
# order by the last element in each tuple.
# e.g. [(1, 7), (1, 3), (3, 4, 5), (2, 2)] yields
# [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
# Hint: use a custom key= function to extract the last element from each tuple.
def last(a):
  return a[-1]

def sort_last(tuples):
  return sorted(tuples, key=last)

In [64]:
sort_last([(1, 7), (1, 3), (3, 4, 5), (2, 2)])

[(2, 2), (1, 3), (3, 4, 5), (1, 7)]

In [65]:
!python google-python-exercises/basic/list1.py

match_ends
 OK  got: 3 expected: 3
 OK  got: 2 expected: 2
 OK  got: 1 expected: 1
front_x
 OK  got: ['xaa', 'xzz', 'axx', 'bbb', 'ccc'] expected: ['xaa', 'xzz', 'axx', 'bbb', 'ccc']
 OK  got: ['xaa', 'xcc', 'aaa', 'bbb', 'ccc'] expected: ['xaa', 'xcc', 'aaa', 'bbb', 'ccc']
 OK  got: ['xanadu', 'xyz', 'aardvark', 'apple', 'mix'] expected: ['xanadu', 'xyz', 'aardvark', 'apple', 'mix']
sort_last
 OK  got: [(2, 1), (3, 2), (1, 3)] expected: [(2, 1), (3, 2), (1, 3)]
 OK  got: [(3, 1), (1, 2), (2, 3)] expected: [(3, 1), (1, 2), (2, 3)]
 OK  got: [(2, 2), (1, 3), (3, 4, 5), (1, 7)] expected: [(2, 2), (1, 3), (3, 4, 5), (1, 7)]


In [66]:
!cat google-python-exercises/basic/list1.py

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

# Basic list exercises
# Fill in the code for the functions below. main() is already set up
# to call the functions with a few different inputs,
# printing 'OK' when each function is correct.
# The starter code for each function includes a 'return'
# which is just a placeholder for your code.
# It's ok if you do not complete all the functions, and there
# are some additional functions to try in list2.py.

# A. match_ends
# Given a list of strings, return the count of the number of
# strings where the string length is 2 or more and the first
# and last chars of the string are the same.
# Note: python does not have a ++ operator, but += works.
def match_ends(words):
  count = 0
  for i in words:
    if len(i) >= 2 and i[0] == i[-1]:
  

## Exercise: list2.py

In [67]:
# D. Given a list of numbers, return a list where
# all adjacent == elements have been reduced to a single element,
# so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or
# modify the passed in list.
def remove_adjacent(nums):
  result = []
  for num in nums:
    if len(result) == 0 or num != result[-1]:
      result.append(num)
  return result

In [68]:
remove_adjacent([1, 2, 2, 3])

[1, 2, 3]
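For comparison, the standard library's itertools.groupby() groups runs of equal consecutive items, which is exactly what remove_adjacent collapses. A sketch of that alternative (the function name is made up):

```python
from itertools import groupby


def remove_adjacent_groupby(nums):
    # groupby() yields (key, group) for each run of equal consecutive items;
    # keeping only the keys collapses each run to a single element.
    return [key for key, _group in groupby(nums)]


print(remove_adjacent_groupby([1, 2, 2, 3]))     # [1, 2, 3]
print(remove_adjacent_groupby([2, 2, 3, 3, 3]))  # [2, 3]
print(remove_adjacent_groupby([]))               # []
```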

In [69]:
# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
  result = []
  # Look at the two lists so long as both are non-empty.
  # Take whichever element [0] is smaller.
  while len(list1) and len(list2):
    if list1[0] < list2[0]:
      result.append(list1.pop(0))
    else:
      result.append(list2.pop(0))

  result.extend(list1)
  result.extend(list2)
  return result

In [70]:
linear_merge(['aa', 'xx', 'zz'], ['aa', 'bb', 'cc', 'xx', 'zz'])

['aa', 'aa', 'bb', 'cc', 'xx', 'xx', 'zz', 'zz']
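The solution above calls ```pop(0)```, which shifts every remaining element of a Python list, so in the worst case it is not strictly linear. The sketch below walks both lists with indexes instead. It keeps the single pass and leaves the inputs untouched. ```linear_merge_indexed``` is a hypothetical name.

```python
def linear_merge_indexed(list1, list2):
  """Merge two sorted lists in O(len(list1) + len(list2)) time
  without modifying either input."""
  result = []
  i = j = 0
  # Advance whichever index points at the smaller element.
  while i < len(list1) and j < len(list2):
    if list1[i] < list2[j]:
      result.append(list1[i])
      i += 1
    else:
      result.append(list2[j])
      j += 1
  # At most one of these leftover slices is non-empty.
  result.extend(list1[i:])
  result.extend(list2[j:])
  return result

print(linear_merge_indexed(['aa', 'xx', 'zz'], ['aa', 'bb', 'cc', 'xx', 'zz']))
# ['aa', 'aa', 'bb', 'cc', 'xx', 'xx', 'zz', 'zz']
```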

In [71]:
!python google-python-exercises/basic/list2.py

remove_adjacent
 OK  got: [1, 2, 3] expected: [1, 2, 3]
 OK  got: [2, 3] expected: [2, 3]
 OK  got: [] expected: []
linear_merge
 OK  got: ['aa', 'bb', 'cc', 'xx', 'zz'] expected: ['aa', 'bb', 'cc', 'xx', 'zz']
 OK  got: ['aa', 'bb', 'cc', 'xx', 'zz'] expected: ['aa', 'bb', 'cc', 'xx', 'zz']
 OK  got: ['aa', 'aa', 'aa', 'bb', 'bb'] expected: ['aa', 'aa', 'aa', 'bb', 'bb']


In [72]:
!cat google-python-exercises/basic/list2.py

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

# Additional basic list exercises

# D. Given a list of numbers, return a list where
# all adjacent == elements have been reduced to a single element,
# so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or
# modify the passed in list.
def remove_adjacent(nums):
  result = []
  for num in nums:
    if len(result) == 0 or num != result[-1]:
      result.append(num)
  return result

# E. Given two lists sorted in increasing order, create and return a merged
# list of all the elements in sorted order. You may modify the passed in lists.
# Ideally, the solution should work in "linear" time, making a single
# pass of both lists.
def linear_merge(list1, list2):
  result = []
  # Look at the two lists so long as both are non-empt

# Dicts and Files

### Dict Hash Table

"Python's efficient key/value hash table structure is called a **"dict"**. The contents of a dict can be written as a series of key:value pairs within braces { }, e.g. **dict = {key1:value1, key2:value2, ... }**. The "empty dict" is just an empty pair of curly braces {}."

In [73]:
dict = {}  # note: reusing the name 'dict' hides the built-in dict type in this notebook
dict['a'] = 'alpha'
dict['o'] = 'omega'
dict['g'] = 'gamma'
print(dict)

{'a': 'alpha', 'o': 'omega', 'g': 'gamma'}


In [74]:
print(dict['a'])
dict['a'] = 6
'a' in dict

alpha


True

In [75]:
# print(dict['z'])  # would raise KeyError, since 'z' is not a key
if 'z' in dict: print(dict['z'])
print(dict.get('z'))

None
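A square-bracket lookup of a missing key raises ```KeyError```, which is why the cell above guards with ```in``` or uses ```get()```. The sketch below shows all three behaviors, plus ```get()``` with a default. It uses a fresh dict named ```d``` so the built-in ```dict``` name is left alone.

```python
d = {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

# Square-bracket lookup of a missing key raises KeyError ...
try:
  d['z']
  missing_raised = False
except KeyError:
  missing_raised = True
print(missing_raised)          # True

# ... while get() returns None, or the default passed as its second argument.
print(d.get('z'))              # None
print(d.get('z', 'missing'))   # missing
```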


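"A for loop on a dictionary iterates over its keys by default. The keys appear in insertion order (guaranteed since Python 3.7). The methods **dict.keys()** and **dict.values()** return the keys or values explicitly, as view objects (dict_keys, dict_values) rather than lists; wrap one in list() if you need a real list."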

In [76]:
for key in dict: print(key)

a
o
g


In [77]:
for key in dict.keys(): print(key)

a
o
g


In [78]:
print(dict.keys())
print(dict.values())

dict_keys(['a', 'o', 'g'])
dict_values([6, 'omega', 'gamma'])


In [79]:
for key in sorted(dict.keys()):
    print(key, dict[key])

a 6
g gamma
o omega


"The **dict.items()** method returns the (key, value) tuples, as a dict_items view. This is the most efficient way to examine all the key/value data in the dictionary."

In [80]:
print(dict.items())

dict_items([('a', 6), ('o', 'omega'), ('g', 'gamma')])


In [81]:
for k, v in dict.items(): print(k, '->', v)

a -> 6
o -> omega
g -> gamma


"In Python 3, keys(), values() and items() already return lightweight views instead of building whole lists. As a result, the Python 2 "iter" variants iterkeys(), itervalues() and iteritems() no longer exist. From a performance point of view, the dictionary is one of your greatest tools, and you should use it where you can as an easy way to organize data."
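Python 3's ```keys()```, ```values()``` and ```items()``` return views that track the dict. They are not snapshots. The minimal illustration below uses a throwaway dict ```d```.

```python
d = {'a': 1}
keys = d.keys()         # a live view of the keys, not a copy
d['b'] = 2              # later changes to the dict show up in the view
print(list(keys))       # ['a', 'b']
print(list(d.items()))  # [('a', 1), ('b', 2)]
```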

### Dict Formatting

"The **%** operator works conveniently to substitute values from a dict into a string by name."

In [82]:
hash = {}
hash['word'] = 'garfield'
hash['count'] = 42
s = 'I want %(count)d copies of %(word)s' % hash # %d for int, %s for string
print(hash)
print(s)

{'word': 'garfield', 'count': 42}
I want 42 copies of garfield
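Python 3 also offers ```str.format_map()``` and f-strings (Python 3.6+) for the same by-name substitution. The sketch below shows that all three produce the same string. The dict is named ```fields``` here to avoid shadowing the built-in ```hash```.

```python
fields = {'word': 'garfield', 'count': 42}

# The % operator with named %(key) fields, as in the cell above.
s1 = 'I want %(count)d copies of %(word)s' % fields
# str.format_map() looks the {names} up in the same dict.
s2 = 'I want {count} copies of {word}'.format_map(fields)
# An f-string evaluates the expressions inside {} directly.
s3 = f"I want {fields['count']} copies of {fields['word']}"

print(s1)              # I want 42 copies of garfield
print(s1 == s2 == s3)  # True
```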


### Del

"The **"del"** statement does deletions. In the simplest case, it removes the definition of a variable, as if that variable had never been defined. del can also delete list elements or slices, and it can delete entries from a dictionary."

In [83]:
var = 6
del var
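To confirm that ```del``` really removes the binding, the sketch below looks the name up again afterwards.

```python
var = 6
del var

# After del, looking the name up fails, as if it had never been assigned.
try:
  var
  still_defined = True
except NameError:
  still_defined = False
print(still_defined)  # False
```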

In [84]:
list = ['a', 'b', 'c', 'd']  # note: this name hides the built-in list type
del list[0]
del list[-2:]
print(list)

['b']


In [85]:
dict = {'a':1, 'b':2, 'c':3}
del dict['b']
print(dict)

{'a': 1, 'c': 3}


# Files

"The **open()** function opens a file and returns a file handle that can be used to read or write it in the usual way. The code ```f = open('name', 'r')``` opens the file into the variable f, ready for reading; call ```f.close()``` when finished. Instead of **'r'**, use **'w'** for writing and **'a'** for appending."

In [86]:
f = open('google-python-exercises/foo.txt', 'r')
for line in f:
  print(line, end='')  # each line already ends in '\n', so end='' avoids a blank line
f.close()

"The ```f.readlines()``` method reads the whole file into memory and returns its contents as a list of its lines. The ```f.read()``` method reads the whole file into a single string, which can be a handy way to deal with the text all at once. For writing, the ```f.write(string)``` method is the easiest way to write data to an open output file."
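A ```with``` block closes the file automatically, even if an error occurs, so it is a common alternative to calling ```f.close()``` by hand. The sketch below exercises 'w', 'a', ```readlines()``` and ```read()``` on a temporary file. The name ```demo.txt``` is arbitrary.

```python
import os
import tempfile

# A throwaway path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:     # 'w' creates the file, or truncates an existing one
  f.write('line one\n')
  f.write('line two\n')

with open(path, 'a') as f:     # 'a' appends to the end of the file
  f.write('line three\n')

with open(path) as f:          # mode defaults to 'r'
  lines = f.readlines()        # list of lines, each keeping its '\n'

with open(path) as f:
  text = f.read()              # the whole file as a single string

print(lines)                   # ['line one\n', 'line two\n', 'line three\n']
print(text == ''.join(lines))  # True
```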

## Exercise: wordcount.py

In [87]:
filename = './google-python-exercises/NOTICE.txt'

In [88]:
def word_count_dict(filename):
  """Returns a dict mapping each lowercased word in the file to its count."""
  word_count = {}
  with open(filename, 'r') as input_file:  # closes the file automatically
    for line in input_file:
      for word in line.split():
        word = word.lower()
        if word not in word_count:
          word_count[word] = 1
        else:
          word_count[word] += 1
  return word_count

In [89]:
word_count_dict(filename)

{'code': 2,
 'for': 1,
 "google's": 1,
 'python': 1,
 'class': 1,
 'copyright': 1,
 '2010': 1,
 'google': 2,
 'inc.': 2,
 'this': 1,
 'developed': 1,
 'by': 1,
 'nick': 1,
 'parlante': 1,
 'at': 1}
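The same tally can be built with ```collections.Counter```, a dict subclass made for counting. In this sketch, ```word_count_counter``` is a hypothetical helper. It applies the same rule as ```word_count_dict()``` (split on whitespace, then lowercase), but to a string instead of a file.

```python
from collections import Counter

def word_count_counter(text):
  """Hypothetical helper: whitespace-split, lowercased word counts of a string."""
  return Counter(word.lower() for word in text.split())

counts = word_count_counter('The cat and the hat and THE bat')
print(counts['the'])          # 3
print(counts.most_common(2))  # [('the', 3), ('and', 2)]
```

A Counter also returns 0 for a missing key instead of raising ```KeyError```.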

In [90]:
def print_words(filename):
  word_count = word_count_dict(filename)
  words = sorted(word_count.keys())
  for word in words:
    print('key: ', word, '->', 'value: ', word_count[word])

In [91]:
print_words(filename)

key:  2010 -> value:  1
key:  at -> value:  1
key:  by -> value:  1
key:  class -> value:  1
key:  code -> value:  2
key:  copyright -> value:  1
key:  developed -> value:  1
key:  for -> value:  1
key:  google -> value:  2
key:  google's -> value:  1
key:  inc. -> value:  2
key:  nick -> value:  1
key:  parlante -> value:  1
key:  python -> value:  1
key:  this -> value:  1


In [92]:
def get_count(word_count_tuple):
  return word_count_tuple[1] # (key, value)

def print_top(filename):
  word_count = word_count_dict(filename)
  # Each item is a (word, count) tuple.
  items = sorted(word_count.items(), key=get_count, reverse=True)
  for item in items[:3]:
    print('key: ', item[0], '->', 'value: ', item[1])

In [93]:
print_top(filename)

key:  code -> value:  2
key:  google -> value:  2
key:  inc. -> value:  2
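In ```print_top()``` the words tied at count 2 come out in their original dict order, because ```sorted()``` is stable. For an alphabetical tie-break that does not depend on insertion order, one option is a tuple key. The sketch below applies it to a deliberately reordered subset of the counts shown above.

```python
word_count = {'for': 1, 'inc.': 2, 'code': 2, 'at': 1, 'google': 2}

# Negating the count sorts larger counts first; the word then breaks ties alphabetically.
items = sorted(word_count.items(), key=lambda kv: (-kv[1], kv[0]))
print(items[:3])  # [('code', 2), ('google', 2), ('inc.', 2)]
```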


In [94]:
!python google-python-exercises/basic/wordcount.py --topcount ./google-python-exercises/NOTICE.txt

key:  code -> value:  2
key:  google -> value:  2
key:  inc. -> value:  2


In [95]:
!python google-python-exercises/basic/wordcount.py --rank ./google-python-exercises/NOTICE.txt

unknown option: --rank


In [96]:
!cat google-python-exercises/basic/wordcount.py

#!/usr/bin/python -tt
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Wordcount exercise
Google's Python class

The main() below is already defined and complete. It calls print_words()
and print_top() functions which you write.

1. For the --count flag, implement a print_words(filename) function that counts
how often each word appears in the text and prints:
word1 count1
word2 count2
...

Print the above list in order sorted by word (python will sort punctuation to
come before letters -- that's fine). Store all the words as lowercase,
so 'The' and 'the' count as the same word.

2. For the --topcount flag, implement a print_top(filename) which is similar
to print_words() but which prints just the top 20 most common words sorted
so the most common word is first, then the next most common, and so on.


# Regular Expressions

"*Regular expressions* are a powerful language for matching text patterns. The re.search() function takes a regular expression pattern and a string, and searches for that pattern within the string: ```match = re.search(pat, str)```. If the search succeeds, it returns a match object; otherwise it returns None."

In [99]:
import re

def Search(pattern, text):
  match = re.search(pattern, text)
  if match:
    print(match.group())
  else:
    print('not found')

### Basic Patterns

"Here are the most basic patterns which match single chars.

- a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)

- . (a period) -- matches any single character except newline '\n'

- \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. In Python 3, for str patterns, \w also matches non-ASCII letters and digits unless the re.ASCII flag is given. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.

- \b -- boundary between word and non-word

- \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.

- \t, \n, \r -- tab, newline, return

- \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)

- ^ = start, $ = end -- match the start or end of the string

- \ -- inhibit the "specialness" of a character. So, for example, use ```\.``` to match a period or ```\\``` to match a backslash. If you are unsure whether a character has special meaning, such as '@', you can put a backslash in front of it, ```\@```, to make sure it is treated just as a character."

In [100]:
Search(r'iii', 'piiig')
Search(r'igs', 'piiig')
Search(r'..g', 'piiig')
Search(r'\d\d\d', 'p123g')
Search(r'\w\w\w', '@@abcd!!')

iii
not found
iig
123
abc
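The cell above covers ordinary characters, ```.```, ```\d``` and ```\w```. The sketch below tries three more items from the list on made-up strings: ```\b```, escaping with a backslash, and the ```^```/```$``` anchors.

```python
import re

# \b matches the empty boundary between a word char and a non-word char,
# so the 'cat' buried inside 'concatenate' is skipped.
m = re.search(r'\bcat\b', 'concatenate the cat')
print(m.start())  # 16

# An unescaped '.' matches any character; '\.' matches only a literal period.
loose = re.search(r'\d+.\d+', 'ratio 3x14')
strict = re.search(r'\d+\.\d+', 'ratio 3x14')
print(loose.group())  # 3x14
print(strict)         # None

# ^ and $ anchor the pattern to the start and end of the whole string.
print(bool(re.search(r'^\d+$', '2010')))   # True
print(bool(re.search(r'^\d+$', '2010a')))  # False
```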


### Repetition

"+ and * are used to specify repetition in the pattern.

- \+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
- \* -- 0 or more occurrences of the pattern to its left
- ? -- match 0 or 1 occurrences of the pattern to its left"

"The + and * are **"greedy"** -- first the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible."

In [101]:
Search(r'pi+', 'piiig')
Search(r'i+', 'piigiiii')
Search(r'\d\s*\d\s*\d', 'xx1 2   3xx')
Search(r'\d\s*\d\s*\d', 'xx12  3xx')
Search(r'\d\s*\d\s*\d', 'xx123xx')
Search(r'^b\w+', 'foobar')
Search(r'b\w+', 'foobar')

piii
ii
1 2   3
12  3
123
not found
bar
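The cell above exercises + and * but not ```?```. A minimal sketch of ```?``` on made-up text:

```python
import re

# '?' makes the item to its left optional: it matches 0 or 1 occurrences.
us = re.search(r'colou?r', 'the color red').group()
uk = re.search(r'colou?r', 'the colour red').group()
print(us, uk)  # color colour
```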


### Square Brackets

"Square brackets can be used to indicate a set of chars, so **[abc]** matches 'a' or 'b' or 'c'. The codes \w, \s etc. work inside square brackets too with the one exception that dot **(.)** just means a literal dot. You can also use a dash to indicate a range, so **[a-z]** matches all lowercase letters. To use a dash without indicating a range, put the dash last, e.g. **[abc-]**. An up-hat **(^)** at the start of a square-bracket set inverts it, so **[^ab]** means any char except 'a' or 'b'."

In [102]:
Search(r'[\w.-]+@[\w.-]+', 'purple alice-b@google.com monkey dishwasher')

alice-b@google.com
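The paragraph above also describes ranges, inverted sets and a literal dash. The sketch below shows each one on made-up strings.

```python
import re

# A range: [a-z]+ matches runs of lowercase letters only.
lower_runs = re.findall(r'[a-z]+', 'Hello World')
print(lower_runs)  # ['ello', 'orld']

# A leading ^ inverts the set: any single char except 'a' or 'b'.
not_ab = re.findall(r'[^ab]', 'abcab d')
print(not_ab)      # ['c', ' ', 'd']

# A dash placed last is literal, so hyphenated words stay together.
words = re.findall(r'[\w-]+', 'well-known rule, self-evident')
print(words)       # ['well-known', 'rule', 'self-evident']
```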


### Group Extraction

"The **"group"** feature of a regular expression allows you to pick out parts of the matching text. The parentheses do not change what the pattern will match; instead, they establish logical "groups" inside the match text."

In [103]:
def MatchGroup(pattern, text):
  matches = re.search(pattern, text)
  if matches:
    print(matches.group())
    print(matches.group(1))
    print(matches.group(2))
  else:
    print('not found')

In [104]:
MatchGroup(r'([\w.-]+)@([\w.-]+)', 'purple alice-b@google.com monkey dishwasher')

alice-b@google.com
alice-b
google.com
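Besides ```group(n)```, a match object can hand back all groups at once. This sketch uses the same pattern and text as the cell above.

```python
import re

m = re.search(r'([\w.-]+)@([\w.-]+)', 'purple alice-b@google.com monkey dishwasher')
# group() with no argument is the same as group(0): the whole match.
print(m.group(0) == m.group())  # True
# groups() returns all the parenthesized groups as one tuple.
print(m.groups())               # ('alice-b', 'google.com')
user, host = m.groups()
```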


### findall

"**findall()** finds *all* the matches and returns them as a list of strings, with each string representing one match."

In [105]:
def Find(pattern, text):
  matches = re.findall(pattern, text)
  for match in matches:
    print(match)

In [106]:
Find(r'[\w\.-]+@[\w\.-]+', 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher')

alice@google.com
bob@abc.com


### findall With Files

In [107]:
with open('google-python-exercises/NOTICE.txt', 'r') as f:  # closes the file when done
  strings = re.findall(r'\w+', f.read())
strings

['Code',
 'for',
 'Google',
 's',
 'Python',
 'Class',
 'Copyright',
 '2010',
 'Google',
 'Inc',
 'This',
 'code',
 'developed',
 'by',
 'Nick',
 'Parlante',
 'at',
 'Google',
 'Inc']

### findall and Groups

"If the pattern includes 2 or more parenthesized groups, then instead of returning a list of strings, findall() returns a list of ***tuples***. If the pattern includes no parentheses, then findall() returns a list of found strings. If the pattern includes a single set of parentheses, then findall() returns a list of strings corresponding to that single group."

In [108]:
def TupleFind(pattern, text):
  tuples = re.findall(pattern, text)
  print(tuples)
  for user, host in tuples:  # each item is a (user, host) tuple
    print(user)
    print(host)

In [109]:
TupleFind(r'([\w\.-]+)@([\w\.-]+)', 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher')

[('alice', 'google.com'), ('bob', 'abc.com')]
alice
google.com
bob
abc.com
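The three cases from the paragraph above can be compared side by side. This sketch runs three patterns that differ only in their parentheses over the same text.

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
no_groups = re.findall(r'[\w.-]+@[\w.-]+', text)      # whole matches
one_group = re.findall(r'([\w.-]+)@[\w.-]+', text)    # just group 1
two_groups = re.findall(r'([\w.-]+)@([\w.-]+)', text) # tuples of groups
print(no_groups)   # ['alice@google.com', 'bob@abc.com']
print(one_group)   # ['alice', 'bob']
print(two_groups)  # [('alice', 'google.com'), ('bob', 'abc.com')]
```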


### Options

"The option flag is added as an extra argument to **search()** or **findall()**, e.g. ```re.search(pat, str, re.IGNORECASE)```.

- IGNORECASE -- ignore upper/lowercase differences for matching, so 'a' matches both 'a' and 'A'.

- DOTALL -- allow dot (.) to match newline -- normally it matches anything but newline. This can trip you up -- you think .* matches everything, but by default it does not go past the end of a line. Note that \s (whitespace) includes newlines, so if you want to match a run of whitespace that may include a newline, you can just use \s*

- MULTILINE -- within a string made of many lines, allow ^ and $ to match the start and end of each line. Normally ^/$ would just match the start and end of the whole string."
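The flags live in the ```re``` module as ```re.IGNORECASE```, ```re.DOTALL``` and ```re.MULTILINE```, and can be combined with ```|```. The sketch below shows each one on made-up strings.

```python
import re

# IGNORECASE: upper/lowercase differences are ignored.
ci = re.search(r'PYTHON', 'I like python', re.IGNORECASE).group()
print(ci)  # python

# DOTALL: without it '.' stops at a newline; with it '.' matches '\n' too.
no_dotall = re.search(r'a.b', 'a\nb')
with_dotall = re.search(r'a.b', 'a\nb', re.DOTALL)
print(no_dotall)                # None
print(with_dotall is not None)  # True

# MULTILINE: ^ matches at the start of every line, not just of the string.
plain = re.findall(r'^\w+', 'one\ntwo')
multi = re.findall(r'^\w+', 'one\ntwo', re.MULTILINE)
print(plain)  # ['one']
print(multi)  # ['one', 'two']
```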

"There is an extension to regular expression where you add a ? at the end, such as .*? or .+?, changing them to be non-greedy."
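The classic case is pulling tags out of HTML-like text. A greedy ```.*``` runs to the last ```>``` on the line, while ```.*?``` stops at the first one. A minimal sketch:

```python
import re

html = '<b>foo</b> and <i>so on</i>'
greedy = re.findall(r'<.*>', html)   # one match spanning the whole string
lazy = re.findall(r'<.*?>', html)    # one match per tag
print(greedy)  # ['<b>foo</b> and <i>so on</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```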

### Substitution (optional)

"The ```re.sub(pat, replacement, str)``` function searches for all the instances of pattern in the given string, and replaces them."

In [110]:
text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'
# \1 is group(1), \2 group(2) in the replacement
print(re.sub(r'([\w\.-]+)@([\w\.-]+)', r'\1@yo-yo-dyne.com', text))

purple alice@yo-yo-dyne.com, blah monkey bob@yo-yo-dyne.com blah dishwasher
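The replacement argument of ```re.sub()``` may also be a function. It is called with each match object and returns the text that replaces that match. In this sketch, ```shout_user``` is a hypothetical name.

```python
import re

text = 'purple alice@google.com, blah monkey bob@abc.com blah dishwasher'

def shout_user(match):
  # Called once per match; the return value replaces that match.
  return match.group(1).upper() + '@' + match.group(2)

result = re.sub(r'([\w.-]+)@([\w.-]+)', shout_user, text)
print(result)  # purple ALICE@google.com, blah monkey BOB@abc.com blah dishwasher
```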


## Exercise: babynames.py

In [111]:
filename = './google-python-exercises/babynames/baby1990.html'

In [112]:
import re
import sys

def extract_names(filename):
  """Returns [year, 'name rank', ...] for one baby*.html file."""
  names = []
  with open(filename, 'r') as f:
    text = f.read()  # read in the whole file as one string
  year_match = re.search(r'Popularity\sin\s(\d\d\d\d)', text)
  if not year_match:
    sys.stderr.write("Couldn't find the year!\n")
    sys.exit(1)
  year = year_match.group(1)
  names.append(year)

  # Each tuple is: (rank, boy-name, girl-name)
  tuples = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text)
  names_to_rank = {}
  for rank, boyname, girlname in tuples:  # unpack each tuple into 3 vars
    # Keep the first (best) rank if a name appears more than once.
    if boyname not in names_to_rank:
      names_to_rank[boyname] = rank
    if girlname not in names_to_rank:
      names_to_rank[girlname] = rank

  for name in sorted(names_to_rank):
    names.append(name + ' ' + names_to_rank[name])
  return names

In [113]:
extract_names(filename)[:10]

['1990',
 'Aaron 34',
 'Abbey 482',
 'Abbie 685',
 'Abby 222',
 'Abdul 934',
 'Abel 384',
 'Abigail 90',
 'Abraham 246',
 'Abram 920']
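It helps to test the two regular expressions on a few rows before running them on a whole file. This sketch runs them against the sample HTML quoted in the ```babynames.py``` docstring, so no file is needed.

```python
import re

# Sample rows copied from the HTML shown in the babynames.py docstring.
sample = '''<h3 align="center">Popularity in 1990</h3>
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
<tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>'''

year = re.search(r'Popularity\sin\s(\d\d\d\d)', sample).group(1)
rows = re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', sample)
print(year)  # 1990
print(rows)
# [('1', 'Michael', 'Jessica'), ('2', 'Christopher', 'Ashley'), ('3', 'Matthew', 'Brittany')]
```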

In [114]:
!python google-python-exercises/babynames/babynames.py \
  --summaryfile ./google-python-exercises/babynames/baby*.html

In [115]:
!ls google-python-exercises/babynames/

baby1990.html         baby1998.html         baby2006.html
baby1990.html.summary baby1998.html.summary baby2006.html.summary
baby1992.html         baby2000.html         baby2008.html
baby1992.html.summary baby2000.html.summary baby2008.html.summary
baby1994.html         baby2002.html         babynames.py
baby1994.html.summary baby2002.html.summary solution
baby1996.html         baby2004.html
baby1996.html.summary baby2004.html.summary


In [116]:
!grep 'Cornelius' google-python-exercises/babynames/*summary

google-python-exercises/babynames/baby1990.html.summary:Cornelius 487
google-python-exercises/babynames/baby1992.html.summary:Cornelius 542
google-python-exercises/babynames/baby1994.html.summary:Cornelius 629
google-python-exercises/babynames/baby1996.html.summary:Cornelius 675
google-python-exercises/babynames/baby1998.html.summary:Cornelius 702
google-python-exercises/babynames/baby2000.html.summary:Cornelius 791
google-python-exercises/babynames/baby2002.html.summary:Cornelius 862
google-python-exercises/babynames/baby2004.html.summary:Cornelius 967
google-python-exercises/babynames/baby2006.html.summary:Cornelius 939
google-python-exercises/babynames/baby2008.html.summary:Cornelius 988


In [117]:
!cat google-python-exercises/babynames/babynames.py

#!/usr/bin/python
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

import sys
import re

"""Baby Names exercise

Define the extract_names() function below and change main()
to call it.

For writing regex, it's nice to include a copy of the target
text for inspiration.

Here's what the html looks like in the baby.html files:
...
<h3 align="center">Popularity in 1990</h3>
....
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
<tr align="right"><td>2</td><td>Christopher</td><td>Ashley</td>
<tr align="right"><td>3</td><td>Matthew</td><td>Brittany</td>
...

Suggested milestones for incremental development:
 -Extract the year and print it
 -Extract the names and rank numbers and just print them
 -Get the names data into a dict and print it
 -Build the [year, 'name rank', ... ] list and print 

# Utilities

### File System -- os, os.path, shutil

"The **os** and **os.path** modules include many functions to interact with the file system. The **shutil** module can copy files.

- os module docs (https://docs.python.org/3/library/os.html)

- filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.

- os.path.join(dir, filename) -- given a filename from the above list, use this to put the dir and filename together to make a path

- os.path.abspath(path) -- given a path, return an absolute form, e.g. /home/nick/foo/bar.html

- os.path.dirname(path), os.path.basename(path) -- given dir/foo/bar.html, return the dirname "dir/foo" and basename "bar.html"

- os.path.exists(path) -- true if it exists

- os.mkdir(dir_path) -- makes one dir, os.makedirs(dir_path) makes all the needed dirs in this path

- shutil.copy(source-path, dest-path) -- copy a file (dest path directories should exist)"
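The sketch below ties several of these calls together inside a temporary directory, so it touches nothing else on disk. The names ```a/b```, ```hello.txt``` and ```copy.txt``` are arbitrary.

```python
import os
import shutil
import tempfile

# Everything happens inside a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as base:
  nested = os.path.join(base, 'a', 'b')
  os.makedirs(nested)                   # creates 'a' and 'a/b' in one call
  src = os.path.join(nested, 'hello.txt')
  with open(src, 'w') as f:
    f.write('hi\n')

  same_dir = os.path.dirname(src) == nested
  name = os.path.basename(src)          # 'hello.txt'
  dest = os.path.join(base, 'copy.txt')
  shutil.copy(src, dest)                # dest's directory already exists
  copied = os.path.exists(dest)
  listing = sorted(os.listdir(base))    # names only, no '.' or '..'

print(same_dir, name, copied, listing)  # True hello.txt True ['a', 'copy.txt']
```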


In [118]:
import os

def ListDir(dir):
  filenames = os.listdir(dir)
  for filename in filenames:
    print(filename)
    print(os.path.join(dir, filename))
    print(os.path.abspath(os.path.join(dir, filename)))
    print('\n')

In [119]:
ListDir('./google-python-exercises/copyspecial/')

tmp2
./google-python-exercises/copyspecial/tmp2
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/tmp2


zz__something__.jpg
./google-python-exercises/copyspecial/zz__something__.jpg
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/zz__something__.jpg


solution
./google-python-exercises/copyspecial/solution
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/solution


copyspecial.py
./google-python-exercises/copyspecial/copyspecial.py
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/copyspecial.py


.ipynb_checkpoints
./google-python-exercises/copyspecial/.ipynb_checkpoints
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/.ipynb_checkpoints


xyz__hello__.txt
./google-python-exercises/copyspecial/xyz__hello__.txt
/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/co

In [120]:
# dir(os)
# help(os.listdir)
# dir(os.path)
# help(os.path.dirname)

### Running External Processes -- subprocess

"The **subprocess** module is a simple way to run an external command and capture its output.

- subprocess module docs (https://docs.python.org/3/library/subprocess.html)

- (status, output) = subprocess.getstatusoutput(cmd) -- runs the command, waits for it to exit, and returns its status int and output text as a tuple. The command is run with its standard output and standard error combined into the one output text. The status will be non-zero if the command failed. Since the standard error of the command is captured, if it fails, we need to print some indication of what happened.

- output = subprocess.getoutput(cmd) -- as above, but without the status int.

- Python 2's old commands module also had a commands.getstatus() that did something else entirely. That module no longer exists in Python 3, so stick to the subprocess functions above.

- There is also a simple os.system(cmd) which runs the command and dumps its output onto your output and returns its error code. This works if you want to run the command but do not need to capture its output into your python data structures."
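The Python documentation recommends ```subprocess.run()``` for most new code. It returns a ```CompletedProcess``` that keeps the exit status and the captured output separate. The minimal sketch below runs the current Python interpreter (```sys.executable```) as a portable stand-in for a shell command such as ```ls -l```.

```python
import subprocess
import sys

# A list of arguments runs the program directly (no shell), so arguments
# containing spaces need no quoting. capture_output/text need Python 3.7+.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello")'],
    capture_output=True, text=True)

print(result.returncode)      # 0
print(result.stdout.strip())  # hello
```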

In [121]:
import subprocess

def ExecDir(dir):
  cmd = 'ls -l ' + dir
  print("Command to run:", cmd)
  (status, output) = subprocess.getstatusoutput(cmd)
  if status:
    sys.stderr.write(output)
    sys.exit(status)
  print(output)
  print(status)

In [122]:
ExecDir('./google-python-exercises/copyspecial/')

Command to run: ls -l ./google-python-exercises/copyspecial/
total 160
-rwxrwxrwx@ 1 grp  staff   1942 Jul 26 12:38 copyspecial.py
drwxrwxrwx@ 3 grp  staff     96 Dec 21  2009 solution
drwxr-xr-x  7 grp  staff    224 Jul 26 12:44 tmp
drwxr-xr-x  4 grp  staff    128 Jul 26 12:52 tmp2
-rw-rw-rw-@ 1 grp  staff     65 Mar  8  2010 xyz__hello__.txt
-rw-r--r--@ 1 grp  staff  72314 Mar 11  2010 zz__something__.jpg
0


### Exceptions

"An exception (https://docs.python.org/3/tutorial/errors.html & https://docs.python.org/3/library/exceptions.html) represents a run-time error that halts the normal execution at a particular line and transfers control to error handling code. The **"try/except"** structure can handle exceptions."

In [123]:
import sys

filename = 'test.txt'

try:
  f = open(filename, 'r')
  text = f.read()
  f.close()
except IOError:
  sys.stderr.write('problem reading:' + filename)

problem reading:test.txt

"The **try**: section includes the code which might throw an exception. The **except**: section holds the code to run if there is an exception. If there is no exception, the **except**: section is skipped (that is, that code is for error handling only, not the "normal" case for the code)."
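Binding the exception with ```as e``` gives access to details such as ```e.strerror```. Combining ```try``` with a ```with``` block also closes the file on both the success and error paths. In this sketch, ```read_text``` is a hypothetical name.

```python
import os
import tempfile

def read_text(filename):
  """Hypothetical helper: returns the file's text, or None if it can't be read."""
  try:
    with open(filename, 'r') as f:  # 'with' closes the file even if read() fails
      return f.read()
  except OSError as e:              # in Python 3, IOError is an alias of OSError
    print('problem reading:', filename, '-', e.strerror)
    return None

# A path inside a brand-new empty temp directory, so the file cannot exist yet.
missing = os.path.join(tempfile.mkdtemp(), 'test.txt')
text = read_text(missing)
print(text)  # None
```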

### HTTP -- urllib.request and urllib.parse

"The module **urllib.request** provides url fetching -- making a url look like a file you can read from. The **urllib.parse** module can take apart and put together urls.

- urllib module docs (https://docs.python.org/3/library/urllib.html)

- ufile = urllib.request.urlopen(url) -- returns a file-like object for that url

- text = ufile.read() -- can read from it, like a file (readlines() etc. also work); in Python 3 this returns bytes, not str

- info = ufile.info() -- the meta info for that request. info.get_content_type() is the mime type, e.g. 'text/html'

- baseurl = ufile.geturl() -- gets the "base" url for the request, which may be different from the original because of redirects

- urllib.request.urlretrieve(url, filename) -- downloads the url data to the given file path

- urllib.parse.urljoin(baseurl, url) -- given a url that may or may not be full, and the baseurl of the page it comes from, return a full url. Use geturl() above to provide the base url."

In [124]:
import urllib.request  # 'import urllib' alone does not reliably load urllib.request

def wget(url):
  ufile = urllib.request.urlopen(url)
  info = ufile.info() 
  if info.get_content_type() == 'text/html':
    print('base url: ' + ufile.geturl())
    text = ufile.read()
    #print(text)

In [125]:
wget('https://www.google.com/')

base url: https://www.google.com/


In [126]:
def wget2(url):
  try:
    ufile = urllib.request.urlopen(url)
    if ufile.info().get_content_type() == 'text/html':
      print(ufile.read())
  except IOError:
    print('problem reading url:', url)

In [127]:
wget2('https://www.gooogle.com/')

problem reading url: https://www.gooogle.com/


## Exercise: copyspecial.py

In [128]:
dirname = './google-python-exercises/copyspecial/'
to_dir = './google-python-exercises/copyspecial/tmp/'
zipfile = './google-python-exercises/copyspecial/tmp/files.zip'

In [129]:
import os
import re

def get_special_paths(dirname):
  result = []
  paths = os.listdir(dirname)
  for fname in paths:
    match = re.search(r'__(\w+)__', fname)
    if match:
      result.append(os.path.abspath(os.path.join(dirname, fname)))
  return result

In [147]:
paths = get_special_paths(dirname)
print(paths)

['/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/zz__something__.jpg', '/Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/xyz__hello__.txt']


In [148]:
import shutil

def copy_to(paths, to_dir):
  if not os.path.exists(to_dir):
    os.mkdir(to_dir)
  for path in paths:
    fname = os.path.basename(path)
    shutil.copy(path, os.path.join(to_dir, fname))

In [149]:
copy_to(paths, to_dir)

In [150]:
!ls ./google-python-exercises/copyspecial/tmp/

files.zip           xyz__hello__.txt    zz__something__.jpg


In [156]:
import subprocess

def zip_to(paths, zipfile):
  cmd = 'zip -j ' + zipfile + ' ' + ' '.join(paths)
  print("Command I'm going to do: " + cmd)
  (status, output) = subprocess.getstatusoutput(cmd)
  if status:
    sys.stderr.write(output)
    sys.exit(1)

In [157]:
zip_to(paths, zipfile)

Command I'm going to do: zip -j ./google-python-exercises/copyspecial/tmp/files.zip /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/zz__something__.jpg /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/xyz__hello__.txt


In [158]:
!ls ./google-python-exercises/copyspecial/tmp/

files.zip           xyz__hello__.txt    zz__something__.jpg
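Building a shell command string breaks when a path contains spaces, and it depends on the external ```zip``` tool being installed. The standard library's ```zipfile``` module can do the same job. This sketch swaps it in for the ```zip -j``` call. ```zip_to_py``` is a hypothetical name, and ```arcname=basename``` mimics ```-j``` by dropping directory names.

```python
import os
import tempfile
from zipfile import ZipFile

def zip_to_py(paths, zip_path):
  """Hypothetical pure-Python stand-in for zip_to(): stores each file
  under its base name only, like 'zip -j'."""
  with ZipFile(zip_path, 'w') as zf:
    for path in paths:
      zf.write(path, arcname=os.path.basename(path))

with tempfile.TemporaryDirectory() as tmp:
  src = os.path.join(tmp, 'xyz__hello__.txt')
  with open(src, 'w') as f:
    f.write('hello\n')
  out = os.path.join(tmp, 'files.zip')
  zip_to_py([src], out)
  with ZipFile(out) as zf:
    names = zf.namelist()

print(names)  # ['xyz__hello__.txt']
```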


In [159]:
!python google-python-exercises/copyspecial/copyspecial.py \
  --tozip ./google-python-exercises/copyspecial/tmp/files2.zip \
  ./google-python-exercises/copyspecial/tmp/

Command I'm going to do:zip -j ./google-python-exercises/copyspecial/tmp/files2.zip /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/tmp/zz__something__.jpg /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/tmp/xyz__hello__.txt


In [160]:
!ls ./google-python-exercises/copyspecial/tmp/

files.zip           files2.zip          xyz__hello__.txt    zz__something__.jpg


In [161]:
!python google-python-exercises/copyspecial/copyspecial.py \
  --todir ./google-python-exercises/copyspecial/tmp2/ \
  ./google-python-exercises/copyspecial/

In [162]:
!ls ./google-python-exercises/copyspecial/tmp2/

xyz__hello__.txt    zz__something__.jpg


In [163]:
!python google-python-exercises/copyspecial/copyspecial.py \
  --tozip /test/path/tmp.zip \
  ./google-python-exercises/copyspecial/

Command I'm going to do:zip -j /test/path/tmp.zip /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/zz__something__.jpg /Users/grp/Documents/BIGDATA/GCP/google-python-class/google-python-exercises/copyspecial/xyz__hello__.txt
zip I/O error: No such file or directory
zip error: Could not create output file (/test/path/tmp.zip)

In [164]:
!cat google-python-exercises/copyspecial/copyspecial.py

#!/usr/bin/python
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

import sys
import re
import os
import shutil
import subprocess

"""Copy Special exercise
"""

def get_special_paths(dirname):
  result = []
  paths = os.listdir(dirname)
  for fname in paths:
    match = re.search(r'__(\w+)__', fname)
    if match:
      result.append(os.path.abspath(os.path.join(dirname, fname)))
  return result

def copy_to(paths, to_dir):
  if not os.path.exists(to_dir):
    os.mkdir(to_dir)
  for path in paths:
    fname = os.path.basename(path)
    shutil.copy(path, os.path.join(to_dir, fname))

def zip_to(paths, zipfile):
  cmd = 'zip -j ' + zipfile + ' ' + ' '.join(paths)
  print("Command I'm going to do:" + cmd)
  (status, output) = subprocess.getstatusoutput(cmd)
  if status:
    sys.stderr.write(out

## Exercise: logpuzzle.py

In [165]:
filename = './google-python-exercises/logpuzzle/animal_code.google.com'
dest_dir = './google-python-exercises/logpuzzle/output/'

In [166]:
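import os
import re

def url_sort_key(url):
  """Sort by the second word in '-word-word.ext' if present, else the whole url."""
  match = re.search(r'-(\w+)-(\w+)\.\w+', url)
  if match:
    return match.group(2)
  else:
    return url

def read_urls(filename):
  """Returns the sorted, de-duplicated puzzle urls found in the log file."""
  # The host name follows the first '_' in the file's base name,
  # e.g. animal_code.google.com -> code.google.com
  basename = os.path.basename(filename)
  host = basename[basename.index('_') + 1:]
  url_dict = {}  # used as a set, to drop duplicate urls
  with open(filename) as f:
    for line in f:
      match = re.search(r'"GET (\S+)', line)
      if match:
        path = match.group(1)
        if 'puzzle' in path:
          url_dict['http://' + host + path] = 1
  return sorted(url_dict.keys(), key=url_sort_key)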
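The two regular expressions can be sanity-checked without the real log file. In this sketch, the log line and image names are made up, loosely shaped like an Apache access-log entry, and ```url_sort_key``` is restated so the sketch runs on its own. The animal file's image names (such as a-baaa.jpg) contain only one dash, so for them the regex does not match and the whole url becomes the sort key.

```python
import re

def url_sort_key(url):
  # Same rule as url_sort_key() above: the second word of '-word-word.ext',
  # falling back to the whole url when there is no such pattern.
  match = re.search(r'-(\w+)-(\w+)\.\w+', url)
  return match.group(2) if match else url

# Made-up log line; only the '"GET <path>' part matters to the regex.
line = '10.0.0.1 - - [06/Aug/2007:00:14:08 -0700] "GET /images/puzzle/p-bbbb-aaaa.jpg HTTP/1.0" 200 46'
path = re.search(r'"GET (\S+)', line).group(1)
print(path)  # /images/puzzle/p-bbbb-aaaa.jpg

urls = ['/p-aaaa-cccc.jpg', '/p-zzzz-aaaa.jpg', '/p-bbbb-bbbb.jpg']
ordered = sorted(urls, key=url_sort_key)
print(ordered)  # ['/p-zzzz-aaaa.jpg', '/p-bbbb-bbbb.jpg', '/p-aaaa-cccc.jpg']
```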

In [167]:
img_urls = read_urls(filename)
print(img_urls)

['http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaa.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baab.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baac.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baad.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baae.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaf.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baag.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baah.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baai.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaj.jpg', 'http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baba.jpg', 'http://code.google.com/edu/languages/google-python-class/images

In [168]:
def download_images(img_urls, dest_dir):
  if not os.path.exists(dest_dir):
    os.makedirs(dest_dir)
  index = open(os.path.join(dest_dir, 'index.html'), 'w')
  index.write('<html><body>\n')
  i = 0
  for img_url in img_urls:
    local_name = 'img%d' % i
    print('Retrieving...', img_url)
    urllib.request.urlretrieve(img_url, os.path.join(dest_dir, local_name))
    index.write('<img src="%s">' % (local_name,))
    i += 1
  index.write('\n</body></html>\n')
  index.close()

In [169]:
download_images(img_urls, dest_dir)

Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaa.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baab.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baac.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baad.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baae.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaf.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baag.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baah.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baai.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaj.jpg
Retrieving... http://code.google.com/edu

In [170]:
!ls ./google-python-exercises/logpuzzle/output/

img0       img11      img14      img17      img2       img5       img8
img1       img12      img15      img18      img3       img6       img9
img10      img13      img16      img19      img4       img7       index.html
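Note that `ls` sorts the names as strings, so `img10` is listed before `img2`; the puzzle order is carried by `index.html`, not by the file names. If you need the files in numeric order from Python, a small sort key works. This is a sketch that assumes names are a text prefix followed by digits; anything else (such as `index.html`) is sorted after them:

```python
import re

def numeric_key(name):
  """Sort key that puts 'img2' before 'img10' by comparing the trailing number.

  Assumption: image names end in digits; names without trailing digits
  sort after all numbered names.
  """
  match = re.search(r'(\d+)$', name)
  if match:
    return (0, int(match.group(1)), name)
  return (1, 0, name)

names = ['img0', 'img1', 'img10', 'img2', 'index.html']
print(sorted(names))                   # → ['img0', 'img1', 'img10', 'img2', 'index.html']
print(sorted(names, key=numeric_key))  # → ['img0', 'img1', 'img2', 'img10', 'index.html']
```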


In [171]:
!python google-python-exercises/logpuzzle/logpuzzle.py \
  --todir ./google-python-exercises/logpuzzle/animaldir/ \
  ./google-python-exercises/logpuzzle/animal_code.google.com
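The command above passes an optional `--todir` directory followed by the log file path. The script's own `main()` is not shown in this section, so the following is only a hypothetical `argparse`-based sketch of the same command-line interface, not the code in `logpuzzle.py`:

```python
import argparse

def parse_args(argv=None):
  """Parse '[--todir DIR] LOGFILE' (hypothetical sketch of the logpuzzle CLI)."""
  parser = argparse.ArgumentParser(
      description='Find puzzle URLs in an Apache log and download the images.')
  # Optional: when omitted, todir is None and a caller could, for example,
  # print the URLs instead of downloading them (assumption).
  parser.add_argument('--todir', default=None,
                      help='directory to download the images into')
  parser.add_argument('logfile', help='log file named like animal_code.google.com')
  return parser.parse_args(argv)

args = parse_args(['--todir', './animaldir/', './animal_code.google.com'])
print(args.todir, args.logfile)  # → ./animaldir/ ./animal_code.google.com
```

Compared with slicing `sys.argv` by hand, `argparse` also generates `--help` output and reports a missing log file argument for you.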

Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaa.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baab.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baac.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baad.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baae.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaf.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baag.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baah.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baai.jpg
Retrieving... http://code.google.com/edu/languages/google-python-class/images/puzzle/a-baaj.jpg
Retrieving... http://code.google.com/edu

In [172]:
!ls ./google-python-exercises/logpuzzle/animaldir/

img0       img11      img14      img17      img2       img5       img8
img1       img12      img15      img18      img3       img6       img9
img10      img13      img16      img19      img4       img7       index.html
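The files in `animaldir` come from puzzle URLs that `logpuzzle.py` pulls out of the Apache log. As a minimal sketch of that step, using the sample log line from the script's docstring: take the path after `GET`, keep it only if it contains `puzzle`, and prefix the host taken from the log file name after its underscore. The helper name `extract_puzzle_url`, the `http://` scheme, and the "contains puzzle" rule are assumptions for illustration:

```python
import os
import re

SAMPLE_LINE = ('10.254.254.28 - - [06/Aug/2007:00:13:48 -0700] '
               '"GET /~foo/puzzle-bar-aaab.jpg HTTP/1.0" 302 528 "-" '
               '"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) '
               'Gecko/20070725 Firefox/2.0.0.6"')

def extract_puzzle_url(line, log_filename):
  """Return the full puzzle URL requested on one log line, or None.

  Hypothetical helper: assumes only paths containing 'puzzle' count and that
  the log file's base name has the form <puzzle>_<host>.
  """
  # \S+ stops at the space before 'HTTP/1.0', leaving just the path.
  match = re.search(r'"GET (\S+)', line)
  if not match or 'puzzle' not in match.group(1):
    return None
  # basename() avoids picking up an underscore from a directory name.
  host = os.path.basename(log_filename).split('_', 1)[1]
  return 'http://' + host + match.group(1)

print(extract_puzzle_url(SAMPLE_LINE, './logpuzzle/animal_code.google.com'))
# → http://code.google.com/~foo/puzzle-bar-aaab.jpg
```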


In [174]:
!cat google-python-exercises/logpuzzle/logpuzzle.py

#!/usr/bin/python
# Copyright 2010 Google Inc.
# Licensed under the Apache License, Version 2.0
# http://www.apache.org/licenses/LICENSE-2.0

# Google's Python Class
# http://code.google.com/edu/languages/google-python-class/

"""Logpuzzle exercise
Given an apache logfile, find the puzzle urls and download the images.

Here's what a puzzle url looks like:
10.254.254.28 - - [06/Aug/2007:00:13:48 -0700] "GET /~foo/puzzle-bar-aaab.jpg HTTP/1.0" 302 528 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6"
"""

import os
import re
import sys
import urllib.request


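def url_sort_key(url):
  # URLs ending in -word1-word2.jpg (the place puzzle) sort by word2;
  # any other URL (e.g. the animal puzzle's a-baaa.jpg) sorts by the full URL.
  match = re.search(r'-(\w+)-(\w+)\.\w+', url)
  if match:
    return match.group(2)
  return url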

def read_urls(filename):
  underbar = filename.index('_')
  host = filename[underbar + 1:]
  url_dict = {}
  f = open(filename)
  for line in f:
    match = re.search(r'"GET (\S+)', line)
    if match:
      path = match.