# Introduction to Python

In [1]:
%matplotlib inline 
#The "%" marks an IPython Notebook magic command; it is not part of the Python language
#This one tells the plotting library to draw figures inside the notebook

import numpy as np # imports a fast numerical programming library
import scipy as sp #imports stats function, amongst other things
import matplotlib.pyplot as plt #Sets up plotting under plt 
import pandas as pd #handles data as dataframes
import seaborn as sns #sets up styles and gives us more plotting options

## Python and Iteration (Files)

Introducing the notion of a comprehension -- _a way of constructing a list_. 

In [2]:
alist=[1,2,3,4,5]
a_squared = [i*i for i in alist]
a_squared

[1, 4, 9, 16, 25]

Python has some useful built-in functions, such as `enumerate` and `zip`. 

**Enumerate** yields tuples of the form `(index,value)`, while **zip** takes one element from each list in turn and yields them together as a tuple. In Python 3, both return lazy iterators rather than lists of tuples. 


In [15]:
enumerate(a_squared)

<enumerate at 0x2146a0dbfc0>

In [16]:
zip(alist,a_squared)

<zip at 0x2146a0e1c48>

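Neither call shows its results directly. This is by design: both return lazy iterators, so we have to loop over them (or build a list from them) to see the values.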

In [17]:
for k in enumerate(a_squared):
    print(k)

(0, 1)
(1, 4)
(2, 9)
(3, 16)
(4, 25)


In [18]:
[k for k in enumerate(a_squared)]

[(0, 1), (1, 4), (2, 9), (3, 16), (4, 25)]
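
As a shorter alternative to the loop and the comprehension above, wrapping the iterator in `list()` also materializes its values. The sketch below redefines `alist` and `a_squared` so that it runs on its own, and the names `indexed`, `paired` and `pairs` are introduced only for illustration. It also shows that an iterator can be consumed only once.

```python
alist = [1, 2, 3, 4, 5]
a_squared = [i * i for i in alist]

# list() consumes the lazy iterator and stores every item it yields
indexed = list(enumerate(a_squared))
paired = list(zip(alist, a_squared))

print(indexed)
print(paired)

# An iterator can only be consumed once: a second pass yields nothing
pairs = zip(alist, a_squared)
first_pass = list(pairs)
second_pass = list(pairs)
print(second_pass)  # []
```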

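Opened files can be iterated over like lists too! Iterating over a file yields one line at a time. 

Here, I get each line in the file and find its length, using the _comprehension syntax_ to put these lengths into a big list. Each length includes the trailing newline character, which is why blank lines show up as 1. 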

In [19]:
length_of_lines = [len(line) for line in open("hamlet.txt")]
#Poor code since I don't close the file 
print(length_of_lines)

[8, 26, 1, 23, 1, 1, 1, 1, 21, 1, 27, 59, 28, 27, 26, 21, 21, 23, 24, 17, 23, 10, 20, 19, 21, 31, 9, 27, 30, 11, 21, 26, 1, 50, 31, 1, 66, 12, 1, 17, 1, 1, 1, 7, 1, 49, 1, 48, 1, 5, 13, 1, 6, 44, 1, 5, 20, 1, 6, 10, 1, 5, 4, 1, 6, 40, 1, 5, 52, 1, 6, 47, 24, 1, 5, 26, 1, 6, 22, 1, 5, 18, 38, 45, 1, 6, 47, 1, 31, 1, 5, 24, 1, 5, 26, 1, 6, 21, 1, 5, 29, 23, 1, 6, 23, 21, 1, 8, 1, 5, 17, 1, 5, 5, 24, 1, 5, 16, 1, 5, 44, 1, 5, 46, 1, 5, 21, 1, 5, 35, 41, 47, 37, 44, 36, 41, 1, 5, 31, 1, 5, 17, 40, 41, 30, 1, 5, 19, 40, 1, 5, 19, 50, 50, 42, 29, 1, 5, 50, 1, 22, 1, 5, 47, 1, 5, 42, 1, 5, 46, 1, 5, 48, 1, 5, 22, 1, 5, 22, 1, 5, 49, 41, 39, 53, 1, 5, 16, 1, 5, 21, 1, 5, 41, 1, 14, 1, 5, 32, 1, 5, 45, 41, 21, 1, 5, 40, 37, 18, 1, 5, 25, 1, 5, 24, 35, 39, 44, 41, 14, 1, 5, 47, 46, 1, 5, 47, 43, 47, 1, 5, 48, 46, 42, 42, 40, 49, 42, 45, 49, 29, 1, 5, 12, 46, 41, 43, 44, 52, 52, 52, 35, 49, 44, 38, 42, 34, 49, 38, 48, 35, 46, 40, 38, 47, 41, 38, 45, 44, 40, 49, 43, 1, 5, 36, 46, 48, 44, 1, 5, 40,

In [20]:
#Sum of line lengths
sum(length_of_lines)

173948

In [21]:
#Average line lengths
np.mean(length_of_lines)

25.69394387001477

I want to access Hamlet word by word and not line by line. 


In [22]:
hamlet_file = open("hamlet.txt")

In [23]:
hamlet_text = hamlet_file.read()

In [24]:
hamlet_file.close()

In [25]:
hamlet_tokens = hamlet_text.split()

In [26]:
#The number of words 
len(hamlet_tokens)

31659

Use the `with` syntax, which creates a context in which the file is closed automatically when the block ends. 

In [27]:
with open("hamlet.txt") as hamlet_file:
    hamlet_text = hamlet_file.read()
    hamlet_tokens = hamlet_text.split()
    print(len(hamlet_tokens))

31659


There are roughly 32,000 words in Hamlet. 
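
The earlier line-length comprehension left its file open. A minimal sketch of the same idea inside a `with` block follows. It writes a small temporary file as a hypothetical stand-in for `hamlet.txt`, so the path and its contents are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Hypothetical stand-in for hamlet.txt so the sketch runs anywhere
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as f:
    f.write("Who's there?\n\nNay, answer me.\n")

# The file is closed when the with-block ends, even if an error occurs inside it
with open(path) as text_file:
    length_of_lines = [len(line) for line in text_file]

print(length_of_lines)   # each length includes the newline character
print(text_file.closed)  # True once the block has finished
```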

### Indexing of Lists

In [29]:
print(hamlet_text[:1000]) #First 1000 characters from hamlet 

ï»¿XXXX
HAMLET, PRINCE OF DENMARK

by William Shakespeare




PERSONS REPRESENTED.

Claudius, King of Denmark.
Hamlet, Son to the former, and Nephew to the present King.
Polonius, Lord Chamberlain.
Horatio, Friend to Hamlet.
Laertes, Son to Polonius.
Voltimand, Courtier.
Cornelius, Courtier.
Rosencrantz, Courtier.
Guildenstern, Courtier.
Osric, Courtier.
A Gentleman, Courtier.
A Priest.
Marcellus, Officer.
Bernardo, Officer.
Francisco, a Soldier
Reynaldo, Servant to Polonius.
Players.
Two Clowns, Grave-diggers.
Fortinbras, Prince of Norway.
A Captain.
English Ambassadors.
Ghost of Hamlet's Father.

Gertrude, Queen of Denmark, and Mother of Hamlet.
Ophelia, Daughter to Polonius.

Lords, Ladies, Officers, Soldiers, Sailors, Messengers, and other
Attendants.

SCENE. Elsinore.



ACT I.

Scene I. Elsinore. A platform before the Castle.

[Francisco at his post. Enter to him Bernardo.]

Ber.
Who's there?

Fran.
Nay, answer me: stand, and unfold yourself.

Ber.
Long live the king!

Fran.
Bern

In [30]:
#Last 1000 characters from hamlet
print(hamlet_text[-1000:])

on by cunning and forc'd cause;
And, in this upshot, purposes mistook
Fall'n on the inventors' heads: all this can I
Truly deliver.

Fort.
Let us haste to hear it,
And call the noblest to the audience.
For me, with sorrow I embrace my fortune:
I have some rights of memory in this kingdom,
Which now, to claim my vantage doth invite me.

Hor.
Of that I shall have also cause to speak,
And from his mouth whose voice will draw on more:
But let this same be presently perform'd,
Even while men's minds are wild: lest more mischance
On plots and errors happen.

Fort.
Let four captains
Bear Hamlet like a soldier to the stage;
For he was likely, had he been put on,
To have prov'd most royally: and, for his passage,
The soldiers' music and the rites of war
Speak loudly for him.--
Take up the bodies.--Such a sight as this
Becomes the field, but here shows much amiss.
Go, bid the soldiers shoot.

[A dead march.]

[Exeunt, bearing off the dead bodies; after the which a peal of
ordnance is shot off.]


In [31]:
#Slicing tokens
print(hamlet_tokens[1:4])

['HAMLET,', 'PRINCE', 'OF']


In [33]:
print(hamlet_tokens[:4])

['ï»¿XXXX', 'HAMLET,', 'PRINCE', 'OF']


In [35]:
print(hamlet_tokens[0])

ï»¿XXXX
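
The three stray characters in front of `XXXX` look like a UTF-8 byte-order mark (the bytes `EF BB BF`) that was decoded with a single-byte Windows encoding. Assuming that is the cause, opening the file with `encoding="utf-8-sig"` should strip the mark. The sketch below demonstrates this on a small temporary file rather than on `hamlet.txt` itself; the file name, its contents and the choice of `cp1252` as the wrong encoding are assumptions made for illustration.

```python
import os
import tempfile

# Hypothetical stand-in for hamlet.txt: a small file that starts with a UTF-8 BOM
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"\xef\xbb\xbfXXXX\nHAMLET, PRINCE OF DENMARK\n")

# Decoding the BOM bytes as cp1252 leaves three stray characters in front
with open(path, encoding="cp1252") as f:
    garbled_tokens = f.read().split()

# "utf-8-sig" decodes as UTF-8 and drops a leading BOM if one is present
with open(path, encoding="utf-8-sig") as f:
    clean_tokens = f.read().split()

print(garbled_tokens[0])
print(clean_tokens[0])
```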


In [36]:
#Last word
print(hamlet_tokens[-1])

off.]


In [37]:
#Getting every second word from the 2nd through the 8th 
print(hamlet_tokens[1:8:2])

['HAMLET,', 'OF', 'by', 'Shakespeare']
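
The slices above all follow the pattern `[start:stop:step]`, where `start` is included, `stop` is excluded, and negative numbers count from the end. A small sketch on a hypothetical list named `words`, whose items spell out their own index, makes the selected positions easier to see.

```python
# Hypothetical list whose items spell out their own index
words = ['zero', 'one', 'two', 'three', 'four',
         'five', 'six', 'seven', 'eight', 'nine']

second_to_eighth = words[1:8:2]  # indices 1, 3, 5, 7
every_third = words[::3]         # indices 0, 3, 6, 9
last_three = words[-3:]          # indices 7, 8, 9
reversed_copy = words[::-1]      # a negative step walks backwards

print(second_to_eighth)
print(every_third)
print(last_three)
print(reversed_copy[:2])
```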


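**Range** gets the integers up to N. In Python 2, `range` built the full list while **XRange** produced the values lazily; in Python 3, `range` itself is lazy and `xrange` no longer exists. 
- A lazy range produces values on demand, since there is no point generating a million integers up front -- it can just add 1 to the previous one and save memory. 
- This trades a little computation for storage.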
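
In Python 3, which the `print(...)` calls in this notebook imply, `range` is already lazy. A rough way to see the storage trade-off is to compare the size of a `range` object with the size of the list built from it. The exact byte counts depend on the interpreter, so the sketch prints them without claiming specific values.

```python
import sys

lazy_numbers = range(1_000_000)     # stores only start, stop and step
eager_numbers = list(lazy_numbers)  # stores a million references

print(sys.getsizeof(lazy_numbers))
print(sys.getsizeof(eager_numbers))

# A range still supports len(), indexing and membership tests
# without ever building the full list
print(len(lazy_numbers), lazy_numbers[-1], 999_999 in lazy_numbers)
```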

## Dictionaries

In [61]:
my_dict = {'one':1,'two':2,'three':3}

In [62]:
for i in my_dict:
    print(i)

one
two
three


In [63]:
my_dict.items()
#Returns a view of (key, value) tuples

dict_items([('one', 1), ('two', 2), ('three', 3)])

In [64]:
my_dict.values()

dict_values([1, 2, 3])

In [65]:
my_dict.keys()

dict_keys(['one', 'two', 'three'])

In [66]:
print([i for i in my_dict],[(k,v) for k,v in my_dict.items()],my_dict.values())

['one', 'two', 'three'] [('one', 1), ('two', 2), ('three', 3)] dict_values([1, 2, 3])


Keys don't have to be strings. Dictionaries can also be built with dictionary comprehensions. 

In [67]:
for k in enumerate(a_squared):
    print(k)

(0, 1)
(1, 4)
(2, 9)
(3, 16)
(4, 25)


In [68]:
my_dict = {k:v for (k,v) in zip(alist,a_squared)}
my_dict

{1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
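
Two related idioms, sketched here with illustrative names (`squares` and `grid` are not from the notebook): `dict()` accepts any iterable of `(key, value)` pairs, so `zip` can feed it directly, and any hashable object, such as a tuple, can serve as a key.

```python
alist = [1, 2, 3, 4, 5]
a_squared = [i * i for i in alist]

# dict() accepts any iterable of (key, value) pairs, so zip can feed it directly
squares = dict(zip(alist, a_squared))

# Any hashable object can be a key, for example a tuple
grid = {(row, col): row * col for row in range(2) for col in range(3)}

print(squares[4])
print(grid[(1, 2)])
```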

### Strings 


In [81]:
last_word = hamlet_tokens[-1]
print(last_word)

off.]


In [82]:
last_word[-2]='k' #Can't change a part of the string 

TypeError: 'str' object does not support item assignment

In [83]:
last_word[-2]

'.'
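
Since strings cannot be modified in place, the usual workaround is to build a new string. A minimal sketch, with `last_word` set by hand to the token shown above and `patched` as an illustrative name:

```python
last_word = "off.]"  # the last token shown above

# Build a new string from slices instead of assigning to an index
patched = last_word[:-2] + "k" + last_word[-1:]
print(patched)

# str.replace also returns a new string and leaves the original untouched
replaced = last_word.replace(".", "k")
print(replaced)
print(last_word)
```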

## Functions
Functions can be methods on objects or stand alone by themselves. 

In Python, functions are 'first-class', i.e. _you can pass functions to other functions._

In [84]:
def square(x):
    return(x*x)

def cube(x):
    return x*x*x

square(5),cube(5)

(25, 125)

In [85]:
def sum_of_anything(x,y,f):
    print(x,y,f)
    return(f(x)+f(y))

sum_of_anything(3,4,square)


3 4 <function square at 0x000002146A0DCB70>


25
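
Because functions are ordinary objects, they can also be stored in containers or created inline with `lambda`. The sketch below reuses `square`, `cube` and `sum_of_anything` (without the `print` call) and adds a hypothetical `powers` dictionary for illustration.

```python
def square(x):
    return x * x

def cube(x):
    return x * x * x

def sum_of_anything(x, y, f):
    return f(x) + f(y)

# Functions can be stored in a dictionary and looked up by name
powers = {"square": square, "cube": cube}
results = {name: sum_of_anything(3, 4, func) for name, func in powers.items()}
print(results)

# A lambda is an anonymous function that can be passed the same way
plus_one_sum = sum_of_anything(3, 4, lambda x: x + 1)
print(plus_one_sum)
```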

Python functions can have **positional arguments and keyword arguments**. 

Extra positional arguments are collected in a tuple, whereas extra keyword arguments are collected in a dictionary. 

In [86]:
def f(a,b,*posargs,**dictargs):
    print("got",a,b,posargs,dictargs)
    return a

print(f(1,3))

got 1 3 () {}
1


In [88]:
print(f(1,3,4,2,d=1,c=2))

got 1 3 (4, 2) {'d': 1, 'c': 2}
1
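
The same `*` and `**` syntax works in the other direction at the call site, where it unpacks a sequence or a dictionary into separate arguments. In this sketch `f` returns its arguments instead of printing them, and `numbers` and `options` are illustrative names.

```python
def f(a, b, *posargs, **dictargs):
    return a, b, posargs, dictargs

numbers = [1, 3, 4, 2]
options = {"d": 1, "c": 2}

# * spreads the list over positional parameters, ** spreads the dict over keywords
unpacked = f(*numbers, **options)
print(unpacked)
```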


In [90]:
def do_it(x):
    if x==1:
        print("one")
    elif x==2:
        print('two')
    else:
        print(str(x))
        
do_it(2)

two


Can use `break` to break out of a loop based on a condition. The loop below is a for loop. 

In [93]:
for i in range(10):
    if (i>5):
        break
    print(i)

0
1
2
3
4
5


`continue` jumps to the next iteration of the loop, **skipping the rest of the loop body,** while `break` exits the loop entirely. 

In [94]:
i=0
while i<10:
    print(i)
    i+=1
    if i<5:
        continue
    else:
        break

0
1
2
3
4
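
In the loop above nothing follows the `continue`, so the skipping is hard to see. A small sketch where `continue` actually bypasses code further down the loop body (the list name `odd_numbers` is illustrative):

```python
odd_numbers = []
for i in range(10):
    if i % 2 == 0:
        continue  # skip the lines below and move on to the next i
    if i > 7:
        break     # leave the loop entirely
    odd_numbers.append(i)

print(odd_numbers)
```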


## Exceptions 
A `try`/`except` block is the way to catch errors.

In [95]:
try:
    f(1) #This function takes 2 arguments
except:
    import sys
    print(sys.exc_info())

(<class 'TypeError'>, TypeError("f() missing 1 required positional argument: 'b'",), <traceback object at 0x000002146A0E1748>)
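
A bare `except:` catches every exception, including ones you did not anticipate. A common alternative, sketched here with a simplified copy of `f`, is to name the exception type and bind the exception object with `as`, which removes the need for `sys.exc_info()`. The `else` and `finally` clauses are optional, and the `message` variable is illustrative.

```python
def f(a, b, *posargs, **dictargs):
    return a

try:
    f(1)  # missing the required argument b
except TypeError as err:
    message = str(err)
    print("caught:", message)
else:
    print("no error")  # runs only if the try-block raised nothing
finally:
    print("done")      # runs in either case
```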


## Bringing it All Together
Converting all words to lower-case

In [96]:
hamlet_lowercase_tokens = [word.lower() for word in hamlet_tokens]
hamlet_lowercase_tokens.count('thou')

95

Finding the unique set of words using Python's `set` data structure, and counting how often each word occurred using the `count` method on lists.

In [98]:
unique_lowercase_tokens = set(hamlet_lowercase_tokens)
unique_lowercase_tokens

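tokendict={}
for unique_token in unique_lowercase_tokens:
    #Note: this counts in hamlet_tokens (original case), so words that only appear
    #capitalized get 0; counting in hamlet_lowercase_tokens would avoid that
    tokendict[unique_token] = hamlet_tokens.count(unique_token)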
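
The loop above calls `count` once per unique word, and each call rescans the whole token list. One possible alternative, not used in the original notebook, is `collections.Counter`, which tallies everything in a single pass. The sketch uses a short hypothetical token list in place of `hamlet_tokens` and lower-cases each word before counting, so capitalized variants are merged.

```python
from collections import Counter

# Hypothetical stand-in for hamlet_tokens
tokens = ["Thou", "art", "thou", "THOU", "art", "a", "scholar;"]

# Lower-case each word, then count everything in a single pass
token_counts = Counter(word.lower() for word in tokens)

print(token_counts["thou"])
print(token_counts.most_common(2))
```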

In [99]:
tokendict

{'drains': 1,
 'passage?': 1,
 'hell': 4,
 'well.--welcome,': 1,
 "cain's": 0,
 'sphere,': 1,
 'bands.': 1,
 'grunt': 1,
 'enough,': 1,
 'talk': 2,
 'presently': 2,
 'religious': 1,
 'confine': 1,
 'sting': 2,
 'no;': 1,
 'times': 7,
 'gifts,--': 1,
 'drowned': 2,
 'ban': 1,
 'next': 3,
 'cup.': 1,
 'stops.': 1,
 'returns,--puzzles': 1,
 'foolish': 3,
 'soldiers': 2,
 'flats': 1,
 'lo,': 2,
 'sugar': 1,
 'compost': 1,
 'horrible!': 3,
 'yaughan;': 0,
 'capt.': 0,
 'speak': 36,
 'perform:': 1,
 'cue': 1,
 'jawbone,': 1,
 'hang,': 1,
 'fail;': 1,
 'priest,': 1,
 'odds.': 2,
 'lids': 1,
 'ducats,': 1,
 'good:': 2,
 'remembrance;': 1,
 'image,': 2,
 'burden!': 1,
 'quarrelling,': 1,
 'unworthiest': 1,
 'treble': 2,
 'hearers?': 1,
 'near,': 1,
 'awhile': 3,
 'with;': 1,
 'no?': 1,
 'air': 4,
 'them?--to': 0,
 'crew': 1,
 'undertake': 2,
 'volley.': 1,
 'fare': 2,
 'lords.': 0,
 'mad?': 2,
 'shoulder': 2,
 'passion': 6,
 'pity': 2,
 'utter': 1,
 "lodg'd": 1,
 'inquiry': 1,
 'pace': 1,
 'you