### Working with files ###

In [1]:
%%file three.txt
one
two
three

Overwriting three.txt


In [2]:
fhandle = open("three.txt")

In [3]:
fhandle.read() # read the whole content in a single call

'one\ntwo\nthree'

In [4]:
fhandle.read() # a second read returns an empty string: the file position is already at the end

''

In [5]:
fhandle.close()

In [6]:
!python3 -c "import this" > data.txt

In [7]:
filehandle = open("data.txt")

In [8]:
filehandle.readline() # possible to read one line at a time

'The Zen of Python, by Tim Peters\n'

In [9]:
lines = filehandle.readlines() # read the remaining lines of the file into a list

In [10]:
lines

['\n',
 'Beautiful is better than ugly.\n',
 'Explicit is better than implicit.\n',
 'Simple is better than complex.\n',
 'Complex is better than complicated.\n',
 'Flat is better than nested.\n',
 'Sparse is better than dense.\n',
 'Readability counts.\n',
 "Special cases aren't special enough to break the rules.\n",
 'Although practicality beats purity.\n',
 'Errors should never pass silently.\n',
 'Unless explicitly silenced.\n',
 'In the face of ambiguity, refuse the temptation to guess.\n',
 'There should be one-- and preferably only one --obvious way to do it.\n',
 "Although that way may not be obvious at first unless you're Dutch.\n",
 'Now is better than never.\n',
 'Although never is often better than *right* now.\n',
 "If the implementation is hard to explain, it's a bad idea.\n",
 'If the implementation is easy to explain, it may be a good idea.\n',
 "Namespaces are one honking great idea -- let's do more of those!\n"]

Let's do some work with this data. How about computing the number of words on every line?

In [11]:
filehandle = open("data.txt")

In [12]:
for line in filehandle.readlines():
    print(len(line.strip().split()))

7
0
5
5
5
5
5
5
2
9
4
5
3
10
13
12
5
8
11
13
12


In [None]:
for line in open("data.txt"):
    print(line)
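
The loop above prints a blank line after every line, because each `line` still ends with `\n` and `print` adds another one. Here is a minimal sketch of the same kind of iteration that counts words per line and closes the file automatically. The helper name `words_per_line` and the temporary sample file are assumptions made for illustration.

```python
import os
import tempfile

def words_per_line(filename):
    """Return the number of words on each line of a text file."""
    with open(filename) as f:          # the file is closed when the block exits
        # a file object yields one line at a time, so readlines() is not needed
        return [len(line.split()) for line in f]

# build a small sample file so that the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo words\n\nthree more words\n")

counts = words_per_line(tmp.name)
os.remove(tmp.name)
print(counts)
```

Iterating over the file object reads one line at a time, which matters for files too large to hold in memory.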

### Do it yourself ###
- Write a program cat.py equivalent to `cat` command in unix.
```
python cat.py three.txt
one
two
three
```

- Write a program head.py equivalent to the unix command `head`. It should take the number of lines as its first command-line argument and the filename as its second.

```
python head.py 5 data.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
```

- Write a program wc.py which implements functions to compute line count, word count and character count.
```
python wc.py data.txt
20 144 856
```

In [1]:
!python cat.py three.txt

python: can't open file 'cat.py': [Errno 2] No such file or directory


In [14]:
%%file cat.py
"""
module cat is rough implementation of unix command cat.
"""
import sys

def cat(file):
    """
    prints contents of file to standard output
    """
    with open(file) as f:
        for line in f:
            print(line, end="")  # try this first without end=""

if __name__ == "__main__":
    cat(sys.argv[1])

Overwriting cat.py


In [15]:
%%file head.py
"""
module head is rough implementation of unix command head.
it prints initial part of file to standard output
"""
import sys

def head(file, n):
    """
    prints initial n lines of file to standard output
    """
    with open(file) as f:  # close the file once the lines are printed
        for line in f.readlines()[:n]:
            print(line, end="")

if __name__ == "__main__":
    head(sys.argv[2], int(sys.argv[1]))

Overwriting head.py


In [16]:
!python head.py 5 data.txt 

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.


In [17]:
%%file wc.py
"""
module wc implements unix equivalent of wc command
"""
import sys

def line_count(f):
    return len(open(f).readlines())

def word_count(f):
    return len(open(f).read().split())

def char_count(f):
    return len(open(f).read())

if __name__ == "__main__":
    file = sys.argv[1]
    print(line_count(file), word_count(file), char_count(file))

Overwriting wc.py


In [18]:
!python wc.py data.txt

21 144 857


### Do it yourself ###
- Using above wc module 
  1. Find file with maximum number of lines in current directory
  2. How about file with maximum number of words?

In [19]:
import os
files = [f for f in os.listdir(os.getcwd()) if os.path.isfile(f)]

In [20]:
import wc

In [21]:
max(files, key=wc.line_count)

'Foundation1.html'

In [22]:
max(files, key=wc.word_count)

'Foundation1.html'

### Writing files ###
To write to a file, open it in write mode (`"w"`).

In [23]:
f = open("primes.txt","w")
f.write("two\n")
f.write("three\n")
f.write("five\n")
f.close() # close() flushes any buffered contents to disk

Let's check the contents of our file with our own module `cat.py`

In [24]:
!python cat.py primes.txt 

two
three
five


When we open a file in `w` mode, its existing contents are overwritten. To append instead, open the file in `a` mode.

In [25]:
f = open("primes.txt", "a")
f.write("seven\n")
f.write("eleven\n")
f.close()

In [26]:
!python cat.py primes.txt

two
three
five
seven
eleven
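
The difference between the two modes can be sketched in a few lines. The temporary directory and the file name `modes.txt` are assumptions made so the example does not touch `primes.txt`.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as f:      # creates the file
    f.write("two\n")
with open(path, "w") as f:      # "w" again: the previous contents are discarded
    f.write("three\n")
with open(path, "a") as f:      # "a": new data goes after the existing contents
    f.write("five\n")

with open(path) as f:
    contents = f.read()
print(repr(contents))
```

Only `three` and `five` survive, because the second `"w"` open truncated the file before writing.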


Similarly we can read and write binary files with following modes.
1. `rb`  => read in binary mode
2. `wb` => write in binary mode
3. `ab` => append in binary mode

In [27]:
open("primes.txt",'r').read() # text mode

'two\nthree\nfive\nseven\neleven\n'

In [28]:
open("primes.txt", "rb").read() # binary mode

b'two\nthree\nfive\nseven\neleven\n'

In [29]:
f = open("binarydata.bin", "wb")
f.write(b'x025x082')
f.close()

In [30]:
open("binarydata.bin", "rb").read()

b'x025x082'

In [31]:
f = open("binarydata.bin", "ab")
f.write(b'Hello')
f.close()

In [32]:
open("binarydata.bin", "rb").read()

b'x025x082Hello'

### with statement ###
The `with` statement makes working with files easier: Python closes the file handle automatically when the `with` block exits.

In [33]:
with open("primes.txt", "a") as f:
    f.write("thirteen")

In [34]:
open("primes.txt").read() # the data is on disk because leaving the with block closed (and flushed) the file

'two\nthree\nfive\nseven\neleven\nthirteen'
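
To see the automatic close at work, one can inspect the `closed` attribute of the file object. This is a small sketch on a temporary file (the path is an assumption for illustration); it also shows that the file is closed when the block is left through an exception.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:
    f.write("thirteen")
    inside = f.closed       # False: the file is still open inside the block

after = f.closed            # True: closed as soon as the block exited

# the file is closed even if the block raises an exception
try:
    with open(path) as g:
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
closed_after_error = g.closed

print(inside, after, closed_after_error)
```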

In [35]:
with open("regional.txt","w", encoding="utf-8" ) as regional:
    regional.write("\u0c05\u0c06")

In [36]:
open("regional.txt", encoding="utf-8").read()

'అఆ'

In [37]:
open("regional.txt", "rb").read()

b'\xe0\xb0\x85\xe0\xb0\x86'
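
The text and binary views of `regional.txt` are related by encoding and decoding. A short sketch of that round trip, without going through a file:

```python
text = "\u0c05\u0c06"                 # the two Telugu letters written above

encoded = text.encode("utf-8")        # str -> bytes
decoded = encoded.decode("utf-8")     # bytes -> str

print(len(text), len(encoded))        # number of characters vs number of bytes
print(encoded == b"\xe0\xb0\x85\xe0\xb0\x86")
print(decoded == text)
```

Each of these letters takes three bytes in UTF-8, which is why the binary read returns six bytes for a two-character string.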

### Do it yourself ###
1. Write a function to write multiplication tables up to 11 in csv format to a file, as given below.
```
1,2,3,4,5,6,7,8,9,10,11
2,4,6,8,10,12,14,16,18,20,22
.
.
.
```
2. Write a python module tabulate.py which converts csv file data into a nicely formatted table.
```
python tabulate.py multi_tables.csv tabulated.txt
cat tabulated.txt
  1   2   3   4   5   6   7   8   9  10  11
  2   4   6   8  10  12  14  16  18  20  22
  3   6   9  12  15  18  21  24  27  30  33
  4   8  12  16  20  24  28  32  36  40  44
  5  10  15  20  25  30  35  40  45  50  55
  6  12  18  24  30  36  42  48  54  60  66
  7  14  21  28  35  42  49  56  63  70  77
  8  16  24  32  40  48  56  64  72  80  88
  9  18  27  36  45  54  63  72  81  90  99
 10  20  30  40  50  60  70  80  90 100 110
```


In [1]:

def write_tables(n, filename):
    def tables(n):
        return [[str(m*i) for m in range(1,n+1)] for i in range(1,11)]
    
    with open(filename, "w") as f:
        data = tables(n)
        for row in data:
            f.write(",".join(row))
            f.write("\n")
    
        

In [2]:
write_tables(11, "multi_tables.csv")

In [3]:
!cat multi_tables.csv

1,2,3,4,5,6,7,8,9,10,11
2,4,6,8,10,12,14,16,18,20,22
3,6,9,12,15,18,21,24,27,30,33
4,8,12,16,20,24,28,32,36,40,44
5,10,15,20,25,30,35,40,45,50,55
6,12,18,24,30,36,42,48,54,60,66
7,14,21,28,35,42,49,56,63,70,77
8,16,24,32,40,48,56,64,72,80,88
9,18,27,36,45,54,63,72,81,90,99
10,20,30,40,50,60,70,80,90,100,110


In [4]:
%%file tabulate.py
"""
# tabulate.py
converts a csv file into a nicely formatted table
"""
import sys

def parse_csv(filename):
    return [line.strip().split(",") for line in open(filename).readlines()]
    
def tabulate(csv, tabular):
    data = parse_csv(csv)
    maxwordlength = len(max([max(row, key=len) for row in data], key=len))
    with open(tabular, "w") as f:
        for row in data:
            f.write(" ".join([word.rjust(maxwordlength) for word in row]))
            f.write("\n")
    
    

if __name__ == "__main__":
    tabulate(sys.argv[1], sys.argv[2])

Overwriting tabulate.py


In [5]:
!python tabulate.py multi_tables.csv tabulated.txt

In [45]:
!cat tabulated.txt

  1   2   3   4   5   6   7   8   9  10  11
  2   4   6   8  10  12  14  16  18  20  22
  3   6   9  12  15  18  21  24  27  30  33
  4   8  12  16  20  24  28  32  36  40  44
  5  10  15  20  25  30  35  40  45  50  55
  6  12  18  24  30  36  42  48  54  60  66
  7  14  21  28  35  42  49  56  63  70  77
  8  16  24  32  40  48  56  64  72  80  88
  9  18  27  36  45  54  63  72  81  90  99
 10  20  30  40  50  60  70  80  90 100 110


### Writing to stderr, stdout ###
When a process starts, it opens three standard streams: stdin, stdout and stderr.
1. stdout -> standard output, usually directed to the terminal
2. stdin -> standard input, usually read from the keyboard
3. stderr -> standard error, used for error messages and diagnostics

Python provides handles to these streams via the `sys` module.

In [None]:
import sys
sys.stdout.write("Hello Python")

In [None]:
sys.stderr.write("Error: some ..Exception..")
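
`print` can also write to either stream through its `file` argument. The sketch below sends messages to stdout or stderr and captures both in memory to show where each one lands. The helper name `report` is an assumption for illustration.

```python
import contextlib
import io
import sys

def report(message, error=False):
    """Print message to stderr if error is true, else to stdout."""
    stream = sys.stderr if error else sys.stdout
    print(message, file=stream)

# capture both streams in memory to show where each message lands
out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report("Hello Python")
    report("Error: something failed", error=True)

captured_out = out.getvalue()
captured_err = err.getvalue()
print(repr(captured_out), repr(captured_err))
```

Keeping errors on stderr means that redirecting the normal output of a script (for example with `>`) does not hide its error messages.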

### Dictionaries ###
Dictionaries are used extensively inside the python language itself. Let's try to work with dictionaries.

In [50]:
author = {'name':"lewis carrol", 
          'books':["alice in wonderland", "looking through the glass"],
          'language':"english"}

In [51]:
author['name']

'lewis carrol'

In [53]:
author['books']

['alice in wonderland', 'looking through the glass']

In [54]:
print(author)

{'books': ['alice in wonderland', 'looking through the glass'], 'language': 'english', 'name': 'lewis carrol'}


In [55]:
author['name'] = "lewis"

In [56]:
print(author)

{'books': ['alice in wonderland', 'looking through the glass'], 'language': 'english', 'name': 'lewis'}


In [57]:
del author['books']

In [58]:
print(author)

{'language': 'english', 'name': 'lewis'}


In [59]:
"name" in author

True

In [60]:
"lewis" in author

False

In [61]:
author['language']

'english'

In [62]:
author.get("language")

'english'

In [64]:
'books' in author # what happens if we try to get an item that is not in the dictionary?

False

In [65]:
author['books']

KeyError: 'books'

In [66]:
author.get("books",[])

[]

Let's use dictionaries in a real-world setting. Here is what the default grub config file looks like on my debian.
```
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""
```
I want two different configs, one for ubuntu and one for debian, because some kernel modules are not compatible with my laptop hardware on debian.

In [76]:
grub_debian = {
    'GRUB_DEFAULT' :0,
    'GRUB_TIMEOUT' :5,
    'GRUB_DISTRIBUTOR' :"`lsb_release -i -s 2> /dev/null || echo Debian`",
    'GRUB_CMDLINE_LINUX_DEFAULT' :"quiet splash modprob.blacklist=dw_dmac, dw_dmax_core",
    'GRUB_CMDLINE_LINUX' : ""
}

In [77]:
grub_ubuntu ={
        'GRUB_DEFAULT' :0,
        'GRUB_HIDDEN_TIMEOUT_QUIET' :'true',
        'GRUB_TIMEOUT' :10,
        'GRUB_DISTRIBUTOR' :"`lsb_release -i -s 2> /dev/null || echo Debian`",
        'GRUB_CMDLINE_LINUX_DEFAULT' :"quiet splash reboot=bios",
        'GRUB_CMDLINE_LINUX' : ""
}

In [73]:
grub_config = {'debian':grub_debian,
               'ubuntu':grub_ubuntu
              }

In [74]:
grub_config['debian']

{'GRUB_CMDLINE_LINUX': '',
 'GRUB_CMDLINE_LINUX_DEFAULT': 'quiet splash modprob.blacklist=dw_dmac, dw_dmax_core',
 'GRUB_DEFAULT': 0,
 'GRUB_DISTRIBUTOR': '`lsb_release -i -s 2> /dev/null || echo Debian`',
 'GRUB_TIMEOUT': 5}

In [75]:
grub_config['debian']['GRUB_CMDLINE_LINUX_DEFAULT']

'quiet splash modprob.blacklist=dw_dmac, dw_dmax_core'
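
A dictionary like this can be turned back into the `KEY=value` text format shown earlier. The sketch below is one possible way to do it. The function name `render_config` and the rule "wrap every string value in double quotes" are assumptions for illustration; a real grub file would need more careful quoting (the backtick command in `GRUB_DISTRIBUTOR` is not quoted in the original file).

```python
def render_config(config):
    """Render a dict as KEY=value lines; string values are double-quoted."""
    lines = []
    for key, value in config.items():
        if isinstance(value, str):
            value = '"{}"'.format(value)     # assumed quoting rule
        lines.append("{}={}".format(key, value))
    return "\n".join(lines) + "\n"

sample = {"GRUB_DEFAULT": 0, "GRUB_TIMEOUT": 5, "GRUB_CMDLINE_LINUX": ""}
text = render_config(sample)
print(text, end="")
```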

### Iterating over dictionaries ###

In [2]:
d = {"one":1, "two":2 , "three":3}

Iterating over keys

In [3]:
for key in d.keys():
    print(key, d[key])

two 2
one 1
three 3


Iterating over values directly

In [4]:
for value in d.values():
    print(value)

2
1
3


Iterating over keys and values together

In [5]:
for k,v in d.items():
    print(k,v)

two 2
one 1
three 3


What if we iterate directly over the dictionary?

In [6]:
for item in d:
    print(item)

two
one
three


it goes over keys!
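
The keys in the outputs above appear in an arbitrary order, presumably because they were produced on an older Python version. Since Python 3.7, dictionaries preserve insertion order, so on a current interpreter the same loops visit `one`, `two`, `three` in that order. A short sketch:

```python
d = {"one": 1, "two": 2, "three": 3}

keys = list(d)                 # iterating a dict yields its keys
pairs = list(d.items())
by_name = sorted(d)            # sort explicitly when a specific order matters

print(keys)
print(pairs)
print(by_name)
```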

In [7]:
numbers = [("one", 1), ("two", 2) , ("three", 3)]

In [8]:
dict(numbers)

{'one': 1, 'three': 3, 'two': 2}

In [9]:
dict(zip(('a','b','c'), (1, 2, 3)))

{'a': 1, 'b': 2, 'c': 3}

In [10]:
items = ("Pen", "Pencil", "Colorbox")
prices = (25, 10, 50)
cart = dict(zip(items, prices))

In [11]:
for item, price in cart.items():
    print(item.rjust(8), price)
print("-"*12)
print("Total".rjust(8), sum(cart.values()))

Colorbox 50
     Pen 25
  Pencil 10
------------
   Total 85


### Do it yourself ###
- Write a function unzip which takes a dictionary and returns its keys and values as separate lists.

In [12]:
def unzip(d):
    keys = d.keys()
    values = [d[k] for k in keys]
    return list(keys), values

In [13]:
unzip(cart)

(['Colorbox', 'Pen', 'Pencil'], [50, 25, 10])

### More examples ###
Write a program to compute frequency of words in a file.

In [14]:
%%file words.txt
five
five four
five four three
five four three two
five four three two one
six seven eight nine
six seven eight
six seven
six

Overwriting words.txt


In [15]:
%%file wordfreq.py
"""
computes frequency of every word in a file
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    freq = {}
    for word in words:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq

if __name__ == "__main__":
    words = read_words(sys.argv[1])
    freq = wordfreq(words)
    print(freq)

Overwriting wordfreq.py


In [16]:
!python wordfreq.py words.txt

{'two': 2, 'three': 3, 'six': 4, 'seven': 3, 'five': 5, 'eight': 2, 'nine': 1, 'one': 1, 'four': 4}


Can we improve this?

In [17]:
%%file wordfreq.py
"""
computes frequency of every word in a file
usage:
python wordfreq.py filename
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    freq = {}
    for word in words:
        freq[word] = freq.get(word,0) + 1
    return freq

if __name__ == "__main__":
    words = read_words(sys.argv[1])
    freq = wordfreq(words)
    print(freq)

Overwriting wordfreq.py


In [18]:
!python wordfreq.py words.txt

{'three': 3, 'six': 4, 'eight': 2, 'two': 2, 'four': 4, 'nine': 1, 'one': 1, 'seven': 3, 'five': 5}


In [19]:
def wordfreq1(words):
    freq = {}
    uniqwords = set(words)
    for w in uniqwords:
        freq[w] = words.count(w)
    return freq

In [20]:
import wordfreq

In [21]:
words = wordfreq.read_words("words.txt")

In [22]:
freq = wordfreq1(words)

How about printing it nicely?

In [23]:
for w, f in freq.items():
    print(w.rjust(5),f)

 five 5
eight 2
  six 4
seven 3
three 3
 nine 1
  two 2
 four 4
  one 1


In sorted order? First alphabetically by word, then by frequency (ascending and descending):

In [24]:
for k,v in sorted(freq.items()):
    print(k.rjust(5), v)

eight 2
 five 5
 four 4
 nine 1
  one 1
seven 3
  six 4
three 3
  two 2


In [25]:
for k,v in sorted(freq.items(), key=lambda x:x[1]):
    print(k.rjust(5),v)

 nine 1
  one 1
eight 2
  two 2
seven 3
three 3
  six 4
 four 4
 five 5


In [26]:
for k,v in sorted(freq.items(), key=lambda x:x[1], reverse=True):
    print(k.rjust(5),v)

 five 5
  six 4
 four 4
seven 3
three 3
eight 2
  two 2
 nine 1
  one 1


Print a horizontal Text histogram of word frequency

In [27]:
for k,v in sorted(freq.items(), key=lambda x:x[1], reverse=True):
    print(k.rjust(5),v, "*"*v)

 five 5 *****
  six 4 ****
 four 4 ****
seven 3 ***
three 3 ***
eight 2 **
  two 2 **
 nine 1 *
  one 1 *
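
The standard library also ships a ready-made counter, `collections.Counter`, which does the counting and the sorting by frequency. A minimal sketch on a small in-memory word list (the sample words are an assumption; in the notebook they would come from `read_words`):

```python
from collections import Counter

words = "five five four five four three".split()
freq = Counter(words)                 # counts every word in a single pass

print(freq["five"], freq["four"], freq["three"])
print(freq["missing"])                # absent keys count as 0, no KeyError

# most_common returns (word, count) pairs, highest count first
for word, count in freq.most_common():
    print(word.rjust(5), count, "*" * count)
```

Unlike `wordfreq1`, which calls `words.count` once per unique word, `Counter` goes over the list only once.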


Grouping all keys with given values!

In [23]:
team = {"david":"USA", "anand":"india", "linus":"USA","nouful":"india","alice":"UK"}

In [24]:
[name for name in team.keys() if team[name]=="india"]

['nouful', 'anand']

In [25]:
[name for name in team.keys() if team[name]=="USA"]

['linus', 'david']
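
The list comprehensions above scan the dictionary once per country. To group all names in a single pass, one option is a dictionary of lists built with `setdefault`. The function name `group_by_value` is an assumption for illustration.

```python
def group_by_value(d):
    """Map each value of d to the list of keys that carry it."""
    groups = {}
    for key, value in d.items():
        # setdefault returns the existing list, or stores and returns a new one
        groups.setdefault(value, []).append(key)
    return groups

team = {"david": "USA", "anand": "india", "linus": "USA",
        "nouful": "india", "alice": "UK"}
groups = group_by_value(team)
print(groups)
```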

### Pitfalls ###

In [29]:
x = [1, 2, 3, 4]
y = x
y.append(5)
print(x)

[1, 2, 3, 4, 5]


In [30]:
x = [1, 2, 3, 4]
y = x
y = [1, 2, 3]
print(x)

[1, 2, 3, 4]


In [31]:
x = 1
y = x
y = 2
print(x)

1
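
What the three cells above show: `y = x` does not copy anything, it only gives the same object a second name. Changing the object through one name (`y.append(5)`) is visible through the other, while assigning a new object to `y` leaves `x` alone. When an independent list is needed, make a copy explicitly, as in this sketch:

```python
x = [1, 2, 3, 4]
alias = x              # a second name for the same list object
clone = list(x)        # a new list with the same elements (shallow copy)

alias.append(5)        # visible through x, because alias is x
clone.append(99)       # x is unaffected

print(x)
print(clone)
print(alias is x, clone is x)
```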


### Classes ###

In [49]:
class Complex:
    def __init__(self, r, i):
        self.real = r
        self.imaginary = i
    
    def get_real(self):
        return self.real
    
    def get_imaginary(self):
        return self.imaginary

In [50]:
p = Complex(10,5) # __init__ is a special method that gets called when the class is called like a function with arguments

In [51]:
print(p)

<__main__.Complex object at 0x7f57f4209b70>


In [52]:
type(p)

__main__.Complex

In [54]:
isinstance(p, Complex)

True

In [55]:
p.get_real()

10

In [58]:
class Complex:
    def __init__(self, r, i):
        self.real = r
        self.imaginary = i
    
    def get_real(self):
        return self.real
    
    def get_imaginary(self):
        return self.imaginary
    
    def display(self):
        print(self.real,"+",str(self.imaginary)+"j" )
        
    def add(self, c):
        r = self.real + c.get_real()
        i = self.imaginary + c.get_imaginary()
        return Complex(r, i)

In [59]:
p = Complex(10,5)

In [60]:
p1 = Complex(3,4)

In [61]:
p2 = p.add(p1)
p2.display()

13 + 9j


### Do it yourself ###
- add provision to double the given complex number
- add provision to multiply given complex number with another complex number.
```
c0 = Complex(1,2)
c0.double().display()
2 + 4j
c1 = Complex(2, 3)
c2 = Complex(4, 5)
c3 = c1.multiply(c2)
c3.display()
-7 + 22j
```
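
One possible solution sketch, shown on a trimmed-down copy of the class so that it runs on its own. It reads the attributes directly instead of going through `get_real` and `get_imaginary`, which is a simplification made here.

```python
class Complex:
    def __init__(self, r, i):
        self.real = r
        self.imaginary = i

    def display(self):
        print(self.real, "+", str(self.imaginary) + "j")

    def double(self):
        # doubling scales both parts by 2
        return Complex(2 * self.real, 2 * self.imaginary)

    def multiply(self, c):
        # (a + bj)(c + dj) = (ac - bd) + (ad + bc)j
        r = self.real * c.real - self.imaginary * c.imaginary
        i = self.real * c.imaginary + self.imaginary * c.real
        return Complex(r, i)

c0 = Complex(1, 2).double()
c3 = Complex(2, 3).multiply(Complex(4, 5))
c0.display()
c3.display()
```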

### Why Classes? ###
Let's look at a simple example and try to model a bank account. Initially, let it be a module with the required functions in it.

In [62]:
%%file bank0.py

balance = 0

def deposit(amount):
    global balance
    balance = balance + amount

def withdraw(amount):
    global balance
    balance = balance - amount

def get_balance():
    return balance

def main():
    deposit(100)
    withdraw(40)
    print(get_balance())
    
    deposit(20)
    print(get_balance())

if __name__ == "__main__":
    main()


Writing bank0.py


In [64]:
!python bank0.py

60
80


But what if we want to model multiple accounts? The bank0 module allows only one bank account, because the balance lives in a single global variable.

In [63]:
%%file bank1.py
"""Implementation of bank accounts with support for multiple accounts.
"""

def make_account():
    return {"balance": 0}

def deposit(account, amount):
    account["balance"] += amount

def withdraw(account, amount):
    account["balance"] -= amount
    
def get_balance(account):
    return account["balance"]

def main():
    a1 = make_account()
    a2 = make_account()
    
    deposit(a1, 100)
    deposit(a2, 50)
    print(get_balance(a1), get_balance(a2))
    
    withdraw(a1, 30)
    withdraw(a2, 20)
    print(get_balance(a1), get_balance(a2))

if __name__ == "__main__":
    main()

Writing bank1.py


In [65]:
!python bank1.py

100 50
70 30


- Functions **manipulate** data nicely, while classes **model** data nicely. A class is a good way of describing what something is, rather than how data is manipulated. The example above shows that you can model data even with functions; now let's try it using classes.

In [66]:
%%file bank2.py
"""Class-based implementation of bank account.
"""

class BankAccount:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        self.balance -= amount

    def get_balance(self):
        return self.balance

def main():
    a1 = BankAccount()
    a2 = BankAccount()
    
    a1.deposit(100)
    a2.deposit(50)
    print(a1.get_balance(), a2.get_balance())
    
    a1.withdraw(30)
    a2.withdraw(20)
    print(a1.get_balance(), a2.get_balance())

if __name__ == "__main__":
    main()

Writing bank2.py


In [67]:
!python bank2.py 

100 50
70 30


- If you have a number of related functions that you just want to bundle, a module will do the work.
- The purpose of a class is to bundle a data structure that represents some logical entity with the operations that work on it. **Classes are convenient namespaces.**

In [89]:
class Foo:
    pass

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
p = Point(3,4)
f = Foo()

In [90]:
p.__dict__

{'x': 3, 'y': 4}

In [91]:
f.__dict__

{}

In [82]:
p.z = 10 # nothing stops you from adding a new attribute to an instance

In [83]:
print(p.z)

10


In [84]:
p.__dict__

{'x': 3, 'y': 4, 'z': 10}

In [85]:
del p.z

In [86]:
p.__dict__

{'x': 3, 'y': 4}

- One of the big advantages of using OOP is **extensibility**. Suppose the `Point` class is used throughout a code base, in data analysis as well as in visualization, and the visualization module now needs points to have a color.

In [87]:
class ColoredPoint(Point):
    color = (0,0,0) #r,g,b
    
    def get_color(self):
        return self.color


In [88]:
p = ColoredPoint(10,5)
print(p.x)
print(p.get_color())

10
(0, 0, 0)
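
In the class above `color` is a class attribute, so every `ColoredPoint` starts out with the same black color. If each point should carry its own color, one option is to extend `__init__` and reuse the parent's initialisation through `super()`. This is a sketch of that variation, with the `color` keyword argument as an assumption:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

class ColoredPoint(Point):
    def __init__(self, x, y, color=(0, 0, 0)):
        super().__init__(x, y)      # reuse Point's initialisation
        self.color = color          # r, g, b stored per instance

    def get_color(self):
        return self.color

black = ColoredPoint(10, 5)
red = ColoredPoint(1, 2, color=(255, 0, 0))
print(black.get_color(), red.get_color(), isinstance(red, Point))
```

Code that only knows about `Point` keeps working, since every `ColoredPoint` is still a `Point`.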


### Do it yourself ###
- Write a class Timer to measure the time taken by a task. The class should have start and stop methods, and it should be able to report the time elapsed between them. __*Hint: use time.time()*__
```
t = Timer()
t.start()
do_some_stuff()
t.stop()
print("Time taken: ", t.get_time_taken())
```

In [92]:
import time
class Timer:
    starttime = 0
    endtime = 0
    
    def start(self):
        self.starttime = time.time()
        
    def stop(self):
        self.endtime = time.time()
    
    def get_time_taken(self):
        return self.endtime - self.starttime

In [95]:
t = Timer()
t.start()
s = 0
for i in range(1000):
    for j in range(10000):
        s += i*j
t.stop()
print("Time taken:",t.get_time_taken(),"sec" )

Time taken: 3.8748726844787598 sec
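
A possible variation, not required by the exercise: `time.perf_counter()` is a clock meant for measuring short durations, and adding `__enter__` and `__exit__` lets the timer be used in a `with` statement. A sketch under those assumptions:

```python
import time

class Timer:
    """Variant of the Timer above that also works in a `with` statement."""
    def __init__(self):
        self.starttime = None
        self.endtime = None

    def start(self):
        self.starttime = time.perf_counter()   # high-resolution clock

    def stop(self):
        self.endtime = time.perf_counter()

    def get_time_taken(self):
        return self.endtime - self.starttime

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stop()
        return False        # do not swallow exceptions raised in the block

with Timer() as t:
    total = sum(i * i for i in range(100000))
print("Time taken:", t.get_time_taken(), "sec")
```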


### Exceptions ###

In [96]:
x

NameError: name 'x' is not defined

In [97]:
int("Hello")

ValueError: invalid literal for int() with base 10: 'Hello'

In [98]:
"2" * "3"

TypeError: can't multiply sequence by non-int of type 'str'

How to handle exceptions?

In [102]:
#b = "hello"
b = "2"
c = "3"
try:
    #a = int(b) * c
    a = b*c
except TypeError as e:
    a = 1
    print("Handled TypeError",e)
except ValueError as e:
    b = 0
    print("Handled ValueError",e)

Handled TypeError can't multiply sequence by non-int of type 'str'


In [111]:
import sys
def get_value(data, key, default):
    try:
        return data[key]
    except KeyError as e:
        print("Value not found, returning default" ,e, file=sys.stderr)
        return default

In [112]:
data = {"name":"alice"}

In [113]:
get_value(data, "name", "Noname")

'alice'

In [114]:
get_value(data, "language", "Hindi")

Value not found, returning default 'language'


'Hindi'

Usually when we load numeric data from a csv or tsv file, we need to handle missing values. Depending on the file, a missing value can show up as an empty field, `N/A` or `Nan`. Let's write a csv parser that handles such missing values.

Let's start with a simple case first: reading numbers from a file that has one number per line.

In [121]:
%%file missing.txt
1
2
3
4
N/A
5
6
7
8
9
Nan
10

Overwriting missing.txt


In [122]:
def parsenumber(strnum):
    try:
        return int(strnum)
    except ValueError as e:
        print("Invalid integer ", repr(strnum), e,file=sys.stderr)
        return 0
    
def read_with_missing(filename):
    with open(filename) as f:
        return [parsenumber(line.strip()) for line in f.readlines()]

In [123]:
read_with_missing("missing.txt")

Invalid integer  'N/A' invalid literal for int() with base 10: 'N/A'
Invalid integer  'Nan' invalid literal for int() with base 10: 'Nan'


[1, 2, 3, 4, 0, 5, 6, 7, 8, 9, 0, 10]
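
The same idea extends to rows with several comma-separated values. Below is a minimal sketch using the standard `csv` module. The set of missing-value markers, the choice of `None` (rather than `0`) as the placeholder, and the function names are assumptions for illustration.

```python
import csv
import io

MISSING = {"", "N/A", "NaN", "Nan"}   # assumed markers for missing values

def parse_cell(cell):
    """Return an int, or None when the cell is a missing-value marker."""
    cell = cell.strip()
    if cell in MISSING:
        return None
    return int(cell)   # any other non-integer text still raises ValueError

def read_csv_with_missing(lines):
    """Parse an iterable of csv lines into rows of ints and None."""
    return [[parse_cell(cell) for cell in row] for row in csv.reader(lines)]

# an in-memory file keeps the example self-contained
sample = io.StringIO("1,2,3\n4,N/A,6\n7,,Nan\n")
rows = read_csv_with_missing(sample)
print(rows)
```

Using `None` keeps missing values distinguishable from a real `0` in the data, at the cost of having to skip them explicitly in later computations.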

### Writing Command-line Applications ###

Consider the following unix commands

In [1]:
!ls

Foundation1.html   Foundation2.html   Foundation3.html
Foundation1.ipynb  Foundation2.ipynb  Foundation3.ipynb


In [7]:
!ls /home/vikrant

Desktop    Downloads  Pictures	   Public     usr
Documents  Music      programming  Templates  Videos


In [8]:
!cp Foundation1.html /tmp/

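Arguments such as the directory and file names above are called positional arguments: their meaning depends on their position on the command line. A positional argument can be optional (`ls` works without a path) or compulsory (`cp` needs both a source and a destination).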

Let's make our own small command fib.py, which should work as given below
```
Usage: python fib.py n
computes nth fibonacci number
```

In [16]:
%%file fib.py
import argparse

def fib(n):
    prev = 1
    current = 1
    
    for i in range(2,n):
        current, prev = prev+current, current
    return current

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("n", help="n for computing nth fiboinacci number",
                  type=int)
    return p.parse_args()


def main():
    args = parse_args()
    print(args)
    print(fib(args.n))

if __name__ == "__main__":
    main()

Overwriting fib.py


In [17]:
!python fib.py

usage: fib.py [-h] n
fib.py: error: the following arguments are required: n


In [18]:
!python fib.py -h

usage: fib.py [-h] n

positional arguments:
  n           n for computing nth fiboinacci number

optional arguments:
  -h, --help  show this help message and exit


In [22]:
!python fib.py 100

Namespace(n=100)
354224848179261915075


Let's extend our command to print the sequence of fibonacci numbers up to the nth one. We add one optional argument `-s` to our command; if it is given, the command should print the whole sequence and not just a single number.

In [24]:
%%file fib.py
import argparse

def fib(n):
    prev = 1
    current = 1
    
    for i in range(2,n):
        current, prev = prev+current, current
    return current

def printfiblist(n):
    prev, current = 1, 1
    print(prev, current, end=" ")
    for i in range(2, n):
        current, prev = prev+ current, current
        print(current, end=" ")

def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument("n", 
                   help="n for computing nth fiboinacci number",
                   type=int)
    p.add_argument("-s", "--sequence",
                  help="Print sequence of fibonaci",
                  action="store_true")
    return p.parse_args()


def main():
    args = parse_args()
    print(args)
    if args.sequence:
        printfiblist(args.n)
    else:
        print(fib(args.n))

if __name__ == "__main__":
    main()

Overwriting fib.py


In [25]:
!python fib.py -h

usage: fib.py [-h] [-s] n

positional arguments:
  n               n for computing nth fiboinacci number

optional arguments:
  -h, --help      show this help message and exit
  -s, --sequence  Print sequence of fibonaci


In [28]:
!python fib.py -s 25

Namespace(n=25, sequence=True)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 

In [27]:
!python fib.py 100

Namespace(n=100, sequence=False)
354224848179261915075
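
`parse_args` reads `sys.argv` by default, but it also accepts an explicit list of strings. That makes it possible to try out a parser from the notebook without launching a separate process. A sketch with the same two arguments as fib.py (the function name `build_parser` is an assumption):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument("n", type=int, help="n for computing nth fibonacci number")
    p.add_argument("-s", "--sequence", action="store_true",
                   help="print the sequence instead of a single number")
    return p

parser = build_parser()
# parse_args accepts an explicit list instead of reading sys.argv
with_flag = parser.parse_args(["-s", "25"])
without_flag = parser.parse_args(["100"])
print(with_flag)
print(without_flag)
```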


### Downloading stuff from web ###

In [29]:
from urllib.request import urlopen

In [30]:
response = urlopen("http://httpbin.org/html")

In [31]:
response

<http.client.HTTPResponse at 0x7f6d6cb9f400>

In [32]:
contents = response.read()

In [33]:
contents[:100]

b'<!DOCTYPE html>\n<html>\n  <head>\n  </head>\n  <body>\n      <h1>Herman Melville - Moby-Dick</h1>\n\n     '

The body of an HTTP response is always read as bytes, so it has to be decoded to get a string.

In [34]:
html = contents.decode("utf-8")

In [36]:
print(html[:400])

<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

      <div>
        <p>
          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, af


Finding status

In [38]:
response.status

200

The third-party library `requests` makes it very easy to work with HTTP requests.

Install using pip
```
pip3 install requests
```

In [2]:
import requests

In [3]:
response = requests.get("http://httpbin.org/html")
print(response.text[:400])

<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
      <h1>Herman Melville - Moby-Dick</h1>

      <div>
        <p>
          Availing himself of the mild, summer-cool weather that now reigned in these latitudes, and in preparation for the peculiarly active pursuits shortly to be anticipated, Perth, the begrimed, blistered old blacksmith, had not removed his portable forge to the hold again, af


In [4]:
response.headers

{'X-Powered-By': 'Flask', 'Access-Control-Allow-Origin': '*', 'Connection': 'keep-alive', 'Server': 'meinheld/0.6.1', 'Date': 'Sat, 16 Sep 2017 07:39:32 GMT', 'Via': '1.1 vegur', 'X-Processed-Time': '0.000817775726318', 'Access-Control-Allow-Credentials': 'true', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '3741'}

In [5]:
response.status_code

200

Passing parameters to a GET request

In [6]:
response = requests.get("http://httpbin.org/get", params= {"parameter":"dummy", "language":"python"})

In [8]:
print(response.text)

{
  "args": {
    "language": "python", 
    "parameter": "dummy"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "49.248.169.20", 
  "url": "http://httpbin.org/get?parameter=dummy&language=python"
}
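
The `url` field in the response shows where the parameters ended up: in the query string after the `?`. The encoding step can be sketched with the standard library alone; this illustrates the URL format and is not a description of how `requests` works internally.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {"parameter": "dummy", "language": "python"}
query = urlencode(params)                       # key=value pairs joined by &
url = "http://httpbin.org/get?" + query

# the reverse direction: split a URL and decode its query string
decoded = parse_qs(urlsplit(url).query)

print(url)
print(decoded)
```

`parse_qs` returns a list for every key because a parameter may legally appear more than once in a query string.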



How to perform an HTTP POST request?

In [9]:
response = requests.post("http://httpbin.org/post", data="Some data to be posted")

In [10]:
print(response.text)

{
  "args": {}, 
  "data": "Some data to be posted", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "22", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "49.248.169.20", 
  "url": "http://httpbin.org/post"
}



In [11]:
print(requests.post("http://httpbin.org/post",data={"name":"alice", "email":"alice@wonderland.world"}).text)

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "email": "alice@wonderland.world", 
    "name": "alice"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "41", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "49.248.169.20", 
  "url": "http://httpbin.org/post"
}



In [14]:
!echo "Hello world" > /tmp/postdata.txt

In [15]:
print(requests.post("http://httpbin.org/post",data=open("/tmp/postdata.txt")).text)

{
  "args": {}, 
  "data": "Hello world\n", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "12", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "49.248.169.20", 
  "url": "http://httpbin.org/post"
}



### Working with json ###

In [16]:
import requests
response = requests.get("http://httpbin.org/get")
print(response.text)

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "49.248.169.20", 
  "url": "http://httpbin.org/get"
}



In [17]:
response.json()

{'args': {},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Connection': 'close',
  'Host': 'httpbin.org',
  'User-Agent': 'python-requests/2.18.4'},
 'origin': '49.248.169.20',
 'url': 'http://httpbin.org/get'}

In [18]:
d = response.json()

In [20]:
d['url']

'http://httpbin.org/get'
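
When the JSON text does not come from `requests`, the standard `json` module does the same conversion. A short sketch on a shortened copy of the response above (the sample string is trimmed for illustration):

```python
import json

text = '{"args": {}, "origin": "49.248.169.20", "url": "http://httpbin.org/get", "json": null}'

data = json.loads(text)            # JSON text -> Python objects
print(data["url"], data["json"])   # JSON null becomes None

back = json.dumps(data, indent=2, sort_keys=True)   # Python objects -> JSON text
print(back)
```

JSON objects become dictionaries, arrays become lists, and `null`, `true`, `false` become `None`, `True`, `False`.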

### Find popular repositories of VMware on GitHub ###

In [32]:
import requests
url = "https://api.github.com/orgs/vmware/repos"
repos = requests.get(url).json()

In [33]:
type(repos)

list

In [34]:
for repo in repos:
    print(repo['full_name'], repo['forks'])

vmware/pyvco 4
vmware/rvc 46
vmware/rbvmomi 152
vmware/vprobe-toolkit 8
vmware/CloudFS 15
vmware/vcd-nclient 2
vmware/lmock 5
vmware/FireBreath 2
vmware/weasel 1
vmware/vmware-vcenter 83
vmware/vmware-vshield 6
vmware/vcloud-rest 38
vmware/GemstoneWebTools 0
vmware/vmware-vcsa 17
vmware/vmware-vmware_lib 23
vmware/saml20serviceprovider 1
vmware/pg_rewind 19
vmware/vco-powershel-plugin 2
vmware/jenkins-reviewbot 12
vmware/dbeekeeper 0
vmware/thinapp_factory 16
vmware/vmware-cassandra 4
vmware/vmware-java 0
vmware/data-driven-framework 2
vmware/pyvmomi 419
vmware/pyvmomi-community-samples 353
vmware/open-vm-tools 132
vmware/pyvmomi-tools 18
vmware/upgrade-framework 11
vmware/webcommander 29


Now let's find the top 5 repos by number of forks. How will we do that? _*Hint: can we use the built-in function `sorted`?*_

In [35]:
repos = sorted(repos , key=lambda r:r['forks'], reverse=True)[:5]

In [36]:
for repo in repos:
    print(repo['full_name'], repo['forks'])

vmware/pyvmomi 419
vmware/pyvmomi-community-samples 353
vmware/rbvmomi 152
vmware/open-vm-tools 132
vmware/vmware-vcenter 83
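
Note that the cell above rebinds `repos` to the top five, so the full list is gone afterwards. An alternative sketch that keeps the original list and uses `heapq.nlargest` from the standard library; the sample data is a handful of entries copied from the listing above, so no network access is needed.

```python
import heapq

# a few (full_name, forks) entries copied from the listing above
repos = [
    {"full_name": "vmware/rvc", "forks": 46},
    {"full_name": "vmware/rbvmomi", "forks": 152},
    {"full_name": "vmware/vmware-vcenter", "forks": 83},
    {"full_name": "vmware/pyvmomi", "forks": 419},
    {"full_name": "vmware/pyvmomi-community-samples", "forks": 353},
    {"full_name": "vmware/open-vm-tools", "forks": 132},
]

# nlargest returns the n items with the highest key, largest first
top3 = heapq.nlargest(3, repos, key=lambda r: r["forks"])
names = [r["full_name"] for r in top3]
print(names)
```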
