### Working with files ###

In [1]:
%%file three.txt
one
two
three

Overwriting three.txt


In [2]:
fhandle = open("three.txt")

In [3]:
fhandle.read() # read the whole content in a single call

'one\ntwo\nthree'

In [4]:
fhandle.read() # reading again returns an empty string: the handle is already at end of file

''

In [5]:
fhandle.close()

In [6]:
!python3 -c "import this" > data.txt

In [7]:
filehandle = open("data.txt")

In [8]:
filehandle.readline() # possible to read one line at a time

'The Zen of Python, by Tim Peters\n'

In [9]:
lines = filehandle.readlines() # read content of file and return as lines

In [10]:
lines

['\n',
 'Beautiful is better than ugly.\n',
 'Explicit is better than implicit.\n',
 'Simple is better than complex.\n',
 'Complex is better than complicated.\n',
 'Flat is better than nested.\n',
 'Sparse is better than dense.\n',
 'Readability counts.\n',
 "Special cases aren't special enough to break the rules.\n",
 'Although practicality beats purity.\n',
 'Errors should never pass silently.\n',
 'Unless explicitly silenced.\n',
 'In the face of ambiguity, refuse the temptation to guess.\n',
 'There should be one-- and preferably only one --obvious way to do it.\n',
 "Although that way may not be obvious at first unless you're Dutch.\n",
 'Now is better than never.\n',
 'Although never is often better than *right* now.\n',
 "If the implementation is hard to explain, it's a bad idea.\n",
 'If the implementation is easy to explain, it may be a good idea.\n',
 "Namespaces are one honking great idea -- let's do more of those!\n"]

Let's do some work with this data. How about computing the number of words on every line?

In [11]:
filehandle = open("data.txt")

In [12]:
for line in filehandle.readlines():
    print(len(line.strip().split()))

7
0
5
5
5
5
5
5
2
9
4
5
3
10
13
12
5
8
11
13
12
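
`readlines()` loads the whole file into a list before the loop starts. A file handle can also be iterated directly, one line at a time, and gives the same counts. A minimal sketch, assuming a small scratch file `sample.txt` in a temporary directory and a helper named `words_per_line` (both hypothetical, made up for this illustration):

```python
import os
import tempfile

# sample.txt is a hypothetical scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
f = open(path, "w")
f.write("one two three\n\nfour five\n")
f.close()

def words_per_line(filename):
    """return a list with the number of words on every line of the file"""
    counts = []
    f = open(filename)
    for line in f:  # iterating over the handle yields one line at a time
        counts.append(len(line.split()))
    f.close()
    return counts

print(words_per_line(path))  # [3, 0, 2]
```

`split()` with no arguments already ignores leading and trailing whitespace, so the extra `strip()` is optional.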


### Do it yourself ###
- Write a program `cat.py` equivalent to the Unix `cat` command.
```
python cat.py three.txt
one
two
three
```

- Write a program `head.py` equivalent to the Unix `head` command. It should take the number of lines as its first command-line argument and the filename as its second.

```
python head.py 5 data.txt
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
```

- Write a program `wc.py` which implements functions to compute the line count, word count and character count of a file.
```
python wc.py data.txt
21 144 857
```

In [13]:
!python cat.py three.txt

one
two
three

In [14]:
%%file cat.py
"""
module cat is a rough implementation of the unix command cat.
"""
import sys

def cat(file):
    """
    prints contents of file to standard output
    """
    f = open(file)
    
    for line in f.readlines():
        print(line, end="")  # try this first without end=""
    
    f.close()

if __name__ == "__main__":
    cat(sys.argv[1])

Overwriting cat.py


In [15]:
%%file head.py
"""
module head is a rough implementation of the unix command head.
it prints the initial part of a file to standard output
"""
import sys

def head(file, n):
    """
    prints initial n lines of file to standard output
    """
    f = open(file)
    for line in f.readlines()[:n]:
        print(line, end="")
    f.close()

if __name__ == "__main__":
    head(sys.argv[2], int(sys.argv[1]))

Overwriting head.py


In [16]:
!python head.py 5 data.txt 

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.


In [17]:
%%file wc.py
"""
module wc is a rough equivalent of the unix wc command
"""
import sys

def line_count(f):
    return len(open(f).readlines())

def word_count(f):
    return len(open(f).read().split())

def char_count(f):
    return len(open(f).read())

if __name__ == "__main__":
    file = sys.argv[1]
    print(line_count(file), word_count(file), char_count(file))

Overwriting wc.py


In [18]:
!python wc.py data.txt

21 144 857


### Do it yourself ###
- Using the above `wc` module:
  1. Find the file with the maximum number of lines in the current directory.
  2. How about the file with the maximum number of words?

In [19]:
import os
files = [f for f in os.listdir(os.getcwd()) if os.path.isfile(f)]

In [20]:
import wc

In [21]:
max(files, key=wc.line_count)

'Foundation1.html'

In [22]:
max(files, key=wc.word_count)

'Foundation1.html'

### Writing files ###
To write to a file, open it in write mode (`w`).

In [23]:
f = open("primes.txt","w")
f.write("two\n")
f.write("three\n")
f.write("five\n")
f.close() # close() flushes any buffered data to disk

Let's check the contents of our file with our own module `cat.py`.

In [24]:
!python cat.py primes.txt 

two
three
five


When we open a file in `w` mode, its existing contents are overwritten. To append instead, open the file in `a` mode.

In [25]:
f = open("primes.txt", "a")
f.write("seven\n")
f.write("eleven\n")
f.close()

In [26]:
!python cat.py primes.txt

two
three
five
seven
eleven
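
The difference between the two modes can be seen side by side. A minimal sketch, assuming a scratch file `demo.txt` in a temporary directory (a hypothetical name, not one of the files used above):

```python
import os
import tempfile

# demo.txt is a hypothetical scratch file in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

f = open(path, "w")
f.write("two\n")
f.close()

f = open(path, "w")   # "w" truncates the file: the earlier "two" is lost
f.write("three\n")
f.close()

f = open(path, "a")   # "a" keeps the existing contents and writes at the end
f.write("five\n")
f.close()

f = open(path)
contents = f.read()
f.close()
print(repr(contents))  # 'three\nfive\n'
```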


Similarly, we can read and write binary files with the following modes.
1. `rb`  => read in binary mode
2. `wb` => write in binary mode
3. `ab` => append in binary mode

In [27]:
open("primes.txt",'r').read() # text mode

'two\nthree\nfive\nseven\neleven\n'

In [28]:
open("primes.txt", "rb").read() # binary mode

b'two\nthree\nfive\nseven\neleven\n'

In [29]:
f = open("binarydata.bin", "wb")
f.write(b'x025x082')
f.close()

In [30]:
open("binarydata.bin", "rb").read()

b'x025x082'

In [31]:
f = open("binarydata.bin", "ab")
f.write(b'Hello')
f.close()

In [32]:
open("binarydata.bin", "rb").read()

b'x025x082Hello'
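
Note that the literal `b'x025x082'` contains no backslashes, so it is simply the eight ASCII characters `x025x082`. Arbitrary byte values need `\x` escapes (or `bytes([...])`). A small sketch, assuming a scratch file `raw.bin` in a temporary directory (a hypothetical name used only here):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "raw.bin")  # hypothetical scratch file

payload = b"\x02\x08\xff"   # three bytes with the values 2, 8 and 255

f = open(path, "wb")
f.write(payload)
f.close()

f = open(path, "rb")
data = f.read()
f.close()

print(list(data))                   # [2, 8, 255]
print(data == bytes([2, 8, 255]))   # True
print(len(b'x025x082'))             # 8, plain characters rather than escapes
```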

### with statement ###
With the `with` statement, writing files becomes easier: Python closes the file handle automatically when the `with` block exits.

In [33]:
with open("primes.txt", "a") as f:
    f.write("thirteen")

In [34]:
open("primes.txt").read() # the write is visible because leaving the with block closed (and flushed) the file

'two\nthree\nfive\nseven\neleven\nthirteen'
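
The automatic close can be checked through the handle's `closed` attribute, and it also happens when the block is left because of an exception. A minimal sketch, assuming a scratch file `with_demo.txt` in a temporary directory (a hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")  # hypothetical scratch file

with open(path, "w") as f:
    f.write("thirteen\n")
print(f.closed)  # True: the handle was closed when the block ended

try:
    with open(path) as g:
        raise ValueError("something went wrong while reading")
except ValueError:
    pass
print(g.closed)  # True: closed even though the block raised an exception
```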

In [35]:
with open("regional.txt","w", encoding="utf-8" ) as regional:
    regional.write("\u0c05\u0c06")

In [36]:
open("regional.txt", encoding="utf-8").read()

'అఆ'

In [37]:
open("regional.txt", "rb").read()

b'\xe0\xb0\x85\xe0\xb0\x86'

### Do it yourself ###
1. Write a function that writes multiplication tables up to 11 in CSV format to a file, as shown below.
```
1,2,3,4,5,6,7,8,9,10,11
2,4,6,8,10,12,14,16,18,20,22
.
.
.
```
2. Write a Python module `tabulate.py` which converts CSV file data into a neatly aligned table.
```
python tabulate.py multi_tables.csv tabulated.txt
cat tabulated.txt
  1   2   3   4   5   6   7   8   9  10  11
  2   4   6   8  10  12  14  16  18  20  22
  3   6   9  12  15  18  21  24  27  30  33
  4   8  12  16  20  24  28  32  36  40  44
  5  10  15  20  25  30  35  40  45  50  55
  6  12  18  24  30  36  42  48  54  60  66
  7  14  21  28  35  42  49  56  63  70  77
  8  16  24  32  40  48  56  64  72  80  88
  9  18  27  36  45  54  63  72  81  90  99
 10  20  30  40  50  60  70  80  90 100 110
```


In [1]:

def write_tables(n, filename):
    def tables(n):
        return [[str(m*i) for m in range(1,n+1)] for i in range(1,11)]
    
    with open(filename, "w") as f:
        data = tables(n)
        for row in data:
            f.write(",".join(row))
            f.write("\n")
    
        

In [2]:
write_tables(11, "multi_tables.csv")

In [3]:
!cat multi_tables.csv

1,2,3,4,5,6,7,8,9,10,11
2,4,6,8,10,12,14,16,18,20,22
3,6,9,12,15,18,21,24,27,30,33
4,8,12,16,20,24,28,32,36,40,44
5,10,15,20,25,30,35,40,45,50,55
6,12,18,24,30,36,42,48,54,60,66
7,14,21,28,35,42,49,56,63,70,77
8,16,24,32,40,48,56,64,72,80,88
9,18,27,36,45,54,63,72,81,90,99
10,20,30,40,50,60,70,80,90,100,110


In [4]:
%%file tabulate.py
"""
# tabulate.py
converts a csv file into a neatly aligned table
"""
import sys

def parse_csv(filename):
    return [line.strip().split(",") for line in open(filename).readlines()]
    
def tabulate(csv, tabular):
    data = parse_csv(csv)
    maxwordlength = len(max([max(row, key=len) for row in data], key=len))
    with open(tabular, "w") as f:
        for row in data:
            f.write(" ".join([word.rjust(maxwordlength) for word in row]))
            f.write("\n")
    
    

if __name__ == "__main__":
    tabulate(sys.argv[1], sys.argv[2])

Overwriting tabulate.py


In [5]:
!python tabulate.py multi_tables.csv tabulated.txt

In [45]:
!cat tabulated.txt

  1   2   3   4   5   6   7   8   9  10  11
  2   4   6   8  10  12  14  16  18  20  22
  3   6   9  12  15  18  21  24  27  30  33
  4   8  12  16  20  24  28  32  36  40  44
  5  10  15  20  25  30  35  40  45  50  55
  6  12  18  24  30  36  42  48  54  60  66
  7  14  21  28  35  42  49  56  63  70  77
  8  16  24  32  40  48  56  64  72  80  88
  9  18  27  36  45  54  63  72  81  90  99
 10  20  30  40  50  60  70  80  90 100 110


### Writing to stderr, stdout ###
Every process starts with three standard streams open: stdout, stdin and stderr.
1. stdout -> standard output, usually directed to the terminal
2. stdin -> standard input, usually read from the keyboard
3. stderr -> standard error, used for error messages and diagnostics

Python provides handles to these streams via the `sys` module.

In [None]:
import sys
sys.stdout.write("Hello Python")

In [None]:
sys.stderr.write("Error: some ..Exception..")
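
`print` can write to either stream through its `file` argument, which saves adding the newline by hand. A minimal sketch, assuming a helper named `report` (hypothetical, introduced only for this illustration):

```python
import sys

def report(message, error=False):
    """write message to stderr when error is True, otherwise to stdout"""
    stream = sys.stderr if error else sys.stdout
    print(message, file=stream)

report("Hello Python")
report("Error: something failed", error=True)
```

Keeping errors on stderr means the two streams can be redirected separately from the shell, for example `python script.py 2> errors.log`.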

### Dictionaries ###
Dictionaries are used extensively inside the Python language itself. Let's try working with them.

In [50]:
author = {'name':"lewis carrol", 
          'books':["alice in wonderland", "looking through the glass"],
          'language':"english"}

In [51]:
author['name']

'lewis carrol'

In [53]:
author['books']

['alice in wonderland', 'looking through the glass']

In [54]:
print(author)

{'books': ['alice in wonderland', 'looking through the glass'], 'language': 'english', 'name': 'lewis carrol'}


In [55]:
author['name'] = "lewis"

In [56]:
print(author)

{'books': ['alice in wonderland', 'looking through the glass'], 'language': 'english', 'name': 'lewis'}


In [57]:
del author['books']

In [58]:
print(author)

{'language': 'english', 'name': 'lewis'}


In [59]:
"name" in author

True

In [60]:
"lewis" in author

False

In [61]:
author['language']

'english'

In [62]:
author.get("language")

'english'

In [64]:
'books' in author # what happens if we try to get an item that is not in the dictionary?

False

In [65]:
author['books']

KeyError: 'books'

In [66]:
author.get("books",[])

[]

Let's use dictionaries in the real world. Here is what the default GRUB config file looks like on my Debian system.
```
GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
GRUB_CMDLINE_LINUX=""
```
I want two different configs, one for Ubuntu and one for Debian, because some modules are not compatible with my laptop hardware on Debian.

In [76]:
grub_debian = {
    'GRUB_DEFAULT' :0,
    'GRUB_TIMEOUT' :5,
    'GRUB_DISTRIBUTOR' :"`lsb_release -i -s 2> /dev/null || echo Debian`",
    'GRUB_CMDLINE_LINUX_DEFAULT' :"quiet splash modprob.blacklist=dw_dmac, dw_dmax_core",
    'GRUB_CMDLINE_LINUX' : ""
}

In [77]:
grub_ubuntu ={
        'GRUB_DEFAULT' :0,
        'GRUB_HIDDEN_TIMEOUT_QUIET' :'true',
        'GRUB_TIMEOUT' :10,
        'GRUB_DISTRIBUTOR' :"`lsb_release -i -s 2> /dev/null || echo Debian`",
        'GRUB_CMDLINE_LINUX_DEFAULT' :"quiet splash reboot=bios",
        'GRUB_CMDLINE_LINUX' : ""
}

In [73]:
grub_config = {'debian':grub_debian,
               'ubuntu':grub_ubuntu
              }

In [74]:
grub_config['debian']

{'GRUB_CMDLINE_LINUX': '',
 'GRUB_CMDLINE_LINUX_DEFAULT': 'quiet splash modprob.blacklist=dw_dmac, dw_dmax_core',
 'GRUB_DEFAULT': 0,
 'GRUB_DISTRIBUTOR': '`lsb_release -i -s 2> /dev/null || echo Debian`',
 'GRUB_TIMEOUT': 5}

In [75]:
grub_config['debian']['GRUB_CMDLINE_LINUX_DEFAULT']

'quiet splash modprob.blacklist=dw_dmac, dw_dmax_core'

### Iterating over dictionaries ###

In [79]:
d = {"one":1, "two":2 , "three":3}

Iterating over keys

In [81]:
for key in d.keys():
    print(key, d[key])

one 1
three 3
two 2


Iterating over values directly

In [82]:
for value in d.values():
    print(value)

1
3
2


Iterating over keys and values together

In [83]:
for k,v in d.items():
    print(k,v)

one 1
three 3
two 2


What if we iterate directly over the dictionary?

In [84]:
for item in d:
    print(item)

one
three
two


It goes over the keys!

In [87]:
numbers = [("one", 1), ("two", 2) , ("three", 3)]

In [88]:
dict(numbers)

{'one': 1, 'three': 3, 'two': 2}

In [91]:
dict(zip(('a','b','c'), (1, 2, 3)))

{'a': 1, 'b': 2, 'c': 3}

In [93]:
items = ("Pen", "Pencil", "Colorbox")
prices = (25, 10, 50)
cart = dict(zip(items, prices))

In [94]:
for item, price in cart.items():
    print(item.rjust(8), price)
print("-"*12)
print("Total".rjust(8), sum(cart.values()))

  Pencil 10
     Pen 25
Colorbox 50
------------
   Total 85


### Do it yourself ###
- Write a function `unzip` which operates over a dictionary and returns its keys and values as separate lists.

In [97]:
def unzip(d):
    keys = d.keys()
    values = [d[k] for k in keys]
    return list(keys), values

In [98]:
unzip(cart)

(['Pencil', 'Pen', 'Colorbox'], [10, 25, 50])
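
The same result can be obtained by applying `zip` to the unpacked items, which is the inverse of the `dict(zip(...))` call used to build `cart`. A sketch, assuming Python 3.7 or later (where dictionaries keep insertion order) and a hypothetical name `unzip_items` to avoid clashing with the function above:

```python
def unzip_items(d):
    """return the keys and values of d as two separate lists"""
    if not d:  # zip(*...) over no items yields nothing, so handle the empty dict
        return [], []
    keys, values = zip(*d.items())
    return list(keys), list(values)

cart = {"Pen": 25, "Pencil": 10, "Colorbox": 50}
print(unzip_items(cart))  # (['Pen', 'Pencil', 'Colorbox'], [25, 10, 50])
```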

### More examples ###
Write a program to compute frequency of words in a file.

In [104]:
%%file words.txt
five
five four
five four three
five four three two
five four three two one
six seven eight nine
six seven eight
six seven
six

Overwriting words.txt


In [105]:
%%file wordfreq.py
"""
computes frequency of every word in a file
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    freq = {}
    for word in words:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq

if __name__ == "__main__":
    words = read_words(sys.argv[1])
    freq = wordfreq(words)
    print(freq)

Overwriting wordfreq.py


In [106]:
!python wordfreq.py words.txt

{'two': 2, 'one': 1, 'six': 4, 'five': 5, 'nine': 1, 'three': 3, 'seven': 3, 'four': 4, 'eight': 2}


Can we improve this?

In [113]:
%%file wordfreq.py
"""
computes frequency of every word in a file
usage:
python wordfreq.py filename
"""
import sys

def read_words(filename):
    return open(filename).read().split()

def wordfreq(words):
    freq = {}
    for word in words:
        freq[word] = freq.get(word,0) + 1
    return freq

if __name__ == "__main__":
    words = read_words(sys.argv[1])
    freq = wordfreq(words)
    print(freq)

Overwriting wordfreq.py


In [7]:
!python wordfreq.py words.txt

{'six': 4, 'five': 5, 'eight': 2, 'four': 4, 'nine': 1, 'three': 3, 'seven': 3, 'one': 1, 'two': 2}
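
The standard library also offers `collections.Counter`, a dictionary subclass built for this kind of counting. A minimal sketch, assuming a hypothetical function name `wordfreq_counter` and a short word list made up for the example:

```python
from collections import Counter

def wordfreq_counter(words):
    """count occurrences of every word using collections.Counter"""
    return Counter(words)

sample = "five five four five four three".split()
counts = wordfreq_counter(sample)
print(counts["five"], counts["four"], counts["three"])  # 3 2 1
print(counts["six"])           # 0, missing words count as zero
print(counts.most_common(1))   # [('five', 3)]
```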


In [8]:
def wordfreq1(words):
    freq = {}
    uniqwords = set(words)
    for w in uniqwords:
        freq[w] = words.count(w)
    return freq

In [9]:
import wordfreq

In [10]:
words = wordfreq.read_words("words.txt")

In [11]:
freq = wordfreq1(words)

How about printing it nicely?

In [12]:
for w, f in freq.items():
    print(w.rjust(5),f)

eight 2
  two 2
 four 4
seven 3
 five 5
 nine 1
  one 1
  six 4
three 3


In sorted order? Sorting the items directly orders them by word; pass a `key` function to sort by frequency.

In [17]:
for k,v in sorted(freq.items()):
    print(k.rjust(5), v)

eight 2
 five 5
 four 4
 nine 1
  one 1
seven 3
  six 4
three 3
  two 2


In [18]:
for k,v in sorted(freq.items(), key=lambda x:x[1]):
    print(k.rjust(5),v)

 nine 1
  one 1
eight 2
  two 2
seven 3
three 3
 four 4
  six 4
 five 5


In [20]:
for k,v in sorted(freq.items(), key=lambda x:x[1], reverse=True):
    print(k.rjust(5),v)

 five 5
 four 4
  six 4
seven 3
three 3
eight 2
  two 2
 nine 1
  one 1


Print a horizontal text histogram of word frequencies.

In [22]:
for k,v in sorted(freq.items(), key=lambda x:x[1], reverse=True):
    print(k.rjust(5),v, "*"*v)

 five 5 *****
 four 4 ****
  six 4 ****
seven 3 ***
three 3 ***
eight 2 **
  two 2 **
 nine 1 *
  one 1 *


Grouping all keys that share a given value!

In [23]:
team = {"david":"USA", "anand":"india", "linus":"USA","nouful":"india","alice":"UK"}

In [24]:
[name for name in team.keys() if team[name]=="india"]

['nouful', 'anand']

In [25]:
[name for name in team.keys() if team[name]=="USA"]

['linus', 'david']
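
Instead of one list comprehension per country, all groups can be built in a single pass over the dictionary. A sketch, assuming a hypothetical helper `group_by_value` built on `dict.setdefault`:

```python
def group_by_value(d):
    """return a dictionary mapping each value of d to the list of its keys"""
    groups = {}
    for key, value in d.items():
        # setdefault returns the existing list, or stores and returns a new one
        groups.setdefault(value, []).append(key)
    return groups

team = {"david": "USA", "anand": "india", "linus": "USA", "nouful": "india", "alice": "UK"}
groups = group_by_value(team)
print(sorted(groups["india"]))  # ['anand', 'nouful']
print(sorted(groups["USA"]))    # ['david', 'linus']
print(groups["UK"])             # ['alice']
```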

### Pitfalls ###

In [29]:
x = [1, 2, 3, 4]
y = x
y.append(5)
print(x)

[1, 2, 3, 4, 5]


In [30]:
x = [1, 2, 3, 4]
y = x
y = [1, 2, 3]
print(x)

[1, 2, 3, 4]


In [31]:
x = 1
y = x
y = 2
print(x)

1
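
In the first case `y = x` does not copy the list; it only gives the same object a second name, so a change made through `y` is visible through `x`. Assigning a new value to `y` afterwards just rebinds the name and leaves `x` alone. A short sketch of how to check this with `is`, and how to get an independent copy when one is wanted:

```python
x = [1, 2, 3, 4]
y = x            # y is another name for the same list object
print(y is x)    # True

z = x.copy()     # z is a new list with the same elements (list(x) or x[:] also work)
z.append(5)
print(x)         # [1, 2, 3, 4]
print(z)         # [1, 2, 3, 4, 5]
print(z is x)    # False
```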


### Classes ###