# Programming Python: The Short Story

This book demonstrates Python in action.

The task is to construct a database.

There are other things in Python not covered here.

# The Task

We want to keep a database of records describing some people.

There are ready-made programming tools for this task, but the point is that we will learn a lot by programming it ourselves.

# Step 1: Representing Records

First we have to decide how to represent a single record.

## Using Lists

Lists can be used to represent a record:

In [5]:
bob = ['Bob Smith', 42, 30000, 'software']
sue = ['Sue Jones', 45, 40000, 'hardware']

Each record is a list.

In [7]:
bob[0], sue[2]

('Bob Smith', 40000)

Processing is easy:

In [9]:
bob[0].split()[-1]

'Smith'

In [10]:
sue[2] *= 1.25
sue

['Sue Jones', 45, 50000.0, 'hardware']

The expression for bob above is evaluated from left to right: **bob[0]** fetches the name, **split()** breaks it into words, and **[-1]** picks the last word.

### Start-up pointers

We will work in a Jupyter notebook.

### A database list

First we combine variables bob and sue into a database.

In [14]:
people = [bob, sue]
for person in people:
    print(person)

['Bob Smith', 42, 30000, 'software']
['Sue Jones', 45, 50000.0, 'hardware']


The variable people is a database.

In [16]:
people[1][0]

'Sue Jones'

In [17]:
for person in people:
    print(person[0].split()[-1])
    person[2] *= 1.20

Smith
Jones


In [18]:
for person in people:
    print(person[2])

36000.0
60000.0


We can also collect field values with list comprehensions and **map**:

In [20]:
pays = [person[2] for person in people]
pays

[36000.0, 60000.0]

In [21]:
pays = map(lambda x: x[2], people)
pays = list(pays)
pays

[36000.0, 60000.0]

In [22]:
sum(pays)

96000.0

In [23]:
sum(person[2] for person in people)

96000.0

Adding a new record to the database:

In [25]:
people.append(['Tom', 50, 0, None])

In [26]:
len(people)

3

In [27]:
for person in people:
    print(person)

['Bob Smith', 42, 36000.0, 'software']
['Sue Jones', 45, 60000.0, 'hardware']
['Tom', 50, 0, None]


In [28]:
people[-1][0]

'Tom'

Weakness: our database is in memory only.

### Field labels

Another weakness: we are accessing fields by integer positions.

Let's try to use the range function:

In [31]:
NAME, AGE, PAY = range(3)
bob = ['Bob Smith', 42, 10000]

In [32]:
bob[NAME]

'Bob Smith'

In [33]:
PAY, bob[PAY]

(2, 10000)

The uppercase variables have become field names.  But we remain dependent on this numbering.

Another problem: the records themselves hold no mapping between field names and values.
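To make the dependence on numbering concrete, here is a small sketch. The extra **JOB** field and the **bob_v2** record are made up for illustration: once a field is inserted at the front, the old constants silently point at the wrong values until they are renumbered by hand.

```python
# Positional constants, as in the cells above.
NAME, AGE, PAY = range(3)
bob = ['Bob Smith', 42, 10000]

# Hypothetical change: a new field is inserted at the front of the record.
bob_v2 = ['dev'] + bob

# The old constant now selects the age, not the pay.
stale = bob_v2[PAY]

# The constants must be renumbered by hand to match the new layout.
JOB, NAME, AGE, PAY = range(4)
fixed = bob_v2[PAY]

print(stale, fixed)
# Printing the record shows bare values only; the labels live elsewhere.
print(bob_v2)
```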

We might try a list of name/value pairs instead:

In [35]:
bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]
sue = [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]]
people = [bob, sue]
people

[[['name', 'Bob Smith'], ['age', 42], ['pay', 10000]],
 [['name', 'Sue Jones'], ['age', 45], ['pay', 20000]]]

But it still does not solve the problem, since we still have to index by position:

In [37]:
for person in people:
    print(person[0][1], person[2][1])

Bob Smith 10000
Sue Jones 20000


In [38]:
[person[0][1] for person in people]

['Bob Smith', 'Sue Jones']

In [39]:
for person in people:
    print(person[0][1].split()[-1]) # give last names
    person[2][1] *= 1.10

Smith
Jones


In [40]:
for person in people:
    print(person[2])

['pay', 11000.0]
['pay', 22000.0]


Let us inspect field names in loops:

In [42]:
for person in people:
    for name, value in person:
        if name == 'name':
            print(value)

Bob Smith
Sue Jones


Better yet, we can write a *fetcher* function to do this:

In [44]:
def field(record, label):
    for fname, fvalue in record:
        if fname == label:
            return fvalue

In [45]:
bob

[['name', 'Bob Smith'], ['age', 42], ['pay', 11000.0]]

In [46]:
field(bob, 'name')

'Bob Smith'

In [47]:
field(sue, 'pay')

22000.0

In [48]:
for rec in people:
    print(field(rec, 'name'), field(rec, 'age'))

Bob Smith 42
Sue Jones 45


This leads to a set of record interface functions.  In the next chapter we will find a better way.
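As a rough idea of what such a set of interface functions might look like, the sketch below pairs the **field** fetcher with a setter. The name **set_field** and its append-if-missing behavior are assumptions made for this illustration, not part of the original code.

```python
def field(record, label):
    """Return the value stored under label, or None if the label is absent."""
    for fname, fvalue in record:
        if fname == label:
            return fvalue
    return None

def set_field(record, label, value):
    """Hypothetical companion to field(): update label in place, or append it."""
    for pair in record:
        if pair[0] == label:
            pair[1] = value
            return
    record.append([label, value])

bob = [['name', 'Bob Smith'], ['age', 42], ['pay', 10000]]

set_field(bob, 'pay', field(bob, 'pay') * 2)   # update an existing field
set_field(bob, 'job', 'dev')                   # add a new field

print(field(bob, 'pay'), field(bob, 'job'))
```

Every access still scans the record from the start, which is one reason a built-in mapping type is attractive.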

### Using Dictionaries

Python dictionaries seem to be a natural solution:

In [51]:
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}

Now each field has a meaningful label, and access does not depend on position.

In [53]:
bob['name'], sue['pay']

('Bob Smith', 40000)

In [54]:
bob['name'].split()[-1]

'Smith'

In [55]:
sue['pay'] *= 1.1
sue['pay']

44000.0

Fields are accessed mnemonically now. This is more meaningful.
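Access by key also raises the question of what happens when a label is missing. The sketch below shows the standard options; the **'email'** key is made up for illustration.

```python
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

# Membership test on the keys.
has_pay = 'pay' in bob

# get() returns a default instead of raising for an unknown key.
email = bob.get('email', 'n/a')

# Plain indexing raises KeyError for an unknown key.
try:
    bob['email']
except KeyError as exc:
    missing = exc.args[0]

print(has_pay, email, missing)
```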

#### Other ways to make dictionaries

There are several other ways to build a dictionary.  Here is the function-call syntax using named arguments (keyword arguments):

In [58]:
bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='hdw')
bob

{'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

In [59]:
sue

{'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}

Alternatively, we can fill in the fields of a dictionary one at a time:

In [61]:
sue = {}
sue['name'] = 'Sue Jones'
sue['age'] = 45
sue['pay'] = 40000
sue['job'] = 'htw'
sue

{'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'htw'}

Or we can use the **zip** function:

In [63]:
names = ['name', 'age', 'pay', 'job']
values = ['Bob Smith', 42, 30000, 'dev']
list(zip(names, values))

[('name', 'Bob Smith'), ('age', 42), ('pay', 30000), ('job', 'dev')]

In [64]:
bob = dict(zip(names, values))
bob

{'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

Finally, we can initialize a dictionary from a sequence of keys and one shared default value:

In [66]:
fields = ('name', 'age', 'pay', 'job')

record = dict.fromkeys(fields, '?')

record

{'name': '?', 'age': '?', 'pay': '?', 'job': '?'}
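One caveat with **dict.fromkeys**: the same default object is stored under every key. That is harmless for an immutable placeholder such as '?', but a mutable default is shared by all fields. A minimal sketch (the names **shared** and **separate** are only for this illustration):

```python
fields = ('name', 'age', 'pay', 'job')

# Safe: an immutable placeholder, replaced field by field.
record = dict.fromkeys(fields, '?')
record['name'] = 'Bob Smith'

# Pitfall: a single list object is shared by all keys.
shared = dict.fromkeys(fields, [])
shared['job'].append('dev')

# A dict comprehension builds a fresh list for each key instead.
separate = {f: [] for f in fields}
separate['job'].append('dev')

print(record)
print(shared['name'], separate['name'])
```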

#### Lists of dictionaries

We still need to collect the records into a database:

In [68]:
bob

{'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

In [69]:
sue

{'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'htw'}

In [70]:
people = [bob, sue]
people

[{'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
 {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'htw'}]

In [71]:
for person in people:
    print(person['name'], person['pay'], sep=', ')

Bob Smith, 30000
Sue Jones, 40000


In [72]:
for person in people:
    if person['name'] == 'Sue Jones':
        print(person['pay'])

40000


We use keys here:

In [74]:
names = [person['name'] for person in people]
names

['Bob Smith', 'Sue Jones']

In [75]:
list(map(lambda x: x['name'], people))

['Bob Smith', 'Sue Jones']

In [76]:
sum(person['pay'] for person in people)

70000

Interestingly, list comprehensions and generator expressions can approximate SQL queries:

In [78]:
[rec['name'] for rec in people if rec['age'] >= 45]

['Sue Jones']

In [79]:
[(rec['age'] ** 2 if rec['age'] >= 45 else rec['age']) for rec in people]

[42, 2025]

In [80]:
G = (rec['name'] for rec in people if rec['age'] >= 45)
G

<generator object <genexpr> at 0x000001AF86320110>

In [81]:
next(G)

'Sue Jones'

In [82]:
G = ((rec['age'] ** 2 if rec['age'] >= 45 else rec['age']) for rec in people)
G

<generator object <genexpr> at 0x000001AF86320860>

In [83]:
G.__next__()

42
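To push the SQL analogy a little further, here is a sketch that adds ordering and aggregation. The SQL in the comments is only a loose description of each expression, and the records are re-created so that the block runs on its own.

```python
people = [
    {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
    {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'},
]

# Roughly: SELECT name, pay FROM people WHERE age >= 45 ORDER BY pay DESC
rows = sorted(
    ((rec['name'], rec['pay']) for rec in people if rec['age'] >= 45),
    key=lambda row: row[1],
    reverse=True,
)

# Roughly: SELECT SUM(pay) FROM people WHERE job = 'dev'
dev_total = sum(rec['pay'] for rec in people if rec['job'] == 'dev')

print(rows, dev_total)
```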

Updating records is just as easy with Python dictionaries:

In [85]:
for person in people:
    print(person['name'].split()[-1])
    person['pay'] *= 1.10

Smith
Jones


In [86]:
for person in people:
    print(person)

{'name': 'Bob Smith', 'age': 42, 'pay': 33000.0, 'job': 'dev'}
{'name': 'Sue Jones', 'age': 45, 'pay': 44000.0, 'job': 'htw'}


In [87]:
for person in people:
    print(person['pay'])

33000.0
44000.0


#### Nested structures

We can nest structures even deeper, for example dictionaries, lists, and tuples within a dictionary:

In [89]:
bob2 = {'name': {'first': 'Bob', 'last': 'Smith'},
        'age': 42,
        'job': ['software', 'writing'],
        'pay': (40_000, 50_000)}

In [90]:
bob2

{'name': {'first': 'Bob', 'last': 'Smith'},
 'age': 42,
 'job': ['software', 'writing'],
 'pay': (40000, 50000)}

In [91]:
bob2['name']

{'first': 'Bob', 'last': 'Smith'}

In [92]:
bob2['name']['last']

'Smith'

In [93]:
bob2['pay'][1]

50000

Working with lists:

In [95]:
for job in bob2['job']:
    print(job)

software
writing


In [96]:
bob2['job'].append('janitor')
print(bob2['job'])

['software', 'writing', 'janitor']


In [97]:
bob2

{'name': {'first': 'Bob', 'last': 'Smith'},
 'age': 42,
 'job': ['software', 'writing', 'janitor'],
 'pay': (40000, 50000)}

#### Dictionaries of dictionaries

The whole database can itself be expressed as a dictionary, so that we retrieve each record by a **symbolic key**:

In [99]:
bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='hdw')
bob

{'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

In [100]:
db = {}
db['bob'] = bob
db['sue'] = sue
db

{'bob': {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
 'sue': {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}}

In [101]:
db['bob']['name']

'Bob Smith'

In [102]:
db['sue']['pay'] = 50_000

In [103]:
db['sue']

{'name': 'Sue Jones', 'age': 45, 'pay': 50000, 'job': 'hdw'}

Notice how we access the data **directly**, instead of searching the data in a loop.
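The difference can be sketched side by side. The helper **find_by_name** is hypothetical and stands in for the loop-based search used with the list of records earlier.

```python
bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='hdw')

people = [bob, sue]             # list of records: must be searched
db = {'bob': bob, 'sue': sue}   # dictionary of records: fetched by key

def find_by_name(records, name):
    """Hypothetical helper: scan the list until a record matches."""
    for rec in records:
        if rec['name'] == name:
            return rec
    return None

via_list = find_by_name(people, 'Sue Jones')   # walks the list
via_dict = db['sue']                           # one key lookup

print(via_list is via_dict)
```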

For nice output we use the **pprint** function.

In [105]:
db

{'bob': {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
 'sue': {'name': 'Sue Jones', 'age': 45, 'pay': 50000, 'job': 'hdw'}}

In [106]:
import pprint
pprint.pprint(db)

{'bob': {'age': 42, 'job': 'dev', 'name': 'Bob Smith', 'pay': 30000},
 'sue': {'age': 45, 'job': 'hdw', 'name': 'Sue Jones', 'pay': 50000}}


For iterating over the db we use dictionary iterators (keys of a dictionary):

In [108]:
for key in db:
    print(key, '=>', db[key])

bob => {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue => {'name': 'Sue Jones', 'age': 45, 'pay': 50000, 'job': 'hdw'}


In [109]:
for key in db:
    print(key, '=>', db[key]['name'])

bob => Bob Smith
sue => Sue Jones


In [110]:
for key in db:
    print(key, '=>', db[key]['pay'])

bob => 30000
sue => 50000


In [111]:
for key in db:
    print(db[key]['name'].split()[-1])
    db[key]['pay'] *= 1.10

Smith
Jones


In [112]:
for rec in db.values():
    print(rec['pay'])

33000.0
55000.00000000001


In [113]:
x = [db[key]['name'] for key in db]
x

['Bob Smith', 'Sue Jones']

In [114]:
x = [rec['name'] for rec in db.values()]
x

['Bob Smith', 'Sue Jones']

Adding a new record by simple dictionary-assignment:

In [116]:
db['tom'] = dict(name='Tom', age=50, job=None, pay=0)
db

{'bob': {'name': 'Bob Smith', 'age': 42, 'pay': 33000.0, 'job': 'dev'},
 'sue': {'name': 'Sue Jones',
  'age': 45,
  'pay': 55000.00000000001,
  'job': 'hdw'},
 'tom': {'name': 'Tom', 'age': 50, 'job': None, 'pay': 0}}

In [117]:
pprint.pprint(db)

{'bob': {'age': 42, 'job': 'dev', 'name': 'Bob Smith', 'pay': 33000.0},
 'sue': {'age': 45,
         'job': 'hdw',
         'name': 'Sue Jones',
         'pay': 55000.00000000001},
 'tom': {'age': 50, 'job': None, 'name': 'Tom', 'pay': 0}}


In [118]:
db['tom']['name']

'Tom'

In [119]:
list(db.keys())

['bob', 'sue', 'tom']

In [120]:
len(db)

3

In [121]:
[rec['age'] for rec in db.values()]    

[42, 45, 50]

In [122]:
[rec['name'] for rec in db.values() if rec['age'] >= 45]  

['Sue Jones', 'Tom']

This dictionary of dictionaries is still transient, since it lives only in memory, but its interface closely mirrors that of the **shelve** module, which provides permanent storage.
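As a preview of that correspondence, here is a minimal sketch using the standard **shelve** module. The temporary directory and the file name **people-shelve** are assumptions chosen so that the example cleans up after itself.

```python
import os
import shelve
import tempfile

bob = dict(name='Bob Smith', age=42, pay=30000, job='dev')
sue = dict(name='Sue Jones', age=45, pay=40000, job='hdw')

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'people-shelve')   # hypothetical file name

    # Store records by key, exactly as with the in-memory dictionary.
    with shelve.open(path) as db:
        db['bob'] = bob
        db['sue'] = sue

    # Reopen the shelve: the records were saved to disk.
    with shelve.open(path) as db:
        names = sorted(db.keys())
        sue_pay = db['sue']['pay']

        # Unlike a plain dictionary, a nested update must be stored back
        # (unless the shelve is opened with writeback=True).
        rec = db['sue']
        rec['pay'] = 50000
        db['sue'] = rec
        new_pay = db['sue']['pay']

print(names, sue_pay, new_pay)
```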

# Step 2: Storing Records Persistently

The objects so far live only in memory.  To keep them between sessions, we need to save them to disk.

## Using Formatted Files

This is the most primitive way to do it.  We use simple text files.

### Test data script

We will first create our database.  So that we will not need to type interactively, we shall do it in a file:

In [125]:
%%writefile initdata.py

# initialize data to be stored in files, pickles, shelves

# records
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}
tom = {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None}

# database
db = {}
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom

print(__name__)

if __name__ == '__main__':
    for key in db:
        print(key, '=>\n  ', db[key])

Overwriting initdata.py


Note how we used the magic command **writefile**.

By writing this cell to the file **initdata.py**, we will later be able to **import** it into our code as a module.
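The **\_\_name\_\_** test in the script decides whether the self-test loop runs. The following sketch shows the two values the variable takes; the module name **name_demo_mod** and the temporary directory are made up for this illustration.

```python
import importlib
import os
import runpy
import sys
import tempfile

# A tiny stand-in module that records the name it was loaded under.
source = "seen_name = __name__\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'name_demo_mod.py')   # hypothetical module
    with open(path, 'w') as f:
        f.write(source)

    sys.path.insert(0, tmpdir)
    try:
        importlib.invalidate_caches()
        # Imported: __name__ is the module's own name.
        mod = importlib.import_module('name_demo_mod')
        imported_name = mod.seen_name

        # Run as a script: __name__ is '__main__'.
        ns = runpy.run_path(path, run_name='__main__')
        script_name = ns['seen_name']
    finally:
        sys.path.remove(tmpdir)
        sys.modules.pop('name_demo_mod', None)

print(imported_name, script_name)
```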

In [127]:
import initdata

initdata


In [128]:
for key in db:
    print(key, '=>\n  ', db[key])

bob =>
   {'name': 'Bob Smith', 'age': 42, 'pay': 33000.0, 'job': 'dev'}
sue =>
   {'name': 'Sue Jones', 'age': 45, 'pay': 55000.00000000001, 'job': 'hdw'}
tom =>
   {'name': 'Tom', 'age': 50, 'job': None, 'pay': 0}


### File name conventions

The data is located in the **PP4E** directory.

### Script start-up pointers

Sometimes you need to include a call to **input()** at the end of a script to keep the console window from disappearing too quickly.

### Data format script

Now all that we need is a way to store the database.  This is our first try:

In [130]:
# %%writefile make_db_file.py

import sys

dbfilename = 'people-file'
ENDDB = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def storeDbase(db, dbfilename=dbfilename):
    "formatted dump of database to flat file"
    # None means "write to the console" instead of a file
    dbfile = sys.stdout if dbfilename is None else open(dbfilename, 'w')
    try:
        for key, rec in db.items():
            print(key, file=dbfile)
            for name, value in rec.items():
                print(name + RECSEP + repr(value), file=dbfile)
            print(ENDREC, file=dbfile)
        print(ENDDB, file=dbfile)
    finally:
        if dbfile is not sys.stdout:      # never close the console stream
            dbfile.close()

def loadDbase(dbfilename=dbfilename):
    "parse data to reconstruct database"
    dbfile = open(dbfilename, 'r')
    saved_stdin = sys.stdin
    sys.stdin = dbfile                    # let input() read from the file
    try:
        db = {}
        key = input()
        while key != ENDDB:
            rec = {}
            field = input()
            while field != ENDREC:
                name, value = field.split(RECSEP, 1)   # split on the first separator only
                rec[name] = eval(value)                # eval trusts the file contents
                field = input()
            db[key] = rec
            key = input()
        return db
    finally:
        sys.stdin = saved_stdin           # restore normal input
        dbfile.close()
    
# if __name__ == '__main__':
    # from initdata import db
    # storeDbase(db, 'file01.txt')
    #
    # db = loadDbase('file01.txt')
    # storeDbase(db, 'file02.txt')
    #
    # db = loadDbase('file02.txt')
    # storeDbase(db, dbfilename=None)

Overwriting make_db_file.py


This first attempt is rather lengthy.

Note that **repr** writes each value as Python source text (strings keep their quotes), and that **eval** turns that text back into an object when the data is read from disk.
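The round trip can be checked on a few values. This sketch swaps **eval** for the standard library's **ast.literal_eval**, which accepts only Python literals; that substitution is an assumption of this example, not what the script above does.

```python
import ast

values = ['Bob Smith', 42, 30000.0, None]

# repr() produces text that literal_eval() can turn back into the same object.
restored = [ast.literal_eval(repr(value)) for value in values]

# str() drops the quotes, so the text is no longer a valid literal.
try:
    ast.literal_eval(str('Bob Smith'))
    str_round_trips = True
except (ValueError, SyntaxError):
    str_round_trips = False

# Unlike eval(), literal_eval() refuses arbitrary expressions.
try:
    ast.literal_eval("__import__('os').getcwd()")
    expression_accepted = True
except ValueError:
    expression_accepted = False

print(restored, str_round_trips, expression_accepted)
```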

Now let us see how **initdata** defines the database **db**:

In [132]:
from initdata import *
from pprint import pprint

pprint(db)

{'bob': {'age': 42, 'job': 'dev', 'name': 'Bob Smith', 'pay': 30000},
 'sue': {'age': 45, 'job': 'hdw', 'name': 'Sue Jones', 'pay': 40000},
 'tom': {'age': 50, 'job': None, 'name': 'Tom', 'pay': 0}}


Here we store the database using **storeDbase**:

In [134]:
from initdata import *
from make_db_file import *

storeDbase(db)

for line in open('people-file'):
    print(line, end='')

bob
name=>'Bob Smith'
age=>42
pay=>30000
job=>'dev'
endrec.
sue
name=>'Sue Jones'
age=>45
pay=>40000
job=>'hdw'
endrec.
tom
name=>'Tom'
age=>50
pay=>0
job=>None
endrec.
enddb.
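The file layout shown above can also be parsed without redirecting **sys.stdin**. The sketch below is one possible alternative, not the loader used in this section: the function name **load_from_lines** is hypothetical, it reads from any iterable of lines, and it uses **ast.literal_eval** in place of **eval**.

```python
import ast
import io

ENDDB = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def load_from_lines(lines):
    """Hypothetical loader: rebuild the database from an iterable of lines."""
    db = {}
    it = (line.rstrip('\n') for line in lines)
    for key in it:                      # each record starts with its key
        if key == ENDDB:
            break
        rec = {}
        for field in it:                # then one name=>value line per field
            if field == ENDREC:
                break
            name, value = field.split(RECSEP, 1)
            rec[name] = ast.literal_eval(value)
        db[key] = rec
    return db

# A shortened sample in the same format, held in memory for the demo.
sample = io.StringIO(
    "bob\nname=>'Bob Smith'\nage=>42\nendrec.\n"
    "tom\nname=>'Tom'\njob=>None\nendrec.\n"
    "enddb.\n"
)
db = load_from_lines(sample)
print(db)
```

Because the function takes any iterable of lines, an open file object can be passed in directly, for example inside a **with open(...)** block.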


### Utility scripts

This program reloads the database:

In [None]:
# %%writefile dump_db_file

# from make_db_file import loadDbase

db = loadDbase()

# for key in db:
#     print(key, '=>\n', db[key])

# print(db['sue']['name'])

# Shelves

In [136]:
from initdata import bob, sue


In [137]:
print(__name__)

__main__
