# Python Appetiser - an Overview
1. Intro Python
2. os.walk
3. Webscraping with BeautifulSoup (with star performer requests)
4. Relational databases
5. Pandas & matplotlib
6. Webserver with Flask
7. Jinja2 templating
8. Log file analysis
9. Python and Machine Learning

In [None]:
from IPython.display import display, IFrame

def iframe(src, width='100%', height=300):
    return IFrame(src, width=str(width), height=str(height), scrolling='no', frameborder='0')

# Intro Python

Python is a very powerful programming language
* designed to create readable code
* a very large ecosystem of libraries covering many domains
* very portable (Windows, UNIX/Linux, Mac OS X, ...)
* mixes well with other languages
* easy to learn
* indentation matters
* two versions:
  - Python 2: still found in legacy code, but end of life since 1 January 2020
  - Python 3: the current version; adoption is almost complete

## Install Python

* Basic installation (from https://www.python.org/downloads/), or
* Available from your OS (e.g. Red Hat, SuSE, Mac OS X)
* Python distributions (which add many packages to standard Python):
    - WinPython,
    - ActivePython,
    - Anaconda,
    - Enthought Canopy,
    - Python(x,y)

## Extend Python
* `pip` ("Pip Installs Packages")
    - standard tool, repository: Python Package Index (PyPI, > 185k packages)
* OS specific
    - `yum` (Red Hat-like Linux systems)
    - `zypper` (SuSE)
    - `apt-get` (Debian-like Linux systems, like Ubuntu)
    - `brew` (Homebrew, Mac OS X)
* Distribution specific, e.g.:
    - `enpkg` (for Enthought Canopy), and
    - `conda` (for Continuum's Anaconda)

## PIP (package manager/installer program)
Many packages and modules are pre-installed with each Python installation. <br/>
Use pip to list, add or remove additional ones.
* `user@system:~$ pip --help` to learn about pip commands and flags

* `user@system:~$ pip list` to list installed packages

* `user@system:~$ pip install pkg` to install a package

Some Python distributions have their own manager/installer.
* e.g. Anaconda uses `conda`

## The Python lifecycle

* **1989**: development of Python starts
* **1991**: first public release (version 0.9.0)
* **1994**: Python version 1.0
* **2000**: Python version 2
* **2008**: Python version 3


At the time of writing, the latest release is 3.7.

Python 2 receives no new features: 2.7 is the final Python 2 release, <br/>
and its support ended on 1 January 2020.

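Python 2 and 3 are incompatible. The biggest differences:

- strings: byte strings (ASCII) vs. Unicode text
- print: statement vs. function
- integer division (`7 / 2`): integer vs. floating point result
- lazy iterators: `range()`, `map()`, `zip()` and `dict.keys()` return lists in Python 2, but iterators or views in Python 3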
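A small sketch of some of these differences as they look in Python 3 (Python 2 would print or compute something else for several of these lines):

```python
# print is a function in Python 3
print("7 / 2 =", 7 / 2)      # true division always gives a float: 3.5
print("7 // 2 =", 7 // 2)    # floor division gives an int for int operands: 3

# str is Unicode text; bytes is a separate type
text = "café"
data = text.encode("utf-8")             # encode the text as UTF-8 bytes
print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 ('é' takes two bytes)

# map() and range() are lazy: values are produced on demand
squares = map(lambda n: n * n, range(4))
print(squares)                # a map object, not a list
print(list(squares))          # [0, 1, 4, 9]
```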

Python was inspired by *ABC*, a programming language developed at the CWI
(Centrum Wiskunde & Informatica) in Amsterdam as a tool to teach students how to program.

Among the developers of *ABC* were Frank van Dijk, Timo Krijnen, Guido van Rossum
and Eddy Boeve.

The main difference between Python and *ABC*: Python was designed for building real applications, not
just code examples or small scripts.

## Creating and executing a Python program
* To execute a python script, type `python` followed by the name of the script

  ```
    user@system:~$ python hello_world.py
  ```
  

* Use an editor like `Sublime Text`, `vim` or `emacs` to create or modify the script

  ```
    user@system:~$ vim hello_world.py
  ```
  

* Python can also be used interactively - like a command shell

  ```
    user@system:~$ python
    Python 3.7.3 (default, Mar 27 2019, 22:11:17) 
    [GCC 7.3.0] :: Anaconda, Inc. on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> print('Hello, World!')
    Hello, World!
    >>> quit()
  ```

## Variables
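Python is dynamically and strongly typed
* no explicit type declaration
* a variable is a name for an object; the assigned object determines the type
* a name can later refer to an object of another type, but values are never silently converted between unrelated types (`"1" + 1` is an error)

Names of variables
* consist of letters, digits and underscores (`_`)
* must start with a letter or an underscore
* are case sensitive
* may contain non-ASCII letters in Python 3, but ASCII names are recommended (PEP 8)
* must not be a reserved word (keyword) such as
    `if`, `for`, `with` and `in`.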
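A short sketch of these rules: the same name can be re-bound to another type, mixing unrelated types raises an error, and the standard library can check whether a name is valid.

```python
import keyword

x = 42
print(type(x))          # <class 'int'>
x = "forty-two"         # the same name now refers to a str object
print(type(x))          # <class 'str'>

# Strong typing: Python does not silently convert between unrelated types
try:
    "1" + 1
except TypeError as err:
    print("TypeError:", err)

# Checking candidate variable names
print("total_sum".isidentifier())   # True
print("2nd_total".isidentifier())   # False: starts with a digit
print(keyword.iskeyword("for"))     # True: a reserved word
```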

## Simple data types
The basic data types are:
- numbers:
  - integer (`int`)
  - real (`float`)
  - complex numbers (`complex`)

- text:  Unicode strings (`str`)
- Boolean:  truth-values (`bool`)

### Numbers: `int`

The numerical type for integer numbers is `int`.

Examples of integer numbers:

In [None]:
42                               # decimal notation 

In [None]:
0b1001                           # binary notation

In [None]:
0xff                              # hexadecimal notation

In [None]:
7137789078178097823478092341789 ** 20
# no effective size limit
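The notations above are only different ways to write the same `int` value. A small sketch of converting between them (the octal prefix `0o` works the same way):

```python
# Different notations produce the same int value
print(0b1001 == 9, 0xff == 255, 0o17 == 15)   # True True True

# Converting between int values and their string representations
print(bin(9), hex(255), oct(15))               # 0b1001 0xff 0o17
print(int("1001", 2), int("ff", 16))           # 9 255

# Integers grow as needed: 2 ** 100 has 31 decimal digits
big = 2 ** 100
print(len(str(big)))                            # 31
```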

### Numbers: `float`

The type for real numbers is `float`.<br/>
Some real numbers:

In [None]:
3.141592653589793            # Pi

In [None]:
6.022140857e23               # Avogadro

In [None]:
1.0 * 10**100                # Googol
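A `float` is a binary approximation of a real number, so some decimal fractions cannot be stored exactly. A small sketch of what that means in practice, and how to compare floats with a tolerance:

```python
import math
import sys

# Floats are binary approximations, so some decimal fractions are inexact
print(0.1 + 0.2)                       # 0.30000000000000004
print(0.1 + 0.2 == 0.3)                # False
print(math.isclose(0.1 + 0.2, 0.3))    # True: compare with a tolerance

# A float has a limited range (IEEE 754 double precision on virtually all platforms)
print(sys.float_info.max)              # the largest representable float
```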

### Text: `str`

A string (type `str`) is a sequence of *Unicode* characters.
- between single (`'`) or double (`"`) quotes (begin and end must match)

In [None]:
"Here's Python!"

In [None]:
'Inspired by "Monty Python"' 

- backslash (`\`) starts an escape code for special characters:

In [None]:
print("characters like \t, \\, \n and \"it costs \u20ac12,-\"")
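An escape code produces a single character, however many backslash characters it takes to write it. A short sketch, which also shows *raw strings* (prefix `r`), where backslashes are kept literally:

```python
euro = "\u20ac"
print(euro, len(euro))          # € 1: one Unicode character
print(len("\t"))                # 1: the escape is a single tab character

raw = r"C:\temp\new"            # raw string: backslashes are kept as they are
print(raw, len(raw))            # C:\temp\new 11
print("C:\temp\new")            # without r: \t becomes a tab, \n a newline
```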

### Boolean: `bool`

The Boolean data type denotes truth values true or false.
- Reserved words: `True` and `False`
- Important for flow control in your program.
- Often the result of logical operations and comparisons

In [None]:
5 <= -10

In [None]:
6 / 4 == 1.5
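Logical operators combine truth values, and in an `if` or `while` condition other objects have a truth value as well: zero, `None` and empty collections count as false. A short sketch:

```python
# Comparisons and logical operators return bool values
print(3 < 5 and 5 < 10)        # True
print(3 < 5 < 10)              # True: a chained comparison, same meaning
print(not (6 / 4 == 1.5))      # False

# The truth value of some non-bool objects
for value in [0, 1, "", "0", [], [0], None]:
    print(repr(value), bool(value))
```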

# Quiz

## Statements

Python is line oriented: each line is one statement.<br/>
Split long statements into multiple lines:
- add a backslash (`\`), just before the line end:
``` python
    moving_average = first_function(observations) / \
                        second_function(observations, 3)
```
- an expression inside parentheses, brackets or braces (`()`, `[]` or `{}`) may continue over multiple lines without a backslash:
``` python
    result = my_function(observations,
                         'Description',
                         (1, 3, 6))
```
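Both continuation styles in a runnable sketch; the price list is made up for illustration:

```python
prices = [1.65, 1.58, 1.71]            # made-up values

# Explicit continuation with a backslash at the end of the line
average = sum(prices) / \
          len(prices)

# Implicit continuation: the open parenthesis keeps the statement going
report = dict(
    lowest=min(prices),
    highest=max(prices),    # a trailing comma is allowed
)
print(round(average, 3), report)
```

Implicit continuation is usually preferred: a stray space after a backslash is a syntax error, while parentheses have no such pitfall.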

### Blocks

Code blocks are determined by indentation.
- Example `if` - statement:
``` python
    if a <= b:
        if a != b:
            print("a is smaller than b")
    else:
        print("a is greater than b")
```
- recommended:
  * use 4 spaces per indentation level

### `if` - statement
``` python
    if condition:
        code block
        
    elif condition:
        code block
        
    else:
        code block
        ....
```
- the `elif`-part can occur zero or more times
- the `else`-part is optional
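A runnable version of the comparison from the Blocks example, written with `elif`. The function name `describe` is just for illustration; unlike the nested example, it also handles `a == b`:

```python
def describe(a, b):
    """Return a sentence comparing a and b."""
    if a < b:
        return "a is smaller than b"
    elif a > b:
        return "a is greater than b"
    else:                    # only reached when neither condition holds
        return "a equals b"

print(describe(2, 5))   # a is smaller than b
print(describe(5, 2))   # a is greater than b
print(describe(3, 3))   # a equals b
```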

### `while` - loop
``` python
    while condition:
        code block
```
As long as `condition` is True, the code block is executed.

It is possible to *jump* out of the loop using a `break`-statement. Example:

In [None]:
n = 1
fac = 1

while n < 100:
    fac = fac * n
    if fac > 10000:
        break
    print(n, fac)
    n = n + 1

### `for`-loop
Do something for all elements in a collection.
``` python
    for element in collection:
        code block
```
The `for`-loop works with all collection types (more generally: with every *iterable*),<br/>
and it works the same way for each of them:


In [None]:
l = [1, 2, 4]                             # list
s = "Hey"                                 # string
d = {"a": 15, "e": 3, "i": 13}            # dict

In [None]:
for e in l:
    print(e)

In [None]:
for e in s: print(e, end=' ')

In [None]:
for e in d: print(e, d[e], end=' ')
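A few built-ins make `for`-loops more convenient: `items()` yields key-value pairs of a dict, `enumerate()` adds a running index and `range()` produces a sequence of integers. A short sketch:

```python
d = {"a": 15, "e": 3, "i": 13}

# items() yields (key, value) pairs, unpacked into two loop variables
for key, value in d.items():
    print(key, value, end=' ')
print()

# enumerate() pairs each element with its index
for i, ch in enumerate("Hey"):
    print(i, ch)

total = 0
for n in range(1, 5):        # 1, 2, 3, 4: the stop value is excluded
    total += n
print(total)                 # 10
```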

### Exceptions
Exceptions (errors) that are not caught cause a program to crash.
``` python
   n_items = 0
   print(5 / n_items)
```
```
   ---------------------------------------------------------------------------
   ZeroDivisionError                         Traceback (most recent call last)
   <ipython-input-20-97156f842340> in <module>()
         1 n_items = 0
   ----> 2 print(5 / n_items)

   ZeroDivisionError: division by zero
```

- Of course, you can be defensive and check everything beforehand...
- but there is a more convenient way...

### Catch exceptions `try: except:`

In [None]:
n_items = 0

try:
    print(5 / n_items)
    print("Calculation is ready")
except ZeroDivisionError:
    print("There were no items")

print("This part is always reached")
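A `try` statement can also have an `else` part (runs only when no exception occurred) and a `finally` part (always runs). A minimal sketch with an illustrative helper `to_int` that falls back to a default value:

```python
def to_int(text, default=None):
    """Convert text to int; return default if that is not possible."""
    try:
        value = int(text)
    except ValueError:           # raised for text like "abc" or "4.5"
        return default
    else:                        # runs only if no exception occurred
        return value
    finally:                     # always runs, with or without an exception
        print("checked", repr(text))

print(to_int("42"))       # prints: checked '42', then 42
print(to_int("abc", 0))   # prints: checked 'abc', then 0
```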

# Quiz

## Container structures
Container types - objects that can contain other objects

| type  | description |          examples              | content           | properties
|:------|:------------|:------------------------------|:------------------|:---------
| str   | String      | `'Ni Hao', "Don't mind"`        | characters        | immutable, ordered
| list  | List        | `[1, 2, 'Peter', 4]`            | objects           | mutable, ordered
| tuple | Tuple       | `(1, 2, 'ABC', 4, 'U')`         | objects           | immutable, ordered
| dict  | Dictionary  | `{'John': 1975, 'Mary': 1979}`  | objects (keys: immutable) | mutable, insertion-ordered (Python 3.7+), key access
| set   | Set         | `{3, 42, 19, 55}`               | immutable (hashable) objects | mutable, unordered
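A quick sketch of what the *properties* column means in practice:

```python
numbers = [1, 2, 3]
numbers.append(4)               # lists are mutable
print(numbers)                  # [1, 2, 3, 4]

point = (1, 2)
try:
    point[0] = 10               # tuples are immutable
except TypeError as err:
    print("TypeError:", err)

unique = {3, 42, 19, 3, 42}     # a set drops duplicates
print(len(unique))              # 3

try:
    {[1, 2]}                    # a list is mutable, so it cannot be a set element
except TypeError as err:
    print("TypeError:", err)
```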


### `list`
- mutable: replace, add or remove elements
- can contain arbitrary objects
- order is maintained
- use an index or a slice to retrieve elements

In [None]:
k = []                     # an empty list
l = ["first", 2, 3]        # 3 elements
m = ["combi", k, l]        # a list with elements from the previous
print(m)

[x * x for x in range(10)]   # a list comprehension: the squares of 0 .. 9

### `list` *indexing* and *slicing*

In [None]:
primes = [2, 3, 5, 7, 11, 13]   # create a list
primes[1]                       # retrieve the 2nd element

In [None]:
primes[0:3]                     # retrieve the elements at index 0, 1 and 2

In [None]:
if 7 in primes:                  # check if 7 is an element of primes
    print('yes!')
else:
    print('No')

In [None]:
pp = primes[:]                  # make a copy of primes
pp.append(17)                   # add 17 at the end of this copy
help(pp.sort)

For lists (and other sequences) Python uses an index that starts at 0 and ends at *n-1*,
where *n* is the number of elements.
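Besides the indices 0 to *n-1*, negative indices count from the end, and a slice can take a step. A short sketch:

```python
primes = [2, 3, 5, 7, 11, 13]
n = len(primes)                  # 6 elements: valid indices 0 .. 5

print(primes[0], primes[n - 1])  # 2 13
print(primes[-1], primes[-2])    # 13 11: negative indices count from the end
print(primes[1:4])               # [3, 5, 7]: start included, stop excluded
print(primes[::2])               # [2, 5, 11]: every second element
print(primes[::-1])              # a reversed copy

try:
    primes[n]                    # one past the end
except IndexError as err:
    print("IndexError:", err)
```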

#### list methods:
| method       | signature            | returns| description |
|--------------|----------------------|--------|-------------|
| **append()** | `L.append(object)`   | `None` | add an object at the end |
| **clear()**  | `L.clear()`          | `None` | remove all items from L |
| **copy()**   | `L.copy()`           | `list` | a shallow copy of L |
| **count()**  | `L.count(value)`     | `int`  | returns the number of occurrences of value in L |
| **extend()** | `L.extend(iterable)` | `None` | extend the list with the elements in iterable |
| **index()**  | `L.index(value, [start, [stop]])` | `int` | returns the index of the first occurrence of value. ValueError if not found. |
| **insert()** | `L.insert(index, object)` | `None` | insert object before position index |
| **pop()**    | `L.pop(index=-1)`    | `item` | remove and return the item at index (default: the last one). IndexError if the list is empty or the index is out of range. |
| **remove()** | `L.remove(value)`    | `None` | remove the first occurrence of value. ValueError if value not found. |
| **sort()**   | `L.sort(key=None, reverse=False)` | `None` | sort the list in place |
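A sketch of a few of these methods, including two common surprises: `append()` versus `extend()`, and `sort()` returning `None`:

```python
L = [3, 1, 2]
L.append([4, 5])      # adds ONE element: the list [4, 5]
print(L)              # [3, 1, 2, [4, 5]]
L.pop()               # removes and returns the last element, [4, 5]
L.extend([4, 5])      # adds each element of the iterable
print(L)              # [3, 1, 2, 4, 5]

L.insert(0, 9)        # insert 9 before index 0
L.remove(2)           # remove the first occurrence of the value 2
print(L)              # [9, 3, 1, 4, 5]

result = L.sort()     # sorts the list in place ...
print(result, L)      # None [1, 3, 4, 5, 9]  ... and returns None
print(sorted([3, 1, 2], reverse=True))   # sorted() returns a new list: [3, 2, 1]
```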

### `dict`
- mutable: replace, add or remove elements
- can contain arbitrary objects
- each element is stored with an associated key
- key must be unique and immutable
- a single element can be retrieved with the key

In [None]:
d = { 1: 3, "two": 6, 3: 0}    # a dict with keys 1, "two" and 3
d['googol'] = 10.0 ** 100       # add a big value with key "googol"
d["two"]                       # retrieve the value from key "two"
d

In [None]:
6 in d                         # False: `in` tests keys, and 6 is only a value

In [None]:
del d[3]                       # delete the element with key 3
d

#### Dict methods:

| method           | signature      | returns   | description |
|------------------|----------------|-----------|-------------|
| **clear()**      | `D.clear()`    | `None`    | empty `D`   |
| **copy()**       | `D.copy()`     | dict      | make a shallow copy of `D` | 
| **fromkeys()**   | `dict.fromkeys(iterable, value=None)` | dict | make a new dict with keys from iterable and values equal to value. | 
| **get()**        | `D.get(k[,d])` | `v` | `D[k]` if `k` in `D`, else `d`. Default value of `d` is `None` | 
| **items()**      | `D.items()`    | `{(k,v),...}` | return a set-like view of `D`'s key-value pairs | 
| **keys()**       | `D.keys()`     | `{k,...}` | return a set-like view of `D`'s keys | 
| **pop()**        | `D.pop(k[,d])` | `v` | remove the indicated key and return its associated value. If `k` is not found, `d` is returned; KeyError if `d` is not provided either. | 
| **popitem()**    | `D.popitem()`  | `(k, v)`  | remove and return a key-value pair as a 2-tuple (the last inserted one in Python 3.7+); KeyError if `D` is empty. | 
| **setdefault()** | `D.setdefault(k[,d])` | `v`| `D.get(k,d)`, and also set `D[k]=d` if `k` not in `D` | 
| **update()**     | `D.update([E,] **F)` | `None`  | update `D` from dict/iterable `E` and `F`: if `E` has a `keys()` method, then `for k in E: D[k] = E[k]`; otherwise `for k, v in E: D[k] = v`. In either case, this is followed by `for k in F: D[k] = F[k]` | 
| **values()**     | `D.values()`       | `[v,...]` | return a view of `D`'s values | 
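A sketch of `get()`, `setdefault()`, `update()` and `items()` in typical use, counting and grouping the words of a made-up sentence:

```python
words = "the cat saw the other cat".split()

# get() with a default avoids a KeyError for missing keys
counts = {}
for w in words:
    counts[w] = counts.get(w, 0) + 1
print(counts)     # {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}

# setdefault() inserts a default value only when the key is missing
by_letter = {}
for w in words:
    by_letter.setdefault(w[0], []).append(w)
print(by_letter)  # {'t': ['the', 'the'], 'c': ['cat', 'cat'], 's': ['saw'], 'o': ['other']}

# update() merges another dict; values of existing keys are overwritten
counts.update({"dog": 1, "cat": 5})
for key, value in counts.items():
    print(key, value)
```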

# Quiz

## File I/O
### Opening a file
``` python
    f = open(fname, mode)
```
open modes:

| Character | Description | Remark
|-----------|-------------|-------------------------------------------------
| '`r`'     | read        | (default) error if the file does not exist
| '`w`'     | write       | truncate the file; create a new one if necessary
| '`x`'     | create      | error if the file exists
| '`a`'     | append      | create a new file if necessary

These can be combined with:

| Character | Description | Remark
|-----------|-------------|-------------------------------------------------
| '`t`'     | text mode   | (default) reads and writes strings (`"..."`)
| '`b`'     | binary mode | reads and writes `bytes` objects (`b"..."`)
| '`+`'     | updating    | reading *and* writing
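A sketch of the most common modes on a throw-away file in a temporary directory (so nothing on disk is touched permanently):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")   # a throw-away file

    with open(path, "w") as f:      # 'w': create or truncate
        f.write("first line\n")
    with open(path, "a") as f:      # 'a': append at the end
        f.write("second line\n")

    try:
        open(path, "x")             # 'x': fails because the file exists
    except FileExistsError:
        print("'x' refused to overwrite", os.path.basename(path))

    with open(path, "r") as f:      # 'r' (same as 'rt'): text mode, returns str
        text = f.read()
    with open(path, "rb") as f:     # 'rb': binary mode, returns bytes
        raw = f.read()              # on Windows, text mode wrote '\r\n' line endings

print(repr(text))   # 'first line\nsecond line\n'
print(type(raw))    # <class 'bytes'>
```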


### Reading text files

In [None]:
import sys

fname = "../data/story.txt"

try:
    f = open(fname, "rt")              # open() can fail, so ...
except OSError:                        # ... always protect it with try: except:
    sys.exit("Cannot open " + fname)   # exit with an error message

freq = {}
for line in f:                         # a text file is like a sequence of lines
    words = line.split()               # split() returns the words in the line
    for word in words:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1

f.close()                              # don't forget to close the file
for word in sorted(freq, key=freq.get, reverse=True):   # most frequent first
    print(word, freq[word])
help(sorted)
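The same word count, sketched with a `with` block (which closes the file automatically, even when an exception occurs) and `collections.Counter` (also used in the log file analysis). The helper name `word_frequencies` and the sample text are illustrative:

```python
import os
import tempfile
from collections import Counter

def word_frequencies(fname):
    """Count the words in a text file; the with-block closes the file."""
    counts = Counter()
    with open(fname, "rt", encoding="utf-8") as f:
        for line in f:                     # iterate line by line
            counts.update(line.split())    # add the words of this line
    return counts

# Try it on a small, throw-away file
with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, "story.txt")
    with open(fname, "wt", encoding="utf-8") as f:
        f.write("the cat saw\nthe other cat\nthe end\n")
    freq = word_frequencies(fname)

for word, n in freq.most_common(2):
    print(word, n)   # the 3, then cat 2
```

Note that `open()` inside `with` can still raise `OSError`; wrap the call in `try: except:` when the file may be missing.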

### Writing text files

File objects have a low-level `write()` method, but it is often easier to use `print()` with its `file` argument.

In [None]:
fname = "../tmp/two_powers.txt"
try:
    f = open(fname, "wt")
except OSError:
    sys.exit("Cannot create " + fname)

for n in range(10):
    print(n, ":", 2**n, file=f)       # writes e.g. "3 : 8" followed by a newline

f.close()
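The difference between `write()` and `print(..., file=...)`, sketched with an in-memory `io.StringIO` stream standing in for a real file:

```python
import io

buf = io.StringIO()                  # behaves like a file opened for writing text

buf.write("0 : 1\n")                 # write() takes one str and adds no newline
print(1, ":", 2 ** 1, file=buf)      # print() converts values, joins them with ' ', adds '\n'

print(repr(buf.getvalue()))          # '0 : 1\n1 : 2\n'
```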

## Functions
Defining a function:


In [None]:
def times(x, y):
    result = x * y
    return result

Using a function:

In [None]:
r = times(3, 5)
r

### Function arguments

Arguments can be passed to a function in two ways:
- positional arguments
- keyword arguments

``` python
    def line_to(x_pos, y_pos):
        code block ...
```
Various ways to call:
``` python
    line_to(160, 5)                # positional arguments
    line_to(x_pos=160, y_pos=5)    # explicitly name (keyword) arguments
    line_to(y_pos=5, x_pos=160)    # keyword arguments may be given in any order
```
Use good, descriptive names!   
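A runnable sketch of these calls. This `line_to` is a hypothetical stand-in that just returns the target point, so the effect of each call style is visible:

```python
def line_to(x_pos, y_pos):
    """Hypothetical stand-in: return the target point instead of drawing."""
    return (x_pos, y_pos)

print(line_to(160, 5))                 # (160, 5)
print(line_to(x_pos=160, y_pos=5))     # (160, 5)
print(line_to(y_pos=5, x_pos=160))     # (160, 5)
print(line_to(160, y_pos=5))           # (160, 5): positional first, then keyword

try:
    line_to(160, x_pos=5)              # x_pos would receive two values
except TypeError as err:
    print("TypeError:", err)
```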

### Default function arguments
In the definition of a function you can indicate default values for the arguments.
``` python
    def line_to(x_pos=0, y_pos=0):
        code block ...
```

Now you can call this function in various ways:
``` python
    line_to()             # x_pos=0, y_pos=0
    line_to(100)          # x_pos=100, y_pos=0
    line_to(100, 50)      # x_pos=100, y_pos=50
```

And with keyword arguments:
``` python
    line_to(x_pos=100)    # x_pos=100, y_pos=0
    line_to(y_pos=50)     # x_pos=0, y_pos=50
```
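One caveat with default values: a default is evaluated once, when `def` runs, so a mutable default such as a list is shared between calls. A sketch of the problem and the usual `None` idiom (the function names are illustrative):

```python
def add_item_wrong(item, basket=[]):   # the default list is created once, at def time
    basket.append(item)
    return basket

def add_item(item, basket=None):       # None means "no basket given"
    if basket is None:
        basket = []                    # a fresh list on every call
    basket.append(item)
    return basket

print(add_item_wrong("apple"), add_item_wrong("pear"))  # both show ['apple', 'pear']
print(add_item("apple"), add_item("pear"))              # ['apple'] ['pear']
```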

## Modules

Modules allow for:
- re-usage of functionality
- ease of maintenance

and usually contain:
- function definitions
- variables (often used as constants)
- class definitions
- *(can also contain runnable code)*

A module is a file whose name is the module name with the suffix `.py` (module `mymod` lives in `mymod.py`).

Example `mymod.py`:
``` python
    def times(n, arg):
        return n * str(arg)
```

### Using functions from a module
#### the `import` statement
``` python
    import mymod
    
    mymod.times(3, ['alpha', 'beta'])
```

#### alternative: the `from` ... `import` statement
``` python
    from mymod import times
    
    times(3, ['alpha', 'beta'])
```
In this case only the selected name(s) become directly available; the module itself is still loaded completely.
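The same import forms applied to the standard `math` module, plus the `import ... as` alias form that this notebook uses for `pandas` and `matplotlib.pyplot`:

```python
import math                 # access names as math.sqrt, math.pi, ...
from math import sqrt       # only sqrt is added to this namespace
import math as m            # an alias for the same module

print(math.sqrt(16), sqrt(16), m.sqrt(16))   # 4.0 4.0 4.0
print(m is math)                              # True: the same module object
```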

### How are modules found?

Python looks for a module in various locations:
- the directory where the top-level program lives
- each directory in the environment variable PYTHONPATH
- the default path (defined during the installation of Python)

On a Windows system you set PYTHONPATH like this:
``` shell
set PYTHONPATH=C:\python36\lib;C:\Users\Joop\PythonLibs
```
On UNIX / Linux this is:
``` shell
export PYTHONPATH=/usr/local/lib/python:/home/joop/pythonlibs
```
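Inside a running program, the search locations are available as the list `sys.path`. A sketch that writes a small module (the name `mymod_demo` is hypothetical) into a temporary directory, puts that directory on the search path and imports it:

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as libdir:
    # Write a small module file, similar to mymod.py
    with open(os.path.join(libdir, "mymod_demo.py"), "w") as f:
        f.write("def times(n, arg):\n    return n * str(arg)\n")

    sys.path.insert(0, libdir)          # similar in effect to adding libdir to PYTHONPATH
    importlib.invalidate_caches()       # let the import system notice the new file
    try:
        mymod_demo = importlib.import_module("mymod_demo")
        result = mymod_demo.times(2, "ab")
    finally:
        sys.path.remove(libdir)         # restore the search path

print(result)                            # abab
print(sys.path[:3])                      # the first few search locations
```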

# Quiz

# os.walk
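Traverses a directory tree and, for each directory (the top directory included), yields a 3-tuple:

| 1: `dirpath`              | 2: `dirnames`                      | 3: `filenames`
|---------------------------|------------------------------------|------------------------------------------
| the path of the directory | a list with its subdirectory names | a list with the names of its other files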

In [None]:
import os
from os.path import isdir, join, getsize

def dir_sizes(topdir):
    for thisdir, subdirs, nondirs in os.walk(topdir):
        total = 0

        for name in nondirs:
            path = join(thisdir, name)
            try:
                total += getsize(path)
            except OSError:
                pass

        if total:
            print('{:10} bytes used in {}'.format(total, thisdir))

In [None]:
topdir = '/usr/lib'
dir_sizes(topdir)
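A variant of `dir_sizes` that returns its results instead of printing them, which makes it easy to check on a small tree built in a temporary directory. The name `dir_size_map` is illustrative:

```python
import os
import tempfile
from os.path import getsize, join

def dir_size_map(topdir):
    """Return {directory: bytes in its own files} for every directory under topdir."""
    sizes = {}
    for thisdir, subdirs, nondirs in os.walk(topdir):   # subdirs is not needed here
        total = 0
        for name in nondirs:
            try:
                total += getsize(join(thisdir, name))
            except OSError:          # file vanished or is unreadable: skip it
                pass
        sizes[thisdir] = total
    return sizes

# Build a tiny tree: top/a.txt (5 bytes) and top/sub/b.txt (3 bytes)
with tempfile.TemporaryDirectory() as top:
    os.mkdir(join(top, "sub"))
    with open(join(top, "a.txt"), "wb") as f:
        f.write(b"hello")
    with open(join(top, "sub", "b.txt"), "wb") as f:
        f.write(b"abc")
    by_name = {os.path.relpath(d, top): n for d, n in dir_size_map(top).items()}

print(by_name)   # {'.': 5, 'sub': 3}
```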

# Webscraping with BeautifulSoup
This is the page providing the info (and much more...)

In [None]:
iframe("https://www.nu.nl/brandstof", height=500)

## The awesome `requests` module
An elegant and simple HTTP library
allowing you to send HTTP/1.1 requests.<br/>
Add headers, form data, multipart files, and parameters with simple Python dictionaries.

In [None]:
import requests

html = requests.get("https://www.nu.nl/brandstof", timeout=10).text   # the raw HTML as a str
print(html[:1000])

## Determine what to look for...
Define a generator function that *scrapes* the daily average price per liter for fuel.

In [None]:
import bs4
import requests

def scrape():
    fuels = ('Euro95', 'Diesel', 'LPG')
    response = requests.get("https://www.nu.nl/brandstof", timeout=10)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    # all table cells whose text is exactly 'GLA*'
    targetcells = soup.find_all('td', string='GLA*')

    # zip() pairs each fuel with a cell and stops after the three fuels
    for fuel, td in zip(fuels, targetcells):
        pricecell = td.parent.contents[3]               # 4th child node of the table row
        yield fuel, pricecell.text.replace(',', '.')    # decimal comma -> decimal point

# TEST CODE
for fueltype, price in scrape():
    print(fueltype, price)
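`scrape()` is a *generator*: each `yield` hands out one value, and the function body only runs when the caller asks for the next value. A sketch of that behaviour without network access, using a hypothetical generator `fuel_prices` over made-up rows:

```python
def fuel_prices(rows):
    """Yield (fuel, price) pairs one at a time from (name, text) rows."""
    for name, text in rows:
        print("producing", name)            # shows when each value is computed
        yield name, float(text.replace(",", "."))

rows = [("Euro95", "1,899"), ("Diesel", "1,559"), ("LPG", "0,789")]   # made-up values
gen = fuel_prices(rows)       # nothing runs yet
print(next(gen))              # producing Euro95, then ('Euro95', 1.899)
for fuel, price in gen:       # the for-loop asks for the remaining values
    print(fuel, price)
```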
    

# Relational databases 
## SQLite3
Define a function that stores the results in an SQLite embedded database.

In [None]:
import sqlite3

def store(db, fuel, price):
    curs = db.cursor()
    try:
        curs.execute('INSERT INTO prices (fuel, price) VALUES (?, ?)', (fuel, price))
        db.commit()
    finally:
        curs.close() 
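`store()` assumes that a `prices` table already exists; its definition is not shown in this notebook. A sketch of a *hypothetical* schema with the columns used later on (`day`, `fuel`, `price`), tried out on an in-memory database. The helper `store_row` is illustrative and uses `db.execute()` instead of an explicit cursor:

```python
import sqlite3

# Hypothetical schema: the notebook's real table definition is not shown
SCHEMA = """
CREATE TABLE IF NOT EXISTS prices (
    day   DATE DEFAULT CURRENT_DATE,   -- filled in by SQLite (UTC date)
    fuel  TEXT NOT NULL,
    price REAL NOT NULL
)
"""

def store_row(db, fuel, price):
    """Insert one row; the ? placeholders let sqlite3 pass the values safely."""
    db.execute('INSERT INTO prices (fuel, price) VALUES (?, ?)', (fuel, price))
    db.commit()

db = sqlite3.connect(":memory:")        # a throw-away in-memory database
db.execute(SCHEMA)
store_row(db, "Euro95", "1.899")        # REAL column: the text is stored as a number
rows = db.execute("SELECT day, fuel, price FROM prices").fetchall()
db.close()
print(rows)                              # [(today's date as 'YYYY-MM-DD', 'Euro95', 1.899)]
```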

## ... and action!
Loop through today's results (*Euro95*, *Diesel* and *LPG*) and store the results in the database.

In [None]:
with sqlite3.connect('../data/fuelprices.db') as db:   # commits on success; does not close db
    for fuel, price in scrape():
        print(fuel, price)
        store(db, fuel, price)

### Show what's in the database:

In [None]:
with sqlite3.connect('../data/fuelprices.db') as db:
    curs = db.cursor()
    curs.execute('SELECT * FROM prices;')
    for n, record in enumerate(curs):
        if n > 100000: break
        print(*record, sep=' | ')
    curs.close()

# Pandas & matplotlib

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# df = pd.read_sql('SELECT * FROM prices;', con='sqlite:///../data/fuelprices.db', index_col='day')

df = pd.read_sql_table('prices', con='sqlite:///../data/fuelprices.db', index_col='day')
df.tail(7)

In [None]:
pd.__version__

In [None]:
for kind in ['Euro95', 'Diesel', 'LPG']:
    df[df.fuel == kind].plot(title=kind)
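The loop above draws one chart per fuel. Pivoting the "long" table (one row per day and fuel) into a "wide" one (one column per fuel) puts all fuels in one chart. A sketch on a small frame with made-up numbers, assuming the same `day`/`fuel`/`price` layout:

```python
import pandas as pd

# Made-up prices in the same long layout: one row per day and fuel
sample = pd.DataFrame({
    "day":   ["2019-05-01"] * 3 + ["2019-05-02"] * 3,
    "fuel":  ["Euro95", "Diesel", "LPG"] * 2,
    "price": [1.899, 1.559, 0.789, 1.905, 1.561, 0.790],
}).set_index("day")

# One row per day (the existing index), one column per fuel
wide = sample.pivot(columns="fuel", values="price")
print(wide)
# wide.plot(title="Fuel prices")   # with matplotlib: all fuels in one chart
```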

# Webserver with Flask

In [None]:
from flask import Flask, request, Response, g, render_template
import logging

app = Flask(__name__)
app.logger.setLevel(logging.WARN)

@app.route("/hello") 
def hello():
    return "Hello World!"

In [None]:
import threading
from werkzeug.serving import make_server

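class FlaskServer(threading.Thread):
    """Run a Flask app in a background thread so the notebook stays usable."""

    def __init__(self, app, host, port):
        super().__init__(daemon=True)   # don't keep Python alive at exit
        self.srv = make_server(host=host, port=port, app=app)

    def run(self):
        self.srv.serve_forever()

    def shutdown(self):
        self.srv.shutdown()       # stop serve_forever(); call from another thread while it runs
        self.srv.server_close()   # release the listening socket

    @classmethod
    def startup(cls, app, host='localhost', port=5000):
        server = cls(app, host, port)
        server.start()
        return server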

In [None]:
try:
    server = FlaskServer.startup(app)
except OSError:
    print("Server (probably) already started")

In [None]:
iframe("http://localhost:5000/hello", height=160)

# Templating with Jinja2
Add a function to our app that uses **Jinja2** to render server variables.

In [None]:
@app.route('/vars/') 
def show_server_vars():
    env = request.environ
    return render_template('vars.html',
                           vars=[(k, env[k]) for k in sorted(env)])

In [None]:
def print_file(fname):
    with open(fname) as f: print(f.read())

print_file('templates/vars.html')

In [None]:
iframe("http://localhost:5000/vars/", height=600, width=1400)
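`render_template()` loads a Jinja2 template file and fills it with the given variables. The same mechanism works without Flask. A sketch with a *hypothetical* template in the spirit of `vars.html` (the real file is printed above, not reproduced here):

```python
from jinja2 import Template

# Hypothetical template: a table row per (name, value) pair
template = Template(
    "<table>\n"
    "{% for name, value in vars %}"
    "<tr><td>{{ name }}</td><td>{{ value }}</td></tr>\n"
    "{% endfor %}"
    "</table>"
)

html = template.render(vars=[("REQUEST_METHOD", "GET"), ("PATH_INFO", "/vars/")])
print(html)
```

Note that Flask enables autoescaping for `.html` templates, while a bare `Template` like this one does not escape values by default.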

## Now let's make a web page that renders the fuel price info
Some functions to connect to our SQLite database:

In [None]:
import os
import sqlite3


def get_row_as_dict(row, curs):
    return {col[0]:row[idx] for idx, col in enumerate(curs.description)}

def do_query(query, bindings=(), result_set_name='result set'):
    """
    Returns a single key dict, the value being a list of dicts. 
    Each dict in the list represents a row of database data.
    """
    db = get_db()
    curs = db.execute(query, bindings)    
    return { result_set_name: [ get_row_as_dict(row, curs) for row in curs ] }

def connect_db():
    rv = sqlite3.connect(os.path.join(app.root_path, '..', 'data', 'fuelprices.db'))
    rv.row_factory = sqlite3.Row
    return rv

def get_db():
    if not hasattr(g, 'sqlite_db'):
        g.sqlite_db = connect_db()
    return g.sqlite_db

In [None]:
@app.route('/')
def fuels():
    result = do_query('SELECT * FROM prices', result_set_name='fuels')
    x = {}                                   # day -> {'Datum': day, fuel: price, ...}
    for row in result['fuels']:
        day = row['day']
        # one dict per day; every price is formatted the same way
        x.setdefault(day, {'Datum': day})[row['fuel']] = '{:.3f}'.format(row['price'])
    return render_template('fuels.html',
                           fuels=[x[day] for day in sorted(x, reverse=True)])

In [None]:
print_file('templates/fuels.html')

In [None]:
iframe("http://localhost:5000/", height=500)

In [None]:
@app.route('/greeting/', methods=['GET', 'POST'])
def greeting():
    if request.method == 'POST':
        return render_template('greeting_post.html', name=request.form['name'])
    else:
        return render_template('greeting_get.html')

In [None]:
print_file('templates/greeting_get.html')

In [None]:
print_file('templates/greeting_post.html')

In [None]:
iframe("http://localhost:5000/greeting/", height=240)

In [None]:
from functools import wraps

def check_auth(username, password):
    """Checks for a valid username / password combination.
    Demo only: real applications store password hashes, not plain text.
    """
    pw_store = {'at': 'geheim', 'guru1': 'ookgeheim'}
    return username in pw_store and password == pw_store[username]

def authenticate():
    """Sends a 401 response that enables basic auth"""
    return Response(
        'Could not verify your access level for that URL.\n'
        'You have to login with proper credentials', 401,
        {'WWW-Authenticate': 'Basic realm="Login Required"'})

def requires_auth(f):
    @wraps(f)
    def decorated(*args, **kwargs):
        auth = request.authorization
        if auth:
            app.logger.info('[' + auth.username + '] ' + request.url)
            if check_auth(auth.username, auth.password):
                return f(*args, **kwargs)
        return authenticate()
     
    return decorated
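`requires_auth` is a *decorator*: a function that takes a function and returns a wrapped version of it. The same pattern without Flask, with a hypothetical check in place of `check_auth()`:

```python
from functools import wraps

def requires_positive(f):
    """Hypothetical decorator: only call f when its first argument is positive."""
    @wraps(f)                      # copy f's name and docstring to the wrapper
    def decorated(*args, **kwargs):
        if args and args[0] > 0:   # the check, like check_auth()
            return f(*args, **kwargs)
        return "refused"           # like authenticate(): answer without calling f
    return decorated

@requires_positive                 # same as: square = requires_positive(square)
def square(x):
    """Return x squared."""
    return x * x

print(square(4), square(-4))       # 16 refused
print(square.__name__)             # square (thanks to @wraps)
```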

In [None]:
@app.route('/secret/')
@requires_auth
def secret():
    return render_template('secret.html')

In [None]:
iframe("http://localhost:5000/secret/")

In [None]:
@app.teardown_appcontext
def close_db(error):
    if hasattr(g, 'sqlite_db'):
        g.sqlite_db.close()

In [None]:
# server.shutdown()

# Log file analysis

In [None]:
import re
from subprocess import Popen, PIPE
from collections import Counter

def get_log_freqs(log_application='journalctl'):
    counter = Counter()

    # Apr 13 14:46:23 adnovo.local kernel: BIOS-e820: [mem 0x0000000000000000- ....
    # the day of the month may be padded with a space ("Apr  3"), hence ' +\d{1,2}'
    c_re = re.compile(r'\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \S* (\S*): .*')

    with Popen(log_application, shell=True, stdout=PIPE, universal_newlines=True) as log:
        for line in log.stdout:
            match = c_re.search(line)
            if match:
                who = match.group(1)
                who = who.split('[')[0]
                counter[who] += 1
        log.stdout.close()
    return counter
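The parsing step can be tested without `journalctl` by feeding it a list of lines. A sketch with hypothetical sample lines; the helper `count_sources` is illustrative and its pattern accepts one- or two-digit (space-padded) days:

```python
import re
from collections import Counter

LINE_RE = re.compile(r'\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \S* (\S*): .*')

def count_sources(lines):
    """Count log lines per program name, dropping a trailing [pid]."""
    counter = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counter[match.group(1).split('[')[0]] += 1
    return counter

sample = [                                               # hypothetical log lines
    "Apr 13 14:46:23 myhost kernel: BIOS-e820: usable",
    "Apr  3 09:01:02 myhost sshd[1234]: Accepted publickey",
    "Apr 13 14:47:00 myhost sshd[1240]: Connection closed",
    "-- Logs begin at Sat 2019-04-13 --",                 # no match: ignored
]
print(count_sources(sample))   # Counter({'sshd': 2, 'kernel': 1})
```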

In [None]:
import os

def report_freqs(frequencies, key=None, reverse=False, number=None, width=None):
    if not frequencies:
        print('Nothing to report')
        return
    total = sum(frequencies.values())
    maxratio = max(frequencies.values()) / total
    
    maxval_w = max(map(len, map(str, frequencies.values())))
    maxkey_w = max(map(len, map(str, frequencies.keys())))

    if os.isatty(1):
        columns, rows = os.get_terminal_size()
    else:
        columns, rows = 80, 24

    if number is None:
        number = len(frequencies)
    if width:
        columns = width

    barspace = columns - maxkey_w - maxval_w - 13

    for i, k in enumerate(sorted(frequencies, key=key, reverse=reverse)):
        if i >= number:
            break
        ratio = frequencies[k] / total
        bar_w = round(ratio / maxratio * barspace)
        print('{who:{w1}}: {freq:{w2}} ({perc:5.2f}%) {bar}'.format(who=k, freq=frequencies[k], perc=100 * ratio,
                                                                    bar=bar_w * '\u25a0', w1=maxkey_w, w2=maxval_w))


In [None]:
freqs = get_log_freqs('journalctl')
print(freqs)

In [None]:
report_freqs(freqs, key=freqs.get, reverse=True, number=20)

# Python and Machine Learning

In [None]:
import seaborn as sns
iris = sns.load_dataset("iris")
iris

In [None]:
iris.describe()

In [None]:
sns.jointplot(x="sepal_length", y="petal_length", data=iris);

In [None]:
sns.pairplot(data=iris, hue="species");

In [None]:
# We're using all four measurements as inputs
# Note that scikit-learn expects each entry to be a list of values, e.g.,
# [ [val1, val2, val3],
#   [val1, val2, val3],
#   ... ]
# such that our input data set is represented as a list of lists

# We can extract the data in this format from pandas like this:
all_inputs = iris[['sepal_length', 'sepal_width',
                   'petal_length', 'petal_width']].values

# Similarly, we can extract the class labels
all_labels = iris['species'].values

# Make sure that you don't mix up the order of the entries
# all_inputs[5] inputs should correspond to the class in all_labels[5]

# Here's what a subset of our inputs looks like:
all_labels[:5], all_inputs[:5]

In [None]:
from sklearn.model_selection import train_test_split

training_inputs, testing_inputs, training_classes, testing_classes = train_test_split(
    all_inputs, all_labels, test_size=0.25, random_state=1)

In [None]:
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis()
qda.fit(training_inputs, training_classes)
qda.score(testing_inputs, testing_classes)
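For classifiers, `score()` returns the mean accuracy: the fraction of test samples whose predicted class equals the true class. A sketch that checks this by hand, using the copy of the iris data bundled with scikit-learn (`load_iris`) instead of `sns.load_dataset`, which downloads the data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)        # 150 samples, 4 measurements each
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

qda = QuadraticDiscriminantAnalysis().fit(X_train, y_train)

# Accuracy by hand: the fraction of correct predictions
predicted = qda.predict(X_test)
manual_accuracy = np.mean(predicted == y_test)
print(manual_accuracy, qda.score(X_test, y_test))   # the same number twice
```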