## `lab5`—Loops and Function Logic

❖ Objectives

-   Write more complicated logic in a function.
-   Use loops to carry out repetitive tasks.
-   Load and save data to and from disk.

### Data Permutation

Let us do some more work with functions to get used to their applications.

Imagine performing more complicated tasks like permuting a set of variables (which is useful in tensor algebra, for instance<sup>[[MathWorld](http://mathworld.wolfram.com/Tensor.html)]</sup>):

<img src="./img/permute.png" width="50%;"/>

-   Write a function `permute` which carries out this operation.  (It really doesn't need to be more complicated than the `swap` function above, just with more variables.)

In [None]:
# define your function here
def permute(i,j,k,l,m):
    # insert your code here
    tmp = m
    m = l
    l = k
    k = j
    j = i
    i = tmp
    return i,j,k,l,m    #do not change this line

<div class="alert alert-warning">
Many code stubs in this lab contain a `pass` statement.  This is a placeholder which does nothing, and you should replace it with either a `return` statement or another block of appropriate code.  Notebooks that you submit should never leave `pass` statements in.
</div>

In [None]:
# it should pass this test---do NOT edit this cell
i,j,k,l,m = 1,2,4,8,16
assert (m,i,j,k,l) == permute(i,j,k,l,m)
print('Success!')

### Loops

Python supports two kinds of loops, `while` and `for`, but the loop *par excellence* is `for`.  `for` loops show up in solving a wide variety of engineering problems, ranging from structural analysis to equations of motion for molecules or satellites.

As you have seen in the lecture, `for` yields each element of an existing `list` of data in turn.

In [None]:
colors = ['teal', 'mauve', 'taupe', 'ecru']
for color in colors:
    print('a can of %s paint'%color)

In [None]:
print('index\trunning total')
counter_sum = 0
for counter in range(20):
    counter_sum += counter
    print('%5d %15d'%(counter, counter_sum))     # incidentally, note the use of a specified field width to right-align the numbers

In [None]:
for letter in 'string':
    print(letter)

Note that the loop variable `counter` from the `range(20)` example continues to exist after the loop, containing its final assigned value of 19.
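
To see this persistence directly, here is a minimal sketch that repeats the running-total loop without the printing and then inspects the variables afterwards.

```python
# Sum the integers 0..19, as in the running-total cell above.
counter_sum = 0
for counter in range(20):
    counter_sum += counter

# The loop variable is still defined here and holds the last value it was assigned.
print(counter)      # 19
print(counter_sum)  # 190
```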

-   Write a function `fragment` which accepts a string `string`.  `fragment` should return the string converted into a list of letters.  That is,
        
        fragment('capsaicin') == ['c', 'a', 'p', 's', 'a', 'i', 'c', 'i', 'n']
    
    A `for` loop and the `list` `append` method may be useful in accomplishing this.
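
As a reminder of how `append` works inside a loop, here is a small sketch on a different task: collecting the length of each word in a list.  The names `words` and `lengths` are chosen only for this illustration.

```python
words = ['teal', 'mauve', 'taupe', 'ecru']
lengths = []                    # start from an empty list
for word in words:
    lengths.append(len(word))   # append adds one item to the end of the list
print(lengths)                  # [4, 5, 5, 4]
```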

In [None]:
# define your function here
def fragment('''(delete this string and replace it with the incoming variables)'''):
    return_list = []
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
# it should pass this test---do NOT edit this cell
# test for specified case
assert fragment('salamander') == ['s', 'a', 'l', 'a', 'm', 'a', 'n', 'd', 'e', 'r']

###  Application I:  Counting String Components

A `for` loop printing out the alphabet could look like this (using a *string constant* `ascii_lowercase`<sup>[[`string`](https://docs.python.org/2/library/string.html#string-constants)]</sup>):

In [None]:
from string import ascii_lowercase as alphabet  # a useful predefined string 'abcde...'
print(alphabet)

for letter in alphabet:
    print('letter:\t%s'%letter)

-   Write a function `count_letters` which accepts a string `text`.  `count_letters` should contain a loop over the alphabet and count the number of times each letter appears in the string `text` using the string method `count`.  The string `text` should be converted to lower-case.  This function should `print` the results, *not* `return` them.

In [None]:
# a refresher on how to use string.count
"'Impossible' n'est pas français. (Napoléon Bonaparte)".count('n')

In [None]:
# another refresher---note that these are different!
"'Impossible' n'est pas français. (Napoléon Bonaparte)".lower().count('n')

In [None]:
# define your function here
def count_letters('''(delete this string and replace it with the incoming variables)'''):
    # write a loop statement here
        # calculate the count here
        print('letter:\t%s @ %s'%(letter, count))

In [None]:
# run test
count_letters('To thy Happy Children of the Future: Those of the Past Send Greetings')

Let's test `count_letters` on a couple of strings.  The `true_pangram` string is correct, and you should double-check your results closely.  Case matters!

In [None]:
true_pangram = """
This pangram contains four a’s, one b, two c’s, one d, thirty e’s, six f’s, five g’s, seven h’s, eleven i’s, one j,
one k, two l’s, two m’s, eighteen n’s, fifteen o’s, two p’s, one q, five r’s, twenty-seven s’s, eighteen t’s, two u’s,
seven v’s, eight w’s, two x’s, three y’s, & one z.
""".strip()

# you can edit this cell to use count_letters

In [None]:
false_pangram = """
This pangram contains four As, one B, two Cs, one D, thirty Es, five Fs, five Gs, seven Hs, eleven Is, one J,
one K, two Ls, two Ms, eighteen Ns, fifteen Os, two Ps, one Q, five Rs, twenty-seven Ss, eighteen Ts, two Us,
seven Vs, eight Ws, two Xs, three Ys, & one Z.
""".strip()

# you can edit this cell to use count_letters

-   Which letters, if any, have different counts from the claimed values?  Answer as a `tuple` in alphabetical order:  for example, `a,b,o,x`.

### Application II:  Chemical Symbol Translator Revisited

Below is a function that accepts a chemical element symbol as input and returns the corresponding name of the element.  You will use this function in the application that follows.

In [None]:
def symbol2name(symbol):
    if 'H' in symbol:
        if 'He' in symbol:
            return 'Helium'
        else:
            return 'Hydrogen'
    elif 'B' in symbol:
        if 'Be' in symbol:
            return 'Beryllium'
        else:
            return 'Boron'
    elif 'Li' in symbol:
        return 'Lithium'
    elif 'C' in symbol:
        return 'Carbon'
    elif 'N' in symbol:
        if 'Ne' in symbol:
            return None  # 'Ne' contains 'N' but is not nitrogen; this function has no entry for it
        else:
            return 'Nitrogen'
    elif 'O' in symbol:
        return 'Oxygen'
    else:
        return None

You will write a `while` loop around `symbol2name` which will do the following.  (No change is necessary to the function `symbol2name`.)

1.  Check whether we should end (the `while` statement).  This is done by using a *flag*, in this case called `go_flag`.  A flag is a common way of signalling to a loop that the code is ready to proceed.  As long as the flag is `True`, the code runs; when the flag becomes `False`, the loop stops.

2.  Ask the user for a symbol to check using `input`.  (Remember that an `In [*]:` at the left of the cell block often means Python is waiting for input.)

3.  If the symbol input is `'x'`, then you should tell the program to end the loop the next time around.  Otherwise, you should have the program call `symbol2name` on the input string and output the result.
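
The flag pattern itself can be sketched on an unrelated task that needs no user input: halving a number until it drops below 1.  The variable names `value` and `steps` are illustrative only.

```python
value = 100.0
steps = 0

go_flag = True
while go_flag:               # the flag is checked at the top of every pass
    value = value / 2
    steps += 1
    if value < 1:
        go_flag = False      # the loop body finishes, then the next check stops the loop

print(steps, value)          # 7 0.78125
```

Setting the flag does not interrupt the current pass through the loop; it only prevents the next one from starting.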

An outline:

In [None]:
# compose your code here
go_flag = True
while go_flag:
    # ask for input symbol
    #(your code here)

    # check input symbol for 'x'
    if ('''your code here'''):
        # change go_flag to the stop condition
        #(your code here)
    else:
        # change symbol to name
        #(your code here)

### Getting Data In and Out of Files

A program that you write is like a machine that takes one set of inputs and transforms them into a set of outputs, whether as text data, file manipulations, or raw binary data in memory.

Effective use of input and output is often the only way to get data into your program or out again.  Formally, there are a couple of ways you can handle this:

-   You can explicitly `open`, `read` from or `write` to, and finally `close` the file.  This is the classic model of working with files<sup>[[docs](https://docs.python.org/2/tutorial/inputoutput.html#reading-and-writing-files)]</sup> and transfers well across all computing languages.
    
    You first must `open` a file.  You should additionally specify the *mode* you are opening the file in—this tells Python what you intend to do with the file and its contents.  You will commonly only need to `'r'`ead a file (as input) or `'w'`rite a file (as output).


In [None]:
dataFileName = './data/lists.txt'  # . means the current directory, then a directory called data/ contains the lists.txt file
dataFile = open(dataFileName, 'r')
print('Using file %s'%dataFileName)

After `open`ing the file, you are free to read the contents using a number of methods.  `read` will give you the complete contents of the file as a string; you may also find some utility in using `readline`, which will be used later in this course.

In [None]:
data = dataFile.read().splitlines()
print(data)

When you are done, don't forget to `close` the file.  If you neglect this step, data you have written may never reach the disk, and the open file keeps using system resources.

In [None]:
dataFile.close()

This cycle of opening a file, reading data from it, and closing it is one you will repeat every time you need data.  (Many times, however, these steps may be hidden by a function which reads the data for you.)
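
The same cycle applies to writing.  Here is a minimal sketch that writes two lines to a scratch file and reads them back; the file name `scratch.txt` and the temporary directory are assumptions made for this illustration so that nothing in `./data/` is touched.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory.
scratch_dir = tempfile.mkdtemp()
scratch_name = os.path.join(scratch_dir, 'scratch.txt')

out_file = open(scratch_name, 'w')   # 'w' creates the file (or overwrites an existing one)
out_file.write('first line\n')
out_file.write('second line\n')
out_file.close()                     # closing flushes the text to disk

in_file = open(scratch_name, 'r')    # 'r' opens the file for reading only
contents = in_file.read()
in_file.close()

print(contents.splitlines())         # ['first line', 'second line']
```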

There is an alternative way to do this as well, using a *context*.  In this case, you only have to specify the opening and reading/writing actions, and if anything fails for any reason, Python will close the file for you automatically.  The equivalent to the foregoing code may be written concisely as:

In [None]:
dataFileName = './data/lists.txt'
with open(dataFileName, 'r') as dataFile:
    print('Using file %s'%dataFileName)
    data = dataFile.readlines()
print(data)

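Note the `readlines` method above.  It reads the entire contents of the file into a list of lines, stored in a persistent variable so you can use it after the file is closed.  Unlike `read().splitlines()`, `readlines` keeps the newline character at the end of each line, which is why the next cell calls `strip` before printing.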
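
To confirm that a context really does close the file, here is a small sketch using a hypothetical scratch file in a temporary directory (not one of the lab's data files).

```python
import os
import tempfile

scratch_name = os.path.join(tempfile.mkdtemp(), 'scratch.txt')  # hypothetical scratch file

with open(scratch_name, 'w') as out_file:
    out_file.write('alpha\nbeta\n')

with open(scratch_name, 'r') as in_file:
    lines = in_file.readlines()   # each entry keeps its trailing '\n'

print(in_file.closed)             # True: the file was closed on leaving the with block
print(lines)                      # ['alpha\n', 'beta\n']
```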

In [None]:
for line in data:
    print(line.strip())

-   Write a block of code which loads the file `./data/skyscrapers.txt`.  Each line of this file contains the name of a skyscraper and its height in meters, separated by a comma.  You should load the data in the file into a variable called `height_data`, `split` each line at the comma, and convert the second piece of each line into a floating-point number.  Add these numbers together and store the result in the variable `sum_heights`.
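
The `split` and `float` steps can be tried on a single line first.  The line below is a made-up example in the same `name,height` layout; it is not taken from `skyscrapers.txt`.

```python
line = 'Example Tower,123.4\n'   # hypothetical line, not real data
pieces = line.split(',')          # ['Example Tower', '123.4\n']
height = float(pieces[1])         # float ignores surrounding whitespace, including '\n'
print(pieces[0], height)          # Example Tower 123.4
```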

<!--
with open('./data/skyscrapers.txt') as datafile:
    data = datafile.readlines()

sum_heights = 0
for line in data:
    print(line.split(','))
    sum_heights += float(line.split(',')[1])
-->

In [None]:
# compose your code here
# load the file, get the data out, close the file
# loop over each line in the file
    # split the line by the comma into two pieces (use the split method)
    # convert the second part of the line into a float
    # sum the total of all heights
# sum_heights should contain your result


In [None]:
# it should pass this test---do NOT edit this cell
from numpy import isclose
assert isclose(sum_heights, 10204.3)

### Application III:  DNA and RNA Sequencing

![](./img/DNA_Translation_and_Codons.jpg)

A DNA sequence is composed of adenine (`'A'`), guanine (`'G'`), cytosine (`'C'`), and thymine (`'T'`) nucleobases.  During transcription, the first step of gene expression, RNA is assembled by pairing each DNA nucleobase with its complement.  Thus an RNA sequence is a string containing uracil (`'U'`), cytosine (`'C'`), guanine (`'G'`), and adenine (`'A'`) bases<sup>[[Wikipedia](https://en.wikipedia.org/wiki/RNA#Types_of_RNA)]</sup>.  (Note that U pairs with A, as RNA does not contain thymine.)

| Symbol | Name     | Complementary Base |
|--------|----------|--------------------|
| A  | adenine  | T (DNA); U (RNA)   |
| C  | cytosine | G                  |
| G  | guanine  | C                  |
| T  | thymine  | A                  |
| U  | uracil   | A                  |

This multi-part problem will lead you through processing DNA sequence data through transcription into RNA and then examining sequences.

#### Complementing DNA

-   Write a function `dna2rna` which accepts a string `seq_dna` representing a template strand of DNA.  `dna2rna` should `return` a string `seq_rna` which should contain the DNA strand transcribed to its RNA complement.  That is, the input `'ACGT'` should return `'UGCA'`.  The function should not be case sensitive with respect to input, but should return an upper-case transcription.
    
    You may use any means to accomplish this, but you may find the string [`replace` method](http://www.tutorialspoint.com/python/string_replace.htm) useful.
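
One thing to watch for with chained `replace` calls is that a later call can undo an earlier one.  The sketch below shows the problem on a toy string of `X` and `Y`, along with one possible workaround using a temporary placeholder character; the placeholder `'#'` is an arbitrary choice that must not already occur in the text.

```python
text = 'XXYY'

# Naive chaining: the second replace also changes the Ys produced by the first.
naive = text.replace('X', 'Y').replace('Y', 'X')
print(naive)     # XXXX

# Workaround: route one symbol through a placeholder that does not occur in the text.
swapped = text.replace('X', '#').replace('Y', 'X').replace('#', 'Y')
print(swapped)   # YYXX
```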

In [None]:
# define your function here
def dna2rna('''(delete this string and replace it with the incoming variables)'''):
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
# it should pass this test---do NOT edit this cell
# test for simple case
assert dna2rna('CGAT') == 'GCUA'

In [None]:
# it should pass this test---do NOT edit this cell
# test for case insensitivity
assert dna2rna('CgATaaTTgcGGAttCAGatcGAaacGcg') == 'GCUAUUAACGCCUAAGUCUAGCUUUGCGC'

In the directory `./data/` there is a file called `dna_seq.dat` containing many lines of DNA sequences.

-   Write a function `read_and_complement_dna` which accepts a filename as a string `dna_file`.  `read_and_complement_dna` then loads the data in the file, converts each line into its RNA complement using `dna2rna`, and `return`s the resulting string.

In [None]:
# define your function here
def read_and_complement_dna('''(delete this string and replace it with the incoming variables)'''):
    # load the file, get the data out, close the file
    result_string = ""
    # loop over each line in the file
        # convert the string to its RNA complement
        converted_string = # your code here
        # append the string to the overall result string
        result_string += converted_string
    # return the result string
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
# it should pass this test---do NOT edit this cell
# test for simple case
assert read_and_complement_dna('./data/dna_seq.dat') == 'AUGCCGCAAUCUGUUCACGCACUCAUGUGU'

#### Mapping RNA to Amino Acids (Codons)

One of the major functions of RNA in the body is as “messenger RNA”, which contains groups of three-letter *codons* mapping to amino acids expressed in the cell.  Thus if we find `CUU CAG` in mRNA, we anticipate that the cell will create leucine and glutamine, written `LQ`.  The full table of codons follows.

<h4>Standard genetic code<sup>[<a href="https://en.wikipedia.org/wiki/Genetic_code#RNA_codon_table">Wikipedia</a>]</sup></h4>
<table class="wikitable">
<tr>
<th rowspan="2">1st<br />
base</th>
<th colspan="8">2nd base</th>
<th rowspan="2">3rd<br />
base</th>
</tr>
<tr>
<th colspan="2">U</th>
<th colspan="2">C</th>
<th colspan="2">A</th>
<th colspan="2">G</th>
</tr>
<tr>
<th rowspan="4">U</th>
<td>UUU</td>
<td rowspan="2" style="background-color:#ffe75f">(Phe/F) <a href="/wiki/Phenylalanine" title="Phenylalanine">Phenylalanine</a></td>
<td>UCU</td>
<td rowspan="4" style="background-color:#b3dec0">(Ser/S) <a href="/wiki/Serine" title="Serine">Serine</a></td>
<td>UAU</td>
<td rowspan="2" style="background-color:#b3dec0">(Tyr/Y) <a href="/wiki/Tyrosine" title="Tyrosine">Tyrosine</a></td>
<td>UGU</td>
<td rowspan="2" style="background-color:#b3dec0">(Cys/C) <a href="/wiki/Cysteine" title="Cysteine">Cysteine</a></td>
<th>U</th>
</tr>
<tr>
<td>UUC</td>
<td>UCC</td>
<td>UAC</td>
<td>UGC</td>
<th>C</th>
</tr>
<tr>
<td>UUA</td>
<td rowspan="6" style="background-color:#ffe75f">(Leu/L) <a href="/wiki/Leucine" title="Leucine">Leucine</a></td>
<td>UCA</td>
<td>UAA</td>
<td style="background-color:#B0B0B0;"><a href="/wiki/Stop_codon" title="Stop codon">Stop</a> (<i>Ochre</i>)</td>
<td>UGA</td>
<td style="background-color:#B0B0B0;"><a href="/wiki/Stop_codon" title="Stop codon">Stop</a> (<i>Opal</i>)</td>
<th>A</th>
</tr>
<tr>
<td>UUG</td>
<td>UCG</td>
<td>UAG</td>
<td style="background-color:#B0B0B0;"><a href="/wiki/Stop_codon" title="Stop codon">Stop</a> (<i>Amber</i>)</td>
<td>UGG</td>
<td style="background-color:#ffe75f;">(Trp/W) <a href="/wiki/Tryptophan" title="Tryptophan">Tryptophan</a>&#160;&#160;&#160;&#160;</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">C</th>
<td>CUU</td>
<td>CCU</td>
<td rowspan="4" style="background-color:#ffe75f">(Pro/P) <a href="/wiki/Proline" title="Proline">Proline</a></td>
<td>CAU</td>
<td rowspan="2" style="background-color:#bbbfe0">(His/H) <a href="/wiki/Histidine" title="Histidine">Histidine</a></td>
<td>CGU</td>
<td rowspan="4" style="background-color:#bbbfe0">(Arg/R) <a href="/wiki/Arginine" title="Arginine">Arginine</a></td>
<th>U</th>
</tr>
<tr>
<td>CUC</td>
<td>CCC</td>
<td>CAC</td>
<td>CGC</td>
<th>C</th>
</tr>
<tr>
<td>CUA</td>
<td>CCA</td>
<td>CAA</td>
<td rowspan="2" style="background-color:#b3dec0">(Gln/Q) <a href="/wiki/Glutamine" title="Glutamine">Glutamine</a></td>
<td>CGA</td>
<th>A</th>
</tr>
<tr>
<td>CUG</td>
<td>CCG</td>
<td>CAG</td>
<td>CGG</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">A</th>
<td>AUU</td>
<td rowspan="3" style="background-color:#ffe75f">(Ile/I) <a href="/wiki/Isoleucine" title="Isoleucine">Isoleucine</a></td>
<td>ACU</td>
<td rowspan="4" style="background-color:#b3dec0">(Thr/T) <a href="/wiki/Threonine" title="Threonine">Threonine</a>&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;</td>
<td>AAU</td>
<td rowspan="2" style="background-color:#b3dec0">(Asn/N) <a href="/wiki/Asparagine" title="Asparagine">Asparagine</a></td>
<td>AGU</td>
<td rowspan="2" style="background-color:#b3dec0">(Ser/S) <a href="/wiki/Serine" title="Serine">Serine</a></td>
<th>U</th>
</tr>
<tr>
<td>AUC</td>
<td>ACC</td>
<td>AAC</td>
<td>AGC</td>
<th>C</th>
</tr>
<tr>
<td>AUA</td>
<td>ACA</td>
<td>AAA</td>
<td rowspan="2" style="background-color:#bbbfe0">(Lys/K) <a href="/wiki/Lysine" title="Lysine">Lysine</a></td>
<td>AGA</td>
<td rowspan="2" style="background-color:#bbbfe0">(Arg/R) <a href="/wiki/Arginine" title="Arginine">Arginine</a></td>
<th>A</th>
</tr>
<tr>
<td>AUG</td>
<td style="background-color:#ffe75f;">(Met/M) <a href="/wiki/Methionine" title="Methionine">Methionine</a></td>
<td>ACG</td>
<td>AAG</td>
<td>AGG</td>
<th>G</th>
</tr>
<tr>
<th rowspan="4">G</th>
<td>GUU</td>
<td rowspan="4" style="background-color:#ffe75f">(Val/V) <a href="/wiki/Valine" title="Valine">Valine</a></td>
<td>GCU</td>
<td rowspan="4" style="background-color:#ffe75f">(Ala/A) <a href="/wiki/Alanine" title="Alanine">Alanine</a></td>
<td>GAU</td>
<td rowspan="2" style="background-color:#f8b7d3">(Asp/D) <a href="/wiki/Aspartic_acid" title="Aspartic acid">Aspartic acid</a></td>
<td>GGU</td>
<td rowspan="4" style="background-color:#ffe75f">(Gly/G) <a href="/wiki/Glycine" title="Glycine">Glycine</a></td>
<th>U</th>
</tr>
<tr>
<td>GUC</td>
<td>GCC</td>
<td>GAC</td>
<td>GGC</td>
<th>C</th>
</tr>
<tr>
<td>GUA</td>
<td>GCA</td>
<td>GAA</td>
<td rowspan="2" style="background-color:#f8b7d3">(Glu/E) <a href="/wiki/Glutamic_acid" title="Glutamic acid">Glutamic acid</a></td>
<td>GGA</td>
<th>A</th>
</tr>
<tr>
<td>GUG</td>
<td>GCG</td>
<td>GAG</td>
<td>GGG</td>
<th>G</th>
</tr>
</table>

We provide the function `rna2amino` which accepts a three-letter codon and returns the corresponding amino acid.  This uses a `dict`, a data type you haven't encountered yet but which is easy to use.

In [None]:
genetic_code = {
    'UUU': 'F', 'UUC': 'F', 'UUA': 'L', 'UUG': 'L',        'CUU': 'L', 'CUC': 'L', 'CUA': 'L', 'CUG': 'L',
    'AUU': 'I', 'AUC': 'I', 'AUA': 'I', 'AUG': 'M',        'GUU': 'V', 'GUC': 'V', 'GUA': 'V', 'GUG': 'V',
    
    'UCU': 'S', 'UCC': 'S', 'UCA': 'S', 'UCG': 'S',        'CCU': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P',
    'ACU': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T',        'GCU': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A',
    
    'UAU': 'Y', 'UAC': 'Y', 'UAA': '*', 'UAG': '*',        'CAU': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q',
    'AAU': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K',        'GAU': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E',
    
    'UGU': 'C', 'UGC': 'C', 'UGA': '*', 'UGG': 'W',        'CGU': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R',
    'AGU': 'S', 'AGC': 'S', 'AGA': 'R', 'AGG': 'R',        'GGU': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G',
}
allowed_codons = set('ACGU')

def rna2amino(codon):
    if len(codon) != 3: return None
    codon = codon.upper()
    if (set(codon) > allowed_codons): return None
    return genetic_code[codon]
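
For readers new to `dict`, here is a minimal sketch of the lookups that `rna2amino` relies on, using a two-entry dictionary with codons taken from the table above.  The name `small_code` is used only for this illustration.

```python
# A small dictionary mapping codon strings (keys) to amino acid letters (values).
small_code = {'CUU': 'L', 'CAG': 'Q'}

print(small_code['CUU'])        # L: look up the value stored under a key
print('CAG' in small_code)      # True: membership is tested against the keys
print('XYZ' in small_code)      # False

# Looking up a key that is not present raises KeyError.
try:
    small_code['XYZ']
except KeyError:
    print('XYZ is not a key')
```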

-   We next need a function `sequence_string` which accepts a string `rna_seq` containing RNA sequence data and maps it to amino acids.  This requires that you:
    
    1.  Break each string into three-letter chunks.
    2.  For each chunk, map it to a valid amino acid codon according to the table above.  (We provide code for this step.)
    3.  Return the result.

The first part is figuring out how to chop a string into three-letter chunks, which is harder than it seems at first.  There are many ways to do this.  Write a function `parse_rna_seq` that takes an RNA sequence `rna_seq` as input and returns a list of its three-letter codon chunks.  Any leftover letters at the end that do not fill a complete chunk should be dropped:
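
Two building blocks that may help are string slicing and `range` with a step.  The sketch below only demonstrates them on a sample word; it is a hint, not a complete `parse_rna_seq`.

```python
word = 'capsaicin'
print(word[0:3])               # cap: characters at positions 0, 1, 2
print(word[3:6])               # sai: characters at positions 3, 4, 5
print(list(range(0, 9, 3)))    # [0, 3, 6]: the start position of each chunk
```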

In [None]:
def parse_rna_seq('''(delete this string and replace it with the incoming variables)'''):
    # insert your code here
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
example_string = 'abcdefghijklmnopqrstuvwxyz'
assert parse_rna_seq(example_string) == ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx']

Now write the function `sequence_string`, which accepts a string `rna_seq` containing RNA sequence data and returns the corresponding amino acids as a string.

In [None]:
# define your function here
def sequence_string('''(delete this string and replace it with the incoming variables)'''):
    # divide the string into three-letter chunks
    # map each three-letter codon to a protein
    # append the protein to the result string
    # return the result string
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
# it should pass this test---do NOT edit this cell
# test for simple case
assert sequence_string('ACUGAU') == 'TD'

In [None]:
# it should pass this test---do NOT edit this cell
# test for a more complicated case
assert sequence_string('AUCACUGUAGUAGUAGCUGGAAAGAGAAAUCUGUGACUCCAAUUAGCC') == 'ITVVVAGKRNL*LQLA'

In [None]:
# it should pass this test---do NOT edit this cell
# test for failure case
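try:
    sequence_string('ASDF')
except KeyError:
    pass  # expected: 'ASD' is not a key in genetic_code
else:
    raise AssertionError('expected a KeyError for an invalid codon')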

Finally, we are interested in loading a file of DNA sequence data, complementing it, and mapping the resulting RNA to amino acids.  This requires that you:

1.  Load a file.
2.  Use your function `sequence_string` to convert each line of the file to its protein expression string.
3.  Return the resulting string.

-   Write a function `sequence_file` which accepts a string `dna_seq_file`.  This function will `return` (NOT write to disk) a string containing the amino acids described in the file `dna_seq_file`, including line breaks.

In [None]:
# define your function here
def sequence_file('''(delete this string and replace it with the incoming variables)'''):
    # use read_and_complement_dna to get the dna complement as rna
    # use sequence_string to convert rna to amino acid sequence
    pass # you can always delete a `pass` statement, since it does nothing

In [None]:
# it should pass this test---do NOT edit this cell
# test for simple case
test_data_file = './data/dna_test.dat'
assert sequence_file(test_data_file) == '''
YKRPPPPGDRGPPFSRRSRRRPKKRGRPGAPRPAPGGGTNSVLKMLGMERGQPCVVFGWPGWVAVCMISLTLLKAK*RGAHDLLLRGGLFARPHVLPVEFARGFHMTPPLLALQEAQPSPPKTGAASADTPPHLVRGRPKLGWQWQKVHML*DLLHLVRM
'''.strip()

This has quickly become a sophisticated workflow, and it is easy to lose track of both what you're doing and what you've already written!  This diagram shows how the data pipeline works to process data from disk:

![](./img/flowchart.png)