# Comprehensions 

## `map` and `filter`
In the session on functional programming, we used `map` and `filter` to build lists. For example, the result of calling `get_at()` on each element of `dna_list`:

In [17]:
def get_at(dna): 
    return (dna.count('A') + dna.count('T')) / len(dna) 

dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG'] 
mp = map(get_at, dna_list)
list(mp)


[0.5, 0.5, 0.6666666666666666, 0.375]

The elements of `dna_list` which are at least 4 bases long: 

In [18]:
f = filter(lambda x: len(x) > 3, dna_list)
list(f)

['TAGC', 'ACGTATGC', 'ACGGCTAG']

Notice that with `map`, the items in the result are the values returned by the function (here `get_at`).
With `filter`, the function is only used to decide whether each item should be kept; the items in the result are the original values.
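
As a small sketch of that difference, the cell below reuses `get_at` and `dna_list` from above. The 0.5 cutoff and the names `at_values` and `at_rich` are arbitrary choices made for illustration.

```python
def get_at(dna):
    """Fraction of A and T bases in a DNA sequence."""
    return (dna.count('A') + dna.count('T')) / len(dna)

dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG']

# map: one output per input; each output is the function's return value
at_values = list(map(get_at, dna_list))

# filter: the outputs are the original items; the function only decides keep or drop
# (the 0.5 cutoff is an arbitrary choice for illustration)
at_rich = list(filter(lambda dna: get_at(dna) >= 0.5, dna_list))

print(len(at_values) == len(dna_list))  # True
print(at_rich)                          # ['TAGC', 'ACGTATGC', 'ATG']
```

The `map` result always has as many items as the input, while the `filter` result can be shorter.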

## List comprehensions
Python has a special syntax for defining lists called **list comprehensions**. Here is the list of DNA sequence lengths, built in four ways:

In [35]:
dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG'] 
# with a loop
l1 = []
for dna in dna_list:
    l1.append(len(dna))
    
# with a map
l2 = list(map(len, dna_list))

# as a list comprehension
l3 = [len(dna) for dna in dna_list]

# C style, with explicit indexing: works, but ugly
l4 = [0] * len(dna_list)
for i in range(len(dna_list)):
    print(i)
    l4[i] = len(dna_list[i])
    
assert l1 == l2
assert l1 == l3
assert l1 == l4
l3

0
1
2
3


[4, 8, 3, 8]
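
A list comprehension can also take an optional `if` clause, which lets it do the work of `map` and `filter` in one expression. The sketch below reuses `get_at` and `dna_list` from the first section; the variable names are chosen here for illustration only.

```python
def get_at(dna):
    """Fraction of A and T bases in a DNA sequence."""
    return (dna.count('A') + dna.count('T')) / len(dna)

dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG']

# filter first (keep sequences longer than 3 bases), then map get_at over the rest
via_map_filter = list(map(get_at, filter(lambda dna: len(dna) > 3, dna_list)))

# the same thing as a single list comprehension with an if clause
via_comprehension = [get_at(dna) for dna in dna_list if len(dna) > 3]

assert via_map_filter == via_comprehension
print(via_comprehension)  # [0.5, 0.5, 0.375]
```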

List comprehensions can be very concise. They can operate on any iterable, not just a list. For example, to get a list of all FASTA headers:

In [36]:
# using sequences.fasta from previous exercise
[line[1:] for line in open('sequences.fasta') if line.startswith('>')]

['normal_sequence\n',
 'sequence_in_lowercase\n',
 'sequence_with_unknown_bases\n',
 'this header contains spaces\n',
 'this_header_is_very_long_and_should_be_truncated\n',
 'some_gene\n']
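
A slightly more defensive variant is sketched below: it opens the file in a `with` block so that it is closed afterwards, and strips the trailing newline from each header. To keep the cell self-contained it writes a small, made-up FASTA file to a temporary directory; the file name and its contents are assumptions for illustration, not the `sequences.fasta` used above.

```python
import os
import tempfile

# Hypothetical FASTA content, used only so this example runs on its own
fasta_text = ">seq_one\nATGC\n>seq two\nGGCC\nTTAA\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.fasta')
    with open(path, 'w') as handle:
        handle.write(fasta_text)

    # the with block closes the file once the comprehension has finished
    with open(path) as handle:
        headers = [line[1:].rstrip('\n') for line in handle if line.startswith('>')]

print(headers)  # ['seq_one', 'seq two']
```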

## Dict comprehensions

Just like we often write loops to create lists (which we can replace with map/filter or list comprehensions), we often write loops to create dicts:

In [37]:
d = {}
for dna in dna_list:
    d[dna] = get_at(dna)
d

{'ACGGCTAG': 0.375, 'ACGTATGC': 0.5, 'ATG': 0.6666666666666666, 'TAGC': 0.5}

Dict comprehensions allow us to express these more compactly:

In [38]:
d = { dna : get_at(dna) for dna in dna_list }
d

{'ACGGCTAG': 0.375, 'ACGTATGC': 0.5, 'ATG': 0.6666666666666666, 'TAGC': 0.5}
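
Dict comprehensions accept the same optional `if` clause as list comprehensions. One thing to keep in mind is that keys are unique, so if the key expression produces the same value twice, the later entry overwrites the earlier one. The sketch below illustrates both points; the variable names are made up for this example.

```python
def get_at(dna):
    """Fraction of A and T bases in a DNA sequence."""
    return (dna.count('A') + dna.count('T')) / len(dna)

dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG']

# only keep sequences longer than 3 bases
at_by_dna = {dna: get_at(dna) for dna in dna_list if len(dna) > 3}
print(at_by_dna)  # {'TAGC': 0.5, 'ACGTATGC': 0.5, 'ACGGCTAG': 0.375}

# two sequences have length 8, so the second one replaces the first
dna_by_length = {len(dna): dna for dna in dna_list}
print(dna_by_length)  # {4: 'TAGC', 8: 'ACGGCTAG', 3: 'ATG'}
```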

## Set comprehensions

Set comprehensions are mentioned for completeness. They use curly brackets like a dict comprehension, but with single elements rather than `key: value` pairs:

In [39]:
even_integers = {x for x in range(1000) if x % 2 == 0}
# same as...
#even_integers == set((x for x in range(1000) if x % 2 == 0))
even_integers

{0,
 2,
 4,
 6,
 8,
 10,
 ...
 996,
 998}
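
Because a set holds each value only once, a set comprehension is a convenient way to collect the distinct results of an expression. The sketch below reuses `dna_list` from earlier; the variable names are illustrative. It also shows that empty curly brackets create a dict, not a set.

```python
dna_list = ['TAGC', 'ACGTATGC', 'ATG', 'ACGGCTAG']

# two sequences have length 8, but the set keeps that value only once
distinct_lengths = {len(dna) for dna in dna_list}
print(distinct_lengths == {3, 4, 8})  # True

# {} is an empty dict; an empty set has to be written as set()
print(type({}).__name__)     # dict
print(type(set()).__name__)  # set
```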
