# Computational Skills for Biocuration

## Programming Skills with Python

**Toby Hodges**

- email: toby.hodges@embl.de
- Twitter: [@tbyhdgs](https://twitter.com/tbyhdgs)
- GitHub: [tobyhodges](https://github.com/tobyhodges)

Alongside Malvika, I coordinate the [Bio-IT Project](https://bio-it.embl.de) at EMBL Heidelberg. You can find more training material, for Python and other tools, on our [Course Materials](https://bio-it.embl.de/course-materials/) webpage.

### Functions

We have started to combine multiple operations into more complex procedures. For example, suppose we want to find all the species of a particular genus in a list of names, and then create a comma-separated string of those species names. We can build this up step by step, like this:

In [1]:
species_list1 = ['Gorilla gorrila', 'Homo neandathalensis', 
                 'Pan troglodytes', 'Pongo pygmaeus', 
                 'Homo sapiens', 'Homo heidelbergensis']

In [2]:
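# True and False are Python's two Boolean values.
# A notebook cell displays only the value of its last line, so only False is shown.
True
False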

False

In [3]:
if True:
    print("something was run")

something was run


In [5]:
if False:
    print("something was run")

In [7]:
'Homo heidelbergensis'.upper()

'HOMO HEIDELBERGENSIS'

In [8]:
species = 'Homo heidelbergensis'

In [11]:
species.upper()

'HOMO HEIDELBERGENSIS'

In [14]:
species.startswith('H')

True

In [15]:
species.startswith('X')

False
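`startswith()` compares characters exactly, so it is case-sensitive. Curated names are not always capitalised consistently, so this is worth keeping in mind. A minimal sketch; converting to lower case with `lower()` is one way to make the check case-insensitive, not the only one:

```Python
species = 'Homo heidelbergensis'

print(species.startswith('Homo'))   # True
print(species.startswith('homo'))   # False: 'h' and 'H' are different characters

# Converting the name to lower case first makes the check case-insensitive
print(species.lower().startswith('homo'))  # True
```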

In [16]:
if species.startswith('Homo'):
    print(species)

Homo heidelbergensis
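Because `startswith()` returns a Boolean (`True` or `False`), its result can also be stored in a variable and used as the condition later. The variable name `is_homo` below is only illustrative:

```Python
species = 'Homo heidelbergensis'

# startswith() returns a Boolean value, which we can store like any other value
is_homo = species.startswith('Homo')
print(is_homo)        # True
print(type(is_homo))  # <class 'bool'>

# the stored value works as the condition of an if statement
if is_homo:
    print(species, 'belongs to the genus Homo')
```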


In [17]:
if species.startswith('Homo'):
    print(species)
print('always runs')

Homo heidelbergensis
always runs


In [19]:
if species.startswith('Gorilla'):
    print(species)
print('always runs')

always runs


In [20]:
genus = 'Homo'
if species.startswith(genus):
    print(species)

Homo heidelbergensis


In [22]:
species_list1

['Gorilla gorrila',
 'Homo neandathalensis',
 'Pan troglodytes',
 'Pongo pygmaeus',
 'Homo sapiens',
 'Homo heidelbergensis']

In [23]:
for species in species_list1:
    if species.startswith(genus):
        print(species)

Homo neandathalensis
Homo sapiens
Homo heidelbergensis


In [24]:
for species in species_list1:
    if species.startswith('Homo'):
        print(species)

Homo neandathalensis
Homo sapiens
Homo heidelbergensis
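One caveat for curation work: `startswith()` matches any name that begins with the given text, so a different genus whose name happens to begin with "Homo", such as *Homotherium*, would also be picked up. A minimal sketch of one alternative, assuming every name is written as "Genus species" separated by whitespace, compares the first word exactly:

```Python
species_names = ['Homo sapiens', 'Homotherium serum', 'Pan troglodytes']
genus = 'Homo'

by_prefix = []  # names that merely start with 'Homo'
by_genus = []   # names whose first word is exactly 'Homo'
for species in species_names:
    if species.startswith(genus):
        by_prefix.append(species)
    # split() breaks the name at whitespace; [0] is the first word (the genus)
    if species.split()[0] == genus:
        by_genus.append(species)

print(by_prefix)  # ['Homo sapiens', 'Homotherium serum']
print(by_genus)   # ['Homo sapiens']
```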


In [26]:
empty_list = []
print(empty_list)
empty_list.append('abc')
print(empty_list)
empty_list.append('def')
empty_list.append('ghi')
print(empty_list)

[]
['abc']
['abc', 'def', 'ghi']
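Note that `append()` changes the list in place and does not give back a new list. A common mistake is to write `names = names.append(...)`, which replaces the list with `None`. A short sketch, using an illustrative list called `names`:

```Python
names = ['abc']
result = names.append('def')

print(names)   # ['abc', 'def']: the original list was changed
print(result)  # None: append() itself returns nothing useful
```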


In [29]:
homo_species = []
for species in species_list1:
    if species.startswith('Homo'):
        homo_species.append(species)
        print('        found a Homo species')
    print('    always runs')
print('finished!')
print(homo_species)

    always runs
        found a Homo species
    always runs
    always runs
    always runs
        found a Homo species
    always runs
        found a Homo species
    always runs
finished!
['Homo neandathalensis', 'Homo sapiens', 'Homo heidelbergensis']


In [30]:
'!!!'.join(homo_species)

'Homo neandathalensis!!!Homo sapiens!!!Homo heidelbergensis'

In [31]:
', '.join(homo_species)

'Homo neandathalensis, Homo sapiens, Homo heidelbergensis'

In [32]:
homo_species.join(',')

AttributeError: 'list' object has no attribute 'join'
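The error arises because `join()` is a method of strings, not of lists: the separator comes first, and the list is passed in as the argument. `join()` also requires every item in the list to be a string. A short sketch, using made-up counts to show the conversion with `str()`:

```Python
homo_species = ['Homo sapiens', 'Homo heidelbergensis']

# separator.join(list): the separator string does the joining
print('\n'.join(homo_species))  # newline separator: one name per line

# join() only accepts strings, so convert numbers with str() first
counts = [3, 12, 7]  # illustrative values
count_strings = []
for n in counts:
    count_strings.append(str(n))
counts_text = ', '.join(count_strings)
print(counts_text)  # 3, 12, 7
```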

In [33]:
genus = 'Homo'
homo_species = []
for species in species_list1:
    if species.startswith(genus):
        homo_species.append(species)
print(','.join(homo_species))

Homo neandathalensis,Homo sapiens,Homo heidelbergensis


What if we want to run this procedure on demand, in several different places in our code, with different genera and different lists of species? We could copy and paste the block of code and then edit it...

In [34]:
species_list2 = ['Pieris rapae', 'Pygoscelis adeliae', 
                 'Aptenodytes forsteri', 'Equus equus', 
                 'Pygoscelis papua', 'Enhydra lutris', 
                 'Pygoscelis antarcticus']


In [38]:
genus = 'Pygoscelis'
pygoscelis_species = []
for species in species_list2:
    if species.startswith(genus):
        pygoscelis_species.append(species)
print(','.join(pygoscelis_species))

Pygoscelis adeliae,Pygoscelis papua,Pygoscelis antarcticus


In [39]:
def find_penguins(list_of_species):
    genus = 'Pygoscelis'
    pygoscelis_species = []
    for species in list_of_species:
        if species.startswith(genus):
            pygoscelis_species.append(species)
    print(','.join(pygoscelis_species))

In [40]:
find_penguins(species_list2)

Pygoscelis adeliae,Pygoscelis papua,Pygoscelis antarcticus


In [41]:
find_penguins(species_list1)  # species_list1 has no Pygoscelis species, so an empty line is printed




In [46]:
def find_genus(list_of_species, genus):
    species_belonging_to_genus = []
    for species in list_of_species:
        if species.startswith(genus):
            species_belonging_to_genus.append(species)
#    print(','.join(species_belonging_to_genus))
    return species_belonging_to_genus
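The key change from `find_penguins` is `return`. A function that only prints shows its result on screen but hands nothing back: calling it produces the value `None`. A minimal sketch comparing the two styles; `print_genus` and `return_genus` are hypothetical names used only for this comparison:

```Python
def print_genus(list_of_species, genus):
    '''Print the matching names; returns nothing.'''
    matches = []
    for species in list_of_species:
        if species.startswith(genus):
            matches.append(species)
    print(','.join(matches))

def return_genus(list_of_species, genus):
    '''Return the matching names as a list.'''
    matches = []
    for species in list_of_species:
        if species.startswith(genus):
            matches.append(species)
    return matches

names = ['Homo sapiens', 'Pan troglodytes']
printed = print_genus(names, 'Homo')    # prints: Homo sapiens
returned = return_genus(names, 'Homo')
print(printed)   # None
print(returned)  # ['Homo sapiens']
```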

In [47]:
find_genus(species_list2, 'Equus')

['Equus equus']

In [48]:
find_genus(species_list1, 'Homo')

['Homo neandathalensis', 'Homo sapiens', 'Homo heidelbergensis']

In [49]:
find_genus(species_list2, 'Pygoscelis')

['Pygoscelis adeliae', 'Pygoscelis papua', 'Pygoscelis antarcticus']

In [50]:
horses = find_genus(species_list2, 'Equus')

In [51]:
print(horses)
type(horses)

['Equus equus']


list

In [52]:
penguins = find_genus(species_list2, 'Pygoscelis')

In [53]:
penguins

['Pygoscelis adeliae', 'Pygoscelis papua', 'Pygoscelis antarcticus']
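Because `find_genus` returns a list, its result can be passed on to other code instead of only being displayed. For example, the comma-separated string that our first procedure printed can be rebuilt from the returned list. A sketch that repeats the definition and the list so it runs on its own:

```Python
def find_genus(list_of_species, genus):
    species_belonging_to_genus = []
    for species in list_of_species:
        if species.startswith(genus):
            species_belonging_to_genus.append(species)
    return species_belonging_to_genus

species_list2 = ['Pieris rapae', 'Pygoscelis adeliae',
                 'Aptenodytes forsteri', 'Equus equus',
                 'Pygoscelis papua', 'Enhydra lutris',
                 'Pygoscelis antarcticus']

penguins = find_genus(species_list2, 'Pygoscelis')
print(len(penguins))       # 3
print(','.join(penguins))  # Pygoscelis adeliae,Pygoscelis papua,Pygoscelis antarcticus
```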

In [36]:
homo_species

['Homo neandathalensis', 'Homo sapiens', 'Homo heidelbergensis']

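Functions offer us a better way than copying and pasting: we write the procedure once, then call it with whichever list and genus we need. To help other people (and our future selves) use a function correctly, we can also document it. The built-in `help()` function displays a function's documentation: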

In [54]:
help(len)

Help on built-in function len in module builtins:

len(obj, /)
    Return the number of items in a container.



In [55]:
help(find_genus)

Help on function find_genus in module __main__:

find_genus(list_of_species, genus)



In [60]:
def find_genus(list_of_species, genus):
    '''a function for finding species belonging to a particular genus.
    expects two arguments: list of species, which is a list of 
    species, and genus, which is the genus to look for.'''
    species_belonging_to_genus = []
    for species in list_of_species:
        if species.startswith(genus):
            species_belonging_to_genus.append(species)
#    print(','.join(species_belonging_to_genus))
    return species_belonging_to_genus

In [61]:
help(find_genus)

Help on function find_genus in module __main__:

find_genus(list_of_species, genus)
    a function for finding species belonging to a particular genus.
    expects two arguments: list of species, which is a list of 
    species, and genus, which is the genus to look for.



In [None]:
# use docstrings to tell your users (and to remind yourself):
#   - what a function does
#   - how to use it


What will the output be if the following block of code is run?

```Python
def greet(name):
    output = 'Hello, ' + name
    return output
    print('goodbye')
greeting = greet('Abby')
print(greeting)
```

a)
```
goodbye
Hello, Abby
```

b)
```
Hello, Abby
goodbye
```

c)
```
Hello, Abby
```

d)
```
goodbye
```

In [62]:
def greet(name):
    output = 'Hello, ' + name
    return output
    print('goodbye')
greeting = greet('Abby')
print(greeting)

Hello, Abby


In [None]:
def greet(name):
    output = 'Hello, ' + name
    if False:
        return output
    print('goodbye')
    if True:
        return output
greeting = greet('Abby')
print(greeting)
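In this last version, the first `return` sits inside `if False:` and never runs, so the function prints `goodbye` and only then reaches the second `return`; running the cell prints `goodbye` followed by `Hello, Abby`. Returning from inside a loop is a common use of this behaviour. A minimal sketch with a hypothetical function, `first_in_genus`, that stops searching as soon as it finds a match:

```Python
def first_in_genus(list_of_species, genus):
    '''Return the first name in list_of_species that starts with genus,
    or None if there is no such name.'''
    for species in list_of_species:
        if species.startswith(genus):
            return species  # leaves the function immediately; the loop stops
    return None  # only reached if the loop found no match

names = ['Pan troglodytes', 'Homo sapiens', 'Homo erectus']
print(first_in_genus(names, 'Homo'))        # Homo sapiens
print(first_in_genus(names, 'Pygoscelis'))  # None
```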