# Computational Skills for Biocuration

## Programming Skills with Python

**Malvika Sharan**

- email: malvika.sharan@embl.de
- Twitter: [@MalvikaSharan](https://twitter.com/MalvikaSharan)

### Introductory example

- **UniProt example**: ["gene:tp53 AND reviewed:yes"](https://www.uniprot.org/uniprot/?query=gene%3ATP53+AND+reviewed%3Ayes&sort=score)

## Recap

### A few Python concepts we have seen in previous webinars:

- Mathematical operations
- data types and type conversion
- `print( )`
- Variable assignment
    - Use the gene name from the UniProt example
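As a quick refresher, the snippet below sketches these basics in one place. The variable names (`total`, `count_text`, `count`, `gene_name`) are illustrative choices.

```python
# Mathematical operations follow the usual precedence rules
total = 3 + 4 * 2
print(total)  # 11

# Data types and type conversion: a string that looks like a number is still a string
count_text = "5"
count = int(count_text)  # convert the string to an integer
print(type(count_text), type(count))  # <class 'str'> <class 'int'>

# Variable assignment, using the gene name from the UniProt example
gene_name = "tp53"
print(gene_name)  # tp53
```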

#### Using methods on variables

- using methods (`.upper( )`, `.split( )` etc.)
    - The `.` indicates that the method belongs to the string object, and the parentheses indicate that we want to call (execute) it.
    - String methods do not change the original string; they return a transformed copy, which you can then store in another variable.
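A minimal sketch of calling string methods, using the gene name and the query string from the UniProt example (the variable names are illustrative):

```python
gene_name = "tp53"

# .upper() returns a new, upper-case string; gene_name itself is unchanged
upper_name = gene_name.upper()
print(upper_name)  # TP53
print(gene_name)   # tp53

# .split() breaks a string into a list of substrings at the given separator
query = "gene:tp53 AND reviewed:yes"
parts = query.split(" AND ")
print(parts)  # ['gene:tp53', 'reviewed:yes']
```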

#### Some notes on variables

- variables can be reassigned (their values can be overwritten)
- `len( )`
    - For any sequence data type (e.g. a string or a list), `len( )` tells us how many elements it has.
    - Strings have a length, numbers don't: calling `len( )` on an integer raises a `TypeError`.

- creating an empty list
- creating a list with items
    - example: use the UniProt entries "O09185", "P79734", "P41685", "P04637"

- using the list method `.append( )` to add a new item
    - example: use the UniProt entry "P67938"
- getting the length of the list

- accessing items using an index
- accessing items by slicing

- sorting the list alphanumerically with `.sort( )`
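The notes above might look like this in practice, assuming the UniProt entries from the example (variable names such as `entries` are illustrative):

```python
# Variables can be reassigned: the new value overwrites the old one
gene_name = "tp53"
gene_name = "TP53"

# len() counts the elements of a sequence; a string has one per character
print(len(gene_name))  # 4
# Numbers are not sequences, so len(53) would raise a TypeError

# Creating an empty list and a list with items
empty_list = []
entries = ["O09185", "P79734", "P41685", "P04637"]

# Appending a new item to the end of the list, then getting its length
entries.append("P67938")
print(len(entries))  # 5

# Accessing items by index (counting starts at 0) and by slicing
print(entries[0])    # O09185
print(entries[-1])   # P67938 (negative indices count from the end)
print(entries[1:3])  # ['P79734', 'P41685'] (the end index is excluded)

# Sorting the list alphanumerically; .sort() changes the list in place
entries.sort()
print(entries)  # ['O09185', 'P04637', 'P41685', 'P67938', 'P79734']
```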

#### Exercise

After sorting, try accessing the items again: you will notice that the order of the original list has changed. Lists are mutable, and `.sort( )` sorts the list in place, so the original order is overwritten.

**Question:** What would you do to keep the original list but also have a list that contains the same items in sorted order? Start by recreating the original list that you used one step earlier.
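One possible answer, sketched with the same entries: the built-in `sorted( )` returns a new sorted list and leaves the original untouched, while `.copy( )` followed by `.sort( )` achieves the same result in two steps.

```python
entries = ["O09185", "P79734", "P41685", "P04637", "P67938"]

# sorted() returns a new sorted list; the original list is not modified
sorted_entries = sorted(entries)

# Alternative: make an explicit copy, then sort the copy in place
entries_copy = entries.copy()
entries_copy.sort()

print(entries)         # original order is kept
print(sorted_entries)  # ['O09185', 'P04637', 'P41685', 'P67938', 'P79734']
```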

### Looping: `for` loop

1. use a loop to print each item of the list
1. iterate over a `range( )` of numbers
1. use `range( )` to access list items by their index
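The three steps could be sketched as follows; `entries` is an illustrative name for the list of UniProt entries.

```python
entries = ["O09185", "P79734", "P41685", "P04637", "P67938"]

# 1. Loop directly over the items of the list
for entry in entries:
    print(entry)

# 2. Loop over a range of numbers: range(3) produces 0, 1, 2
for number in range(3):
    print(number)

# 3. Combine range() with len() to get each index, then access the item at that index
for index in range(len(entries)):
    print(index, entries[index])
```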

#### Summary: `for` loop

- `for` loops can be used to repeat a block of code for each item in a list.
- `range( )` creates a sequence of numbers, which can be used to execute the loop a given number of times, e.g. to access the items of a list by index and operate on them.

### String Formatting

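String formatting lets you print multiple values together with descriptive text. In an f-string (a string prefixed with `f`), each expression inside `{ }` is replaced by its value.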
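A minimal f-string sketch with values from the UniProt example; the variable names are illustrative.

```python
gene_name = "TP53"
organism = "Human"
entry = "P04637"

# Each {expression} inside the f-string is replaced by the value of that expression
message = f"{gene_name} ({organism}): {entry}"
print(message)  # TP53 (Human): P04637
```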

#### Exercise

You've learned how to use a `for` loop over the `range( )` of a list's `len( )`, and now you know how to use string formatting.

- Combining these concepts, can you create a `for` loop that:
    - iterates over the length of the list, and
    - uses the command below to print each index and the corresponding item of your list?
        
        `f"Item in {index_number} index is {list_name[index_number]}"`
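One possible solution, using `entries` as the list name in place of `list_name`:

```python
entries = ["O09185", "P79734", "P41685", "P04637", "P67938"]

# range(len(entries)) yields every valid index: 0, 1, 2, 3, 4
for index_number in range(len(entries)):
    # Build the message with an f-string, then print it
    line = f"Item in {index_number} index is {entries[index_number]}"
    print(line)
```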

### Looking up items in multiple lists

- Comparing two lists
- Accessing items from multiple lists using the same index number in a `for` loop

Example: 

- If you used the UniProt entry list from the last exercise, make another list that contains the organism corresponding to each entry, i.e. "Chinese hamster", "Zebrafish", "Cat", "Human", "Zebu".
- How can you print each entry together with its organism?
    - Use a print statement like this: "TP53 gene entry of human is P04637"
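A sketch of one way to do this, assuming both lists are in the same order. The first loop uses a shared index, as described above; the second uses the built-in `zip( )`, which was not covered in the webinar but pairs items at the same position without an explicit index. `.lower( )` is used only to match the lowercase organism name in the example sentence.

```python
entries = ["O09185", "P79734", "P41685", "P04637", "P67938"]
organisms = ["Chinese hamster", "Zebrafish", "Cat", "Human", "Zebu"]

# Both lists are ordered the same way, so one index selects matching items
for index in range(len(entries)):
    print(f"TP53 gene entry of {organisms[index].lower()} is {entries[index]}")

# zip() pairs the items at the same position in both lists
for organism, entry in zip(organisms, entries):
    print(f"TP53 gene entry of {organism.lower()} is {entry}")
```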

#### Summary: `for` loop

- `for` loops can be used to repeat a block of code for each item in a list.
- `range( )` creates a sequence of index numbers, so the same index can be used to access corresponding items in several lists.

### Dictionary: `dict( )`

#### working with dictionaries
 
- creating an empty dictionary
- creating a dictionary with key-value pairs
- using a key to look up its value
- manipulating values

#### Example:

Use gene and organism information from the previous lists.

- UniProt entries: "O09185", "P79734", "P41685", "P04637"
- Organisms: "Chinese hamster", "Zebrafish", "Cat", "Human"
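A minimal sketch of creating a dictionary from this example and looking up a value, assuming organism names as keys (as in the Zebu example that follows) and UniProt entries as values. The last lines show an optional shortcut, `dict(zip( ))`, that builds the same dictionary from two lists.

```python
# An empty dictionary
empty_dict = {}

# A dictionary with organism names as keys and UniProt entries as values
organism_entries = {
    "Chinese hamster": "O09185",
    "Zebrafish": "P79734",
    "Cat": "P41685",
    "Human": "P04637",
}

# Using a key to look up its value
print(organism_entries["Human"])  # P04637

# The same dictionary built from two parallel lists with zip()
organisms = ["Chinese hamster", "Zebrafish", "Cat", "Human"]
entries = ["O09185", "P79734", "P41685", "P04637"]
built_from_lists = dict(zip(organisms, entries))
print(built_from_lists == organism_entries)  # True
```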

#### adding or removing items

- adding a new key-value pair to the dict
    - example: "Zebu" -> 'P67938'
- manipulating values
- deleting a key-value pair
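A sketch of adding, manipulating, and deleting entries. The `"UniProtKB:"` prefix is only a hypothetical transformation, chosen to show that a value can be read, changed, and stored back under the same key.

```python
organism_entries = {
    "Chinese hamster": "O09185",
    "Zebrafish": "P79734",
    "Cat": "P41685",
    "Human": "P04637",
}

# Adding a new key-value pair
organism_entries["Zebu"] = "P67938"

# Manipulating a value: read it, transform it, and store it back under the same key
# (the "UniProtKB:" prefix is only an illustrative transformation)
organism_entries["Cat"] = "UniProtKB:" + organism_entries["Cat"]

# Deleting a key-value pair
del organism_entries["Zebrafish"]

print(organism_entries)
```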

#### operating on dictionary items

- getting all the `.keys( )`
- getting all the `.values( )`
- getting all the `.items( )`
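A short sketch of these three methods. They return view objects rather than lists, so `list( )` is used here to print them as plain lists; since Python 3.7, dictionaries keep their insertion order.

```python
organism_entries = {
    "Chinese hamster": "O09185",
    "Zebrafish": "P79734",
    "Cat": "P41685",
    "Human": "P04637",
    "Zebu": "P67938",
}

print(list(organism_entries.keys()))    # organism names
print(list(organism_entries.values()))  # UniProt entries
print(list(organism_entries.items()))   # (organism, entry) tuples
```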

#### Summary: `dict( )`

- Dictionaries are another data type; they store key-value pairs.
- The `.keys( )`, `.values( )` and `.items( )` methods return views of a dictionary's keys, values, and key-value pairs; these can be looped over directly or converted to lists with `list( )`.

## Combining Python Concepts

### Looping on `dict( )` items

#### Questions

- How can I get all the key-value pairs one by one?
- How can I operate on the value of each key?
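One way to answer both questions, sketched with the organism-to-entry dictionary from the example: `.items( )` yields each key-value pair in turn, and looping over the keys lets you store a transformed value back. The `"UniProtKB:"` prefix is again only an illustrative transformation.

```python
organism_entries = {
    "Chinese hamster": "O09185",
    "Zebrafish": "P79734",
    "Cat": "P41685",
    "Human": "P04637",
}

# .items() yields one (key, value) pair per iteration
for organism, entry in organism_entries.items():
    print(f"{organism}: {entry}")

# Operating on each value: loop over the keys and store a transformed value back
# (changing values is safe here because no keys are added or removed)
for organism in organism_entries:
    organism_entries[organism] = "UniProtKB:" + organism_entries[organism]

print(organism_entries)
```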

#### Exercise

**Objective**: Understand how to work with multiple dictionaries that have the same set of keys but different information in their associated values.

- Create another dictionary that uses the organism as key (like the previous dictionary), but has UniProt entry names as the associated values.
    - To continue using the same UniProt example, use this set of key value pairs:
        - Chinese hamster -> P53_CRIGR
        - Zebrafish -> P53_DANRE
        - Cat -> P53_FELCA
        - Human -> P53_HUMAN
- Use a `for` loop to access the keys of one dictionary and print information from both dictionaries.
- Discuss where in your work such concepts will be useful.
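One possible solution, assuming the organism-to-entry dictionary from the earlier example (names such as `organism_entry_names` are illustrative). Because both dictionaries share the same keys, each key from one can be used to look up the matching value in the other.

```python
organism_entries = {
    "Chinese hamster": "O09185",
    "Zebrafish": "P79734",
    "Cat": "P41685",
    "Human": "P04637",
}

organism_entry_names = {
    "Chinese hamster": "P53_CRIGR",
    "Zebrafish": "P53_DANRE",
    "Cat": "P53_FELCA",
    "Human": "P53_HUMAN",
}

# The same key (organism) looks up information in both dictionaries
for organism in organism_entries:
    entry = organism_entries[organism]
    entry_name = organism_entry_names[organism]
    line = f"{organism}: entry {entry}, entry name {entry_name}"
    print(line)
```

If the second dictionary might be missing a key, `organism_entry_names.get(organism)` returns `None` instead of raising a `KeyError`.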