In [None]:
# http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html

# Python Tutorial - Part 1

---


## Using the Tutorial

---

The tutorial provides runnable code to demonstrate the concepts being discussed. You should run these examples and also edit them, as that will help you understand how they work.

There are also numerous exercises and these should be completed in a Linux environment, creating a script for each answer. 

---

## Tutorial Exercises

---

You should complete the tutorial exercises in Linux using a text editor to create Python scripts. The code cells within the Jupyter page are suitable for testing code but using a text editor will help develop your script writing skills.

---

## A Basic Program

Here is a basic Python program that we'll use to get started.

In [1]:

# Program to do the obvious

print ("Hello world!")    # Print a message


Hello world!


<i>Note: In version 2.7 this would be:

print "Hello world!"</i>

Each of the parts will be discussed in turn. 

---

## Comments and statements

Comments can be inserted into a program with the # symbol, and anything from the # to the end of the line is ignored. The only way to stretch comments over several lines is to use a # on each line.

Everything else is a Python statement. 

--- 

## Simple printing 

The print function outputs information; in the above case it prints the literal string "Hello world!".

In Python 3, print changed from a statement to a function (functions are covered later). For the moment, the important part of the change is that the text to be printed is placed within brackets, as shown. In version 2.7 no brackets are used.

---

## Running the program as a script

Type in the example program using a Linux text editor and save it. Gedit is a suitable editor for this.

The easiest way to run the program is to just type the following at the prompt: 

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<b>python progname</b>

If something goes wrong then you may get error messages, or you may get nothing.

When the file is executed Python first compiles it and then executes that compiled version. So after a short pause for compilation the program should run quite quickly.

Make sure your program works as a standalone script running in Linux before proceeding. 

---

# Python Operations and Assignment

---

 Python uses all the usual programming arithmetic operators: 
 
 <b>a = 1 + 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Add 1 and 2 and store in a<br>
 a = 3 - 4&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Subtract 4 from 3 and store in a<br>
 a = 5 \* 6&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Multiply 5 and 6<br>
 a = 16 / 8&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Divide 16 by 8 to give 2.0 (/ always gives a float in Python 3)<br>
 a = 9 ** 10&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Nine to the power of 10<br>
 a = 5 % 2&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Remainder of 5 divided by 2</b>

Try each of the operators in the cell below to test them:


In [2]:

a = 1 + 2

print (a)


3
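The sketch below runs every operator from the list at once, so the results can be compared side by side. The variable names b to f are only illustrative. The comments give the values Python 3 produces.

```python
a = 1 + 2     # 3
b = 3 - 4     # -1
c = 5 * 6     # 30
d = 16 / 8    # 2.0  (/ always gives a float in Python 3)
e = 9 ** 10   # 3486784401
f = 5 % 2     # 1    (the remainder of 5 divided by 2)

print(a, b, c, d, e, f)
```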


To assign values Python includes:

<b>a = b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Assign b to a<br>
a += b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Add b to a<br>
a -= b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Subtract b from a</b>

Test the assignment operators below:


In [3]:

a = 5
b = a
print (a, b)


5 5
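The += and -= operators are not used in the cell above. Here is a minimal sketch of both, assuming a starting value of 5 for a and an illustrative value of 2 for b:

```python
a = 5
b = 2

a += b     # same as a = a + b
print(a)   # 7

a -= b     # same as a = a - b
print(a)   # 5
```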


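Note that a = b makes a refer to the same value as b; it does not link the two names. If you later assign a new value to b (for example b = 7), a keeps its old value. Be aware, though, that assignment does not make a copy. With lists (covered later), a change made in place through one name is seen through the other, because both names refer to the same list.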
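The sketch below shows the behaviour just described. The second half uses lists and the append method, both covered later in this tutorial, to show the case where the two names really do share one value:

```python
a = 5
b = a          # a and b both refer to the value 5
b = 7          # b now refers to a new value; a is unaffected
print(a, b)    # 5 7

# Lists are changed in place, so both names see the change:
x = [1, 2]
y = x          # y refers to the same list as x, not a copy
y.append(3)    # change the list through y ...
print(x)       # ... and x sees it too: [1, 2, 3]
```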

---

# Integers and Floats

You need to be careful with arithmetic in Python 2.7 as numbers are automatically assigned the most appropriate type. For example:

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;10.5 will be assigned as a float

&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;10 will be assigned as an integer (int)

This can cause unexpected problems with division. Try the following:
    

In [4]:

x = 7
y = 2
z = x/y
print (z)


3.5


This prints 3.5, as expected, because this tutorial uses Python 3. However, in Python 2.7 it prints 3, because dividing one integer by another performs integer division. To obtain the correct result in Python 2.7, one or both of the variables x and y need to be specified as a float. To do this:


In [None]:

x = 7
y = 2
z = float(x)/float(y)
print (z)

# or

x = float(7)
y = 2
z = x/y
print (z)


This is only necessary in versions of Python before version 3.
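The reverse situation also comes up. If you want the Python 2.7 style whole-number result in Python 3, you can use the floor division operator //. This operator is not in the operator list above. It rounds the result down to the nearest whole number. A short sketch:

```python
x = 7
y = 2

print(x / y)    # 3.5  ordinary division, always a float in Python 3
print(x // y)   # 3    floor division: the Python 2.7 integer result
print(x % y)    # 1    the remainder left over by //
```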

---




# Printing Embedded Variables

---

The previous print example printed out a literal string. It is also possible to print out a variable in the same way:

<b>print (variable)</b>

Note - for version 2.7: 

print variable


In [None]:

phrase = "Hello world"

print (phrase)


Note that there are no quotes around phrase; otherwise the word "phrase" itself would be printed. It is also possible to print a combination of variables and text:

<b>print ("Hello", variable)</b>


In [None]:

name = "Anastasia"

print ("Hello", name)


print automatically inserts a space between the items separated by commas.

Separating items with commas only works within print. To combine quoted text and variables into a single string that can be stored in a variable, join them with a plus sign. Test the following code and edit it to build a string from several pieces of quoted text and variables, as you did above:


In [None]:

name = "Anastasia"
phrase = "Hello " + name

print(phrase)


Also, when constructing a string, any integer or float variables to be included must first be converted (cast) to a string. This is done with the str() function:


In [None]:

a = 5
b = 10
c = a * b

test_string = str(a) + " multiplied by " + str(b) + " = " + str(c)

print (test_string)



---

## Printing Without a Newline

In both versions, print automatically adds a newline character to the end of the printed string. Sometimes this is not desirable, and it can be prevented. In version 3 this is done by overriding print's end parameter, which is a newline by default, with another string such as a single space. After the text to be printed, add a comma followed by <b>end=" "</b>:


In [None]:

print ("This line does not end with a newline character. ", end=" ")

print ("The second print line does ...")

print ("As does the next ...")



---

## Exercise 1

---

Rewrite the Hello world program so that:

(a) the string is assigned to a variable before it is printed and<br>
(b) this variable is then printed without a newline,<br>
(c) the program combines a quoted string and a variable.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---


# Range Command

---

Ranges are useful for generating a sequence of numbers, and these will be useful in understanding loops (for and while) and conditions (if/else). 

range(10) produces the numbers starting from zero and counting up to (but not including) ten in steps of one. In Python 3, range returns a range object rather than a list (in Python 2.7 it returned a list); list(range(10)) converts it into a list so the numbers can be seen.

range(2, 10) will do the same, only starting from two instead of zero.

range(-3, 3) starts from -3 and ends at 2 (not 3) in steps of one.

The step size can also be altered. Converted to lists, the following ranges give:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;range(0, 10, 2)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[0, 2, 4, 6, 8]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;range(10, 2, -1)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[10, 9, 8, 7, 6, 5, 4, 3]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;range(-5, 7, 3)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;is&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;[-5, -2, 1, 4]<br></b>

The third argument to the range function alters the step size. Making the step size negative will result in a list that counts down.
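The sketch below checks the examples above. Python 3's range returns a range object rather than a list, so each one is wrapped in list() before printing. The variable names are only illustrative:

```python
ten = list(range(10))
from_two = list(range(2, 10))
around_zero = list(range(-3, 3))
evens = list(range(0, 10, 2))
countdown = list(range(10, 2, -1))
threes = list(range(-5, 7, 3))

print(ten)          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(from_two)     # [2, 3, 4, 5, 6, 7, 8, 9]
print(around_zero)  # [-3, -2, -1, 0, 1, 2]
print(evens)        # [0, 2, 4, 6, 8]
print(countdown)    # [10, 9, 8, 7, 6, 5, 4, 3]
print(threes)       # [-5, -2, 1, 4]
```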

---

# For Loop

---

A for loop executes a block of code a set number of times: for example, once for each item in a list. A range can be used for this, which can first be assigned to a variable and then iterated over:





onetoten = range(1,11)
for count in onetoten:
    print (count)
    

Or it can be iterated directly:

In [None]:

for count in range(1, 11):
    print(count)
        

Test both versions of the for loop above.

Note that the first line of the loop ends in a colon. This, together with the word <b>for</b>, tells the interpreter that this is a for loop. It also tells the interpreter that the indented block below is the code to be executed repeatedly, once for each element, until the last element in the list is reached.

Note that, unlike many other languages, which use curly braces or keywords to mark code blocks, Python uses indentation. The loop body ends at the first line that is no longer indented beneath the for statement. This is also true for if/else statements and while loops.
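Here is a small sketch of how indentation decides what is inside the loop. The variable name total is only illustrative. Only the two indented lines are repeated:

```python
total = 0
for count in range(1, 4):
    total += count            # indented: runs on every pass (count is 1, 2, 3)
    print("Added", count)     # indented: also repeated
print("Total is", total)      # not indented: runs once, after the loop ends
```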

---

## Exercise 2

Create a script using the above 2 versions of the for loop code and test it.

Modify the code to calculate the factorial of 5.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)


---


# Conditions

The next few structures rely on a test being true or false. In Python any non-zero number and non-empty string is counted as true. The number zero and the empty string are counted as false. 
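You can check these rules with Python's built-in bool() function, which is not otherwise used in this tutorial. It reports whether a value counts as true or false. A sketch:

```python
print(bool(42))      # True:  a non-zero number
print(bool(-1.5))    # True:  negative numbers are non-zero too
print(bool(0))       # False: the number zero
print(bool("ACGT"))  # True:  a non-empty string
print(bool(""))      # False: the empty string
print(bool(" "))     # True:  a single space is not empty
```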

---

# If/Else

---

As with other languages Python includes if/else statements. These are of the following form: 
    

In [10]:
for i in range(-3, 4):
    if i > 0:
        print("The number " + str(i) + " is greater than 0")
    else:
        print("The number " + str(i) + " is zero or less")

The number -3 is zero or less
The number -2 is zero or less
The number -1 is zero or less
The number 0 is zero or less
The number 1 is greater than 0
The number 2 is greater than 0
The number 3 is greater than 0



The "if" statement tests if "i" is greater than 0. If it is then the message "The number is greater than 0" is printed and the "else" statement is skipped. If the "if" statement is false the "else" part of the code block is entered and "The number is zero or less" is printed.

It is also possible to include more alternatives in a conditional statement: 


In [None]:

for i in range(-3, 4):
    if i > 2:
        print("The number", i, "is greater than 2")
    elif i > 0:                 # If above fails, try this
        print("The number", i, "is greater than 0")
    elif i < 0:                 # If that fails, try this
        print("The number", i, "is less than 0")
    else:                       # Now, everything has failed
        print("The number is 0")


Note that elif is short for "else if".

Run the code and check that each print statement is called for at least one value of i, then edit the range to test other values.

Other possible tests on numbers and strings are:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a == b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Is a equal to b? Numerical or string.<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Beware: Don't use the single = operator.<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a != b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Is a unequal to b?</b>

You can also use logical and, or and not:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a and b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Are both a and b true?<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;a or b&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Is either a or b true?<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;not a&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Is a false?</b>
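Here is a sketch combining comparisons with and, or and not, using the illustrative values a = 10 and b = 5:

```python
a = 10
b = 5

if a > 0 and b > 0:          # both comparisons must be true
    print("a and b are both positive")
if a == 5 or b == 5:         # at least one comparison must be true
    print("a or b (or both) equals 5")
if not a == b:               # not reverses the result; same as a != b
    print("a and b are different")
```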

The first of these other tests is shown below. Edit the code to test each of the other options and also combinations with elif:


In [None]:

a = 10
b = 5

if a == b:
    print ("a equals b")
else:
    print ("a does not equal b")
    


---

## Exercise 3

---

Write a script that uses a for loop to iterate over a range from 1 to 20 and prints out the numbers that are exactly divisible by 4 and/or by 5, with an appropriate message (e.g. “16 divisible by 4”). Your script should also print out the other numbers, again with an appropriate message (e.g. “3 not divisible by 4 or 5”).

You will find the modulus operator (%) suitable for this, which yields the remainder from a division (e.g. 5 % 2 = 1).


(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---

# While

---

The while loop executes a block of code as long as a condition is true. The following code demonstrates this. It starts a counter at 0 and, <b>while</b> the counter is less than 10, increments it and prints it (so the numbers 1 to 10 are printed):


In [None]:

count = 0
while count < 10:   
    count += 1
    print (count) 
    


The loop repeats until count reaches 10. At that point the condition count < 10 is false, so the loop ends.
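A while loop is most useful when the number of repetitions is not known in advance. In the sketch below, the loop keeps doubling a value until it exceeds 100. It uses *=, which multiplies in place in the same way that += adds. The variable names are only illustrative:

```python
value = 1
steps = 0
while value <= 100:      # keep going while the value is 100 or less
    value *= 2           # same as value = value * 2
    steps += 1
print(value, "reached after", steps, "doublings")   # 128 reached after 7 doublings
```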

The while loop version of the for loop if/else example above is:


In [None]:

i = -3
while i <= 3:
    if i > 0:
        print("The number", i, "is greater than 0")
    else:
        print("The number", i, "is zero or less")
    i += 1    # move on to the next number, otherwise the loop never ends
        


As with the for loop, Python uses indentation, rather than curly braces or keywords, to define the body of a while loop and of any if/else block nested inside it.

---

## Exercise 4


Write a while loop to calculate the factorial of 5.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---



# Lists

---

Before looking at the for loop in more detail the Python list needs to be understood.

A list (similar to an array in other languages) is an ordered collection of items (e.g. numbers and strings). The statements below assign two lists: one a six-element list of strings and the other a four-element list of numbers:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;numbers = [1, 2, 3, 4]</b><br>

It is possible to mix element types in a list:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mixed =  ['Jan', 2, 'Mar', 4, 'May', 'Jun']</b><br>

or to include a list as an element:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mixed =  ['Jan', 2, 'Mar', 4, 'May', ['Mon', 'Tues', 'Wed'], 'Jun']</b><br>

The list is accessed by using indices starting from 0, and square brackets are used to specify the index:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;mixed[2]&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Returns Mar</b><br>

For example, <b>list[2]</b> accesses the element at index 2, which is the third element, as demonstrated below:


In [None]:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
month = months[2]

print (month)

# Assigns 'Mar' to variable month and prints it



In the same way that elements can be accessed, they can also be changed. The following code demonstrates this:
    

In [None]:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
months[3] = 'Aug'

print (months)

# Now months is: ['Jan', 'Feb', 'Mar', 'Aug', 'May', 'Jun']



The append method adds a single item to the end of the list:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;months.append('Jul')</b><br>

The extend method adds items from another list to the end:


In [None]:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
months2 = ['Aug', 'Sep', 'Oct']
months.extend(months2)

print (months)



Insert inserts an item at a given index, and moves the remaining items to the right - <b>list.insert(index, item)</b>:
                                                                                    

In [None]:
 
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
months.insert(3, 'Dec')

print (months)



You can also remove items from a list.

The <b>del</b> statement can be used to remove an individual item, or to remove all items identified by a slice. A slice identifies a starting and ending index:
    

In [11]:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']    
del months[2]

print (months)
            

['Jan', 'Feb', 'Apr', 'May', 'Jun']



The <b>pop</b> method removes an individual item and returns it. With no index specified it removes the last item in the list:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;month = months.pop()&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;last item<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;month = months.pop(0)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;first item</b><br>

The <b>del</b> statement and the <b>pop</b> method do pretty much the same thing, except that pop returns the removed item.

The <b>remove</b> method searches for an item, and removes the first matching item from the list:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;months.remove('Jul')</b><br>

The order of the list can be reversed or sorted:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;months.reverse()<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;months.sort()</b><br>

The length of a list can be returned:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;length = len(months)</b><br>

The presence of an item can be checked:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 'Jul' in months:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print ("In list")<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 'Jul' not in months:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print ("Not in list")</b><br><br>

The index position of the first occurrence of an item in a list can be returned:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;index = months.index('Jul')</b><br>

Test the various list options in the coding cell below:
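Here is one possible set of tests, as a sketch. It applies the methods described above to the months list in turn, with comments showing the result of each step:

```python
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

months.append('Jul')    # ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul']
last = months.pop()     # removes and returns the last item, 'Jul'
first = months.pop(0)   # removes and returns the first item, 'Jan'
months.remove('Apr')    # ['Feb', 'Mar', 'May', 'Jun']
months.reverse()        # ['Jun', 'May', 'Mar', 'Feb']
months.sort()           # alphabetical order: ['Feb', 'Jun', 'Mar', 'May']

print(last, first)               # Jul Jan
print(months, len(months))       # ['Feb', 'Jun', 'Mar', 'May'] 4
print('Mar' in months)           # True
print(months.index('Mar'))       # 2
```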


---

# Strings as Lists

---

A string is essentially a list of characters and can be treated as such, for example in a for loop:


In [None]:

for c in "ATCGGCAATGCCTGGATA":
    print(c)



This iterates over the nucleotide string and prints each character in turn.
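Strings also support len() and indexing, just like lists. One difference is worth knowing, and this sketch shows it: a string cannot be changed in place. Trying to assign to one of its characters raises a TypeError:

```python
dna = "ATCGGCAATGCCTGGATA"

print(len(dna))   # 18: len() works on strings as on lists
print(dna[2])     # C: indexing starts from 0, as for lists
print(dna[-1])    # A: negative indices count back from the end

# Unlike a list, a string cannot be changed in place
try:
    dna[0] = "G"
except TypeError:
    print("Strings cannot be modified; build a new string instead")
```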

---

# List Slices

---

Slices are used to return part of a list. 

The slice operator takes the form list[first_index:following_index]. The slice goes from first_index up to, but not including, following_index. If the first index is not specified, the beginning of the list is assumed. If the following index is not specified, the rest of the list is assumed. Both positive indices (counting from the start, beginning at 0) and negative indices (counting back from the end, beginning at -1) can be used:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list = ['zero','one','two','three','four','five']<br>
 
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list[0:3]  ->  ['zero','one','two']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list[-4:-2]  ->  ['two','three']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list[-5:6]  ->  ['one','two','three','four','five']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list[3:]  ->  ['three','four','five']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list[:3]  ->  ['zero','one','two']</b>
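You can try the slices above with the sketch below. It uses the name words rather than list, because list is also the name of Python's built-in list() function. The last line shows an optional third slice value, the step:

```python
words = ['zero', 'one', 'two', 'three', 'four', 'five']

print(words[0:3])    # ['zero', 'one', 'two']
print(words[-4:-2])  # ['two', 'three']
print(words[-5:6])   # ['one', 'two', 'three', 'four', 'five']
print(words[3:])     # ['three', 'four', 'five']
print(words[:3])     # ['zero', 'one', 'two']
print(words[::2])    # every second item: ['zero', 'two', 'four']
```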

---

# String slices

With Python a string is essentially a list of characters and so can be manipulated in the same way as a list. This means that some operators that work on lists can also be used on strings:

For example, to extract a substring or get the length of a string:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Extract amino acids 3 to 8 from a protein sequence<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;protein = "MEFTIKRDYFITQLNDTL"<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Note that the third amino acid is index position 2<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seq = protein[2:8]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (seq, "has length", len(seq))</b><br>

Output:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FTIKRD has length 6</b><br>


Lists can be reversed with the reverse() method, but strings have no such method. However, a slice with a step of -1 reverses a list, and hence a string. To reverse a sequence, for example:
	
<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seq = "ATCGGATT"<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;seq = seq[::-1]<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (seq)</b><br>

Output:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;TTAGGCTA</b><br>

---

## Exercise 5

Write a script that creates a list and manipulates it using the methods detailed above.
After you have implemented each method print the list out to ensure that the list has changed as expected.

<b>There is no model answer for this exercise.</b>

Write a script that iterates through the following nucleotide sequence and counts the number of cytosines (C) and thymines (T):

 ACTCGGCTTAAATGTTATTCGTACCTACGCCTT

The individual counts should be printed.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)


---


# Printing Lists

---

As mentioned previously, a for loop iterates over a sequence of items, such as a range. Any list can therefore be used:


In [None]:

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
for month in months:
    print ("The month is:", month )



---

# Enumerating a List

---

When looping through a list you may want to know the index of the current item. You could use the <b>list.index(value)</b> syntax, but there is a simpler way, the <b>enumerate()</b> function. This tracks the index of each item as it loops through the list:


In [5]:
#for counter, value in enumerate(some_list):
#    print(counter, value)

months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']
for index, month in enumerate(months):
    print ("Month", index, "is:", month)


Month 0 is: Jan
Month 1 is: Feb
Month 2 is: Mar
Month 3 is: Apr
Month 4 is: May
Month 5 is: Jun


In [3]:
# enumerate friends
friends = ['swan', 'steph', 'nath', 'pete']
for counter, value in enumerate(friends):
    print(counter, value)
if counter < 5:
    print('go and get some more friends')

0 swan
1 steph
2 nath
3 pete
go and get some more friends




---

# Example to construct a position weight matrix (PWM)

A PWM stores the nucleotide count at each position in a multiple sequence alignment (MSA), for example representing a transcription factor binding motif. 

An example MSA is:

GGTAG
GGTAC
GATAG
GGTCC

To build a PWM of these sequences they first need to be stored in a list:

seq_list = ['GGTAG', 'GGTAC', 'GATAG', 'GGTCC']


The code to build the PWM is:


In [17]:

seq_list = ['GGTAG', 'GGTAC', 'GATAG', 'GGTCC']

# The PWM requires 4 lists, one for each nucleotide count. These lists need
# to be the same length as the MSA sequences, so this value is obtained
# from the first sequence in the list

n = len(seq_list[0])
n



5

In [11]:

# Next the 4 lists are initialised at the required length

A = [0] * n
C = [0] * n
G = [0] * n
T = [0] * n

# In the outer for loop work through each sequence

for dna in seq_list:

    # In the inner for loop use enumerate to obtain each nucleotide
    # in the sequence and its position

    for i, base in enumerate(dna):

        # Check the nucleotide and increment the PWM count at the
        # appropriate position

        if base == 'A':
            A[i] += 1
        elif base == 'C':
            C[i] += 1
        elif base == 'G':
            G[i] += 1
        elif base == 'T':
            T[i] += 1

print(A)
print(C)
print(G)
print(T)



[0, 1, 0, 3, 0]
[0, 0, 0, 1, 2]
[4, 3, 0, 0, 2]
[0, 0, 4, 0, 0]
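A common next step towards a position weight matrix is to turn the counts into frequencies by dividing by the number of sequences. This step is an addition to the original example; many tools go further and convert frequencies to log-odds scores. Here is a sketch for the G row, using the counts printed above:

```python
G = [4, 3, 0, 0, 2]   # G counts per position, copied from the output above
num_seqs = 4          # the number of sequences in seq_list

G_freq = []
for count in G:
    G_freq.append(count / num_seqs)   # fraction of sequences with G at this position

print(G_freq)   # [1.0, 0.75, 0.0, 0.0, 0.5]
```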


 ---
    
## Exercise 6

The above PWM code stores the matrix in 4 individual lists. Modify it so that when these lists are initialised they are stored in a single, nested list. The base counts should then be applied to this nested list in the inner for loop.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---




# Tuples

---

Tuples are like lists, but they cannot be modified. To create a tuple instead of a list, enclose the items in parentheses instead of square brackets.

In general, everything that can be done with tuples can be done with lists, but sometimes it is safer to prevent the contents from being changed:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;list_test = ['zero','one','two','three','four','five']<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tuple_test = ('zero','one','two','three','four','five')</b><br>
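The sketch below shows the practical difference. An item in a list can be replaced, but trying the same thing on a tuple raises a TypeError. Reading items by index works the same way for both:

```python
list_test = ['zero', 'one', 'two']
tuple_test = ('zero', 'one', 'two')

list_test[0] = 'nought'          # an item in a list can be replaced
print(list_test)                 # ['nought', 'one', 'two']

try:
    tuple_test[0] = 'nought'     # the same change to a tuple raises a TypeError
except TypeError:
    print("Tuples cannot be modified")

print(tuple_test[1])             # reading by index works as for lists: one
```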

---

# Dictionaries

Dictionaries are the Python version of the hash, or associative array.

Dictionaries are like lists, but instead of being indexed by position numbers, each value is indexed by a key. A key can be any immutable value, such as a string or a number.

Dictionaries associate a key with a value, an example being associating codons with their amino acids:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'ttt' -> 'F'<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'tta' -> 'L'<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;etc</b><br>

Dictionaries enable this type of data to be stored and handled.

A dictionary is created in a similar manner to a list, but with key/value pairs and using curly braces:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;codons = {'ttt':'F', 'tta':'L', 'gga':'G'}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Note curly braces</b><br>

Or items can be added directly:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;codons['aac'] = 'N'&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;#&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Note square brackets</b><br>

A key can be searched for:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if 'aac' in codons:</b><br>
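Here is a short sketch putting these dictionary operations together, using the codon examples above. The codon 'ccc' is only an illustrative missing key:

```python
codons = {'ttt': 'F', 'tta': 'L', 'gga': 'G'}   # create with curly braces
codons['aac'] = 'N'                              # add a pair with square brackets

print(codons['tta'])                             # look up a value by its key: L

if 'aac' in codons:                              # search for a key
    print("aac codes for", codons['aac'])        # aac codes for N

if 'ccc' not in codons:
    print("ccc is not in the dictionary")
```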

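All the keys can be retrieved with the keys() method and looped over. In Python 3 this returns a view rather than a list. You can iterate over it in the same way, or convert it with list():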


In [16]:

codons = {'ttt':'F', 'tta':'L', 'gga':'G'} 

# Look up each key in the dictionary to print its value

keys = codons.keys()
for x in keys:
    print(codons[x])
    # print (x + ': '+ codons[x])
    

L
F
G



The values in a dictionary can also be retrieved if desired, although it is more usual to require the keys:
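Here is a sketch using the values() and items() methods. In Python 3.7 and later, dictionaries keep their insertion order, so the values should come out in the order the pairs were added. Older versions, as in the output above, may use a different order:

```python
codons = {'ttt': 'F', 'tta': 'L', 'gga': 'G'}

# values() gives only the values; list() turns the view into a list
amino_acids = list(codons.values())
print(amino_acids)               # ['F', 'L', 'G'] in Python 3.7+

# items() gives each key together with its value, handy in a for loop
for codon, aa in codons.items():
    print(codon, "->", aa)
```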



If you need to remove a key:value pair from a dictionary you can use the <b>del</b> statement:


In [37]:

codons = {'ttt':'F', 'tta':'L', 'gga':'G'} 

del codons['tta']

keys = codons.keys()
for x in keys:
    print (x)
    print (codons[x])
    

ttt
F
gga
G



---

## Exercise 7

---

a) Write a script in Linux that creates a dictionary for the months of the year. The keys should be the number of the month and the value the actual month. For example:

    1 -> January
    2 -> February
    etc …

Test your dictionary with a for loop that iterates over the keys and prints out the key and value.

b) Modify your script to prompt the user for a number and print out the month with an appropriate message.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---


# File Handling

---

Basic file handling is very straightforward.

To write to a file:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file = open("test.txt","w")&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Open a file to write to. If it doesn't exist it will be created<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file.write("This Text is going to out file\nSome more text\n")&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Write text to the file<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file.close()&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Close the file</b><br>


To read the entire contents of a file and print them to the screen: 

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file = open("test.txt","r")&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# "r" is optional as it is the default<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;text = in_file.read()<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file.close()<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (text)</b><br>

The file can be read one line at a time by storing it in a list with readlines() and iterating over that list in a for loop:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file = open(filename)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in in_file.readlines():<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (line)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file.close() </b><br>

However, readlines() reads the entire file into memory at once, so for very large files this could be a problem. A better method is to iterate over the file object itself, so you don't need readlines() at all: 

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file = open(filename)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (line)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;in_file.close()</b><br>

This avoids holding the whole file in memory. Python reads the file from disc in buffered blocks and passes it to the loop one line at a time.

Either method lets you process the file one line at a time.

Another way of working with files is the <b>with</b> statement, and it is good practice to use it. It gives cleaner syntax and better handling of exceptions (errors).
 
It also closes the file automatically, even if an error occurs, so the clean-up always happens. The syntax is very straightforward:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with open(filename) as file:</b><br>

You can of course also loop over the file as before: 

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with open("newfile.txt") as in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (line)</b><br>

Note that all of the code related to the open file is indented within the <b> with open</b> code block.

There is also no need to close the file, it is done automatically.
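Here is a self-contained sketch that first writes a small file and then reads it back line by line, using the with statement for both. The file name test.txt is only an example. The file is created in the current directory:

```python
with open("test.txt", "w") as out_file:
    out_file.write("This text is going to the file\nSome more text\n")

lines_read = []
with open("test.txt") as in_file:
    for line in in_file:
        lines_read.append(line)
        print(line, end="")   # each line already ends in "\n", so don't add another

# Both files were closed automatically at the end of their with blocks
print(out_file.closed, in_file.closed)   # True True
```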

---

# Writing Embedded Variables to a File

---

When writing text to a file it is constructed as a string, which means any variables need to be appended to quoted text with a plus sign. It also means that non string variables, such as integers, need to be cast as strings - <b>str(int value)</b>:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;name = "Dave"<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;age = 20<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file = open("test.txt","w")<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file.write("Student: Name = " + name + " Age = " + str(age) + "\n")<br><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;out_file.close()</b><br><br>


Note that spaces are not automatically added after each quoted string and a newline is not automatically added. If required both need to be included in the write statement. 
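Here is a sketch of the same example written with the with statement. It reads the file back afterwards to check exactly what was written, assuming test.txt in the current directory is a safe scratch file:

```python
name = "Dave"
age = 20

with open("test.txt", "w") as out_file:
    # age is an integer, so it must be converted with str(); the spaces
    # and the final newline must also be written explicitly
    out_file.write("Student: Name = " + name + " Age = " + str(age) + "\n")

with open("test.txt") as in_file:
    contents = in_file.read()

print(contents, end="")   # Student: Name = Dave Age = 20
```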



---

# Strip

---

Python provides a string method called <b>strip()</b>, which removes whitespace, including newlines, from both ends of a string. It also has variants that strip only one end: <b>rstrip()</b> (the right end) and <b>lstrip()</b> (the left end). The <b>rstrip</b> method can be used to remove the newline from the end of each line:


<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with open(filename) as in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;strip_line = line.rstrip()<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print 
(strip_line)</b><br>

Or more simply:

<b>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with open(filename) as in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in in_file:<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;print (line.rstrip())</b><br>

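Note that an alternative is line = line[:-1], which removes the last character of the line. This only works if the line ends with a newline, which may not be true of the last line in a file.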
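Here is a sketch comparing the three strip methods with the [:-1] alternative. It uses repr() when printing so that otherwise invisible spaces and newlines show up in the output. The sample strings are only illustrative:

```python
line = "  ACTG  \n"

print(repr(line.strip()))    # 'ACTG'      both ends stripped
print(repr(line.rstrip()))   # '  ACTG'    right end only, newline included
print(repr(line.lstrip()))   # 'ACTG  \n'  left end only

# line[:-1] always removes exactly one character. On a line with no
# trailing newline (often the last line of a file) that is real data:
last_line = "GGTA"
print(repr(last_line[:-1]))      # 'GGT'   the final base has been lost
print(repr(last_line.rstrip()))  # 'GGTA'  rstrip() is safe either way
```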

---

## Exercise 8

---

a) Write a script that opens a text file and prints the contents to screen (use the with statement). You can use the entamoeba.txt file for this exercise available on the <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">Exercise Answers</a> page.

Try using the various options to read the contents of the file.

b) Modify the script to write the contents of a file to a second file. Do this twice, the first time as an exact copy and the second removing all of the newlines.

(Answers to all exercises are available <a href="http://teaching.bc.ic.ac.uk/msc/ipython-files/exercises.html">here</a>.)

---

<b>The second part of the Python tutorial is available <a href="PythonTutorial_Pt2.ipynb">here</a>.</b><br><br>



In [None]:
"""

#sys.argv[0] is the read.py file
#sys.argv[1] is the argument passed to read.py
#we set the_file variable to the arg passed to read.py
the_file = sys.argv[1]
print(the_file)


# iterates through file outputing each line to terminal
in_file = open(the_file)
for line in in_file:
	print (line)
in_file.close()

"""