# While Loops #

While loops are one of Python's two essential kinds of loops. We have already seen for loops, which repeat an operation once per item: *for item in iterable object:*. While loops, on the other hand, are indefinite loops that run until a condition stops being true rather than until an object's items are used up.

While loops are useful for problems where you don't know how many iterations will be needed. 

You might not strictly "know" that with a for loop, either, but there's a countable object (string, list, file, etc.) that "knows" it for you, based on how many individual items (characters, elements, lines) it contains.
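
As a small illustration of a loop whose length isn't known up front, the sketch below counts how many doublings it takes for a number of copies to reach a target. The function name `cycles_to_reach` and its parameters are made up for this example, and it assumes `copies` starts at 1 or more (starting at 0 would never reach the target).

```python
def cycles_to_reach(target, copies=1):
    """Count how many doublings it takes for `copies` to reach `target`.

    Assumes copies >= 1; with copies == 0 the loop would never end.
    """
    cycles = 0
    while copies < target:   # we can't say in advance how many passes this needs
        copies *= 2
        cycles += 1
    return cycles

print(cycles_to_reach(1000))  # 10, because 2**10 = 1024 is the first doubling >= 1000
```

There is no list or string to walk through here, so a for loop has nothing to iterate over; the condition alone decides when to stop.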

A generic while loop might go like this:

```python
i = 0
while i < 10:
    print(i)
    i += 1
```
    
You saw a bunch of other examples in the prep work video, so let's get to using them.

# Random Choices #

(5 points) Last week you generated an arbitrary, fake DNA sequence by producing a list of nucleic acids with the desired composition, and using random.shuffle() to shuffle the list.

If we didn't want to create an unshuffled list of nucleotides and then shuffle it, we could just choose the new nucleotides one at a time out of a list. The function that does this is the choice() function from the random module.

Given a list, choice will pick one item at random from the list. Run the cell below to see.

In [1]:
import random

nucs = ['A','T','G','C']

print(random.choice(nucs))

C


In the cells below, use 1) a for loop and 2) a while loop with a counter to generate a random sequence of an arbitrary length using choice(). Practice writing the code in the form of a function or functions. Some methods which may be useful to you:

- The "".join() string method for joining a list into a string
- The .append() method for adding something to a list
- Initializing an empty list [] and/or counter before a looping process starts
- The range() function for generating a numerical iterable
- You can use <, >, >= or <= in comparison statements, not just == 

In [4]:
seq = ""
size = int(input("Enter number of nucleotides: "))
for i in range(size):
    seq += random.choice(nucs)
print(seq)

Enter number of nucleotides:  5


ACTAT


In [5]:
seq = ""
i=0
while i<size:
    seq += random.choice(nucs)
    i+=1
print(seq)

ACTGT
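
The exercise suggests writing the code as a function or functions. One possible sketch of both loops in function form is shown here; the names `random_seq_for` and `random_seq_while` are invented for illustration, and the list-plus-join approach is just one of several reasonable choices.

```python
import random

NUCS = ['A', 'T', 'G', 'C']

def random_seq_for(size):
    """Build a random sequence of `size` nucleotides with a for loop."""
    picks = []                        # start with an empty list
    for _ in range(size):             # range() supplies the count
        picks.append(random.choice(NUCS))
    return "".join(picks)             # join the list into one string

def random_seq_while(size):
    """Same result, using a while loop with an explicit counter."""
    picks = []
    i = 0                             # counter set up before the loop
    while i < size:
        picks.append(random.choice(NUCS))
        i += 1                        # forgetting this line makes the loop infinite
    return "".join(picks)

print(random_seq_for(5))
print(random_seq_while(5))
```

Returning the string instead of printing it inside the function lets you reuse the result, for example `len(random_seq_for(25))`.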


# Random Choices #

Now consider how you're stopping your while loop. One option is the classic: set a counter to zero outside the loop, and let it run until it hits the desired number of iterations. Another option would be to use the length of the growing sequence, which increases with every iteration.

In the cell below, implement a while loop that uses the length of the growing sequence as its measure of whether it's done.

In [6]:
seq = ""
while len(seq) < 25:
    seq += random.choice(nucs)
print(seq)
print(len(seq))

ACCGATCCGGAACTGTGGATTCGTG
25


# Complex conditions #

In the cell below, implement a while loop that cuts off when the counts of Gs and Cs in the growing sequence reach a given threshold. Instead of saying "I want a sequence 1000 nucleotides long", you'd be saying "I want whatever length sequence gives me at least 250 Gs and 250 Cs". Because of the dynamics of the random process, this won't always be 1000 nucleotides -- you'll get a distribution of sequences and lengths.

The condition that you give for the while loop can be complex: 

```while seq.count("G") <= 250 or seq.count("C") <= 250:```

gives a loop that keeps going as long as at least one of the two counts is <= 250, so it stops only once both counts are greater than 250 (use < 250 if reaching 250 should be enough). We'll spend some time with complex Booleans and the True or False values they yield next week. Your job as the programmer is to think through what has to be true before your loop will stop.

Here, if I said:

```while seq.count("G") <= 250 and seq.count("C") <= 250:```

The loop would stop as soon as *one* of the two counts exceeded 250, which has a different meaning than the original statement above.
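
To see the difference between `or` and `and` without waiting on random draws, here is a small sketch that feeds both loops the same fixed letters and uses a threshold of 2 instead of 250. The input string and variable names are made up for the demonstration.

```python
letters = "GGGCCCAT"   # fixed input so the result is repeatable
threshold = 2

# "or": keep going while EITHER count is still at or below the threshold
seq_or = ""
i = 0
while seq_or.count("G") <= threshold or seq_or.count("C") <= threshold:
    seq_or += letters[i]
    i += 1

# "and": keep going only while BOTH counts are still at or below the threshold
seq_and = ""
i = 0
while seq_and.count("G") <= threshold and seq_and.count("C") <= threshold:
    seq_and += letters[i]
    i += 1

print(seq_or)   # GGGCCC  (stops once both G and C counts exceed 2)
print(seq_and)  # GGG     (stops as soon as the G count exceeds 2)
```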

In [7]:
seq = ""
while seq.count("G") <= 250 or seq.count("C") <= 250:
    seq += random.choice(nucs)
print("G:", seq.count("G"))
print("C:", seq.count("C"))

#both must be > 250

G: 251
C: 275


In [8]:
seq = ""
while seq.count("G") <= 250 and seq.count("C") <= 250:
    seq += random.choice(nucs)
print("G:", seq.count("G"))
print("C:", seq.count("C"))

#stops when one is > 250

G: 245
C: 251


# If...else #

Last week we looked at a very simple if statement. If a condition was met, you'd do the thing, and no other alternative was specified. But what if you want to do the thing under one condition, and do a different thing under another condition?

You use if/else!

```python
for i in range(0,100):
    if i % 2 == 0:
        print("It's even!")
    else:
        print("It's odd!")
```
        
Given the list of charged amino acids ['H','K','R','D','E'], create a function that adds to one counter if the current amino acid is charged, and a different counter if it is not. Use the included sequence of human insulin as your input. It has 19 charged and 91 uncharged amino acids.

In [9]:
sequence = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
charged=0
uncharged=0
for n in sequence:
    if n in ['H','K','R','D','E']:
        charged+=1
    else:
        uncharged+=1
print(charged, uncharged)

19 91
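
Since the exercise asks for a function, here is one way the same counting logic might be wrapped up. The name `count_charged` and the choice to return a tuple are assumptions made for this sketch.

```python
CHARGED = ['H', 'K', 'R', 'D', 'E']

def count_charged(seq):
    """Return (charged, uncharged) counts for an amino acid sequence."""
    charged = 0
    uncharged = 0
    for aa in seq:
        if aa in CHARGED:
            charged += 1      # one counter for charged residues
        else:
            uncharged += 1    # a different counter for everything else
    return charged, uncharged

insulin = "MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN"
print(count_charged(insulin))  # (19, 91)
```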


# While...else #

Like a for loop, a while loop can also take an 'else' clause. What that says is "while the condition is true, do the first thing, but when the condition is no longer true, do the second thing". The else block is skipped if the loop is ended early with break.

```python
i = 0
while i <= 10:
    print(i)
    i += 1
else:
    print("All out of numbers!")
```
    
Once you've entered the else part of the block, if the condition depends on a counter, you can't trick the loop into starting up again. For instance, in the example above, you can't reset the loop by re-assigning a value of 0 to i after the second print statement.
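
The else block runs only when the loop ends because its condition became false. If the loop is left with break (covered in the break section below), the else block is skipped. A minimal sketch, using a made-up function name `find_first_stop` and short codon lists as test data:

```python
def find_first_stop(codons):
    """Report the position of the first stop codon, if there is one."""
    i = 0
    while i < len(codons):
        if codons[i] in ("TAA", "TAG", "TGA"):
            result = "stop codon at position " + str(i)
            break                          # leaving via break skips the else block
        i += 1
    else:
        result = "no stop codon found"     # runs only if the condition became false
    return result

print(find_first_stop(["ATG", "GCC", "TAA"]))  # stop codon at position 2
print(find_first_stop(["ATG", "GCC"]))         # no stop codon found
```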

(5 points) Use a while/else construct to gather input from the user. Ask them for an amino acid three-letter code. Foolproof your handling of the input to get the input to the right case.

Methods to remember:
- input() gets input from the user
- .capitalize() puts a string in capitalized case
- .strip() strips whitespace off your input
- the in operator has an inverse -- the *not in* operator
- the == operator has an inverse -- the != operator

If the name given is not of length 3, or it does not match one of the contents of the all_aminos list, keep looping until your user gets it right. WHILE they are getting it wrong, keep looping, ELSE finally give them your approval.

In [15]:
all_aminos = ["Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]

inp = input("Enter an amino acid three-letter code: ").strip().capitalize()
while len(inp) != 3 or inp not in all_aminos:
    inp = input("Enter an amino acid three-letter code: ").strip().capitalize()
else:
    print("You have my approval.")
    print(inp)

Enter an amino acid three-letter code:  Asp


You have my approval.
Asp


# Infinite loops #

While loops are very flexible, and they can run forever if their condition never becomes false. The loop below generates the Fibonacci sequence, which is also infinite by nature; because a <= b is always true here, the loop never stops.

```python
a = 1
b = 1
while a <= b:
    a,b = b,a+b
    print("a: " + str(a) + " b: " + str(b) + "\n")
```
    
If you put this in a Jupyter notebook and run it, you will probably have to interrupt the kernel or quit out of Jupyter entirely.

In general, we don't want to create infinite loops. If we're dumping big numbers to standard output on the screen, it's not so dangerous, but if we were speedily writing huge Fibonacci numbers to a file on our hard drive, or storing them in a list in memory, things could get VERY VERY BAD very quickly. 

If you think you've made a while loop that might be infinite, before you store its output to a file or list, print first just to be sure.
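
Another cautious habit, sketched here under the assumption that a hard cap on iterations is acceptable while you are testing, is to add a safety counter. `MAX_STEPS` is a made-up name and 20 is an arbitrary limit.

```python
MAX_STEPS = 20   # hypothetical safety limit, only for testing

a, b = 1, 1
steps = 0
while a <= b:                  # always true for Fibonacci, so this alone never stops
    a, b = b, a + b
    steps += 1
    if steps >= MAX_STEPS:     # bail out once the cap is reached
        print("Stopped after", steps, "steps; b =", b)
        break
```

Once you trust the loop's real stopping condition, the cap can be raised or removed.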

In the cell below, see if you can come up with a different way to create an infinite loop. Test it at the Python command line as shown in class. Enter it here, BUT COMMENT IT OUT SO YOU DON'T RUN THE CELL.

In [46]:
# a = 1
# b = 1
# while a <= b:
#     a,b = b,a+b
#     print("a: " + str(a) + " b: " + str(b) + "\n")

# Fibonacci with while/else #

In the cell below, modify the Fibonacci code so that it keeps looping only while a cutoff condition (for example, b <= 100) holds, and print a friendly message to your user in the "else" block.

In [16]:
a = 1
b = 1
while b < 100:
    a,b = b,a+b
    print("a: " + str(a) + " b: " + str(b) + "\n")
else:
    print("Wow! That's a great number.")

a: 1 b: 2

a: 2 b: 3

a: 3 b: 5

a: 5 b: 8

a: 8 b: 13

a: 13 b: 21

a: 21 b: 34

a: 34 b: 55

a: 55 b: 89

a: 89 b: 144

Wow! That's a great number.


# Stopping a loop with break #

We can stop a loop (either for or while) early with some special reserved words that change the control flow. The first is break. When break is reached, the loop ends immediately and the program moves on to the first statement after the loop. To see the effect of this, put the following code into an empty cell and run it.

```python
for i in range(0,10):
    if i == 5:
        break
    else:
        print(i)
```

What would you expect to see as the output of this program?

In the first cell below, use a for loop with a conditional inside it to loop through the sequence and print out each character. If the program encounters a character that is not in ['A','T','G','C'], it should break.

In the second cell below, modify the Fibonacci while loop by adding a conditional inside it. Stop the loop with a break when b gets to a threshold. 

**ADD THE BREAKING CONDITION AND TEST BEFORE YOU RUN IT IN JUPYTER. IF NOT YOU WILL CAUSE A JUPYTER KERNEL PANIC AND HAVE TO START OVER. ASK ME HOW I KNOW.**

In [17]:
problem_seq = "ATGCATGCATGCBCTGACGTACGAT"

In [18]:
for n in problem_seq:
    if n not in ['A','T','G','C']:
        break
    print(n)

A
T
G
C
A
T
G
C
A
T
G
C


In [19]:
a = 1
b = 1
while a <= b:
    a,b = b,a+b
    print("a: " + str(a) + " b: " + str(b) + "\n")
    if b > 1000:
        break

a: 1 b: 2

a: 2 b: 3

a: 3 b: 5

a: 5 b: 8

a: 8 b: 13

a: 13 b: 21

a: 21 b: 34

a: 34 b: 55

a: 55 b: 89

a: 89 b: 144

a: 144 b: 233

a: 233 b: 377

a: 377 b: 610

a: 610 b: 987

a: 987 b: 1597



# Skipping an iteration with continue #

The other reserved word that changes a loop's control flow is continue. Unlike break, continue does not end the loop: it stops only the current iteration and moves on to the next one. To see the effect of continue, put the following code into an empty cell and run it.

```python
for i in range(0,10):
    if i == 5:
        continue
    else:
        print(i)
```

In the first cell below, use a for loop with a "continue" on the same sequence you used "break" on above.

In the second cell below, "continue" the Fibonacci loop if "b" is odd, and print if it's even. You'll still have to stop the loop with a cutoff criterion if you don't want it to be infinite. Bonus question: in the Fibonacci series < 1000, are Fibonacci numbers more likely to be odd or even, or is it equal?

**MAKE SURE TO ADD A CUTOFF CONDITION BEFORE YOU TRY RUNNING THE CELL**

In [20]:
for i in range(0,10):
    if i == 5:
        continue
    else:
        print(i)

0
1
2
3
4
6
7
8
9
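
The exercise also asks for continue on the same sequence that was used with break. A possible sketch is below; it redefines `problem_seq` so the block runs on its own, and the `cleaned` list is an addition made for this example.

```python
problem_seq = "ATGCATGCATGCBCTGACGTACGAT"

cleaned = []
for n in problem_seq:
    if n not in ['A', 'T', 'G', 'C']:
        continue              # skip the bad character and go on to the next one
    cleaned.append(n)         # only reached for valid nucleotides

print("".join(cleaned))       # ATGCATGCATGCCTGACGTACGAT
```

Where break stopped at the B, continue skips it and keeps going, so everything after the B is still processed.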


In [21]:
a = 1
b = 1
while b < 100:
    a,b = b,a+b
#     print("a: " + str(a) + " b: " + str(b) + "\n")
    if b % 2 == 1:
        continue
    else:
        print(b)

#odd + even = odd
#odd + odd = even
#even + even = even
#more likely to be odd because the sequence cycles like this (starts at 0+1 --> even+odd):
#1. even + odd = odd
#2. odd + odd = even
#3. odd + even = odd
#4. even + odd = odd
#5. odd + odd = even
#6. odd + even = odd
#clearly, there is a 3 step cycle that results in Fibonacci numbers that are odd, even, and then odd
#1/3 of numbers are even
#2/3 of numbers are odd

2
8
34
144
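
The bonus question can also be checked by counting directly. This sketch assumes the series starts 1, 1 as in the cells above and tallies every Fibonacci number below 1000.

```python
a, b = 1, 1
odd = 0
even = 0
while a < 1000:          # cutoff so the loop is not infinite
    if a % 2 == 0:
        even += 1
    else:
        odd += 1
    a, b = b, a + b      # step to the next Fibonacci number

print("odd:", odd, "even:", even)   # odd: 11 even: 5
```

Of the 16 Fibonacci numbers below 1000, 11 are odd and 5 are even, which is close to the 2/3 and 1/3 split the parity cycle predicts.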


# Consume a list with .pop() and stop when you run out #

If we want the loop to keep going for as long as a list still has items in it, we can use the list itself as the condition (an empty list counts as False) and consume the list with the method .pop(). This works only for lists and not for strings, but we also know how to make a string into a list and join it back together.

.pop() removes the last item ([-1]) from a list and returns it. To see how this plays out, run the code in the cell below:

Then try the following. Before the first print statement, insert the following line. 

```all_aminos_r = [i for i in reversed(all_aminos)]```

This is a list comprehension, and it's an easy way to reverse the order of the items in a list and make a new list from them. reversed() doesn't make a new list or modify the existing list; it makes a reversed iterator out of the list, which then has to be consumed somehow.
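
For comparison, here is a short sketch of a few equivalent ways to get a reversed copy of a list. The variable names are made up; all three approaches leave the original list untouched.

```python
letters = ['A', 'B', 'C']

rev_iter = reversed(letters)                         # an iterator, not a list
via_comprehension = [x for x in reversed(letters)]   # consume the iterator in a comprehension
via_list = list(reversed(letters))                   # consume it with list()
via_slice = letters[::-1]                            # slicing with a step of -1

print(via_comprehension, via_list, via_slice)
print(letters)                                       # original order is unchanged
```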

In the second cell below, take the alphabet string "ABCDEFGHIJKLMNOPQRSTUVWXYZ" and make it into a list with list(). Then reverse it as shown above and pop the items out with a while loop.

While loops and other operations that consume a list with .pop() are great for cases where, once you find the thing, you don't need it anymore. This doesn't matter with a short list, but it matters when data gets large in memory or there are a large number of comparisons to be made. .pop() can also be used with an argument, the index of the item to be removed from the list, which makes it useful in many situations.
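
As a sketch of .pop() with an index argument, the loop below takes items off the front of a list with .pop(0) and stops as soon as it finds a target. The list contents and the names `to_check` and `target` are invented for the example.

```python
to_check = ["Ala", "Arg", "Asn", "Asp", "Cys"]
target = "Asn"

while to_check:
    candidate = to_check.pop(0)   # remove and return the FIRST item
    if candidate == target:
        break                     # found it; the rest of the list is left alone

print(candidate)   # Asn
print(to_check)    # ['Asp', 'Cys']
```

Everything checked so far has been removed from the list, so only the unexamined items remain.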

In [22]:
all_aminos = ["Ala", "Arg", "Asn", "Asp", "Cys", "Glu", "Gln", "Gly", "His", "Ile", "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val"]
while all_aminos:
    all_aminos_r = [i for i in reversed(all_aminos)]
    print(all_aminos.pop())
print(all_aminos)

Val
Tyr
Trp
Thr
Ser
Pro
Phe
Met
Lys
Leu
Ile
His
Gly
Gln
Glu
Cys
Asp
Asn
Arg
Ala
[]


In [23]:
abc = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
abc_list = list(abc)
r_abc = [i for i in reversed(abc_list)]
print("r_abc")
while r_abc:
    print(r_abc.pop())
print(r_abc)
print("abc_list")
while abc_list:
    print(abc_list.pop())
print(abc_list)

r_abc
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
[]
abc_list
Z
Y
X
W
V
U
T
S
R
Q
P
O
N
M
L
K
J
I
H
G
F
E
D
C
B
A
[]


# Challenge: mutate a sequence #

(5 points) In the cell below, you have the code that loads in the chloroplast genome sequence we worked with before, as one big sequence. 

Write a program that loops through the characters in a sequence and randomly changes one in every 150 nucleotides to a different nucleotide. 

In order to get string or list elements with a while loop, you'll need to use a counter. This is not the most efficient way to step through a string or list, UNLESS you need to add some other conditions for the loop to continue.

```python
i = 0
while i <= len(string) - 1:
    char = string[i]
    i += 1
```
    
You can combine the basic "are we still in the string" condition with other conditions:

```python
while i < len(string) and percent_changed <= 1.0:
    char = string[i]
    i += 1
```

Your looping should stop when 1% of the total sequence length is substituted, or when the sequence ends. Remember that there's a good chance (25%) that a "random substitution" will yield the same nucleotide that is already there, and that's OK. But if the original sequence doesn't change, don't add to the substitution counter.

Write your new sequence and your original sequence out to two lists. Iterate through them and print out the list index number, the substitution count, the original nucleotide, and the new nucleotide, wherever the two lists are different.
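
One possible shape for this challenge is sketched below. It is not the only valid approach: the function name `mutate_while`, its parameters, and the short made-up test sequence are assumptions for illustration, and the loop stops once 1% of positions have changed rather than just after that point.

```python
import random

def mutate_while(original, one_in=150, max_percent=1.0, seed=None):
    """Return (new_list, changed_count) after random substitutions."""
    rng = random.Random(seed)     # local generator; a seed makes runs repeatable
    new_seq = list(original)
    changed = 0
    i = 0
    # Keep going while we are still inside the sequence AND under the change limit.
    while i < len(new_seq) and changed * 100 < max_percent * len(new_seq):
        if rng.randint(1, one_in) == 1:          # roughly one position in every 150
            new_char = rng.choice("ATGC")
            if new_char != new_seq[i]:           # same letter drawn: not a substitution
                new_seq[i] = new_char
                changed += 1
        i += 1
    return new_seq, changed

orig_seq = list("ACGT" * 1000)    # made-up stand-in for the real genome
new_seq, changed = mutate_while(orig_seq, seed=1)

# Walk both lists together and report the positions where they differ.
count = 0
for i, (old, new) in enumerate(zip(orig_seq, new_seq)):
    if old != new:
        count += 1
        print(i, count, old, new)
```

Using enumerate() with zip() gives the index and both nucleotides in one pass, which avoids indexing the two lists separately.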

In [24]:
def genomic_fasta(genome):
    """genomic_fasta: parses the sequence lines out of a genomic DNA FASTA file
    parameters: expects an open file object
    return: a single DNA sequence string
    """
    lines = genome.readlines() # converts the open file object to a list
    DNASeq = [] # creates a new empty list
    for i in range (0, len(lines)): # iterates over the lines from the file
        if lines[i][0:1] != ">": # filters out the header line that starts with >
            DNASeq.append(lines[i].strip("\n")) # appends the remaining lines to the empty list after stripping
        
    return ''.join(DNASeq)    # joins and returns the DNA sequence lines

DNAFile = open("NC_007898.fasta") # creates a file object from a stored file
DNASeq = genomic_fasta(DNAFile) # passes the file object to the fasta parser function
DNAFile.close() # closes the file object

In [25]:
def mutate(original):
    orig_seq = list(original)
    new_seq = list(original)
    percent_changed = 0.0
    count_changed = 0
    print("i\t", "changes  ", "%\t", "Original ", "Mutation")
    for i in range(len(new_seq)):
        if percent_changed > 1.0:
            break
        if 100 != random.randint(1,150):
            continue
        char = new_seq[i]
        new_char = random.choice(['A','T','G','C'])
        if char == new_char:
            continue
        else:
            new_seq[i] = new_char
            count_changed += 1
            percent_changed = (count_changed / len(new_seq)) * 100
            print(i, "\t", count_changed, "\t  ", round(percent_changed,2), "%\t", orig_seq[i], "\t  ", new_seq[i])

In [26]:
mutate(DNASeq)

i	 changes   %	 Original  Mutation
193 	 1 	   0.0 %	 A 	   T
241 	 2 	   0.0 %	 A 	   C
249 	 3 	   0.0 %	 A 	   C
320 	 4 	   0.0 %	 A 	   G
849 	 5 	   0.0 %	 G 	   A
863 	 6 	   0.0 %	 A 	   C
1138 	 7 	   0.0 %	 C 	   G
1466 	 8 	   0.01 %	 T 	   C
2123 	 9 	   0.01 %	 T 	   C
2453 	 10 	   0.01 %	 T 	   G
2714 	 11 	   0.01 %	 A 	   G
2715 	 12 	   0.01 %	 A 	   T
2890 	 13 	   0.01 %	 T 	   A
3297 	 14 	   0.01 %	 A 	   G
3434 	 15 	   0.01 %	 A 	   C
3956 	 16 	   0.01 %	 A 	   T
3993 	 17 	   0.01 %	 A 	   C
4009 	 18 	   0.01 %	 A 	   G
4094 	 19 	   0.01 %	 G 	   T
4304 	 20 	   0.01 %	 C 	   G
4830 	 21 	   0.01 %	 T 	   G
5367 	 22 	   0.01 %	 C 	   T
5602 	 23 	   0.01 %	 A 	   T
5652 	 24 	   0.02 %	 A 	   T
5664 	 25 	   0.02 %	 T 	   C
5679 	 26 	   0.02 %	 T 	   A
6113 	 27 	   0.02 %	 A 	   G
6425 	 28 	   0.02 %	 G 	   A
6652 	 29 	   0.02 %	 T 	   A
6653 	 30 	   0.02 %	 A 	   T
6806 	 31 	   0.02 %	 A 	   C
7526 	 32 	   0.02 %	 T 	   G
7851 	 33 	   0.02 %	 T 	  