# Introduction

Welcome to the introductory Python module!  This module will guide you through the process of using Python. Don't worry if you don't have any programming experience; we'll start from the basics and don't assume any prior knowledge.

## What is Python?

Python is a **general-purpose programming language**. This means that it is suited for a wide variety of tasks (as opposed to domain-specific programming languages like R, which is used primarily for statistical computing). Python is generally considered an easy-to-use and flexible programming language, and because of this it's used in nearly every field imaginable (including as a teaching language in most introductory CS and Data Science courses, and very frequently in Computational Biology). One of the biggest strengths of Python is its large community and the wealth of **libraries** available for it, which are collections of code written by other people that you can use in your own projects. 

## Interacting with Python through Jupyter

The main way we will be interacting with Python is through an environment called **Jupyter**. As mentioned in the first module, this is the environment we are currently working in.

### What is Jupyter?
*From the Jupyter documentation:* "First and foremost, the Jupyter Notebook is **an interactive environment for writing and running code**."

Jupyter provides an interactive environment called a *notebook*, which enables you to run code and directly observe its output. It also allows you to mix text, LaTeX, images, and more with your code. This document was written using a Jupyter notebook.

Python is also commonly run on the command line. On Mac or Linux you can test this out, since some form of Python is usually pre-installed: open the terminal on your operating system and type `python` (or `python3`) to launch the Python interactive mode. Note that older Macs may have the outdated Python 2 installed, which is no longer supported. We prefer to use Jupyter Notebook (at least for now!) because of its friendlier interface and the ability to have plain text, LaTeX, images, and even videos on the same screen as your code.

Additionally, unlike a Python script, which is a continuous "block" of code, Jupyter Notebooks are split up into **cells** of code that can be run independently.

Jupyter notebooks run in a web browser; however, you do **not** need to be connected to the internet to access or edit them. For the purposes of this module we will be working online through the Berkeley Datahub (visit [datahub.berkeley.edu](https://datahub.berkeley.edu) to create your own notebooks), but if necessary you can install Jupyter on your personal computer.

### What are Cells?
Cells are the modular pieces that make up a Jupyter Notebook. Cells primarily come in two types (although there are more): **Code** and **Markdown**. The cell that you are currently looking at is a Markdown cell: it displays plain text. A code cell, as the name suggests, contains code that can be executed. Here is an example of a cell:

In [None]:
print("Hello, World!")

By default, a new cell is created as a code cell, but you can change a cell from one to the other in the dropdown at the top of this panel.

There are two ways to interact with cells:

**Command Mode:** Press `Esc`. Notice that the cell outline turns blue.<br>
**Edit Mode:** Press `Enter`, or double-click on a cell. Notice it will now have a green outline. As the name suggests, this is the mode to be in if you want to edit the contents of individual cells.<br>
<br>**Common Shortcuts**<br>
* `Ctrl-Enter`: Run current cell. Do this for all example cells presented in this module so you can observe the output (try it out with the example above; you may have to use `Cmd-Enter` on a Mac).
* `Shift-Enter`: Run current cell, and select cell below
* `H`: Show all keyboard shortcuts

There are a lot of great guides in the Help menu at the top of the window as well!

### What is the kernel?
Another term that pops up often is the kernel. The IPython kernel is the "computational engine" that executes the code contained in a Notebook document. You can think of it as the active process that is processing all of the Notebook's activity and executing its Python code.

## Exercises

From now on, each section of this module will feature one or more short exercises, which are designed to get you familiar with using the content presented on your own. Note that you may have to look through the built-in help or at online documentation in order to complete some exercises; this is a useful skill when dealing with unfamiliar code such as that in some of your projects. You should complete the exercises as instructed within this document, and turn in the entire ipynb document for credit for the module.

1. Edit the contents of this cell by adding some lines below this section, and hit `Ctrl-Enter` to display its contents in a nicely-formatted way. Notice the formatting in the text here, such as how I've used **two asterisks** to bold text, *one asterisk* to italicize it, `backticks` to denote code, and hash symbols (`#`) for headers. To get a more thorough understanding of the different formatting options available, look into "Markdown Syntax".
2. The next cell will be stuck in an infinite loop once it is run. Much of the time, infinite loops are problematic, since they will just continuously run without any direct way to stop them. An external influence is needed to "pull the plug". In Jupyter's case, you can interrupt the kernel by pressing the `Stop` (black square) button at the top of the editor panel.<br><br>

In [None]:
# This code is an infinite loop; do not run without reading #2 above
import time  # needed for time.sleep below

while True:
    print('running')
    time.sleep(1)

If you ever encounter any bizarre errors, restarting the kernel is the equivalent of 'turning Jupyter off and on again'. 
__*Note that restarting the kernel will lose variable values and data! This means that if you close the Jupyter notebook and return sometime later, when you return, you'll need to re-run the cells you've already completed. There are a number of useful options in the Cell tab to help you with this.*__
<br><br>

# Python Introduction

## Mathematical Operations

One of the most basic functions of a programming language such as Python is computing mathematical expressions. Listed below in the table are common operations defined natively in Python:

| **Expression Type** | **Operator** | **Example** |
| - | - | - |
| Addition | `+` | `5+6` <br> `>>> 11` |
| Subtraction | `-` | `5-6` <br> `>>> -1` |
| Multiplication | `*` | `5*6` <br> `>>> 30` |
| Division | `/` | `5 / 6` <br> `>>> 0.8333333333333334` |
| Remainder (modulus) | `%` | `5 % 6` <br> `>>> 5` |
| Exponentiation | `**` | `2**4` <br> `>>> 16` |

Try out some mathematical expressions in the space below for yourself:

In [5]:
# Example mathematical expressions
# Feel free to edit this!
2 + (4 - (5%2))

5

## Variables

Something you may have noticed is that you currently have no way of storing the information from your calculations. To do this, we use **variables**. Chances are you've probably run into variables in a math class, representing a quantity that changes, e.g. $x$ is the variable in the polynomial $f(x) = x^2$. In programming languages, a variable is a reserved spot in computer memory that stores a particular value or a reference to an object. Each variable has a *type*, which is the kind of data that is being stored (for example, `integer` or `int` variables are whole numbers, `string` variables store text, etc.); this idea is discussed more in the next section. All languages have variables, but variables in Python are *dynamically typed* (unlike statically typed languages such as Java). This means that you can define a variable `x` to be a certain type (e.g. integer) and then later in your code assign `x` a value of a different type (e.g. string).

To assign values to a variable, you use the `=` symbol:

In [None]:
# The following assigns a the value of 5
# Run this cell with Ctrl-Enter to observe the output
a = 5
print(a)

You may have also noticed the use of the `print()` expression. This is an example of a **function**, which is a way of performing specific tasks. We will learn more about functions, including built-in functions and how to define your own, later on, but for now you should know that `print()` displays the value of whatever is in its parentheses, and can be very useful in debugging your code. 

Something else you may notice here is the comment, which reads `# The following assigns a the value of 5`. The `#` character is used to mark comments, and it indicates to Python that everything from that character to the end of the line should not be read as code. This is very useful for documenting the different parts of your own code, which is then helpful when you are trying to track down errors. Well-commented code is also incredibly useful when collaborating with others, as it may not always be easy to understand each other's code. Comments are also useful for temporarily commenting out sections of code you don't want to run (i.e. without deleting the code), which helps in debugging and when you are editing your code but don't want to lose what you previously had.
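
As a small sketch of the dynamic typing described in this section, the cell below assigns one variable values of two different types. The variable names and values are arbitrary choices for illustration, and the `type()` function used here is introduced properly in the next section.

```python
# value starts out referring to an integer
value = 5
first_type = type(value)
print(first_type)

# Reassigning value to a string is allowed because Python is dynamically typed
value = "five"
second_type = type(value)
print(second_type)
```

In a statically typed language, the second assignment would typically be rejected before the program ever ran.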

## Data Types

The "type" of a variable is the type of data that the variable can "contain." You can find the type of any variable or piece of data with the `type` function. Let's explore some common data types!

### Booleans

`bool` values are the simplest datatype: they contain either `True` or `False`. Later, you'll see Booleans being used for control and loop structures (`if`, `while`, `for`).

In [3]:
print(type(True)) # Note that you can place functions inside of each other

<class 'bool'>


### Integers and Floats

`int`s contain exactly what you might expect: the integers $\dots, -2, -1, 0, 1, 2, \dots$. Unlike some other languages, [integers in Python 3](https://docs.python.org/3.1/whatsnew/3.0.html#integers) are unbounded; that is, there is no theoretical maximum integer that Python can store (in practice, the limit is your computer's memory). `float`s, short for "floating point," represent real numbers with a decimal part. We have been interacting with integers and floats since the start of the module.

The code below prints out the types of `1` and `1.0`. Note the difference between the two.

In [6]:
print(type(1))
print(type(1.0))

<class 'int'>
<class 'float'>


Note: adding floats together can result in some approximation errors. This has to do with the way floats are stored in computer memory: since computers store numbers in base 2, many decimal fractions that look simple in base 10 (such as 0.1) cannot be stored exactly. In practice this isn't always a concern since the errors are small in magnitude, but it can be relevant at times.

In [8]:
print(0.1 + 0.2) # Note how the result is not exactly 0.3, there is a small rounding error

0.30000000000000004
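
Because of this rounding, comparing floats with `==` can give surprising results. One common approach, sketched below, is to compare within a small tolerance using `math.isclose` from Python's standard library. The variable names here are only illustrative.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of the small rounding error
exact_match = (total == 0.3)
print(exact_match)

# Comparing within a tolerance behaves the way you would expect
close_match = math.isclose(total, 0.3)
print(close_match)
```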


### Strings
Simply put, strings are lists of characters. `'hello'`, `'the blue lake'`, and `'15 pounds'` are all strings. In biological data, you will also often see sequences of genes or proteins represented with strings (for example, the string `'RRR'` represents an amino acid sequence of 3 arginines). Just like we can add integers and floats, we can concatenate strings using the addition operator. For example, let's try writing out `ULAB Computational Biology DeCal` by adding each word together:

In [None]:
concat = "ULAB" + "Computational" + "Biology" + "DeCal"
print(concat)

Note that there are no spaces between the words; one of the exercises for this section will be to fix this.

Something else that is important to note is that in Python, it does not matter whether we type a string using single-quotes (`'`) or double-quotes (`"`). Many other programming languages do make a distinction between these two; within your own code, it is recommended to pick one of the two and stay consistent.
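
Here is a short sketch of this point, using made-up variable names. It also shows one practical reason to switch quote styles: a string written with one kind of quote can contain the other kind without any special handling.

```python
single = 'hello'
double = "hello"

# Both quote styles produce the same string
same = (single == double)
print(same)

# Double quotes on the outside let us use an apostrophe inside
sentence = "It's a string"
print(sentence)
```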

## Exercises

Write the answers to these exercises in a code block(s) below this section, labeled with comments indicating which question the code is answering.

1. Calculate `(9 plus 9) times ((99 divided by 9) plus 999) divided by 9`, and set `x` as your answer. Then, print `x`.
2. Fix the output from the **Strings** section above by adding spaces between the words.

# Data Collections

It is often useful to store multiple different related values together. In this section we will go over three data types used for storing collections of data in Python: lists, tuples, and dictionaries.

## Lists

In Python, any kind of data can be put into a `list` if we want to refer to several variables in some order all at once. Examples of lists include `[7,15,3,30]` or `['Canada','Belgium','Brazil','Singapore']`.


**IMPORTANT**: Lists in Python are **zero-indexed**, meaning that the *first* item is considered to be at position `0`. Subsequently, the second item is at position `1`, and so on. This is how most programming languages work, but it differs from R (if you are familiar with that language) and can be somewhat unintuitive at first. Check your code carefully for off-by-one errors as a result of this.

In the code cell below, `a` is set with the list `[1,2,3]`. <br>`b` is then assigned the value of `a`. <br>Finally, `b[0]` (which means "the `0`th item in list `b`") is set to `5`. 

In [None]:
a = [1,2,3]
b = a

b[0] = 5

What is `a`? Think through the above lines, and predict what will happen when we output the value of `a`. Then, run the cell below (once you have run the one above).

In [None]:
print(a)
print(len(a))

That's weird: the value of `a` changed, even when the code only altered `b`. This occurs because the line `b=a` causes the variable `b` to point to the same location in memory as the list `a`. In other words, both `a` and `b` are "pointing" to the **same object**: `[1,2,3]`. Thus, when `b[0]` is set to `5`, the list changes to `[5,2,3]`, and that's what both `a` and `b` now display as their value.

Note that this is not how "plain" variables work; if you assign two variables to be equal to each other, and then reassign one of them later, the original variable will not also change. 
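
To make the contrast concrete, here is a small sketch with illustrative variable names that are not used elsewhere in this module. It also shows one way to get an independent copy of a list, the `copy()` list method, for cases where you do not want changes to be shared.

```python
# Plain values: reassigning second does not affect first
first = 5
second = first
second = 10
print(first)

# Lists: alias refers to the same list object as original
original = [1, 2, 3]
alias = original
copied = original.copy()  # a new list with the same elements

alias[0] = 5
print(original)  # changed, because alias and original are the same object
print(copied)    # unchanged, because it is a separate list

# The "is" operator tests whether two names refer to the same object
print(alias is original)
print(copied is original)
```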

Note how we accessed the 0th element of the list with `b[0]`. If you wanted to access the last element of this list, you could do this in two ways: `b[2]` (since there are 3 total members of the list) or `b[-1]`. The second syntax is more general, and is useful when you don't know how long the list will be but want to take just the last element; negative numbers in indices count backwards from the last position.

Note also the use of the `len()` function. This function is used to give the length of a list.
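
The sketch below puts indexing and `len()` together on a made-up list called `letters`. It is only an illustration, but it shows how positive indices, negative indices, and the length of the list relate to each other.

```python
letters = ["a", "b", "c", "d"]

print(letters[0])   # first element
print(letters[-1])  # last element
print(letters[-2])  # second-to-last element

# The last element is always at index len(letters) - 1
last_index = len(letters) - 1
print(last_index)
print(letters[last_index] == letters[-1])
```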

### Nesting Lists

It is possible to "nest" a list within another list. So, you could have something like `[ [55, 23], 8, 33 ]`.


With this, it's possible to make multidimensional lists, instead of just the "1-D lists" that we have been discussing above. For example, `newList` is a 3 × 3 list, created by placing 3 lists (each with a length of 3) inside an outer list.

In [9]:
newList = [ [12,6,7], [5,8,8], [15,11,3] ]
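
Elements of a nested list are reached by chaining indices: the first index selects an inner list, and the second selects an item within it. The sketch below redefines `newList` so that it runs on its own; the names `middle_row` and `value` are illustrative.

```python
newList = [[12, 6, 7], [5, 8, 8], [15, 11, 3]]

# The first index picks out one of the inner lists
middle_row = newList[1]
print(middle_row)

# A second index picks out an item from that inner list
value = newList[2][0]
print(value)
```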

### Adding Lists

Try running the code below, and then printing out the values of each variable. What do you observe?

In [None]:
sum_1 = [1,2]+[3]
sum_2 = [[1,2]] + [3]

### List Slicing

**List slicing** is a convenient syntax for selecting elements of a list. For example, suppose we have the following list:

In [10]:
ex = [1,2,3,4,5,6,7,8]

Now suppose we wanted to take the 3rd through 5th elements of this list, and assign them to a new list. We could use the following syntax to do this:

In [None]:
ex2 = ex[2:5]
print(ex2)

Try running the above code to see what happens. The selection `a:b` within square brackets will take the elements of the list at indices from `a` up to and NOT including `b`; notice here that since Python lists are 0-indexed, the third element is at index 2 and the 5th element is at index 4, so this extracts the values that we want.

This syntax can also be more flexible. For example:

In [None]:
# Select everything from the second element of the list onwards
print(ex[1:])

# Select everything up to and including the 5th element
print(ex[:5])

# Select everything up to, but not including, the last element of the list
print(ex[:-1])

# Assign different values to multiple elements of the list
ex[2:4] = ["a","b"]
print(ex)
# Note what happens to ex2 when we do this:
print(ex2)

Something interesting to note is that a string behaves like a list of characters; this means that you can use the same indexing and slicing syntax of a list to extract characters from a string. Try this yourself in the console!
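
As a brief sketch of this, the cell below indexes and slices a short DNA string. The sequence and variable names are made up for illustration.

```python
dna = "ATGGCCTAA"

# A single character, using the same zero-based indexing as lists
first_base = dna[0]
print(first_base)

# The first three bases (indices 0, 1, and 2)
first_codon = dna[:3]
print(first_codon)

# The last three bases, using negative indices
last_codon = dna[-3:]
print(last_codon)
```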

## Tuples

**Tuples** behave a lot like lists, with one key exception: they are *immutable*. This means their values cannot be altered. This is in contrast to a list: since you can change the values of elements of a list after you have created it, lists are *mutable*. Lists, strings, and tuples are examples of *sequence data types*, which are data types in Python designed for working with sequences of data.

You can define a tuple and extract its elements with the following syntax:

In [None]:
# Define a 4-membered tuple
t = ("a","b","c",1)

# Extract the first two elements of the tuple
print(t[:2])
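
To see immutability in action, the sketch below tries to change an element of a tuple. Note that `try`/`except` is not covered in this module; it is used here only so that the cell runs to completion instead of stopping at the error. The `changed` variable is an illustrative addition.

```python
t = ("a", "b", "c", 1)

try:
    t[0] = "z"  # tuples do not support item assignment
    changed = True
except TypeError as err:
    changed = False
    print("TypeError:", err)

# The tuple still has its original contents
print(t)
```

The same assignment on a list would succeed, which is the practical difference between the two types.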

## Dictionaries

It is often useful to have the ability to have a list-like object where each element is associated with a name. For example, suppose we are storing a list of people and their individual grades. We would usually like to access an individual's grade using their name, without having to remember which position they are in a list. Python has a solution for this sort of problem: the **dictionary**. This data type allows you to store *key-value pairs*: the *key* can be thought of as the "name" of the element, and the *value* is the element itself. The one requirement of a dictionary is that its keys be unique; in other words, each key must refer to only one value.

We can define a dictionary and access its elements as follows (run the code below):

In [None]:
grades = {"Joe":82, "Jane":95, "Bob":20}

# Find Joe's grade
print(grades["Joe"])

# Add a new person to the dictionary
grades["Jack"] = 87
# This also works:
grades.update([("Jill",92)])
print(grades)

Note the last thing we did: we used `grades.update([("Jill",92)])` to add the element for `"Jill"` to the dictionary; however, there doesn't seem to be an assignment here. This sort of function, which is called with the `.` syntax, is called a **method**; each Python **object** (for example, a string, integer, list, etc.) has methods associated with it depending on its data type, which can operate on that object. You will learn more about objects and methods in future modules.
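
A few other common dictionary operations are sketched below, assuming the same starting `grades` dictionary as above. The `in` operator and the `get()` method are both standard Python; the name `"Alice"` and the fallback text are made up for illustration.

```python
grades = {"Joe": 82, "Jane": 95, "Bob": 20}

# Check whether a key is present
has_jane = "Jane" in grades
print(has_jane)

# Looking up a missing key with square brackets raises a KeyError;
# the get() method returns a fallback value instead
missing = grades.get("Alice", "no grade recorded")
print(missing)

# Assigning to an existing key replaces its value, since keys are unique
grades["Bob"] = 75
print(grades["Bob"])
print(len(grades))
```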

## Exercises

Write the answers to these exercises in a code block below, as you did in the previous section.

1. Look up the documentation on lists and the different methods available for them (the official Python documentation can be found at [https://www.python.org/doc/](https://www.python.org/doc/) , and the relevant section for this problem can be found in the official tutorial). Use one of these methods to print the reverse of the list `ex` in the section above, and use another method to add the list `ex2` as an element at the end of `ex` (print `ex` again after you do this).
2. Look up the "join" method for strings. Use this along with a list to more elegantly solve exercise 2 in the previous section.

# Control Flow

**Control Flow** is syntax that allows you to control the order in which parts of your code are run (the "flow" in the name). This module will provide a brief overview of the three main control flow constructs: `if`, `while`, and `for` statements. Further modules will give more detail on each of these.

## If Statements

**If statements** are an extremely useful control structure which allow you to selectively evaluate parts of your code based on certain conditions you set. They rely on **conditional expressions**, which are expressions that are either `True` or `False`. These expressions therefore have `boolean` values. Here are some examples:

In [None]:
a = 5 < 3
b = 2 != 4
c = "a"=="a"
print(a)
print(b)
print(c)

Note the use of `==` to test whether two objects are equal, and `!=` to test whether two objects are not equal. In the first statement, since 5 is not less than 3, `a` has a value of `False`. In the next two examples, the expressions are true, so the variables have the value of `True`. You can also combine conditional expressions like this:

In [None]:
# Take some time to break this apart and understand what each piece is doing
# You may find it helpful to create smaller versions of this statement to test each piece
# Pay attention to parentheses!
print( ((4%2 == 0) and (12 != 2)) or (5 < 3) )

Let's take this apart piece-by-piece. We start with the first set of parentheses: this tests whether 4 is divisible by 2 (since it tests whether 4 mod 2 is equal to 0). The second set of parentheses tests whether 12 is not equal to 2. Both of these expressions are `True`. In between them we have `and`, which gives a value of `True` only if *both* expressions on either side of it are `True`. Since this is the case, the `and` expression is `True`. Next, we have the `or (5 < 3)` portion. The part in the parentheses we know to be `False`. However, `or` will return `True` if *at least one* of the expressions on either side of it is `True`. Since the part on the left is `True`, the entire expression is thus `True`. Note that this usage of `or` is a bit different from how we use the word "or" in English. Normally, when we say "or", we exclude the possibility that *both* statements are true; however, in Python this is not the case. Another logical operator not used here is `not`, which gives a value of `True` if the expression to its right is `False` and vice-versa.
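
The sketch below illustrates the two points made at the end of this explanation: `or` is still `True` when both sides are `True`, and `not` flips a boolean value. The variable names are illustrative only.

```python
# or is "inclusive": it is True when both sides are True
both_true = True or True
print(both_true)

# not flips the value of the expression to its right
negated = not (5 < 3)
print(negated)

# not can be combined with and / or
combined = (4 % 2 == 0) and not (12 == 2)
print(combined)
```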

Returning to if statements, we use these to execute certain pieces of code depending on the value of conditional expressions. For example, suppose we wanted to print different things based on the value of a number: if the number is even we want to print "even", and if the number is odd we want to print "odd". We can do this as follows:

In [None]:
# The number
x = 55

if x % 2 == 0: # Even
    print("even")
else:
    print("odd")

Try changing the value of x for yourself in the above code, and rerunning it. Notice the indentation between the different print statements. This is not just for stylistic reasons: Python denotes the inside of `if` statements by this indentation, and code that is not indented is considered not part of the statement. A common standard is 4 spaces of indentation for each level of indentation; Jupyter will do this for you automatically. 

Notice here that we used `else` for all cases in which the number is not even. This works here, but you may want to test for other conditions. There is a syntax for doing this in Python: `elif` (which means "else if"). For example, suppose we had the same example as above, with the added conditions that if the number is 0 we want to print "zero", and if the number is divisible by 3 we want to print "divisible by 3". We can do this as follows:

In [None]:
if x % 2 == 0: # Even
    if x == 0:
        # Note that we have nested one if statement within another
        # This is because 0 is even, so we still need to catch it here
        print("zero")
    else: 
        print("even")
elif x% 3 == 0:
    print("divisible by 3")
else:
    print("odd")

Work through the above with several numbers, and make sure that you understand how the logic flows. Note that you can have any number of `elif` statements in a row before your final `else`. You also aren't required to have any `else` statement, although you should make sure that you are able to catch all cases that you care about in the event you decide to exclude an `else` statement. 

## While Statements

**While statements** are used to loop through a set of code as long as a certain condition is `True`. For example, suppose you were asked to print out every number from 1 to 10 on a separate line. You could simply use a series of 10 print statements; however, this is extremely repetitive and doesn't scale very well (what if I had asked you to print out 100 numbers for example?). Instead, we could solve this with a `while` loop as follows:

In [None]:
# Set an initial value
a = 1
while a <= 10:
    print(a)
    a = a + 1 # Increment the value of a

Try running the code above to observe the output. The way this works is as follows:

1. First, before the loop, we *initialize* the variable `a` to the value 1.
2. We start the while loop with the condition that `a` must be less than or equal to 10.
3. We print the value of `a`
4. We *increment* the value of `a` by one, meaning we add one to it before the end of the loop. This means that `a` will increase for every iteration of the loop (every time the loop runs). Thus, the loop will not go on forever; once `a` is greater than 10, the loop will stop.

If you are not used to expressions like the one in step 4, it may seem counterintuitive to you. In this case, remind yourself that `=` in Python refers to *assigning* a variable a certain value. In this case, the expression on the right of the `=` is 1 added to the current value of `a`. This *then* gets assigned to `a`, becoming its new value. Incrementing a value is a fairly common operation, so there is a shorthand syntax for it: `a += 1` is equivalent to `a = a + 1`.
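
Here is a short sketch of the shorthand in use. The `-=` and `*=` operators shown at the end work the same way for subtraction and multiplication; the variable `total` is made up for illustration.

```python
a = 1
a = a + 1   # a is now 2
a += 1      # the same operation in shorthand; a is now 3
print(a)

# Related shorthands exist for other operators
total = 10
total -= 4  # total is now 6
total *= 2  # total is now 12
print(total)
```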

Note once again the importance of indentation: only the indented lines are part of the loop body and are repeated.

## For Loops

A **for loop** allows you to loop over a collection of items, and perform some operation *for* each member of the collection. For example, suppose we wanted to print all even numbers within a given list:

In [None]:
numbers = [1,4,6,9,23,54]

for num in numbers:
    if num % 2 == 0:
        print(num)
    else:
        continue

In the syntax of the for loop, we state that we are looping over every `num` in `numbers`. Note that within the body of the loop, we are thus able to use `num` to refer to a specific element of the list `numbers`. Within the loop body, we have an if statement to test if the number is even. Note that in the else portion, we use the word `continue`. This statement, if encountered inside a loop, will immediately cause the loop to move to the next value. It is not strictly necessary in this case (try removing the entire `else` portion and re-running the code), but is included to demonstrate how it works.
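
For comparison, here is a sketch of the same loop with the `else` portion removed. The counter `even_count` is an addition for illustration, showing that the loop body can also update a variable on each pass.

```python
numbers = [1, 4, 6, 9, 23, 54]
even_count = 0

for num in numbers:
    if num % 2 == 0:
        print(num)
        even_count += 1  # keep track of how many even numbers we have seen

print(even_count)
```

Odd numbers simply fail the `if` test, so nothing happens for them and the loop moves on to the next value by itself.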

## Exercises

Once again, give the solutions to these in a code block below.

1. Look up the `range()` function. Use it along with a for loop to solve the problem I presented in the section on while loops.
2. Use a while loop to add up all numbers from 1 to 100, and print the final output. Do the same with a for loop (Hint: use the `range()` function).
3. Find the sum of every number between 1 and 100 that is divisible by 6, using the concepts covered in this section.

# Functions

Up until this point you have mostly used built-in functions to perform tasks. However, when working on a real-world problem you will often find that some of your tasks will repeat themselves. In such cases, instead of copy-pasting code, it is usually more efficient (and makes your code more readable) to write a separate function of your own, which you then can call whenever you need it.

Let's go over the basic parts of a function: A function has a **name**, **arguments** (which are the values you pass into the function within parentheses), and a **return value** (the output) which can be assigned to a variable. The basic syntax for creating a function is:

In [14]:
def mean(in_list):
    """Returns the mean for a given list of numbers"""
    return sum(in_list)/len(in_list)

# Example usage:
list1 = [1,2,3,4]
mean(list1)

2.5

The above example function is used to calculate the mean of a given list of numbers. The function starts with the keyword `def` ("define"), and then gives the name of the function (`mean`) and its arguments (`in_list`). The **arguments** are the inputs to the function. The function then performs operations on the arguments, and ends in a `return` statement which gives the output of a function. Note also the use of the triple quotes in the second line: this creates what's called a **docstring**, which is a description of the purpose of your function used in documentation (it's generally good practice to include one for each function you create).
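
As a small sketch of what the docstring is for, the cell below redefines `mean` so that it runs on its own and then reads the docstring back from the function. Python stores it in the function's `__doc__` attribute, and the built-in `help()` function displays the same text.

```python
def mean(in_list):
    """Returns the mean for a given list of numbers"""
    return sum(in_list) / len(in_list)

# The docstring is stored on the function object itself
doc = mean.__doc__
print(doc)

# The return value can be assigned to a variable
result = mean([1, 2, 3, 4])
print(result)
```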

While this function was relatively simple (we were able to return a value immediately), you can perform any computation inside a function. For example, here is a more biologically-related example: sometimes, when given a DNA sequence, you will want to know its *GC-content*, which is the proportion of its sequence that is composed of Guanine and Cytosine bases (G and C in the standard notation). To calculate this, we could define a function as follows (make sure to run the code below):

In [16]:
def gc_content(sequence):
    """Returns the GC content of a given input sequence"""
    
    # First, let's make a placeholder variable to store the counts:
    gc_count = 0
    
    # Next, let's convert all of the letters in the sequence to lowercase
    # This makes the function work on strings with any case
    # You can find this and other useful string methods in the Python official documentation
    sequence = sequence.lower()
    
    # Next, let's loop over the input sequence to count all G and C bases
    # We'll use a for loop, as in the control flow section
    for base in sequence:
        if base in "gc": # Tests whether the base is either g or c (is "in" the given string "gc")
            gc_count += 1
            # The above is the same as gc_count = gc_count + 1
        
    # Now that we've counted the number of gc bases, we want to calculate the proportion:
    content = gc_count/len(sequence)
    # Return Statement
    return content

# Example Usage
# The below is the sequence for the human SRY gene, a short human gene sequence
# Note the use of the \ character at the end to allow the string to extend multiple lines
seq = "AGAAGTGAGTTTTGGATAGTAAAATAAGTTTCGAACTCTGGCACCTTTCAATTTTGTCGCACTCTCCTTG\
TTTTTGACAATGCAATCATATGCTTCTGCTATGTTAAGCGTATTCAACAGCGATGATTACAGTCCAGCTG\
TGCAAGAGAATATTCCCGCTCTCCGGAGAAGCTCTTCCTTCCTTTGCACTGAAAGCTGTAACTCTAAGTA\
TCAGTGTGAAACGGGAGAAAACAGTAAAGGCAACGTCCAGGATAGAGTGAAGCGACCCATGAACGCATTC\
ATCGTGTGGTCTCGCGATCAGAGGCGCAAGATGGCTCTAGAGAATCCCAGAATGCGAAACTCAGAGATCA\
GCAAGCAGCTGGGATACCAGTGGAAAATGCTTACTGAAGCCGAAAAATGGCCATTCTTCCAGGAGGCACA\
GAAATTACAGGCCATGCACAGAGAGAAATACCCGAATTATAAGTATCGACCTCGTCGGAAGGCGAAGATG\
CTGCCGAAGAATTGCAGTTTGCTTCCCGCAGATCCCGCTTCGGTACTCTGCAGCGAAGTGCAACTGGACA\
ACAGGTTGTACAGGGATGACTGTACGAAAGCCACACACTCAAGAATGGAGCACCAGCTAGGCCACTTACC\
GCCCATCAACGCAGCCAGCTCACCGCAGCAACGGGACCGCTACAGCCACTGGACAAAGCTGTAGGACAAT\
CGGGTAACATTGGCTACAAAGACCTACCTAGATGCTCCTTTTTACGATAACTTACAGCCCTCACTTTCTT\
ATGTTTAGTTTCAATATTGTTTTCTTTTCTCTGGCTAATAAAGGCCTTATTCATTTCA"

gc_content(seq)

0.46618357487922707

Notice here that we returned an intermediate variable which we created within the function. What happens if we now try to access that variable outside the function?

In [17]:
content

NameError: name 'content' is not defined

As we can see, we get an error here. This is related to something called **scope**, which is the set of rules that specify where variables can be accessed. The `content` variable exists within the defined function's scope, meaning that it can only be accessed and used within the function. Effectively, the variable does not exist outside of this function. What would happen if we defined it outside the function, and then tried to use the function again?

In [19]:
content = 0.8
gc_content(seq)

0.46618357487922707

We get the same answer! This is because assigning to a variable inside a function creates a variable in the function's own scope, separate from any variable with the same name outside it. More generally, when you access a variable inside a function, Python will first look for the value of that variable within the function's scope (meaning the arguments to the function and the variables defined within the function). If it cannot find the variable there, it will then look outside of the function.
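
The same behavior can be seen in a much smaller sketch. The function `double` and the variable names below are hypothetical and exist only to illustrate scope.

```python
def double(value):
    result = value * 2  # this result exists only inside the function
    return result

# A variable with the same name, defined outside the function
result = "outside"
doubled = double(21)

print(doubled)
print(result)  # the outer variable is untouched by the function call
```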

You can also have some arguments with *default values*, meaning that it is optional to include them as inputs. For example, suppose we wanted an extended version of the above GC-content function that had the option to return the count rather than the proportion. One solution to this is to add another argument, `return_count`, which is a boolean, and based on whether this argument is `True` or `False` we could decide within the function whether to return the count. However, this is likely a rare case, and we don't want to have to write out this second argument every time. We can thus set the *default value* of this argument to `False` (walk through and run each cell below):

In [21]:
def gc_content(sequence, return_count=False):
    """Returns the GC content of a given input sequence, or the GC count if return_count is True"""
    
    # placeholder variable to store the counts:
    gc_count = 0
    
    # convert all of the letters in the sequence to lowercase
    sequence = sequence.lower()
    
    # Next, let's loop over the input sequence to count all G and C bases
    for base in sequence:
        if base in "gc":
            gc_count += 1
        
    # Now, we can test our default argument:
    if return_count: 
        # Note that since return_count is a boolean, we don't have to test whether it's True
        return gc_count
    else:
        content = gc_count / len(sequence)
        return content
    

In [None]:
# Example Usage:
# Returning the content
gc_content(seq)

In [None]:
# Returning the count
gc_content(seq, return_count=True)

In [None]:
# Equivalently, since arguments are matched to parameters by position:
gc_content(seq, True)
# Generally you shouldn't do this; pass optional arguments by name instead
# (as in the previous cell), which makes your code easier to read

# Finding Help and Problem Solving

## Python Documentation

One valuable source of information when solving problems in Python is the official Python documentation. While this can be found [online](https://docs.python.org/3/), you can also access these docs directly inside of JupyterLab. Simply click the "Help" button at the top of the window, and you can see options for "Python Reference", "IPython Reference", Numpy, Matplotlib, and more. We will be focusing on the Python Reference for now; click it to open another tab within Jupyter (which you can move around like any other Jupyter tab).

When you open this page, you will see the Python documentation for the version you are using. There are a few important links to notice:
1. "Tutorial": This is a built-in Python tutorial. Especially as you are still learning Python features, this can be very useful to reference as you work. I highly recommend you read through it after completing this module. 
2. "Language Reference": This describes in detail all of the built-in language features. You will rarely need to consult it, but if you want a description of how various parts of the language function under the hood, this is a good place to start.
3. "Library Reference": This describes the Python **standard library**, which includes all of the built-in functions and data types, as well as various pre-included libraries which are useful for various tasks. This is something you will reference very often. 

Between the Tutorial, Library Reference, and the documentation for any library you are using, you should be able to solve many of the problems you encounter while coding. You are encouraged to look around through these different sources of information, to familiarize yourself with their layout and the information they provide. 
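
You can also pull up documentation without leaving your code. As a minimal sketch, the built-in `help` function prints the documentation attached to an object; `str.join` is used here only as an example target, and `doc_text` is a name chosen for this illustration.

```python
# Print the documentation for the string method `join`
help(str.join)

# The same text is stored as a string on the object itself
doc_text = str.join.__doc__
print(doc_text)
```

In a Jupyter notebook, running `str.join?` in a cell shows similar information. These built-in summaries are usually shorter than the Library Reference, so treat them as a quick reminder rather than a replacement.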

## Searching For Help Online

You will often face situations where the documentation alone is not enough to solve a problem. For example, you may know the task you need to do but not the names of the functions or methods that can accomplish it, or you may encounter an unfamiliar error message. In these cases, the best resource to turn to is the internet. You can usually just Google what you are trying to do (keep your searches simple and include "Python" or "Numpy" in the search to find Python/Numpy-specific results) and find many helpful resources.

In addition, you can often copy and paste an error message into Google (perhaps adding a few words describing what you were trying to do) to find resources discussing causes and solutions. One particularly useful resource is Stack Overflow, a question-and-answer site for programming (part of the larger Stack Exchange network, which covers many other topics). Users post questions, and others pitch in with possible solutions. If you encounter an error in your code, chances are a similar question has been asked there before, and it will often be among the top Google search results for your error message.
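
When an error occurs, the last line of the output gives the error type and a short message, and that line is usually the most useful part to search for. As a small sketch (the variable names here are made up for illustration), the code below triggers a common error on purpose and prints just that line:

```python
gc_fraction = 0.47  # hypothetical value used only for illustration

try:
    # Joining a string and a float with + is not allowed
    message = "GC content: " + gc_fraction
except TypeError as error:
    error_name = type(error).__name__
    # The error name plus its message make a good search query
    print(error_name + ": " + str(error))
```

Searching for the printed line together with the word "Python" will typically turn up explanations of the cause, which here is that the number must be converted with `str()` before it can be added to a string.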

## An Example From a Previous Section

To illustrate the problem-solving process, here is a walkthrough of a problem from a previous section. Run each code chunk as you read it.

The Data Collection section in this module featured the following exercise:
    
    Look up the "join" method for strings. Use this along with a list to more elegantly solve exercise 2 in the previous section.
    
Exercise 2 from the previous section involved adding spaces between words to fix the output of the following code chunk:

In [23]:
concat = "ULAB" + "Computational" + "Biology" + "DeCal"
print(concat)

ULABComputationalBiologyDeCal


The manual method of doing this would of course be to physically add spaces between each of the words. However, we're asked in this problem to look up the `join` method for strings. We know from earlier that we are likely to find this in the *library reference*, since this is where information on built-in types (like strings) is located. Indeed, when we look for the `join` method in the **String Methods** section of the appropriate part of the documentation, we find the following description (you can see it online [here](https://docs.python.org/3/library/stdtypes.html#str.join)):

`str.`**join**(*iterable*):

    Return a string which is the concatenation of the strings in iterable. A `TypeError` will be raised if there are any non-string values in *iterable*, including `bytes` objects. The separator between elements is the string providing this method.
    
There is a fair bit of information we don't need here, but the first and last sentences of the description are crucial. Recall that we are trying to combine (or *concatenate*) multiple strings, with the output including spaces between strings. Here, we can see that `join` concatenates strings presented in the argument *iterable*, which from its name we can deduce is an iterable object like a list. However, we need to know what actual string to use the method on. This is where the last sentence becomes important: "The separator between elements is the string providing this method." In other words, to make a space separate the elements of the list, we just use the method on a space. This can be done as follows:

In [26]:
sep = " "
words = ["ULAB","Computational","Biology","DeCal"]
sep.join(words)

'ULAB Computational Biology DeCal'

In [27]:
# Equivalently, we could just do the following:
" ".join(words)

'ULAB Computational Biology DeCal'
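
The middle sentence of the description, the one about `TypeError`, matters as soon as the list contains something other than strings. The sketch below uses a made-up list of integers, `counts`, to show the error and one possible way around it:

```python
counts = [12, 7, 30]  # hypothetical list of integers

try:
    ", ".join(counts)
except TypeError as error:
    # join only accepts strings, so integers raise a TypeError
    print("join failed:", error)

# Convert each element to a string first, then join
count_strings = []
for count in counts:
    count_strings.append(str(count))

joined = ", ".join(count_strings)
print(joined)  # 12, 7, 30
```

Note that the separator can be any string, not just a space; here it is a comma followed by a space.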

## Exercise

Write a function that counts the vowels in a given input string, and test it out on the given string.

Write your answer in the code chunk below:

In [28]:
test_string = "This is a test string. You should count 13 vowels."

# Further Reading and Resources

Later modules will cover some more advanced topics. In the meantime, and in general when doing your projects, you may find the following useful:

- The Official Python Documentation: [https://www.python.org/doc/](https://www.python.org/doc/) This includes tutorials, as well as documentation on features of Python and the **standard library** (the collection of libraries that comes included with Python). This will be an incredibly useful resource as you work through your projects.
- Python Like You Mean It: [https://www.pythonlikeyoumeanit.com/](https://www.pythonlikeyoumeanit.com/) This resource covers the basics of Python and **Numpy**, a library used extensively in scientific computing and data science. We will be covering the basics of Numpy next week, when we look at statistical calculations in Python.