## This notebook serves as both an introduction to Jupyter notebooks *and* a brief introduction to Python.

Note that this portion is not a comprehensive discussion of the Python language.  There are many books (each hundreds of pages long) on the subject; the goal here is to introduce some basic concepts that will be used in this workshop.

### With Jupyter notebooks, you can describe what you are doing right next to your code.

This is **very** helpful for yourself and your collaborators.  You WILL forget what you were doing if you come back to the code in 2 weeks (or even tomorrow...).  Code is usually made easier to read by including "comments": remarks or notes that sit directly next to the code.  However, formatted text and equations, possibly with figures, are much easier to read than plain comments.

As we work through this tutorial, I will show pieces of code and attempt to explain the rationale behind each decision.

In the cell below, I will show the most basic introduction to Python:

In [3]:
# You can also describe what you are doing in code-- just start the line with "#"
# These "comments" tend to be short, so more general descriptions or motivation should probably go in the "markdown" cells.

# Below, I declare a variable x and set it equal to 5
x = 5

# Now, I can perform operations on x:
y1 = x*4
y2 = x+5
y3 = x**3 # x "cubed".
print(y1,y2,y3)

20 10 125


### Above, we declared an integer `x` and performed some operations.  It appears trivial, but it covers a number of basic, but important, items.

Any time we use the `=` sign, we are performing an "assignment" of a value to a variable.  In our usual mathematical language, we might read `x=5` as "x equals 5".  While that interpretation is OK here, it becomes a little more confusing if we write,

```
x = 0
x = x + 1
```

This is perfectly valid Python, and the second line is commonly used to increment the value of x; you might use such a line if you were counting sequencing reads by scrolling through a large file.
However, if we read `x=x+1` as a typical algebraic "equation", it does not make much sense.  It is better to read it as "x is assigned the value of x plus 1".  Specifically, we see that the value of `x` starts at zero on the first line.  In the second line, we take the existing value of `x` (which is zero), add 1 to it, and re-assign it to `x`.  Thus, in the end, `x` holds the value 1.     
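
As a minimal sketch of this "assignment" reading, the snippet below increments `x` a few times.  The `+=` shorthand on the last step is not used elsewhere in this notebook; it is shown here only as an equivalent way of writing `x = x + 1`.

```python
x = 0
x = x + 1   # take the current value (0), add 1, re-assign: x is now 1
x = x + 1   # x is now 2
x += 1      # shorthand for x = x + 1: x is now 3
print(x)
```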

### In other languages, you often have to initially decide what "type" a variable is.  Not with Python.

Above, we made `x` equal to an integer.  However, we did not explicitly declare that `x` MUST be an integer.  For example, in C++, you would have to write:
```
int x = 5;
double y = 3.2;
```
(note that a `double` is essentially a non-integer, or "floating-point", number).

In C++ (and other languages), once a variable is declared, you *cannot* change its type.  If you later tried to set `x=4.6`, `x` would remain an integer: the fractional part is discarded and `x` holds 4 (many compilers will warn you about this).  The variable never becomes a "double".

Python is very relaxed about that, trading speed for convenience.  By strictly enforcing "types", software written in languages like C++ can be further optimized for high performance.  However, the loss in performance for Python is negligible for many applications.
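
To illustrate, here is a short sketch in which the same variable name is re-assigned to values of different types.  The built-in `type` function reports what kind of value a variable currently holds.

```python
x = 5
print(type(x))   # <class 'int'>

x = 4.6          # the same name can now hold a non-integer number
print(type(x))   # <class 'float'>

x = 'five'       # ...or a string
print(type(x))   # <class 'str'>
```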

### Note that *for the most part* white space does not matter, EXCEPT at the start of a line:

In [4]:
# These are all the same and valid:
x=5
x   =     5
x  =5

All of the above are valid and equivalent.  The spaces around the `=` do not matter.

### The reason white-space matters at the start of the line is that Python uses the "leading" white space to create "code blocks".  This whitespace can be created either with spaces or with the "Tab" key.  As long as you are consistent, it is OK.

That said, it is recommended to use 4 spaces per indentation level.  This makes the indents large enough to be obvious when reading the code.

For example, we show indentation in the `for` loop below:

In [6]:
# This "import" gives us access to "out of the box" code that lets us generate random numbers
import random

# Generate 5 random integers between zero and 100:
for i in range(5):
    x = random.randint(0,100)
    print(x)
print('Done')

88
46
93
49
97
Done


Above, the `for` loop lets us do something repeatedly-- for the same amount of typing we can do this for 5 random integers, or 5 million.  Each time, the indented code (only two lines) is executed; Python analyzes the "leading" space to determine the code blocks.  

If it helps, you can imagine a small arrow going line-by-line through the `for` loop.  The arrow executes a code statement and goes to the next line.  When it reaches the bottom of the indented block, it jumps back to the start of the loop and goes again.  

Note that `range(n)` is a simple way to generate the numbers 0,1,2,3,..., n-2, n-1.  **Important**: note that `range(5)` starts at zero and ends at 4.  Most people might expect 1,2,3,4,5.
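
A quick sketch of this behavior: wrapping `range` in `list` lets us see the numbers it generates.  The two-argument form `range(1, 6)` is one way to get 1 through 5 if that is what you need.  The variable names are just for illustration.

```python
zero_based = list(range(5))
print(zero_based)    # [0, 1, 2, 3, 4]

one_based = list(range(1, 6))   # start at 1, stop *before* 6
print(one_based)     # [1, 2, 3, 4, 5]
```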

### We can indent as many times as we like to create the logic we need.  However, it is uncommon to indent more than a few times.

Below, we show nested code blocks.  As we loop through a list with a `for` loop, we check each number to determine if it is even or odd.  This introduces "conditional statements":

In [5]:
a_list = [0,5,7,4,1,2]
for x in a_list:
    print('Look at ' + str(x))
    if (x % 2) == 0:
        print(str(x) + ' is even.')
    else:
        print(str(x) + ' is odd.')
    print('...')
print('Done with loop.')

Look at 0
0 is even.
...
Look at 5
5 is odd.
...
Look at 7
7 is odd.
...
Look at 4
4 is even.
...
Look at 1
1 is odd.
...
Look at 2
2 is even.
...
Done with loop.


### Some notes about the code above:
- We declare a **list** by putting items inside the square brackets `[...]`.  Lists can hold anything, even a mix of "types".  For instance, 
```
a_list = [1, 'a', 2.3, 'b']
```
is valid Python.  This list mixes integers, "strings" (letters/words), and "floats" (non-integer numbers).

- The `for` loop allows us to go through the list (`a_list`) one item at a time.  The indented code (lines 3-8) is run *each time* through the loop.
  Since our list contained six items, the loop is run six times.
  
- We also showed a "conditional" statement (the `if...else`).  This allows us to take different actions depending on whether a condition is met.  For instance, if a gene is upregulated, we might take an action.  Otherwise, we might do something else.  Note that the code to be executed is indented further.  These conditional statements can be as complex as you need.

- We used the "modulo" operator (`%`) to get a division "remainder".  For example, `8 % 3` evaluates to 2.  This is because the *quotient* of 8 divided by 3 is 2, with a remainder of 2.  Similarly, `7 % 2` evaluates to 1 since the quotient is 3 with a remainder of 1.  The pattern of `x % 2 == 0` is a very common programming idiom for testing whether an integer is even or odd.  

- We used a "comparison operator" to test whether the current integer was even.
    - A single equals (`=`) means "assignment". i.e. `x=2` can be read as, "set the variable x equal to 2"
    - The double equals (`==`) tests whether the items are equal, i.e. `x==y` can be read as "is x equal to y?"
    - Similarly, we can test if things are not equal (`x != y`), less than (`x < y`), greater than (`x > y`), and so on.
    
    
- Note that to make the `print` statements, we had to "wrap" the variable `x` (which was an integer) by writing: `str(x)`.  By using `str(x)` we were able to express the integer (e.g. 5) as a "string" (e.g. "5").  This allowed us to then "sum/add" it to another string/word.  Otherwise, Python raises an error (a `TypeError`): how can it "sum" an integer and a word?  It knows how to "sum" two strings just by putting them next to each other.  For example, `y="ABC" + "def"` gives `y="ABCdef"`.

- Note that when you write strings/words, you can use either single (`'`) or double quotes (`"`).  These are equivalent:
```
x = 'abc'
y = "abc"
```
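
The operators described in the notes above can be tried out directly.  The sketch below uses arbitrary example values; the variable names (`remainder`, `label`) are only for illustration.

```python
# Modulo gives the remainder after division
remainder = 8 % 3
print(remainder)   # 2

# Comparison operators produce True or False
x = 4
y = 7
print(x == y)   # False
print(x != y)   # True
print(x < y)    # True
print(x > y)    # False

# Convert the integer to a string before joining it to another string
label = 'Look at ' + str(x)
print(label)    # Look at 4
```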

### One additional VERY useful item is the Python "dictionary".  These are essentially "lookup tables" (or "mappings") and are best demonstrated with a couple of examples:

In [6]:
ensg_to_genes = {
    'ENSG00000141510': 'TP53',
    'ENSG00000134323': 'MYCN',
    'ENSG00000171094': 'ALK'
}

# the 'key' can reference anything-- below it points at a list of genes in a hypothetical pathway
pathways = {
    'pathway_A': ['TP53', 'BCL2L12', 'MTOR'],
    'pathway_X': ['MYCN', 'PPARG', 'EGFR']
}

Each "key" (which is unique!) points at a "value"; you will also see these called "key-value pairs".  In the first dictionary (`ensg_to_genes`), the unique "ENSG" IDs map to the common gene name (a string).  In the second dictionary (`pathways`), the unique pathway names point at a list of strings.

The "keys" can be anything unique that Python can "hash" (e.g. strings or numbers, but not lists) and the values can be any valid Python "thing".

To demonstrate their use, imagine you have a list of Ensembl gene IDs and you want the common gene symbol.  Given the `ensg_to_genes` dictionary above, you can simply "address" the dictionary:

In [7]:
ensg_to_genes['ENSG00000134323']

'MYCN'

You can imagine that if you had a long list of ENSG IDs and you created a `for` loop, you could quickly convert all the ENSG IDs to their common gene names.
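
A minimal sketch of that conversion is shown below.  It repeats the dictionary so the snippet runs on its own.  The list `ensg_ids` is hypothetical, and its last entry is a made-up ID that is deliberately absent from the dictionary.  The dictionary's `get` method returns a fallback value (here `'unknown'`) rather than raising an error when a key is missing.

```python
ensg_to_genes = {
    'ENSG00000141510': 'TP53',
    'ENSG00000134323': 'MYCN',
    'ENSG00000171094': 'ALK'
}

# Hypothetical input list; the last ID is made up and not in the dictionary
ensg_ids = ['ENSG00000171094', 'ENSG00000141510', 'ENSG00000000000']

gene_names = []
for ensg_id in ensg_ids:
    # Look up each ID, falling back to 'unknown' if it is missing
    gene_names.append(ensg_to_genes.get(ensg_id, 'unknown'))

print(gene_names)   # ['ALK', 'TP53', 'unknown']
```

Whether to fall back to a placeholder or to stop with an error on a missing ID depends on your analysis; addressing the dictionary directly with `ensg_to_genes[ensg_id]` does the latter.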

### Finally, we note that it is advisable to structure your code into reusable "chunks".  This is useful both for simple organization of code and for cases where you can re-use the code multiple times.  One way to create re-usable components is to declare "functions".

Breaking code into small functions makes it easier to understand and test.  If each small piece does its job correctly, you can be far more confident that the whole works.

You can define custom functions, which are just like functions in mathematics-- they take an input and produce an output.  For example, we can write a function that takes an integer as input and tells us whether the number is even or odd:

In [13]:
def is_even(x):
    # need to check if it's an integer.  If not, raise an error
    if type(x) == int:
        if x % 2 == 0:
            return True
        else:
            return False
    else:
        print('This only works on integers')
        raise Exception('is_even only accepts integers')

The function takes a single input variable which we call `x`.  It produces an output that is either `True` or `False` (both of which are special Boolean values in Python).  The value that is produced by the function is often called its "return value" and is made explicit when we write something like `return True`. 

One drawback of Python not declaring "types" (e.g. that `x` is guaranteed to be an integer) is that we cannot guarantee `x` will always be an integer.  Therefore, we *should* explicitly check this.  Here, we raise an "exception" which flags the error.  Depending on your needs, you may choose not to do that and you may decide to handle those "edge cases" (unexpected inputs) in another way.  There is no correct or incorrect way-- just different!

Using that function, we can re-run the loop we had earlier.  Note that since the last item of our list is a string ("a"), our function raises the exception, which causes an error, as expected.

In [12]:
a_list = [0,5,6,7, 'a']
for x in a_list:
    if is_even(x):
        print('Even!')
    else:
        print('Odd')

Even!
Odd
Even!
Odd
This only works on integers


Exception: is_even only accepts integers
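
If stopping at the first bad item is not what you want, one possible alternative is to "catch" the exception and keep going.  The sketch below uses Python's `try`/`except` syntax, which has not been introduced above.  It repeats a condensed version of `is_even` so that it runs on its own; the `results` list is just for illustration.

```python
def is_even(x):
    # Condensed version of the function above: only integers are accepted
    if type(x) == int:
        return x % 2 == 0
    else:
        raise Exception('is_even only accepts integers')

a_list = [0, 5, 6, 7, 'a']
results = []
for x in a_list:
    try:
        if is_even(x):
            results.append('Even!')
        else:
            results.append('Odd')
    except Exception as err:
        # Record the problem and move on to the next item instead of stopping
        results.append('Skipped: ' + str(err))

print(results)
# ['Even!', 'Odd', 'Even!', 'Odd', 'Skipped: is_even only accepts integers']
```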

### This was a very trivial example, but one can imagine how this pattern is useful.  

#### By using functions, we can "package" code that is ready to use and is appropriately general.  For instance, someone could write a parser for arbitrary BAM files (a special alignment format for sequence reads) and distribute that to the community.  Then, assuming it was done correctly, anyone using Python can now use that code without having to know all the details about BAM files, their compression, storage, etc.  The `pysam` library is a popular Python package for doing exactly this.

### The VERY brief introduction above is not meant to be comprehensive and we will encounter new syntax and situations as we progress through this course.  PLEASE stop me if you have any questions.