# Python Refresher

**We will take a little time to orient you to Python 3, the programming language used in this tutorial.**

**Goals**
<ol>
    <li> Why use Python (versus R, Ruby, Matlab, etc.) </li>
    <li> Loops </li>
    <li> List comprehension </li>
</ol>



## Why use Python (versus other languages)?
**In my experience, people tend to prefer Python over other programming languages, such as R, Ruby, and Matlab, for text processing and analysis.**

Most programming languages are general purpose, but Python has more packages and resources for text-related tasks than practically any other language. This means that if you want to do some online data mining, work with gigantic corpora, or build custom semantic tagging systems (say, for happy versus sad words), you can do all of this *and* the analysis in the same language. Take the example of data mining. You could:

<ol>
    <li>Use the <code>BeautifulSoup</code> package to parse a list of scraped webpages into plaintext.</li>
    <li>Use regular expressions (regex), <code>nltk</code>, and the <code>string</code> module to clean the plaintext, tokenize it, and create frequency distributions and n-gram models.</li>
    <li>Use <code>pandas</code> for data analysis on the statistics that come out of <code>nltk</code>.</li>
    <li>Even train a "deep learning" neural network on text data without leaving Python, thanks to <code>PyTorch</code>.</li>
</ol>
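As a rough illustration of step 2, here is a minimal sketch that uses only the standard library (`re` and `collections.Counter`) in place of `nltk`. The sample sentence is made up, and a real pipeline would more likely use `nltk`'s tokenizers and frequency tools; this only shows the shape of clean, tokenize, count, and build n-grams.

```python
import re
from collections import Counter

# A made-up stand-in for plaintext you might get back from a scraper
text = "The library opened in 1933. The library is open late!"

# Clean: lowercase, then replace anything that is not a letter or whitespace with a space
cleaned = re.sub(r"[^a-z\s]", " ", text.lower())

# Tokenize: split on runs of whitespace
tokens = cleaned.split()

# Frequency distribution: count how often each token appears
freq = Counter(tokens)

# Bigrams (n-grams with n = 2): pair each token with the token that follows it
bigrams = list(zip(tokens, tokens[1:]))

print(freq.most_common(2))
print(bigrams[:3])
```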

Python is also more widely known than R and Matlab, the other languages that can do "all of the above" when it comes to text analysis.

> **The next sections are all about looping and list comprehension in Python3, which is important review for understanding the upcoming code.**

## Loops in Python


### Simple "for" loop

```python
# A list of lists, where each sublist contains the library name and the year it was built
libraries = [ ["Charles Deering", 1933], ["Seeley G. Mudd Library", 1977], ["Northwestern University Library", 1970] ]

############################
# Loop 1 - Simple for loop #
############################

# This "for loop" runs its indented body once for each item in 'libraries'
#      There are other types of loops, but this is the only type we will use in this tutorial

# 'lib' acts as a variable that takes the identity of each sublist, one at a time
for lib in libraries:
    
    # Print the whole sublist
    print( lib )
    
    # Index the first item in the sublist, the name of the library
    #      Never forget! Python counts from zero!
    print( lib[0] )
    
    # Index the second item in the sublist, the year the library was built
    print( lib[1] )
```
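Because each sublist holds exactly two items, you can also unpack them into two named variables right in the loop header instead of indexing with `[0]` and `[1]`. This is a sketch of an equivalent loop, assuming every sublist is a `[name, year]` pair; the variable names `name` and `year` are our own choice.

```python
libraries = [ ["Charles Deering", 1933], ["Seeley G. Mudd Library", 1977], ["Northwestern University Library", 1970] ]

# Unpack each [name, year] sublist into two variables on every pass through the loop
for name, year in libraries:
    print( name, "was built in", year )
```

Unpacking fails with a `ValueError` if a sublist has more or fewer than two items, so indexing is the safer choice when your sublists vary in length.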


### "For" loop with conditional

Very often, we do not want to perform the exact same task with every element in a loop. In these cases we use **conditionals** to run a piece of code only when a certain condition is met, often in the form <code>if X is true, do Y</code>.
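In Python, the condition follows the keyword `if` and ends with a colon, and the indented lines below it run only when the condition is true. Comparisons use `==`, because a single `=` assigns a value. A minimal sketch with a made-up `year` value:

```python
year = 1977  # hypothetical value to check

# Python tests each condition in order and runs only the first branch that is true
if year > 1950:
    label = "built after 1950"
elif year == 1950:
    label = "built in 1950"
else:
    label = "built before 1950"

print( label )
```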

> **In the next code chunk we take the simple "for" loop from above and add a condition: we will only print the names of libraries built after 1950.** So, for each sublist in 'libraries', we will check the year (the second element in the sublist) and if it is *greater than* 1950, we will print the name (the first sublist element).

```python
######################################
# Loop 2 - For loop with conditional #
######################################

# Our simple for loop again
for lib in libraries:
    
    # If the year built is greater than 1950...
    if lib[1] > 1950:
        
        # ...print the name of the library.
        print( lib[0] )
```


### It works... BUT

**Loops like this exist in nearly every programming language, but in Python they are relatively slow and the code takes up a lot of space.** The next section talks about a *faster* and *more compact* way to do the same thing.

## List comprehension in Python

Some facts:
* Loops (especially with many nested loops) can be hard to read and understand.
* Text processing can be computationally intensive, especially on large corpora.

Luckily, we do not need explicit <code>for</code> loops to process a list; we can tell Python to build the new list directly using **list comprehension**.
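The general shape of a list comprehension is `[expression for item in iterable if condition]`, where the `if` part is optional. As a small sketch with made-up numbers, both versions below build the same list: the squares of the even numbers.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop version: start with an empty list and append to it
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append( n * n )

# List comprehension version: the same logic in a single expression
squares_comp = [n * n for n in numbers if n % 2 == 0]

print( squares_comp )  # [4, 16, 36]
```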

> In the next line, we reproduce the "For loop with conditional" with list comprehension. Read it like this: **"Return the first element** (lib[0]) **for each element** (lib) **in 'libraries' if the second element** (lib[1]) **is greater than 1950"**.


```python
print( [lib[0] for lib in libraries if lib[1] > 1950] )
```
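To check the speed claim on your own machine, you could time both approaches with the standard library's `timeit` module. This is a minimal sketch on made-up data (100,000 `[name, year]` pairs); the function names `with_loop` and `with_comprehension` are our own, and the timings you see will depend on your computer and Python version. In CPython the comprehension is typically somewhat faster, largely because it avoids looking up and calling `append` on every pass.

```python
import timeit

# Made-up data: 100,000 [name, year] pairs with years cycling from 1900 to 1999
data = [["Library " + str(i), 1900 + i % 100] for i in range(100_000)]

def with_loop():
    # Build the list one item at a time with append
    names = []
    for lib in data:
        if lib[1] > 1950:
            names.append( lib[0] )
    return names

def with_comprehension():
    # Build the same list with a list comprehension
    return [lib[0] for lib in data if lib[1] > 1950]

# Both approaches should produce exactly the same list
assert with_loop() == with_comprehension()

# Run each function 20 times and print the total time in seconds
print( "for loop:          ", timeit.timeit(with_loop, number=20) )
print( "list comprehension:", timeit.timeit(with_comprehension, number=20) )
```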

### List comprehension is fast and clean...
... and its output is a list, so you can write <code>my_list = </code> before it to save the result as a new list. You can then run another list comprehension on that new list, if you want. Convenient!
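For example, this sketch saves the names of the post-1950 libraries to a new list and then runs a second list comprehension over it; the variable names `recent` and `recent_upper` are our own.

```python
libraries = [ ["Charles Deering", 1933], ["Seeley G. Mudd Library", 1977], ["Northwestern University Library", 1970] ]

# First comprehension: names of libraries built after 1950
recent = [lib[0] for lib in libraries if lib[1] > 1950]

# Second comprehension over the new list: uppercase each name
recent_upper = [name.upper() for name in recent]

print( recent_upper )
```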


# Next: Text cleaning

In the next section, we load a text and "clean" it to prepare it for analysis.


# Code it: Looping

Write a loop that prints a helpful message for each building in 'academics'.


```python
#############################################
## ## ## ## > Code it < ## ## ## ##         #
################################### Looping #
## Sample answer in /answer_keys ##         #
#############################################

# A list of lists, where each sublist is the name and year built of a Northwestern academic building.
academics = [
    ["James L Allen Center", 1979], ["Annie May Swift Hall", 1895], ["Cresap Laboratories", 1949],
    ["Fisk Hall", 1899], ["Harris Hall", 1915], ["Kresge Hall", 1955],
    ["Locy Hall", 1928], ["Lunt Hall", 1894], ["McCormick Center", 2002],
    ["Scott Hall", 1940], ["Swift Hall", 1909], ["University Hall", 1869]
]

# For each element in 'academics' (as 'building')...
for building in academics:
    
    # Your code here (replace 'pass', a placeholder that keeps the empty loop valid):
    pass
```
