<a href="https://colab.research.google.com/github/upenndigitalscholarship/p4h/blob/main/p4h.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python for Humanists

We're going to jump right into creating a script to teach you the basics of Python syntax and logic.

This notebook will walk you through a script that finds the most frequently used words in one of William Shakespeare's plays.

**Important notes**:
- You can edit and save this notebook right in your browser. If you want to, you can download it and use it somewhere else.
- Python reads line-by-line, top-to-bottom, and some things in a script must take place before others. For that reason, during the workshop, we will jump around a little bit. I will always let you know what line we're working on.
- To get the notebook to work, you must run ("play") each code block in order. If a block doesn't work, Python will show an error message to help you understand why. Often the cause is just a typo. Don't get frustrated or give up! This happens to everyone.

## Imports

This block of code imports tools from libraries, which extend what Python can do. There are a lot of libraries out there (you can easily find them at [The Python Package Index (PyPI)](https://pypi.org/)). The two we use here come built into Python, so there is nothing to install. This will not be the first code block we work on in the workshop, but imports always belong at the beginning of the script.

- The module ['re'](https://docs.python.org/3/library/re.html) stands for Regular Expressions, and it helps us match patterns in text strings.
- The ['Counter'](https://docs.python.org/3/library/collections.html#collections.Counter) class is part of a module called 'collections'. It tallies how many times each distinct item appears.

Enter into the code block below:
```
import re
from collections import Counter
```
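To get a feel for what these two imports do before we point them at a whole play, here is a small sketch. The sentence and the variable names `sample`, `pieces`, and `tally` are made up for illustration and are not part of the workshop script.

```python
import re
from collections import Counter

# A made-up sample sentence (not from the workshop data)
sample = "to be or not to be"

# re.split breaks the string apart wherever the pattern matches;
# here the pattern is one or more whitespace characters
pieces = re.split(r"\s+", sample)
print(pieces)

# Counter tallies how many times each distinct item appears
tally = Counter(pieces)
print(tally["to"])   # how many times "to" appeared
print(tally["or"])   # how many times "or" appeared
```

You can change the sentence and re-run the block to see how the counts change.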


## Variables and the Print function

In Python, variables are used to store values. They are kind of like sticky notes. Let's create a basic variable.

Enter in the code block below:

```
name = "William Shakespeare"
current_age = 2025 - 1564

print(name, "was a famous playwright. He would be", current_age, "years old today.")
```
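Variables can also be built from other variables. The sketch below is one possible variation on the block above: `birth_year`, `this_year`, and `sentence` are names invented for this example, and the f-string (a string that starts with `f` and fills in anything inside curly braces) is just an alternative way to assemble the same sentence.

```python
name = "William Shakespeare"
birth_year = 1564   # hypothetical variable name, for illustration only
this_year = 2025    # hypothetical variable name, for illustration only

# a variable can store the result of a calculation on other variables
current_age = this_year - birth_year

# an f-string drops the values of the variables into the text
sentence = f"{name} was a famous playwright. He would be {current_age} years old today."
print(sentence)
```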

## Getting data

You can do a lot of things with just basic Python, but if you want to use it for your research, you're probably going to want to bring in some data to work with.

In this script, we will pull in the text of some of Shakespeare's plays. They are stored in the 'data' folder to the left of the screen. There you will find:
- "romeo.txt", the full text of Romeo and Juliet (premiered 1597), sourced from Project Gutenberg.
- "julius.txt", the full text of Julius Caesar (premiered 1599), sourced from Project Gutenberg.
- "hamlet.txt", the full text of Hamlet (premiered c. 1599-1601), sourced from Project Gutenberg.

We'll just use Romeo and Juliet today, but feel free to try out any of the others by replacing the filepath in the code block below.

In the code block below, enter:
```
filepath_of_text = "../data/romeo.txt"

#let's make sure it finds the text file
full_text = open(filepath_of_text, encoding="utf-8").read()
print(full_text)

```

**Don't forget to delete the last 3 lines of your code block above before moving on!** We'll reinsert them later.
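If Python cannot find the file, it stops with an error. As an optional aside, here is a sketch of a more cautious way to open the file. It assumes the same filepath as above; the `pathlib` check, the `with` statement, and the name `text_file` are additions for illustration and are not part of the workshop script.

```python
from pathlib import Path

filepath_of_text = "../data/romeo.txt"

# Path(...).exists() is True if there is something at that path
if Path(filepath_of_text).exists():
    # 'with' closes the file automatically once the indented block finishes
    with open(filepath_of_text, encoding="utf-8") as text_file:
        full_text = text_file.read()
    # show only the first 300 characters instead of the whole play
    print(full_text[:300])
else:
    print("Could not find", filepath_of_text, "- check the folder name and the spelling.")
```

Printing just a slice of the text is usually enough to confirm that the right file was loaded.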

## Setting the parameters for our analysis with a list
We only want to see the 40 most common words, and we don't want the analyzer to count very common words that carry little meaning on their own, like "the" (these are often called stopwords). We have to give the computer directions for these things. We'll prepare to do this by setting a variable and creating a [**list**](https://www.w3schools.com/python/python_lists.asp), which lets you assign multiple items to a single variable.

In the code block below, enter:

```
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']
```
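Here is a quick sketch of a few things you can do with a list, using a much shorter made-up list so the output is easy to read. The name `short_stopwords` is invented for this example.

```python
# A short made-up list (not the workshop's full stopword list)
short_stopwords = ['the', 'and', 'of']

print(len(short_stopwords))         # how many items are in the list
print(short_stopwords[0])           # the first item; Python counts from 0
print('and' in short_stopwords)     # True, because 'and' is in the list
print('romeo' in short_stopwords)   # False, because 'romeo' is not

short_stopwords.append('thou')      # add one more item to the end
print(short_stopwords)
```

The `in` check is the part that matters most for this script: later we use it to decide whether a word should be skipped.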

## Writing a function
Now we want to write a small, reusable chunk of code that tells the computer to do one specific thing. This is called a **function**. We'll define a function that takes any chunk of text (in our case, Romeo and Juliet), makes it lowercase, and splits it into individual words so that we can count them.

In the code block below, enter:
```
def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    # r"\W+" matches one or more characters that are not letters, digits, or underscores
    split_words = re.split(r"\W+", lowercase_text)
    return split_words
```
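Defining a function does not run it. To see what it does, you can call it on a short piece of text. The sketch below repeats the function so that it runs on its own; `sample_line` and `sample_words` are names made up for this example.

```python
import re

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# a single line to test the function on
sample_line = "But soft, what light through yonder window breaks?"
sample_words = split_into_words(sample_line)
print(sample_words)
```

Notice that the last item in the result is an empty string `''`. That happens because the line ends with a question mark, and splitting on it leaves nothing after it. With a whole play this has very little effect on the counts, but it is good to know where it comes from.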

## Running the function on text and iterating using for loops
Now that we've defined our function, we want to use it on the text itself. The function gives us back a list of individual words, so we need to go through them one at a time. This is called iterating, and it is usually done with what's called a [**for loop**](https://www.w3schools.com/python/python_lists_loop.asp), which performs an operation on each item in a list. The code below uses a compact, one-line form of the for loop called a list comprehension.

In the code block below, enter:
```
#bringing back the line reading the text file
full_text = open(filepath_of_text, encoding="utf-8").read()

#performs 'split_into_words' on 'full_text'
all_the_words = split_into_words(full_text)

#goes through all the words one at a time and adds them to a 'meaningful_words' list if they don't appear on the 'stopwords' list
meaningful_words = [word for word in all_the_words if word not in stopwords]

#counts the words in 'meaningful_words'
meaningful_words_tally = Counter(meaningful_words)

#tallies the most common words
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words
```
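The one-line filter in the block above can also be written as a full for loop, which may be easier to read when you are starting out. This is a sketch using tiny made-up inputs so that it runs on its own; with the real script you would keep your own `all_the_words` and `stopwords`.

```python
from collections import Counter

# Made-up miniature inputs so this sketch runs on its own
all_the_words = ['romeo', 'and', 'juliet', 'and', 'romeo', 'the', 'nurse']
stopwords = ['and', 'the']

# The same filtering as the one-line version, written as a full for loop
meaningful_words = []
for word in all_the_words:
    if word not in stopwords:
        meaningful_words.append(word)

meaningful_words_tally = Counter(meaningful_words)

# most_common returns a list of (word, count) pairs;
# a second for loop prints each pair on its own line
for word, count in meaningful_words_tally.most_common(2):
    print(word, count)
```

Both versions produce the same `meaningful_words` list. The second loop shows one way to print the results more neatly than the raw list of pairs.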

## Congratulations!
You've written a Python script!