# 4. Variables

<img src="https://cdn.pixabay.com/photo/2016/09/14/20/48/birthday-1670415_960_720.png" style="width:100px;float:left;margin-right:1rem; border-radius:50%"> Variables are one of the fundamental building blocks of Python. A variable is like a tiny container where you store values and data, such as filenames, words, numbers, collections of words and numbers, and more.

## 4.1 Assigning Variables

The variable name will point to a value that you "assign" it. You might think about variable assignment like putting a value "into" the variable, as if the variable is a little box 🎁

You assign variables with an equals sign `=`. In Python, a single equals sign `=` is the "assignment operator." A double equals sign `==` is the "real" equals sign: it tests whether two values are equal and evaluates to `True` or `False`.

In [2]:
new_variable = 100
print(new_variable)

100


In [3]:
2 * 2 == 4

True

In [4]:
different_variable = "I'm another variable!"
print(different_variable)

I'm another variable!
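
As a minimal sketch of the difference between the two operators (the names `word_count` and `is_one_hundred` are made up for illustration), `=` stores a value under a name, while `==` asks a question and produces `True` or `False`:

```python
# Assignment: store the value 100 under the name word_count
word_count = 100

# Comparison: ask whether word_count equals 100; the answer is a Boolean
is_one_hundred = word_count == 100
print(is_one_hundred)

# A comparison never changes the variable it inspects
print(word_count == 99)
print(word_count)
```

Reading `=` as "is assigned" and `==` as "is equal to?" can help keep the two apart.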


Let's look at some of the variables that we used when we counted the most frequent words in Charlotte Perkins Gilman's "The Yellow Wallpaper."

In [5]:
# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

[('john', 45),
 ('one', 33),
 ('said', 30),
 ('would', 27),
 ('get', 24),
 ('see', 24),
 ('room', 24),
 ('pattern', 24),
 ('paper', 23),
 ('like', 21),
 ('little', 20),
 ('much', 16),
 ('good', 16),
 ('think', 16),
 ('well', 15),
 ('know', 15),
 ('go', 15),
 ('really', 14),
 ('thing', 14),
 ('wallpaper', 13),
 ('night', 13),
 ('long', 12),
 ('course', 12),
 ('things', 12),
 ('take', 12),
 ('always', 12),
 ('could', 12),
 ('jennie', 12),
 ('great', 11),
 ('says', 11),
 ('feel', 11),
 ('even', 11),
 ('used', 11),
 ('dear', 11),
 ('time', 11),
 ('enough', 11),
 ('away', 11),
 ('want', 11),
 ('never', 10),
 ('must', 10)]
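
To see what each variable in the script holds without needing the text file, here is a small sketch that runs the same steps on a made-up sentence. The names `sample_text`, `sample_stopwords`, and `top_words` are hypothetical stand-ins and are not part of the original script.

```python
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Made-up stand-ins for full_text and stopwords
sample_text = "The pattern on the paper. The paper in the room!"
sample_stopwords = ["the", "on", "in"]
number_of_desired_words = 2

# Same steps as the script above, one variable per step
all_the_words = split_into_words(sample_text)
meaningful_words = [word for word in all_the_words if word not in sample_stopwords]
meaningful_words_tally = Counter(meaningful_words)
top_words = meaningful_words_tally.most_common(number_of_desired_words)

print(top_words)
```

Each variable holds the result of one step, so you can inspect any of them to check what that step produced.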

We made the variables:
- `filepath_of_text`
- `stopwords` 
- `number_of_desired_words` 
- `full_text` 

In [6]:
filepath_of_text = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
number_of_desired_words = 40
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

full_text = open(filepath_of_text, encoding="utf-8").read()

## 4.2 Jupyter Display vs `print()`

We can check to see what's "inside" these variables by running a cell with the variable's name. This is one of the handiest features of a Jupyter notebook. Outside the Jupyter environment, you would need to use the `print()` function to display the variable.

In [7]:
filepath_of_text

'../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt'

In [8]:
stopwords

['i',
 'me',
 'my',
 'myself',
 'we',
 'our',
 'ours',
 'ourselves',
 'you',
 'your',
 'yours',
 'yourself',
 'yourselves',
 'he',
 'him',
 'his',
 'himself',
 'she',
 'her',
 'hers',
 'herself',
 'it',
 'its',
 'itself',
 'they',
 'them',
 'their',
 'theirs',
 'themselves',
 'what',
 'which',
 'who',
 'whom',
 'this',
 'that',
 'these',
 'those',
 'am',
 'is',
 'are',
 'was',
 'were',
 'be',
 'been',
 'being',
 'have',
 'has',
 'had',
 'having',
 'do',
 'does',
 'did',
 'doing',
 'a',
 'an',
 'the',
 'and',
 'but',
 'if',
 'or',
 'because',
 'as',
 'until',
 'while',
 'of',
 'at',
 'by',
 'for',
 'with',
 'about',
 'against',
 'between',
 'into',
 'through',
 'during',
 'before',
 'after',
 'above',
 'below',
 'to',
 'from',
 'up',
 'down',
 'in',
 'out',
 'on',
 'off',
 'over',
 'under',
 'again',
 'further',
 'then',
 'once',
 'here',
 'there',
 'when',
 'where',
 'why',
 'how',
 'all',
 'any',
 'both',
 'each',
 'few',
 'more',
 'most',
 'other',
 'some',
 'such',
 'no',
 'nor',
 ...]

In [9]:
number_of_desired_words

40

Your turn! Pick another variable from the script above and see what's inside it below.

In [10]:
print(number_of_desired_words)  # your_chosen_variable

40


You can run the `print()` function inside the Jupyter environment, too. This is sometimes useful because Jupyter only displays the value of the last expression in a cell, while `print()` can display as many values as you like. Additionally, Jupyter displays a string with its `\n` characters (which mean "new line") shown literally, while `print()` displays the text properly formatted with line breaks.

For example, with the `print()` function, each of the variables is printed, and "The Yellow Wallpaper" is properly formatted with new lines.

In [11]:
print(filepath_of_text)
print(stopwords)
print(number_of_desired_words)
print(full_text)

../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt
['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than'
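
The difference in how new lines appear can be sketched with a short made-up string (`sample_text` and `displayed_form` are hypothetical names). For a plain string, what Jupyter shows for the last expression in a cell is essentially the string's `repr()`, so calling `repr()` directly gives a rough stand-in for that display:

```python
sample_text = "first line\nsecond line"

# Roughly what Jupyter shows when a string is the last expression in a cell:
# quotation marks around the text and a literal \n
displayed_form = repr(sample_text)
print(displayed_form)

# What print() shows: the newline character becomes an actual line break
print(sample_text)
```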

## 4.3 Variable Names

Though we named our variables `filepath_of_text`, `stopwords`, `number_of_desired_words`, and `full_text`, we could have named them almost anything else.

Variable names can be as long or as short as you want, and they can include:
- upper- and lowercase letters (A-Z, a-z)
- digits (0-9)
- underscores (_)

However, variable names *cannot*:
- ❌ include other punctuation (-.!?@)
- ❌ include spaces ( )
- ❌ start with a digit
- ❌ be a reserved Python keyword (such as `True` or `if`)
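
One way to check a candidate name against these rules is sketched below. The helper `is_valid_variable_name` and the list `candidate_names` are hypothetical names invented for this example; `str.isidentifier()` and `keyword.iskeyword()` come from Python itself.

```python
import keyword

def is_valid_variable_name(name):
    """Return True if Python would accept name as a variable name."""
    # isidentifier() checks the character rules (letters, digits, underscores,
    # no leading digit); iskeyword() rules out reserved words such as True
    return name.isidentifier() and not keyword.iskeyword(name)

candidate_names = ["filepath_of_text", "text2", "file-path", "file path", "2nd_text", "True"]

for candidate in candidate_names:
    print(candidate, is_valid_variable_name(candidate))
```

Only the first two candidates pass: the others contain a hyphen, contain a space, start with a digit, or are a reserved keyword.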

Instead of `filepath_of_text`, we could have simply named the variable `filepath`.

In [12]:
filepath = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
filepath

'../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt'

Or we could have gone even simpler and named the filepath `f`.

In [13]:
f = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
f

'../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt'

### 4.3.1 Striving for Good Variable Names

As you start to code, you will almost certainly be tempted to use extremely short variable names like `f`. Your fingers will get tired. Your coffee will wear off. You will see other people using variables like `f`. You'll promise yourself that you'll definitely remember what `f` means. But you probably won't.

So, resist the temptation of bad variable names! Clear and precisely named variables will:

* make your code more readable (both to yourself and others)
* reinforce your understanding of Python and what's happening in the code
* clarify and strengthen your thinking


**Example Python Code ❌ With Unclear Variable Names ❌**

For the sake of illustration, here's some of the same word-count code with poorly named variables. The code works exactly the same as our original code, but it's a lot harder to read.

In [14]:
def sp(t):
    lt = t.lower()
    sw = re.split(r"\W+", lt)
    return sw

f = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
ft = open(f, encoding="utf-8").read()

words = sp(ft)

**Example Python Code ✨ With Clearer Variable Names ✨**

In [15]:
def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"
full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)

### 4.3.2 Off-Limits Names

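The only variable names that are truly off-limits are Python's reserved keywords, such as `True`, `if`, and `for`. Names that are built into Python but not reserved, such as `print` and `list`, can technically be re-assigned, but doing so replaces the built-in for the rest of your session, so treat them as off-limits too.

This is not something to worry too much about. You'll know very quickly if a name is reserved by or built into Python because it will show up in green, and assigning a value to a reserved keyword gives you a `SyntaxError`.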

In [17]:
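# True (with a capital T) is a reserved keyword, so this line raises a SyntaxError.
# Lowercase true is not reserved, and Python would accept it without complaint.
True = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"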
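
Python treats reserved keywords and built-in names differently, and the standard `keyword` and `builtins` modules let you check which is which. The sketch below also shows what goes wrong when a built-in name is re-assigned. The function `shadow_builtin_list` is a hypothetical helper, used here so that the re-assignment stays inside the function and does not affect the rest of the notebook.

```python
import builtins
import keyword

# Reserved keywords can never be used as variable names
print(keyword.iskeyword("True"))

# Built-in names are not keywords, so Python lets you re-assign them
print(keyword.iskeyword("list"))
print(hasattr(builtins, "list"))

def shadow_builtin_list():
    # Inside this function, the name list now points to a string
    list = "no longer the built-in"
    try:
        list("abc")  # fails: a string cannot be called like a function
    except TypeError as error:
        return str(error)

print(shadow_builtin_list())
```

Because the re-assignment happened inside the function, `list` still works normally everywhere else. Doing the same thing at the top level of a notebook would replace `list` until the kernel is restarted or the name is deleted with `del list`.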

## 4.4 Re-Assigning Variables

Variable assignment does not set a variable in stone. You can later re-assign the same variable a different value.

For instance, I could re-assign `filepath_of_text` to the filepath for the lyrics of Beyoncé's album *Lemonade* instead of Charlotte Perkins Gilman's "The Yellow Wallpaper."

In [18]:
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
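
One detail worth sketching: re-assigning a variable does not update values that were computed from it earlier. In the example below, `is_literature` and `stale_answer` are hypothetical names added for illustration.

```python
filepath_of_text = "../texts/literature/The-Yellow-Wallpaper_Charlotte-Perkins-Gilman.txt"

# Hypothetical derived variable, computed from the current value
is_literature = "literature" in filepath_of_text

# Re-assign the same name to a different value
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
print(filepath_of_text)

# is_literature was computed before the re-assignment, so it has not changed
stale_answer = is_literature
print(stale_answer)

# Re-running the line that computes it picks up the new value
is_literature = "literature" in filepath_of_text
print(is_literature)
```

This is why the rest of the word-count code has to be run again after changing `filepath_of_text`: variables such as `full_text` keep their old contents until the lines that assign them are re-run.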

If I change this one variable in our example code, then we get the most frequent words for *Lemonade*.

In [19]:
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words

[('love', 93),
 ('like', 50),
 ('ain', 50),
 ('slay', 49),
 ('sorry', 44),
 ('okay', 42),
 ('oh', 38),
 ('m', 37),
 ('get', 32),
 ('daddy', 28),
 ('let', 28),
 ('back', 24),
 ('said', 22),
 ('work', 21),
 ('cause', 21),
 ('ft', 21),
 ('hold', 20),
 ('night', 19),
 ('feel', 19),
 ('hurt', 19),
 ('best', 19),
 ('winner', 19),
 ('every', 18),
 ('bout', 18),
 ('money', 17),
 ('baby', 16),
 ('boy', 16),
 ('long', 16),
 ('shoot', 16),
 ('good', 16),
 ('catch', 16),
 ('know', 15),
 ('ooh', 15),
 ('got', 14),
 ('come', 14),
 ('pray', 14),
 ('way', 13),
 ('gon', 13),
 ('kiss', 13),
 ('re', 12)]

## 4.5 Application: Naming Variables

OK, now it's your turn to change some variables and calculate a new word frequency! First, pick a new text file from the list below:
- `"../texts/music/Carly-Rae-Jepsen-Emotion.txt"`
- `"../texts/music/Mitski-Puberty-2.txt"`
- `"../texts/literature/Dracula_Bram-Stoker.txt"`
- `"../texts/literature/Little-Women_Louisa-May-Alcott.txt"`
- `"../texts/literature/Alice-in-Wonderland_Lewis-Carroll.txt"`

**To choose from a wider list of files...**:

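You can list all available files by running a command-line command in a code cell with `ls` (list): `!ls ../texts/music/`. The `ls` command is available on macOS and Linux. On Windows it is not recognized (as in the output below), and the equivalent command is `dir`, for example `!dir ..\texts\music\`.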

In [20]:
!ls ../texts/music/

'ls' is not recognized as an internal or external command,
operable program or batch file.
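
If the shell command is not available on your system, as in the output above, the files can also be listed from Python itself. This is a minimal sketch using the standard `pathlib` module. The helper name `list_text_files` is hypothetical, and it assumes the texts are stored as `.txt` files in the folder you pass in.

```python
from pathlib import Path

def list_text_files(directory):
    """Return the sorted names of the .txt files in directory."""
    folder = Path(directory)
    if not folder.is_dir():
        return []  # folder not found: return an empty list instead of failing
    return sorted(path.name for path in folder.glob("*.txt"))

print(list_text_files("../texts/music"))
```

Because it runs in Python rather than in the operating system's shell, the same cell works on Windows, macOS, and Linux.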


Then assign `filepath_of_text` to the corresponding filepath below. Be sure to explore what happens when you change the values for `number_of_desired_words` and `stopwords`, as well.

In [22]:
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text =  "../texts/music/Taylor-Swift-Red.txt"
number_of_desired_words = 35

# Explore how the stopwords below affect word frequency by adding or removing stopwords
stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don']

full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words

[('oh', 131),
 ('like', 91),
 ('know', 82),
 ('m', 70),
 ('never', 68),
 ('time', 67),
 ('trouble', 63),
 ('ooh', 61),
 ('re', 60),
 ('now', 59),
 ('back', 44),
 ('ever', 36),
 ('one', 34),
 ('ll', 32),
 ('stay', 32),
 ('last', 32),
 ('yeah', 31),
 ('ve', 30),
 ('got', 26),
 ('love', 25),
 ('cause', 25),
 ('asking', 25),
 ('home', 25),
 ('knew', 25),
 ('everybody', 25),
 ('d', 24),
 ('better', 24),
 ('think', 23),
 ('knows', 23),
 ('tell', 22),
 ('wanna', 22),
 ('starlight', 22),
 ('look', 21),
 ('right', 20),
 ('walked', 20)]