# 5. Data Types

There are four essential kinds of Python data with different powers and capabilities:

- Strings (Text)
- Integers (Whole Numbers)
- Floats (Decimal Numbers)
- Booleans (True/False)

<img src="https://hips.hearstapps.com/digitalspyuk.cdnds.net/16/08/1456483171-pokemon2.jpg?resize=768:*" style="width:100px;float:left;margin-right:1rem; border-radius:9px"> They're sort of like starter pack Pokémon!

## 5.1 Spotting the Difference Between Data Types

Take a look at the variables `filepath_of_text` and `number_of_desired_words` in the word count code below.

**What differences do you notice between these two variables and their corresponding values?**

In [47]:
# Import Libraries and Modules

import re
from collections import Counter

# Define Functions

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

# Define Filepaths and Assign Variables

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']

# Read in File

full_text = open(filepath_of_text, encoding="utf-8").read()

# Manipulate and Analyze File

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

# Output Results

most_frequent_meaningful_words

[('love', 93),
 ('like', 50),
 ('ain', 50),
 ('slay', 49),
 ('sorry', 44),
 ('okay', 42),
 ('oh', 38),
 ('m', 37),
 ('get', 32),
 ('daddy', 28),
 ('let', 28),
 ('back', 24),
 ('said', 22),
 ('work', 21),
 ('cause', 21),
 ('ft', 21),
 ('hold', 20),
 ('night', 19),
 ('feel', 19),
 ('hurt', 19),
 ('best', 19),
 ('winner', 19),
 ('every', 18),
 ('bout', 18),
 ('money', 17),
 ('baby', 16),
 ('boy', 16),
 ('long', 16),
 ('shoot', 16),
 ('good', 16),
 ('catch', 16),
 ('know', 15),
 ('ooh', 15),
 ('got', 14),
 ('come', 14),
 ('pray', 14),
 ('way', 13),
 ('gon', 13),
 ('kiss', 13),
 ('re', 12)]

You might be wondering...

Why is `"../texts/music/Beyonce-Lemonade.txt"` emphasized with one color and surrounded by quotation marks while `40` is differently colored and not surrounded by quotation marks? 

It is because these are two different "types" of Python data: the first is a string and the second is an integer.

| Data Type | Explanation     | Example                                |
| --------- |:---------------:| :------------------------------------- |
| String    | Text            | `"Beyonce-Lemonade.txt"`, `"lemonade"` |
| Integer   | Whole Numbers   | `40`                                   |
| Float     | Decimal Numbers | `40.2`                                 |
| Boolean   | True/False      | `False`                                |

## 5.2 Check Data Types

You can check the data type of any value by using the function `type()`.

In [2]:
type("lemonade")

str

In [2]:
type(filepath_of_text)

str

In [1]:
type(40)

int

In [15]:
type(number_of_desired_words)

int

## 5.3 Strings

A *string* is a Python data type that is treated like text, even if it contains a number. Strings are always enclosed by either single quotation marks `'this is a string'` or double quotation marks `"this is a string"`.

In [None]:
'this is a string'

In [None]:
"this is also a string, even though it contains a number like 42"

In [None]:
this is not a string

It doesn't matter whether you use single or double quotation marks with strings, as long as you use the same kind on either side of the string.
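
As a quick check of the two points above, the sketch below compares a single-quoted and a double-quoted version of the same text, then uses `type()` to show that a number inside quotation marks is still a string. The variable names are made up for this illustration.

```python
# Hypothetical variable names, used only for this illustration
single_quoted = 'lemonade'
double_quoted = "lemonade"

# The kind of quotation mark does not change the string itself
same_string = single_quoted == double_quoted
print(same_string)

# A number inside quotation marks is still a string
type_with_quotes = type("42")
type_without_quotes = type(42)
print(type_with_quotes)
print(type_without_quotes)
```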

If you need to include a single or double quotation mark *inside* of a string, then you need to either:
- use the opposite kind of quotation mark inside the string
- or "escape" the quotation mark by using a backslash `\` before it

```{margin} Escape characters
When placed before a quotation mark, a backslash character `\` tells Python to treat that quotation mark like a normal character and to ignore its special meaning as the end of the string.
```


In [None]:
"She exclaimed, 'This is a quotation inside a string!''"

In [None]:
"She exclaimed, \"This is also a quotation inside a string!\""

### 5.3.1 String Methods

Each data type has different properties and capabilities. So there are special things that only strings can do, and there are special ways of interacting with strings.

For example, you can *index* and *slice* strings, you can *add* strings together, and you can transform strings to uppercase or lowercase. We're going to learn more about [string methods](https://melaniewalsh.github.io/Intro-Cultural-Analytics/Python/String-Methods.html) in the next lesson, but here are a few examples using a snippet from Beyoncé's song "Hold Up."

In [2]:
from IPython.display import IFrame
IFrame("https://www.youtube.com/embed/PeonBmeFR8o?start=95", width='500', height='400')

In [3]:
lemonade_snippet = "Hold up, they don't love you like I love you"

#### 5.3.1.1 Index characters in a string

In [4]:
lemonade_snippet[0]

'H'

#### 5.3.1.2 Slice strings

In [6]:
lemonade_snippet[0:20]

"Hold up, they don't "

#### 5.3.1.3 Add (i.e., concatenate) strings together

In [11]:
lemonade_snippet + " // Slow down, they don't love you like I love you"

"Hold up, they don't love you like I love you // Slow down, they don't love you like I love you"

#### 5.3.1.4 Make string characters uppercase

In [12]:
lemonade_snippet.upper()

"HOLD UP, THEY DON'T LOVE YOU LIKE I LOVE YOU"

### 5.3.2 f-Strings

A special kind of string that we're going to use in this class is called an *f-string*. An f-string, short for formatted string literal, allows you to insert a variable directly into a string. [f-strings were introduced with Python version 3.6](https://docs.python.org/3/whatsnew/3.6.html#new-features).

An f-string must begin with an `f` outside the quotation marks. Then, inside the quotation marks, the inserted variable must be placed within curly brackets `{}`.

```{margin} What does \n mean?
\n = new line
```

In [13]:
print(f"Beyonce burst out of the building and sang: \n\n'{lemonade_snippet}'")

Beyonce burst out of the building and sang: 

'Hold up, they don't love you like I love you'
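
An f-string can hold more than one variable, and the variables do not have to be strings. As a small sketch, the example below reuses two values from the word count code at the start of this lesson; the variable `summary` is made up for this illustration.

```python
# Values borrowed from the word count code earlier in this lesson
filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = 40

# Each pair of curly brackets is replaced by the variable's value
summary = f"Counting the top {number_of_desired_words} words in {filepath_of_text}"
print(summary)
```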


## 5.4 Integers & Floats

An *integer* and a *float* (short for *floating point number*) are two Python data types for representing numbers. Integers represent whole numbers. Floats represent numbers with decimal points. They do not need to be placed in quotation marks.

In [1]:
type(40)

int

In [2]:
type(40.5)

float

In [3]:
type(40.555555)

float

You can do a large range of mathematical calculations and operations with integers and floats. The table below is taken from Python's documentation about [Numeric Types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex).

| Operation       | Explanation                                                                      |
|-----------------|-----------------------------------------------------------------------------|
| `x` + `y`           | sum of `x` and `y`                                                              |
| `x` - `y`           | difference of `x` and `y`                                                       |
| `x` * `y`           | product of `x` and `y`                                                          |
| `x` / `y`           | quotient of `x` and `y`                                                         |
| `x` // `y`          | floored quotient of `x` and `y`                                                 |
| `x` % `y`           | remainder of `x` / `y`                                                          |
| -`x`              | `x` negated                                                                   |
| +`x`              | `x` unchanged                                                                 |
| `abs(x)`          | absolute value or magnitude of `x`                                         |
| `int(x)`          | `x` converted to integer                                                      |
| `float(x)`        | `x` converted to floating point                                               |
| `pow(x, y)`       | `x` to the power `y`                                                            |
| `x` ** `y`          | `x` to the power `y`                                                            |
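
To make the table concrete, here is a short sketch that tries several of these operations. The values `7` and `2` are arbitrary choices for illustration.

```python
x = 7
y = 2

print(x / y)       # quotient, always a float: 3.5
print(x // y)      # floored quotient: 3
print(x % y)       # remainder: 1
print(abs(-x))     # absolute value: 7
print(int(3.9))    # converted to integer, decimal part dropped: 3
print(float(x))    # converted to floating point: 7.0
print(pow(x, y))   # x to the power y: 49
print(x ** y)      # x to the power y: 49
```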


### 5.4.1 Multiplication

In [8]:
variable1 = 4
variable2 = 2
variable1 * variable2

8

### 5.4.2 Exponents

In [9]:
variable1 ** variable2

16

### 5.4.3 Remainder

In [2]:
72 % 10

2

## 5.5 Booleans

Booleans are "truth" values. They report on whether things in your Python universe are `True` or `False`. These are the only two options for a boolean: `True` or `False`.

For example, let's assign the variable `beyonce` the value `"Grammy award-winner"`:

In [17]:
beyonce = "Grammy award-winner"

<div class="admonition pythonreview" name="html-admonition" style="background: lightgreen; padding: 10px">
<p class="title">Python Review</p>
<p>Remember the difference between a single equals sign <code>=</code> and a double equals sign <code>==</code>?</p>
 
<ul>
<li>A single equals sign <code>=</code> is used for variable assignment</li>
<li>A double equals sign <code>==</code> is used as the equals operator</li>
</ul>
</div>

We can "test" whether the variable `beyonce` equals `"Grammy award-winner"` by using the equals operator `==`. This will return a boolean.

In [18]:
beyonce == "Grammy award-winner"

True

In [19]:
type(beyonce == "Grammy award-winner")

bool

If we instead evaluate whether `beyonce` equals `"Oscar award-winner"`, we will get the boolean `False`.

In [20]:
beyonce == "Oscar award-winner"

False
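
The equals operator is not the only source of booleans. As a small sketch, other comparison operators such as `!=` (not equal) and `>` (greater than) also return `True` or `False`. The names `is_different` and `is_more_than_ten` are made up for this illustration.

```python
beyonce = "Grammy award-winner"
number_of_desired_words = 40

# != asks whether two values are different
is_different = beyonce != "Oscar award-winner"
print(is_different)

# > asks whether the left value is greater than the right value
is_more_than_ten = number_of_desired_words > 10
print(is_more_than_ten)

# The result of a comparison is itself a boolean
print(type(is_more_than_ten))
```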

## 5.6 TypeError

If you don't use the right data "type" for a particular method or function, you will get a `TypeError`.

Let's look at what happens if we change the value of `number_of_desired_words` from the integer `40` to the string `"40"`.

In [7]:
import re
from collections import Counter

def split_into_words(any_chunk_of_text):
    lowercase_text = any_chunk_of_text.lower()
    split_words = re.split(r"\W+", lowercase_text)
    return split_words

filepath_of_text = "../texts/music/Beyonce-Lemonade.txt"
number_of_desired_words = "40"

stopwords = ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours',
 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', 'her', 'hers',
 'herself', 'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves',
 'what', 'which', 'who', 'whom', 'this', 'that', 'these', 'those', 'am', 'is', 'are',
 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does',
 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until',
 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into',
 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down',
 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here',
 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more',
 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so',
 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', 'should', 'now', 've', 'll', 'amp']


full_text = open(filepath_of_text, encoding="utf-8").read()

all_the_words = split_into_words(full_text)
meaningful_words = [word for word in all_the_words if word not in stopwords]
meaningful_words_tally = Counter(meaningful_words)
most_frequent_meaningful_words = meaningful_words_tally.most_common(number_of_desired_words)

most_frequent_meaningful_words

TypeError: '>=' not supported between instances of 'str' and 'int'
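
One way to repair this kind of error is to convert the value to the type that the method expects, for example with `int(x)` from the table in section 5.4. The sketch below uses a tiny made-up word list in place of the album text so that it runs on its own. It also uses `try` and `except`, which are not covered in this lesson, only to display the error without stopping the code.

```python
from collections import Counter

# A tiny made-up word list standing in for the album's words
example_words = ["love", "slay", "love", "sorry", "love", "slay"]
example_tally = Counter(example_words)

number_of_desired_words = "2"  # a string, as in the failing example

# Passing the string directly raises a TypeError
try:
    example_tally.most_common(number_of_desired_words)
    raised_type_error = False
except TypeError as error:
    raised_type_error = True
    print(f"TypeError: {error}")

# Converting the string to an integer first avoids the error
top_words = example_tally.most_common(int(number_of_desired_words))
print(top_words)
```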

## 5.7 Application - Write a short bio!

Here's an example of data types in action using some biographical information about me.

In [21]:
name = 'Prof. Walsh' #string
age = 1000 #integer
place = 'Chicago' #string 
favorite_food = 'tacos' #string
dog_years_age = age * 7.5 #float
student = False #boolean

In [43]:
print(f'✨This is...{name}!✨')

print(f"""{name} likes {favorite_food} and once lived in {place}.
{name} is {age} years old, which is {dog_years_age} in dog years.
The statement '{name} is a student' is {student}.""")

✨This is...Prof. Walsh!✨
Prof. Walsh likes tacos and once lived in Chicago.
Prof. Walsh is 1000 years old, which is 7500.0 in dog years.
The statement 'Prof. Walsh is a student' is False.


In [44]:
print(f"""
name = {type(name)}
age = {type(age)}
place = {type(place)}
favorite_food = {type(favorite_food)}
dog_years_age = {type(dog_years_age)}
student = {type(student)}
""")


name = <class 'str'>
age = <class 'int'>
place = <class 'str'>
favorite_food = <class 'str'>
dog_years_age = <class 'float'>
student = <class 'bool'>



## Application

### Short bio

Let's do the same thing but with biographical info about you! Fill in the variables below accordingly.

In [9]:
name = "Thomas"  # string
age = 20  # integer
home_town = "Chester"  # string
favorite_food = "Brookies"  # string
dog_years_age = age * 7.5  # float
student = True #boolean
# Add 2-3 new variables of your own below
middle_name = "Gregory"
last_name = "Moore"

In [12]:
print(f'✨This is...{name}!✨')

# Add your variables in the below print statement
print(f"{name} likes {favorite_food} and once lived in {home_town}. {name} is {age} years old, which is {dog_years_age} in dog years. The statement '{name} is a student' is {student}.")
# YOUR NEW SENTENCES HERE WITH F-STRINGS

✨This is...Thomas!✨
Thomas likes Brookies and once lived in Chester. Thomas is 20 years old, which is 150.0 in dog years. The statement 'Thomas is a student' is True.


### Make your own!

Make your own print statement with variables below. 

In [13]:
print(f'My middle name is {middle_name} and my last name is {last_name}')

My middle name is Gregory and my last name is Moore
