# Data Types - Synopsis

In this unit we will learn that:

1. "Simple" variables in Python can belong to one of three data types: Boolean variables, numbers, or strings:

    1. Boolean variables are `True` and `False` and can be acted upon with logical operators (and, not, or, ...). Like turning a light on and off!
    
    2. Numbers can be integers (1, 2, ...) or floats (1.2, 0.333333, ...) and can be acted upon with mathematical operators (+, -, *, /, %, ...)
    
    3. Strings are sequences of characters ('a', 'aaa', 'no', ...) that can be acted upon with string operators (+, *, ...)  
    
2. Python data types can be acted upon with functions. We will cover the following functions:

    1. `print()`, `type()`
    
    2. Several functions and methods that act upon strings, such as `len()`, `strip()`, and so on

# Computer programs operate on data 

A computer program is a set of statements (i.e., instructions) to accomplish one or more of the following: read, create, calculate, transform, organize, and store data.

In order to operate on data, a computer program must have a way to store and retrieve it. This goal is achieved through the use of variables. If you think of data as pieces of paper on which you wrote something, then a variable is a folder to which you affix a sticker with a name and in which you may store one or more pieces of paper.

Python has basically only three rules about naming variables:

* names you define must start with a letter (a-z,A-Z) or underscore (_) and can be followed by any number of letters, digits (0-9), or underscores


* names you define cannot be the same as any of Python's reserved words (see handout)


* names are case-sensitive: 'YOU', 'you', 'You', and 'yOu' are all different names in Python


Note that '-', '+', '*', and '/' are used by Python for defining operations on data and cannot be used in names. 

Note also that the characters '$' and '?' are not used in Python syntax, so they cannot appear in names either.
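
As a quick check of these rules, Python itself can tell you whether a string would make a legal name. The sketch below uses the built-in `str.isidentifier()` method and the standard `keyword` module; the helper `is_valid_name` and the candidate names are made up for illustration.

```python
import keyword

def is_valid_name(name):
    """Return True if `name` follows the naming rules and is not a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)

# Candidate names chosen for illustration only
for candidate in ["loan_total", "_rate", "Loan", "2nd_loan", "my-loan", "class"]:
    print(candidate, "->", is_valid_name(candidate))
```

The first three names pass. The last three fail because they start with a digit, contain `-`, or are a reserved word.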


## Different data, different variable types

Data come in many formats. It can be a number, a piece of text, an image.  In order to write readable, efficient code it is important to create variables that match the nature of the data.

For simple data types, Python figures out what data type is best when you create a new variable. An important thing to keep in mind when naming variables is that it's up to you to name them well. There are a number of different data types in Python, but there's no distinction as to how you must name them. This means that it is up to you to give good, descriptive names to your variables.

### Why is that important?

The point of descriptive variable names is to improve readability and understanding of code for both yourself and others. While you may believe that you will never re-use a piece of code, or that if you do you will remember what you were doing, the truth is that you won't. **Good naming practices make all the difference.**


## Creating variables

One creates a variable by assigning a value to it:

In [2]:
num=2
word="credit card"
# python does NOT distinguish between '' and "" for strings

Now we have created `num` as the **variable** and assigned `2` as its **value**. As a hint, the construction

*variable* = *value*

holds across most programming languages. Now at any time we can use our variable `num` again, or just look at its value.

We can look at the content of a variable using the `print()` function.

In [3]:
print(num)

2


We can also determine the data type that Python assigned to the variables:

In [4]:
type(num)

int

In [5]:
type(word)

str

Once a variable is created, we can perform operations on it:

In [7]:
(num*5)/2
# / is a FLOAT division

5.0

In [8]:
(num*5)//2
# // is an INT or FLOOR division

5

But only those operations that are appropriate to the data type:

In [10]:
word + " " + str(num)
# you CANNOT subtract a string from a string

'credit card 2'
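
To see what "appropriate" means in practice, the sketch below tries two operations that Python rejects for strings. It reuses the values of `word` and `num` from above and catches the resulting `TypeError` so the cell keeps running; the list `failed` is only there to record what went wrong.

```python
word = "credit card"
num = 2

failed = []

# Subtracting one string from another is not defined
try:
    word - "card"
except TypeError as error:
    failed.append("str - str")
    print("TypeError:", error)

# Adding a string and an integer is not defined either
try:
    word + num
except TypeError as error:
    failed.append("str + int")
    print("TypeError:", error)

# Converting the number with str() first makes the addition valid
print(word + " " + str(num))
```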

As its name suggests, a variable can have its associated value **changed**.

In [11]:
print(num)
num=5
print(num)

2
5


In [12]:
print(num)

5


**Exercises**
1. Add 5 to 10
2. Subtract 8 from 2
3. Multiply 4 by 7
4. Find the remainder of 15 divided by 4
(HINT: Look at the FULL LIST of python operations)
5. Raise 2 to the power of 5 WITHOUT using single multiplications.

Use Shift+Tab to find documentation of built-in functions

In [13]:
5 + 10

15

In [15]:
2 - 8

-6

In [16]:
4 * 7

28

In [23]:
divmod(15,4)
# divmod returns both the quotient and the remainder
# If a number is evenly divisible by another number, the remainder is 0.

(3, 3)

In [24]:
15 % 4
# % (mod) returns the remainder only

3

In [25]:
2 ** 5

32

In [26]:
2 ** 1/2

1.0
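
The last result may be surprising if you expected a square root. A short sketch of what is going on: `**` is evaluated before `/`, so parentheses are needed to raise a number to a fractional power.

```python
# ** binds tighter than /, so this is (2 ** 1) / 2
print(2 ** 1 / 2)      # 1.0

# Parentheses make the exponent 1/2, which gives the square root of 2
print(2 ** (1 / 2))    # roughly 1.4142
```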

In [20]:
pow(2,5)

32


## Basic data types

Python has many built-in data types; this unit focuses on eight of them. Four of those are quite simple, in the sense that they can **store a single value**:

* Integers
* Floats
* Booleans
* Strings


The other four are denoted **collections** because they can **store arbitrary numbers of values**. Python's four collection data types are:

* Lists
* Tuples
* Sets
* Dictionaries

Now, let's start with one of the most basic data types, the integer.

## Integers

A Python integer is what in math is called an **integer**: a whole number that can be positive, negative, or zero. These include the numbers you count with.

Python allows you to do basic arithmetic with integers whether you define variables or not. Those operations are represented using the same notation you would see on a calculator.

In [29]:
# Multiple assignment
# Make sure to match counts left and right of the = operator
i,j = 3, 2
print(i)
print(j)

3
2


In [30]:
# Multi assignments - common mistakes
# 1) Insufficient values, excess variables (too many things on the left)
# 2a) Insufficient variables, excess values (too many things on the right)

In [31]:
x, y, z = 5

TypeError: cannot unpack non-iterable int object
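
For mistake 2a (too many things on the right), a minimal sketch with two illustrative names `a` and `b` looks like this. Python raises a `ValueError` here rather than a `TypeError`.

```python
try:
    a, b = 1, 2, 3     # three values, but only two variables
except ValueError as error:
    caught = str(error)
    print("ValueError:", error)
```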

In [32]:
# Multi assignments - common mistakes
# 2b) Just ONE variable and multiple values (DANGEROUS!)

In [34]:
x = 1,2,3
print (x)
# you may accidentally create a tuple instead of an int
# there will be NO error message

(1, 2, 3)


In [35]:
type(x)

tuple

In [37]:
4 - j

2

We can also store the result of an operation into a variable. The variable will store the evaluated answer, not the arithmetic expression.

In [38]:
8 / i

2.6666666666666665

In [39]:
result = 8 / i
print(type(result))
print(result)
# a notebook cell only displays its last expression, so wrap type() in print() to see it

<class 'float'>
2.6666666666666665


We see that the division operator gives the answer that we are used to, which is $2.6\bar{6}$. This behavior of the division operator is actually new in Python 3! In Python 2, the operation

`first_result = 8 / 3`

would give the result:

`print( first_result ) ==> 2`

This was because it was thought that if we divide one integer by another integer, the operation should also return an integer in order to keep all the variable types the same. 
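
If you want the old integer result in Python 3, you can use the floor division operator `//` introduced earlier. The sketch below compares the two operators; note that `//` rounds down, which matters for negative numbers.

```python
print(8 / 3)       # true division: 2.6666666666666665
print(8 // 3)      # floor division: 2
print(-8 // 3)     # rounds down, not toward zero: -3
print(int(8 / 3))  # int() truncates toward zero: 2
```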

What if we want to get the remainder? The symbol is `%`:

In [40]:
7 % 4

3

It even works with a decimal remainder:

In [41]:
4.2 % 2

0.20000000000000018

## Floats

A Python float is the computer's approximation of what in math is called a **real number**. Because only a limited amount of storage is available for each number, a float can hold only a finite number of digits, so it is impossible to store an **irrational number** such as **pi** exactly. In that sense, every float is really a **rational number**.

In [42]:
new_float = 4.2
print(type(new_float))
print(new_float)

<class 'float'>
4.2
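
Because storage is limited, even simple decimal fractions are stored only approximately. A short sketch of the consequence, using `math.isclose` from the standard library to compare floats with a tolerance (`total` is an illustrative name):

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: the stored values differ slightly
print(math.isclose(total, 0.3))  # True: equal within a small tolerance
```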


Now something that you should notice here is that `float` and `int` are colored green (as are `print` and `type`). That's because these are words in Python that are already defined by the language. 

**Python will let you overwrite them**. However, you should wait until you are a programming god to do it (or just don't do it. ever. either way). 

If you ever accidentally do it, like so:
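
A minimal sketch of such an accident and one way to undo it, assuming the overwrite happened at the top level of a notebook or script:

```python
# Accidentally reuse the built-in name `int` for a number
int = 7

try:
    int("42")          # `int` is now the number 7, so calling it fails
except TypeError as error:
    print("TypeError:", error)

# Deleting our own variable uncovers the built-in again
del int
print(int("42"))       # 42
```

In a notebook, restarting the kernel has the same effect, at the cost of losing all other variables.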

## Comparing numbers

As important as being able to calculate something, is to be able to compare the result of several computations.  

Most of the symbols used for comparison are quite standard and just what you would expect. The exceptions are the symbols for 'not equal to' and 'equal to'.

The `==` operator allows us to check if one side of the operator is equal to the other side.

In [43]:
# == checks equality: do both sides have the same value?
# Is 5 equal to 5? Is it true or false?
5 == 5

True

In [44]:
5 == 17

False

Python evaluates the expression and tells us that it is `True` if it is correct or `False` if it is incorrect.

The `!=` operator allows us to check if one side does not equal the other side:

In [45]:
# != is the inequality operator: do the two sides have different values?
5 != 5

False

In [46]:
5 != 17

True

In the above we were checking for equality!

The greater than, less than, greater than or equal to, and less than or equal to operators all work as we would expect.

In [47]:
5 < 5

False

In [48]:
5 <= 5

True

In [49]:
5 >= 5

True

In [50]:
99 > 1

True

## Booleans

A Python Boolean is what in math is called a **logical variable**. The name Boolean refers to **George Boole**, who first defined an algebraic system of logic in the mid-19th century. The Boolean data type is primarily associated with conditional statements, which allow different actions and change control flow depending on whether a programmer-specified Boolean condition evaluates to `True` or `False`.

With just these two variable values we can implement basic logic and check for truth in a programming language. Let's say that I am taking one loan. I will say that the variable loan is True.

In [51]:
loan = True
type (loan)

bool

In [52]:
print(loan)

True


We can see that when we print `loan` it says `True` and that the type is `bool`. 

Since I only have one loan, I'm going to say that `loans` is `False`.

In [54]:
loans = False
type(loans)

bool

To implement logic we have three basic operations: `and`, `not`, and `or`.

These can be used to create the most basic statements. Here's how they work.

If I use the `and` operator, then both sides of the `and` expression need to be True for the expression to be true.

`and` is strict: it allows no `False` inputs for a `True` output.

`or` is permissive: it allows at most one `False` input for a `True` output.

In [55]:
print(loan)
print(loans)

True
False


If one side of the expression is `False`, then the whole expression will be `False`

In [56]:
True and loan

True

In [57]:
True and loans

False

In [60]:
(5 > 5) and loans
# 5 > 5 is false

False

As you would expect, we can perform these expressions with variables (remember that I only have one loan).

In [62]:
loan and loans
# ordering does not matter

False

The `not` operator inverts the value that follows it. If the following value is `False`, the expression is `True`. If the following value is `True` (or is any non-zero, non-empty value), the expression is `False`.

'not' is a Boolean inverter

In [63]:
not loans

True

In [64]:
not (5 == 5)

False

We can combine this with the `and` operator to make our entire previous statement about my loans `True`

In [65]:
loan and not loans

True

Finally, the `or` operator only requires that **at least one** side of the expression is `True` for the expression to be `True`

In [66]:
loan or loans

True

But, we still need at least one side to be `True` !

In [67]:
loans or False

False
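
The rules for `and` and `or` can be summarized in a small truth table. This sketch simply loops over every combination of two illustrative Boolean values `a` and `b` and collects the results in a list called `rows`:

```python
rows = []
for a in (True, False):
    for b in (True, False):
        # Record the inputs together with the result of each operator
        rows.append((a, b, a and b, a or b))
        print(a, b, "| and:", a and b, "| or:", a or b)
```

Only the first row gives `True` for `and`, and only the last row gives `False` for `or`.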

In [69]:
x, y, z = 3, 2, 25

In [70]:
# Nesting logic tests
# == compares values; `is` would compare object identity
z == (5**2) or not (x != y)

True

In [1]:
True or False

True

**Exercise**
* Declare variables i=4, j=17, k=31
* Evaluate whether i and x are less than j
* Evaluate whether k or y are greater than z
* Return the reverse truth for whether z is exactly equal to k

Hint: write full expressions

In [3]:
x, y, z = 3, 2, 25
i, j, k = 4, 17, 31

In [4]:
i + x < j
# careful: this tests whether the SUM i + x is less than j, not each number separately

True

In [8]:
i < j and x < j

True

In [10]:
# why you must write out full Boolean expressions
q = 19
q and x < j
# in Python, "q and x < j" means: "is q truthy (non-zero), and is x less than j?"
# The comparison "< j" is NOT applied to q; only x is compared with j
# So you MUST write full Boolean expressions: q < j and x < j

True

In [12]:
q < j
# q is NOT less than j, so the True result of the shorthand above was misleading

False
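
The reason `q and x < j` came out `True` is that Python treats any non-zero number as true. A brief sketch with the same values:

```python
q, x, j = 19, 3, 17

print(bool(q))            # True: any non-zero number counts as true
print(q and x < j)        # True: read as q and (x < j)
print(q < j and x < j)    # False: the full expression tests both numbers
```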

In [6]:
k > z or y > z

True

In [7]:
not z == k

True

In [13]:
z != k

True

In [14]:
z is not k
# `is not` compares object identity, not value; prefer != when comparing numbers

True
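
A word of caution about the last cell: `is` and `is not` check whether two names refer to the same object, while `==` and `!=` check whether the values are equal. The sketch below uses two illustrative lists, `first` and `second`, to show the difference.

```python
first = [1, 2, 3]
second = [1, 2, 3]

print(first == second)   # True: same contents
print(first is second)   # False: two separate objects
print(first is first)    # True: the very same object
```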

## Strings

A Python string is an **ordered sequence of characters**. Python strings are very powerful and enable us to deal with text even if there is a lot of it and even if we don't know its structure.

To start off let's make some variables.

In [15]:
hello = 'Today I am in class'
print(type(hello))
print(len(hello))
print(hello)

<class 'str'>
19
Today I am in class


You can use basic math operators to add strings together and make a longer string.

In [16]:
# you can concatenate strings using "+"
print(hello + " for Python")

Today I am in class for Python


You can even just multiply a string to make it longer. Can you say `loans` five times fast?

In [17]:
"loans " * 5

'loans loans loans loans loans '

Python can!

However, you can only use mathematical operations that are unambiguous. Since it is unclear what dividing or subtracting strings should entail, those operations have not been built-in.

You can define your own interpretation of those operations, though!
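
As an illustration only, here is one way such an interpretation could be defined. The class `RemovableStr` is hypothetical: it extends `str` so that subtraction removes the first occurrence of a substring.

```python
class RemovableStr(str):
    """Hypothetical str subclass where `a - b` removes the first occurrence of b."""

    def __sub__(self, other):
        # replace(..., 1) removes only the first match
        return RemovableStr(self.replace(other, "", 1))

phrase = RemovableStr("home loan")
print(phrase - " loan")   # home
```

This is just one possible meaning for string subtraction, which is exactly why Python does not build one in.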

We can add strings and variables that have string values together to create a longer string, then assign that longer string to a variable.

In [18]:
foo = "I can "
bar = "combine "
print(foo + bar + "strings with +")

I can combine strings with +


### Strings are actually collections!

Strings are collections of characters. Because they are collections, you are able to access their elements individually or in groups.

Accessing a single element of the string is called **indexing**. 

**To index a single element, you add `[ ]` after the variable name** and tell it the numerical index of the element you want to access.

In [1]:
word = "loan"
print(type(word))
print(word)

<class 'str'>
loan


In [5]:
word[1]
# Python is a 0-indexed language
# Python starts counting with 0, 1, 2, 3, ...
# An element's index is always its position minus 1
# A collection of N objects always has N-1 as its final index
# The right-hand side of ranges and slices is affected by this

'o'

Huh? I said that I wanted the first element but Python returned `o`, which is the second character in the `word` variable.

Why is that???

In Python, like in many other programming languages, all sequences are **zero-indexed**. That means the numerical index for the first element is actually `0`

The counting after that position is normal. So if we want the letter `a`, which is the **third** letter in the word `loan`, then the index will be **`2`**.

For longer strings it is hard to figure out the index of the final elements. To circumvent that difficulty, Python allows you to access elements by counting from the end of the string too.

When counting from the end, you use negative indices. A way to figure this out is to picture a string as written on a ring, where the last element comes just before the first element. So, if the first element has index 0, the one before it must have index **`-1`**.

In [6]:
long_last_word = 'pneumonoultramicroscopicsilicovolcanoconiosis'

In [7]:
len(long_last_word)

45

In [8]:
# Grabbing the last character via various methods
# 1) Hard-coded explicit index 
# Recommendation: Don't do this!

long_last_word[44]

's'

In [9]:
# 2) Programmatic method
long_last_word[len(long_last_word)-1]

's'

In [11]:
# 3) Negative Indexing
long_last_word[-1]

's'

From the end, the counting works just the same as from the start

In [12]:
long_last_word[-2]

'i'

In [13]:
long_last_word[-45]

'p'


Something to be aware of, though, is that if you try to access an element **it must exist**. That means that since `loan` is four characters long, you cannot try to access the element with index `4`.

In [14]:
word[4]

IndexError: string index out of range
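
If the index comes from a calculation, you can guard against this error. A minimal sketch with illustrative variables `position` and `last_index`, showing both a length check and a `try`/`except`:

```python
word = "loan"
position = 4
last_index = len(word) - 1

# Option 1: check the index against the last valid index first
if position <= last_index:
    print(word[position])
else:
    print("index", position, "is past the last index", last_index)

# Option 2: attempt the access and handle the error
try:
    print(word[position])
except IndexError as error:
    print("IndexError:", error)
```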

### Slicing a string

What if we wanted to get out more than one element from a string? We can do that too, it's called slicing.

The syntax for slicing is deceptively simple. The basic form is:
`variable[start_index : stop_index]`

You'll see that all of the inputs go within the `[]` and the `:` separates each input. 

The `start_index` tells python which index we want to start getting elements from.

The `stop_index` tells python which index we want elements **up to but not including**


In [15]:
sen = "Hello this is a good loan"

Suppose we want to access parts of this string, such as 'Hello ' or 'this'. Let's first print the string and its length:

In [16]:
print(sen)
print(len(sen))

Hello this is a good loan
25


In [20]:
# Slicing with explicit start and stop indices
sen[6:10]
# The stop index is exclusive: 'this' sits at indices 6-9, so the right-hand side is 10

'this'

In [21]:
#Implicit start: Put nothing left of the :, the slice will begin at index 0
sen[:6]

'Hello '

In [22]:
sen[6:]

'this is a good loan'

In [23]:
# Negative index ranging
sen[-9:-5]

'good'
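
Counting indices by hand is error-prone for long strings. One alternative, sketched here with the built-in `str.find()` method, is to compute the start and stop indices from the text you are looking for; `target`, `start`, and `stop` are illustrative names.

```python
sen = "Hello this is a good loan"
target = "good"

start = sen.find(target)       # index of the first character of target, or -1 if absent
stop = start + len(target)     # the stop index is exclusive

if start != -1:
    print(start, stop)         # 16 20
    print(sen[start:stop])     # good
```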

# Exercises

I have a string called bank_loans that has all of the different loan varieties in a bank.

In [1]:
bank_loans = 'home fire property car'

In [8]:
# Question 1: Print/Extract just 'car'
bank_loans[-3:]

'car'

In [9]:
bank_loans[19:]

'car'

In [None]:
# Question 2: Print out a single `m` from `bank_loans`

In [10]:
bank_loans[2]
# third letter, 3-1=2

'm'

In [None]:
# Question 3: Print out just `fire`

In [11]:
bank_loans[5:9]
# Adjust RHS by adding 1

'fire'

### Exercises completed!