# Python fundamentals

This notebook introduces the core capabilities of the Python language and gives you a chance to practice applying them.

This lesson and the next are the most coding-heavy in the course. The remainder of the course focuses on real data analysis rather than coding concepts, though we will revisit coding concepts as necessary.

## Calculations

Python is good at math. You can use these symbols, or "operators," to perform arithmetic calculations just as you would in Excel:
- \+ for addition
- \- for subtraction
- \/ for division
- \* for multiplication
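
As a quick illustration, here is a minimal sketch of the four operators in a code cell. The numbers are made up for this example and are not the ones from the questions below.

```python
# Made-up numbers, chosen only to show each operator once
print(12 + 30)    # addition: 42
print(100 - 37)   # subtraction: 63
print(250 / 4)    # division: 62.5
print(8 * 5)      # multiplication: 40
```

Each line prints its own result, so you can run all four in a single cell.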

Apply this below to answer these questions:
- What is 83810205 divided by 6789?
- If one cat has 78 whiskers, and one shelter has 112 cats, and there are 4,019 shelters in the US, how many whiskers are on cats in shelters in the entire US?

In [6]:
## write your code here

Exponentiation - raising a number to a power - uses the ** symbol. In Excel, this would be done with the POWER() function, or with the ^ symbol, but the ^ symbol in Python is reserved for an operator called XOR, which we won't use in this class.
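
To see why the distinction matters, here is a small sketch comparing the two symbols on the same pair of numbers (the numbers are arbitrary):

```python
print(2 ** 10)   # exponentiation: 2 to the 10th power is 1024
print(2 ^ 10)    # XOR, not a power: prints 8
```

Both lines run without an error, which is exactly why typing ^ out of Excel habit is an easy mistake to miss.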

- To what power does 2.7 have to be raised to produce a number greater than one million?

In [None]:
## write your code here

There is one more important arithmetic operator, represented by the % sign. This operator is referred to as "modulo," and it returns the remainder after dividing the number on its left by the number on its right. For example, 5 % 2 returns 1, the remainder after dividing 5 by 2.

Modulo will come up again and again in your coding journey as it is necessary for many algorithms to calculate remainders.
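
A minimal sketch of modulo in action, using small made-up numbers. The clock example is just one hypothetical use of remainders, where values "wrap around" after reaching a limit:

```python
print(17 % 5)         # 17 = 3 * 5 + 2, so the remainder is 2

# Wrap-around on a 12-hour clock: what time is it 5 hours after 9 o'clock?
print((9 + 5) % 12)   # 14 hours wraps around to 2 o'clock
```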

- What is the remainder after dividing 10203040506070809 by 987654321?
- One canonical use case for the modulo operator is the divisibility test, determining whether one number is evenly divisible by another. How could you use modulo to test this?

In [65]:
## write your code here


import pandas as pd

df = pd.DataFrame([[1,2,"Series A"],[2,3,4],["Series A","4","5"]],columns = ["month","year","financing"])
print(df)

import statistics as st
st.median(df[
    (df["month"] == 1) & 
    (df["year"] == 2) & 
    (df["financing"] == "Series A")
]["month"])

      month year financing
0         1    2  Series A
1         2    3         4
2  Series A    4         5


1

## Variables

It's useful to store data for later use, and to reference it by a descriptive name instead of explicitly writing out "2.7" (or whatever) every time you need to do math. To do this you use the "=" symbol, known as the assignment operator.

Assignment works by storing what's on the right under the name on the left:
- [your variable name] = [your data]


You can use this to store multiple model inputs:
- whiskers_per_cat = 78
- cats_per_shelter = 112
- shelters_in_us = 4019

You can also do math to variables just like with the numbers themselves:
- whiskers_grand_total = whiskers_per_cat * cats_per_shelter * shelters_in_us

Try it below. Note that assigning the "whiskers_grand_total" variable doesn't display anything on its own; you'll have to call "print" if you want to see its value. At the end of your code, type: print(whiskers_grand_total)

<div class="alert alert-block alert-danger">
You can name variables <em>almost</em> anything you want, 
but there are some "reserved words" in the Python language that can never be used as variable names. They include "True," "False," "None," "if," "and," and others.
    You can browse the full list <a href="https://realpython.com/lessons/reserved-keywords/">here.</a> Variable names must begin with a letter or an underscore, can contain only letters, digits, and underscores, and cannot contain spaces.
</div>




In [21]:
## write your code here

Variable assignment intuitively reads left to right, but in fact variables are assigned _right to left_ - this means something like this is legal:
- x = 5
- x = x * 6

The variable x comes out the other side with a value of 30, and this is perfectly legal because Python reads the right side of the second line first.
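
Here is the same idea as a runnable sketch, with a hypothetical variable name and values. Each line evaluates its right side using the current value, then stores the result back under the same name:

```python
balance = 100            # hypothetical starting value
balance = balance * 2    # right side is evaluated first: 100 * 2
balance = balance - 50   # then 200 - 50
print(balance)           # 150
```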

## Data types

Although everything we've done so far has involved numbers, you've actually already encountered _two_ data types in your Python journey: integers and floats.

Integers, or ints, are whole numbers with no decimals, positive or negative, including zero. Floats, or "floating point numbers," are numbers with decimals. Fractions are always represented as floats in Python.

Python explicitly separates ints and floats because they are represented differently in your computer's memory, a distinction that is unimportant for most practical coding. Python automatically converts ints to floats when you mix the two in a calculation, so you won't often have to think about the difference. Some hardcore coders consider this a drawback because it excuses developers from the need to think hard about the type of all their data, but it does make life a lot easier.

If needed, you can manually interconvert using the int() and float() functions.
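
If you're ever unsure which type you're holding, you can ask Python. The sketch below uses the built-in type() function, which this lesson hasn't introduced, on a few made-up values:

```python
print(type(7))       # <class 'int'>
print(type(7.0))     # <class 'float'>
print(type(7 / 2))   # <class 'float'> - division produces a float
print(3 + 0.5)       # mixing an int and a float gives a float: 3.5
```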

What is produced by the following expressions?
- 2 * 2
- 2.0 * 2.0
- 2 * 2.0
- int(2.999999)
- float(5)

In [None]:
## write your code here

In addition to ints and floats, you'll often need to use text in data analysis - for example, a housing dataset might have numeric columns representing sale price and years on market, and also text columns representing neighborhood name, address, current owner name, etc.

Text is represented by a data type called the "string." Strings are created by wrapping text in one of three kinds of quote characters, called "string delimiters":
- Single quotes, e.g., 'abc'
- Double quotes, e.g., "abc"
- Triple double quotes, e.g., """abc"""
    - This is the only one of the three delimiters that lets a string span multiple lines.

Strings can use the "+" operator, which concatenates (combines) two strings together. Most of the other arithmetic operators, such as "-" and "/", don't work on strings.
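
A short sketch of the delimiters and the "+" operator together. The variable names and text are made up for illustration:

```python
greeting = "Hello, "          # double quotes
name = 'world'                # single quotes work the same way
message = greeting + name     # "+" joins the two strings end to end
print(message)                # Hello, world

# Triple double quotes let a string run across more than one line
poem = """Roses are red,
violets are blue"""
print(poem)
```

Notice that "+" adds nothing between the strings; the space after the comma had to be part of the first string.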

Try it yourself - assign the following strings to three different variables:
- I go to business school
- I'm a coder
- I use a language called "Python"

Print each one on its own.

In [1]:
## write your code here

In writing code, you will often need to check whether something is true or false:
- Does one number equal another?
- Is a string contained in another string?
- Does a variable exist?

To test these types of questions, Python has a built-in data type called `bool` (short for "boolean"). Booleans can only be True or False. True and False are Python keywords (note the capitalized first letters); you'll notice them turn green when you type them into Jupyter notebooks.

Booleans can be directly created (e.g., `x = True`), but they also result from applying tests to data in Python. Say we have two variables: `x = 5; y = 6` - how do we test whether they are equal? This is done with two equals signs - the equality operator.

- `x == y #returns False`
- `x == y - 1 #returns True`
- `False == False #returns True`
- `2 > 1 #returns True`
- `2 > 2 #returns False`
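
The equality operator covers the first question in the list above. For "is a string contained in another string?", one option is Python's `in` operator. The sketch below also uses `!=` ("not equal"); neither has been introduced in this lesson yet, and the variable is made up:

```python
city = "San Francisco"    # hypothetical value

print("Fran" in city)     # True
print("fran" in city)     # False - the test is case-sensitive
print(3 != 4)             # True - "!=" means "is not equal to"

is_match = "Fran" in city # the result of a test can be stored like any other value
print(is_match)           # True
```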

Try it yourself:
- Is the remainder when dividing 838114395 by 67890 greater than 12000?
- Does 8 equal "8"?
- Does 8.0 equal 8?
- Does [] equal []?
- Does [0,1] equal [1,0]? Why?

In [43]:
## write your code here

True

## Getting help

First, a puzzle to practice on. Go back to the three strings you created earlier and print them together on one line using the "+" operator. Is anything different? What's going on?

There are three ways to get help on a coding issue:
- Ask someone with more experience (that's what I'm here for)
- Ask Google. There are two types of results that are the most useful:
    - StackOverflow threads. StackOverflow (SO) is an engineering Q&A site that contains answers to almost every question you could ever ask. The community can be kind of spicy if you don't ask questions just the right way, so beware.
    - Official documentation written by the people who created whatever tool is giving you trouble
    - Almost everything else (W3Schools, GitHub threads, Mozilla Developer Network, etc) is second-best (in my opinion)
- ChatGPT. I suggest asking ChatGPT our question about the single quote and seeing what you get back.
    - ChatGPT can also write code for you. This is useful as a first draft and I encourage you to use it for that. However, for anything you turn in, the code needs to be 100% yours, even if you use ChatGPT as a jumping-off point.
    - Within two weeks of ChatGPT's release, detectors were built and publicly deployed to detect whether text was written by ChatGPT. These tools are in wide use, so plagiarize at your own risk.
    
We're going to introduce a lot of coding concepts here in class, and I of course am always available for help. However, the assignments for class intentionally rely on concepts we will not have covered - if and when this happens, know that getting help in unfamiliar challenges is part of every coder's life (including mine).

## Type coercion

Now you know about three data types: ints, floats, and strings. You can convert between them using the int(), float(), and str() functions, as long as the conversion makes sense (int("abc") raises an error, for example). Note that converting a float to an int discards everything after the decimal point rather than rounding, so int(2.9) will return 2.
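
A minimal sketch of the three conversion functions on made-up values. Pay attention to the negative number: int() drops the decimal part, which is not the same as always rounding down.

```python
print(int(2.9))        # 2
print(int(-2.9))       # -2 (the decimal part is dropped; the result is not -3)
print(float("3.5"))    # 3.5 - text that looks like a number can be converted
print(int("42") + 1)   # 43 - once converted, you can do math with it
print(str(10))         # prints 10, but it is now a string, not a number
```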

Try it below:
- Assign a variable to contain the result of dividing 12345 by 54321.
    - What data type will this variable contain?
- Now assign a new variable the value "The answer is: "
- Use the "+" operator to combine the result of the division with the string you created.
    - What function do you have to use to make this legal?

<div class="alert alert-block alert-info">
As with dynamic typing, some hardcore coders consider type coercion a crutch for weak developers. In older languages like C, all variables must be explicitly assigned a data type when they are created, and this type cannot be changed. There are good things about this (it is one reason C is faster than Python), but dynamic typing does make development easier.
</div>

## Comments

Comments are among the most critical, controversial, inconsistent, and abused concepts in the world of coding. The idea is simple: comments are lines of code that are not executed by Python, and only exist to be read by other coders (or by you, six months later, when you have forgotten what you were thinking when you wrote your code).

Comments should clarify your thinking and help people understand why you made the choices you made. All code can be written multiple ways, so clarifying the tradeoffs you decided to make is critical. Comments can be made in two ways:
- Using the # symbol - anything after this symbol will not be read by Python
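- Using triple-double quotes. Strictly speaking, this creates a string that Python evaluates and then throws away, but it is widely used as a multi-line comment.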

You'll be graded in every assignment and presentation on the presence and quality of your code comments. Build a habit right now, today, of commenting your code liberally.

In [None]:
# this is a comment
x = 5 # you can initiate a comment even on a line that also contains code

"""
this is also a comment.

it's often useful to comment across multiple lines, and the triple-double quote
makes that easier.
"""

<div class="alert alert-block alert-info">
There is an informal rule in the developer world that lines of code should never exceed 80 characters. In Jupyter, you'll know your lines are too long if a scroll bar to scroll left and right appears at the bottom of your code cell. If this happens, find ways to divide your longest lines into multiple lines.
</div>

## Data structures

So far we've dealt with singleton data types: one int, one float, one string at a time. But datasets are not singletons; they're large structures of many pieces of data. One way to represent many pieces of data is to use a _list_.

Lists are declared with either the "list()" function, or by using square brackets ([]). The following are equivalent:
- list((1,2,3))
    - Note the double parentheses - we'll understand this in a moment.
- [1,2,3]

Lists are exactly what they sound like: ordered collections of data. In some languages, lists (often called "arrays" in other settings) can contain only one data type, e.g., all ints, or all floats, or all strings. However, in Python they can contain anything, including mixed data. The following are legal lists - though I will say, while there's nothing wrong with them, these grab-bags of data do make most developers feel icky:
- [1,1.5,2] (ints mixed with floats)
- ["Pineapple", 2, 3.14159, ["another","list",2], "Jelly"] (ints, floats, strings, and lists)
- [1,1,2,3,5,8] (ints only)
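
To make this concrete, here is a small sketch using the lists above. It relies on the built-in len() function, which this lesson hasn't covered; it reports how many elements a list contains.

```python
numbers = list((1, 2, 3))
print(numbers == [1, 2, 3])   # True - both ways of declaring a list give the same result

mixed = ["Pineapple", 2, 3.14159, ["another", "list", 2], "Jelly"]
print(len(mixed))             # 5 - the inner list counts as a single element
```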

Try it below:
- Create a list with three elements: your first name, your last name, and your age.
- Assign the list to a variable with whatever name you want.
- Print the list.

In [None]:
## write your code here

Now you know how to put data in lists. How do you get it out? For example, what if you want to access just your last name from the list above?

Python uses bracket notation to access entries in lists. So if you have a list called "my_list," you can access the first element of that list by typing:
- my_list[0]

Note that the first element of the list is element _zero_, not element one - if you asked for my_list[1], you'd get the _second_ element. This is because Python is "zero-indexed." This takes some getting used to conceptually, but is convenient for a lot of reasons which we'll find out about later.
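
Here is bracket notation as a runnable sketch, using a made-up list. The same notation that reads an element can also be used on the left side of an assignment to replace it:

```python
colors = ["red", "green", "blue"]   # hypothetical list

print(colors[0])        # red   (element zero is the first element)
print(colors[1])        # green (element one is the second element)

colors[1] = "yellow"    # replace the second element
print(colors)           # ['red', 'yellow', 'blue']
```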

Try it below:
- Create the list ["the","quick","brown","fox","jumped","over","the","lazy","dog"].
- Assign the last element of the list to a new variable and print it out.
- Create a list with the first 10 Fibonacci numbers and assign it to a variable.
- Modify the list so the last number is divided by 3.


In [None]:
## write your code here

## Tuples

Similar to lists, tuples store multiple pieces of data. Unlike lists, tuples cannot be modified.

Tuples are created the same way as lists, except they are wrapped in regular parentheses instead of square brackets. Everything else about tuples is the same as with lists: they can contain different types of data; they are accessed by bracket notation; they are zero-indexed. Tuples are useful for more advanced programming tasks, which we'll encounter in future weeks. For now, just remember they exist.
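
A brief sketch of creating and reading a tuple, with made-up contents. It also revisits list((1,2,3)) from earlier: the inner parentheses create a tuple, and the outer ones belong to the list() function call. len() and type() are built-ins this lesson hasn't formally introduced.

```python
point = (3, 4.5, "label")   # hypothetical tuple with mixed data types

print(point[0])             # 3 - same zero-indexed bracket notation as lists
print(len(point))           # 3
print(type(point))          # <class 'tuple'>

as_list = list(point)       # list() accepts a tuple and returns a list
print(as_list)              # [3, 4.5, 'label']
```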

Try it below:
- Create a tuple with the first 10 Fibonacci numbers and assign it to a variable.
- Try to modify the tuple so the last number is divided by 3. What happens?


In [None]:
## write your code here

<div class="alert alert-block alert-info">
Changing data is also technically known as "mutating" it; therefore, data types that can be modified, like lists, are called "mutable," as in, "able to be mutated." Data types like tuples that can't be changed are called "immutable."
</div>

## Dictionaries

There's one more important "collection" type object in Python: the dictionary. Dictionaries are also known as collections of "key-value pairs," and they are used as lookup lists for pieces of related data. Think about a collection of student ID -> student name mappings:

- "1030204": "Penk, Toby"
- "1030205": "Doe, John"
- ... etc

So now if you have a program that knows student IDs, you can look up the student names here to display them on a webpage, print an ID badge, etc. The syntax of dictionaries uses curly braces ({}), commas, and colons:

```
{
    "1030204": "Penk, Toby",
    "1030205": "Doe, John"
}
```

Each key (the entry on the left) is separated from its value (the entry on the right) by a colon, and the key-value pairs themselves are separated by commas. Strictly speaking, the line spacing and visual formatting don't matter - you could create this dictionary all on one line, or with different indentation, etc. But the way I've declared it is standard because it's easy to read.

Keys have to be immutable data types, most commonly ints or strings, whereas values can be any data type at all, including other dictionaries. Dictionaries can be deeply nested, with dictionaries being the values of dictionaries being the values of dictionaries.... data types like this are the nightmares of CS students, and we will not discuss them in this class.
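
Looking a value up in a dictionary uses the same bracket notation as lists, except you put a key inside the brackets instead of a position. A minimal sketch using the student dictionary from above (the variable name and the third student are made up):

```python
students = {
    "1030204": "Penk, Toby",
    "1030205": "Doe, John"
}

print(students["1030204"])           # Penk, Toby

students["1030206"] = "Roe, Jane"    # hypothetical new entry: assigning to a new key adds it
print(len(students))                 # 3
```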

Try it yourself:
- Create a dictionary where the keys are your class names, and the values are the names of the teachers of those classes.
- Print out the name of your favorite teacher by calling the appropriate key-value pair from the dictionary.


## An inflection point

You've learned a ton of new concepts, and every single one - even the basic idea of integers as a Python data type - has conceptual depth we couldn't hope to cover in this class. However, all coders are on a lifelong journey of going deeper with these concepts, and you have now fully begun yours. You are a coder.

## Exercise

Now it's time to practice. A lot of the stuff you have to do is stuff we didn't cover together, but you should be able to find all the answers on Google or ChatGPT. I'm also available if you're stumped.

This is not a graded assignment but it contains the same concepts that will be on your first graded assignment, so you might as well do it now, when the stakes are low, so the real assignment with its real stakes will be a breeze.


## Assignment

- Create a GitHub account
- Install Python and Jupyter on your machine; run your first notebook in your browser
- Write five ideas for a dataset you might perform your midterm or final analysis on

In [28]:
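import math

# Cache of known primes. "max_checked" is the number up to which every
# prime has been stored in "primes" (-1 means nothing is cached yet).
prime_cache = {
    "max_checked": -1,
    "primes": []
}

def is_prime(n):
    """Return True if n is prime, using trial division."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3 are prime

    # A composite number always has a divisor no larger than its square root
    limit = math.isqrt(n)
    if limit <= prime_cache["max_checked"]:
        # Fast path: only the cached primes up to the square root need testing
        for p in prime_cache["primes"]:
            if p > limit:
                return True
            if n % p == 0:
                return False
    else:
        # Slow path: test every candidate divisor up to the square root
        for i in range(2, limit + 1):
            if n % i == 0:
                return False

    return True


# Fill the cache with every prime below one million
for i in range(1000000):
    if is_prime(i):
        prime_cache["primes"].append(i)
prime_cache["max_checked"] = 1000000  # 1000000 itself is not prime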



In [29]:
for i in range(1000000):
    if is_prime(i):
        pass
        


In [19]:
prime_cache

{'max_checked': -1,
 'primes': [2,
  3,
  5,
  7,
  11,
  13,
  17,
  19,
  23,
  29,
  31,
  37,
  41,
  43,
  47,
  53,
  59,
  61,
  67,
  71,
  73,
  79,
  83,
  89,
  97,
  101,
  103,
  107,
  109,
  113,
  127,
  131,
  137,
  139,
  149,
  151,
  157,
  163,
  167,
  173,
  179,
  181,
  191,
  193,
  197,
  199,
  211,
  223,
  227,
  229,
  233,
  239,
  241,
  251,
  257,
  263,
  269,
  271,
  277,
  281,
  283,
  293,
  307,
  311,
  313,
  317,
  331,
  337,
  347,
  349,
  353,
  359,
  367,
  373,
  379,
  383,
  389,
  397,
  401,
  409,
  419,
  421,
  431,
  433,
  439,
  443,
  449,
  457,
  461,
  463,
  467,
  479,
  487,
  491,
  499,
  503,
  509,
  521,
  523,
  541,
  547,
  557,
  563,
  569,
  571,
  577,
  587,
  593,
  599,
  601,
  607,
  613,
  617,
  619,
  631,
  641,
  643,
  647,
  653,
  659,
  661,
  673,
  677,
  683,
  691,
  701,
  709,
  719,
  727,
  733,
  739,
  743,
  751,
  757,
  761,
  769,
  773,
  787,
  797,
  809,
  811,
  821,
  823

In [None]:
slices
appending
popping

In [None]:
tuples

In [None]:
dicts

In [None]:
control flow - if statements

In [None]:
loops

In [None]:
functions

In [None]:
map, filter, and reduce