# Tutorial 1: Introduction to Python Language and Jupyter Notebooks

Welcome to Python! This tutorial is aimed at getting you adjusted to using Jupyter notebooks and Python. By the end of this tutorial, you will be able to:
* define basic Python language structure and data types
* understand simple code examples
* operate a notebook and write your own code


## What is Jupyter Notebook?
You are currently reading a Jupyter notebook. A notebook is a document format that is specific to Jupyter, in the same way that a .docx file is specific to Word and a Google Doc is specific to Google Drive. You can open and read notebook files using other programs, but they work best in Jupyter, just like .docx files work best in Word and Google Docs work best in Google Drive.

A Jupyter notebook has a couple of different cell types to write in. This cell is a Markdown cell. The cell below is a code cell, the default cell type. You can change the cell type in Jupyter Notebook using the Toolbar. There is also a Raw NBConvert cell type, but you don't need to worry about it. To edit a cell, double click on it. When you are done editing, press Shift+Return on your keyboard. Try this with the code cell below:

In [None]:
x = 3 + 7
print(x)

Pretty easy, right? If you want to make a new cell, there are a few options. You can click the **+** button in the top left corner of this screen (under File). This makes a new code cell appear below the current working cell. If you want a cell to appear above, you can click out of the current cell in a grey margin (the highlight color will turn blue instead of green) and press A on your keyboard. To make a cell below, press B. You can also click "Insert" on the menu bar and choose either "Insert Cell Above" or "Insert Cell Below". 

You can copy and paste text and code just like you would in a Word document or Google Doc. You can also copy and paste entire cells, using the Edit dropdown menu or the shortcut buttons in the tool bar. Hover your mouse over the images to learn what they do. 

Let's try out these tools below!

**Knowledge Check**
Try out each task listed in the cells below. Some are in markdown and some are in code, but try to do all of them.

In [None]:
# Run me
print('Mission Achieved!')

Delete me!

Insert cell below me!

In [None]:
# Insert Markdown cell above me! Then write in a description!
y = 13

Copy me and paste me somewhere else!

Hopefully that quick activity helped you acclimate to Jupyter! There are a few other buttons of importance in the Menu: File -> Save Notebook, Kernel -> Restart, and File -> Close and Shut Down Notebook. 
1. Save Notebook is pretty straightforward. Click whenever you need to make sure your data is saved! 
2. Kernel -> Restart will clear the memory of everything you've run so far. This is most useful if you've been testing out new code and want to run the whole script through to make sure it works. 
3. File -> Close and Shut Down Notebook should be used to exit each notebook. **Make sure you click Close and Shut Down Notebook whenever you are done with a notebook.** Otherwise, the scripts will keep running in the background, which may take up a lot of memory on your computer!

## What is Python?
Python is a programming language. It is often described as having a "simple, easy-to-learn syntax". This basically means that writing in Python is logical for humans. We write Python code much like we would talk or form a logical argument. But before we can write in Python, there is some basic vocabulary to learn.

## Packages and Modules
Python is a language, and inside every language there are different types of words. In English, we have parts of speech like nouns, verbs, adjectives, etc. In Python, we have *packages* like NumPy, SciPy, pandas, etc. Each package has a specific purpose: NumPy = array and matrix math, SciPy = science-based analysis, pandas = spreadsheet operations (like Excel). Each part of speech has many different categories and words. For example, we have abstract nouns, proper nouns, collective nouns, and common nouns, but they are all types of nouns. In a Python package like NumPy, we have different *commands* that tell us what is happening in our Python sentence.

The code below is an example of a numpy sentence (ignore the import statement for now).

In [None]:
import numpy

x = numpy.arange(5)
print(x)

NumPy is all about arrays and matrices, and math operations on them. In this example, I wrote "numpy." which means I was using NumPy. Then I specified my command as "arange". This command makes an array containing a range of numbers. I wrote "(5)", so NumPy makes a range that is 5 elements long, starting at 0 (**Note that in Python, indexing for sequences starts at 0**). The print statement shows us the result of this command.

You can tell I made an array because the numbers are surrounded by square brackets [ ]. Arrays, matrices, and lists all use square brackets to separate out the columns and rows. Strictly speaking, arange returns a NumPy array rather than a built-in Python list, which is why the printed numbers are separated by spaces instead of commas. The two behave similarly for the indexing shown in this tutorial.
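
As a small sketch of how arange behaves (the variable names here are illustrative and not part of the tutorial's own cells), the optional start, stop, and step arguments control which numbers are produced:

```python
import numpy as np

# arange(stop): counts from 0 up to, but not including, stop
first_five = np.arange(5)
print(first_five)            # [0 1 2 3 4]

# arange(start, stop, step): counts from start in steps of step
evens = np.arange(2, 11, 2)
print(evens)                 # [ 2  4  6  8 10]

# the result is a NumPy array, not a built-in Python list
print(type(first_five))      # <class 'numpy.ndarray'>
print(len(first_five))       # 5
```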

But why did I need the import statement? Python doesn't automatically open all packages when you start a notebook or script. Instead, you need to tell Python which packages you want. You do this with the import statement. You can also give each package a nickname, like "pd" in the example below. Here are some more examples of importing packages.

In [None]:
import numpy as np
import pandas as pd
import scipy
from matplotlib import pyplot as plt

Wait, what happened with that last line?!

Matplotlib is a plotting package. But I didn't import all of Matplotlib, I only imported one *module* from Matplotlib called pyplot. This would be like me writing a sentence using only proper nouns. I could use any type of noun, but I'm only using the proper noun module in my sentence. Another way to write this import statement is below:

In [None]:
import matplotlib.pyplot as plt

Both examples do the same thing, so it totally depends on which you prefer!

Sometimes you will see a "from package import *" statement. This statement imports all the functions and classes from the selected package directly into your notebook. This is actually a *bad* habit and you should not use this form of import statement, because the imported names can silently replace names you already have. Instead, specify which modules you want explicitly or just import the entire package. But it is good to know what other people might do in their code.

In [None]:
from matplotlib import * # bad code!
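
To illustrate why explicit imports are safer, here is a minimal sketch using two standard-library modules (math and cmath, chosen only for illustration). Both define a function called sqrt. If both were imported with "import *", the second sqrt would silently replace the first. With explicit imports, it is always clear which one is being called:

```python
import math
import cmath

# Both modules define a function called sqrt, but they behave differently.
real_root = math.sqrt(4)        # works with real numbers only
complex_root = cmath.sqrt(-4)   # also works with negative numbers

print(real_root)      # 2.0
print(complex_root)   # 2j
```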

## Data types

Python has many different types of data. We've already talked about one, a list. And we've seen another, an integer. In the code below, I show several common data types that you will see in your research.

### Part 1: str, int, float

In [None]:
# strings: used to represent sequences of characters, essentially textual data. 
# strings are enclosed within either single quotes '' or double quotes "".
str1 = 'apple'
str2 = 'PM2.5'
str3 = 'On Wednesdays, we wear pink'

# integers: whole numbers (positive, negative, or zero) without any fractional or decimal component.
int1 = 12
int2 = 100043
int3 = -17

# floats (floating-point numbers): a numerical data type used to represent real numbers;
# floats include a fractional or decimal part.
float1 = 0.0
float2 = 19.807
float3 = 2.4e-1

If you want to see what these different data types look like in Python form, you can use a print statement. Try it out below by changing what is in the print statement. You can also see what kind of object you have by using the type() command. 

In [None]:
print(str1)
print(type(str1))

In [None]:
print(int1)
print(type(int1))

What if we want to change data type? We can do this for some variables and not for others. For example, we can always turn an int or float into a str. But we can't always turn a str into an int or float. This is because not all str are numbers. Try it out below.

In [None]:
print(int1)

str4 = str(int1)
print(str4)
print(type(str4))

Note here: although print(int1) and print(str4) both show 12, the data types are different. You can use int1 for mathematical calculations, but you will not be able to use str4 for any calculations because str4 is a string. Basically, it is a word, not a number! See the examples below:

In [None]:
result = int1 + 1
print(result)

In [None]:
result = str4 + 1
print(result)

This error message (a TypeError) is saying that str4 is a string, so Python cannot add the number 1 to it. Be careful: although it printed as 12, it is actually the text "12", not the number 12.

In [None]:
int4 = int(str1)

This error message (a ValueError) is saying we can't turn str1 into an int type because "apple" is not a valid base 10 number.
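
If you are not sure whether a string holds a number, one possible approach is to attempt the conversion and catch the error. The sketch below uses a hypothetical helper named to_int_or_none, which is not part of Python or of this tutorial, and it relies on try/except, which has not been introduced yet:

```python
def to_int_or_none(text):
    """Return text converted to an int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the string is not a valid integer
        return None

print(to_int_or_none('12'))     # 12
print(to_int_or_none('apple'))  # None
print(to_int_or_none('2.5'))    # None, because int() does not accept a decimal string
```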

In [None]:
float4 = float(int1)
print(float4)

What happened in that last cell? We turned an int, a whole number, into a float, a floating-point number. Basically, we added a decimal place. 
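
Going the other way, from float to int, loses information. A short sketch with illustrative variable names shows that int() drops the decimal part, while round() rounds to the nearest whole number:

```python
whole = 12
decimal = 19.807

print(float(whole))    # 12.0
print(int(decimal))    # 19, the fractional part is dropped
print(round(decimal))  # 20, rounded to the nearest whole number
print(int(-2.7))       # -2, truncation moves toward zero
```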

Ok, so str, int, and floats make sense. What about lists or dicts? Those are really useful for doing repeat calculations or making tables and spreadsheets. We can use Numpy, Scipy, or pandas for all these actions!

### Part 2: list

In [None]:
# lists: can contain elements of different data types 
list1 = ['apples','bananas','oranges']
list2 = np.arange(15) # a NumPy array, which can be indexed like a list (numpy was already imported above as np)
list3 = [16.0,12.0,23.7,18.2]
list4 = ['apples', 2]

Use the print statement below to see what a list looks like.

In [None]:
print(list1)

If you want to see how long your list is without printing it and counting each element, you can use the len() command.

In [None]:
print(len(list2))

Each thing in a list is called an element. Each element in a list has an index, or location in the list. To select certain elements from the list, you can use list_name[location] for one element and list_name[start:end] for a slice of elements:

In [None]:
# to get the first element of list1:
list1[0]

**Note:** In Python, the index of the first element is 0. If you have experience using MATLAB, where the index of the first element is 1, pay attention here. It is easy to make mistakes if you are used to the rules of another programming language.

In [None]:
# to get the first three elements of a list:
list3[0:3]

**Note:** the element at the end index is not included in the resulting slice.

In [None]:
# to get the last element of a list:
list2[-1]
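
The indexing and slicing rules above can be sketched together on one small list (the list contents are made up for illustration):

```python
fruits = ['apples', 'bananas', 'oranges', 'lemons', 'pears']

print(fruits[0])     # apples (first element)
print(fruits[-1])    # pears (last element)
print(fruits[1:3])   # ['bananas', 'oranges'] (index 3 is not included)
print(fruits[:2])    # ['apples', 'bananas'] (a missing start means "from the beginning")
print(fruits[2:])    # ['oranges', 'lemons', 'pears'] (a missing end means "to the end")
print(fruits[::2])   # ['apples', 'oranges', 'pears'] (every second element)
```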

### Part 3: dict
We can combine lists with labels (called keys) to make a dict, short for dictionary. In a dictionary, each key is paired with a value, and they go together in a {key:value} pair. You can think of this like a table where the key is the column label and the values are the entries in that column. Values can be any type of data, and the values don't have to be the same type from one {key:value} pair to the next.

You can tell something is a dict and not a list because curly brackets are used { }. There is also always a colon : between the key and the values. 

In [None]:
# dictionaries
dict1 = {'column1':[1,2,4,8,16,32]}
dict2 = {'Numbers':[1,2,3,4],'Fruit':['apples','bananas','oranges','lemons'],'Randoms':list3}
print(dict1)

Ok, let's break down the printed dict a little more. The curly brackets { } tell us everything we printed is a dict. Inside that dict, we have a str that is the key. Then we have a colon : that tells us the values for the key are coming up. Next, we have straight brackets [ ] surrounding the values. The straight brackets indicate a list, and the values are elements in that list. 

If you want to know what your keys or values are without printing the whole dict, you can call these items.

In [None]:
# The keys in dict1:
print(dict1.keys())

# The corresponding values:
print(dict1.values())

Easy, right? Now, what if you want to know the values of a certain key? Try this statement: dict_name['key_name']:

In [None]:
print(dict1['column1'])

Since the values in our dict are elements of a list, we can call a specific element using its index. But first we must indicate which column of data we want by using the key. 

In [None]:
print(dict1['column1'][1])
# this code is saying find the 2nd element in column1 of dict1, remember indexing starts at 0
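
A few other everyday dict operations, sketched with made-up data (the key names and numbers below are illustrative only):

```python
measurements = {'site': ['A', 'B', 'C'], 'pm25': [8.1, 12.4, 5.9]}

# add a new key:value pair
measurements['year'] = [2021, 2022, 2023]

# check whether a key exists before using it
print('pm25' in measurements)    # True
print('ozone' in measurements)   # False

# loop over every key and its list of values
for key, values in measurements.items():
    print(key, len(values))

# .get() returns a default instead of raising KeyError for a missing key
print(measurements.get('ozone', 'not measured'))   # not measured
```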

Now that we understand all the {key:value} pairs, we can use pandas (a Python package) to do fancy stuff to our dict.

### Part 4: DataFrames


Let's make a table using pandas! (User Guide of pandas can be found here: https://pandas.pydata.org/docs/user_guide/index.html)

In [None]:
df1 = pd.DataFrame(data=dict1) # we already imported pandas, so we don't need to do it again
print(df1)

df1 is our first table! In pandas, these are called DataFrames. We can make a DataFrame using the pd.DataFrame command and passing a dictionary as the data argument. The list of numbers on the left side is the index. The default, if you don't specify the index when you make the DataFrame, is to number the rows starting at 0. We can change the column names or the index using pandas commands.

In [None]:
df1 = pd.DataFrame(data=dict2)
print(df1)

In [None]:
df2 = df1.set_index('Numbers')
print(df2)

Check in!! Yes, in the example above, the first column of df1, named "Numbers", became the index of df2. But let's break this down a bit. 

First, we made a new DataFrame called df2. We said df2 is equal to df1 BUT with the values in column 1 used as the index. This also means that the index label is the key from column 1 in df1. By making column 1 our index, we removed this column from the table. Now there are only two {key:value} data pairs in the DataFrame.

When we say there are only two {key:value} pairs in the DataFrame, we mean that Numbers is no longer a key! It has become the index label. We can still call the Fruit and Randoms keys, but we cannot treat Numbers the same way.

In [None]:
print(df2['Fruit'])

In [None]:
print(df2['Numbers'])

This long error message ends with "<font color=red>KeyError</font>: 'Numbers'". This is saying 'Numbers' is not a key, so you can't use it to select those values. Instead, we must call the DataFrame indexes.

In [None]:
print(df2.index)
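
As a sketch of how the index can be used once a column has been moved there (the small table below is made up for illustration), rows can be selected by index label with .loc, and reset_index() turns the index back into an ordinary column:

```python
import pandas as pd

table = pd.DataFrame({'Numbers': [1, 2, 3],
                      'Fruit': ['apples', 'bananas', 'oranges']})
indexed = table.set_index('Numbers')

# select a value by its index label and column name
print(indexed.loc[2, 'Fruit'])    # bananas

# the old column name is kept as the name of the index
print(indexed.index.name)         # Numbers

# reset_index() turns the index back into an ordinary column
restored = indexed.reset_index()
print(list(restored.columns))     # ['Numbers', 'Fruit']
```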

Now let's change the column labels!

In [None]:
df3 = df1.rename(columns={'Numbers':'Friday','Fruit':'Monday','Randoms':'Thursday'})
print(df3)

In the example above, we renamed all our columns using the rename command. We started with df1, which had 3 columns. Columns 1, 2, and 3 of df1 now have new names: Friday, Monday, Thursday. 

We'll explore pandas in more detail in a later module. Hopefully, though, you've seen why lists and dicts are important data types!

### Bonus: Operators

In [None]:
# Example reference: https://www.w3schools.com/python/python_operators.asp

######### Assignment Operators #########
# assign value to a variable using "="
x = 3

## Try uncommenting each line one at a time, and print the result to see what it does ##
# x += 2  # x = x + 2
# x -= 2  # x = x - 2
# x *= 2  # x = x * 2
# x /= 2  # x = x / 2
print(x)

In [None]:
######### Arithmetic Operators #########
x = 3
y = 2

## Try uncommenting each line one at a time, and print the result to see what it does ##
# addition (left uncommented so that z is defined when the cell first runs)
z = x + y

# subtraction
# z = x - y 

# multiplication
# z = x * y

# division
# z = x / y

# modulus: the remainder after dividing one number by another
# z = x % y 

# exponentiation: multiplying the base n times (n: the exponent or power)
# z = x ** y

# floor division: divides two numbers and rounds the result down to the nearest whole number (or integer)
# z = x // y

print(z)
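
For reference, here is a sketch of what each arithmetic operator gives for the same x and y. Note that / always returns a float, and that floor division rounds down, which matters for negative numbers:

```python
x = 3
y = 2

print(x + y)    # 5
print(x - y)    # 1
print(x * y)    # 6
print(x / y)    # 1.5
print(x % y)    # 1
print(x ** y)   # 9
print(x // y)   # 1
print(-x // y)  # -2, because -1.5 rounds down to -2
```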

In [None]:
######### Comparison Operators (return a boolean: True or False) #########
x = 3
y = 2

## Try uncommenting each line one at a time, and print the result to see what it does ## 
# equal
# x == y 

# not equal
# x != y   

# greater than
# x > y

# less than
# x < y

# greater than or equal to
# x >= y

# less than or equal to
# x <= y

In [None]:
######### Logical Operators ######### 
x = 3
y = 2
l = [1, 2, 3]

## Try uncommenting each line one at a time, and print the result to see what it does ## 
# returns True if both statements are true
# x < 5 and x < 2

# returns True if one of the statements is true
# x < 5 or x < 2

# reverse the result, returns False if the result is true
# not(x < 5 and x < 2)

######### Identity Operators ######### 
# returns True if both variables are the same object
# x is y

# returns True if both variables are not the same object
# x is not y

######### Membership Operators ######### 
# returns True if the specified value is present in the sequence
# x in l

# returns True if the specified value is not present in the sequence
# x not in l
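
The difference between equality (==) and identity (is) is easiest to see with lists. In this sketch (variable names are illustrative), two lists with the same contents are equal but are not the same object:

```python
x = 3
numbers = [1, 2, 3]
same_values = [1, 2, 3]   # a separate list with the same contents
alias = numbers           # a second name for the same list object

# logical operators
print(x < 5 and x < 2)          # False
print(x < 5 or x < 2)           # True
print(not (x < 5 and x < 2))    # True

# equality compares contents, identity compares objects
print(numbers == same_values)   # True
print(numbers is same_values)   # False
print(numbers is alias)         # True

# membership
print(x in numbers)             # True
print(10 not in numbers)        # True
```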

### Bonus: array
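An array (from Python's built-in array module) is similar to a list, but all of its elements must be of the same data type. Because of this restriction, arrays are designed for numerical data and are more memory-efficient than lists when storing large collections of data of the same type.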

In [None]:
import array as arr
a = arr.array('i', [1, 2, 3]) # 'i' is the type code of a; it means signed int
print('a: ', a)               # Type codes: https://docs.python.org/3/library/array.html

# access the elements of an array
## Try uncommenting each line one at a time, and print the result to see what it does ## 
# print(a[0]) # !!! index starts from 0
# print(a[-1]) # the last one of the variable
# print(a[1:3]) # second to third element of the variable
# print(a[:2])  # first to second position
# print(a[1:])  # second to last

**Note:** When you access a specific index, like a[0], Python retrieves just that integer value. When you use slicing, like a[1:3], it returns a new array object containing the sliced elements, maintaining its type code and array structure.
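
The difference between indexing and slicing can be checked with type(). This is a small sketch using the same kind of array as above:

```python
import array as arr

a = arr.array('i', [1, 2, 3])

single = a[0]    # indexing returns one element
part = a[1:3]    # slicing returns a new array

print(type(single))    # <class 'int'>
print(type(part))      # <class 'array.array'>
print(part.typecode)   # i
print(part)            # array('i', [2, 3])
```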

In [None]:
# search for the index (location) of the specific value 
print(a.index(3))

In [None]:
# length of an array
print(len(a))

In [None]:
# looping array elements
for i in range(0, len(a)):
    print(a[i])

In [None]:
# change the value of an item in an array
a[0] = 5
a

In [None]:
# add a new value to an array
a.append(8)
#a.append(8.0)  #Try uncommenting this line and commenting out the one above it to see what you get
a

In [None]:
# Remove a value from an array
a.remove(2)
a
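
The type code is enforced whenever you change an array. As a sketch (it uses try/except, which this tutorial has not covered), appending a float to an 'i' array raises a TypeError, while an array with type code 'd' accepts a whole number and stores it as a float:

```python
import array as arr

ints = arr.array('i', [5, 2, 3])
ints.append(8)            # fine: 8 is an int

try:
    ints.append(8.0)      # not allowed: 8.0 is a float
except TypeError as err:
    print('TypeError:', err)

print(ints)               # array('i', [5, 2, 3, 8])

floats = arr.array('d', [1.5, 2.5])   # 'd' means double-precision float
floats.append(8)          # the int 8 is stored as the float 8.0
print(floats)             # array('d', [1.5, 2.5, 8.0])
```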

### Exercises
Ok, now that you've been introduced to the basics of Python, you should put your skills to the test! Below are a few code cells with some different types of data. First, read through the cells and try to understand what they are doing. The # signs are comment lines (code that doesn't run) and they should help clarify any issues. Second, try to edit the code to make your own data. You can start with one simple change and then progress to larger changes if you feel comfortable!

In [None]:
# import the packages we want
import numpy as np
import pandas as pd

In [None]:
# define our variables
x = np.arange(15)
y = x * 17 + (x - 12)
z = np.sin(x)

In [None]:
# print the data 
print(x)
print(y)
print(z)

In [None]:
# use the data to make a DataFrame
df = pd.DataFrame(data={'y':y,'z':z}, index=x)
df

**Knowledge Check** What are the data types of x, y, and z? How do you know?

In [None]:
# make a new list
# the \ at the end of the first line of the list lets the list continue on the next line
# (inside square brackets, Python would continue the line even without it)
cities = ['Boston','New York City','Newark','Baltimore','D.C.','Atlanta','Chicago','Kansas City',\
          'St. Louis','Minneapolis','Salt Lake City','Las Vegas','Houston','Dallas','Los Angeles']
print(len(cities))

In [None]:
# add this new list to the DataFrame
df['cities'] = cities

In [None]:
# let's change the index
df2 = df.set_index('cities')
df2

In [None]:
# maybe try some math out
# can you figure out what the ** means?
a = y - z**2
print(a)

**Answer:** The `**` operator raises a number to a power, so `z**2` means ${z^2}$.

In [None]:
# let's change the data type
b = y.astype(float)
print(b)
c = z.astype(str)
print(c)

In [None]:
d = b - c**2

**Knowledge Check** Why did you get an error message on the previous cell? How could you fix this problem?