## Contents
1. Introduction to Programming
2. Introduction to Python
3. Traditional Data Representation
4. Data Representation in Python
5. Exercises

## Section 1: Introduction to Programming

### What is programming?
Right now, you are working inside of an **integrated development environment**, or IDE. This is a program that allows you to write, run, and test code. Think of each line of code that you write as an instruction to the computer. When you run the code, the computer will execute each instruction in the order in which it is written.

### How do we write code?
Every programming language has a set of rules that dictate how code should be written. These rules are called **syntax**. If you don't follow the syntax of a language, the computer will not be able to understand your code. Python is known for having a very simple and easy-to-understand syntax, which is why it is often recommended as a first language for beginners. It is also very powerful and versatile, which is why it is used by many professional developers, particularly in the interdisciplinary fields of data science and machine learning.

### How can you get started?
This notebook will introduce you to the basics of programming and Python. You will learn how to write and run code, and you will also learn about the different types of data that you can work with in Python. By the end of this notebook, you will be able to write simple programs that can perform calculations and manipulate data. One critical thing to always remember while programming is that **you are in control**. You are the one who decides what the computer should do, and you are the one who tells it how to do it. The computer will only do what you tell it to do, so it is important to be very clear and precise in your instructions. Precise instructions also make debugging easier: when a program misbehaves, the cause is always something it was told to do.

## Section 2: Introduction to Python

Now that you have a high-level understanding of programming, let's dive into Python. As mentioned above, Python is a very popular programming language that is known for its simplicity and versatility. However, this does not mean it is *easy*. Python is a powerful tool and there are many intricacies and best practices you must learn if you wish to become proficient. Given that the purpose of this course (and our club) is to help you build the skills necessary for data science, we will be skipping many of the Python fundamentals that are not directly related to data science. However, if you are interested in learning more about Python, there are many resources available online, including this free course from [Harvard University](https://www.edx.org/certificates/professional-certificate/harvardx-computer-science-for-python-programming).

### Cell Types and Comments

In Jupyter notebooks, there are two types of cells: Markdown and Python. Right now, we're in a Markdown cell, so all of the features of Markdown editing work here, including **bolding**, *italicizing*,
- bulleting, and
###### Headings.

In general, Python cells are used to write code and Markdown cells are used to describe the function of Python cells or introduce some additional information to the notebook.

In [1]:
# However, sometimes it can be useful to write additional information directly inside of a Python cell. 
# This is called commenting and is done by adding a # to the beginning of a line in a Python cell.
# Sometimes, multiline comments are useful. They are written between triple quotation marks
# (technically this creates a string that is never used, so Python ignores it)
"""
here 
is
an
example
"""

# Comments can also be used to "disable" lines of code. For example,
print("Hello, world!") # this executes
# print("Hello, world!") this does not execute

Hello, world!


It's also useful to know that Markdown cells contain no code to execute, while Python cells must be run in order for them to "take effect." You can run a cell by clicking the "Play Arrow" or hitting Shift + Enter. Finally, please remember that when writing Python code outside of a Jupyter notebook, you must always call ```print()``` to write to the standard output. However, Jupyter will automatically display the value of the last expression in each Python cell if possible. Other than rare debugging cases, it is still considered best practice to invoke ```print()``` whenever you want to display a result.

In [2]:
# In normal Python, if I have variable name that stores a string, I'd have to write print(name)
name = "John Doe"
print(name)

John Doe


In [3]:
# But in Jupyter, I can simply write the variable as the last line in my cell
name

'John Doe'
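
The quotation marks in the last output appear because Jupyter shows a representation of the value rather than printing it. For a plain string, that is the same text the built-in `repr()` function returns. A minimal sketch of the difference, reusing the `name` variable from above (the `displayed` variable is only illustrative):

```python
name = "John Doe"

# print() writes the text itself, without quotation marks
print(name)

# repr() returns the text with quotation marks, which is the form
# Jupyter shows when a string is the last expression in a cell
displayed = repr(name)
print(displayed)
```

This is one reason to prefer `print()` when the exact appearance of the output matters.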

### Writing Code

Once you know the syntax of a given language, writing code is very simple. All you need to do is translate what you want to happen into syntactically valid instructions. Sometimes, syntax can be very complex. Thankfully, Python's syntax is relatively easy to pick up. For example, if I want to display some text in the output of my cell, I can call the built-in ```print()``` function, as shown below:

In [4]:
print("Hello, world!")

Hello, world!


### Functions

Functions are perhaps the most powerful tool in all of programming because they enable you to write sets of customized instructions for your machine. There are many parts to a function, but let's start with the simplest: the arguments. Arguments may also be called parameters or inputs, but all of these refer to the values that you pass into your function. In other words, these are the operands of your function's operations. Let's use the ```print()``` function as an example.

In [2]:
# In the example above, our argument was the string "Hello, world!"
print("Hello, world!")

# But we can also print a variety of other things.
print(1)
print([1, 4, 5, 6])

Hello, world!
1
[1, 4, 5, 6]


In addition to arguments, functions also have definitions. Because ```print()``` is a built-in function, the Python interpreter already knows its definition so we don't need to write one. But let's say you want a function that can add two numbers together. Of course, you could just use the addition operator (```+```) but let's assume we need to introduce a function for the purpose of this demonstration.

In [5]:
def add(a: int, b: int) -> int: # here, we name our function and assign variables to our parameters. it's not strictly necessary but we also specify the data types of our parameters and our return type
    # result: int = a + b
    # return result
    return a + b

# Question: Is the 'result' variable strictly necessary?

Now that we have defined our function, we can call it with any two arguments for ```a``` and ```b```, so long as they are both the correct data type.

In [6]:
print(add(1, 2))

3
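
The type hints in the definition (`a: int`, `-> int`) are documentation for the reader. Python does not check them when the function runs. The sketch below repeats the `add` function and calls it with other argument types to show what happens. The extra calls are illustrative and not part of the original notebook.

```python
def add(a: int, b: int) -> int:
    return a + b

# Called as intended, with two integers
print(add(1, 2))

# Python does not enforce the hints, so floats are accepted too
print(add(1.5, 2))

# With strings, + means concatenation, so the result is "12" rather than 3
print(add("1", "2"))
```

Because nothing stops the last call, it is up to you to pass arguments of the type the function expects.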


### Data Types

In an actual CS course, a significant amount of time is spent explaining various data types because it is useful for advanced programmers to have a solid foundation in these theoretical concepts. However, for our purposes, you'll only need the basics.

In [7]:
# Integers (int) are numbers
number: int = 3
print(type(number))

<class 'int'>


In [11]:
# Characters are individual letters or symbols (note: chr is a built-in function, not a data type)
character: chr = 'a'
print(type(character))

<class 'str'>


In [8]:
# As you can see, Python does not have a dedicated char type so instead we use strings (str)
string: str = "this is a sentence"
print(type(string))

<class 'str'>
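
Since a single character is just a string of length one, taking one character out of a longer string also gives a `str`. The short sketch below illustrates this, and also shows what the built-in `chr()` and `ord()` functions actually do. The variable names are chosen for this example only.

```python
string = "this is a sentence"

# Indexing starts at 0, so this is the first character
first_character = string[0]
print(first_character)
print(type(first_character))

# ord() converts a character to its numeric code, chr() converts it back
code = ord("a")
print(code)
print(chr(code))
```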


In [13]:
# Arrays (list) are ordered collections of values. Try to keep all values in a list the same data type or, for nested lists, at least consistent along one dimension.
array: list = [1, 2, 3, 4]
print(type(array))

<class 'list'>
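
A few common list operations, sketched on the same `array`. The helper variables `first`, `last`, and `length` are named here for illustration.

```python
array = [1, 2, 3, 4]

# Elements are accessed by position, starting from 0
first = array[0]

# Negative positions count from the end
last = array[-1]

# append() adds a new element to the end of the list
array.append(5)

# len() returns the number of elements
length = len(array)

print(first, last, length)
print(array)
```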


In [9]:
# Sets (set) are unordered collections without duplicate elements
my_set: set = {1, 3, 4, 4}
print(type(my_set))
print(my_set)

<class 'set'>
{1, 3, 4}
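
One practical use of a set is removing duplicates from a list and checking membership. A small sketch, using a made-up list of names:

```python
# Hypothetical data: the same visitor can appear more than once
visits = ["ana", "ben", "ana", "cy", "ben"]

# Converting to a set keeps one copy of each value
unique_visitors = set(visits)
print(len(unique_visitors))

# The 'in' operator checks whether a value is present
print("ana" in unique_visitors)
print("dee" in unique_visitors)
```

Sets are unordered, so do not rely on the order in which their elements print.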


In [10]:
# Dictionaries (dict) map keys to values
dictionary: dict = {'a': 1, 'b': 2, 'c': 3}
print(type(dictionary))

<class 'dict'>
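
Values in a dictionary are looked up by key rather than by position. The sketch below reuses the `dictionary` from the cell above. The extra key `'d'` and the variables `value_of_b` and `missing` are added for illustration.

```python
dictionary = {'a': 1, 'b': 2, 'c': 3}

# Look up the value stored under a key
value_of_b = dictionary['b']

# Assigning to a new key adds a key/value pair
dictionary['d'] = 4

# get() returns a default instead of raising an error for a missing key
missing = dictionary.get('z', 0)

print(value_of_b, missing)
print(dictionary)
```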


In [11]:
# Booleans (bool) are logic values (either True or False)
boolean: bool = True
print(type(boolean))

# Question: What will happen if I name a variable 'bool'?

<class 'bool'>


### Conditional Logic

Sometimes it is useful to check if a condition is true before executing a line or multiple lines of code.

In [16]:
login: bool = False
if login: 
    print("Access Granted!")
elif not login:
    print("Login failed")
else:
    print("This will never execute")

# Question: Why will line 7 never execute?

Login failed
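
Conditions are often comparisons rather than a stored Boolean. The sketch below uses a hypothetical function, `describe_temperature`, with arbitrary thresholds to show an `if`/`elif`/`else` chain in which every branch can be reached. Python checks the conditions from top to bottom and runs only the first branch that matches.

```python
def describe_temperature(celsius: float) -> str:
    if celsius < 0:
        return "freezing"
    elif celsius < 20:
        # Only reached when celsius is at least 0
        return "cool"
    else:
        # Reached when neither condition above was true
        return "warm"

print(describe_temperature(-5))
print(describe_temperature(12))
print(describe_temperature(30))
```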


### Loops

Some data types are iterable, meaning that you can extract one element at a time and analyze it or call some function on it. We perform these iterations with loops.

In [13]:
# A for loop will iterate over a specific iterable and look at each element/value
array: list = [1, 2, 3, 4]
for element in array:
    print(element)

# Question: What is the difference between the code above and simply calling print(array)?

1
2
3
4
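
Loops become more useful when each iteration updates a value that lives outside the loop. A minimal sketch, assuming the same `array` as above (`total` and `squares` are illustrative names):

```python
array = [1, 2, 3, 4]

# Accumulate a running total, one element at a time
total = 0
for element in array:
    total += element
print(total)

# range(1, 5) yields 1, 2, 3, 4 without needing a list first
squares = []
for number in range(1, 5):
    squares.append(number * number)
print(squares)
```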


In [14]:
# A while loop will keep iterating as long as its condition is true
i: int = 0
while i < 3:
    print(i)
    i += 1

# Question: Would the loop terminate if we moved line 2 after line 3?

0
1
2
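
A `while` loop fits best when the number of iterations is not known in advance. The sketch below counts how many times a number can be halved (using integer division) before it reaches 1. The starting value of 100 is arbitrary.

```python
value = 100
steps = 0

# Keep halving as long as the value is greater than 1
while value > 1:
    value = value // 2  # integer division discards the remainder
    steps += 1

print(steps)
print(value)
```

If the body never changed `value`, the condition would stay true and the loop would never end.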


### Exercises

1. Create a new cell and write your name to the standard output.
2. Store your name as a string in a variable called ```name```.
3. Write a function to reverse your name.
4. Call that function and print its output.

## Section 3: Traditional Data Representation

Download the IRIS dataset [here](https://www.kaggle.com/datasets/arshid/iris-flower-dataset) and the Churn dataset [here](https://www.kaggle.com/datasets/blastchar/telco-customer-churn).

## Section 4: Data Representation in Python

In [15]:
# Import the pandas package and alias it as pd and the numpy package as np
import pandas as pd
import numpy as np



In [16]:
# Read the csv file
IRIS = pd.read_csv("./data/IRIS.csv")
IRIS.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa


In [17]:
# Check how many classes of flower we have
print(IRIS['species'].unique())

['Iris-setosa' 'Iris-versicolor' 'Iris-virginica']


In [23]:
# Count instances of each flower class
print(IRIS['species'].value_counts())

species
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: count, dtype: int64


In [26]:
# Look only at virginicas
IRIS[IRIS['species'] == 'Iris-virginica'].head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
100,6.3,3.3,6.0,2.5,Iris-virginica
101,5.8,2.7,5.1,1.9,Iris-virginica
102,7.1,3.0,5.9,2.1,Iris-virginica
103,6.3,2.9,5.6,1.8,Iris-virginica
104,6.5,3.0,5.8,2.2,Iris-virginica


In [18]:
# Find the average petal length for all Virginicas
print(np.mean(IRIS[IRIS['species'] == 'Iris-virginica']['petal_length']))

5.5520000000000005
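
The filter-then-average pattern above can also be tried without downloading anything. The sketch below uses a tiny made-up table (the `toy` frame and its values are hypothetical, not taken from the IRIS data). It shows that comparing a column to a value produces a Boolean mask, and that `groupby` computes the per-class mean for every class at once.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset
toy = pd.DataFrame({
    "species": ["A", "A", "B", "B"],
    "petal_length": [1.0, 2.0, 5.0, 7.0],
})

# Comparing a column to a value gives one True/False per row
mask = toy["species"] == "B"
print(mask.tolist())

# Keep only the rows where the mask is True, then average one column
mean_b = toy.loc[mask, "petal_length"].mean()
print(mean_b)

# groupby() performs the same calculation for each species in one step
means = toy.groupby("species")["petal_length"].mean()
print(means)
```

On the real data, the same `groupby` call on `IRIS` would give one average per flower class instead of requiring a separate filter for each.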


## Section 5: Exercises

1. Complete exercises 1-5 on Kaggle's [introduction to programming](https://www.kaggle.com/learn/intro-to-programming).
2. Complete exercises 1-7 on Kaggle's [introduction to Python](https://www.kaggle.com/learn/python).
3. Complete exercises 1 and 2 on Kaggle's [introduction to Pandas](https://www.kaggle.com/learn/pandas).