# Setup

You need to install `miniconda`, a minimal installer for `conda`, the package and environment manager we will use for Python. Download the latest version from this link: https://docs.anaconda.com/free/miniconda/

As a developer, it is common to use an Integrated Development Environment (IDE) to write your code. We will use Microsoft Visual Studio Code (VS Code), which you can download here: https://code.visualstudio.com/

# Programming Environment

Python is a general-purpose programming language that is also widely used for statistics and machine learning. The language is open source and free for personal, academic and commercial use. It is available on multiple platforms, including Windows, Linux and macOS. In recent years, Python has gained popularity and become one of the fastest-growing programming languages in the world. In this chapter, we will go through the basics of Python.

Visual Studio Code (VS Code) is a popular Integrated Development Environment (IDE) for many programming languages. It is a powerful editor that lets programmers write and run Python code. It is also free for personal and commercial use.

## Packages

Functionality in Python can be extended through packages. Most packages are open source, with a few exceptions. They are published on community-maintained repositories such as [PyPI](https://pypi.org).

Installed Python packages can be listed with the terminal command `pip freeze`. In a notebook opened in VS Code, this can be run with the `%pip` magic command, which is the `%` symbol followed by the `pip` command:

In [None]:
# List all existing packages
%pip freeze

To install new packages, use the `pip install` command. The following code chunk installs the `pandas` package, which provides data frames for manipulating tabular data in Python.

In [None]:
# Install the pandas package
%pip install pandas
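A quick way to confirm the installation worked is to import the package and print its version string. This is a small sketch; it relies on pandas exposing a `__version__` attribute, which pandas does, though not every package follows that convention.

```python
# Confirm that pandas can be imported and check which version is installed
import pandas as pd

print(pd.__version__)
```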

# Programming Concepts

In Python, values are stored in variables. Users can assign a value to a variable using the `=` symbol. The `#` character starts a comment.

When naming a new variable, avoid using reserved words (keywords such as `if`, `for` and `class`). A list of reserved words can be found here: https://www.w3schools.com/python/python_ref_keywords.asp

In [None]:
# This is a line of comment
a = 5
b = 20
c = a * b
print(c)
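The reserved words can also be retrieved from Python itself through the standard-library `keyword` module, which is handy for checking a candidate variable name. The name `my_total` below is only an illustrative example.

```python
import keyword

# All reserved words for the running Python version
print(keyword.kwlist)

# Check whether a candidate variable name is reserved
print(keyword.iskeyword("for"))       # True
print(keyword.iskeyword("my_total"))  # False
```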

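Variable names can be written in different cases, as shown below. The case does not change how Python treats a variable and does not set its scope; it is a naming convention. Following the PEP 8 style guide, `snake_case` is typically used for variables and functions, `PascalCase` for class names and `UPPER_CASE` for constants defined at module level.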

In [None]:
# Common naming styles (Python treats them all the same way)
my_snake_case_var = "This is snake case"
myCamelCaseVar = "This is camel case"
MyPascalCaseVar = "This is pascal case"

# Upper case is conventionally used for module-level constants
MY_UPPER_CASE_VAR = "This is upper case"

The function `myFunc()` below adds a fixed value of 10 to its input and returns the result. The variables `num_to_add` and `result` have local scope, which means they are only accessible within the function.

In [None]:
def myFunc(x):
    num_to_add = 10
    result = x + num_to_add
    return result 

In [None]:
myValue = 3
myResult = myFunc(myValue)
print(myResult)
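A common variation is to make the amount to add a parameter with a default value, so callers can override it. The function name `add_value` is hypothetical and used only for this sketch.

```python
# Hypothetical variant of myFunc(): the amount to add is a parameter
# with a default of 10, so callers can override it
def add_value(x, num_to_add=10):
    result = x + num_to_add
    return result

print(add_value(3))     # 13
print(add_value(3, 5))  # 8
```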

The variable `num_to_add` is defined within the function `myFunc()`, so it is inaccessible outside of it. Uncommenting the line in the following code chunk would raise a `NameError`.

In [None]:
# print(num_to_add)
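To see the error without stopping the notebook, the access can be wrapped in a `try`/`except` block. This is a small sketch that redefines `myFunc()` so it runs on its own; the `caught` flag exists only to record that the exception happened.

```python
def myFunc(x):
    num_to_add = 10
    result = x + num_to_add
    return result

print(myFunc(3))  # the function itself can use num_to_add

# Outside the function, num_to_add does not exist, so Python raises NameError
caught = False
try:
    print(num_to_add)
except NameError as err:
    caught = True
    print(f"NameError: {err}")
```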

## Data Structures



## List and Subsetting
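A variable can hold a list, which is an ordered collection of elements. Lists may mix element types, but the functions below (`sum`, `max` and `min`) expect numeric elements. Users can subset a list by index number.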

In [None]:
# Create a list
myList1 = [1,2,3,4,5]

# Print the list length
len(myList1)

In [None]:
# Sum the value of a list
sum(myList1)

In [None]:
# Maximum value of the list
max(myList1)

In [None]:
# Minimum value of the list
min(myList1)

In [None]:
# Arithmetic mean of the list
sum(myList1) / len(myList1)
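The standard-library `statistics` module offers the same mean calculation, along with related summaries such as the median. A minimal sketch:

```python
import statistics

myList1 = [1, 2, 3, 4, 5]

# Arithmetic mean, equal in value to sum(myList1) / len(myList1)
print(statistics.mean(myList1))

# Middle value of the sorted data
print(statistics.median(myList1))
```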

Indexes start counting at zero, so a list containing five elements has indexes 0, 1, 2, 3 and 4.

In [None]:
# Access the element at index 3 (the fourth element)
myList1[3]
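Lists can also be subset with negative indexes, which count from the end, and with slices, which return a sub-list. A short sketch:

```python
myList1 = [1, 2, 3, 4, 5]

# Negative indexes count from the end of the list
print(myList1[-1])   # 5

# A slice [start:stop] returns a new list from index start up to, but not including, stop
print(myList1[1:3])  # [2, 3]

# Omitting start or stop means "from the beginning" or "to the end"
print(myList1[:2])   # [1, 2]
print(myList1[3:])   # [4, 5]
```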

It is possible to loop through a list and apply some logic to each element. See the code chunk below:

In [None]:
# Using a for loop
for i in myList1:
    print(f"The value is {i}.")

In [None]:
# Using a list comprehension to create a new list
myList2 = [x*2 for x in myList1]
print(myList2)

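A `range(start, stop, step)` can also be iterated over with a `for` loop to perform similar operations. `range(10, 20, 2)` yields the numbers from 10 up to, but not including, 20, in steps of 2.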

In [None]:
myList3 = range(10,20,2)
for x in myList3:
    print(x)
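A `range` does not store its values as a list, so converting it with `list()` is a convenient way to inspect what it produces. A small sketch:

```python
myList3 = range(10, 20, 2)

# Convert to a list to see every value the range yields (the stop value 20 is excluded)
print(list(myList3))   # [10, 12, 14, 16, 18]

# With a single argument, range starts at 0 and steps by 1
print(list(range(5)))  # [0, 1, 2, 3, 4]
```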

## Datetime

In [None]:
import datetime
myDate = datetime.datetime.fromisoformat("2024-05-15")
myDate
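Datetime values often need converting to and from text. The `strftime()` method formats a datetime using format codes, and `datetime.datetime.strptime()` parses text back. The day/month/year format below is just one example choice.

```python
import datetime

myDate = datetime.datetime.fromisoformat("2024-05-15")

# Format a datetime as text using strftime format codes
print(myDate.strftime("%d/%m/%Y"))  # 15/05/2024

# Parse text back into a datetime; the format string must match the text
parsed = datetime.datetime.strptime("15/05/2024", "%d/%m/%Y")
print(parsed == myDate)  # True
```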

In [None]:
myNewDates = [myDate + datetime.timedelta(days=x) for x in myList1]
myNewDates

Users can perform further operations on datetime values, as shown below:

In [None]:
# Calculate the day of week of this date value
# Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
myNewDates[2].weekday()

In [None]:
myNewDates[2] + datetime.timedelta(hours=5)
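Adding a `timedelta` to a datetime gives a new datetime; the reverse also works, since subtracting two datetimes returns a `timedelta`. The two dates below are arbitrary examples.

```python
import datetime

start = datetime.datetime.fromisoformat("2024-05-15")
end = datetime.datetime.fromisoformat("2024-05-20 12:00")

# Subtracting two datetimes returns a timedelta
gap = end - start
print(gap)                  # 5 days, 12:00:00
print(gap.days)             # 5
print(gap.total_seconds())  # 475200.0
```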

## Comparison and Logical Operators

In [None]:
myVar1 = 25
myVar2 = 30

In [None]:
# Returns True if value is greater than 5
myVar1 > 5

In [None]:
# Returns True if value is equal to 25
myVar1 == 25

In [None]:
# Returns True if both conditions are satisfied
myVar1 > 20 and myVar2 < 35

In [None]:
# Returns True if at least one condition is satisfied
myVar1 > 20 or myVar2 < 20
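Python also has the `&` and `|` operators. On plain Booleans they give the same results as `and` and `or`, but they are bitwise operators with higher precedence than comparisons, so each comparison must be wrapped in parentheses. This matters later, because pandas uses `&` and `|` to combine row filters. A sketch of the pitfall:

```python
myVar1 = 25
myVar2 = 30

# `|` binds more tightly than `>` and `<`, so without parentheses this is read as
# the chained comparison myVar1 > (20 | myVar2) < 20, i.e. 25 > 30 < 20
print(myVar1 > 20 | myVar2 < 20)      # False (not what was intended)

# With parentheses, each comparison is evaluated first
print((myVar1 > 20) | (myVar2 < 20))  # True

# The keyword form needs no parentheses
print(myVar1 > 20 or myVar2 < 20)     # True

# `not` inverts a Boolean
print(not myVar1 > 20)                # False
```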

## Dictionary

A dictionary stores key-value pairs, where the values can be of different types. For example:

In [None]:
myPerson = {
    "name": "John Doe",
    "dob": datetime.datetime.fromisoformat("1980-06-20"),
    "account_balance": 387.59,
}
print(myPerson)

In [None]:
# Access a value by its key
myPerson["account_balance"]
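A few other everyday dictionary operations are sketched below: reading a key that may be missing, adding or updating a key, and looping over all pairs. The `email` and `city` keys and the value `"London"` are hypothetical, added only for illustration.

```python
import datetime

myPerson = {
    "name": "John Doe",
    "dob": datetime.datetime.fromisoformat("1980-06-20"),
    "account_balance": 387.59,
}

# .get() returns a default instead of raising KeyError when the key is missing
print(myPerson.get("email", "not provided"))  # not provided

# Assigning to an existing key overwrites it; assigning to a new key adds it
myPerson["account_balance"] = 400.00
myPerson["city"] = "London"

# Loop over key-value pairs
for key, value in myPerson.items():
    print(f"{key}: {value}")
```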

## If-Else Statements

An `if-else` statement can be used to perform operations based on conditions.

In [None]:
if myPerson["account_balance"] > 100:
    print("Full of cash")
else:
    print("Insufficient money")
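When there are more than two cases, `elif` adds further conditions, which are checked from top to bottom until one is true. The thresholds and the `"Comfortable"` label below are hypothetical.

```python
myPerson = {"name": "John Doe", "account_balance": 387.59}

balance = myPerson["account_balance"]

# Only the first branch whose condition is true runs
if balance > 1000:
    status = "Full of cash"
elif balance > 100:
    status = "Comfortable"
else:
    status = "Insufficient money"

print(status)  # Comfortable
```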

An `if` block can be placed inside a loop to build more complex logic. The code chunk below builds a new list of labels from the elements of an existing list that are greater than 2.

In [None]:
myList4 = []

for x in myList1:
    if x > 2:
        text = f"Number is {x}"
        myList4.append(text)

print(myList4)

A list comprehension with an `if` clause can be used instead. The following code chunk is equivalent to the one above.

In [None]:
myList4 = [f"Number is {x}" for x in myList1 if x > 2]
print(myList4)
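A trailing `if` in a comprehension filters elements out. To keep every element and choose between two values instead, a conditional expression (`a if condition else b`) goes before the `for`. The labels `"big"` and `"small"` are illustrative.

```python
myList1 = [1, 2, 3, 4, 5]

# A conditional expression picks one of two values for every element
myList5 = ["big" if x > 2 else "small" for x in myList1]
print(myList5)  # ['small', 'small', 'big', 'big', 'big']
```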

## Data Frame

A data frame is a table of columns (variables) that all have the same number of rows and unique column names. In many cases, datasets read from a CSV file or an SQL database are returned as a data frame object.

In [None]:
import pandas as pd

df = pd.DataFrame({
    "title": [
        "Dr. No",
        "Goldfinger",
        "Diamonds are Forever",
        "Moonraker",
        "The Living Daylights",
        "GoldenEye",
        "Casino Royale"],
    "year": [
        1962, 1964, 1971, 1979, 1987, 1995, 2006],
    "box": [59.5, 125, 120, 210.3, 191.2, 355, 599],
    "bondActor": [
        "Sean Connery",
        "Sean Connery",
        "Sean Connery",
        "Roger Moore",
        "Timothy Dalton",
        "Pierce Brosnan",
        "Daniel Craig"]
})

df
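As mentioned above, data frames usually come from files rather than being typed in. A minimal sketch using `pd.read_csv`: here the CSV text is held in a string and wrapped in `io.StringIO` so the example is self-contained; in practice you would pass a file path such as a hypothetical `"films.csv"`.

```python
import io

import pandas as pd

# CSV text standing in for a file on disk
csv_text = """title,year,box,bondActor
Dr. No,1962,59.5,Sean Connery
Goldfinger,1964,125,Sean Connery
"""

df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv)
print(df_csv.shape)  # (2, 4)
```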

The code chunk below adds a new row to an existing data frame by concatenating it with a one-row data frame.

In [None]:
newRow = pd.DataFrame({
    "title": ["Spectre"],
    "year": [2015],
    "box": [880.7],
    "bondActor": ["Daniel Craig"]
})

df = pd.concat([df, newRow], ignore_index=True)

df

In [None]:
# Subset a column by name
df["bondActor"]

In [None]:
# Compute a Boolean flag for each row: True where the value equals "Sean Connery"
df["bondActor"] == "Sean Connery"

In [None]:
# Subset the dataframe by filtering
df[df["bondActor"] == "Sean Connery"]

Optionally, the data frame can be subset by integer position using `df.iloc[rows, columns]`. The first value selects rows by position and the second selects columns by position. The `:` symbol means 'take everything'.

In [None]:
# Subset by index
df.iloc[5,:]
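Besides `iloc`, pandas provides `loc`, which selects rows by label (or by a Boolean mask) and columns by name. Several conditions can be combined with `&` or `|`, with each condition in parentheses. This sketch uses a small subset of the films table so it runs on its own.

```python
import pandas as pd

# Small subset of the films table, for a self-contained example
df = pd.DataFrame({
    "title": ["Dr. No", "Goldfinger", "GoldenEye", "Casino Royale"],
    "year": [1962, 1964, 1995, 2006],
    "box": [59.5, 125, 355, 599],
    "bondActor": ["Sean Connery", "Sean Connery", "Pierce Brosnan", "Daniel Craig"],
})

# .loc takes a row selection (here a Boolean mask) and a list of column names
print(df.loc[df["year"] > 1990, ["title", "box"]])

# Combine conditions with & (and) or | (or); each condition needs parentheses
mask = (df["bondActor"] == "Sean Connery") & (df["year"] > 1963)
print(df[mask])
```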

## Lab Exercise

Write your own code to answer the questions below.

In [None]:
# What is the total box office of all films made by Sean Connery?
# TODO: Add code in this box



In [None]:
# How many films were made by Daniel Craig?
# TODO: Add code in this box


In [None]:
# Which film has the highest box office?
# TODO: Add code in this box
