# Setup

You need to install `miniconda`, a minimal installer for `conda`, the package and environment manager commonly used with Python. Go to this link to download the latest version: https://docs.anaconda.com/free/miniconda/

As a developer, it is common to write code in an Integrated Development Environment (IDE). We will use Microsoft Visual Studio Code (VS Code), which you can download here: https://code.visualstudio.com/

# Programming Environment

Python is a general-purpose programming language that is also widely used for statistics and machine learning. The language is open source and free for personal, academic and commercial use. It is available on multiple platforms, including Windows, Linux and macOS. In recent years, Python has gained popularity and become one of the fastest-growing programming languages in the world. In this chapter, we will go through the basics of Python.

Visual Studio Code (VS Code) is a popular Integrated Development Environment (IDE) for many programming languages. It is a powerful editor that, through extensions, lets programmers write, run and debug Python code. It is also free for personal and commercial use.

## Packages

Functionality in Python can be extended through packages. Most packages are open source, with a few exceptions. They are published on community-maintained repositories such as [PyPI](https://pypi.org).

Installed Python packages can be listed with the terminal command `pip freeze`. In a VS Code notebook, this can be run with the `%pip` magic command, which starts with the `%` symbol followed by the pip command:

In [1]:
# List all existing packages
%pip freeze

asttokens==2.4.1
colorama==0.4.6
comm==0.2.2
contourpy==1.2.1
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
exceptiongroup==1.2.1
executing==2.0.1
fonttools==4.52.4
ipykernel==6.29.4
ipython==8.24.0
jedi==0.19.1
joblib==1.4.2
jupyter_client==8.6.2
jupyter_core==5.7.2
kiwisolver==1.4.5
matplotlib==3.9.0
matplotlib-inline==0.1.7
nest-asyncio==1.6.0
numpy==1.26.4
packaging==24.0
pandas==2.2.2
parso==0.8.4
pillow==10.3.0
platformdirs==4.2.2
prompt_toolkit==3.0.45
psutil==5.9.8
pure-eval==0.2.2
Pygments==2.18.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
pywin32==306
pyzmq==26.0.3
scikit-learn==1.5.0
scipy==1.13.1
six==1.16.0
stack-data==0.6.3
threadpoolctl==3.5.0
tornado==6.4
traitlets==5.14.3
typing_extensions==4.12.0
tzdata==2024.1
wcwidth==0.2.13
Note: you may need to restart the kernel to use updated packages.


To install new packages, use the `pip install` command. The following code chunk installs the `pandas` package, which handles data frame manipulation in Python.

In [2]:
# Install the pandas package
%pip install pandas

Note: you may need to restart the kernel to use updated packages.


# Programming Concepts

In Python, values are stored in variables. Users can assign a value to a variable using the `=` symbol. The `#` character is used for comments.

When creating a new variable, it is important to avoid using reserved words. A list of reserved words can be found here: https://www.w3schools.com/python/python_ref_keywords.asp
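
Besides the web page above, the list of reserved words can be printed from Python itself with the standard-library `keyword` module. A minimal sketch:

```python
import keyword

# kwlist holds every reserved word for the running Python version
print(keyword.kwlist)

# iskeyword() checks whether a candidate variable name is reserved
print(keyword.iskeyword("class"))   # True
print(keyword.iskeyword("my_var"))  # False
```

Because `kwlist` comes from the interpreter you are running, it stays accurate across Python versions.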

In [3]:
# This is a line of comment
a = 5
b = 20
c = a * b
print(c)

100


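Variables can be named using different case conventions, as shown below. Python does not enforce them, but they signal intent to readers: the official style guide, PEP 8, recommends snake_case for variables and functions, PascalCase for class names, and UPPER_CASE for constants.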

In [4]:
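# snake_case: recommended by PEP 8 for variables and functions
my_snake_case_var = "This is snake case"
# camelCase: common in other languages such as Java and JavaScript
myCamelCaseVar = "This is camel case"
# PascalCase: recommended by PEP 8 for class names
MyPascalCaseVar = "This is pascal case"

# UPPER_CASE: recommended by PEP 8 for constants
MY_UPPER_CASE_VAR = "This is upper case"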

## Data Structures



## List and Subsetting
A variable can store a list, which is an ordered collection of elements. Lists usually hold elements of the same type, although Python also allows mixed types. Users can subset a list by index number.

In [5]:
# Create a list
myList1 = [1, 2, 3, 4, 5]

# Print the list length
len(myList1)

5

In [6]:
# Sum the value of a list
sum(myList1)

15

In [7]:
# Maximum value of the list
max(myList1)

5

In [8]:
# Minimum value of the list
min(myList1)

1

In [9]:
# Arithmetic mean of the list
sum(myList1) / len(myList1)

3.0
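
The standard-library `statistics` module offers the same calculation without writing the formula by hand, along with related summaries such as the median. A minimal sketch:

```python
import statistics

myList1 = [1, 2, 3, 4, 5]

# mean() computes sum / count, matching the manual formula above
print(statistics.mean(myList1))
# median() returns the middle value of the sorted data
print(statistics.median(myList1))  # 3
```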

Index numbers start counting at zero. A list containing five elements has indexes \(0, 1, 2, 3, 4\).

In [10]:
# Return the element at index 3
myList1[3]

4
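
Beyond single positive indexes, Python lists also accept negative indexes, which count backwards from the end, and slices of the form `[start:stop:step]`, where `stop` is excluded. A short sketch:

```python
myList1 = [1, 2, 3, 4, 5]

# Negative indexes count backwards from the end of the list
print(myList1[-1])   # 5

# A slice [start:stop] returns elements from start up to, but not including, stop
print(myList1[1:4])  # [2, 3, 4]

# A third value sets the step size
print(myList1[::2])  # [1, 3, 5]
```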

It is possible to loop through a list and run some logic on each element. See the code chunk below:

In [11]:
# Using a for loop
for i in myList1:
    print(f"The value is {i}.")

The value is 1.
The value is 2.
The value is 3.
The value is 4.
The value is 5.


In [12]:
# Use a list comprehension to create a new list variable
myList2 = [x*2 for x in myList1]
print(myList2)

[2, 4, 6, 8, 10]


A `range` can also be iterated with a `for` loop. `range(start, stop, step)` produces integers from `start` up to, but not including, `stop`, in increments of `step`.

In [13]:
myList3 = range(10,20,2)
for x in myList3:
    print(x)

10
12
14
16
18
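
A `range` produces its values on demand rather than storing them. Wrapping it in `list()` shows the numbers it contains, and omitting the start value makes it begin at zero:

```python
# range(start, stop, step): stop is excluded
print(list(range(10, 20, 2)))  # [10, 12, 14, 16, 18]

# With one argument, range starts at 0 and steps by 1
print(list(range(5)))          # [0, 1, 2, 3, 4]

# range pairs naturally with len() to loop over list indexes
myList1 = [1, 2, 3, 4, 5]
for i in range(len(myList1)):
    print(i, myList1[i])
```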


## Datetime

In [14]:
import datetime
myDate = datetime.datetime.fromisoformat("2024-05-15")
myDate

datetime.datetime(2024, 5, 15, 0, 0)

In [15]:
myNewDates = [myDate + datetime.timedelta(days=x) for x in myList1]
myNewDates

[datetime.datetime(2024, 5, 16, 0, 0),
 datetime.datetime(2024, 5, 17, 0, 0),
 datetime.datetime(2024, 5, 18, 0, 0),
 datetime.datetime(2024, 5, 19, 0, 0),
 datetime.datetime(2024, 5, 20, 0, 0)]

Users can perform datetime operations like below:

In [16]:
# Calculate the day of week of this date value
# Return the day of the week as an integer, where Monday is 0 and Sunday is 6.
myNewDates[2].weekday()

5

In [17]:
myNewDates[2] + datetime.timedelta(hours=5)

datetime.datetime(2024, 5, 18, 5, 0)
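
Two other common operations are measuring the gap between two dates and formatting a date as text. Subtracting one `datetime` from another gives a `timedelta`, and `strftime()` formats a date using format codes. A minimal sketch (the second date is illustrative):

```python
import datetime

myDate = datetime.datetime.fromisoformat("2024-05-15")
laterDate = datetime.datetime.fromisoformat("2024-05-20")

# Subtracting two datetimes gives a timedelta
gap = laterDate - myDate
print(gap.days)  # 5

# strftime() formats a datetime as text using format codes
print(myDate.strftime("%Y/%m/%d"))  # 2024/05/15
```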

## Logical Operators

In [18]:
myVar1 = 25
myVar2 = 30

In [19]:
# Returns True if value is greater than 5
myVar1 > 5

True

In [20]:
# Returns True if value is equal to 25
myVar1 == 25

True

In [21]:
# Returns True if both conditions are satisfied
myVar1 > 20 and myVar2 < 35

True

In [22]:
# Returns True if at least one condition is satisfied
myVar1 > 20 or myVar2 < 20

True
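
Python's keywords `and` and `or` combine Boolean conditions. The symbols `&` and `|` are bitwise operators that bind more tightly than comparisons such as `>` and `<`, so `myVar1 > 20 | myVar2 < 20` is parsed as `myVar1 > (20 | myVar2) < 20` and can silently give the wrong answer. Wrapping each condition in parentheses avoids this; it matters when working with pandas, which uses `&` and `|` for element-wise conditions. A short demonstration:

```python
myVar1 = 25
myVar2 = 30

# | binds before >, so this evaluates 25 > (20 | 30) < 20, i.e. 25 > 30 < 20
unparenthesized = myVar1 > 20 | myVar2 < 20
print(unparenthesized)  # False

# Parentheses force each comparison to be evaluated first
parenthesized = (myVar1 > 20) | (myVar2 < 20)
print(parenthesized)    # True

# The keyword form is the idiomatic choice for plain Python values
print(myVar1 > 20 or myVar2 < 20)  # True
```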

## Dictionary

A dictionary can be used to store key-value pairs of different types. Example below:

In [23]:
myPerson = {
    "name": "John Doe",
    "dob": datetime.datetime.fromisoformat("1980-06-20"),
    "account_balance": 387.59,
}
print(myPerson)

{'name': 'John Doe', 'dob': datetime.datetime(1980, 6, 20, 0, 0), 'account_balance': 387.59}


In [24]:
# Subset an element by name
myPerson["account_balance"]

387.59
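
Dictionaries can also be updated, queried safely and looped over. The sketch below uses a separate dictionary, `myContact`, so `myPerson` stays unchanged; the `"city"` key and its value are hypothetical, added only for illustration:

```python
import datetime

myContact = {
    "name": "John Doe",
    "dob": datetime.datetime.fromisoformat("1980-06-20"),
    "account_balance": 387.59,
}

# Assigning to an existing key updates its value
myContact["account_balance"] = 400.00
# Assigning to a new key adds it (hypothetical "city" value for illustration)
myContact["city"] = "London"

# get() returns a default instead of raising KeyError when a key is missing
print(myContact.get("email", "not provided"))  # not provided

# items() yields (key, value) pairs for looping
for key, value in myContact.items():
    print(f"{key}: {value}")
```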

## If-Else Statements

An `if-else` statement can be used to run different code depending on a condition.

In [25]:
if myPerson["account_balance"] > 100:
    print("Full of cash")
else:
    print("Insufficient money")

Full of cash
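
When there are more than two cases, `elif` adds extra conditions that are checked in order until one is true. A minimal sketch; the balances and the "Low balance" and "Overdrawn" labels are illustrative assumptions:

```python
balances = [387.59, 50.0, -20.0]  # illustrative values
statusLabels = []

for balance in balances:
    if balance > 100:
        statusLabels.append("Full of cash")
    elif balance >= 0:
        # Checked only when the first condition was False
        statusLabels.append("Low balance")
    else:
        statusLabels.append("Overdrawn")

print(statusLabels)  # ['Full of cash', 'Low balance', 'Overdrawn']
```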


The `if` block can be placed within a loop to build more complex logic. The code chunk below creates a new list from an existing list.

In [26]:
myList4 = []

for x in myList1:
    if x > 2:
        text = f"Number is {x}"
        myList4.append(text)

print(myList4)

['Number is 3', 'Number is 4', 'Number is 5']


A list comprehension with an `if` condition can be used instead. The following code chunk is equivalent to the one above.

In [27]:
myList4 = [f"Number is {x}" for x in myList1 if x > 2]
print(myList4)

['Number is 3', 'Number is 4', 'Number is 5']
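
A related construct, the conditional expression `value_if_true if condition else value_if_false`, chooses between two values on a single line. Inside a list comprehension it transforms every element instead of filtering some out:

```python
myList1 = [1, 2, 3, 4, 5]

# Every element is kept; the conditional expression picks the label
parityLabels = ["odd" if x % 2 == 1 else "even" for x in myList1]
print(parityLabels)  # ['odd', 'even', 'odd', 'even', 'odd']
```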


## Data Frame

A data frame is a two-dimensional table: a collection of columns (variables) that all have the same number of rows, each identified by a unique column name. In many cases, datasets read from a CSV file or a SQL database are returned as a data frame object.
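
As a minimal sketch of loading CSV data, `pd.read_csv()` parses CSV text into a data frame. To keep the example self-contained, an in-memory string wrapped in `io.StringIO` stands in for a real file; with a file on disk you would pass its path instead. The file name `films.csv` in the comment is hypothetical:

```python
import io

import pandas as pd

# Illustrative CSV contents; in practice this could be a path such as "films.csv"
csvText = """title,year,box
Dr. No,1962,59.5
Goldfinger,1964,125
"""

films = pd.read_csv(io.StringIO(csvText))
print(films)
# Column types are inferred from the text
print(films.dtypes)
```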

In [28]:
import pandas as pd

df = pd.DataFrame({
    "title": [
        "Dr. No",
        "Goldfinger",
        "Diamonds are Forever",
        "Moonraker",
        "The Living Daylights",
        "GoldenEye",
        "Casino Royale"],
    "year": [
        1962, 1964, 1971, 1979, 1987, 1995, 2006],
    "box": [59.5, 125, 120, 210.3, 191.2, 355, 599],
    "bondActor": [
        "Sean Connery",
        "Sean Connery",
        "Sean Connery",
        "Roger Moore",
        "Timothy Dalton",
        "Pierce Brosnan",
        "Daniel Craig"]
})

df

|   | title | year | box | bondActor |
|---|-------|------|-----|-----------|
| 0 | Dr. No | 1962 | 59.5 | Sean Connery |
| 1 | Goldfinger | 1964 | 125.0 | Sean Connery |
| 2 | Diamonds are Forever | 1971 | 120.0 | Sean Connery |
| 3 | Moonraker | 1979 | 210.3 | Roger Moore |
| 4 | The Living Daylights | 1987 | 191.2 | Timothy Dalton |
| 5 | GoldenEye | 1995 | 355.0 | Pierce Brosnan |
| 6 | Casino Royale | 2006 | 599.0 | Daniel Craig |


Use the code chunk below to add a new row to an existing data frame.

In [29]:
newRow = pd.DataFrame({
    "title": ["Spectre"],
    "year": [2015],
    "box": [880.7],
    "bondActor": ["Daniel Craig"]
})

df = pd.concat(
    [df, newRow], 
    ignore_index=True
)

df

|   | title | year | box | bondActor |
|---|-------|------|-----|-----------|
| 0 | Dr. No | 1962 | 59.5 | Sean Connery |
| 1 | Goldfinger | 1964 | 125.0 | Sean Connery |
| 2 | Diamonds are Forever | 1971 | 120.0 | Sean Connery |
| 3 | Moonraker | 1979 | 210.3 | Roger Moore |
| 4 | The Living Daylights | 1987 | 191.2 | Timothy Dalton |
| 5 | GoldenEye | 1995 | 355.0 | Pierce Brosnan |
| 6 | Casino Royale | 2006 | 599.0 | Daniel Craig |
| 7 | Spectre | 2015 | 880.7 | Daniel Craig |


In [30]:
# Subset a column by name
df["bondActor"]

0      Sean Connery
1      Sean Connery
2      Sean Connery
3       Roger Moore
4    Timothy Dalton
5    Pierce Brosnan
6      Daniel Craig
7      Daniel Craig
Name: bondActor, dtype: object

In [31]:
# Compute a Boolean flag for each row where the value equals a given value
df["bondActor"] == "Sean Connery"

0     True
1     True
2     True
3    False
4    False
5    False
6    False
7    False
Name: bondActor, dtype: bool

In [32]:
# Subset the dataframe by filtering
df[df["bondActor"] == "Sean Connery"]

|   | title | year | box | bondActor |
|---|-------|------|-----|-----------|
| 0 | Dr. No | 1962 | 59.5 | Sean Connery |
| 1 | Goldfinger | 1964 | 125.0 | Sean Connery |
| 2 | Diamonds are Forever | 1971 | 120.0 | Sean Connery |
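
Several conditions can be combined in one filter with `&` (and) or `|` (or). Each condition needs its own parentheses, because these operators bind more tightly than `==` and `>`. A minimal sketch on a small, separately named data frame so `df` is left untouched:

```python
import pandas as pd

earlyFilms = pd.DataFrame({
    "title": ["Dr. No", "Goldfinger", "Diamonds are Forever", "Moonraker"],
    "year": [1962, 1964, 1971, 1979],
    "bondActor": ["Sean Connery", "Sean Connery", "Sean Connery", "Roger Moore"],
})

# Both conditions must be True for a row to be kept
mask = (earlyFilms["bondActor"] == "Sean Connery") & (earlyFilms["year"] > 1963)
print(earlyFilms[mask])
```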


Alternatively, the data frame can be subset by integer position using `df.iloc[row, column]`. The first value is the row position and the second is the column position. The `:` symbol means 'take everything'.

In [33]:
# Subset by index
df.iloc[5,:]

title             GoldenEye
year                   1995
box                   355.0
bondActor    Pierce Brosnan
Name: 5, dtype: object
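
`iloc` also accepts slices and lists of positions, while the related `loc` selects by row label and column name instead of position. A minimal sketch on a small, separately named data frame:

```python
import pandas as pd

sampleFilms = pd.DataFrame({
    "title": ["Dr. No", "Goldfinger", "Diamonds are Forever"],
    "year": [1962, 1964, 1971],
    "box": [59.5, 125, 120],
})

# Rows at positions 0 and 1 (stop is excluded), columns at positions 0 and 2
print(sampleFilms.iloc[0:2, [0, 2]])

# loc selects by label: row label 1 and the "title" column
print(sampleFilms.loc[1, "title"])  # Goldfinger
```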

## Function

Users can create custom functions in Python with the `def` keyword.

In [34]:
# Defines a custom function
def isOdd(x):
    # Modulo operator runs the divide operation and returns the remainder
    remainder = x % 2
    # If a number divided by 2 gives remainder 1, then it is an odd number
    equalToOne = remainder == 1
    return equalToOne

isOdd(5)

True

A function can be applied to each element of a list. The following code chunk uses a list comprehension:

In [35]:
[isOdd(x) for x in myList1]

[True, False, True, False, True]
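
The built-in `map()` is another way to apply a function to every element. It returns a lazy iterator, so `list()` is used to collect the results:

```python
def isOdd(x):
    # Same logic as above: a remainder of 1 after dividing by 2 means odd
    return x % 2 == 1

myList1 = [1, 2, 3, 4, 5]

# map() calls isOdd on each element; list() collects the results
print(list(map(isOdd, myList1)))  # [True, False, True, False, True]
```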

The function `addTen()` below adds a fixed value of 10 to the input and returns the result. The variables `num_to_add` and `result` have local scope, which means they are only accessible within the function.

In [36]:
def addTen(x):
    num_to_add = 10
    result = x + num_to_add
    return result 

In [37]:
myValue = 3
myResult = addTen(myValue)
print(myResult)

13


The variable `num_to_add` is defined within the function `addTen()`, so it is inaccessible outside of it. Running the following code chunk raises a `NameError` exception.

In [38]:
# Uncomment below line to execute. Exception expected
# print(num_to_add)
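
To observe the error without stopping the notebook, the access can be wrapped in a `try`/`except` block that catches `NameError`. A minimal sketch; the `raised` flag is added only to record that the exception occurred:

```python
def addTen(x):
    num_to_add = 10
    result = x + num_to_add
    return result

addTen(3)

raised = False
try:
    # num_to_add only existed inside addTen(), so this lookup fails
    print(num_to_add)
except NameError as error:
    raised = True
    print(f"Caught NameError: {error}")
```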

## Lab Exercise

Write your own code to answer the questions below.

In [39]:
# What is the total box office of all films made by Sean Connery?
# TODO: Add code in this box



In [40]:
# How many films were made by Daniel Craig?
# TODO: Add code in this box


In [41]:
# Which film has the highest box office?
# TODO: Add code in this box
