# Introduction to Python

To follow along and run the labs in this course, you will need two things:

* An installation of `Python3`, which is the specific version of `Python`  used in the labs.
* Access to  `Jupyter`, a very popular `Python` interface that runs code through a file called a *notebook*.

Unfortunately, there is currently no one-stop solution that installs both at the same time and is free for organisational use.

## Download instructions
1. Download the installer from [Python Downloads](https://www.python.org/downloads/).  
   Example version used: [Python 3.12.4](https://www.python.org/ftp/python/3.12.4/python-3.12.4-amd64.exe).

2. Run the `.exe` file as Administrator (on a German-language Windows, this option is labelled “Als Administrator ausführen”).

3. During installation:

   - Check the boxes for **Admin privileges** and **Add python.exe to PATH**.
   - Choose **Custom Install** and uncheck the boxes:
     - `tcl/tk`
     - `Python test suite`
   - Check **Install Python 3.12 for all users** and **Precompile Standard Library**.

4. Select **Disable path length limit** and then complete the installation.

## 5. Restart

Restart your computer to apply all changes made by the Python installation.

## 6. Python Libraries Installation

1. Open the Command Prompt as Administrator.
   - To do this, search for "Command Prompt" ("Eingabeaufforderung" on a German-language Windows) in the Windows search bar.
   - Right-click it and choose “Run as administrator” (“Als Administrator ausführen”) to open the Command Prompt as Administrator.
2. Run the following commands to install required libraries:
   ```bash
   py -m pip install notebook numpy pandas matplotlib aiohttp tqdm
   ```
   Copy this line, paste it into the Command Prompt, and press Enter. Wait until the installation has completed.


 There are a number of ways to get access to `Jupyter`. Here are just a few:

 * Using Google's `Colaboratory` service: [colab.research.google.com/](https://colab.research.google.com/).
 * Using Kaggle: [kaggle.com](https://www.kaggle.com/)
 * Using `JupyterHub`, available at [jupyter.org/hub](https://jupyter.org/hub).
 * Using your own `jupyter` installation. Installation instructions are available at [jupyter.org/install](https://jupyter.org/install).
 * Using the GWDG Jupyter instance: [jupyter-cloud.gwdg.de](https://jupyter-cloud.gwdg.de/).

## Getting Started

### The Python programming language

Python is a high-level, general-purpose programming language with an emphasis on code readability. It has the following features:

* dynamically type-checked (type checks are performed at runtime)
    * optional type annotations since version 3.5
* multi-paradigm
    * procedural (procedures are the primary focus; they act on data)
    * object-oriented (objects are the primary focus; data is encapsulated in objects together with the functions that act on it)
    * functional (functions are the primary focus; a pure function always yields the same value for the same input)
* "batteries-included" (many features are available in the standard library)

The Zen of Python can be found on [Wikipedia](https://en.wikipedia.org/wiki/Zen_of_Python).

An example of inheritance is shown in the class diagram below.

![image](https://developer.ibm.com/developer/default/articles/the-class-diagram/images/bell_fig5.jpg)

## Hello World

One of the simplest (and most important!) tasks you can ask a computer to do is to print a message.
In Python, we ask the computer to print a message by writing `print()` and putting the message, enclosed in quotation marks, inside the parentheses.

In [None]:
print("Hello world!")

## Data Types

### Numeric
In Python, you can work with various numeric data types, including integers and floating-point numbers.

In [None]:
x = 5       # integer
y = 2.5     # floating-point number
print(type(x))
print(type(y))

### Strings
Strings are sequences of characters, and they can be defined using single or double quotes.

In [None]:
name = "Alice"
print(name)
message = 'Hello, World!'
print(message)

#### Converting variables between different data types


In [None]:
new_int = 4
print("Initially, we had new_int =", new_int, "of type", type(new_int))
new_int = float(new_int)
print("Now, we have new_int =", new_int, "of type", type(new_int))

**Note:**
Some conversions are not possible at all, and some only work in specific cases.
- For example, it is possible to convert a string to an `int`, but only if all characters in the string are valid for an integer.

In [None]:
print(int("37"))

In [None]:
print(int("Thirty-Seven"))

In [None]:
print(int("42.0"))

In [None]:
int(42.0)
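
A failed conversion raises an exception, which can be caught so that the program keeps running. The following is a minimal sketch; the helper name `to_int` is hypothetical and not part of Python or of the labs.

```python
def to_int(text):
    """Try to convert a string to an int; return None if that is not possible."""
    try:
        return int(text)
    except ValueError:
        # Not a plain integer literal: try going through float, e.g. "42.0"
        try:
            return int(float(text))
        except (ValueError, OverflowError):
            # Neither an integer nor a finite decimal number
            return None


print(to_int("37"))            # 37
print(to_int("42.0"))          # 42
print(to_int("Thirty-Seven"))  # None
```

Note that going through `float` truncates any fractional part, so this sketch would turn `"42.9"` into `42`.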

### None
`None` signifies that a variable holds no value. It is the only value of its own type, `NoneType`, and is not an `int`, `str`, or any other data type.

In [None]:
var = None
print(type(var))

### Boolean
A Boolean takes one of two values: `True` or `False`.

In [None]:
x = True
y = False
print(type(x))
print(type(y))

## Arithmetic operators
Some of the operations include: `+`, `-`, `/`, `*`, `//`, `**`, `%`

In [None]:
10 + 2

In [None]:
int_var = 5
float_var = 1.23
int_var + float_var

In [None]:
'sample' + 2

In [None]:
'sample' + str(2)

In [None]:
100 - 1

In [None]:
100 - (-1)

In [None]:
int_var - float_var

In [None]:
100 * - 4

In [None]:
'sample' * 2

In [None]:
15/4

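**Note**: The integer (floor) division operator `//` rounds the quotient down to the nearest integer. For positive numbers, `11 // 2` and `int(11 / 2)` both give `5`, the integer before the decimal point; for negative numbers they differ, because `//` rounds down while `int()` truncates towards zero. The modulus operator `%` returns the remainder of a division.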

In [None]:
11//2

In [None]:
13%4
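
One detail worth checking is how these operators behave with negative numbers: `//` rounds down (towards negative infinity), whereas `int()` cuts off the decimals (towards zero). The short sketch below illustrates this, together with the built-in `divmod()`, which returns the quotient and the remainder in one call. The variable names are chosen for illustration only.

```python
# For positive numbers both approaches agree
print(11 // 2)       # 5
print(int(11 / 2))   # 5

# For negative numbers they differ
floor_result = -11 // 2         # rounds down
truncated_result = int(-11 / 2) # truncates towards zero
print(floor_result)      # -6
print(truncated_result)  # -5

# The remainder always satisfies: a == (a // b) * b + (a % b)
print(-11 % 2)       # 1

# divmod() returns quotient and remainder together
quotient, remainder = divmod(13, 4)
print(quotient, remainder)  # 3 1
```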

For exponentiation in Python, we can use the `**` operator or the `pow()` function.

In [None]:
2**3

In [None]:
pow(2,3)

In [None]:
16**0.5

Exercise 


![image.png](attachment:73039391-2530-46be-816c-11944882d197.png)

If a test identifies 80 individuals as positive for a condition (true positives) but misses 20 individuals who have the condition (false negatives), what is the sensitivity of the test?


## Logical Operators & more
Booleans are useful for any check we perform. We will use them all the time, especially when working with branches, loops, etc.

In [None]:
flag_a = True
flag_b = False

print(flag_a and flag_b)
print(flag_a or flag_b)
print(not(flag_a))
print(not(flag_b))

In [None]:
print(3 < 7)
print(73 == 73.0)
print(4 == '4')
print(1!=0)

Empty containers (such as an empty list or an empty string) evaluate to `False`.

In [None]:
print(bool([]))
print(bool(''))

Anything that holds a non-zero value or isn't empty is `True`.

In [None]:
print(bool(37))
print(bool("hello"))
print(bool(['a', 3, 2.0]))
print(bool([False]))
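
The rules above can be summarised in a short sketch. The list `falsy_values` is a name chosen here for illustration; it collects common values that evaluate to `False`, followed by two cases that often surprise beginners.

```python
# Zero, empty containers and None all evaluate to False
falsy_values = [0, 0.0, "", [], {}, (), set(), None]
all_falsy = not any(falsy_values)
print(all_falsy)        # True

# A non-empty string is True, whatever its content
print(bool("False"))    # True
print(bool(" "))        # True (a space is still a character)

# A list with one element is not empty, even if that element is 0
print(bool([0]))        # True
```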

## Lists
Lists are **ordered**, **mutable** collections of items that can be of different data types.

Non-homogeneous list

In [None]:
mess = ["flour", 100, True, "milk", 42.0]
print(mess)
print(type(mess))

In [None]:
days = ['Monday','Tuesday','Wednesday']
print(days)

In [None]:
# Checking membership
'Monday' in days

In [None]:
'Friday' in days

In [None]:
# indexing in Python starts with 0.
print(days[0])
print(days[2])

In [None]:
eclipse_dates = ['June 21, 2001', 'December 4, 2002', 'November 23, 2003',
                 'March 29, 2006', 'August 1, 2008', 'July 22, 2009',
                 'July 11, 2010', 'November 13, 2012', 'March 20, 2015',
                 'March 9, 2016']
print(eclipse_dates[-1])

In [None]:
# obtain the length of a list
len(eclipse_dates)

In [None]:
# Lists are mutable - can be modified
## The `append()` method is used to add a single element to the end of the list
days = ['Monday','Tuesday','Wednesday']
days.append('Friday')
print(days)

In [None]:
# The extend() method is used to append multiple elements to the end of a list
days.extend(['Saturday', 'Sunday'])
days

In [None]:
# The insert() method is used to add an element at a specified position within a list.
days.insert(3, 'Thursday')
days

### Slicing
Slicing is a technique used to extract portions of a list or a string by specifying a range of indices.
You can specify a `start` index (**inclusive**) and an `end` index (**exclusive**) to create a new list that contains a portion of the original list.

Homogeneous list

In [None]:
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

days[1:3]

In [None]:
# default start value - 0
days[:5]

In [None]:
# default end value - length of the list
days[3:]

In [None]:
# ::step_size
days[::2]

In [None]:
days[::-1]
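
Since slicing works on strings as well as lists, the same notation can be tried on a single word. This is a small sketch with an arbitrary example string:

```python
word = "Wednesday"

first_three = word[:3]     # start defaults to 0
last_three = word[-3:]     # negative indices count from the end
reversed_word = word[::-1] # a step of -1 walks backwards

print(first_three)    # Wed
print(last_three)     # day
print(reversed_word)  # yadsendeW
```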

### Nested Lists
A list can contain lists itself. You can for example think of a matrix as a list of lists as in the following example.

In [None]:
matrix = [ [1,2] , [3,4] ]
matrix

In [None]:
matrix[0]

In [None]:
matrix[1][0]

The nested lists can be as deep as you need them. For example, you can construct a 3 dimensional matrix with nested lists.

In [None]:
tensor = [ [[1,2],[3,4]] ,[[5,6],[7,8]] , [[9,10],[11,12]] ]
tensor
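
Elements of a deeply nested list are reached by chaining one index per level. The sketch below repeats the `tensor` from above so that it can be run on its own:

```python
tensor = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]

# Size of each level: 3 blocks, each with 2 rows of 2 values
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 2 2

# Third block, first row, second value
element = tensor[2][0][1]
print(element)  # 10
```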

## Dictionaries
Dictionaries are a versatile data structure in Python that store data as key-value pairs. Each key is associated with a specific value, and you can use the key to retrieve the corresponding value. Keys in a dictionary are unique.

In [None]:
country_codes = {'Switzerland': 41, 'France': 33, 'Spain': 34,
                 'UK': 44, 'Germany': 49}
country_codes

In [None]:
country_codes['Germany']

Querying the contained information

In [None]:
country_codes.items()

In [None]:
country_codes.keys()

In [None]:
country_codes.values()

Adding and updating entries

In [None]:
country_codes['Italy'] = 39
country_codes['UK'] = '44'
print(country_codes)

In [None]:
nums = dict(one=1, two=2, three=3, four=4)
nums

In [None]:
del nums['two']

In [None]:
nums['two']

In [None]:
nums.get('two', 'NA')
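
Besides `get()`, a few other dictionary operations help to avoid a `KeyError`. The following sketch uses a small example dictionary; the variable names are chosen for illustration.

```python
nums = dict(one=1, three=3, four=4)

# Check whether a key exists before using it
print('two' in nums)    # False
print('three' in nums)  # True

# pop() removes a key and returns its value; a default avoids a KeyError
removed = nums.pop('four')
missing = nums.pop('two', None)
print(removed, missing)  # 4 None

# update() adds new keys and overwrites existing ones
nums.update({'five': 5, 'one': 1.0})
print(nums)  # {'one': 1.0, 'three': 3, 'five': 5}
```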

## Tuples and Sets

Tuples are ordered, immutable collections in Python. Once you create a tuple, you cannot change its contents.

In [None]:
# Creating a tuple
coordinates = (3, 4)

# Accessing elements
x = coordinates[0]
y = coordinates[1]

print("X-coordinate:", x)
print("Y-coordinate:", y)

In [None]:
# tuples are immutable
coordinates[0] = 5

Sets are unordered collections of unique elements. They are used when you want to store multiple items and ensure uniqueness.

In [None]:
# Creating a set
my_set = {1, 2, 3, 3, 4, 4}

# Accessing elements (no specific index)
print("Set:", my_set)
print(3 in my_set)

In the above example, we create a set with unique elements, and duplicates are automatically removed.
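
Sets also support the usual operations from set theory. The sketch below uses two made-up sets of names; `sorted()` is applied before printing because sets have no guaranteed order.

```python
lab_a = {"alice", "bob", "carol"}
lab_b = {"bob", "dave"}

union = lab_a | lab_b         # in either set
intersection = lab_a & lab_b  # in both sets
difference = lab_a - lab_b    # in lab_a but not in lab_b

print(sorted(union))         # ['alice', 'bob', 'carol', 'dave']
print(sorted(intersection))  # ['bob']
print(sorted(difference))    # ['alice', 'carol']

# A common use: removing duplicates from a list
unique_values = sorted(set([3, 1, 3, 2, 1]))
print(unique_values)         # [1, 2, 3]
```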

## Conditional Statements
Conditional statements are used to execute different blocks of code based on specific conditions. The basic structure consists of an `if` statement, optionally followed by one or more `elif` (else if) statements, and an optional `else` statement.

In [None]:
threshold_value = 0.3
if threshold_value >= 0.5:
    print("The threshold value exceeds 0.5!")

In [None]:
if threshold_value >= 0.5:
    print("The threshold value exceeds 0.5!")
else:
    print("Threshold was not met!")


In [None]:
grade = 85

if grade >= 90:
    result = "A"
elif grade >= 80:
    result = "B"
elif grade >= 70:
    result = "C"
else:
    result = "D"

print("Result:", result)

In [None]:
# What happens now?

grade = 95

if grade >= 70:
    result = "C"
elif grade >= 80:
    result = "B"
elif grade >= 90:
    result = "A"
else:
    result = "D"

print("Result:", result)

## Loops
There are two types of loops in Python, `for` and `while`.

A `while` loop is used to execute a block of code as long as a certain condition is met.

In [None]:
counter = 0

while counter < 10:
    counter = counter + 1

    print(counter)

print("\n", counter, "is the final value of counter.")

`for` loops iterate over a given sequence  such as a list, tuple, string, etc. Here is an example:

In [None]:
primes = [2, 3, 5, 7]
for prime in primes:
    print(prime)

In [None]:
for letter in 'string':
    print(letter)

In [None]:
cast = {
           "Jerry Seinfeld": "Jerry Seinfeld",
           "Julia Louis-Dreyfus": "Elaine Benes",
           "Jason Alexander": "George Costanza",
           "Michael Richards": "Cosmo Kramer"
       }

for key, value in cast.items():
    print(f"Actor: {key:20}\tRole: {value}")

`for` loops can iterate over a sequence of numbers using the `range()` function.

In [None]:
# Prints out the numbers 0,1,2,3,4
for x in range(5):
    print(x)

print("--------------------------")

# Prints out 3,4,5
for x in range(3, 6):
    print(x)

print("--------------------------")

# Prints out 3,5,7
for x in range(3, 8, 2):
    print(x)

In [1]:
friends = ["Joey Tribbiani", "Monica Geller", "Chandler Bing", "Phoebe Buffay"]

for i in range(len(friends)):
    friends[i] = friends[i].lower().replace(' ', '_')
print(friends)

['joey_tribbiani', 'monica_geller', 'chandler_bing', 'phoebe_buffay']


In [2]:
[
    friends[i].lower().replace(' ', '_') for i in range(len(friends))
]


['joey_tribbiani', 'monica_geller', 'chandler_bing', 'phoebe_buffay']

## Debugging in Jupyter Notebook

In [None]:
# Debugging using breakpoints
for x in range(5):
    print(x)


### `enumerate()` and `zip()`

The `enumerate()` and `zip()` functions are powerful tools for working with sequences in `for` loops.

The `enumerate()` function adds a counter to an iterable and returns an enumerate object, which you can use to iterate through the elements along with their index.


In [None]:
fruits = ["apple", "banana", "cherry"]

# Using enumerate to iterate with indices
for index, fruit in enumerate(fruits):
    print(f"Index {index}: {fruit}")

The `zip()` function is used to combine multiple sequences into a single iterable, creating pairs of corresponding elements.

In [None]:
constants = [2.71, 3.14, 1.61]
names = ['e','pi','phi']

for n, c in zip(names, constants):
    print(f"{n} ≈ {c}")

In [None]:
constants = [2.71, 3.14, 1.61]
names = ['e','pi','phi']
colours = ['green', 'red', 'blue', 'yellow']

for n, c, col in zip(names, constants, colours):
    print(n,c,col)
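
When the sequences have different lengths, `zip()` stops at the shortest one, so the extra colour above is silently dropped. If every element should be kept, `itertools.zip_longest()` from the standard library is one option. A minimal sketch, with `'?'` as an arbitrary placeholder for missing values:

```python
from itertools import zip_longest

names = ['e', 'pi', 'phi']
colours = ['green', 'red', 'blue', 'yellow']

# zip() stops after the shortest input
short_pairs = list(zip(names, colours))
print(short_pairs)
# [('e', 'green'), ('pi', 'red'), ('phi', 'blue')]

# zip_longest() pads the shorter input with fillvalue
long_pairs = list(zip_longest(names, colours, fillvalue='?'))
print(long_pairs)
# [('e', 'green'), ('pi', 'red'), ('phi', 'blue'), ('?', 'yellow')]
```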

### List comprehension
A list comprehension is a concise way to perform a loop and store the result in a list.

In [None]:
numbers = [1, 2, 3, 4, 5]
squares = [x ** 2 for x in numbers]

print("Original List:", numbers)
print("Squares List:", squares)

In [None]:
[i**2 for i in range(4,10)]
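
A list comprehension can also filter elements or choose between two values. The sketch below shows both forms; the variable names are chosen for illustration.

```python
numbers = list(range(1, 11))

# A trailing `if` keeps only the elements that satisfy the condition
even_squares = [n ** 2 for n in numbers if n % 2 == 0]
print(even_squares)  # [4, 16, 36, 64, 100]

# An `if ... else` expression before `for` transforms every element
labels = ["even" if n % 2 == 0 else "odd" for n in numbers[:4]]
print(labels)        # ['odd', 'even', 'odd', 'even']
```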

## Functions

A function in Python is a reusable block of code that performs a specific task. Functions allow you to organize your code, make it more readable, and reduce repetition. Functions can take inputs, perform operations, and return outputs.

* Like most programming languages, Python uses functions to perform operations
* To run a function called `fun`, we type `fun(input1, input2)`, where the inputs (or arguments) `input1` and `input2` tell Python how to run the function.
* A function can have any number of inputs.

For example, the `print()` function outputs a text representation of all of its arguments to the console.

In [None]:
print('fit a model with', 11, 'variables')

### Defining functions
Defining your own functions is a convenient way to divide code into useful blocks, which keeps it ordered and readable, makes it reusable, and saves time.

In [None]:
# Example of defining and using a function
def greet(name):
    """This function greets the person with the given name."""
    print("Hello, " + name + "!")

# Calling the function
greet("Alice")
greet("Bob")

Functions can accept parameters and return values. Parameters allow you to pass data into a function, and return values allow a function to provide a result.

In [None]:
def add_date(a_string, date):
    # Here begins the function body
    dated_string = a_string + '_' + date

    return dated_string

returned_str = add_date(a_string='experiment', date='05-11-19')
returned_str

In [None]:
def convert_temp(degrees_celsius = None, degrees_fahrenheit = None):
    '''
    This function converts degrees Celsius to degrees Fahrenheit and
    vice versa.

    degrees_celsius: Input value in degrees Celsius to be converted to
                     degrees Fahrenheit.
    degrees_fahrenheit: Input value in degrees Fahrenheit to be converted
                        to degrees Celsius.
    return: Temperature in the converted units.
    '''

    if degrees_celsius is not None:
        degrees_fahrenheit = degrees_celsius * 9/5 + 32
        print(f"{degrees_celsius} °C are {degrees_fahrenheit} °F")
        return degrees_fahrenheit

    elif degrees_fahrenheit is not None:
        degrees_celsius = (degrees_fahrenheit - 32) * 5/9
        print(f"{degrees_fahrenheit} °F are {degrees_celsius} °C")
        return degrees_celsius

    else:
        # Neither value was given, so there is nothing to convert
        raise ValueError("Provide degrees_celsius or degrees_fahrenheit.")


In [None]:
deg_F = convert_temp(degrees_celsius = 30)
deg_F

In [None]:
deg_C = convert_temp(degrees_fahrenheit = 23)
deg_C

In [4]:
# Scope: inside the function, the parameter `temperature` shadows the global variable
temperature = 30
def to_reach_100(temperature):
    return 100 - temperature
to_reach_100(40)

60
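
The cell above returns `60` rather than `70` because the function works with its own parameter, not with the global `temperature`. The sketch below makes the same point explicit; the function `warm_up` is a hypothetical addition for illustration.

```python
temperature = 30  # global variable


def to_reach_100(temperature):
    # This `temperature` is the parameter; it hides the global one
    return 100 - temperature


def warm_up():
    # Assigning inside a function creates a new local variable
    temperature = 99
    return temperature


print(to_reach_100(40))  # 60
print(warm_up())         # 99
print(temperature)       # 30 (the global variable is unchanged)
```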

In [3]:
nested_list = [[1,2,3],[4,5,6]]
copy_of_list = nested_list
nested_list[0][0] = 0
print(copy_of_list)

[[0, 2, 3], [4, 5, 6]]


In [5]:
import copy

In [6]:
copy_of_list = copy.deepcopy(nested_list)
nested_list[0][0] = 100
print(copy_of_list)

[[0, 2, 3], [4, 5, 6]]


In [7]:
print(nested_list)

[[100, 2, 3], [4, 5, 6]]
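
Between plain assignment (which only creates a second name for the same list) and `copy.deepcopy()` there is a third option, the shallow copy made by `copy.copy()`. The sketch below compares all three; the variable names are chosen for illustration.

```python
import copy

original = [[1, 2, 3], [4, 5, 6]]

alias = original                # same object under a second name
shallow = copy.copy(original)   # new outer list, but the inner lists are shared
deep = copy.deepcopy(original)  # fully independent copy

original[0][0] = 0              # change an inner list
original.append([7, 8, 9])      # change the outer list

print(alias)    # [[0, 2, 3], [4, 5, 6], [7, 8, 9]]
print(shallow)  # [[0, 2, 3], [4, 5, 6]]
print(deep)     # [[1, 2, 3], [4, 5, 6]]
```

The shallow copy sees the change to the inner list but not the appended row, while the deep copy is unaffected by both.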


## Modules
Modules are files containing Python code, which can define functions, classes, and variables. They allow you to organize your code into reusable components. You can use modules in your programs to access their functionality.

Python comes with a variety of built-in modules that provide a wide range of functionality.

To use installed modules, make sure to `import` them.

In [None]:
import math
num = 25
print(math.sqrt(num))

Fully qualified names and directory structure

In [None]:
# Example of using the random module
import random

# Generate a random number between 1 and 10
random_number = random.randint(1, 10)
print("Random Number:", random_number)

Import specific symbols from a module into the local namespace

In [None]:
from math import ceil, floor # import only the names we need from the module
print(ceil(3.7))  # => 4
print(floor(3.7)) # => 3

In Python, you can use the `as` keyword to bind module symbols to new symbols in the local namespace. This is particularly useful when you want to use shorter or more convenient names for symbols from imported modules.
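
A minimal sketch of aliasing with `as`, using only the standard library. The alias names `m` and `fact` are arbitrary choices for illustration.

```python
import math as m                     # alias for a whole module
from math import factorial as fact   # alias for a single function

root = m.sqrt(25)
product = fact(5)

print(root)     # 5.0
print(product)  # 120
```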

In [1]:
import time

In [6]:
start_time = time.time()
for i in range(10_000_000):
    pass
end_time = time.time()

elapsed_time = end_time - start_time

print(f"The code ran for {elapsed_time:.6f} seconds.")


The code ran for 0.500741 seconds.


In [None]:
import numpy as np

arr1 = np.array([[1,2],[3,4]], dtype=np.float64)
arr2 = np.array([[5,6],[7,8]], dtype=np.float64)

print("arr1:")
print(arr1)
print()
print("arr2:")
print(arr2)
print()
# Element-wise addition
print(arr1 + arr2)

## Classes

In [18]:
# Parent class
class Animal:
    def __init__(self, name: str, sound: str):
        self.name = name
        self.sound = sound

    def make_sound(self):
        print(f"{self.name} makes a sound: {self.sound}")

    def call_other_animal(self, other_animal):
        print(f"{self.name} is calling {other_animal.name}: {self.sound}!")
        other_animal.respond_to_call(self)

    def respond_to_call(self, caller):
        print(f"{self.name} responds to {caller.name}: {self.sound}!")


# Subclass Cat
class Cat(Animal):
    def __init__(self, name: str, sound: str = "Meow"):
        super().__init__(name, sound)

    def climb(self):
        print(f"{self.name} is climbing a tree.")

    def call_another_cat(self, other_cat):
        if isinstance(other_cat, Cat):
            print(f"{self.name} is calling {other_cat.name}: {self.sound}!")
            other_cat.respond_to_call(self)
        else:
            # potentially throw errors here
            print(f"{self.name} can only call another cat, not a {type(other_cat).__name__}.")


# Subclass Dog
class Dog(Animal):
    def __init__(self, name: str, sound: str = "Bark"):
        super().__init__(name, sound)

    def fetch(self):
        print(f"{self.name} is fetching the ball.")


# Subclass PersianCat (inherits from Cat)
class PersianCat(Cat):
    def __init__(self, name: str):
        super().__init__(name, sound="Soft Meow")

    def groom(self):
        print(f"{self.name} is being groomed. Persian cats love grooming!")


# Example usage:
# Creating instances
generic_animal = Animal("Generic Animal", "Some sound")
cat1 = Cat("Whiskers")
cat2 = PersianCat("Snowball")
dog = Dog("Buddy")

# General animal-to-animal communication
generic_animal.call_other_animal(cat1)  # Generic Animal calls Whiskers
cat1.call_other_animal(dog)  # Whiskers calls Buddy
dog.call_other_animal(cat2)  # Buddy calls Snowball

# Cat-specific communication
cat1.call_another_cat(cat2)  # Whiskers calls Snowball
cat1.call_another_cat(dog)  # Whiskers tries to call Buddy but fails
cat2.call_another_cat(cat1)  # Snowball calls Whiskers

# Testing individual behaviors
cat1.climb()
dog.fetch()
cat2.groom()


Generic Animal is calling Whiskers: Some sound!
Whiskers responds to Generic Animal: Meow!
Whiskers is calling Buddy: Meow!
Buddy responds to Whiskers: Bark!
Buddy is calling Snowball: Bark!
Snowball responds to Buddy: Soft Meow!
Whiskers is calling Snowball: Meow!
Snowball responds to Whiskers: Soft Meow!
Whiskers can only call another cat, not a Dog.
Snowball is calling Whiskers: Soft Meow!
Whiskers responds to Snowball: Meow!
Whiskers is climbing a tree.
Buddy is fetching the ball.
Snowball is being groomed. Persian cats love grooming!
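
The inheritance chain can also be inspected directly. The sketch below uses deliberately reduced versions of the classes (the `describe` method is a hypothetical addition) to show `super()`, `isinstance()`, `issubclass()` and the method resolution order.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name} is an animal."


class Cat(Animal):
    def describe(self):
        # Extend the parent's behaviour instead of replacing it
        return super().describe() + " More precisely, a cat."


class PersianCat(Cat):
    pass  # inherits everything from Cat


snowball = PersianCat("Snowball")
print(snowball.describe())  # Snowball is an animal. More precisely, a cat.

# An instance of a subclass is also an instance of every parent class
print(isinstance(snowball, Cat))     # True
print(isinstance(snowball, Animal))  # True
print(issubclass(Cat, PersianCat))   # False

# Order in which Python searches the classes for a method
mro_names = [cls.__name__ for cls in PersianCat.__mro__]
print(mro_names)  # ['PersianCat', 'Cat', 'Animal', 'object']
```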


In [1]:
# add package code here

In [16]:
class BankAccount:
    def __init__(self, owner: str, balance: float = 0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float):
        if amount > 0:
            self.balance += amount
            print(f"Deposited ${amount:.2f}. New balance is ${self.balance:.2f}.")
        else:
            print("Deposit amount must be positive.")

    def withdrawal(self, amount: float):
        if amount > 0 and amount <= self.balance:
            self.balance -= amount
            print(f"Withdrew ${amount:.2f}. New balance is ${self.balance:.2f}.")
        elif amount > self.balance:
            print("Insufficient funds for this withdrawal.")
        else:
            print("Withdrawal amount must be positive.")


class CheckingAccount(BankAccount):
    def __init__(self, owner: str, balance: float = 0.0, insufficientFundsFee: float = 25.0):
        super().__init__(owner, balance)
        self.insufficientFundsFee = insufficientFundsFee

    def processCheck(self, checkToProcess: float):
        print(f"Processing check for ${checkToProcess:.2f}.")
        self.withdrawal(checkToProcess)

    def withdrawal(self, amount: float):
        if amount > 0:
            if amount > self.balance:
                print("Insufficient funds. Charging insufficient funds fee.")
                self.balance -= self.insufficientFundsFee
                print(f"Charged fee of ${self.insufficientFundsFee:.2f}. New balance is ${self.balance:.2f}.")
            else:
                super().withdrawal(amount)
        else:
            print("Withdrawal amount must be positive.")


class SavingsAccount(BankAccount):
    def __init__(self, owner: str, balance: float = 0.0, annualInterestRate: float = 0.02):
        super().__init__(owner, balance)
        self.annualInterestRate = annualInterestRate

    def depositMonthlyInterest(self):
        monthlyInterest = self.balance * (self.annualInterestRate / 12)
        self.deposit(monthlyInterest)
        print(f"Deposited monthly interest of ${monthlyInterest:.2f}. New balance is ${self.balance:.2f}.")

    def withdrawal(self, amount: float):
        if amount > 0:
            super().withdrawal(amount)
        else:
            print("Withdrawal amount must be positive.")


# Example usage:
# Checking account example
checking_account = CheckingAccount("Alice", 100.0)
checking_account.deposit(50.0)
checking_account.withdrawal(200.0)
checking_account.processCheck(30.0)

# Savings account example
savings_account = SavingsAccount("Bob", 1000.0, 0.05)
savings_account.deposit(500.0)
savings_account.depositMonthlyInterest()
savings_account.withdrawal(200.0)


Deposited $50.00. New balance is $150.00.
Insufficient funds. Charging insufficient funds fee.
Charged fee of $25.00. New balance is $125.00.
Processing check for $30.00.
Withdrew $30.00. New balance is $95.00.
Deposited $500.00. New balance is $1500.00.
Deposited $6.25. New balance is $1506.25.
Deposited monthly interest of $6.25. New balance is $1506.25.
Withdrew $200.00. New balance is $1306.25.


## Data Manipulation and Management: NumPy and Pandas

`NumPy` and `Pandas` are powerful libraries for data management and manipulation in Python. They provide essential tools for handling data, performing mathematical operations, and conducting data analysis.

NumPy focuses on arrays and numerical operations, while Pandas is designed for data manipulation and analysis, particularly with tabular data.


### NumPy
NumPy arrays are great alternatives to Python lists. Some of their key advantages are that they are fast, easy to work with, and let users perform calculations across entire arrays.

In [9]:
import numpy as np

height = [1.87,  1.87, 1.82, 1.91, 1.90, 1.85]
weight = [81.65, 97.52, 95.25, 92.98, 86.18, 88.45]

np_height = np.array(height)
np_weight = np.array(weight)

print(type(height))
print(type(np_height))

<class 'list'>
<class 'numpy.ndarray'>


In NumPy, an *array* is a generic term for a multidimensional set of numbers.

#### Element-wise calculations
These operations are very fast and computationally efficient. They are particularly helpful when your data contains thousands of observations.

In [None]:
# Calculate bmi
bmi = np_weight / np_height ** 2
bmi

#### Subsetting
Another great feature of Numpy arrays is the ability to subset.

In [None]:
# Boolean mask: True where bmi is above 25
print(bmi > 25)

In [None]:
# Print only those observations above 23
bmi[bmi > 23]
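
Several conditions can be combined into one mask with `&` (and) or `|` (or); each condition needs its own parentheses. The sketch below uses illustrative BMI values and the hypothetical name `in_range`.

```python
import numpy as np

# Illustrative values only
bmi = np.array([23.3, 27.9, 28.8, 25.5, 23.9, 25.8])

# True where both conditions hold
in_range = (bmi > 23) & (bmi < 26)

print(bmi[in_range])   # the values between 23 and 26
print(in_range.sum())  # number of matches (True counts as 1)
```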

#### Other Operations

In [None]:
matrix = np.array( [ [1, 2, 3], [4, 5, 6], [7, 8, 9] ] )
print("array:\n", matrix)
print("shape:", matrix.shape)
print("number of dimensions:", matrix.ndim)

In [None]:
np.zeros( (2,3) )

In [None]:
np.ones( (2,3) )

In [None]:
eye = np.eye( 3 )
print(eye)

In [None]:
matrix

In [None]:
matrix * 2

In [None]:
elementwise_product = matrix * matrix
print(elementwise_product)

In [None]:
matrix_multiplication = matrix @ matrix
print(matrix_multiplication)

In [None]:
matrix

In [None]:
matrix.transpose()

In [None]:
matrix.min()

In [None]:
matrix.max()

In [None]:
matrix.max(axis=0)

In [None]:
matrix.max(axis=1)

In [None]:
matrix.sum()

In [None]:
matrix = np.array([[1, 2, 3],
                  [4, 5, 6]])

# Sum along axis 0 (down the rows): one total per column
sum_axis_0 = matrix.sum(axis=0)
print(f"Column sums (axis=0): {sum_axis_0}")

# Sum along axis 1 (across the columns): one total per row
sum_axis_1 = matrix.sum(axis=1)
print(f"Row sums (axis=1): {sum_axis_1}")

overall_sum = matrix.sum()
print(f"Overall sum: {overall_sum}")

In [None]:
x = np.array([1, 2, 3, 4, 5, 6])
print('beginning x:\n', x)
x_reshape = x.reshape((2, 3))
print('reshaped x:\n', x_reshape)

Generating random data with NumPy

The `np.random.normal()` function generates a vector of random normal variables.

In [None]:
np.random.normal?

By default, this function generates random normal variable(s) with mean (`loc`) 0 and standard deviation (`scale`) 1; furthermore, a single random variable is generated unless the `size` argument is changed.

We now generate 50 independent random variables from a $N(0,1)$ distribution.

In [None]:
x = np.random.normal(size=50)
x

Each time we call `np.random.normal()`, we will get a different answer, as shown in the following example.

In [None]:
print(np.random.normal(scale=5, size=2))
print(np.random.normal(scale=5, size=2))

To ensure that our code produces exactly the same results each time it is run, we can set a *random seed* using the `np.random.default_rng()` function. This function takes an arbitrary, user-specified integer argument and returns a random number generator object, here called `rng`. If we create `rng` with the same seed before generating random data, then re-running our code will yield the same results. The object `rng` has essentially all the random number generating methods found in `np.random`. Hence, to generate normal data we use `rng.normal()`.

In [None]:
np.random.default_rng?

In [None]:
rng = np.random.default_rng(42)
print(rng.normal(scale=5, size=2))
rng2 = np.random.default_rng(42)
print(rng2.normal(scale=5, size=2))

In [None]:
rng = np.random.default_rng(42)
X = rng.standard_normal((5, 3))
X

In [None]:
X.mean(axis=0)

In [None]:
X.var(axis=1)

### Pandas

Pandas is another Python library which offers a powerful way to work with more efficient data structures and allows for advanced data manipulation and analysis.

The main data structure that Pandas works with is called a **Data Frame**. This is a two-dimensional table of data in which the rows typically represent cases, and the columns represent variables. Pandas also has a one-dimensional data structure called a **Series** that we will encounter when accessing a single column of a Data Frame.

In [None]:
country_dict = {"country": ["Brazil", "Russia", "India", "China", "South Africa"],
       "capital": ["Brasilia", "Moscow", "New Delhi", "Beijing", "Pretoria"],
       "area": [8.516, 17.10, 3.286, 9.597, 1.221],
       "population": [200.4, 143.5, 1252, 1357, 52.98] }

import pandas as pd
brics = pd.DataFrame(country_dict)
brics

In [None]:
brics.columns

In [None]:
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_df = pd.DataFrame(matrix, columns = ['col1','col2','col3'],
                         index = ['row1','row2','row3'])

matrix_df

Let's say we would like to slice our data frame and select only specific portions of our data.

In [None]:
brics

#### Accessing Columns
To access a specific column in a Pandas DataFrame, you can use indexing with square brackets or dot notation.

In [None]:
# Using square brackets to access a column
brics_countries = brics['country']
brics_countries

In [None]:
print(type(brics))
print(type(brics_countries))

In [None]:
# Using dot notation to access a column
brics_capitals = brics.capital
brics_capitals

#### Accessing Rows

You can use the `.loc[]` or `.iloc[]` indexer to access specific rows by their labels or integer positions, respectively.

In [None]:
matrix_df

In [None]:
# Using .loc[] to access a row by label - label-based indexing.
row1 = matrix_df.loc['row1']
print(row1)
print()
# Using .iloc[] to access a row by integer position - positional indexing.
row2 = matrix_df.iloc[1]
print(row2)

#### Accessing Specific Cells
To access a specific cell, you can combine row and column selection using `.loc[]` or `.iloc[]`.

In [None]:
matrix_df

In [None]:
# Using .loc[] to access a specific cell by label
cell1 = matrix_df.loc['row1', 'col1']
cell1

In [None]:
# Using .iloc[] to access a specific cell by integer position
cell1 = matrix_df.iloc[0, 0]
cell1

#### Filtering Data
You can filter the DataFrame based on specific conditions using boolean indexing.

In [None]:
brics

In [None]:
# Filtering rows where population is greater than 200
brics[brics['population'] > 200.0]
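
Conditions can be combined, and `.loc[]` can filter rows and select a column in one step. The sketch below rebuilds a reduced version of the `brics` table so that it runs on its own; the names `large_and_populous` and `small_population` are chosen for illustration.

```python
import pandas as pd

brics = pd.DataFrame({
    "country": ["Brazil", "Russia", "India", "China", "South Africa"],
    "area": [8.516, 17.10, 3.286, 9.597, 1.221],
    "population": [200.4, 143.5, 1252, 1357, 52.98],
})

# Combine conditions with & (and) or | (or); each needs its own parentheses
large_and_populous = brics[(brics["area"] > 8) & (brics["population"] > 200)]
print(large_and_populous["country"].tolist())  # ['Brazil', 'China']

# .loc[] takes a row condition and a column label
small_population = brics.loc[brics["population"] < 150, "country"]
print(small_population.tolist())               # ['Russia', 'South Africa']
```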

Pandas has a variety of functions named `read_xxx` for reading data in different formats. Right now we will focus on reading `csv` files; CSV stands for comma-separated values.

In [None]:
# Dataset source: www.statlearning.com/resources-python
Auto = pd.read_csv('Auto.csv')
# The head() method shows the first 5 rows (by default) of our Data Frame.
Auto.head()

In [None]:
Auto.shape

In [None]:
Auto.dtypes

In [None]:
Auto['horsepower']

We see that the `dtype` of this column is `object`.
 - All values of the `horsepower` column are being interpreted as strings when reading in the data.

We can find out why by looking at the unique values.

In [None]:
Auto['horsepower'].unique()

To fix the problem, we pass `pd.read_csv()` an argument called `na_values`. Each instance of `?` in the file is then replaced with the value `np.nan`, which means *not a number*:

In [None]:
Auto = pd.read_csv('Auto.csv',
                   na_values=['?'])
Auto['horsepower'].sum()

Dealing with missing values

In [None]:
Auto.isna().sum(axis=0)

There are various ways to deal with missing data. For now, since only five of the rows contain missing observations, we'll simply remove these rows.

In [None]:
Auto.shape

In [None]:
Auto_new = Auto.dropna()
Auto_new.shape

In [None]:
Auto_new.isna().sum(axis=0)
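
Dropping rows is only one of the options mentioned above. As a hedged illustration of an alternative, the sketch below fills a missing value with the column median instead. The `cars` table is hypothetical, and median imputation is an assumption chosen for the example, not the approach used in this notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing horsepower value
cars = pd.DataFrame(
    {'mpg': [18.0, 25.0, 31.0], 'horsepower': [130.0, np.nan, 95.0]}
)

# Option 1: remove every row that contains a missing value
dropped = cars.dropna()

# Option 2: keep all rows and replace the missing value with the column median
filled = cars.fillna({'horsepower': cars['horsepower'].median()})

print(dropped.shape)                  # (2, 2)
print(filled['horsepower'].tolist())  # [130.0, 112.5, 95.0]
```

Dropping is simple and safe when few rows are affected; filling keeps every row but introduces values that were never observed.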

In [None]:
Auto_new = Auto_new.set_index('name')
Auto_new


In [None]:
stats = Auto_new.describe()
stats

#### Sorting the Dataset by One or More Columns

In [None]:
Auto_new.sort_values(by=['horsepower', 'acceleration'], ascending=False).head()
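
When sorting by several columns, `ascending` also accepts a list with one entry per column. A minimal sketch on a hypothetical `cars` table:

```python
import pandas as pd

# Hypothetical data; two cars share the same horsepower
cars = pd.DataFrame(
    {
        'name': ['a', 'b', 'c', 'd'],
        'horsepower': [100, 150, 150, 90],
        'acceleration': [12.0, 10.5, 9.0, 15.0],
    }
)

# Highest horsepower first; ties are broken by the lowest acceleration value
ranked = cars.sort_values(by=['horsepower', 'acceleration'], ascending=[False, True])

print(ranked['name'].tolist())  # ['c', 'b', 'a', 'd']
```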

## Data Visualization

### Matplotlib
Matplotlib is a powerful library for creating a wide range of visualizations, including line plots, bar charts, scatter plots, histograms, and more.

In [10]:
import matplotlib.pyplot as plt

In [None]:
# instructs IPython to display Matplotlib plots directly within the notebook
%matplotlib inline

In [None]:
# use fivethirtyeight style
plt.style.use('fivethirtyeight')

In [None]:
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 20]
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Simple Line Plot')

In [None]:
x = [1,2,3,4,5,6,7,8]
y = [1,4,9,16,25,36,49,64]
z = [-64,-27,-8,-1,0,1,8,27]

plt.plot(x, y, label = r'$x^2$', marker = 'o', color='black')
plt.plot(x, z, label = r'$(x-5)^3$', marker = 'X', color='red')

plt.xlabel('x values')
plt.ylabel('y values')

plt.title('Example Title')

plt.legend()

plt.grid()

In [None]:
categories = ['A', 'B', 'C', 'D', 'E']
values = [30, 45, 55, 20, 60]
plt.bar(categories, values)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart')
plt.show()

### Exporting Plots from matplotlib
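
A figure can be written to disk with `savefig`, which infers the file format from the file extension. The sketch below is one possible way to do it; the file name `simple_line_plot.png` and the use of the system temporary directory are assumptions made for the example.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [10, 15, 13, 18, 20])
ax.set_title('Simple Line Plot')

# Hypothetical output location; replace it with any path you like
out_path = os.path.join(tempfile.gettempdir(), 'simple_line_plot.png')

# dpi sets the resolution; bbox_inches='tight' trims surplus whitespace
fig.savefig(out_path, dpi=150, bbox_inches='tight')
plt.close(fig)  # release the figure once it has been saved
```

Changing the extension to `.pdf` or `.svg` produces a vector graphic instead of a bitmap.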

### Creating Multiplots on Same Canvas

In [None]:
# plt.subplot(nrows, ncols, plot_number)
# creates a single subplot at a specific location within a grid
import numpy as np
plt.subplot(1,2,1)
x = np.linspace(0, 5, 11)
y = x ** 2
plt.plot(x, y, 'r--')
plt.subplot(1,2,2)
plt.plot(y, x, 'g*-')

In [None]:
# plt.subplots() works like plt.figure(), but returns a (fig, axes) tuple that we unpack
fig, axes = plt.subplots()
x = np.linspace(0, 5, 11)
y = x ** 2
z = x ** 3
m = x ** 4
# Now use the axes object to add stuff to plot
axes.plot(x, y, 'r')
axes.plot(x, z, 'b')
axes.plot(x, m, 'g')
axes.set_xlabel('x-axis')
axes.set_ylabel('y-axis')
axes.set_title('title');
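
`plt.subplots()` can also create a whole grid of panels in one call. A minimal sketch, assuming a layout of one row and two columns; the titles and figure size are arbitrary choices for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 11)

# One row, two columns: axes is an array holding one Axes object per panel
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot(x, x ** 2, 'r--')
axes[0].set_title('x squared')

axes[1].plot(x, x ** 3, 'g*-')
axes[1].set_title('x cubed')

fig.tight_layout()  # adjust spacing so labels of neighbouring panels do not overlap
```

Compared with repeated `plt.subplot()` calls, each panel is addressed explicitly through its own `Axes` object, which makes it clearer which plot a command modifies.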

In [None]:
np.random.randint?

In [None]:
# Generate random data
x = np.random.randint(1, 100, 20)
y = np.random.randint(50, 100, 20)

# Create a scatter plot with customizations
plt.scatter(x, y, c='blue', marker='o', edgecolors='black', s=100, alpha=0.7)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')

In [None]:
rand_data = np.random.randn(50)
plt.hist(rand_data, bins=20);

In [None]:
data = [np.random.normal(0, std, 100) for std in range(1, 4)]

# rectangular box plot
plt.boxplot(data,vert=True,patch_artist=True);

### Using Pandas Built-in Plotting Functionality

In [None]:
Auto.plot.scatter?

In [None]:
ax = Auto.plot.scatter('horsepower', 'mpg')
ax.set_title('Horsepower vs. MPG');

In [None]:
Auto.boxplot?

In [None]:
fig, ax = plt.subplots(figsize=(8,8))
Auto.boxplot('mpg', by='cylinders', ax=ax);

In [None]:
fig, ax = plt.subplots(figsize=(8, 8))
Auto.hist('mpg', ax=ax);

# Simple Linear Regression

In [3]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, GammaRegressor
from sklearn.metrics import mean_squared_error, r2_score

# Generate synthetic data: y = 4 + 3x + Gaussian noise
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a Linear Regression model and, for comparison, a Gamma regressor
model = LinearRegression()
model1 = GammaRegressor()

# Train the models on the training data
# (GammaRegressor expects a 1-D, strictly positive target, hence ravel())
model.fit(X_train, y_train)
model1.fit(X_train, y_train.ravel())

# Make predictions on the test data
y_pred = model.predict(X_test)
y_pred1 = model1.predict(X_test)

# Evaluate both models
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

mse1 = mean_squared_error(y_test, y_pred1)
r21 = r2_score(y_test, y_pred1)

print(f"Linear Regression - Mean Squared Error: {mse:.2f}")
print(f"Linear Regression - R² Score: {r2:.2f}")
print(f"Gamma Regressor - Mean Squared Error: {mse1:.2f}")
print(f"Gamma Regressor - R² Score: {r21:.2f}")
print(f"Model Coefficients: {model.coef_[0][0]:.2f}")
print(f"Model Intercept: {model.intercept_[0]:.2f}")

# Plot the results; sort the test points so the lines are drawn from left to right
order = X_test[:, 0].argsort()
plt.scatter(X, y, color="blue", label="Data")
plt.plot(X_test[order], y_pred[order], color="red", label="Prediction (Linear)")
plt.plot(X_test[order], y_pred1[order], color="orange", label="Prediction (Gamma)")
plt.xlabel("Independent Variable (X)")
plt.ylabel("Dependent Variable (y)")
plt.legend()
plt.title("Simple Linear Regression")
plt.show()

# References and Further Readings (Resources)

* [Gipp Lab's Python Course](https://gipplab.atlassian.net/wiki/spaces/STUD/pages/1682964532/Python+Course)
* [An Introduction to Statistical Learning (Python)](https://intro-stat-learning.github.io/ISLP/labs/Ch02-statlearn-lab.html)
* https://www.pythontutorial.net/
* http://www.matplotlib.org - The project web page for matplotlib.
* https://github.com/matplotlib/matplotlib - The source code for matplotlib.
* http://matplotlib.org/gallery.html - A large gallery showcasing various types of plots matplotlib can create. Highly recommended!
* http://www.loria.fr/~rougier/teaching/matplotlib - A good matplotlib tutorial.
* https://github.com/samarinm/pythonCC/tree/main
* http://scipy-lectures.github.io/matplotlib/matplotlib.html - Another good matplotlib reference.
* [Seaborn](https://seaborn.pydata.org/tutorial.html)
* [Kaggle courses on Pandas and Data Visualization](https://www.kaggle.com/learn)
