# Lab 1
## Data Structures & Algorithms
### Thursday, 8 February 2024

## Today

* [Some preliminary stuff for the DS&A labs.](#prelim)
* [Helpful tools](#tools)
* [Setting up a new project...](#setup)
* [Good coding practice :)](#goodcoding)
* [Before you get started](#getstarted) (optional)
* [Let's get coding!](#exercises)

## Some preliminary stuff. <a class="anchor" id="prelim"></a>

### How will these labs be structured?

90 minutes in total, divided into

1. Intro & refresher
2. Exercises (~1h)
3. Any other coding-related work & questions, **if there is time**. Otherwise, my office hours are **Thursdays 2-3pm** in room **F180-2.31**

### Lab 'rules'

There are no stupid questions!

Try it yourself before asking others, googling, searching Stack Overflow, or asking ChatGPT!

Enjoy it - coding is fun!

### Access lab resources and exercises
You will find the Jupyter notebooks for the labs at [https://github.com/lenafm/data_structures_and_algorithms_2024](https://github.com/lenafm/data_structures_and_algorithms_2024).

To access the Jupyter notebooks, the easiest thing is to `git clone` the repository, by running the following in your command line:

```
git clone https://github.com/lenafm/data_structures_and_algorithms_2024.git
```

In the [Before you get started](#getstarted) section, you will find a step-by-step guide on how to run a Jupyter notebook.

## Helpful tools <a class="anchor" id="tools"></a>

Here is a list of tools and resources that can help you along the way. Use these to go through the steps in the [Before you get started](#getstarted) section.

* **How to work with the command line?** It's useful for anyone working with data and code to know how to use the command line (aka terminal/shell); this [article](https://www.dataquest.io/blog/why-learn-the-command-line/) explains why. [Here](https://tutorial.djangogirls.org/en/intro_to_command_line/) is an introduction to the most important commands on different operating systems. Using [TMUX](https://deliciousbrains.com/tmux-for-local-development/#what-is-tmux) will make using the command line much easier!
* **How to install Python and keep track of dependencies?** I highly recommend using a virtual environment, ideally with [miniconda](https://docs.anaconda.com/free/miniconda/#quick-command-line-install)! Miniconda [CHEATSHEET](https://docs.conda.io/projects/conda/en/latest/_downloads/843d9e0198f2a193a3484886fa28163c/conda-cheatsheet.pdf)
* **Where/how to write your code?** Choose an integrated development environment (IDE) or code editor (I recommend [PyCharm](https://www.dataquest.io/blog/how-to-set-up-pycharm-community-edition/)) and install [Jupyter Notebooks](https://realpython.com/jupyter-notebook-introduction/) for code experimentation (and to run the DS&A labs), and use Jupyter [keyboard shortcuts](https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330).
* **How to keep track of changes?** [Download and install Git](https://www.atlassian.com/git/tutorials/install-git)! Git [CHEATSHEET](https://education.github.com/git-cheat-sheet-education.pdf)
* **How to collaborate?** Sign up to GitHub.
* **How to write basic syntax in Python?** Look at this [CHEATSHEET](https://www.pythoncheatsheet.org/cheatsheet/basics) or use StackOverflow.

## Setting up a new project <a class="anchor" id="setup"></a>

Here are some best-practice steps you can take *every time* you start a new project (including a 'project' for what we'll be doing in DS&A labs), to keep your code organised, sharable, and up-to-date. The [Before you get started](#getstarted) section below goes through these steps one by one.

1. Create a new directory for each project!
2. Set up a virtual environment with your preferred Python version using conda.
3. Install jupyter into this environment.
4. Set up a new project in PyCharm, selecting your preferred Python installation as the interpreter (ideally the one you just created using conda).
5. Set up git (create .gitignore, initialise, first commit, etc.).
6. Start jupyter within your environment by running the command `jupyter notebook`.
7. START CODING!

## Refresher: some best practice for coding :) <a class="anchor" id="goodcoding"></a>

The course textbooks are a great resource and there are many blog posts on best-practice for coding. Here's a very top-line summary:

* Think carefully about naming conventions for variables, classes and functions
* Write good documentation and comments
* Make sure your code is reusable and scalable
    * Once you find yourself repeating a bit of code, write a function for it.
    * High cohesion (code *within* modules/classes/functions should be closely related).
    * Low coupling (code in *different* modules/classes/functions should depend on each other as little as possible).
* Test your code (the smaller the units you test, the easier you will make your life in the future).
* Track your changes; remember to use version control so that your collaborators and future self understand what you have changed/added!
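
As a rough illustration of the points above, here is a minimal sketch of a small, documented, reusable function together with a unit test. The function `mean_of_values` and its test are hypothetical examples and not part of the lab material.

```python
def mean_of_values(values):
    """
    Return the arithmetic mean of a non-empty sequence of numbers.

    Parameters
    ----------
    values : a non-empty list (or tuple) of numeric values
    """
    if len(values) == 0:
        raise ValueError('values must not be empty')
    return sum(values) / len(values)


def test_mean_of_values():
    # Small unit test: one clear expectation per assertion
    assert mean_of_values([1, 2, 3]) == 2
    assert mean_of_values([2.5]) == 2.5


test_mean_of_values()
```

The descriptive name and docstring cover the first two points, the function does exactly one thing (high cohesion), and the test checks a unit small enough that a failure points directly at the cause.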

## Before you get started <a class="anchor" id="getstarted"></a>

If you have been coding in Python for a while and already have your preferred set-up, you can skip this section and go on to the coding exercises. Otherwise, go through the following steps to create a handy and easily reproducible coding environment, before you dive into the coding exercises. Doing this once will very likely make your coding experience in the future much easier, even if it seems like a hassle in the beginning. **Use the resources and links listed in the [Helpful tools](#tools) section.**

### 1. Familiarise yourself with the command line
Being familiar with the command line, and being used to working with it, will help you set up and navigate data processing pipelines, work with data that is stored remotely (i.e. not on your local computer), switch between different programmes, and deploy web apps, to name just a few uses. If you have never used the command line before, open the command line introduction and familiarise yourself with the commands to
1. print the current directory
2. move between directories (aka folders)
3. print a list of files and subdirectories within the current directory
4. create new directories
5. remove files and folders
6. (optional and slightly more advanced) run multiple terminal (command line) windows in parallel; TMUX (terminal multiplexer) is a really helpful tool for this.

### 2. Set up a tool for virtual environments
Virtual environments are a great way to separate your package dependencies and even your Python versions. I highly recommend miniconda (called 'conda'), since it is not limited to Python; you can even set up an R environment with conda. Alternatives are virtualenv and pipenv. If you already work with virtual environments, you can skip this step. Otherwise:
1. install miniconda; now that you have some basic command line skills, it's probably easiest to use the Quick Command Line Install (just choose your operating system and follow the instructions)
2. with the help of the cheatsheet: create a new virtual environment in which you install Python version 3.11, give it the name `dsa` (to do this, run `conda create -n dsa python=3.11`)
3. double check you have successfully installed the environment by running the conda command that lists all environments
4. activate the new environment that you created
5. list the packages that are installed within this environment
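
Once the environment is activated, one way to double check from within Python (for example in a notebook cell) which interpreter is actually running is sketched below. It only uses the standard library; the helper name `describe_interpreter` is made up for this illustration.

```python
import sys


def describe_interpreter():
    """Return the path and version of the Python interpreter that is running."""
    version = f'{sys.version_info.major}.{sys.version_info.minor}'
    return sys.executable, version


path, version = describe_interpreter()
print(f'Python {version} running from {path}')
```

If the `dsa` environment is active, the printed path would typically contain the environment name and the version should match the one you requested when creating it.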


### 3. Create a new git repo (or clone an existing one)
Using version control with git will make your collaborators' and your future self's life *a lot* easier. It helps you track your own changes on a project and collaborate effectively with your team. Generally, this part of the workflow consists of creating a new repo or cloning an existing one from GitHub. The steps are:
1. install git (run `git --version` to check if you already have it installed)
2. to clone an existing repository, such as the one for these labs, run `git clone https://github.com/lenafm/data_structures_and_algorithms_2024.git` and then move into the new directory that this creates; now create and switch to a new branch by running `git checkout -b lab1` (where `lab1` is the name of the branch you have created)

### 4. Set up Jupyter notebooks
Jupyter notebooks are a great way to experiment quickly with code, to present code, or to create data science workflows. The final functions and classes (and tests) that you write for a project should **not** sit in a Jupyter notebook. Those should be written in `.py` files (aka Python modules) or, even better, in Python packages. One best-practice tip: start your code experiments in a Jupyter notebook and, once you find yourself using some functions/classes repeatedly or think you might need them in other notebooks, migrate them into Python modules (and later to a package). To run this Jupyter notebook, follow these steps:
1. make sure the `dsa` environment you created in step 2 is activated
2. install jupyter into this environment using the `pip install jupyter` command (installing something *into* an environment just means that you need to run the installation command after having activated the environment)
3. now move into the directory for the DS&A repository that you cloned in step 3 (called `data_structures_and_algorithms_2024`).
4. run `jupyter notebook`; this will start your default browser (or open a new tab) and you should now see a folder structure that includes the Jupyter notebook file `lab1.ipynb`
5. click on `lab1.ipynb` to run it and familiarise yourself with the keyboard shortcuts for running cells, creating new cells (above and below the current cell), and removing cells.

### 5. Set up your IDE
After step 4, you can write code in Jupyter notebooks. For the type of coding necessary for larger (and collaborative) projects, you'll also need an IDE (integrated development environment), such as PyCharm. This is where you'll write classes, functions, helper functions, tests etc. I recommend PyCharm because it has many 'intelligent code' features such as code prediction, readability, error detection, easy refactoring, and because it provides debugging tools.
1. install the free PyCharm Community edition (there's also a PyCharm Professional which is not free, but you can register with them as a student to get the Professional version for free)
2. create a new PyCharm project for the DS&A labs: <br>
    i. on the `New Project` screen, set the `Location` path to the path to the `data_structures_and_algorithms_2024` directory <br>
    ii. change the radio button from `New environment using ...` to `Previously configured interpreter` <br>
    iii. in the `Interpreter` dropdown menu, choose the Python installation that is within the `dsa` conda environment you created above<br>
3. we will use this set-up in later labs!


## Let's get coding! <a class="anchor" id="exercises"></a>

### Exercise 1
Write a function that prints a string to the screen.

In [1]:
def print_string(x):
    """
    Print the input string

    Parameters
    ----------
    x : a string
    """

    # Implement me
    print(x)

In [2]:
print_string('Hi there!')

Hi there!


### Exercise 2
Write a function that takes a name of a person as an input and prints 'Hello ', followed by the name.

In [3]:
def greet_someone(x):
    """
    Print 'Hello ' followed by the input string

    Parameters
    ----------
    x : a string
    """

    # Implement me
    if not isinstance(x, str):
        raise TypeError('The input should be a string')
    x = x.capitalize()
    greeting_txt = 'Hello '
    if x in ['Margaret', 'Henry']:
        greeting_txt = 'Dearest '
    print(greeting_txt + x)

In [4]:
greet_someone('Quentin')

Hello Quentin


In [5]:
greet_someone('margaret')

Dearest Margaret


In [6]:
greet_someone(7)

TypeError: The input should be a string

EXTENSION

Try to modify the previous function in a few ways (one step at a time):

* ensure that the first letter of the name is capitalised when printed
* when the names that are passed are 'Margaret' or 'Henry', they should be greeted with 'Dearest ' instead of 'Hello '
* include some more code that checks if the input is a string; if it isn't a string, the function should return `None`. *Hint: Look up how to use the `isinstance()` function; think about where in the function this should go!*
* now rewrite this again, this time making the function raise an error if the input is not a string. *Hint: Look up how to raise a `TypeError` exception in Python.*
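
The solution above shows the final step (raising a `TypeError`). For the intermediate step, a minimal sketch of the variant that returns `None` instead could look as follows; the name `greet_someone_or_none` is made up here to keep it apart from the solution.

```python
def greet_someone_or_none(x):
    """
    Print a greeting for the name x; do nothing if x is not a string.

    Parameters
    ----------
    x : a string
    """
    # Check the type first, before calling any string methods on x
    if not isinstance(x, str):
        return None
    x = x.capitalize()
    greeting_txt = 'Hello '
    if x in ['Margaret', 'Henry']:
        greeting_txt = 'Dearest '
    print(greeting_txt + x)


greet_someone_or_none('henry')  # prints: Dearest Henry
greet_someone_or_none(7)        # prints nothing
```

The type check has to come before `x.capitalize()`, because calling a string method on a non-string would fail before the check is ever reached.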

### Exercise 3
Write a function that takes as an input two numeric values `a` and `b` and returns their sum.

In [7]:
def sum_two_values(a, b):
    """
    Sum up two numeric values

    Parameters
    ----------
    a : a numeric value
    b : a second numeric value
    """

    # Implement me
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return a+b
    else:
        raise TypeError('Both inputs need to be of type int or float')

In [8]:
sum_two_values(3,5)

8

In [9]:
sum_two_values('some text', 6)

TypeError: Both inputs need to be of type int or float

EXTENSION

Try to modify the previous function to:

* raise an error if one of the two values is NOT a number. *Hint: Check for the types `int` and `float`.*

### Exercise 4
Write a function that takes an integer x > 1 as an input and returns the sum of all integers 1 to x (including x).

In [10]:
def sum_integers(x):
    """
    Return the sum of integers 1 to (and including) x.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
    current_sum = 0
    for i in range(x+1):
        current_sum += i
    return current_sum

In [11]:
def sum_integers2(x):
    """
    Return the sum of integers 1 to (and including) x, using list comprehension.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
    integer_list = [i for i in range(x+1)]
    return sum(integer_list)

In [12]:
sum_integers(4)

10

In [13]:
sum_integers2(4)

10

EXTENSION

Write an alternative version of this function called `sum_integers2` using something called 'list comprehension' and the built-in `sum` function.
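
As a sanity check on either version, the sum 1 + 2 + ... + x also has the well-known closed form x(x+1)/2. The sketch below compares the closed form against an explicit sum; `sum_integers_closed_form` is a hypothetical helper name, not part of the lab.

```python
def sum_integers_closed_form(x):
    """Return 1 + 2 + ... + x using the closed form x * (x + 1) / 2."""
    # Integer division is exact here because x * (x + 1) is always even
    return x * (x + 1) // 2


# Compare against the explicit sum for a few inputs
for x in [2, 4, 10, 100]:
    assert sum_integers_closed_form(x) == sum(range(1, x + 1))

print(sum_integers_closed_form(4))  # 10
```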

### Exercise 5
Write a function that checks if an integer is a multiple of another integer.

In [14]:
def is_multiple(a, b):
    """
    Return True if integer a is a multiple of integer b.

    Parameters
    ----------
    a : an integer
    b : another integer
    """

    # Implement me
    return a % b == 0

In [15]:
is_multiple(4,3)

False

In [16]:
is_multiple(16,4)

True

### Exercise 6
Use the function written in Exercise 5 to write a function that checks if an integer is even.

In [17]:
def is_even(x):
    """
    Return True if integer x is even.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
    return is_multiple(x, 2)

In [18]:
is_even(5)

False

In [19]:
is_even(6)

True

### Exercise 7
Write a function to determine the sum of all the multiples of 3 or 5 below 1000. *Hint: You could use the function written in Exercise 4 and adapt it using the function written in Exercise 5.*

In [26]:
def calculate_sum_of_multiples(x):
    """
    Return the sum of multiples of 3 or 5 below the input value x.

    Parameters
    ----------
    x : an integer
    """

    # Implement me
    return sum(i for i in range(x) if is_multiple(i, 3) or is_multiple(i, 5))

In [27]:
calculate_sum_of_multiples(1000)

233168
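
One way to cross-check this result is inclusion-exclusion: add the multiples of 3 and the multiples of 5, then subtract the multiples of 15, which would otherwise be counted twice. The sketch below is an alternative for verification only; both function names are made up for this illustration.

```python
def sum_of_multiples_below(k, x):
    """Return the sum of all positive multiples of k that are below x."""
    return sum(range(k, x, k))


def sum_multiples_of_3_or_5(x):
    """Sum multiples of 3 or 5 below x, counting multiples of 15 only once."""
    return (sum_of_multiples_below(3, x)
            + sum_of_multiples_below(5, x)
            - sum_of_multiples_below(15, x))


print(sum_multiples_of_3_or_5(1000))  # 233168
```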

### Exercise 8
Determine the 20th Fibonacci number. For this, write a function to return the nth term of the Fibonacci sequence, with the first two terms being $x_0 = 0$ and $x_1 = 1$. The Fibonacci sequence continues by adding the previous two terms, so with these two starting values, the first few terms of the sequence are $0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...$.

In [28]:
def fibonacci(n):
    """
    Return the nth value in the fibonacci sequence.

    Parameters
    ----------
    n : an integer
    """

    # Implement me
    if n == 0:
        return 0
    if n == 1 or n == 2:
        return 1

    x_minus_1 = 1
    x = 1
    idx = 2
    while idx < n:
        temp = x
        x = x + x_minus_1
        x_minus_1 = temp
        idx = idx + 1

    return x

In [29]:
fibonacci(20)

6765

### Exercise 9
Determine the sum of the even terms in the Fibonacci sequence (below 4 million). *Hint: Write a function that sums even terms below some integer n. Can you use the functions you implemented for Exercises 6 and 8?*

In [30]:
def sum_even_fibonacci(x_max):
    """
    Sum even fibonacci numbers below x_max.

    Parameters
    ----------
    x_max : an integer
    """

    # Implement me
    current_sum = 0

    x_minus_1 = 1
    x = 1
    while x < x_max:
        # Only add x once we know it is below x_max
        if is_even(x):
            current_sum += x
        temp = x
        x = x + x_minus_1
        x_minus_1 = temp

    return current_sum

In [31]:
sum_even_fibonacci(4000000)

4613732
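
The hint asks whether the logic from Exercise 8 can be reused. One possible way to do so, sketched here as an alternative and not as the intended solution, is a generator that yields Fibonacci terms one at a time. The name `fibonacci_terms` is made up for this illustration; `islice` and `takewhile` come from the standard library module `itertools`.

```python
from itertools import islice, takewhile


def fibonacci_terms():
    """Yield the Fibonacci sequence 0, 1, 1, 2, 3, 5, ... without end."""
    x_prev, x = 0, 1
    while True:
        yield x_prev
        x_prev, x = x, x_prev + x


# First twelve terms, matching the sequence listed in Exercise 8
print(list(islice(fibonacci_terms(), 12)))
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

# Sum of the even terms strictly below 4 million
below_limit = takewhile(lambda term: term < 4_000_000, fibonacci_terms())
print(sum(term for term in below_limit if term % 2 == 0))  # 4613732
```

Because the generator only produces terms on demand, the same function serves both exercises: `islice` picks out a fixed number of terms, while `takewhile` stops at an upper bound.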