# Lecture 2 Notes

## The Science of Algorithms

Computer science can be seen as the study of **algorithms**.

An algorithm is a precise step-by-step description of how to solve a problem.

Before we dive into the details of programming, let's take a look at a couple of interesting algorithms.

## A Greatest Common Divisor Algorithm

Given two positive integers, like 51 and 90, a common question is: what is the greatest integer that divides both numbers? We call this the **greatest common divisor**, or **GCD**, of the numbers.

Try these questions:

- What's the GCD of 15 and 39?
- What's the GCD of 7 and 5?

What algorithm did you use to answer the questions? Would the way you did it always efficiently give the correct answer for any pair of numbers?
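
One approach you may have used is to try candidate divisors one at a time. Here is a minimal sketch of that idea in Python; the function name `gcd_brute_force` is made up for this example, and it assumes both inputs are positive integers:

```python
def gcd_brute_force(a, b):
    """Return the GCD of positive integers a and b by trying every candidate."""
    # No common divisor can be bigger than the smaller number,
    # so start there and count down to 1.
    for d in range(min(a, b), 0, -1):
        # a % d == 0 means d divides a with no remainder
        if a % d == 0 and b % d == 0:
            return d


print(gcd_brute_force(15, 39))  # 3
print(gcd_brute_force(7, 5))    # 1
```

This always gives the right answer for positive integers, but for large numbers it may have to try a lot of candidates before it finds one that works.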

### Euclid's Algorithm in Pseudocode

A famous algorithm for calculating GCDs is [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm). It was first described around 300 BC in [Euclid's Elements](https://en.wikipedia.org/wiki/Euclid%27s_Elements), and so is over 2000 years old.

[Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) says that to find the greatest common divisor of two positive numbers, do the following:

1. if the numbers are the same, stop: their GCD is the number itself
2. if they are different, replace the bigger number with the difference between it and the smaller number
3. go to step 1

When we write an algorithm in English we call it **pseudocode**. We will often use pseudocode when talking about algorithms.

### Example
Calculate the GCD of 15 and 39 using [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm).

We start with 15 and 39. Step 1 of the algorithm says to check if the numbers are the same. If they are the same, we're done. But in this case they're not, so we go to step 2, which says to replace the bigger number with the difference between it and the other number. So 15 and 39 become 15 and 39-15=24, or just 15 and 24.

Now we repeat steps 1 and 2 until the numbers are the same:

```
39 15
24 15
9 15
9 6
3 6
3 3
```

When the numbers are the same, [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) tells us to stop, and so the GCD of 39 and 15 is 3.

A computer scientist would ask questions like these about [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm):
- **Does it always return the correct answer?** Are there numbers that it doesn't work for, or, say, gets stuck in an infinite loop with? Can we *prove* it always works correctly?
    - If you're curious, you can prove [the correctness of Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm#Proof_of_validity) using some number theory.
- **How efficient is this algorithm?** Can we make it faster? Or perhaps is there a different algorithm that calculates GCDs more quickly?
    - If you're curious, check out [the performance of Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm#Algorithmic_efficiency).
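
To get a feel for the efficiency question, here is a rough sketch that counts how many steps two versions of the algorithm take. The first is the subtraction version described above. The second is a well-known variant that replaces repeated subtraction with a remainder (`%`); it is not part of these notes and is included only for comparison. The function names and the step counters are made up for this example, and both functions assume positive integers.

```python
def gcd_subtract(a, b):
    """Subtraction version. Returns (gcd, number of subtractions)."""
    steps = 0
    while a != b:
        if a > b:
            a = a - b
        else:
            b = b - a
        steps += 1
    return a, steps


def gcd_remainder(a, b):
    """Remainder version. Returns (gcd, number of remainder operations)."""
    steps = 0
    while b != 0:
        # a % b is what is left of a after subtracting b as often as possible
        a, b = b, a % b
        steps += 1
    return a, steps


print(gcd_subtract(39, 15))    # (3, 5)
print(gcd_remainder(39, 15))   # (3, 4)
print(gcd_subtract(1000, 1))   # (1, 999)
print(gcd_remainder(1000, 1))  # (1, 1)
```

For 39 and 15 the two versions do about the same amount of work, but for 1000 and 1 the subtraction version needs 999 steps while the remainder version needs only one.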


### Euclid's Algorithm in Python

Here is a Python version of [Euclid's algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm) ... click on the arrow at the top left to run it:

```python
# tell the user what the program does
print('Please enter two integers.')
print("I will calculate their GCD using Euclid's algorithm")
print()

# get the input numbers from the user (as strings)
a = input('What is the first number? ')
b = input('What is the second number? ')

# convert the strings to integers
a = int(a)
b = int(b)

# find the greatest common divisor
while a != b:
    print(a, b)  # see how a and b change
    if a > b:
        a = a - b
    else:
        b = b - a

# print the result
print("greatest common divisor:", a)
```

```
Please enter two integers.
I will calculate their GCD using Euclid's algorithm

What is the first number? 7
What is the second number? 5
7 5
2 5
2 3
2 1
greatest common divisor: 1
```


Run the above program with these values:

- 10 and 9
- 100 and 99
- 7 and -5
- -7 and -5
- 3.14 and 2.77
- two and three

Don't worry if you don't understand all the details of the program yet. We will get to them in the course.



## The Most Common Word

Suppose you want to find out the most common word in some text. For instance, what word is most common in this?

```
the top part of a pop tart is too tart for a pop tart
```

We can see that `tart` occurs 3 times, so it is the most common word.

Here is a pseudocode algorithm that solves this problem:

1. Put the text into a variable named `text`.
2. Make a list called `words` of the words in `text`.
3. Make an initially empty dictionary called `count` of *word:count* pairs. `count[w]` is the number of occurrences of word `w`.
4. Add each word in `words` to `count`.
5. Convert `count` to a list of (count, word) pairs called `word_counts`.
6. Sort `word_counts` from biggest count to smallest.
7. Print the most frequent word and its count.

Here is the algorithm in Python. Don't worry about the exact details. For now just browse the code and try to get the gist of what it's doing:

```python
text = 'the top part of a pop tart is too tart for a pop tart'
#text = input('Enter some words:')
#text = open('/content/drive/MyDrive/Colab Notebooks/120Fall2024/public/austenPandP.txt').read()

#
# Extract the words of text into a list of words using Python's
# built-in split() method.
#
words = text.split()

#
# count is a dictionary that stores word:count pairs.
# We add each word in words to it, and if a word is already
# there we increment its count.
#
count = {}
for w in words:
    if w in count:
        count[w] += 1
    else:
        count[w] = 1

#
# Convert the count dictionary to a list of (count, word) pairs,
# and then arrange them from highest count to lowest count.
#
word_counts = [(count[w], w) for w in count]
word_counts.sort(reverse=True)

#
# word_counts[0] is the first (count, word) pair on the list, i.e.
# the pair with the highest count.
#
print('Most frequent word:', word_counts[0][1], word_counts[0][0])
```



```
Most frequent word: tart 3
```


This algorithm is a good example of something that Python is good at. It uses a number of built-in Python features, like `split`, `sort`, and dictionaries. Not only does it run fairly efficiently, it does not take too long for the programmer to write and debug, and the code is fairly readable. Other languages (like C++ or Java) can be faster, but they typically take more time to write and debug, and the resulting code can be harder to read.
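
As one more illustration of how much Python's standard library can do for you, here is a sketch of the same task using `Counter` from the `collections` module. This is an alternative to the program above, not a replacement for it, and it assumes the same sample text:

```python
from collections import Counter

text = 'the top part of a pop tart is too tart for a pop tart'

# Counter builds the word:count dictionary in one step
counts = Counter(text.split())

# most_common(1) returns a list holding the single highest (word, count) pair
word, n = counts.most_common(1)[0]
print('Most frequent word:', word, n)  # Most frequent word: tart 3
```

One difference to be aware of: if two words are tied for the highest count, this version and the sorting version may report different words.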

Note the commented-out `text =` statements at the top of the program. By commenting/uncommenting those you can try out different text. In Colab, if you add a file in the "Files" tab, you can read a text file into a string.

## Programming Errors

When you write computer programs, errors are common. Even the best and most experienced programmers make mistakes and need to be on the lookout for **bugs** (another word for "errors") in their code.

In this course we will see three main kinds of programming errors: syntax errors, runtime errors, and logic errors.

### Syntax Errors

A **syntax error** is a mistake in the spelling or grammar of your program. Python catches some syntax errors *before* a program runs, but not all. Here's an example of a syntax error:

  ```python
  print("oops!)   # syntax error: missing quote
  ```
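
If you want to see Python report this kind of mistake without running anything, one way is to hand the broken line to the built-in `compile` function as a string. This is only a small sketch: the variable names `source` and `caught` are made up for this example, and the exact wording of the error message depends on your Python version.

```python
source = 'print("oops!)'   # the broken line from above, stored as a string

try:
    # compile() checks the grammar of the code but does not run it
    compile(source, '<example>', 'exec')
    caught = False
except SyntaxError as err:
    caught = True
    print('Syntax error found before anything ran:', err.msg)
```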

### Runtime Errors

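A **runtime error** is an error that occurs while your program is running. For example, if you type the string `five` when the Euclid's algorithm program above asks for a number, the program will crash with a runtime error (a `ValueError`), because `int` cannot convert `five` to an integer.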
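
Here is a small sketch of how a program could notice this runtime error instead of crashing. The helper name `to_int` is made up for this example, and returning `None` for bad input is just one possible choice:

```python
def to_int(s):
    """Convert the string s to an integer, or return None if that fails."""
    try:
        return int(s)
    except ValueError:
        # int() raises ValueError for strings like 'five' or '3.14'
        return None


print(to_int('7'))     # 7
print(to_int('five'))  # None
```

We will look at `try` and `except` in more detail later; for now, the point is that the error only shows up when the program actually runs with bad input.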

### Logic Errors

A **logic error** is an error in the logic of your program. It typically means your algorithm is wrong.

For example, this version of the GCD algorithm has a logic error in step 4:

  1. start with two integers
  2. if they are the same, stop: you've found their GCD
  3. if they are different, replace the bigger number with the difference
     between it and the smaller number
  4. **go to step 3**  <-- **logic error**! Should go to step 2.
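
To see what this logic error does in practice, here is a rough Python sketch that follows the buggy steps literally. Because the buggy algorithm never goes back to step 2, it would run forever, so the sketch adds a `max_steps` safety cap; the function name `buggy_gcd` and the cap are made up for this example.

```python
def buggy_gcd(a, b, max_steps=20):
    """Follow the buggy steps. Returns the GCD, or None if it never stops."""
    # step 2: this check happens only once, before the loop
    if a == b:
        return a
    for _ in range(max_steps):
        # step 3: if different, replace the bigger number with the difference
        if a > b:
            a = a - b
        elif b > a:
            b = b - a
        # step 4 (the bug): go back to step 3, so step 2 is never checked again
    return None  # gave up: the algorithm never reached "stop"


print(buggy_gcd(3, 3))    # 3
print(buggy_gcd(15, 39))  # None
```

Notice that the program has no syntax error and does not crash. The numbers even reach 3 and 3 along the way, but the algorithm never checks for that, so it never stops.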