# Debugging Infinite Loops

This notebook contains a few tips on how to write, debug, or simply avoid (infinite) `while` loops in Python.

### Avoiding while loops


Some problems can be written either using a `while` loop or a `for` loop.

When both work, the `for` loop is usually the better choice: a `for` loop over a finite iterable such as `range` cannot run forever.

As an example, consider the task of printing the numbers from `n0` up to (but not including) `n1` in increments of `d`.
To start with, let `n0=20`, `n1=50`, `d=3`.


In [None]:
n0 = 20
n1 = 50
d = 3

In [None]:
# Using a `while True` loop
x = n0
while True:
    print(x)
    x = x + d
    if x >= n1:
        break

In [None]:
# Using a normal `while` loop
x = n0
while x < n1:
    print(x)
    x = x + d

In [None]:
# Using a `for` loop:
for x in range(n0, n1, d):
    print(x)

For the values of `n0`, `n1`, and `d` used above, these loops behave the same.

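However, if these values change (e.g. because they result from previous computations), a negative step such as `d=-5` might be provided.
In that case, only the `for` loop terminates: `range(20, 50, -5)` is empty, so its body never runs, while both `while` loops count downwards forever and never reach `n1`.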
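
With `d = -5`, the difference can be checked without risking a hang. The sketch below compares `range` with the `while` version, which is capped at `max_steps` iterations. The name `max_steps` is introduced here purely for illustration.

```python
n0, n1, d = 20, 50, -5

# `range` with a negative step and n0 < n1 is empty, so a `for` body never runs
for_values = list(range(n0, n1, d))
print(for_values)  # → []

# The `while` version, capped at `max_steps` (hypothetical safety limit)
max_steps = 5
while_values = []
x = n0
while x < n1 and len(while_values) < max_steps:
    while_values.append(x)
    x = x + d
print(while_values)  # → [20, 15, 10, 5, 0]
```

Note that `range` raises a `ValueError` for `d = 0`, which is arguably preferable to silently looping forever.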


*Note: Here, the `for` loop is also easiest to read, but that might differ from case to case.*

### Debugging infinite loops

Of course, it is often not possible to avoid `while` loops altogether (e.g. in policy iteration).
If the loop body or the loop condition contains a mistake, the loop might never terminate.

In these cases, the following tips might help to debug the issue:

* Learn how to interrupt running code!
    * In Jupyter, click `interrupt`
    * In the terminal, press `Ctrl+C`
    * In a debugger, click `stop` or `disconnect` etc.

<!-- -->

* Double check your `while` condition!
    * E.g. if you are waiting for an iterative computation to converge, your tolerance might be too small, or you might have mixed up a `<` and a `>`.

<!-- -->

* Convert your `while` loops to `for` loops!
    * Sometimes it makes sense to introduce a `max_iteration` parameter set to a high number,
    and terminate the loop if this number of iterations is reached.
    This is common practice in many optimization algorithms.
    * If this is not feasible,
    you can *temporarily* replace `while XXX` with `for i in range(100)` to make sure the loop terminates.
    Afterwards you can check the results of the computations in the loop.
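
A minimal sketch of the `max_iteration` pattern, using Heron's method for square roots as a stand-in for an iterative computation. The function name `heron_sqrt`, the tolerance, and the default limit are assumptions made for this example and are not part of the notebook's exercises.

```python
def heron_sqrt(a, tol=1e-12, max_iteration=1000):
    """Approximate sqrt(a) for a > 0; give up after max_iteration steps."""
    x = a if a > 1 else 1.0
    for i in range(max_iteration):
        x_new = 0.5 * (x + a / x)
        if abs(x_new - x) < tol:
            # Converged: return the estimate and the number of steps used
            return x_new, i + 1
        x = x_new
    # Only reached if the loop ran out of iterations
    raise RuntimeError(f"no convergence after {max_iteration} iterations")


root, steps = heron_sqrt(2.0)
print(root, steps)
```

Raising an error when the limit is hit (rather than silently returning the last estimate) makes a non-converging computation visible to the caller.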

<!-- -->

* Keep track of your `while`-condition!

    Consider a loop of the form

    ```
    while max(list_of_errors) > SOME_TOLERANCE:
        list_of_errors = do_some_computations()
    ```

    To get a better idea of what's going on, you can print `max(list_of_errors)` or `list_of_errors` after every iteration.
    This shows, for example, whether the error is decreasing at all (perhaps just slowly) or whether something is going completely wrong.
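
A runnable sketch of this idea. Here `do_some_computations` is a hypothetical update step (it moves every estimate halfway towards `1.0`), and `history` is a helper list introduced only to record the largest error per iteration.

```python
SOME_TOLERANCE = 1e-3


def do_some_computations(estimates):
    """Hypothetical update step: move every estimate halfway towards 1.0."""
    new_estimates = [0.5 * (e + 1.0) for e in estimates]
    errors = [abs(n - e) for n, e in zip(new_estimates, estimates)]
    return new_estimates, errors


estimates = [0.0, 4.0, -2.0]
list_of_errors = [float("inf")]  # forces at least one iteration
history = []  # largest error after each iteration

while max(list_of_errors) > SOME_TOLERANCE:
    estimates, list_of_errors = do_some_computations(estimates)
    history.append(max(list_of_errors))
    print(f"iteration {len(history)}: max error = {history[-1]:.6f}")
```

In this toy case the printed error halves in every iteration, so the loop is merely slow to converge. An error that stays constant or grows would instead point to a mistake in the update or in the condition.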

<!-- -->

* Use a debugger!

    The most effective way to debug most issues is to use a debugger.
    In VS Code this can be done by pressing `F5` with a Python file open, or by using the command `Debug Cell` in a Jupyter notebook.
    Other editors will have different commands/buttons/shortcuts.

    With a debugger, you can set a breakpoint inside your loop body and inspect variables, change variable values, step through the code, and more.
 

To practice, let's debug the following infinite loop in the policy evaluation for the "Stand" policy.

In [1]:
# Define constants
BUST = 27
DEALER_CARDS = list(range(20, 27))
REWARD_WIN = 10
REWARD_DRAW = 0
REWARD_LOST = -10

# Choose a small threshold
THETA = 1e-18

# Initialize values at 0
# We use one extra slot for the terminal state after going bust
values = [0 for s in range(BUST + 1)]

while True:
    Delta = 0
    # We do not need to update the terminal state (BUST)
    # Iterate over all other states
    for s in range(BUST):
        # Remember old value
        v = values[s]

        # Compute expected value and reward for action "stand"
        # Cards do not change -> always the same value
        newValue = v
        # Add the expected reward
        for dc in DEALER_CARDS:
            if s > dc:
                reward = REWARD_WIN
            elif s == dc:
                reward = REWARD_DRAW
            else:
                reward = REWARD_LOST
            # We can ignore the new state value since it is always the terminal one (=0)
            newValue += 1/len(DEALER_CARDS) * reward
        
        # update values, Delta
        Delta = max(Delta, abs(v - newValue))
        values[s] = newValue

    # stop if no significant change:
    if Delta < THETA:
        break

print(values)

[-10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -10.0, -8.571428571428571, -5.7142857142857135, -2.8571428571428568, -4.440892098500626e-16, 2.8571428571428563, 5.7142857142857135, 8.571428571428571, 0]
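
As a possible starting point, the tips above can be applied without changing the update itself. The sketch below wraps one pass over the states in a helper, caps the number of passes, and prints `Delta` after each one. The names `sweep` and `MAX_SWEEPS` are introduced here for illustration and are not part of the original cell.

```python
BUST = 27
DEALER_CARDS = list(range(20, 27))
REWARD_WIN, REWARD_DRAW, REWARD_LOST = 10, 0, -10
THETA = 1e-18
MAX_SWEEPS = 5  # hypothetical cap, for debugging only


def sweep(values):
    """One pass over all non-terminal states, same update as the cell above."""
    Delta = 0
    for s in range(BUST):
        v = values[s]
        newValue = v
        for dc in DEALER_CARDS:
            if s > dc:
                reward = REWARD_WIN
            elif s == dc:
                reward = REWARD_DRAW
            else:
                reward = REWARD_LOST
            newValue += 1/len(DEALER_CARDS) * reward
        Delta = max(Delta, abs(v - newValue))
        values[s] = newValue
    return Delta


values = [0 for s in range(BUST + 1)]
deltas = []
# Temporary `for` loop instead of `while True`, so the cell always terminates
for i in range(MAX_SWEEPS):
    Delta = sweep(values)
    deltas.append(Delta)
    print(f"sweep {i + 1}: Delta = {Delta}, values[0] = {values[0]}")
    if Delta < THETA:
        break
```

If `Delta` does not shrink from one sweep to the next while `values[0]` keeps moving, the stopping test can never succeed, which points at the update rule rather than at `THETA` alone.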
