# Logical Errors

Logical errors are bugs where the code runs without throwing an exception but still produces the wrong output.

I thought pandas allowed declaring a list of values by column. Why is this not working as intended, with each header ending up at the start of its own row instead of becoming a column name?

In [14]:
import pandas as pd

data = [
    ["customer_id", 1, 2, 3],
    ["name", "Alice", "Bob", "Charlie"],
    ["total_spent", 150.0, None, 300.5]
]

# Create DataFrame
customers_df = pd.DataFrame(data)
customers_df


|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | customer_id | 1 | 2 | 3 |
| 1 | name | Alice | Bob | Charlie |
| 2 | total_spent | 150.0 |  | 300.5 |
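One likely explanation: `pd.DataFrame` treats each inner list as a row, so the labels become ordinary values in column 0. A minimal sketch of one possible fix, assuming the intent is one column per label, is to pass a dictionary that maps each column name to its values:

```python
import pandas as pd

# Map each column name to its list of values (one entry per customer)
data = {
    "customer_id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "total_spent": [150.0, None, 300.5],
}

customers_df = pd.DataFrame(data)
print(customers_df)
```

If the data has to stay in the original row-oriented lists, transposing it (for example `pd.DataFrame(data).set_index(0).T` on the original list) would be another option.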


## Troubleshooting a Monte Carlo Simulation

In [None]:
from random import randint, choice

def random_door(): return randint(1, 3)

trial_count = 10000

stay_wins = 0
switch_wins = 0

for i in range(0, trial_count):
    prize_door = random_door()
    selected_door = random_door()
    opened_door = choice([d for d in range(1, 4) if d != selected_door and d != prize_door])
    switch_door = choice([d for d in range(1, 4) if d != opened_door])

    if selected_door == prize_door:
        stay_wins += 1

    if switch_door == prize_door:
        switch_wins += 1

print("STAY WINS: {}, SWITCH WINS: {}".format(
    stay_wins, switch_wins))

print("STAY WIN RATE: {}, SWITCH WIN RATE: {}".format(
    float(stay_wins)/float(trial_count), float(switch_wins)/float(trial_count)))
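A possible culprit in the simulation above: `switch_door` is drawn from every door except the opened one, so about half the time the "switch" lands back on the originally selected door and the switch win rate drifts toward 1/2. The sketch below assumes the standard Monty Hall rules and excludes both the opened door and the selected door. The function name `simulate_monty_hall` and the fixed seed are illustrative choices, not part of the original notebook.

```python
from random import Random

def simulate_monty_hall(trial_count=10_000, seed=0):
    """Return (stay_wins, switch_wins) for the standard Monty Hall game."""
    rng = Random(seed)  # seeded so repeated runs give the same counts
    doors = (1, 2, 3)
    stay_wins = 0
    switch_wins = 0

    for _ in range(trial_count):
        prize_door = rng.choice(doors)
        selected_door = rng.choice(doors)
        # The host opens a door that is neither the pick nor the prize
        opened_door = rng.choice(
            [d for d in doors if d not in (selected_door, prize_door)])
        # Switching means taking the one door that is neither picked nor opened
        switch_door = next(
            d for d in doors if d not in (selected_door, opened_door))

        if selected_door == prize_door:
            stay_wins += 1
        if switch_door == prize_door:
            switch_wins += 1

    return stay_wins, switch_wins

stay_wins, switch_wins = simulate_monty_hall()
print("STAY WIN RATE: {:.3f}, SWITCH WIN RATE: {:.3f}".format(
    stay_wins / 10_000, switch_wins / 10_000))
```

With this change every trial is won by exactly one of the two strategies, so the two rates should sit near 1/3 and 2/3.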

## Some Misbehaving Regular Expressions

I'm trying to break a sentence up into words, but everything comes back as one single word. Why?

In [None]:
import re

# Helper function to break up words from a string
def break_up_words(str):
    return re.sub(r'[^\w\\s]', '', str.lower()).split()

break_up_words("Hello, this is a sentence I am trying to tokenize.")
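One plausible cause: inside a raw string, `\\s` in the character class means "a literal backslash or the letter s" rather than whitespace, so the spaces are stripped along with the punctuation and `split()` has nothing to split on. A minimal sketch of a fix, using a single backslash and renaming the parameter to `text` (an illustrative choice that avoids shadowing the built-in `str`):

```python
import re

# Helper function to break up words from a string
def break_up_words(text):
    # Remove everything that is neither a word character nor whitespace
    return re.sub(r'[^\w\s]', '', text.lower()).split()

words = break_up_words("Hello, this is a sentence I am trying to tokenize.")
print(words)
# ['hello', 'this', 'is', 'a', 'sentence', 'i', 'am', 'trying', 'to', 'tokenize']
```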

I am trying to extract all websites from a document, but I am not getting any matches.

In [None]:
import re
web_pattern = re.compile(r'(https?://)?(www\.)?([a-z0-9]+)\.(com,org,gov)')

urls = """
Here are a few websites below:

https://www.yawmanflight.com
http://microsoft.com
https://youtube.com
https://www.anaconda.com

These are non-commercial sites:
https://www.python.org
https://whitehouse.gov
"""

matches = web_pattern.finditer(urls)

for match in matches:
    print(match[0])
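A likely explanation: `(com,org,gov)` matches only the literal text `com,org,gov`, because a comma has no special meaning in a regular expression. Alternation is written with `|`. The sketch below applies that one change to a shortened sample (`sample_text` is an illustrative variable, not from the original cell):

```python
import re

# "|" separates alternatives; a comma would be matched literally
web_pattern = re.compile(r'(https?://)?(www\.)?([a-z0-9]+)\.(com|org|gov)')

sample_text = """
https://www.anaconda.com
http://microsoft.com
https://www.python.org
https://whitehouse.gov
"""

found = [match[0] for match in web_pattern.finditer(sample_text)]
print(found)
# ['https://www.anaconda.com', 'http://microsoft.com', 'https://www.python.org', 'https://whitehouse.gov']
```

The pattern still only handles lowercase letters and digits in the domain name, so hyphenated or uppercase domains would need a wider character class.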

Here is another example where I'm looking for IP addresses in a log. My coworker said the regular expression `([0-9]+[.][0-9.]+)+` should work, but it also seems to be capturing the `32.5` seconds.

In [None]:
import re
ip_addr_pattern = re.compile(r'([0-9]+[.][0-9.]+)+')

log = """
[2025-08-14 02:15:34] INFO  JobScheduler - Starting nightly backup process
[2025-08-14 02:15:35] INFO  Connection from 192.168.54.23 established
[2025-08-14 02:15:35] INFO  Connection from 10.44.8.91 established
[2025-08-14 02:15:36] INFO  Transferring data to backup node at 172.16.3.144
[2025-08-14 02:15:42] WARN  Slow response from node 10.44.8.91 (32.5 seconds)
[2025-08-14 02:15:49] INFO  Backup chunk 1/5 completed from 192.168.54.23
[2025-08-14 02:15:53] INFO  Backup chunk 2/5 completed from 10.44.8.91
[2025-08-14 02:15:57] ERROR Transfer failed to node 172.16.3.144: connection timeout
[2025-08-14 02:16:03] INFO  Retrying transfer to node 172.16.3.144 (attempt 1)
[2025-08-14 02:16:08] INFO  Transfer to node 172.16.3.144 successful
[2025-08-14 02:16:15] INFO  Nightly backup process completed successfully
"""

matches = ip_addr_pattern.finditer(log)

for match in matches:
    print(match[0])
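The coworker's pattern accepts any run of digits, a dot, and then more digits or dots, which is why a decimal such as `32.5` qualifies. One stricter sketch, assuming IPv4 addresses only, requires exactly four dot-separated groups of one to three digits. It checks the shape of the address and does not reject octets above 255.

```python
import re

# Exactly four groups of 1-3 digits separated by dots, bounded by word edges
ip_addr_pattern = re.compile(r'\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b')

sample_log = """
[2025-08-14 02:15:35] INFO  Connection from 192.168.54.23 established
[2025-08-14 02:15:42] WARN  Slow response from node 10.44.8.91 (32.5 seconds)
"""

found = [match[0] for match in ip_addr_pattern.finditer(sample_log)]
print(found)
# ['192.168.54.23', '10.44.8.91']
```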

## Misbehaving SQL

Let's set up a database connection.

In [None]:
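import requests
import pandas as pd
import sqlite3

# Download the sample SQLite database and save it locally
url = "https://github.com/thomasnield/anaconda_intro_to_sql/raw/refs/heads/main/company_operations.db"
response = requests.get(url)
response.raise_for_status()  # stop early if the download failed

with open("company_operations.db", "wb") as db_file:
    db_file.write(response.content)

conn = sqlite3.connect("company_operations.db")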

Here is an example I messed up while coming up with the next one. Can you spot the mistake?

In [None]:
sql = """
SELECT 'ORDER_DATE'
FROM CUSTOMER_ORDER
"""

pd.read_sql(sql, conn)
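A probable cause: in SQL, single quotes create a string literal, so `'ORDER_DATE'` returns that text once per row instead of the column's values. The sketch below shows the difference on an in-memory stand-in table. The rows and the `demo_conn` name are made up for illustration and are not the real `company_operations.db` data.

```python
import sqlite3

# In-memory stand-in for CUSTOMER_ORDER; these rows are hypothetical
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "CREATE TABLE CUSTOMER_ORDER (CUSTOMER_ORDER_ID INTEGER, ORDER_DATE TEXT)")
demo_conn.executemany(
    "INSERT INTO CUSTOMER_ORDER VALUES (?, ?)",
    [(1, "2025-08-14"), (2, "2025-08-15")],
)

# Single quotes: a string literal, repeated for every row
literal_rows = demo_conn.execute(
    "SELECT 'ORDER_DATE' FROM CUSTOMER_ORDER").fetchall()

# No quotes: the actual column
column_rows = demo_conn.execute(
    "SELECT ORDER_DATE FROM CUSTOMER_ORDER ORDER BY CUSTOMER_ORDER_ID").fetchall()

print(literal_rows)  # [('ORDER_DATE',), ('ORDER_DATE',)]
print(column_rows)   # [('2025-08-14',), ('2025-08-15',)]
```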


Next, why is my D/M/Y date format coming out wrong?

In [None]:
sql = """
SELECT CUSTOMER_ORDER_ID, ORDER_DATE, strftime('%d/%M/%Y', ORDER_DATE) AS FORMATTED_DATE
FROM CUSTOMER_ORDER
"""

pd.read_sql(sql, conn)
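A likely explanation: in SQLite's `strftime`, `%M` is the minute and `%m` is the month, so a date with no time component shows `00` in the middle. The sketch below compares the two format strings on a hard-coded date, assuming `ORDER_DATE` is stored as ISO `YYYY-MM-DD` text:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")

# %M = minutes (00 for a date-only value), %m = month
wrong, right = demo_conn.execute(
    "SELECT strftime('%d/%M/%Y', '2025-08-14'), "
    "       strftime('%d/%m/%Y', '2025-08-14')"
).fetchone()

print(wrong)  # 14/00/2025
print(right)  # 14/08/2025
```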


I am a little confused why "Alpha Medical" with a CUSTOMER_ID of 1 is not showing up in this query. Maybe AI can help me out, even if it does not have access to the dataset itself?

In [None]:
sql = """
SELECT
CUSTOMER_ORDER_ID,
CUSTOMER.CUSTOMER_ID,
CUSTOMER_NAME,
ADDRESS,
CITY,
STATE,
ZIP,
ORDER_DATE,
PRODUCT_ID,
QUANTITY

FROM CUSTOMER, CUSTOMER_ORDER
WHERE CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID
ORDER BY CUSTOMER.CUSTOMER_ID
"""

pd.read_sql(sql, conn)
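One possible reason, without seeing the data: listing two tables in `FROM` and matching them in `WHERE` behaves like an inner join, so a customer with no orders has nothing to match and drops out. The sketch below assumes that is the case for "Alpha Medical" and uses a tiny made-up dataset ("Hypothetical Co" and all row values are invented) to compare it with a `LEFT JOIN`:

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);
CREATE TABLE CUSTOMER_ORDER (
    CUSTOMER_ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER, QUANTITY INTEGER);
INSERT INTO CUSTOMER VALUES (1, 'Alpha Medical'), (2, 'Hypothetical Co');
INSERT INTO CUSTOMER_ORDER VALUES (10, 2, 5);  -- customer 1 has no orders
""")

# Implicit inner join: only customers with at least one order survive
inner_rows = demo_conn.execute("""
SELECT CUSTOMER.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ORDER_ID
FROM CUSTOMER, CUSTOMER_ORDER
WHERE CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID
ORDER BY CUSTOMER.CUSTOMER_ID
""").fetchall()

# LEFT JOIN: every customer is kept, with NULLs where no order exists
left_rows = demo_conn.execute("""
SELECT CUSTOMER.CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_ORDER_ID
FROM CUSTOMER LEFT JOIN CUSTOMER_ORDER
ON CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID
ORDER BY CUSTOMER.CUSTOMER_ID
""").fetchall()

print(inner_rows)  # [(2, 'Hypothetical Co', 10)]
print(left_rows)   # [(1, 'Alpha Medical', None), (2, 'Hypothetical Co', 10)]
```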

Okay, let's now bring in the `PRODUCT` information too. Oh no, why are the records missing again?

In [None]:
sql = """
SELECT
CUSTOMER_ORDER_ID,
CUSTOMER.CUSTOMER_ID,
CUSTOMER_NAME,
ADDRESS,
CITY,
STATE,
ZIP,
ORDER_DATE,
PRODUCT.PRODUCT_ID,
QUANTITY,
PRICE

FROM CUSTOMER LEFT JOIN CUSTOMER_ORDER
ON CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID

INNER JOIN PRODUCT
ON PRODUCT.PRODUCT_ID = CUSTOMER_ORDER.PRODUCT_ID
ORDER BY CUSTOMER.CUSTOMER_ID
"""

pd.read_sql(sql, conn)
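A plausible explanation: the `LEFT JOIN` keeps customers without orders, but their `CUSTOMER_ORDER.PRODUCT_ID` is NULL, and the following `INNER JOIN` to `PRODUCT` discards any row that fails to match. Making the second join a `LEFT JOIN` as well would preserve them. The sketch below demonstrates this on invented rows; `sql_template` and `demo_conn` are illustrative names.

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);
CREATE TABLE PRODUCT (PRODUCT_ID INTEGER PRIMARY KEY, PRICE REAL);
CREATE TABLE CUSTOMER_ORDER (
    CUSTOMER_ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER,
    PRODUCT_ID INTEGER, QUANTITY INTEGER);
INSERT INTO CUSTOMER VALUES (1, 'Alpha Medical'), (2, 'Hypothetical Co');
INSERT INTO PRODUCT VALUES (100, 2.5);
INSERT INTO CUSTOMER_ORDER VALUES (10, 2, 100, 4);  -- customer 1 has no orders
""")

# Same query twice; only the join type to PRODUCT changes
sql_template = """
SELECT CUSTOMER.CUSTOMER_ID, CUSTOMER_NAME, PRODUCT.PRODUCT_ID, PRICE
FROM CUSTOMER LEFT JOIN CUSTOMER_ORDER
ON CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID
{join_type} JOIN PRODUCT
ON PRODUCT.PRODUCT_ID = CUSTOMER_ORDER.PRODUCT_ID
ORDER BY CUSTOMER.CUSTOMER_ID
"""

inner_rows = demo_conn.execute(sql_template.format(join_type="INNER")).fetchall()
left_rows = demo_conn.execute(sql_template.format(join_type="LEFT")).fetchall()

print(inner_rows)  # [(2, 'Hypothetical Co', 100, 2.5)]
print(left_rows)   # [(1, 'Alpha Medical', None, None), (2, 'Hypothetical Co', 100, 2.5)]
```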

Okay, one last thing I need to figure out. I'm aggregating the total revenue by customer. But why am I only getting a single record back? Ironically, this time it is only "Alpha Medical"? Help me out here, AI assist!

In [None]:
sql = """
SELECT
CUSTOMER.CUSTOMER_ID,
CUSTOMER_NAME,
SUM(PRICE * QUANTITY) AS TOTAL_REVENUE

FROM CUSTOMER LEFT JOIN CUSTOMER_ORDER
ON CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID

LEFT JOIN PRODUCT
ON PRODUCT.PRODUCT_ID = CUSTOMER_ORDER.PRODUCT_ID
"""

pd.read_sql(sql, conn)
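A likely cause: with an aggregate such as `SUM` and no `GROUP BY`, the whole result collapses into one row, and SQLite fills the non-aggregated columns from an arbitrary row. Adding `GROUP BY` on the customer columns gives one total per customer. The sketch below shows this on invented rows (not the real dataset):

```python
import sqlite3

demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE CUSTOMER (CUSTOMER_ID INTEGER PRIMARY KEY, CUSTOMER_NAME TEXT);
CREATE TABLE PRODUCT (PRODUCT_ID INTEGER PRIMARY KEY, PRICE REAL);
CREATE TABLE CUSTOMER_ORDER (
    CUSTOMER_ORDER_ID INTEGER PRIMARY KEY, CUSTOMER_ID INTEGER,
    PRODUCT_ID INTEGER, QUANTITY INTEGER);
INSERT INTO CUSTOMER VALUES (1, 'Alpha Medical'), (2, 'Hypothetical Co');
INSERT INTO PRODUCT VALUES (100, 2.5);
INSERT INTO CUSTOMER_ORDER VALUES (10, 2, 100, 4), (11, 2, 100, 2);
""")

revenue_rows = demo_conn.execute("""
SELECT
CUSTOMER.CUSTOMER_ID,
CUSTOMER_NAME,
SUM(PRICE * QUANTITY) AS TOTAL_REVENUE

FROM CUSTOMER LEFT JOIN CUSTOMER_ORDER
ON CUSTOMER.CUSTOMER_ID = CUSTOMER_ORDER.CUSTOMER_ID

LEFT JOIN PRODUCT
ON PRODUCT.PRODUCT_ID = CUSTOMER_ORDER.PRODUCT_ID

GROUP BY CUSTOMER.CUSTOMER_ID, CUSTOMER_NAME
ORDER BY CUSTOMER.CUSTOMER_ID
""").fetchall()

print(revenue_rows)
# [(1, 'Alpha Medical', None), (2, 'Hypothetical Co', 15.0)]
```

Customers without orders show a NULL total here. Wrapping the sum in `COALESCE(..., 0)` would be one way to report zero instead.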

## EXERCISE

Try to investigate (with the help of an AI assistant) why SQLite is not producing yesterday's date below. 

In [None]:
sql = """
SELECT DATE('now') - 1 AS YESTERDAY
"""

pd.read_sql(sql, conn)
