## Task 2: Becoming a Data Detective

For the SQL exploration I use DuckDB as the SQL engine because of its ease of use and simple setup.


First, import duckdb and create views over the given CSV files. This makes writing SQL a bit easier.

In [None]:
# Import duckdb
import duckdb

# Connect to an in-memory DuckDB database
con = duckdb.connect(database=':memory:')

# Create views for the CSV files
con.execute("CREATE VIEW maintenance AS SELECT * FROM '../data/raw/maintenance_events.csv'")
con.execute("CREATE VIEW manufacturing AS SELECT * FROM '../data/raw/manufacturing_factory_dataset.csv'")
con.execute("CREATE VIEW operator AS SELECT * FROM '../data/raw/operators_roster.csv'")

In [53]:
query = """
SELECT 
    *
FROM manufacturing 
"""


df = con.execute(query).df()
print(df.head())

            timestamp    factory_id line_id    shift product_id  \
0 2025-11-03 06:00:00  FRA-PLANT-01  Line-A  Shift-1   P-Widget   
1 2025-11-03 06:00:00  FRA-PLANT-01  Line-B  Shift-1   P-Widget   
2 2025-11-03 06:00:00  FRA-PLANT-01  Line-C  Shift-1   P-Widget   
3 2025-11-03 06:15:00  FRA-PLANT-01  Line-A  Shift-1   P-Widget   
4 2025-11-03 06:15:00  FRA-PLANT-01  Line-B  Shift-1   P-Widget   

           order_id  planned_qty  produced_qty  scrap_qty  defects_count  ...  \
0  WO-20251103-1860           47            41          1              0  ...   
1  WO-20251103-9666           67             2          0              0  ...   
2  WO-20251103-8513           59            58          3              2  ...   
3  WO-20251103-5297           53            48          2              2  ...   
4  WO-20251103-9006           63            57          3              3  ...   

  machine_state      downtime_reason  maintenance_type  maintenance_due_date  \
0       Running               

## 1. What’s the total maintenance cost?

**Total maintenance cost:** 169389.62 (in EUR, summed over the `cost_eur` column)

In [None]:
query = """
SELECT 
    SUM(cost_eur) AS total_maintenance_cost
FROM maintenance 
"""


df = con.execute(query).df()
print(df)

## 2. How many minutes of downtime have there been?

Total downtime is **6180 minutes**. 

The query itself is simple, but I first checked the minimum and maximum values to make sure there were no negative values, and compared the total row count with the non-NULL count to check for NULL rows:

```
SELECT  
    COUNT(*) AS total_rows,
    COUNT(downtime_min) AS non_null_rows,
    MIN(downtime_min) AS min_val,
    MAX(downtime_min) AS max_val,
    SUM(downtime_min) AS total_downtime
FROM maintenance
```

I also double-checked my results by comparing `downtime_min` with the difference between the start and end time of each maintenance event, using the following query:

```
SELECT  
    date_diff('minute', start_time, end_time) AS calculated_downtime,
    downtime_min
FROM maintenance 
```


In [40]:
query = """
SELECT  
    SUM(downtime_min) AS total_downtime
FROM maintenance 
"""


df = con.execute(query).df()
print(df)

   total_downtime
0          6180.0


## 3. How many maintenance events occurred?

**Total maintenance events:** 94

In [41]:
query = """
SELECT  
    COUNT(*) AS maintenance_event_count
FROM maintenance 
"""


df = con.execute(query).df()
print(df)

   maintenance_event_count
0                       94


## 4. How many breakdowns (unplanned) happened?

**Total Unplanned Events:** 23


In [44]:
query = """
SELECT  
    count(*) AS unplanned_maintenance_events
FROM maintenance
WHERE reason = 'Unplanned Breakdown' 
"""


df = con.execute(query).df()
print(df)

   unplanned_maintenance_events
0                            23


## 5. What’s the average downtime per event?

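**Average downtime per event:** 65.7 minutes

The `downtime_min > 0` filter excludes no rows here: the filtered row count (94) matches the total number of maintenance events.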

In [48]:
query = """
SELECT
    COUNT(*) AS event_count,
    AVG(downtime_min) AS average_downtime
FROM maintenance
WHERE downtime_min > 0
"""


df = con.execute(query).df()
print(df)

   event_count  average_downtime
0           94         65.744681
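As a quick arithmetic cross-check, the average can be recomputed from the totals reported in questions 2 and 3. This is a small sketch using only those two figures; the variable names are illustrative.

```python
# Figures taken from the earlier query outputs in this notebook
total_downtime_min = 6180  # total downtime in minutes (question 2)
event_count = 94           # number of maintenance events (question 3)

# Average downtime per event, in minutes
average_downtime = total_downtime_min / event_count

print(round(average_downtime, 1))
```

The result agrees with the `AVG(downtime_min)` value, which is expected as long as no event has NULL or zero downtime.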
