## HW 1: Docker, SQL and Terraform

### Question 1

Run docker with the python:3.13 image. Use an entrypoint bash to interact with the container.

What's the version of pip in the image?

In [1]:
!docker run -it --entrypoint bash python:3.13

root@22d41d02a65b:/# ^C
root@22d41d02a65b:/# 

In [2]:
!docker run --rm python:3.13 pip --version

pip 25.3 from /usr/local/lib/python3.13/site-packages/pip (python 3.13)


Answer: `25.3`

---

### Question 2

Given the following docker-compose.yaml, what is the hostname and port that pgadmin should use to connect to the postgres database?

services:
  db:
    container_name: postgres
    image: postgres:17-alpine
    environment:
      POSTGRES_USER: 'postgres'
      POSTGRES_PASSWORD: 'postgres'
      POSTGRES_DB: 'ny_taxi'
    ports:
      - '5433:5432'
    volumes:
      - vol-pgdata:/var/lib/postgresql/data

  pgadmin:
    container_name: pgadmin
    image: dpage/pgadmin4:latest
    environment:
      PGADMIN_DEFAULT_EMAIL: "pgadmin@pgadmin.com"
      PGADMIN_DEFAULT_PASSWORD: "pgadmin"
    ports:
      - "8080:80"
    volumes:
      - vol-pgadmin_data:/var/lib/pgadmin

volumes:
  vol-pgdata:
    name: vol-pgdata
  vol-pgadmin_data:
    name: vol-pgadmin_data

Answer(s):
1. `postgres:5432`
2. `db:5432`

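Both answers work. `db` is the service name and `postgres` is the container name, and both resolve as hostnames on the Compose network, so they point to the same container. pgAdmin runs inside that network, so it connects to the container port `5432`. The published host port `5433` only applies to connections made from the host machine.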
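The difference between the two ports can be sketched as follows. This is only an illustration of the `'5433:5432'` mapping from the compose file; the helper `split_port_mapping` is a hypothetical name, not part of Docker or Compose.

```python
def split_port_mapping(mapping: str) -> tuple[int, int]:
    """Split a Compose 'HOST:CONTAINER' port mapping into two integers."""
    host_port, container_port = mapping.split(":")
    return int(host_port), int(container_port)


# The mapping declared for the db service in the compose file.
host_port, container_port = split_port_mapping("5433:5432")

# From another container on the same Compose network (e.g. pgadmin),
# the service name or container name is used with the container port.
inside_network = [f"db:{container_port}", f"postgres:{container_port}"]

# From the host machine, the published host port applies instead.
from_host = f"localhost:{host_port}"

print(inside_network, from_host)
```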

---

### Question 3

For the trips in November 2025 (lpep_pickup_datetime between '2025-11-01' and '2025-12-01', exclusive of the upper bound), how many trips had a trip_distance of less than or equal to 1 mile?

To see how I loaded the green taxi data and created the `green_taxi_new` table, refer to the `green_taxi.ipynb` notebook in this folder. With the table in place on my existing PostgreSQL server, I ran the following SQL query:

In [12]:
! uv add sqlalchemy pandas

Resolved 122 packages in 4ms
Audited 42 packages in 0.87ms


In [13]:
import pandas as pd
from sqlalchemy import create_engine
engine = create_engine('postgresql://root:root@localhost:5432/ny_taxi')

In [14]:
query = """
SELECT COUNT(*) AS trip_count
FROM green_taxi_new
WHERE lpep_pickup_datetime >= '2025-11-01'
  AND lpep_pickup_datetime <  '2025-12-01'
  AND trip_distance <= 1;
"""

df_result = pd.read_sql(query, con=engine)
print(df_result)

   trip_count
0        8007


Answer: `8,007`

---

### Question 4

Which was the pick up day with the longest trip distance? Only consider trips with trip_distance less than 100 miles (to exclude data errors).

Answer: `2025-11-14`
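
One way to arrive at this is sketched below. It assumes the same `green_taxi_new` table used in Question 3; the function name `longest_trip_day` is hypothetical, and `conn` is assumed to be any DB-API connection (for example one from `psycopg2`).

```python
# Finds the pickup day that contains the single longest trip under 100 miles.
LONGEST_TRIP_DAY_SQL = """
SELECT DATE(lpep_pickup_datetime) AS pickup_day,
       MAX(trip_distance)         AS max_distance
FROM green_taxi_new
WHERE trip_distance < 100
GROUP BY DATE(lpep_pickup_datetime)
ORDER BY max_distance DESC
LIMIT 1;
"""


def longest_trip_day(conn):
    """Return (pickup_day, max_distance) for the day with the longest trip."""
    cur = conn.cursor()
    cur.execute(LONGEST_TRIP_DAY_SQL)
    return cur.fetchone()
```

The query string can also be passed to `pd.read_sql(LONGEST_TRIP_DAY_SQL, con=engine)`, in the same way as the Question 3 cell.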

---

### Question 5

Which was the pickup zone with the largest total_amount (sum of all trips) on November 18th, 2025?

Answer: `East Harlem North`
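
A minimal sketch of a query for this question follows. Several names are assumptions: a lookup table called `zones` with columns `"LocationID"` and `"Zone"`, and the columns `"PULocationID"` and `total_amount` in `green_taxi_new`. The function name `top_pickup_zone` is hypothetical, and `conn` is assumed to be any DB-API connection.

```python
# Sums total_amount per pickup zone for 2025-11-18 and keeps the largest.
# Assumed names: zones table, "LocationID", "Zone", "PULocationID".
TOP_PICKUP_ZONE_SQL = """
SELECT z."Zone"            AS pickup_zone,
       SUM(t.total_amount) AS zone_total
FROM green_taxi_new AS t
JOIN zones AS z
  ON t."PULocationID" = z."LocationID"
WHERE t.lpep_pickup_datetime >= '2025-11-18'
  AND t.lpep_pickup_datetime <  '2025-11-19'
GROUP BY z."Zone"
ORDER BY zone_total DESC
LIMIT 1;
"""


def top_pickup_zone(conn):
    """Return (pickup_zone, zone_total) for the highest-grossing pickup zone."""
    cur = conn.cursor()
    cur.execute(TOP_PICKUP_ZONE_SQL)
    return cur.fetchone()
```

The half-open date range mirrors the filter style used in Question 3, so trips at exactly midnight on November 19th are excluded.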

---

### Question 6

For the passengers picked up in the zone named "East Harlem North" in November 2025, which was the drop off zone that had the largest tip?

Answer: `Yorkville West`
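
A possible query for this question is sketched below, reading "largest tip" as the single highest `tip_amount`. Assumed names: a lookup table `zones` with columns `"LocationID"` and `"Zone"`, and the columns `"PULocationID"`, `"DOLocationID"` and `tip_amount` in `green_taxi_new`. The function name `largest_tip_dropoff_zone` is hypothetical, and `conn` is assumed to be any DB-API connection.

```python
# Joins the zones lookup twice: once for pickup, once for drop-off.
# Assumed names: zones table, "LocationID", "Zone", "PULocationID",
# "DOLocationID", tip_amount.
LARGEST_TIP_SQL = """
SELECT zdo."Zone"        AS dropoff_zone,
       MAX(t.tip_amount) AS largest_tip
FROM green_taxi_new AS t
JOIN zones AS zpu
  ON t."PULocationID" = zpu."LocationID"
JOIN zones AS zdo
  ON t."DOLocationID" = zdo."LocationID"
WHERE zpu."Zone" = 'East Harlem North'
  AND t.lpep_pickup_datetime >= '2025-11-01'
  AND t.lpep_pickup_datetime <  '2025-12-01'
GROUP BY zdo."Zone"
ORDER BY largest_tip DESC
LIMIT 1;
"""


def largest_tip_dropoff_zone(conn):
    """Return (dropoff_zone, largest_tip) for trips from East Harlem North."""
    cur = conn.cursor()
    cur.execute(LARGEST_TIP_SQL)
    return cur.fetchone()
```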

---

### Question 7

Which of the following sequences, respectively, describes the workflow for:

1. Downloading the provider plugins and setting up the backend,
2. Generating proposed changes and auto-executing the plan,
3. Removing all resources managed by Terraform

Answer: `terraform init, terraform apply -auto-approve, terraform destroy`

---