# Assignment 9 - (Docker)

## TOPIC: Docker


1. Scenario: You are building a microservices-based application using Docker. Design a Docker Compose file that sets up three containers: a web server container, a database container, and a cache container. Ensure that the containers can communicate with each other properly.


In [None]:
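# All services join the default network that Compose creates for the project,
# so each container can reach the others by service name.
version: '3'
services:
  # Web server built from the Dockerfile in the current directory
  webserver:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "80:80"  # host:container, quoted so YAML always reads it as a string
    depends_on:  # Controls start order only; it does not wait for readiness
      - database
      - cache

  # Reachable from the other services at hostname "database", port 5432
  database:
    image: postgres:latest
    environment:
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=mypassword
      - POSTGRES_DB=mydatabase

  # Reachable from the other services at hostname "cache", port 6379
  cache:
    image: redis:latest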


            
# Run the following command from the directory that holds the Compose file to start the containers:
# docker-compose up
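
As a rough illustration of how the web server could reach the other two containers, the sketch below builds connection URLs from the Compose service names `database` and `cache`. The function name `service_urls` and the environment variables `DATABASE_HOST` and `CACHE_HOST` are assumptions made for this example and do not appear in the Compose file; the ports are the defaults of the official Postgres and Redis images.

```python
import os


def service_urls(env=None):
    """Build connection URLs for the database and cache services.

    DATABASE_HOST and CACHE_HOST are hypothetical override variables; the
    defaults are the Compose service names, which resolve inside the network.
    """
    env = os.environ if env is None else env
    db_host = env.get("DATABASE_HOST", "database")
    cache_host = env.get("CACHE_HOST", "cache")
    return {
        # Credentials and database name match the POSTGRES_* values above
        "database": f"postgresql://myuser:mypassword@{db_host}:5432/mydatabase",
        "cache": f"redis://{cache_host}:6379/0",
    }
```

Inside the `webserver` container the defaults should resolve without any extra configuration, because Compose registers each service name as a hostname on the project network.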


2. Scenario: You want to scale your Docker containers dynamically based on the incoming traffic. Write a Python script that utilizes Docker SDK to monitor the CPU usage of a container and automatically scales the number of replicas based on a threshold.

In [None]:
import time

import docker

# Define the Docker API client
client = docker.from_env()

# Define the container information
container_name = "my_container"
threshold = 70  # CPU usage threshold for scaling (in percentage)

# Monitor and scale the container
while True:
    # Take one stats sample; it holds the current and the previous CPU reading
    container = client.containers.get(container_name)
    stats = container.stats(stream=False)
    cpu, precpu = stats["cpu_stats"], stats["precpu_stats"]

    # CPU usage is the container's share of host CPU time between the two readings
    cpu_delta = cpu["cpu_usage"]["total_usage"] - precpu["cpu_usage"]["total_usage"]
    system_delta = cpu.get("system_cpu_usage", 0) - precpu.get("system_cpu_usage", 0)
    online_cpus = cpu.get("online_cpus", 1)
    cpu_percent = cpu_delta / system_delta * online_cpus * 100 if system_delta > 0 else 0.0

    # Scale the container based on CPU usage
    if cpu_percent > threshold:
        # Scale up by starting one more container from the same image
        client.containers.run(image=container.image, detach=True)
        print("Scaled up container due to high CPU usage:", container_name)

    time.sleep(10)  # Adjust the sleep duration as needed

# Run the following command before running the script:
# pip install docker
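
The loop above only ever adds containers, so one possible refinement is to separate the scaling decision from the Docker calls and keep the replica count within bounds. The sketch below is one way this might look; the function name `desired_replicas`, the scale-down rule (below half the threshold) and the default bounds are all assumptions chosen for illustration, not part of the Docker SDK.

```python
def desired_replicas(current, cpu_percent, threshold=70.0,
                     min_replicas=1, max_replicas=5):
    """Return the replica count to aim for, given the latest CPU reading.

    Hypothetical policy: add one replica above the threshold, remove one
    below half the threshold, and otherwise leave the count unchanged.
    """
    if cpu_percent > threshold:
        target = current + 1
    elif cpu_percent < threshold / 2:
        target = current - 1
    else:
        target = current
    # Clamp the result so the script never scales to zero or without limit
    return max(min_replicas, min(max_replicas, target))
```

The monitoring loop could compare this value with the number of running containers and start or stop one accordingly, which keeps the policy easy to test without a Docker daemon.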

3. Scenario: You have a Docker image stored on a private registry. Develop a script in Bash that authenticates with the registry, pulls the latest version of the image, and runs a container based on that image.

In [None]:
#!/bin/bash
set -euo pipefail  # Stop at the first failed command or unset variable

# Registry authentication credentials
registry_url="your-registry-url"
username="your-username"
password="your-password"

# Image details
image_name="your-image-name"
container_name="your-container-name"
image_ref="$registry_url/$image_name:latest"

# Authenticate with the private registry; the password is piped through stdin
# so that it does not appear in the process list
printf '%s' "$password" | docker login -u "$username" --password-stdin "$registry_url"

# Pull the latest version of the image
docker pull "$image_ref"

# Run a container based on the latest image
docker run -d --name "$container_name" "$image_ref"


# Save the script in a file, e.g., pull_and_run.sh, and execute it with: bash pull_and_run.sh
# The final 'docker run' command starts the container in detached mode (-d).
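
For comparison, the same three steps can be sketched with the Docker SDK for Python used in the previous scenario. The helper name `pull_and_run` is made up for this example, and the client is passed in as a parameter so the function can be exercised without a running Docker daemon; treat it as a sketch under those assumptions.

```python
def pull_and_run(client, registry_url, username, password,
                 image_name, container_name):
    """Log in to a private registry, pull the latest image and start it.

    `client` is expected to be a Docker SDK client, e.g. docker.from_env().
    """
    # Step 1: authenticate with the private registry
    client.login(username=username, password=password, registry=registry_url)

    # Step 2: pull the latest version of the image
    image_ref = f"{registry_url}/{image_name}"
    client.images.pull(image_ref, tag="latest")

    # Step 3: run a detached container based on that image
    return client.containers.run(
        f"{image_ref}:latest", name=container_name, detach=True
    )


# Example call (requires `pip install docker` and a reachable registry):
#   import docker
#   pull_and_run(docker.from_env(), "your-registry-url", "your-username",
#                "your-password", "your-image-name", "your-container-name")
```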

## TOPIC: Airflow



1. Scenario: You have a data pipeline that requires executing a shell command as part of a task. Create an Airflow DAG that includes a BashOperator to execute a specific shell command.

In [None]:
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Define the DAG
dag = DAG(
    dag_id='execute_shell_command',
    start_date=datetime(2023, 7, 1),
    schedule_interval='0 0 * * *',  # Run once daily at midnight
    catchup=False,  # Do not backfill the runs between start_date and today
)

# Define the BashOperator task
execute_shell_command = BashOperator(
    task_id='execute_shell_command_task',
    bash_command='your_shell_command_here',
    dag=dag,
)

# This DAG has a single task, so there are no dependencies to set

# Save the script as a Python file, e.g., execute_shell_command_dag.py, and place it in your Airflow DAGs
# directory. Airflow will automatically detect the DAG and make it available for scheduling and execution.

2. Scenario: You want to create dynamic tasks in Airflow based on a list of inputs. Design an Airflow DAG that generates tasks dynamically using PythonOperator, where each task processes an element from the input list.


In [None]:
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

# Define the function to be executed by each task
def process_input(input_value):
    # Perform processing logic for each input value
    print(f"Processing input: {input_value}")

# Define the input list
input_list = ['Input A', 'Input B', 'Input C']  # Modify with your actual input values

# Define the DAG
dag = DAG(
    dag_id='dynamic_task_generation',
    start_date=datetime(2023, 7, 1),
    schedule_interval=None,  # Set to None to disable automatic scheduling
)

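# Generate dynamic tasks using PythonOperator
for input_value in input_list:
    # Task IDs may only contain letters, digits, dashes, dots and underscores,
    # so the space in each input value is replaced with an underscore
    task_id = f"task_{input_value.replace(' ', '_')}"
    task = PythonOperator(
        task_id=task_id,
        python_callable=process_input,
        op_kwargs={'input_value': input_value},
        dag=dag,
    )
    # The tasks are independent of each other, so no dependencies are set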
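
Airflow accepts only letters, digits, dashes, dots and underscores in a task ID, so input values containing spaces or punctuation need to be cleaned before they are used in one. A small helper along the following lines could do that; `make_task_id` is a hypothetical name chosen for this sketch and is not an Airflow function.

```python
import re


def make_task_id(input_value, prefix="task"):
    """Turn an arbitrary input value into a string usable as a task ID."""
    # Collapse every run of disallowed characters into a single underscore
    slug = re.sub(r"[^A-Za-z0-9_.-]+", "_", str(input_value)).strip("_")
    return f"{prefix}_{slug}"
```

With the input list from this scenario, `make_task_id('Input A')` gives `task_Input_A`. Two different inputs can map to the same ID (for example `a b` and `a/b`), so a real DAG may need an extra uniqueness check.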



3. Scenario: You need to set up a complex task dependency in Airflow, where Task B should start only if Task A has successfully completed. Implement this dependency using the "TriggerDagRunOperator" in Airflow.

In [None]:
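from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Define the parent DAG, which holds Task A and the trigger
dag = DAG(
    dag_id='complex_task_dependency',
    start_date=datetime(2023, 7, 1),
    schedule_interval=None,  # Set to None to disable automatic scheduling
)

# Define the child DAG, which holds Task B and runs only when triggered.
# The dag_id is an example name; triggering 'complex_task_dependency' itself
# would make the DAG start new runs of itself without end.
task_b_dag = DAG(
    dag_id='complex_task_dependency_task_b',
    start_date=datetime(2023, 7, 1),
    schedule_interval=None,
)

# Define Task A
task_a = DummyOperator(
    task_id='task_a',
    dag=dag,
)

# Define Task B in the child DAG
task_b = DummyOperator(
    task_id='task_b',
    dag=task_b_dag,
)

# Define the TriggerDagRunOperator that starts the child DAG (and with it Task B)
trigger_task_b = TriggerDagRunOperator(
    task_id='trigger_task_b',
    trigger_dag_id='complex_task_dependency_task_b',
    dag=dag,
)

# Set task dependencies: with the default trigger rule (all_success), the
# trigger runs only if Task A has completed successfully
task_a >> trigger_task_b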


## TOPIC: Sqoop


1. Scenario: You want to import data from an Oracle database into Hadoop using Sqoop, but you only need to import specific columns from a specific table. Write a Sqoop command that performs the import, including the necessary arguments for column selection and table mapping.

In [None]:
sqoop import \
--connect jdbc:oracle:thin:@<hostname>:<port>/<database> \
--username <username> \
--password <password> \
--table <table_name> \
--columns "<column1>,<column2>,<column3>" \
--target-dir <target_directory> \
--as-parquetfile \
--num-mappers 4


2. Scenario: You have a requirement to perform an incremental import of data from a MySQL database into Hadoop using Sqoop. Design a Sqoop command that imports only the new or updated records since the last import.

In [None]:
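sqoop import \
--connect jdbc:mysql://<hostname>:<port>/<database> \
--username <username> \
--password <password> \
--table <table_name> \
--target-dir <target_directory> \
--as-parquetfile \
--incremental lastmodified \
--check-column <last_modified_column> \
--last-value "<last_import_timestamp>" \
--merge-key <primary_key_column> \
--num-mappers 4

# 'lastmodified' mode imports rows whose timestamp column is newer than --last-value, which covers
# both new and updated records; 'append' mode would only pick up newly inserted rows.
# --merge-key tells Sqoop which column identifies a row, so updated rows replace their older copies.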
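
The scenario asks for new or updated records, and this is where Sqoop's two incremental modes differ: `append` compares an ever-increasing column such as an ID, while `lastmodified` compares a timestamp column and can merge changed rows by key. The plain-Python sketch below is a simplified model of that difference on made-up sample rows; the function names and data are illustrative only, and Sqoop's exact boundary handling is not reproduced.

```python
def incremental_rows(rows, check_column, last_value):
    """Select the rows whose check column is greater than the last value."""
    return [row for row in rows if row[check_column] > last_value]


def merge_on_key(existing, incoming, merge_key):
    """Overlay incoming rows on existing ones, matching on the merge key."""
    merged = {row[merge_key]: row for row in existing}
    merged.update({row[merge_key]: row for row in incoming})
    return [merged[key] for key in sorted(merged)]


# Made-up state of the source table: row 1 was updated, row 3 is new
source = [
    {"id": 1, "name": "alice-renamed", "updated_at": "2023-07-05"},
    {"id": 2, "name": "bob", "updated_at": "2023-06-20"},
    {"id": 3, "name": "carol", "updated_at": "2023-07-06"},
]
# Made-up result of the previous import
previous = [
    {"id": 1, "name": "alice", "updated_at": "2023-06-01"},
    {"id": 2, "name": "bob", "updated_at": "2023-06-20"},
]

# Append-style check on the ID column: sees only the new row
appended = incremental_rows(source, "id", 2)

# Lastmodified-style check on the timestamp column: sees new and updated rows
modified = incremental_rows(source, "updated_at", "2023-07-01")
merged = merge_on_key(previous, modified, "id")
```

In this model the append-style check misses the change to row 1, while the lastmodified-style check followed by a merge on `id` ends up with all three rows in their latest state.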


3. Scenario: You need to export data from Hadoop to a Microsoft SQL Server database using Sqoop. Develop a Sqoop command that exports the data, considering factors like database connection details, table mapping, and appropriate data types.

In [None]:
sqoop export \
--connect "jdbc:sqlserver://<hostname>:<port>;database=<database>" \
--username <username> \
--password <password> \
--table <table_name> \
--export-dir <export_directory> \
--input-fields-terminated-by '\t' \
--input-lines-terminated-by '\n' \
--num-mappers 4
