# Why Connect to SQL Databases Using Python
Being able to connect to and interact with SQL databases is a key part 
of data engineering. Understanding the content of this notebook is integral 
to completing the module "Connecting to SQL Databases Using Python", which 
focuses on giving learners practical skills to automate and manage data 
workflows.

Imagine handling large amounts of data stored in SQL databases. Manually 
querying and extracting this data can be slow and prone to mistakes. As a 
data engineer, you save time and reduce errors by automating these tasks. 
Python makes this easier with libraries like `pyodbc`, `pandas`, and 
`sqlalchemy`, which help you connect to databases, run queries, and handle 
data efficiently.

Think of a scenario where the sales team needs regular reports from a 
company’s SQL database. By using Python to automate data extraction and 
reporting, you create a process that’s quicker and more reliable. Python 
also lets you adjust the process as needed, making it easier to adapt to 
new requirements.

This module uses a Jupyter Notebook with five activities, starting with the 
basics of connecting to databases and progressing to more advanced tasks. The 
final stretch activity involves adding new features to an existing Python 
pipeline that connects to an SQL database on ACG. Using `pyodbc`, the pipeline 
extracts data, runs SQL queries from `.sql` files, and writes results back to 
the database and as `.txt` files.

By completing this module, learners build skills that reflect real-world 
tasks. These abilities are valuable for data engineering, data analytics, and 
data science roles, where Python is used for data handling and automation. The 
goal of this module is to help learners develop the skills they need to connect 
to SQL databases, automate workflows, and work more efficiently in data-driven 
environments.

# Instructions for Using this Notebook

This notebook is created to help you explore and understand the 
code used in the automated pipeline found in the script `run_etl_as_script.py`.

This notebook is a great starting point for you to experiment with while completing 
Activities 4, 5, and 6. If at any point you want to reset the notebook to its 
original working form, you can do so easily. To reset the notebook, simply run the script
`run_etl_as_package.py`. This will regenerate the 
notebook in its fully functional state.

The notebook is created automatically by a Python script called `manage_notebook.py`, 
which can be found inside the `etl_pipeline` folder.

The notebook has two main purposes:
1. To help you understand the code by running each cell. Each function is 
explained briefly before the code, allowing you to see how the parts work 
step by step.
2. To support you in Activities 4, 5, and 6 in the notebook 
`Practical_Activities_Notebook.ipynb`. This notebook is a useful 
starting point where you can try out and edit the code without 
worrying about breaking the pipeline. 

In Activity 6, you will be challenged to modify the script `manage_notebook.py` 
(the script that generates this notebook) so that any new functions you add 
to the pipeline `run_etl_as_script.py` are included when this notebook is 
generated automatically.

# Understanding the Tools: SQLAlchemy and pyodbc

SQLAlchemy and pyodbc are two tools that help Python connect 
to SQL databases, but they are not the only ones. There are other 
tools like psycopg2 for PostgreSQL, MySQL Connector for MySQL, and 
sqlite3 for SQLite. However, SQLAlchemy and pyodbc are widely used 
and versatile for different types of databases, making them useful 
to learn. In this module, we will focus on pyodbc because it is simple 
to use, easy to set up, and works directly with SQL queries. This makes 
it a practical choice for connecting to databases and building automated 
pipelines without adding extra complexity.

**SQLAlchemy** lets you work with databases largely through Python code 
rather than hand-written SQL. One of its main features is Object Relational 
Mapping (ORM), which means you can create and manage database tables by working 
with Python objects and code. This can make managing databases easier and help 
with more complex projects. However, learning SQLAlchemy can take time, and it 
might feel more complicated for simple tasks.

**pyodbc** is simpler and allows you to connect directly to a database and 
run SQL queries from Python. It doesn’t add extra layers or tools – you write 
the SQL yourself and send it to the database. This makes pyodbc easy to use 
and quick to learn. However, because you have to write the SQL manually, there’s 
a higher chance of small errors, and the code can get repetitive for bigger projects.

In this module, we will use **pyodbc** because it is straightforward, 
easy to set up, and works well for running SQL queries and building 
automated pipelines.


  
# Connecting to SQL Database using `pyodbc` Library
This notebook is a step-by-step guide to help you learn how to 
use Python functions and libraries to connect to an SQL database 
using the `pyodbc` library. Each section of the notebook includes 
an explanation of the code, which you can run in order to understand 
the workflow step by step.
## Preparing the SQL Database
Before using Python, we need to decide which SQL database to connect to. 
For this exercise, we will use the A Cloud Guru (ACG) sandbox to create an SQL database. 
Once the database is set up, we will obtain the necessary credentials to connect to it using Python. 
A detailed guide on how to create the SQL database in the ACG sandbox is available in a GitHub 
repository called [DE-sql-learning-environment-Azure](https://github.com/Corndel/DE-sql-learning-environment-Azure). 
Follow the instructions in the repository from step **1 Create the ACG Sandbox for SQL learning** to step **4 Create tables in Azure SQL database and insert data**. Once you have completed them, you will have created an SQL database called **sakila**. We will need the **connection string** for this database to connect to it using the Python library `pyodbc`. 
## Steps to Acquire the Connection Strings
- Open the SQL databases service in the Azure portal (you can find it by typing SQL 
into the search box at the top).

![step-1](images/step1.png)

- In the list of databases that appears, click on the single database 
**sakila (sakilayaq6nqkx2fcqo/sakila)**.

![step-2](images/step2.png)

- Click on **Settings** and then on **Connection Strings** in the drop-down menu.

![step-3](images/step3.png)

- Go to the `ODBC` tab to see the connection string.

![step-4](images/step4.png)

- You will need to copy the underlined parts of the **Connection String** and store 
them somewhere safe to be used when running the pipeline.

![step-5](images/step5.png)

- Note that the password *{your_password_here}* is a placeholder within the connection strings, 
which needs to be replaced with the actual password of the database.

- When running the pipeline you will be prompted to enter these three pieces of information:

    * Server name (the longest underlined part in the above screenshot, which will be different 
    every time you create sakila). 
    * Username, which for this database is **corndeladmin**.
    * Password, which is **Password01**.
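
The three values above slot into the ODBC connection string that the pipeline builds later in this notebook. The following is a minimal sketch of that assembly, assuming a hypothetical helper called `build_connection_string` (it is not part of the pipeline) and a made-up server name. The driver name and options mirror the ones used in the main block of the pipeline.

```python
def build_connection_string(server, username, password, database="sakila"):
    """Assemble an ODBC connection string (hypothetical helper for illustration)."""
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",
        f"Server={server},1433",
        f"Database={database}",
        f"Uid={username}",
        f"Pwd={password}",
        "Encrypt=yes",
        "TrustServerCertificate=no",
        "Connection Timeout=30",
    ]
    # Every key=value pair is terminated by a semicolon.
    return ";".join(parts) + ";"


# Placeholder server name for illustration only; use the one from your own portal.
example = build_connection_string(
    "tcp:example-server.database.windows.net", "corndeladmin", "Password01"
)
print(example)
```

Printing the string is only sensible with sandbox credentials like these. With real credentials, avoid writing the password to the screen or to a log file.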


## **Sakila** Database Connection & Creating an ETL Pipeline

This guide will walk you through the steps (using the `pyodbc` library) to connect to the SQL 
database we have just created. You will learn how to extract data from the database, manage 
tables within it, and write the processed data back into the database. Follow each step carefully 
to gain a clear understanding of how to build an ETL pipeline when you are using Python to work 
with SQL databases on virtual machines or on remote platforms online.


### The Python Libraries

#### 1. `pyodbc`
It is a Python library for connecting to SQL databases using ODBC drivers. 
It allows communication between Python and SQL databases, making it essential 
for running SQL queries and managing database connections.

---

#### 2. `pandas`
This is a Python library for data manipulation and analysis. It provides powerful tools 
to handle tabular data, such as reading, writing, and processing datasets extracted from 
the database.

---

#### 3. `os`
This is a standard Python library for interacting with the operating system. 
It enables tasks like file and directory manipulation, crucial for managing input/output 
files during the ETL process.

---

#### 4. `shutil`
It is a standard Python library for high-level file operations. It helps in efficiently 
clearing or organising folders by removing directories and their contents.

---

#### 5. `logging`
This library is used for tracking events during code execution. It provides detailed 
logs for debugging and monitoring the ETL pipeline, ensuring transparency and easier 
troubleshooting.

---

#### 6. `getpass`
The `getpass` library in Python securely prompts the user for sensitive information, 
such as passwords, without displaying the input on the screen. It is ideal for creating 
secure, interactive command-line applications.

Let's run the following cell to import all of these libraries. After the imports, the cell 
uses the `logging` library to set up logging for the ETL pipeline so that events and errors 
are tracked. It writes logs to a file named `etl_pipeline.log` and also displays them on the 
console. Each log message includes the time, the log level, and the message text, giving clear 
and detailed tracking of the pipeline’s progress.
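
To see what this format string produces without writing to `etl_pipeline.log`, you can attach a handler to an in-memory buffer. This is only a sketch: the logger name `demo_etl` and the messages are made up for illustration, and the pipeline itself configures the root logger with `logging.basicConfig` instead.

```python
import io
import logging

# An in-memory buffer stands in for the log file so nothing is written to disk.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(asctime)s - %(levelname)s - %(message)s"))

# "demo_etl" is a hypothetical logger name used only for this illustration.
demo_logger = logging.getLogger("demo_etl")
demo_logger.setLevel(logging.INFO)
demo_logger.addHandler(handler)
demo_logger.propagate = False  # keep the demo messages away from the root logger

demo_logger.debug("Hidden: DEBUG is below the INFO threshold.")
demo_logger.info("Pipeline started.")
demo_logger.error("Something went wrong.")

log_lines = buffer.getvalue().splitlines()
print(log_lines)
```

Only the INFO and ERROR messages appear, each prefixed with a timestamp and its level, because the logger level is set to `INFO`.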


In [2]:
import pyodbc
import pandas as pd
import os
import shutil
import logging
import getpass
from etl_pipeline import manage_notebook
logging.basicConfig(level=logging.INFO, 
                    format='%(asctime)s - %(levelname)s - %(message)s',
                    handlers=[
                        logging.FileHandler("etl_pipeline.log"),
                        logging.StreamHandler()
                    ])

#### Function: `clear_folder`

This function removes all files and subdirectories within a specified folder. 
It iterates through each item in the folder, deleting files, symbolic links, and entire 
directories as needed. The process is logged for transparency, with messages indicating 
when the operation starts, completes, or encounters an error. This function is particularly 
useful for ensuring a clean workspace before running an ETL pipeline or similar processes. 
We will use this function to ensure that our target folder is cleared before the ETL output 
is written into it. Our target folder for this function is called `reports`, which will be 
used when we run the pipeline.
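
Because deleting files is irreversible, it can help to try the same logic on a throwaway folder first. The sketch below repeats the per-item checks described above on a temporary directory created with `tempfile`. The file and subfolder names are made up for illustration.

```python
import os
import shutil
import tempfile

# Build a throwaway folder containing one file and one subdirectory.
scratch = tempfile.mkdtemp()
with open(os.path.join(scratch, "old_report.txt"), "w") as handle:
    handle.write("stale output")
os.makedirs(os.path.join(scratch, "old_subfolder"))

# Same per-item logic as described above: unlink files and links, rmtree directories.
for item in os.listdir(scratch):
    item_path = os.path.join(scratch, item)
    if os.path.isfile(item_path) or os.path.islink(item_path):
        os.unlink(item_path)
    elif os.path.isdir(item_path):
        shutil.rmtree(item_path)

folder_still_exists = os.path.isdir(scratch)
remaining_items = os.listdir(scratch)
print(folder_still_exists, remaining_items)

os.rmdir(scratch)  # tidy up the now-empty scratch folder
```

The folder itself survives and only its contents are removed, which is the behaviour the pipeline relies on for `reports`.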

In [3]:
def clear_folder(folder_path):
    """
    Clears the contents of a specified folder by deleting all files, subdirectories, 
    and symbolic links within it.

    This function recursively deletes all items inside the given folder path, including:
    - Regular files
    - Symbolic links
    - Subdirectories and their contents

    Args:
        folder_path (str): The path to the folder whose contents need to be cleared.

    Logging:
        - Logs the start of the clearing process at the INFO level.
        - Logs a success message when the folder is cleared at the INFO level.
        - Logs any errors encountered during the process at the ERROR level.

    Error handling:
        Any exception raised while deleting files, subdirectories, or symbolic 
        links is caught and logged at the ERROR level; it is not re-raised.

    Examples:
        >>> clear_folder("/path/to/directory")
        INFO: Starting to clear the contents of folder: /path/to/directory
        INFO: Contents of folder /path/to/directory have been cleared.

    Note:
        - The function does not delete the folder itself, only its contents.
        - Ensure that the folder exists before calling this function. If the folder does not 
          exist, the error is logged and nothing is deleted.
        - Use with caution as this action is irreversible.

    """
    logging.info(f"Starting to clear the contents of folder: {folder_path}")
    try:
        for item in os.listdir(folder_path):
            item_path = os.path.join(folder_path, item)
            if os.path.isfile(item_path) or os.path.islink(item_path):
                os.unlink(item_path)
            elif os.path.isdir(item_path):
                shutil.rmtree(item_path)
        logging.info(f"Contents of folder {folder_path} have been cleared.")
    except Exception as e:
        logging.error(f"Error while clearing folder {folder_path}: {e}")

#### Function: `manage_tables`
The `manage_tables` function is responsible for resetting the structure of specific database tables. 
It connects to the database using the provided connection string and executes SQL scripts to drop 
the existing tables (`payment_summary_table`, `duration_summary_table`, and `profitable_actors_table`) 
and recreate them. The function reads the SQL commands from files located in the 
`sql_files/table_management` folder, ensuring the database is prepared for new data. It handles 
errors, such as database connection issues or missing SQL files, and logs the process for 
transparency and troubleshooting.
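
The drop-then-create pattern can be tried without an Azure database. The sketch below uses the standard-library `sqlite3` module as a stand-in for `pyodbc` (both follow the same connect, cursor, execute, commit pattern). The script names, the table `demo_summary`, and its columns are made up for illustration, and the SQL is written for SQLite, so the real scripts in `sql_files/table_management` may look different.

```python
import os
import shutil
import sqlite3
import tempfile

# Hypothetical SQL scripts written to a temporary folder for this demo.
scripts = {
    "drop_demo_table.sql": "DROP TABLE IF EXISTS demo_summary;",
    "create_demo_table.sql": "CREATE TABLE demo_summary (Records INTEGER, Total REAL);",
}
script_dir = tempfile.mkdtemp()
for name, sql in scripts.items():
    with open(os.path.join(script_dir, name), "w") as handle:
        handle.write(sql)

connection = sqlite3.connect(":memory:")  # stand-in for pyodbc.connect(...)
cursor = connection.cursor()
try:
    # Run the drop script first, then the create script, committing after each.
    for name in ("drop_demo_table.sql", "create_demo_table.sql"):
        with open(os.path.join(script_dir, name), "r") as handle:
            cursor.execute(handle.read())
        connection.commit()
    # List the tables that now exist in the demo database.
    cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    tables = [row[0] for row in cursor.fetchall()]
finally:
    connection.close()
    shutil.rmtree(script_dir)  # remove the temporary scripts

print(tables)
```

Keeping the SQL in separate files, as the pipeline does, means the table definitions can change without editing the Python code.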

In [4]:
def manage_tables(connection_string):
    """
    Drops and recreates database tables by executing SQL scripts.

    This function reads SQL scripts from predefined file paths to drop existing tables 
    and create new ones. It ensures that the database schema is updated by executing 
    the scripts sequentially.

    Args:
        connection_string (str): A valid database connection string used to connect to 
                                 the database.

    Logging:
        - Logs the start of the table management process.
        - Logs success after recreating tables.
        - Logs errors encountered during execution.

    SQL File Structure:
        - Drop Table SQL Files:
            - 'sql_files/table_management/drop_payment_summary_table.sql'
            - 'sql_files/table_management/drop_duration_summary_table.sql'
            - 'sql_files/table_management/drop_profitable_actors_table.sql'
        - Create Table SQL Files:
            - 'sql_files/table_management/create_payment_summary_table.sql'
            - 'sql_files/table_management/create_duration_summary_table.sql'
            - 'sql_files/table_management/create_profitable_actors_table.sql'

    Error handling:
        Database errors (pyodbc.Error) and missing SQL files (FileNotFoundError) 
        are caught and logged at the ERROR level; they are not re-raised.

    Examples:
        >>> manage_tables("Driver={ODBC Driver 18 for SQL Server};"
                          "Server=server_name;"
                          "Database=database_name;"
                          "Uid=username;Pwd=password;")
    """

    logging.info("Starting to manage tables in the database.")
    try:
        connection = pyodbc.connect(connection_string)
        cursor = connection.cursor()
        drop_payment_table_file = os.path.join('sql_files/table_management', 'drop_payment_summary_table.sql')
        drop_duration_table_file = os.path.join('sql_files/table_management', 'drop_duration_summary_table.sql')
        drop_profitable_table_file = os.path.join('sql_files/table_management','drop_profitable_actors_table.sql')
        create_payment_table_file = os.path.join('sql_files/table_management', 'create_payment_summary_table.sql')
        create_duration_table_file = os.path.join('sql_files/table_management', 'create_duration_summary_table.sql')
        create_profitable_actors_table_file = os.path.join('sql_files/table_management','create_profitable_actors_table.sql')
        def execute_sql_file(file_path):
            with open(file_path, 'r') as file:
                sql = file.read()
                cursor.execute(sql)
        execute_sql_file(drop_payment_table_file)
        execute_sql_file(drop_duration_table_file)
        execute_sql_file(drop_profitable_table_file)
        connection.commit()
        execute_sql_file(create_payment_table_file)
        execute_sql_file(create_duration_table_file)
        execute_sql_file(create_profitable_actors_table_file)
        connection.commit()
        logging.info("Tables:\n\n                                         payment_summary_table\n"         
                     "                                         duration_summary_table\n"
                     "                                         profitable_actors_table \n\n"
                     "                                 have been recreated in the database.")
    except pyodbc.Error as e:
        logging.error(f"Error managing tables: {e}")
    except FileNotFoundError as e:
        logging.error(f"SQL file not found: {e}")
    finally:
        if 'connection' in locals() and connection:
            connection.close()

#### Function: `calculate_payments`
The `calculate_payments` function reads an SQL query from a specified file, executes it on 
a database, and retrieves the results as a pandas DataFrame. The function connects to the 
database using a given connection string and processes the query to calculate a payments summary, 
including columns such as `Records`, `Minimum`, `Maximum`, `Total`, and `Average`. It logs progress 
and errors for transparency and closes the database connection after execution. The result is 
returned as a structured DataFrame for further analysis.
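
The contents of `payments.sql` are not shown in this notebook. As a rough illustration of the kind of query that yields these five columns, the sketch below assumes the summary aggregates an `amount` column in a `payment` table and runs it on an in-memory SQLite database with three made-up rows. `sqlite3` is only a stand-in for the Azure SQL connection.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # stand-in for the Azure SQL connection
cursor = connection.cursor()
try:
    cursor.execute("CREATE TABLE payment (amount REAL)")
    cursor.executemany(
        "INSERT INTO payment (amount) VALUES (?)", [(2.0,), (4.0,), (6.0,)]
    )
    # Assumed shape of the summary query: one row holding five aggregates.
    cursor.execute(
        "SELECT COUNT(*), MIN(amount), MAX(amount), SUM(amount), AVG(amount) "
        "FROM payment"
    )
    rows = cursor.fetchall()
finally:
    connection.close()

# Pair each aggregate with the column name used by the pipeline.
column_names = ["Records", "Minimum", "Maximum", "Total", "Average"]
summary = dict(zip(column_names, rows[0]))
print(summary)
```

The query returns a single row, which is why the resulting DataFrame in the pipeline has one row and five named columns.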

In [5]:
def calculate_payments(sql_file_path, connection_string):
    """
    Calculates a summary of payments using a SQL query and returns a pandas DataFrame.

    This function executes a SQL query from the specified file, processes the query 
    results, and generates a summary DataFrame with the following columns:
    - Records
    - Minimum
    - Maximum
    - Total
    - Average

    Args:
        sql_file_path (str): Path to the SQL file containing the query.
        connection_string (str): Database connection string.

    Returns:
        pandas.DataFrame: A DataFrame containing the payments summary.

    Raises:
        pyodbc.Error: If there is an error while executing the query or connecting to the database.

    Examples:
        >>> payments_df = calculate_payments("sql_files/queries/payments.sql", connection_string)
    """
    logging.info("Starting to calculate payments summary.")
    try:
        connection = pyodbc.connect(connection_string)
        cursor = connection.cursor()
        with open(sql_file_path, 'r') as file:
            sql_query = file.read()
        cursor.execute(sql_query)
        rows = cursor.fetchall()
        # Name the columns while building the DataFrame so that an empty 
        # result set still produces a DataFrame with the expected headers.
        payments_summary = pd.DataFrame(
            [tuple(row) for row in rows],
            columns=['Records', 'Minimum', 'Maximum', 'Total', 'Average'])
        logging.info("Payments summary successfully retrieved.")
    except pyodbc.Error as e:
        logging.error(f"Error executing payments query: {e}")
        # Re-raise so the return below is never reached with an unassigned result.
        raise
    finally:
        if 'connection' in locals() and connection:
            connection.close()
    return payments_summary

#### Function: `calculate_duration`

This function retrieves a summary of film durations from an SQL database by executing a query 
provided in an external SQL file. It connects to the database using the given connection string, 
reads the query from the specified file, and runs it. The results are then stored in a pandas 
DataFrame with columns: `Minimum`, `Maximum`, `Total`, and `Average`. Finally, it logs the process 
and ensures the database connection is closed.
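
The functions in this notebook close the connection in a `finally` block. An alternative way to get the same guarantee is `contextlib.closing`, sketched below with `sqlite3` as a stand-in for `pyodbc`. The `film` table, its `length` column, and the two rows are assumptions made for illustration.

```python
import sqlite3
from contextlib import closing

# closing() calls connection.close() when the block exits, even on an exception.
with closing(sqlite3.connect(":memory:")) as connection:
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE film (length INTEGER)")
    cursor.executemany("INSERT INTO film (length) VALUES (?)", [(90,), (120,)])
    cursor.execute("SELECT MIN(length), MAX(length), SUM(length), AVG(length) FROM film")
    duration_row = tuple(cursor.fetchone())

# Any further use of the closed connection raises sqlite3.ProgrammingError.
try:
    connection.execute("SELECT 1")
    connection_closed = False
except sqlite3.ProgrammingError:
    connection_closed = True

print(duration_row, connection_closed)
```

Either approach works. The `finally` form used in the pipeline makes the clean-up step explicit, which can be easier to follow when you are learning.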

In [6]:
def calculate_duration(sql_file_path, connection_string):
    """
    Calculates a summary of film durations using a SQL query and returns a pandas DataFrame.

    This function executes a SQL query from the specified file, processes the query 
    results, and generates a summary DataFrame with the following columns:
    - Minimum
    - Maximum
    - Total
    - Average

    Args:
        sql_file_path (str): Path to the SQL file containing the query.
        connection_string (str): Database connection string.

    Returns:
        pandas.DataFrame: A DataFrame containing the duration summary.

    Raises:
        pyodbc.Error: If there is an error while executing the query or connecting to the database.

    Examples:
        >>> duration_df = calculate_duration("sql_files/queries/film_duration.sql", connection_string)
    """
    logging.info("Starting to calculate duration summary.")
    try:
        connection = pyodbc.connect(connection_string)
        cursor = connection.cursor()
        with open(sql_file_path, 'r') as file:
            sql_query = file.read()
        cursor.execute(sql_query)
        rows = cursor.fetchall()
        # Name the columns while building the DataFrame so that an empty 
        # result set still produces a DataFrame with the expected headers.
        duration_summary = pd.DataFrame(
            [tuple(row) for row in rows],
            columns=['Minimum', 'Maximum', 'Total', 'Average'])
        logging.info("Duration summary successfully retrieved.")
    except pyodbc.Error as e:
        logging.error(f"Error executing duration query: {e}")
        # Re-raise so the return below is never reached with an unassigned result.
        raise
    finally:
        if 'connection' in locals() and connection:
            connection.close()
    return duration_summary

#### Function: `calculate_profitable_actors`

The `calculate_profitable_actors` function retrieves a list of the most profitable actors from 
a database based on an SQL query. It connects to the database using the provided connection string, 
reads the SQL query from a file, and executes it to fetch the results. These results are converted 
into a pandas DataFrame with clear column names: `ActorID`, `FirstName`, `LastName`, and 
`TotalSale`. The function includes error handling to log any issues and ensures the database 
connection is closed afterwards to manage resources efficiently. This function is a great way 
for learners to experiment with combining Python, SQL, and pandas to process and analyse data 
effectively.

In [7]:
def calculate_profitable_actors(sql_file_path, connection_string):
    """
    Retrieves a summary of the most profitable actors using a SQL query and returns a pandas DataFrame.

    This function executes a SQL query from the specified file, processes the query 
    results, and generates a summary DataFrame with the following columns:
    - ActorID
    - FirstName
    - LastName
    - TotalSale

    Args:
        sql_file_path (str): Path to the SQL file containing the query.
        connection_string (str): Database connection string.

    Returns:
        pandas.DataFrame: A DataFrame containing the summary of profitable actors.

    Raises:
        pyodbc.Error: If there is an error while executing the query or connecting to the database.

    Examples:
        >>> profitable_actors_df = calculate_profitable_actors("sql_files/queries/profitable_actors.sql", connection_string)
    """
    logging.info("Starting to calculate profitable actors.")
    try:
        connection = pyodbc.connect(connection_string)
        cursor = connection.cursor()
        with open(sql_file_path, 'r') as file:
            sql_query = file.read()
        cursor.execute(sql_query)
        rows = cursor.fetchall()
        # Name the columns while building the DataFrame so that an empty 
        # result set still produces a DataFrame with the expected headers.
        profitable_actors = pd.DataFrame(
            [tuple(row) for row in rows],
            columns=['ActorID', 'FirstName', 'LastName', 'TotalSale'])
        logging.info("Profitable actors successfully retrieved.")
    except pyodbc.Error as e:
        logging.error(f"Error executing profitable actors query: {e}")
        # Re-raise so the return below is never reached with an unassigned result.
        raise
    finally:
        if 'connection' in locals() and connection:
            connection.close()
    return profitable_actors

#### Function: `write_dataframe_to_db`

The `write_dataframe_to_db` function inserts the rows of a pandas DataFrame into a specified 
SQL database table. It connects to the database using the provided connection string, then 
iterates through each row of the DataFrame and executes an SQL `INSERT` statement to write the data. 
The function builds the statement from the DataFrame’s column names and uses `?` placeholders for 
the values, so the values are passed to the database as parameters rather than pasted into the SQL 
text. Database errors are logged, and the connection is closed after the operation whether or not 
it succeeds.
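
The placeholder approach can be sketched with `sqlite3`, which uses the same `?` placeholder style as `pyodbc`. The table `my_table`, its columns, and the rows are made up for illustration, and `sqlite3` is only a stand-in for the real database.

```python
import sqlite3

columns = ["column1", "column2"]
rows = [(1, "A"), (2, "B"), (3, "O'Brien")]  # the apostrophe needs no manual escaping

# Table and column names are interpolated into the SQL text, so they must come
# from trusted code; only the values travel separately as parameters.
placeholders = ", ".join(["?"] * len(columns))
sql = f"INSERT INTO my_table ({', '.join(columns)}) VALUES ({placeholders})"

connection = sqlite3.connect(":memory:")  # stand-in for pyodbc.connect(...)
try:
    cursor = connection.cursor()
    cursor.execute("CREATE TABLE my_table (column1 INTEGER, column2 TEXT)")
    cursor.executemany(sql, rows)  # one call inserts every row
    connection.commit()
    cursor.execute("SELECT column1, column2 FROM my_table ORDER BY column1")
    stored = [tuple(row) for row in cursor.fetchall()]
finally:
    connection.close()

print(sql)
print(stored)
```

Passing values as parameters lets the driver handle quoting, which is why the name containing an apostrophe is stored without any special treatment.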

In [8]:
def write_dataframe_to_db(dataframe, table_name, connection_string):
    """
    Inserts rows from a pandas DataFrame into a specified database table.

    This function takes a pandas DataFrame, converts its rows into SQL INSERT statements, 
    and writes them to the specified table in the database.

    Args:
        dataframe (pandas.DataFrame): The DataFrame containing the data to insert.
        table_name (str): The name of the database table to insert data into.
        connection_string (str): A valid database connection string used to connect to 
                                 the database.

    Logging:
        - Logs the start of the data insertion process.
        - Logs success after the data is written.
        - Logs errors encountered during execution.

    Error handling:
        Database errors (pyodbc.Error) raised while inserting data are caught and 
        logged at the ERROR level; they are not re-raised.

    Examples:
        >>> import pandas as pd
        >>> data = {'column1': [1, 2], 'column2': ['A', 'B']}
        >>> df = pd.DataFrame(data)
        >>> write_dataframe_to_db(df, "my_table", 
                                  "Driver={ODBC Driver 18 for SQL Server};"
                                  "Server=server_name;"
                                  "Database=database_name;"
                                  "Uid=username;Pwd=password;")
    """
    logging.info(f"Starting to write DataFrame to table: {table_name}")
    try:
        connection = pyodbc.connect(connection_string)
        cursor = connection.cursor()
        # Build the INSERT statement once; only the values change from row to row.
        columns = ', '.join(dataframe.columns)
        placeholders = ', '.join(['?'] * len(dataframe.columns))
        sql = f"INSERT INTO {table_name} ({columns}) VALUES ({placeholders})"
        # Insert rows into the database
        for _, row in dataframe.iterrows():
            cursor.execute(sql, tuple(row))
        connection.commit()
        logging.info(f"Data successfully written to table: {table_name}.")
    except pyodbc.Error as e:
        logging.error(f"Error writing to database table {table_name}: {e}")
    finally:
        if 'connection' in locals() and connection:
            connection.close()

#### Function: `write_local_txt_output`

This function saves the contents of a pandas DataFrame as a tab-separated text file in a specified 
folder. It first ensures the folder exists (creating it if necessary), then writes the DataFrame to 
a file with the given name. The file does not include row indices, making it cleaner for sharing or 
further processing. If successful, the function logs the file’s location and returns its path. 
In case of an error, it logs the issue and returns `None`, ensuring clear feedback for 
troubleshooting.
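
To see the file layout this produces, the sketch below writes the same kind of tab-separated file using the standard-library `csv` module as a stand-in for `DataFrame.to_csv(file_path, sep='\t', index=False)`. The folder name, file name, and data are taken from the docstring example or made up for illustration.

```python
import csv
import os
import shutil
import tempfile

header = ["Column1", "Column2"]
records = [(1, "A"), (2, "B")]

base_dir = tempfile.mkdtemp()
folder_path = os.path.join(base_dir, "output_folder")  # does not exist yet
os.makedirs(folder_path, exist_ok=True)  # no error if the folder already exists
file_path = os.path.join(folder_path, "output_file.txt")

# newline="" lets the csv module control the line endings itself.
with open(file_path, "w", newline="") as handle:
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)    # header row, like the DataFrame column names
    writer.writerows(records)  # data rows only, with no index column

with open(file_path, "r") as handle:
    lines = handle.read().splitlines()
print(lines)

shutil.rmtree(base_dir)  # remove the temporary output
```

Each line holds one record with its fields separated by a tab character, and the first line carries the column names.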

In [9]:
def write_local_txt_output(dataframe, folder_path, file_name):
    """
    Writes a pandas DataFrame to a local text file in tab-delimited format.

    This function ensures the target folder exists (creates it if necessary), then writes 
    the given DataFrame to a text file with tab-separated values. The resulting file is saved 
    in the specified folder with the provided file name.

    Args:
        dataframe (pandas.DataFrame): The DataFrame to write to the text file.
        folder_path (str): The path to the folder where the file will be saved. If the folder 
                           does not exist, it will be created.
        file_name (str): The name of the text file to create.

    Returns:
        str: The full path to the created file if the operation is successful.
        None: If an error occurs during the operation.

    Logging:
        - Logs the start of the file writing process at the INFO level.
        - Logs the successful completion of the operation at the INFO level.
        - Logs any errors encountered during the operation at the ERROR level.

    Raises:
        Exception: Logs any error that occurs during folder creation, file writing, or 
                   DataFrame processing.

    Examples:
        >>> import pandas as pd
        >>> data = {'Column1': [1, 2], 'Column2': ['A', 'B']}
        >>> df = pd.DataFrame(data)
        >>> write_local_txt_output(df, "output_folder", "output_file.txt")
        INFO: Starting to write DataFrame to text file: output_file.txt
        INFO: Processed data successfully written to output_folder/output_file.txt

    Notes:
        - The file is written in tab-delimited format (`sep='\t'`).
        - The index is not included in the output file (`index=False`).
        - Ensure the DataFrame contains valid data before calling this function.

    """
    logging.info(f"Starting to write DataFrame to text file: {file_name}")
    try:
        os.makedirs(folder_path, exist_ok=True)
        file_path = os.path.join(folder_path, file_name)
        dataframe.to_csv(file_path, sep='\t', index=False)
        logging.info(f"Processed data successfully written to {file_path}")
        return file_path
    except Exception as e:
        logging.error(f"An error occurred while writing to text file {file_name}: {e}")
        return None

#### Running the pipeline

This section of the code serves as the entry point for the script. It begins by prompting the user 
for the SQL Server address, username, and password; the password is read with `getpass`, so it is 
not shown on screen as you type. A connection string is then constructed to establish communication 
with the `sakila` database on the specified SQL Server. The script prepares a target folder 
(`reports`) by clearing any existing content, then drops and recreates the summary tables in the 
database.

It performs the following tasks:
1. Executes the SQL query for payments and calculates summary data.
2. Executes the SQL query for film durations and calculates summary data.
3. Executes the SQL query for profitable actors and calculates a table of the most profitable actors.
4. Saves the results into three summary tables in the database: `payment_summary_table`, `duration_summary_table`, and `profitable_actors_table`.
5. Also exports the same results to local `.txt` files, stored in the `reports` folder.

The pipeline ensures that all relevant data is processed, stored, and made accessible for further use.


In [10]:
# Main block starts here

if __name__ == "__main__":
    server = input("Please enter the SQL Server address (hint: starts with tcp and ends with .net): ").strip()
    username = input("Please enter your Username: ").strip()
    password = getpass.getpass("Please enter your Password: ").strip()
    # Adjacent f-strings are joined automatically, so no str() call is needed.
    connection_string = (
        f"Driver={{ODBC Driver 18 for SQL Server}};"
        f"Server={server},1433;"
        f"Database=sakila;"
        f"Uid={username};"
        f"Pwd={password};"
        f"Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
    )
    target_folder = "reports"
    clear_folder(target_folder)
    manage_tables(connection_string)
    payments_df = calculate_payments("sql_files/queries/payments.sql", connection_string)
    duration_df = calculate_duration("sql_files/queries/film_duration.sql", connection_string)
    profitable_actors_df = calculate_profitable_actors("sql_files/queries/profitable_actors.sql", connection_string)
    write_dataframe_to_db(payments_df, "payment_summary_table", connection_string)
    write_dataframe_to_db(duration_df, "duration_summary_table", connection_string)
    write_dataframe_to_db(profitable_actors_df,"profitable_actors_table", connection_string)
    write_local_txt_output(payments_df, "reports", "payment_summary.txt")
    write_local_txt_output(duration_df, "reports", "duration_summary.txt")
    write_local_txt_output(profitable_actors_df, "reports", "profitable_actors.txt")

2025-01-10 09:10:57,307 - INFO - Starting to clear the contents of folder: reports
2025-01-10 09:10:57,309 - INFO - Contents of folder reports have been cleared.
2025-01-10 09:10:57,310 - INFO - Starting to manage tables in the database.
2025-01-10 09:11:02,517 - INFO - Tables:

                                         payment_summary_table
                                         duration_summary_table
                                         profitable_actors_table 

                                 have been recreated in the database.
2025-01-10 09:11:02,518 - INFO - Starting to calculate payments summary.
2025-01-10 09:11:05,181 - INFO - Payments summary successfully retrieved.
2025-01-10 09:11:05,400 - INFO - Starting to calculate duration summary.
2025-01-10 09:11:08,053 - INFO - Duration summary successfully retrieved.
2025-01-10 09:11:08,274 - INFO - Starting to calculate profitable actors.
2025-01-10 09:11:11,364 - INFO - Profitable actors successfully retrieved.
2025-01-10 09