## The Risk Measure DataSet generation engine codebase

This engine contains the following key components (these are *the* prompts used for ChatGPT!):

1.	First, a *singleton* object, *CurveManager*, that contains a list of curves identifiable by name and date. Each curve is a specialization of an abstract Curve class and the simplest of which has a vector of double values itemized by dates and when the object is called with a date, an interpolated value is returned. 

2.	Second, a singleton object, *ScenarioManager*, that contains a list of market scenarios identifiable by name and date. Each scenario is constructed by taking the complete list of curves from the CurveManager as the BASE, and perturb each curve up and down in a certain fashion, so that the scenario will contain three lists of the curves: the BASE, the UP perturbed, and the DOWN perturbed.

3.	Thirdly, a singleton object, *SecurityManager*, that contains a list of financial securities identifiable by an id. Each security is a specialization of an abstract class Security, constructed from a dictionary of <attribute, value>, and has a function, NPV, that takes a scenario object and returns a value.

4.	Last, a generic function that iterates through each security from the SecurityManager and each scenario from the ScenarioManager, and call the security's NPV function three times to obtain three values for each the perturbations contained in the scenario. The three values are then pushed into a data store with all the relevant identifiers.

5.	Piecing them together is a workflow manager, or *task dispatcher*, to use this code base: it takes a JSON document as a configuration input that instructs about where all the input files such as curve definition and security definition and the scenario definition are located, and indicates what use case the job is tasked with, then it will first pre-process the inputs and construct the three containers, CurveManager, ScenarioManager, and SecurityManager, and dispatch the workflow to the relevant use case. At the end, output the result in the data store to a CSV file.

6.	And one more item and most important to complete the picture: *info logging*! we now need a logger through the code so that each step is logged and if there is any exception, error messages are captured into the log. 

This file is the *architectural design master* document, also the POC.

In [30]:
# python
from ast import Pow
import logging
import json
import pandas as pd # type: ignore
from datetime import datetime
from abc import ABC, abstractmethod
from typing import Dict, Union, List, Optional, Tuple
from scipy.interpolate import interp1d
import numpy as np
import csv
import bisect
import sys

# Configure logging
''' disable file logging for now
logging.basicConfig(
    filename="task_dispatcher.log",
    filemode="a",
    format="%(asctime)s - %(levelname)s - %(message)s",
    level=logging.INFO,
)
'''
logging.basicConfig(stream=sys.stdout, level=logging.INFO, format='%(asctime)s %(levelname)s %(name)s %(process)d: %(message)s')


In [31]:
# Abstract Classes and Singleton Managers
class Curve:
    """Abstract base class for a curve."""
    # concrete method to initialize meta data for the curve
    def __init__(self, name: str, date:datetime):
        self.name = name
        self.date = date

    @abstractmethod
    def get_value(self, date: int) -> float:
        pass

class SimpleCurve(Curve):
    """A simple curve with linear interpolation."""
    def __init__(self, name:str, date:datetime, dates: int, values: float):
        super().__init__(name, date)
        self.dates = np.array(dates)
        self.values = np.array(values)
        self.interpolator = interp1d(self.dates.astype(int), self.values, kind='linear', fill_value="extrapolate")

    def geometric_interp(self, date: int) -> float:
        # find the right point
        idx = bisect.bisect_left(self.dates, date)

        # init the two bounding points for interpolation 
        t0, t1 = self.dates[0], self.dates[1] 
        v0, v1 = self.values[0], self.values[1] 
        #handle edge case: flat, or extrapolate
        if idx == 0:    #do not extrapolate leftward 
            return v0
        elif idx == len(self.dates):    #extrapolate using the last two points
            t0, t1 = self.dates[-2], self.dates[-1] 
            v0, v1 = self.values[-2], self.values[-1] 

        # otherwise
        t0, t1 = self.dates[idx - 1], self.dates[idx] 
        v0, v1 = self.values[idx - 1], self.values[idx] 

        e = float((date - t0) / (t1 - t0))
        # Perform geometric interpolation 
        return v0 * pow((v1 / v0), e)

    def get_value(self, date: int) -> float:
        return self.geometric_interp(date) # self.interpolator(date)

class CurveManager:
    """Singleton class managing curves."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(CurveManager, cls).__new__(cls)
            cls._instance.curves = {}
        return cls._instance

    def set_valuation_date(self, valuation_date: datetime):
        self.valuation_date = valuation_date

    # use read_curves_from_csv instead
    def load_curves(self, curve_file):
        self.read_curves_from_csv(curve_file)
        '''
        curves_data = pd.read_csv(curve_file)
        for _, row in curves_data.iterrows():
            dates = [datetime.strptime(d, "%Y-%m-%d") for d in row["dates"].split(";")]
            values = [float(v) for v in row["values"].split(";")]
            self.add_curve(SimpleCurve(name=row["curve_name"], date=self.valuation_date, dates=dates, values=values))
        '''
        
    def read_curves_from_csv(self, file_path):
        try:
            with open(file_path, 'r') as file:
                reader = csv.reader(file)
                curve_name = None
                curve_date = self.valuation_date
                curve_type = None
                dates = []
                values = []

                for row in reader:
                    if not row:  # Blank row indicates end of current curve
                        if curve_name and dates and values:
                            curve = SimpleCurve(curve_name, curve_date, dates, values)
                            self.add_curve(curve)
                            curve_name = None
                            dates = []
                            values = []
                    elif curve_name is None:  # Header row containing curve name
                        curve_name = row[0]
                        curve_date = datetime.strptime(row[1], '%Y%m%d')
                        curve_type = row[2]
                    else:  # Data rows containing <date, discount factor>
                        date = int(row[0])
                        value = float(row[1])
                        dates.append(date)
                        values.append(value)

                # Add the last curve if the file doesn't end with a blank row
                if curve_name and dates and values:
                    curve = SimpleCurve(curve_name, curve_date, dates, values)
                    self.add_curve(curve)
                    
            logging.info(f"Successfully read and added curves from CSV: {file_path}")

        except Exception as e:
            logging.error(f"Error reading curves from CSV file {file_path}: {e}")
            raise

    def add_curve(self, curve: Curve):
        key = (curve.name,curve.date)
        self.curves[key] = curve
        logging.info(f"Added curve: {curve.name}")

    def get_curve(self, name: str, date:datetime) -> Optional[Curve]:
        return self.curves.get((name,date))



In [32]:
class Scenario:
    """Class representing a market scenario with BASE, UP, and DOWN perturbed curves."""
    def __init__(self, name: str, date: datetime, base_curves: Dict[Tuple[str], Curve]):
        self.name = name
        self.date = date
        self.base_curves = base_curves
        self.up_curves = {}
        self.down_curves = {}
        self._generate_perturbations()

    def _generate_perturbations(self):
        """Generates UP and DOWN perturbed curves."""
        for key, curve in self.base_curves.items():
            up_values = [v * 1.1 for v in curve.values]
            down_values = [v * 0.9 for v in curve.values]
            self.up_curves[key] = SimpleCurve(curve.name, curve.date, curve.dates, up_values)
            self.down_curves[key] = SimpleCurve(curve.name, curve.date, curve.dates, down_values)


class ScenarioManager:
    """Singleton class managing scenarios."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(ScenarioManager, cls).__new__(cls)
            cls._instance.scenarios = {}
        return cls._instance

    def set_valuation_date(self, valuation_date):
        self.valuation_date = valuation_date

    def create_scenario(self, name: str, date: datetime):
        curve_manager = CurveManager()
        base_curves = curve_manager.curves
        scenario = Scenario(name, date, base_curves)
        self.scenarios[(name, date)] = scenario
        logging.info(f"Created scenario: {name} on {date}")

    def load_scenarios(self, scenario_file):
        # just the BASE for now
        # scenarios_data = pd.read_csv(scenario_file)
        logging.info(f"Start loading scenarios: first the BASE curves")
        self.create_scenario("BASE", self.valuation_date)


In [33]:
class Security(ABC):
    """Abstract class for a financial security."""
    def __init__(self, security_id: str, attributes: Dict[str, Union[str, float, int]]):
        self.security_id = security_id
        self.attributes = attributes
        self.setup_security()

    @abstractmethod
    def setup_security(self):
        pass

    @abstractmethod
    def NPV(self, scenario: Scenario) -> float:
        pass


class Bond(Security):
    """A simple bond implementation."""
    def __init__(self, security_id, attributes):
        super().__init__(security_id, attributes)

    def setup_security(self):
        self.cashflow_dates = [1, 5, 7]
        self.cashflor_values = [100,120,100100]

    def NPV(self, scenario: Scenario) -> float:
        curve_name = (self.attributes["DiscountCurve"],)
        base_curve = scenario.base_curves.get(curve_name, scenario.date)

        if not base_curve:
            raise ValueError(f"Curve {curve_name} not found in scenario.")

        npv = 0.0
        for i in range(0,2):
            discount_factor = base_curve.get_value(self.cashflow_dates[i])
            npv += self.cashflor_values[i] * discount_factor
        return npv

class Equity(Security):
    def __init__(self, security_id, attributes):
        super().__init__(security_id, attributes)

    def setup_security(self):
        pass

    def NPV(self, scenario: Scenario) -> float:
        # Implement equity-specific NPV calculation
        npv = 0
        for curve_name, curve in scenario.base_curves.items():
            # Custom logic for NPV calculation for equities
            npv += sum(curve.values)  # Simplified example
        return npv

class SecurityManager:
    """Singleton class managing securities."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super(SecurityManager, cls).__new__(cls)
            cls._instance.securities = {}
        return cls._instance

    def set_valuation_date(self, valuation_date):
        self.valuation_date = valuation_date

    def add_security(self, security: Security):
        self.securities[security.security_id] = security
        logging.info(f"Added security: {security.security_id}")

    def construct_and_add_security(self, attributes: dict) -> int:
        security_type = attributes.pop("SecType", None)
        security_id = attributes.get("SecId")

        if security_type == "Bond":
            security = Bond(security_id, attributes)
        elif security_type == "equity":
            security = Equity(security_id, attributes)
        else:
            logging.error(f"Unknown security type: {security_type}")
            return -1

        self.add_security(security)
        logging.info(f"Constructed and added security of type {security_type}: {security_id}")
        return 0

    def read_securities_from_tsv(self, file_path):
        cnt1 = int(0)
        cnt2 = int(0)
        try:
            with open(file_path, 'r') as file:
                reader = csv.DictReader(file, delimiter='\t')
                for row in reader:
                    attributes = dict(row)
                    cnt2 += self.construct_and_add_security(attributes)
                    cnt1 += 1

            logging.info(f"Successfully read {cnt1} rows and added {cnt1+cnt2} securities from TSV: {file_path}")
        except Exception as e:
            logging.error(f"Error reading securities from TSV file {file_path}: {e}")
            raise

    def load_securities(self, security_file):
        self.read_securities_from_tsv(security_file)
        '''
        securities_data = pd.read_csv(security_file)
        for _, row in securities_data.iterrows():
            attributes = {
                "curve_name": row["curve_name"],
                "cash_flows": [
                    (datetime.strptime(d, "%Y-%m-%d"), float(a))
                    for d, a in zip(row["cash_flow_dates"].split(";"), row["cash_flow_amounts"].split(";"))
                ]
            }
            self.add_security(Bond(security_id=row["security_id"], attributes=attributes))
        '''



In [35]:
# Generic Calculation Function
def calculate_npv_for_all(security_manager: SecurityManager, scenario_manager: ScenarioManager) -> pd.DataFrame:
    data_store = []
    for (scenario_name, scenario_date), scenario in scenario_manager.scenarios.items():
        for security_id, security in security_manager.securities.items():
            try:
                base_npv = security.NPV(scenario)
                up_npv = security.NPV(Scenario(name=scenario.name, date=scenario.date, base_curves=scenario.up_curves))
                down_npv = security.NPV(Scenario(name=scenario.name, date=scenario.date, base_curves=scenario.down_curves))
                data_store.append({
                    "Security ID": security_id,
                    "Scenario Name": scenario_name,
                    "Scenario Date": scenario_date,
                    "NPV_BASE": base_npv,
                    "NPV_UP": up_npv,
                    "NPV_DOWN": down_npv,
                })
                logging.info(f"Calculated NPVs for Security: {security_id}, Scenario: {scenario_name}")
            except Exception as e:
                logging.error(f"Error calculating NPV for Security: {security_id}, Scenario: {scenario_name}. Error: {e}")
    return pd.DataFrame(data_store)

# Task Dispatcher
from os import path
class TaskDispatcher:
    def __init__(self, config_file: str):
        with open(config_file, 'r') as f:
            self.config = json.load(f)
        self.wk_folder = path.dirname(config_file)
        self.valuation_date = datetime.strptime(self.config["valuation_date"], "%Y-%m-%d")
        self.curve_manager = CurveManager()
        self.curve_manager.set_valuation_date(self.valuation_date)
        self.scenario_manager = ScenarioManager()
        self.scenario_manager.set_valuation_date(self.valuation_date)
        self.security_manager = SecurityManager()
        self.security_manager.set_valuation_date(self.valuation_date)

    def load_curves(self):
        curve_file = self.config["curve_definition_file"]
        self.curve_manager.load_curves(path.join(self.wk_folder, curve_file))

    def load_scenarios(self):
        scenario_file = self.config["scenario_definition_file"]
        self.scenario_manager.load_scenarios(path.join(self.wk_folder, scenario_file))

    def load_securities(self):
        security_file = self.config["security_definition_file"]
        self.security_manager.load_securities(path.join(self.wk_folder, security_file))

    def execute_use_case(self):
        use_case = self.config["use_case"]
        if use_case == "NPV_CALCULATION":
            results = calculate_npv_for_all(self.security_manager, self.scenario_manager)
            results.to_csv(path.join(self.wk_folder, self.config["output_file"]), index=False)
            logging.info(f"Results saved to {self.config['output_file']}")

    def run(self):
        try:
            logging.info("Starting Task Dispatcher...")
            self.load_curves()
            logging.info("Curves loaded")
            self.load_scenarios()
            logging.info("Scenarios loaded")
            self.load_securities()
            logging.info("Securities loaded")
            self.execute_use_case()
            logging.info("Task Dispatcher completed successfully.")
        except Exception as e:
            logging.error(f"Task Dispatcher encountered an error: {e}")

# Example Usage

if __name__ == "__main__":
    # Path to the configuration file
    config_file = "C:/dev/Python/rmds/tests/config.json"
    
    # Initialize and run the TaskDispatcher
    dispatcher = TaskDispatcher(config_file)
    dispatcher.run()
    

2024-12-10 21:20:56,408 INFO root 3448: Starting Task Dispatcher...
2024-12-10 21:20:56,411 INFO root 3448: Added curve: OIS.USD
2024-12-10 21:20:56,413 INFO root 3448: Added curve: OIS_LIBOR.USD
2024-12-10 21:20:56,414 INFO root 3448: Successfully read and added curves from CSV: C:/dev/Python/rmds/tests\curves.csv
2024-12-10 21:20:56,414 INFO root 3448: Curves loaded
2024-12-10 21:20:56,415 INFO root 3448: Start loading scenarios: first the BASE curves


2024-12-10 21:20:56,419 INFO root 3448: Created scenario: BASE on 2024-06-30 00:00:00
2024-12-10 21:20:56,422 INFO root 3448: Scenarios loaded
2024-12-10 21:20:56,427 INFO root 3448: Added security: 3480191_0
2024-12-10 21:20:56,430 INFO root 3448: Constructed and added security of type Bond: 3480191_0
2024-12-10 21:20:56,431 ERROR root 3448: Unknown security type: FloatBond
2024-12-10 21:20:56,432 ERROR root 3448: Unknown security type: MFixedLeg
2024-12-10 21:20:56,433 ERROR root 3448: Unknown security type: MFloatLeg
2024-12-10 21:20:56,434 ERROR root 3448: Unknown security type: MFixedLeg
2024-12-10 21:20:56,434 INFO root 3448: Successfully read 5 rows and added 1 securities from TSV: C:/dev/Python/rmds/tests\securities.tsv
2024-12-10 21:20:56,435 INFO root 3448: Securities loaded
2024-12-10 21:20:56,436 ERROR root 3448: Error calculating NPV for Security: 3480191_0, Scenario: BASE. Error: 'datetime.datetime' object has no attribute 'get_value'
2024-12-10 21:20:56,439 INFO root 344