# Applying the Hidden Markov Model to Budget Allocation

The hidden Markov model (HMM) recommends a budget using conversion state transitions, budget emissions, and return-on-investment indicators.

### Objective
Minimize the budget _B<sub>t</sub>_ required to transition from state _S<sub>t</sub>_ to _S<sub>t+1</sub>_.

### Subject to
* Sum of _B<sub>t</sub>_ <= _B_; for all _t_
* RMS of _R_ <= $\epsilon$; where _R_ is the residual between the expected and actual return on the investment _B_
* Sum of _S<sub>t+1</sub> - S<sub>t</sub>_ <= _T_; where _T_ is the total allowable time period to complete all transitions from _S<sub>start</sub>_ to _S<sub>final</sub>_
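
The first two constraints can be checked numerically for a candidate plan. The sketch below is illustrative only: the function name `check_budget_plan`, the reading of _B_ as the total available budget, and all numbers are assumptions. The time constraint is left out because the notebook does not define how transition durations are measured.

```python
import numpy as np

def check_budget_plan(budgets, expected_roi, actual_roi, total_budget, epsilon):
    """Check the budget-sum and RMS-residual constraints for a candidate plan.

    All argument names are hypothetical; they are not taken from the notebook.
    """
    budgets = np.asarray(budgets, dtype=float)
    # Residual R between expected and actual return on investment
    residual = np.asarray(expected_roi, dtype=float) - np.asarray(actual_roi, dtype=float)
    rms = float(np.sqrt(np.mean(residual ** 2)))
    within_budget = bool(budgets.sum() <= total_budget)   # Sum of B_t <= B
    within_tolerance = bool(rms <= epsilon)               # RMS of R <= epsilon
    return within_budget and within_tolerance, rms

# Made-up per-transition budgets and ROI values, for illustration only
feasible, rms = check_budget_plan(
    budgets=[100.0, 150.0, 200.0],
    expected_roi=[2.0, 1.5, 3.0],
    actual_roi=[1.8, 1.6, 2.9],
    total_budget=500.0,
    epsilon=0.2,
)
print(feasible, round(rms, 4))
```

A plan that fails either check would be rejected before the objective is evaluated.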

#### Reference
[Analyzing Time Series Data with Markov Transition Matrices](https://medium.com/towards-data-science/time-series-data-markov-transition-matrices-7060771e362b)

In [1]:
'''
    WARNING CONTROL to display or ignore all warnings
'''
import warnings; warnings.simplefilter('ignore')     # switch between 'default' and 'ignore'
import traceback

''' Set debug flag to view extended error messages; else set it to False to turn off debugging mode '''
debug = True


## Instantiate Classes

In [8]:
import os
import sys
import numpy as np
from pyspark.sql import functions as F
import tensorflow_probability as tfp
import tensorflow as tf

proj_dir = os.path.abspath(os.pardir)
sys.path.insert(1,proj_dir.split('mining/')[0])
from rezaware.modules.etl.loader import sparkRDBM as db
from rezaware.modules.etl.loader import sparkFile as file
# from rezaware.modules.etl.loader import __propAttr__ as attr

''' in debug mode, reload the modules so that code changes are picked up '''
if debug:
    import importlib
    db = importlib.reload(db)
    file=importlib.reload(file)
    # attr=importlib.reload(attr)

__desc__ = "read and write BigQuery dataset for hypothese testing"
clsSDB = db.dataWorkLoads(
    desc=__desc__,
    db_type = 'bigquery',
    db_driver=None,
    db_hostIP=None,
    db_port = None,
    db_name = None,
    db_schema='combined_data_facebook_ads',
    spark_partitions=None,
    spark_format = 'bigquery',
    spark_save_mode=None,
    # spark_jar_dir = _jar,
)
clsFile = file.dataWorkLoads(
    desc = "optimizing action_type budgets for an ad",
    store_mode='local-fs',
    store_root=proj_dir.split('mining/')[0],
    jar_dir=None,
)
# if clsSDB.session:
#     clsSDB._session.stop
print("\n%s class initialization and load complete!" % __desc__)

2025-02-13 13:09:13.302701: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-13 13:09:15.249253: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-02-13 13:09:16.390989: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1739423357.407988  318893 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1739423357.614501  318893 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-02-13 13:09:19.496428: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU ins

All functional SPARKRDBM-libraries in LOADER-package of ETL-module imported successfully!
All functional SPARKFILE-libraries in LOADER-package of ETL-module imported successfully!
__propAttr__ Class initialization complete
__propAttr__ Class initialization complete
sparkFile Class initialization complete

read and write BigQuery dataset for hypothese testing class initialization and load complete!


## Load data

In [3]:
options = {
    "inferSchema":True,
    "header":True,
    "delimiter":",",
    "pathGlobFilter":'*.csv',
    "recursiveFileLookup":True,
}

sdf=clsFile.read_files_to_dtype(
    as_type = "SPARK",      # optional - define the data type to return
    folder_path="mining/data/budget/",  # optional - relative path, w.r.t. self.storeRoot
    file_name="complete-60-accounts.csv",  # optional - name of the file to read
    file_type=None,  # optional - read all the files of same type
    **options,
)
print("Loaded %d rows" % sdf.count())
sdf.printSchema()

25/02/13 10:36:40 WARN Utils: Your hostname, Waidy-Think-Three resolves to a loopback address: 127.0.1.1; using 192.168.2.82 instead (on interface enp0s25)
25/02/13 10:36:40 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
25/02/13 10:36:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
                                                                                

Loaded 61513 rows
root
 |-- account_id: long (nullable = true)
 |-- ad_id: long (nullable = true)
 |-- adset_id: long (nullable = true)
 |-- campaign_id: long (nullable = true)
 |-- updated_time: date (nullable = true)
 |-- impressions: integer (nullable = true)
 |-- frequency: double (nullable = true)
 |-- reach: integer (nullable = true)
 |-- CTR: double (nullable = true)
 |-- CPC: double (nullable = true)
 |-- CPM: double (nullable = true)
 |-- ROAS: double (nullable = true)
 |-- social_spend: double (nullable = true)
 |-- spend: double (nullable = true)
 |-- click: integer (nullable = true)
 |-- auction_bid: integer (nullable = true)
 |-- goal: string (nullable = true)
 |-- purchase_value: double (nullable = true)
 |-- account_currency: string (nullable = true)
 |-- purchase: double (nullable = true)
 |-- fb_pixel_view_value: double (nullable = true)
 |-- fb_pixel_purchase_value: double (nullable = true)
 |-- fb_pixel_add_to_cart_value: double (nullable = true)
 |-- mobile_app_purc

In [45]:
import collections
import numpy as np
from itertools import chain

# _state_trans = [
#     ('omni_view_content_value','omni_add_to_cart_value'),
#     ('omni_view_content_value', 'omni_initiated_checkout_value'),
#     ('omni_add_to_cart_value', 'omni_purchase_value'),
#     ('omni_initiated_checkout_value', 'omni_purchase_value'),
# ]
_state_trans = {
    0:('omni_view_content_value','omni_add_to_cart_value'),
    1:('omni_view_content_value', 'omni_initiated_checkout_value'),
    2:('omni_add_to_cart_value', 'omni_purchase_value'),
    3:('omni_initiated_checkout_value', 'omni_purchase_value'),
}
_state_trans=collections.OrderedDict(sorted(_state_trans.items()))
print(_state_trans)
''' discover the states from the transition meta data '''
_states = list(set(list(chain(*[_state_trans[k] for k in _state_trans.keys()]))))
print(_states)
_trans_matrix = np.zeros((len(_states),len(_states)))

for i in range(0,len(_states),1):
    for j in range(0,len(_states),1):
        print(_states[i],_states[j])
        if (_states[i],_states[j]) in _state_trans:
            # print(_states[i],_states[j])
            _trans_matrix[i,j]=10
_trans_matrix

# _max_states = len(_state_trans)

# for _trans in _state_trans:
#     _from_state, _to_state = _state_trans[_trans][0], _state_trans[_trans][1]
#     print(_from_state, _to_state)

OrderedDict([(0, ('omni_view_content_value', 'omni_add_to_cart_value')), (1, ('omni_view_content_value', 'omni_initiated_checkout_value')), (2, ('omni_add_to_cart_value', 'omni_purchase_value')), (3, ('omni_initiated_checkout_value', 'omni_purchase_value'))])
['omni_initiated_checkout_value', 'omni_purchase_value', 'omni_view_content_value', 'omni_add_to_cart_value']
omni_initiated_checkout_value omni_initiated_checkout_value
omni_initiated_checkout_value omni_purchase_value
omni_initiated_checkout_value omni_view_content_value
omni_initiated_checkout_value omni_add_to_cart_value
omni_purchase_value omni_initiated_checkout_value
omni_purchase_value omni_purchase_value
omni_purchase_value omni_view_content_value
omni_purchase_value omni_add_to_cart_value
omni_view_content_value omni_initiated_checkout_value
omni_view_content_value omni_purchase_value
omni_view_content_value omni_view_content_value
omni_view_content_value omni_add_to_cart_value
omni_add_to_cart_value omni_initiated_check

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
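
The matrix above stays all zeros because `(_states[i],_states[j]) in _state_trans` tests membership against the dictionary keys (0 to 3), not against the transition tuples stored as values. A minimal sketch of one way to populate the matrix follows. The names `state_trans`, `states`, `adjacency`, and `trans_matrix` are hypothetical, and the uniform split across outgoing transitions is a placeholder assumption, not an estimate from the data.

```python
import numpy as np

# Transition pairs copied from the cell above
state_trans = {
    0: ('omni_view_content_value', 'omni_add_to_cart_value'),
    1: ('omni_view_content_value', 'omni_initiated_checkout_value'),
    2: ('omni_add_to_cart_value', 'omni_purchase_value'),
    3: ('omni_initiated_checkout_value', 'omni_purchase_value'),
}

# Order states by first appearance so rows and columns are reproducible;
# a plain set() gives no ordering guarantee between runs.
states = list(dict.fromkeys(s for pair in state_trans.values() for s in pair))
index = {name: i for i, name in enumerate(states)}

# Mark each allowed transition by iterating over the dictionary values
adjacency = np.zeros((len(states), len(states)))
for src, dst in state_trans.values():
    adjacency[index[src], index[dst]] = 1.0

# Assumption: split probability evenly across allowed outgoing transitions.
# Rows with no outgoing transition (the final state) are left as zeros.
row_sums = adjacency.sum(axis=1, keepdims=True)
trans_matrix = np.divide(
    adjacency, row_sums, out=np.zeros_like(adjacency), where=row_sums > 0
)
print(states)
print(trans_matrix)
```

In practice the uniform weights would be replaced by transition frequencies estimated from the ordered `_actions` data.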

In [46]:
from pyspark.sql import functions as F

_actions=sdf.select('omni_view_content_value', 'omni_add_to_cart_value',
                    'omni_initiated_checkout_value', 'omni_purchase_value')\
            .orderBy(F.col('updated_time'))
_actions.show(truncate=False, n=3)

+-----------------------+----------------------+-----------------------------+-------------------+
|omni_view_content_value|omni_add_to_cart_value|omni_initiated_checkout_value|omni_purchase_value|
+-----------------------+----------------------+-----------------------------+-------------------+
|661.0                  |61.0                  |61.0                         |155.13             |
|150.02                 |null                  |null                         |null               |
|289.0                  |44.0                  |null                         |null               |
+-----------------------+----------------------+-----------------------------+-------------------+
only showing top 3 rows



## Implement the Bellman equation
The [Bellman equation](https://www.datacamp.com/tutorial/bellman-equation-reinforcement-learning) has the following components:
1. R(s,a): The immediate reward received for taking action a in state s.
2. γ: The discount factor (between 0 and 1) that determines the importance of future rewards compared to immediate rewards.
3. P(s′∣s,a): The probability of transitioning to state s′ from state s by taking action a.
4. max(a): The maximization over actions, which selects the action with the highest expected value of future rewards.
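
Put together, these components give the Bellman optimality equation $V(s) = \max_a \left[ R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \right]$. A minimal value-iteration sketch of that update is shown below. The two-state, two-action problem, the function name `value_iteration`, and the array layouts are assumptions made for illustration, not part of the notebook's data.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=1000):
    """Iterate the Bellman optimality update until the values stop changing.

    P: array of shape (A, S, S) with P[a, s, s'] = P(s' | s, a)
    R: array of shape (S, A) with R[s, a] = immediate reward
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum over s' of P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)

# Toy problem: action 0 keeps the current state, action 1 moves to state 1.
# State 1 is absorbing and pays nothing; moving from state 0 pays a reward of 1.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [0.0, 1.0]],   # action 1: go to state 1
])
R = np.array([
    [0.0, 1.0],                 # rewards in state 0 for actions 0 and 1
    [0.0, 0.0],                 # rewards in state 1
])
V, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)
```

In the budgeting setting, states would correspond to the conversion stages and actions to candidate budget levels, with P and R estimated from the data.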

### Filter and Separate dataset
1. Transition matrix features and values
2. Emission matrix features and values
3. Observed outcomes

In [43]:
import numpy as np
_emissions=np.array(sdf.select('ROAS','CPC','CTR','CPM').dropna().collect())
print("_emissions\n",_emissions,_emissions.shape)

_observs=np.array(sdf.select('fb_pixel_view_value','fb_pixel_add_to_cart_value',
                             'fb_pixel_purchase_value').dropna().collect())
print('\n_observes\n',_observs,_observs.shape[1])

_emissions
 [[18.63664596  0.92        1.238938   11.39823   ]
 [18.19        1.          1.322751   13.227513  ]
 [ 9.85487616  1.845714    1.592719   29.397042  ]
 ...
 [ 0.35527015  0.836296    3.602882   30.130771  ]
 [ 2.24693786  0.972609    3.141218   30.551762  ]
 [ 5.1682328   0.785476    3.050109   23.957879  ]] (26193, 4)

_observes
 [[3.7302e+02 2.2400e+02 1.2002e+02]
 [2.1804e+02 8.4000e+01 9.0950e+01]
 [5.4508e+02 2.4200e+02 2.5465e+02]
 ...
 [9.2950e+01 5.7970e+01 4.0980e+01]
 [2.0000e-02 8.6000e+01 1.2835e+02]
 [9.0400e+02 1.6000e+02 1.0625e+02]] 3


In [44]:
_observs[0]

array([373.02, 224.  , 120.02])

In [45]:
from hmmlearn.hmm import MultinomialHMM

model = MultinomialHMM(
    n_components=_emissions.shape[1],
    startprob_prior=_observs[0],
    algorithm='viterbi',
    random_state=0,
    n_iter=1,
)
model.fit(_observs)

MultinomialHMM has undergone major changes. The previous version was implementing a CategoricalHMM (a special case of MultinomialHMM). This new implementation follows the standard definition for a Multinomial distribution (e.g. as in https://en.wikipedia.org/wiki/Multinomial_distribution). See these issues for details:
https://github.com/hmmlearn/hmmlearn/issues/335
https://github.com/hmmlearn/hmmlearn/issues/340


ValueError: Symbol counts should be nonnegative integers
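
The `ValueError` follows from the change described in the message above: the current `MultinomialHMM` expects every row to be a vector of nonnegative integer counts, whereas `_observs` holds continuous values such as 373.02. In addition, `n_components` is taken from `_emissions.shape[1]` (four), while `startprob_prior` is given the three entries of `_observs[0]`. One possible direction, which is an assumption and not the notebook's method, is to model continuous observations with Gaussian emissions; hmmlearn's `GaussianHMM` is the library counterpart. The NumPy sketch below shows Viterbi decoding under that assumption, using made-up parameters and synthetic data.

```python
import numpy as np

def gaussian_log_likelihood(obs, means, variances):
    """Log density of each observation under each state's diagonal Gaussian."""
    diff = obs[:, None, :] - means[None, :, :]            # shape (T, K, D)
    return -0.5 * np.sum(
        np.log(2.0 * np.pi * variances)[None, :, :] + diff ** 2 / variances[None, :, :],
        axis=2,
    )                                                     # shape (T, K)

def viterbi(log_start, log_trans, log_emit):
    """Most likely hidden-state path, computed in log space."""
    n_steps, n_states = log_emit.shape
    score = np.empty((n_steps, n_states))
    back = np.zeros((n_steps, n_states), dtype=int)
    score[0] = log_start + log_emit[0]
    for t in range(1, n_steps):
        cand = score[t - 1][:, None] + log_trans          # cand[i, j]: from i to j
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[t]
    path = np.empty(n_steps, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(n_steps - 2, -1, -1):                  # follow back-pointers
        path[t] = back[t + 1, path[t + 1]]
    return path

# Synthetic observations (three features, like _observs); values are made up
obs = np.array([
    [0.1, -0.2, 0.0], [0.3, 0.1, -0.1], [-0.2, 0.0, 0.2],
    [9.8, 10.1, 10.0], [10.2, 9.9, 10.1], [10.0, 10.0, 9.7],
])
means = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])  # hypothetical state means
variances = np.ones((2, 3))                              # hypothetical variances
log_start = np.log([0.5, 0.5])
log_trans = np.log([[0.9, 0.1], [0.1, 0.9]])

path = viterbi(log_start, log_trans, gaussian_log_likelihood(obs, means, variances))
print(path)
```

With real data the means, variances, and transition probabilities would be learned (for example by Baum-Welch) and not fixed by hand.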