Perform the data cleaning and data preprocessing procedure for the Forex.txt file.

 2.1 produce a new column "discretized_price" with "low", "medium" and "high" values for some (choose your own) thresholds.

 2.2 create a crosstab with two dimensions (Forex label, discretized_price) and the frequency count of Forex trades as the aggregate function.

 2.3 compute a second crosstab with the marginal probabilities. For instance, what is the probability of having ForexAUDCADNoExpiry traded?

 2.4 compute appropriate crosstabs with the conditional probabilities. For instance, what is the probability of having ForexAUDCADNoExpiry traded given that the price is high? What is the probability of having a high price given that ForexAUDCADNoExpiry is traded? [Hint: consider (A) whether to include the "All" margins in the crosstab specification, (B) whether to use the `normalize` parameter, and (C) whether normalize should be applied to the index, to the columns, or simply set to True. See [1] for more.]

In [1]:
import pandas as pd
import numpy as np

forex_df = pd.read_csv('Forex.txt', sep=',', header=None)

In [2]:
# Check whether every value in the first column is 'Forex'
first_col_is_forex = (forex_df.iloc[:, 0] == 'Forex').all()

# Drop the first column, keeping the exchange label and the price
forex_df = forex_df.iloc[:, 1:3]

# Rename the columns
forex_df.columns = ['exchange', 'price']

print(forex_df)

                 exchange    price
0     ForexAUDCADNoExpiry  0.92919
1     ForexAUDCADNoExpiry  0.92724
2     ForexAUDCADNoExpiry  0.92915
3     ForexAUDCADNoExpiry  0.93456
4     ForexAUDCADNoExpiry  0.93426
...                   ...      ...
1095  ForexXRPUSDNoExpiry  0.60252
1096  ForexXRPUSDNoExpiry  0.60066
1097  ForexXRPUSDNoExpiry  0.63142
1098  ForexXRPUSDNoExpiry  0.63732
1099  ForexXRPUSDNoExpiry  0.70620

[1100 rows x 2 columns]


In [3]:
# Create a new column with the discretized price (low, medium, high):
# low if price < 1.00, medium if 1.00 <= price <= 2.00, high otherwise
thresholds = [1.00, 2.00]

forex_df['discretized_price'] = np.select(
    [forex_df['price'] < thresholds[0], forex_df['price'] <= thresholds[1]],
    ['low', 'medium'],
    default='high',
)

print(forex_df)

                 exchange    price discretized_price
0     ForexAUDCADNoExpiry  0.92919               low
1     ForexAUDCADNoExpiry  0.92724               low
2     ForexAUDCADNoExpiry  0.92915               low
3     ForexAUDCADNoExpiry  0.93456               low
4     ForexAUDCADNoExpiry  0.93426               low
...                   ...      ...               ...
1095  ForexXRPUSDNoExpiry  0.60252               low
1096  ForexXRPUSDNoExpiry  0.60066               low
1097  ForexXRPUSDNoExpiry  0.63142               low
1098  ForexXRPUSDNoExpiry  0.63732               low
1099  ForexXRPUSDNoExpiry  0.70620               low

[1100 rows x 3 columns]
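
The thresholding step above could also be written with `pd.cut`, which returns an ordered categorical (low < medium < high). The sketch below uses made-up toy prices rather than Forex.txt, and the names `prices`, `bins`, `labels` and `binned` are illustrative. Note one assumption that differs from the cell above: with `right=True` each bin is closed on the right, so a price of exactly 1.00 lands in "low", whereas the cell above labels it "medium".

```python
import numpy as np
import pandas as pd

# Toy prices standing in for the 'price' column (illustrative values only).
prices = pd.Series([0.93, 1.00, 1.50, 2.00, 2.01, 18.4])

# Same cut points as the thresholds above; right=True makes each bin (left, right].
bins = [-np.inf, 1.00, 2.00, np.inf]
labels = ['low', 'medium', 'high']
binned = pd.cut(prices, bins=bins, labels=labels, right=True)

print(binned.tolist())
```

If the exact boundary behaviour matters for the chosen thresholds, it is worth checking a few edge values like this before binning the full column.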


In [4]:
# Compute the crosstab
crosstab_freq = pd.crosstab(columns=forex_df['discretized_price'], index=forex_df['exchange'], margins=False)

crosstab_freq.tail(10)

discretized_price    high  low  medium
exchange
ForexUSDMXNNoExpiry    20    0       0
ForexUSDNOKNoExpiry    20    0       0
ForexUSDPLNNoExpiry    20    0       0
ForexUSDSEKNoExpiry    20    0       0
ForexUSDSGDNoExpiry     0    0      20
ForexUSDTRYNoExpiry    20    0       0
ForexUSDZARNoExpiry    20    0       0
ForexXAGUSDNoExpiry    20    0       0
ForexXAUUSDNoExpiry    20    0       0
ForexXRPUSDNoExpiry     0   20       0


In [5]:
# Joint distribution: each interior cell is P(exchange, discretized_price);
# the 'Total' row and column hold the marginal probabilities
crosstab_probs = pd.crosstab(
    index=forex_df['exchange'], 
    columns=forex_df['discretized_price'], 
    margins=True,            # To include row and column totals
    margins_name='Total',    
    normalize=True           # Make the grand total = 1
)

print("Crosstab with Marginal Probabilities:")
crosstab_probs.tail(10)

Crosstab with Marginal Probabilities:


discretized_price        high       low    medium     Total
exchange
ForexUSDNOKNoExpiry  0.018182       0.0       0.0  0.018182
ForexUSDPLNNoExpiry  0.018182       0.0       0.0  0.018182
ForexUSDSEKNoExpiry  0.018182       0.0       0.0  0.018182
ForexUSDSGDNoExpiry       0.0       0.0  0.018182  0.018182
ForexUSDTRYNoExpiry  0.018182       0.0       0.0  0.018182
ForexUSDZARNoExpiry  0.018182       0.0       0.0  0.018182
ForexXAGUSDNoExpiry  0.018182       0.0       0.0  0.018182
ForexXAUUSDNoExpiry  0.018182       0.0       0.0  0.018182
ForexXRPUSDNoExpiry       0.0  0.018182       0.0  0.018182
Total                0.564545       0.2  0.235455       1.0
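
The "Total" column of the table above is the marginal distribution of the exchange label. As a cross-check, the same numbers can be obtained directly with `value_counts(normalize=True)`. The sketch below uses a made-up toy frame (exchanges "A", "B", "C" are hypothetical) rather than Forex.txt.

```python
import pandas as pd

# Toy data standing in for forex_df (labels and rows are illustrative only).
toy = pd.DataFrame({
    'exchange': ['A', 'A', 'B', 'B', 'B', 'C'],
    'discretized_price': ['low', 'low', 'high', 'high', 'medium', 'high'],
})

# Joint probabilities with marginals in the 'Total' row and column.
joint = pd.crosstab(
    index=toy['exchange'],
    columns=toy['discretized_price'],
    margins=True,
    margins_name='Total',
    normalize=True,
)

# Marginal P(exchange) computed without a crosstab.
marginal_exchange = toy['exchange'].value_counts(normalize=True)

print(joint.loc['A', 'Total'], marginal_exchange['A'])
```

Both printed values should agree, since each is the share of rows whose exchange is "A".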


In [6]:
crosstab_cond_on_price = pd.crosstab(
    index=forex_df['exchange'],
    columns=forex_df['discretized_price'],
    margins=True,
    margins_name='Total',
    normalize='columns'  # Normalize each column (price discretized) to sum to 1
)

print("Crosstab with Conditional Probability on Price:")
crosstab_cond_on_price.tail(10)


# p_audcad_given_high = crosstab_cond_on_price.loc['ForexAUDCADNoExpiry', 'high']
# print(p_audcad_given_high)


Crosstab with Conditional Probability on Price:


discretized_price        high       low    medium     Total
exchange
ForexUSDMXNNoExpiry  0.032206       0.0       0.0  0.018182
ForexUSDNOKNoExpiry  0.032206       0.0       0.0  0.018182
ForexUSDPLNNoExpiry  0.032206       0.0       0.0  0.018182
ForexUSDSEKNoExpiry  0.032206       0.0       0.0  0.018182
ForexUSDSGDNoExpiry       0.0       0.0   0.07722  0.018182
ForexUSDTRYNoExpiry  0.032206       0.0       0.0  0.018182
ForexUSDZARNoExpiry  0.032206       0.0       0.0  0.018182
ForexXAGUSDNoExpiry  0.032206       0.0       0.0  0.018182
ForexXAUUSDNoExpiry  0.032206       0.0       0.0  0.018182
ForexXRPUSDNoExpiry       0.0  0.090909       0.0  0.018182
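
Each cell in the table above is P(exchange | price), which by definition equals the joint probability P(exchange, price) divided by the marginal P(price). The sketch below checks, on a made-up toy frame (not Forex.txt), that dividing the joint crosstab by its column sums reproduces what `normalize='columns'` returns. All names are illustrative.

```python
import numpy as np
import pandas as pd

# Toy data standing in for forex_df (labels and rows are illustrative only).
toy = pd.DataFrame({
    'exchange': ['A', 'A', 'B', 'B', 'B', 'C'],
    'discretized_price': ['low', 'low', 'high', 'high', 'medium', 'high'],
})

# Joint probabilities P(exchange, price), no margins.
joint = pd.crosstab(toy['exchange'], toy['discretized_price'], normalize=True)

# Marginal P(price) is the sum of each column of the joint table.
p_price = joint.sum(axis=0)

# Manual conditional: divide every column by its own marginal.
cond_manual = joint.div(p_price, axis='columns')

# Built-in conditional on the column variable.
cond_builtin = pd.crosstab(
    toy['exchange'], toy['discretized_price'], normalize='columns'
)

print(np.allclose(cond_manual, cond_builtin))
print(cond_builtin.loc['B', 'high'])
```

In this toy frame three rows are "high" and two of them belong to "B", so the last line is the share 2/3.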


In [7]:
crosstab_cond_on_exchange = pd.crosstab(
    index=forex_df['exchange'],
    columns=forex_df['discretized_price'],
    margins=True,
    margins_name='Total',
    normalize='index'  # Normalize each row (exchange) to sum to 1
)

print("Crosstab with Conditional Probability on Exchange:")
crosstab_cond_on_exchange.tail(10)

# p_high_given_audcad = crosstab_cond_on_exchange.loc['ForexAUDCADNoExpiry', 'high']
# print(p_high_given_audcad)


Crosstab with Conditional Probability on Exchange:


discretized_price        high       low    medium
exchange
ForexUSDNOKNoExpiry       1.0       0.0       0.0
ForexUSDPLNNoExpiry       1.0       0.0       0.0
ForexUSDSEKNoExpiry       1.0       0.0       0.0
ForexUSDSGDNoExpiry       0.0       0.0       1.0
ForexUSDTRYNoExpiry       1.0       0.0       0.0
ForexUSDZARNoExpiry       1.0       0.0       0.0
ForexXAGUSDNoExpiry       1.0       0.0       0.0
ForexXAUUSDNoExpiry       1.0       0.0       0.0
ForexXRPUSDNoExpiry       0.0       1.0       0.0
Total                0.564545       0.2  0.235455
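
The two conditional tables are linked by Bayes' rule: P(exchange | price) = P(price | exchange) * P(exchange) / P(price). As a quick consistency check, the sketch below plugs in the rounded ForexXRPUSDNoExpiry values printed in the outputs above; the variable names are illustrative and the inputs are copied by hand, so the result is only accurate to the printed precision.

```python
# Values read off the printed outputs above (rounded to six decimals).
p_low_given_xrp = 1.0    # row ForexXRPUSDNoExpiry, column low, normalize='index'
p_xrp = 0.018182         # row ForexXRPUSDNoExpiry, 'Total' column, normalize=True
p_low = 0.2              # 'Total' row, column low, normalize=True

# Bayes' rule: P(XRPUSD | low) = P(low | XRPUSD) * P(XRPUSD) / P(low)
p_xrp_given_low = p_low_given_xrp * p_xrp / p_low

print(round(p_xrp_given_low, 5))
```

The result should match, up to rounding, the 0.090909 shown for ForexXRPUSDNoExpiry in the "low" column of the `normalize='columns'` table.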
