# Biased and fair coin sequences of arbitrary length

Contents:
* [Installing the `parallel_simulations` module](#install)
* [Importing the `PipelineOptions` class](#import_options)
* [Importing the `ParallelMCBattery` class](#import_class)
* [Battery configurations, and using the simulations instance](#battery_config)
* [Defining the statistical model](#stats_model)
* [Creating the simulations configurations](#simulations_configs)
* [Simulating with minimal configurations](#simulations_min_configs)
* [Sample inspection](#sample_insp)
* [Simulating with output file paths specified via `output_paths`](#simulate_output)
* [Simulating by completing a pre-existing sequence with `starting_point`](#simulate_start)
* [Reusing the cached `output_paths`](#cached_paths)

<a id="install"></a>
### Installing the `parallel_simulations` module

In [None]:
%pip install -q -e "git+https://github.com/vladimirrotariu/parallel-monte-carlo-simulations#egg=parallel_simulations&subdirectory=src"

### Additional setup

In [23]:
import sys
import numpy as np
# The pipeline_options are provided programmatically, not via the CLI, so clear the command-line arguments
sys.argv = [""]

<a id="import_options"></a>
### Importing the `PipelineOptions` class

In [24]:

from apache_beam.options.pipeline_options import PipelineOptions

<a id="import_class"></a>
### Importing the `ParallelMCBattery` class

In [25]:
from parallel_simulations import ParallelMCBattery

<a id="battery_config"></a>
### Battery configurations, and using the simulations instance

The parallel Monte Carlo battery is configured by choosing a random number generator, in this case [Philox](https://numpy.org/doc/stable/reference/random/bit_generators/philox.html#philox-counter-based-rng), and an instance of Apache Beam's [PipelineOptions](https://beam.apache.org/releases/pydoc/2.33.0/apache_beam.options.pipeline_options.html#apache_beam.options.pipeline_options.PipelineOptions) class. For simplicity, this example keeps the default options, which means the pipeline runs locally on the `DirectRunner`.

**Important**: The choice of random number generator and the pipeline options become *class attributes* of the battery, so they are shared by all simulations run from the same battery.

In [26]:
options = PipelineOptions()

battery_configs = {"rng" : "Philox", "pipeline_options": options}

# Both the choice of rng and the pipeline_options become class attributes, so they are shared by all simulations run from the same battery
battery_parallel_MC = ParallelMCBattery(battery_configs=battery_configs)

<a id="stats_model"></a>
### Defining the statistical model

We want to model two distinct **parallelizable** sequences of heads 'H' and tails 'T', generated by simulating tosses of a coin with a given `bias`. The model receives `bias` as a list of `parameters`, which in this case holds a single value.

In [27]:
def CoinSequence(number_flips: int, rng: np.random.Generator, bias: list[float]) -> list[str]:
    return ["H" if rng.random() <= bias[0] else "T" for _ in range(number_flips)]

models = [CoinSequence, CoinSequence]
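
Before handing the model to the battery, it can be checked on its own. The sketch below assumes only NumPy: it builds a Philox-backed `Generator` by hand (the seed `1234` is arbitrary, and how the battery seeds or splits its own generators is not shown in this notebook), calls the model once, and compares the observed fraction of heads with the bias.

```python
import numpy as np


def CoinSequence(number_flips: int, rng: np.random.Generator, bias: list[float]) -> list[str]:
    # Same model as above, repeated so that this cell runs on its own.
    return ["H" if rng.random() <= bias[0] else "T" for _ in range(number_flips)]


# Assumption: a hand-made Philox generator with an arbitrary seed stands in
# for whatever generator the battery passes to the model.
rng = np.random.Generator(np.random.Philox(1234))

sample = CoinSequence(100_000, rng, [0.7])
heads_fraction = sample.count("H") / len(sample)

print(len(sample))               # number of flips requested
print(round(heads_fraction, 2))  # should be close to the bias of 0.7
```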

<a id="simulations_configs"></a>
### Creating the simulations configurations

To configure the simulations for these models, use a list of dictionaries, with one dictionary for each entry in `models`.

In [28]:
# 100,000 sequences, each of 16 'H' or 'T' generated by simulating tossing a fair coin
fair_coin_config = {"parameters": [0.5], "number_simulations" : 100000, "number_points": 16}

# 60,000 sequences, each of 32 'H' or 'T' generated by simulating tossing a biased coin
biased_coin_config = {"parameters": [0.7], "number_simulations" : 60000, "number_points": 32}

simulation_configs = [fair_coin_config, biased_coin_config]
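
Since every simulation produces `number_points` values, the size of the requested workload follows directly from the configurations. A small sketch, using only the dictionaries defined above (repeated here so that the cell is self-contained); the helper name `total_points` is hypothetical and not part of `parallel_simulations`:

```python
fair_coin_config = {"parameters": [0.5], "number_simulations": 100000, "number_points": 16}
biased_coin_config = {"parameters": [0.7], "number_simulations": 60000, "number_points": 32}
simulation_configs = [fair_coin_config, biased_coin_config]


def total_points(config: dict) -> int:
    # Hypothetical helper: generated values = simulations x points per simulation.
    return config["number_simulations"] * config["number_points"]


for index, config in enumerate(simulation_configs):
    print(index, total_points(config))
# 0 1600000
# 1 1920000
```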

<a id="simulations_min_configs"></a>
### Simulating with minimal configurations

The Monte Carlo simulations configured above can now be run. When `output_paths` is not specified, the comma-separated output files are written by default to the directory from which the Python script calling the `simulate` method is executed:

In [29]:
battery_parallel_MC.simulate(models, simulation_configs)

<a id="sample_insp"></a>
### Sample inspection

As we did not specify `output_paths`, each output file is named after the index of the corresponding model in the `models` list:

In [30]:
!head -n 12 0.txt

H,H,H,H,T,H,T,H,H,H,T,H,T,T,H,H
T,H,H,H,H,H,H,T,T,T,H,H,H,T,T,H
T,H,T,T,T,H,H,T,T,H,H,T,H,H,H,T
T,H,T,H,T,T,T,T,H,H,H,H,H,T,T,T
T,H,T,H,H,H,H,T,H,H,H,T,H,T,H,T
H,T,T,T,T,T,H,H,T,H,H,H,T,H,H,T
T,T,T,T,T,T,H,H,T,T,H,T,H,H,H,H
H,H,T,T,H,H,H,T,H,H,T,H,H,T,H,H
H,H,H,H,T,H,H,T,H,H,T,H,H,H,H,H
T,T,T,H,T,H,T,H,T,T,T,H,T,H,H,T
H,H,H,T,H,H,T,H,T,H,T,T,T,T,T,T
T,H,H,H,T,H,H,H,T,T,T,H,T,T,T,T


In [31]:
!head -n 12 1.txt

H,H,H,T,H,H,H,H,H,T,T,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,H,H,T,H,H,T
H,T,H,T,T,H,H,H,H,H,T,H,T,H,T,T,T,T,H,H,H,H,H,H,T,H,H,H,T,H,H,H
H,H,H,H,T,H,H,H,T,T,T,H,H,H,H,H,H,H,H,H,T,T,T,T,T,H,T,T,T,H,H,H
H,T,T,H,H,T,H,T,T,H,T,T,T,H,H,T,H,T,H,H,T,H,T,T,T,T,T,H,H,H,H,H
H,H,H,H,H,T,H,H,H,H,H,H,T,T,T,H,H,H,H,H,T,H,H,H,H,H,H,H,T,T,H,H
H,H,H,H,H,H,H,H,H,H,H,T,H,H,T,T,H,H,T,H,H,H,H,H,T,H,T,H,H,T,H,H
H,H,H,H,T,T,H,T,H,T,H,H,H,H,T,T,T,H,H,H,T,T,T,T,H,T,H,H,H,T,H,H
T,H,T,H,T,H,T,T,T,T,T,T,H,H,H,H,H,T,T,H,H,T,H,T,H,H,H,T,T,T,H,T
T,T,T,H,H,T,H,T,H,H,H,H,T,T,H,H,H,H,H,T,T,H,H,H,H,H,H,H,T,H,H,T
H,H,T,H,T,H,H,T,T,T,H,H,T,H,T,H,T,H,H,H,H,T,H,H,H,T,H,H,H,H,H,H
H,H,T,H,H,T,H,H,H,H,H,H,H,T,T,H,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,T
H,H,T,T,H,H,T,T,H,H,H,H,H,H,T,T,H,T,H,T,H,T,H,H,T,H,H,T,H,T,H,H
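
The same inspection can be done from Python rather than with `head`. The sketch below assumes the output format shown above (one simulation per line, values separated by commas); `heads_fraction` is a hypothetical helper, and the demonstration writes a tiny stand-in file into a temporary directory instead of reading the real `0.txt`.

```python
import csv
import tempfile
from pathlib import Path


def heads_fraction(path, max_rows=None):
    """Fraction of 'H' among the values in the first max_rows lines of path."""
    heads = total = 0
    with open(path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle)):
            if max_rows is not None and row_number >= max_rows:
                break
            heads += row.count("H")
            total += len(row)
    return heads / total if total else float("nan")


# Stand-in file holding the first two lines of the 0.txt sample shown above.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.txt"
    demo_path.write_text(
        "H,H,H,H,T,H,T,H,H,H,T,H,T,T,H,H\n"
        "T,H,H,H,H,H,H,T,T,T,H,H,H,T,T,H\n"
    )
    demo_fraction = heads_fraction(demo_path)

print(demo_fraction)  # 0.65625 (21 heads out of 32 values)
```

With the real output, `heads_fraction("0.txt")` and `heads_fraction("1.txt")` would be expected to land near 0.5 and 0.7 respectively.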


<a id="simulate_output"></a>
### Simulating with output file paths specified via `output_paths`

As expected, both relative and absolute paths may be used to configure where the output files of the Monte Carlo simulation are created. In the example below, we choose paths relative to the directory that hosts the Jupyter notebook in which `battery_parallel_MC.simulate` is executed.

In [32]:
output_paths = ["A","B"]

battery_parallel_MC.simulate(models, simulation_configs, output_paths)
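
How a relative path such as `"A"` maps to a location on disk can be checked with the standard library. This sketch only illustrates ordinary Python path handling against the current working directory; it makes no claim about how `parallel_simulations` treats paths internally.

```python
from pathlib import Path

relative_path = Path("A")
# Anchor the relative path at the current working directory.
absolute_path = Path.cwd() / relative_path

print(relative_path.is_absolute())  # False
print(absolute_path.is_absolute())  # True
print(absolute_path)                # location that "A" refers to from here
```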

In [33]:
!head -n 12 A

H,H,H,H,T,H,T,H,H,H,T,H,T,T,H,H
T,H,H,H,H,H,H,T,T,T,H,H,H,T,T,H
T,H,T,T,T,H,H,T,T,H,H,T,H,H,H,T
T,H,T,H,T,T,T,T,H,H,H,H,H,T,T,T
T,H,T,H,H,H,H,T,H,H,H,T,H,T,H,T
H,T,T,T,T,T,H,H,T,H,H,H,T,H,H,T
T,T,T,T,T,T,H,H,T,T,H,T,H,H,H,H
H,H,T,T,H,H,H,T,H,H,T,H,H,T,H,H
H,H,H,H,T,H,H,T,H,H,T,H,H,H,H,H
T,T,T,H,T,H,T,H,T,T,T,H,T,H,H,T
H,H,H,T,H,H,T,H,T,H,T,T,T,T,T,T
T,H,H,H,T,H,H,H,T,T,T,H,T,T,T,T


In [34]:
!head -n 12 B

H,H,H,T,H,H,H,H,H,T,T,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,H,H,T,H,H,T
H,T,H,T,T,H,H,H,H,H,T,H,T,H,T,T,T,T,H,H,H,H,H,H,T,H,H,H,T,H,H,H
H,H,H,H,T,H,H,H,T,T,T,H,H,H,H,H,H,H,H,H,T,T,T,T,T,H,T,T,T,H,H,H
H,T,T,H,H,T,H,T,T,H,T,T,T,H,H,T,H,T,H,H,T,H,T,T,T,T,T,H,H,H,H,H
H,H,H,H,H,T,H,H,H,H,H,H,T,T,T,H,H,H,H,H,T,H,H,H,H,H,H,H,T,T,H,H
H,H,H,H,H,H,H,H,H,H,H,T,H,H,T,T,H,H,T,H,H,H,H,H,T,H,T,H,H,T,H,H
H,H,H,H,T,T,H,T,H,T,H,H,H,H,T,T,T,H,H,H,T,T,T,T,H,T,H,H,H,T,H,H
T,H,T,H,T,H,T,T,T,T,T,T,H,H,H,H,H,T,T,H,H,T,H,T,H,H,H,T,T,T,H,T
T,T,T,H,H,T,H,T,H,H,H,H,T,T,H,H,H,H,H,T,T,H,H,H,H,H,H,H,T,H,H,T
H,H,T,H,T,H,H,T,T,T,H,H,T,H,T,H,T,H,H,H,H,T,H,H,H,T,H,H,H,H,H,H
H,H,T,H,H,T,H,H,H,H,H,H,H,T,T,H,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,T
H,H,T,T,H,H,T,T,H,H,H,H,H,H,T,T,H,T,H,T,H,T,H,H,T,H,H,T,H,T,H,H


<a id="simulate_start"></a>
### Simulating by completing a pre-existing sequence with `starting_point`

To complete a pre-existing sequence of heads 'H' and tails 'T', we first modify the model so that it accepts the initial sequence:

In [35]:
def CoinSequence(number_flips: int, rng: np.random.Generator, bias: float, initial_sequence: list[str]) -> list[str]:
    # `bias` is a single float here, matching the scalar `parameters` used in the configurations of this section
    sequence_completion = ["H" if rng.random() <= bias else "T" for _ in range(number_flips)]

    full_sequence = initial_sequence + sequence_completion

    return full_sequence

models = [CoinSequence, CoinSequence]

Then, we may include the initial sequence via the parameter `starting_point`:

In [36]:
# 200,000 sequences, each starting with 5 'T' and followed by 12 'H' or 'T' generated by simulating tossing a fair coin
fair_coin_config = {"parameters": 0.5, "number_simulations" : 200000, "number_points": 12, "starting_point": ['T', 'T', 'T', 'T', 'T']}

# 80,000 sequences, each starting with 5 'T' and followed by 28 'H' or 'T' generated by simulating tossing a biased coin
biased_coin_config = {"parameters": 0.7, "number_simulations" : 80000, "number_points": 28, "starting_point": ['T', 'T', 'T', 'T', 'T']}

simulation_configs = [fair_coin_config, biased_coin_config]

We choose output paths relative to the directory where the file containing `battery_parallel_MC.simulate` is executed. Directories and files that do not exist will be **created**, and those that already exist will be **overwritten**.

In [37]:
output_paths = ["fair_coin/simulation_fair.txt", "biased_coin/simulation_biased.txt"]

battery_parallel_MC.simulate(models, simulation_configs, output_paths)
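
The create-or-overwrite behaviour described above matches what the standard library does when parent directories are created on demand and the file is opened in write mode. The following is a sketch of that behaviour in a temporary directory, not the library's actual implementation; `write_rows` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def write_rows(path: Path, rows: list[list[str]]) -> None:
    # Create missing parent directories, then overwrite any existing file.
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w") as handle:
        for row in rows:
            handle.write(",".join(row) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "fair_coin" / "simulation_fair.txt"

    write_rows(target, [["T", "H"], ["H", "H"]])  # directory and file are created
    first_content = target.read_text()

    write_rows(target, [["T", "T"]])              # existing file is overwritten
    second_content = target.read_text()

print(repr(first_content))   # 'T,H\nH,H\n'
print(repr(second_content))  # 'T,T\n'
```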

In [38]:
!head -n 12 fair_coin/simulation_fair.txt

T,T,T,T,T,H,H,H,H,T,H,T,H,H,H,T,H
T,T,T,T,T,T,T,H,H,T,H,H,H,H,H,H,T
T,T,T,T,T,T,T,H,H,H,T,T,H,T,H,T,T
T,T,T,T,T,T,H,H,T,T,H,H,T,H,H,H,T
T,T,T,T,T,T,H,T,H,T,T,T,T,H,H,H,H
T,T,T,T,T,H,T,T,T,T,H,T,H,H,H,H,T
T,T,T,T,T,H,H,H,T,H,T,H,T,H,T,T,T
T,T,T,T,T,T,T,H,H,T,H,H,H,T,H,H,T
T,T,T,T,T,T,T,T,T,T,T,H,H,T,T,H,T
T,T,T,T,T,H,H,H,H,H,H,T,T,H,H,H,T
T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H


In [39]:
!head -n 12 biased_coin/simulation_biased.txt

T,T,T,T,T,H,H,H,T,H,H,H,H,H,T,T,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,T,H,T,T,H,H,H,H,H,T,H,T,H,T,T,T,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,H,T,H,H,H,H,H,H,H,T,H,H,H,T,T,T,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,T,T,T,T,H,T,T,T,H,H,H,H,T,T,H,H,T,H,T,T,H,T,T,T,H,H,T
T,T,T,T,T,H,T,H,H,T,H,T,T,T,T,T,H,H,H,H,H,H,H,H,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,T,H,H,H,H,H,T,H,H,H,H,H,H,H,T,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,H,H,H,T,H,H,T,T,H,H,T,H,H,H,H,H,T,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,H,T,H,T,H,H,H,H,T,T,T,H,H,H,T,T,T,T,H,T,H,H,H,T,H,H
T,T,T,T,T,T,H,T,H,T,H,T,T,T,T,T,T,H,H,H,H,H,T,T,H,H,T,H,T,H,H,H,T
T,T,T,T,T,T,T,H,T,T,T,T,H,H,T,H,T,H,H,H,H,T,T,H,H,H,H,H,T,T,H,H,H
T,T,T,T,T,H,H,H,H,T,H,H,T,H,H,T,H,T,H,H,T,T,T,H,H,T,H,T,H,T,H,H,H
T,T,T,T,T,H,T,H,H,H,T,H,H,H,H,H,H,H,H,T,H,H,T,H,H,H,H,H,H,H,T,T,H


However, note that in this example the subsequent points of a simulation are generated independently of the **initial sequence** passed via `starting_point`. In the next notebook, we will implement an autoregressive model, which might use, for example, the present price of a stock as `starting_point`.
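
For contrast, here is a minimal sketch of a model whose completion does depend on the initial sequence. It is purely illustrative and is not the autoregressive model of the next notebook: `StickyCoinSequence` and its `stickiness` parameter are hypothetical, and the generator is built by hand with an arbitrary seed.

```python
import numpy as np


def StickyCoinSequence(number_flips: int, rng: np.random.Generator,
                       stickiness: float, initial_sequence: list[str]) -> list[str]:
    # Hypothetical model: each flip repeats the previous outcome with
    # probability `stickiness`, so the completion depends on the last
    # element of `initial_sequence` (which must not be empty).
    sequence = list(initial_sequence)
    for _ in range(number_flips):
        previous = sequence[-1]
        if rng.random() <= stickiness:
            sequence.append(previous)
        else:
            sequence.append("H" if previous == "T" else "T")
    return sequence


rng = np.random.Generator(np.random.Philox(0))  # arbitrary seed
completed = StickyCoinSequence(12, rng, 1.0, ["T", "T", "T", "T", "T"])
print(",".join(completed))  # with stickiness 1.0 every flip repeats 'T'
```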

<a id="cached_paths"></a>
### Reusing the cached `output_paths`

The `output_paths` are cached as a class attribute, so they are shared by all simulations run from the same battery until another value for `output_paths` is passed.
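
The caching mechanism can be pictured with a toy class. This is only a sketch of class-attribute caching in plain Python, under the assumption that the battery behaves as described; `ToyBattery` is hypothetical and does not reproduce the code of `ParallelMCBattery`.

```python
class ToyBattery:
    # Class attribute: shared by every instance until it is reassigned.
    output_paths = None

    def simulate(self, models, output_paths=None):
        if output_paths is not None:
            # A new value replaces the cached one for the whole class.
            type(self).output_paths = output_paths
        if self.output_paths is None:
            # Fall back to index-based names, mirroring the 0.txt / 1.txt default.
            return [f"{index}.txt" for index in range(len(models))]
        return list(self.output_paths)


first, second = ToyBattery(), ToyBattery()
print(first.simulate(["m0", "m1"]))              # ['0.txt', '1.txt']
print(first.simulate(["m0", "m1"], ["A", "B"]))  # ['A', 'B']
print(second.simulate(["m0", "m1"]))             # ['A', 'B'] (cached on the class)
```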

In [40]:
battery_parallel_MC.simulate(models, simulation_configs)

In [41]:
!head -n 12 fair_coin/simulation_fair.txt

T,T,T,T,T,H,H,H,H,T,H,T,H,H,H,T,H
T,T,T,T,T,T,T,H,H,T,H,H,H,H,H,H,T
T,T,T,T,T,T,T,H,H,H,T,T,H,T,H,T,T
T,T,T,T,T,T,H,H,T,T,H,H,T,H,H,H,T
T,T,T,T,T,T,H,T,H,T,T,T,T,H,H,H,H
T,T,T,T,T,H,T,T,T,T,H,T,H,H,H,H,T
T,T,T,T,T,H,H,H,T,H,T,H,T,H,T,T,T
T,T,T,T,T,T,T,H,H,T,H,H,H,T,H,H,T
T,T,T,T,T,T,T,T,T,T,T,H,H,T,T,H,T
T,T,T,T,T,H,H,H,H,H,H,T,T,H,H,H,T
T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H


In [42]:
!head -n 12 biased_coin/simulation_biased.txt

T,T,T,T,T,H,H,H,T,H,H,H,H,H,T,T,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,T,H,T,T,H,H,H,H,H,T,H,T,H,T,T,T,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,H,T,H,H,H,H,H,H,H,T,H,H,H,T,T,T,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,T,T,T,T,H,T,T,T,H,H,H,H,T,T,H,H,T,H,T,T,H,T,T,T,H,H,T
T,T,T,T,T,H,T,H,H,T,H,T,T,T,T,T,H,H,H,H,H,H,H,H,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,T,H,H,H,H,H,T,H,H,H,H,H,H,H,T,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,H,H,H,T,H,H,T,T,H,H,T,H,H,H,H,H,T,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,H,T,H,T,H,H,H,H,T,T,T,H,H,H,T,T,T,T,H,T,H,H,H,T,H,H
T,T,T,T,T,T,H,T,H,T,H,T,T,T,T,T,T,H,H,H,H,H,T,T,H,H,T,H,T,H,H,H,T
T,T,T,T,T,T,T,H,T,T,T,T,H,H,T,H,T,H,H,H,H,T,T,H,H,H,H,H,T,T,H,H,H
T,T,T,T,T,H,H,H,H,T,H,H,T,H,H,T,H,T,H,H,T,T,T,H,H,T,H,T,H,T,H,H,H
T,T,T,T,T,H,T,H,H,H,T,H,H,H,H,H,H,H,H,T,H,H,T,H,H,H,H,H,H,H,T,T,H


Consequently, passing a new value overwrites the cached `output_paths`:

In [43]:
output_paths = ["A", "B"]

battery_parallel_MC.simulate(models, simulation_configs, output_paths)

In [44]:
!head -n 12 A

T,T,T,T,T,H,H,H,H,T,H,T,H,H,H,T,H
T,T,T,T,T,T,T,H,H,T,H,H,H,H,H,H,T
T,T,T,T,T,T,T,H,H,H,T,T,H,T,H,T,T
T,T,T,T,T,T,H,H,T,T,H,H,T,H,H,H,T
T,T,T,T,T,T,H,T,H,T,T,T,T,H,H,H,H
T,T,T,T,T,H,T,T,T,T,H,T,H,H,H,H,T
T,T,T,T,T,H,H,H,T,H,T,H,T,H,T,T,T
T,T,T,T,T,T,T,H,H,T,H,H,H,T,H,H,T
T,T,T,T,T,T,T,T,T,T,T,H,H,T,T,H,T
T,T,T,T,T,H,H,H,H,H,H,T,T,H,H,H,T
T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,H,T,H,H,H,H,H


In [45]:
!head -n 12 B

T,T,T,T,T,H,H,H,T,H,H,H,H,H,T,T,H,H,H,H,H,H,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,H,H,T,H,T,H,T,T,H,H,H,H,H,T,H,T,H,T,T,T,T,H,H,H,H,H,H
T,T,T,T,T,T,H,H,H,T,H,H,H,H,H,H,H,T,H,H,H,T,T,T,H,H,H,H,H,H,H,H,H
T,T,T,T,T,T,T,T,T,T,H,T,T,T,H,H,H,H,T,T,H,H,T,H,T,T,H,T,T,T,H,H,T
T,T,T,T,T,H,T,H,H,T,H,T,T,T,T,T,H,H,H,H,H,H,H,H,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,T,H,H,H,H,H,T,H,H,H,H,H,H,H,T,T,H,H,H,H,H,H,H,H,H,H
T,T,T,T,T,H,H,H,T,H,H,T,T,H,H,T,H,H,H,H,H,T,H,T,H,H,T,H,H,H,H,H,H
T,T,T,T,T,T,T,H,T,H,T,H,H,H,H,T,T,T,H,H,H,T,T,T,T,H,T,H,H,H,T,H,H
T,T,T,T,T,T,H,T,H,T,H,T,T,T,T,T,T,H,H,H,H,H,T,T,H,H,T,H,T,H,H,H,T
T,T,T,T,T,T,T,H,T,T,T,T,H,H,T,H,T,H,H,H,H,T,T,H,H,H,H,H,T,T,H,H,H
T,T,T,T,T,H,H,H,H,T,H,H,T,H,H,T,H,T,H,H,T,T,T,H,H,T,H,T,H,T,H,H,H
T,T,T,T,T,H,T,H,H,H,T,H,H,H,H,H,H,H,H,T,H,H,T,H,H,H,H,H,H,H,T,T,H
