# Final Project Report

* Class: DS 5100
* Student Name: Tomas Tsega
* Student Net ID: hhn5nx@virginia.edu
* This URL: montecarlo_simulator_project

# Instructions

Follow the instructions in the Final Project instructions notebook and put evidence of your work in this notebook.

Total points for each subsection under **Deliverables** and **Scenarios** are given in parentheses.

Breakdowns of points within subsections are specified within subsection instructions as bulleted lists.

This project is worth **50 points**.

# Deliverables

## The Monte Carlo Module (10)

- URL included, appropriately named (1).
- Includes all three specified classes (3).
- Includes at least all 12 specified methods (6; .5 each).

Put the URL to your GitHub repo here.

Repo URL: https://github.com/tsegatomas/DS5100--tsegatomas-.git

Paste a copy of your module here.

NOTE: Paste as text, not as code. Use triple backticks to wrap your code blocks.

In [7]:
# A code block with your classes.

'''
import os
import numpy as np
import pandas as pd

class Die:
    """
    A class representing a die with N sides, each side having a unique symbol and weight.
    """
    def __init__(self, faces):
        if not isinstance(faces, np.ndarray):
            raise TypeError("Faces must be a NumPy array.")
        if len(faces) != len(set(faces)):
            raise ValueError("Faces must be distinct values.")
        
        self._faces = faces
        self._weights = np.ones(len(faces))  # Default weight of 1.0 for each face
        self._die_df = pd.DataFrame({
            'face': faces,
            'weight': self._weights
        }).set_index('face')

    def change_weight(self, face, new_weight):
        if face not in self._die_df.index:
            raise IndexError("Face value not found in die.")
        if not isinstance(new_weight, (int, float)) or new_weight < 0:
            raise TypeError("Weight must be a non-negative numeric value.")
        
        self._die_df.at[face, 'weight'] = float(new_weight)

    def roll(self, rolls=1):
        return self._die_df.sample(n=rolls, weights='weight', replace=True).index.tolist()
    
    def show(self):
        return self._die_df.copy()


class Game:
    """
    A class representing a game consisting of rolling one or more dice.
    """
    def __init__(self, dice):
        if not isinstance(dice, list) or not all(isinstance(die, Die) for die in dice):
            raise TypeError("All elements must be Die objects.")
        self.dice = dice
        self._results = None

    def play(self, rolls=1):
        results = {f"Die_{i}": die.roll(rolls) for i, die in enumerate(self.dice)}
        self._results = pd.DataFrame(results)
        self._results.index.name = 'Roll'

    def show(self, form='wide'):
        if self._results is None:
            raise ValueError("No results available. Please play the game first.")
        if form == 'wide':
            return self._results
        elif form == 'narrow':
            return self._results.stack().to_frame('Outcome')
        else:
            raise ValueError("Invalid form. Use 'wide' or 'narrow'.")


class Analyzer:
    """
    A class for analyzing the results of a game of dice.
    """
    def __init__(self, game):
        if not isinstance(game, Game):
            raise TypeError("Input must be a Game object.")
        self.game = game
        self.results = game.show(form='wide')

    def jackpot(self):
        return (self.results.nunique(axis=1) == 1).sum()

    def face_counts_per_roll(self):
        return self.results.apply(pd.Series.value_counts, axis=1).fillna(0)

    def combo_count(self):
        sorted_rolls = self.results.apply(lambda x: tuple(sorted(x)), axis=1)
        return sorted_rolls.value_counts().to_frame('Count')

    def permutation_count(self):
        perm_rolls = self.results.apply(lambda x: tuple(x), axis=1)
        return perm_rolls.value_counts().to_frame('Count')

    def scrabble_word_analysis(self, word_list):
        # Use a set for fast membership tests against a large vocabulary.
        vocabulary = set(word_list)
        # Join faces in rolled order; sorting the letters would scramble real words.
        rolled_words = self.results.apply(lambda x: ''.join(map(str, x)), axis=1)
        valid_words = [word in vocabulary for word in rolled_words]

        return pd.DataFrame({'Roll': self.results.index, 'Is_Valid_Word': valid_words})

    def letter_frequency_analysis(self, letter_frequencies):
        # Count each letter per die, then sum across dice to get one total per letter.
        rolled_letter_counts = self.results.apply(pd.Series.value_counts).fillna(0).sum(axis=1)
        comparison = pd.DataFrame({'Rolled_Count': rolled_letter_counts})
        comparison['Expected_Frequency'] = comparison.index.map(letter_frequencies)
        comparison['Expected_Frequency'] = comparison['Expected_Frequency'].fillna(0)
        return comparison


# Adjust file paths for scrabble words and English letter frequencies
current_dir = os.path.dirname(__file__)
scrabble_words_path = os.path.join(current_dir, 'scrabble_words.txt')
english_letters_path = os.path.join(current_dir, 'english_letters.txt')

# Load scrabble words
try:
    with open(scrabble_words_path) as f:
        scrabble_words = [line.strip() for line in f]
except FileNotFoundError:
    raise FileNotFoundError(f"Could not find scrabble_words.txt at {scrabble_words_path}")

# Load English letter frequencies
try:
    letter_frequencies = {}
    with open(english_letters_path) as f:
        for line in f:
            letter, frequency = line.strip().split()
            letter_frequencies[letter] = float(frequency)
except FileNotFoundError:
    raise FileNotFoundError(f"Could not find english_letters.txt at {english_letters_path}")
'''
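
The `combo_count` and `permutation_count` methods above differ only in whether the order of faces within a roll matters. The sketch below illustrates that distinction using the standard library only. The `rolls` list is made-up sample data, and the code is a stand-in, not the module's pandas implementation.

```python
from collections import Counter

# Hypothetical results of four rolls of a two-dice game (illustrative data only).
rolls = [("A", "B"), ("B", "A"), ("A", "A"), ("A", "B")]

# Permutations: order matters, so ("A", "B") and ("B", "A") are distinct.
permutation_counts = Counter(rolls)

# Combinations: order is ignored, so each roll is sorted before counting.
combo_counts = Counter(tuple(sorted(roll)) for roll in rolls)

print(permutation_counts)
print(combo_counts)
```

With this sample data, the permutation count keeps three distinct outcomes while the combination count merges `("A", "B")` and `("B", "A")` into one.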

## Unittest Module (2)

Paste a copy of your test module below.

NOTE: Paste as text, not as code. Use triple backticks to wrap your code blocks.

- All methods have at least one test method (1).
- Each method employs one of Unittest's Assert methods (1).

In [2]:
# A code block with your test code.
'''
import unittest
import numpy as np
import pandas as pd
from montecarlo_simulator.montecarlo import Die, Game, Analyzer

class TestDie(unittest.TestCase):
    def setUp(self):
        self.faces = np.array([1, 2, 3, 4, 5, 6])
        self.die = Die(self.faces)

    def test_initialization(self):
        with self.assertRaises(TypeError):
            Die([1, 2, 3])  # Not a NumPy array
        with self.assertRaises(ValueError):
            Die(np.array([1, 1, 2]))  # Non-unique faces

    def test_change_weight(self):
        self.die.change_weight(1, 3.0)
        self.assertEqual(self.die.show().at[1, 'weight'], 3.0)
        with self.assertRaises(IndexError):
            self.die.change_weight(7, 2.0)  # Face does not exist
        with self.assertRaises(TypeError):
            self.die.change_weight(1, 'high')  # Invalid weight type

    def test_roll(self):
        outcomes = self.die.roll(10)
        self.assertEqual(len(outcomes), 10)
        self.assertTrue(all(face in self.faces for face in outcomes))

    def test_show(self):
        state = self.die.show()
        self.assertEqual(state.shape, (6, 1))  # 6 faces, 1 weight column


class TestGame(unittest.TestCase):
    def setUp(self):
        self.die1 = Die(np.array(['A', 'B', 'C', 'D', 'E', 'F']))
        self.die2 = Die(np.array(['A', 'B', 'C', 'D', 'E', 'F']))
        self.game = Game([self.die1, self.die2])

    def test_initialization(self):
        with self.assertRaises(TypeError):
            Game(['not_a_die'])  # Invalid list of Die objects

    def test_play(self):
        self.game.play(5)
        results = self.game.show()
        self.assertEqual(results.shape, (5, 2))  # 5 rolls, 2 dice

    def test_show(self):
        self.game.play(5)
        wide_results = self.game.show('wide')
        self.assertEqual(wide_results.shape, (5, 2))
        narrow_results = self.game.show('narrow')
        self.assertEqual(narrow_results.shape, (10, 1))  # 5 rolls * 2 dice
        with self.assertRaises(ValueError):
            self.game.show('invalid')  # Invalid form argument


class TestAnalyzer(unittest.TestCase):
    def setUp(self):
        die1 = Die(np.array(['A', 'B', 'C']))
        die2 = Die(np.array(['A', 'B', 'C']))
        self.game = Game([die1, die2])
        self.game.play(5)
        self.analyzer = Analyzer(self.game)

    def test_initialization(self):
        with self.assertRaises(TypeError):
            Analyzer('not_a_game')  # Not a Game object

    def test_jackpot(self):
        jackpots = self.analyzer.jackpot()
        self.assertIsInstance(jackpots, (int, np.integer))
        self.assertGreaterEqual(jackpots, 0)

    def test_face_counts_per_roll(self):
        face_counts = self.analyzer.face_counts_per_roll()
        self.assertEqual(face_counts.shape[0], 5)  # 5 rolls
        self.assertTrue(all(col in ['A', 'B', 'C'] for col in face_counts.columns))

    def test_combo_count(self):
        combo_counts = self.analyzer.combo_count()
        self.assertIsInstance(combo_counts, pd.DataFrame)

    def test_permutation_count(self):
        permutation_counts = self.analyzer.permutation_count()
        self.assertIsInstance(permutation_counts, pd.DataFrame)

    def test_scrabble_word_analysis(self):
        scrabble_words = ['AA', 'BB', 'CC']
        analysis = self.analyzer.scrabble_word_analysis(scrabble_words)
        self.assertEqual(analysis.shape[0], 5)  # 5 rolls
        self.assertIn('Is_Valid_Word', analysis.columns)

    def test_letter_frequency_analysis(self):
        letter_frequencies = {'A': 8, 'B': 3, 'C': 5}
        frequency_analysis = self.analyzer.letter_frequency_analysis(letter_frequencies)
        self.assertTrue(all(col in ['Rolled_Count', 'Expected_Frequency'] for col in frequency_analysis.columns))


if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False)

'''

## Unittest Results (3)

Put a copy of the results of running your tests from the command line here.

Again, paste as text using triple backticks.

- All 12 specified methods return OK (3; .25 each).

## Import (1)

Import your module here. This import should refer to the code in your package directory.

- Module successfully imported (1).

In [6]:
# e.g. import montecarlo.montecarlo 


## Help Docs (4)

Show your docstring documentation by applying `help()` to your imported module.

- All methods have a docstring (3; .25 each).
- All classes have a docstring (1; .33 each).

In [4]:
# help(montecarlo)

## `README.md` File (3)

Provide link to the README.md file of your project's repo.

- Metadata section or info present (1).
- Synopsis section showing how each class is called (1). (All must be included.)
- API section listing all classes and methods (1). (All must be included.)

URL:

## Successful installation (2)

Put a screenshot or paste a copy of a terminal session where you successfully install your module with pip.

If pasting text, use a preformatted text block to show the results.

- Installed with `pip` (1).
- Successfully installed message appears (1).

# Scenarios

Use code blocks to perform the tasks for each scenario.

Be sure the outputs are visible before submitting.

## Scenario 1: A 2-headed Coin (9)

Task 1. Create a fair coin (with faces $H$ and $T$) and one unfair coin in which one of the faces has a weight of $5$ and the others $1$.

- Fair coin created (1).
- Unfair coin created with weight as specified (1).

Task 2. Play a game of $1000$ flips with two fair dice.

- Play method called correctly and without error (1).

Task 3. Play another game (using a new Game object) of $1000$ flips, this time using two unfair dice and one fair die. For the second unfair die, you can use the same die object twice in the list of dice you pass to the Game object.

- New game object created (1).
- Play method called correctly and without error (1).

Task 4. For each game, use an Analyzer object to determine the raw frequency of jackpots — i.e. getting either all $H$s or all $T$s.

- Analyzer objects instantiated for both games (1).
- Raw frequencies reported for both (1).

Task 5. For each analyzer, compute relative frequency as the number of jackpots over the total number of rolls.

- Both relative frequencies computed (1).

Task 6. Show your results, comparing the two relative frequencies, in a simple bar chart.

- Bar chart plotted and correct (1).
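
The tasks in this scenario can be sketched without the module, as a rough check on what the `Die`, `Game`, and `Analyzer` calls should produce. This is a minimal standard-library stand-in: the `play` and `jackpot_count` helpers, the `(faces, weights)` representation, and the seed are all assumptions made for illustration, and the bar chart from Task 6 is not drawn here.

```python
import random

def play(dice, n_rolls, rng):
    """Roll every die n_rolls times; each die is a (faces, weights) pair."""
    columns = [rng.choices(faces, weights=weights, k=n_rolls)
               for faces, weights in dice]
    # Transpose so that each tuple holds one roll of all dice.
    return list(zip(*columns))

def jackpot_count(rolls):
    """Count rolls in which every die shows the same face."""
    return sum(1 for roll in rolls if len(set(roll)) == 1)

rng = random.Random(5100)  # arbitrary fixed seed for repeatability

fair_coin = (["H", "T"], [1, 1])
unfair_coin = (["H", "T"], [5, 1])  # H weighted 5, T weighted 1

n_flips = 1000
fair_rolls = play([fair_coin, fair_coin], n_flips, rng)
unfair_rolls = play([unfair_coin, unfair_coin, fair_coin], n_flips, rng)

# Relative frequency = raw jackpot count / number of rolls.
fair_rel = jackpot_count(fair_rolls) / n_flips
unfair_rel = jackpot_count(unfair_rolls) / n_flips
print(f"fair game:   {fair_rel:.3f}")
print(f"unfair game: {unfair_rel:.3f}")
```

For comparison, the theoretical jackpot probabilities are $1/2$ for two fair coins and $(5/6)^2 \cdot (1/2) + (1/6)^2 \cdot (1/2) = 13/36 \approx 0.361$ for two unfair coins plus one fair coin, so the simulated relative frequencies should land near those values.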

## Scenario 2: A 6-sided Die (9)

Task 1. Create three dice, each with six sides having the faces 1 through 6.

- Three die objects created (1).

Task 2. Convert one of the dice to an unfair one by weighting the face $6$ five times more than the other weights (i.e. it has weight of 5 and the others a weight of 1 each).

- Unfair die created with proper call to weight change method (1).

Task 3. Convert another of the dice to be unfair by weighting the face $1$ five times more than the others.

- Unfair die created with proper call to weight change method (1).

Task 4. Play a game of $10000$ rolls with $5$ fair dice.

- Game class properly instantiated (1). 
- Play method called properly (1).

Task 5. Play another game of $10000$ rolls, this time with $2$ unfair dice, one as defined in steps #2 and #3 respectively, and $3$ fair dice.

- Game class properly instantiated (1). 
- Play method called properly (1).

Task 6. For each game, use an Analyzer object to determine the relative frequency of jackpots and show your results, comparing the two relative frequencies, in a simple bar chart.

- Jackpot methods called (1).
- Graph produced (1).
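
As a sanity check on the simulated relative frequencies in Task 6, the exact jackpot probability of each game can be worked out directly. The sketch below does this with exact fractions. The `jackpot_probability` helper and the face-to-weight dictionaries are hypothetical and are not part of the module.

```python
from fractions import Fraction

def jackpot_probability(dice):
    """Exact probability that every die in `dice` shows the same face.

    Each die is a dict mapping face -> weight (a hypothetical representation).
    """
    total = Fraction(0)
    for face in dice[0]:
        p_all_show_face = Fraction(1)
        for die in dice:
            # Probability of a face is its weight over the die's total weight.
            p_all_show_face *= Fraction(die.get(face, 0), sum(die.values()))
        total += p_all_show_face
    return total

fair = {face: 1 for face in range(1, 7)}
heavy_six = {**fair, 6: 5}  # Task 2: face 6 has weight 5
heavy_one = {**fair, 1: 5}  # Task 3: face 1 has weight 5

p_fair_game = jackpot_probability([fair] * 5)
p_unfair_game = jackpot_probability([heavy_six, heavy_one, fair, fair, fair])

print(p_fair_game, float(p_fair_game))
print(p_unfair_game, float(p_unfair_game))
```

The fair game comes out to $1/1296$ and the unfair game to $7/10800$. Over $10000$ rolls that is only about 7.7 and 6.5 expected jackpots respectively, so the simulated frequencies will be noisy and the two bars may appear in either order on a given run.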

## Scenario 3: Letters of the Alphabet (7)

Task 1. Create a "die" of letters from $A$ to $Z$ with weights based on their frequency of usage as found in the data file `english_letters.txt`. Use the frequencies (i.e. raw counts) as weights.

- Die correctly instantiated with source file data (1).
- Weights properly applied using weight setting method (1).

Task 2. Play a game involving $4$ of these dice with $1000$ rolls.

- Game play method properly called (1).

Task 3. Determine how many permutations in your results are actual English words, based on the vocabulary found in `scrabble_words.txt`.

- Use permutation method (1).
- Get count as difference between permutations and vocabulary (1).

Task 4. Repeat steps #2 and #3, this time with $5$ dice. How many actual words does this produce? Which produces more?

- Successfully repeats steps (1).
- Identifies parameter with most found words (1).
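
The steps in this scenario can be sketched as follows, using the standard library only. The inline `letters_text` and `vocabulary` values are made-up stand-ins for `english_letters.txt` and `scrabble_words.txt`, the "letter, whitespace, count" line format is assumed from the module's loading code, and `count_words` is a hypothetical helper, not a method of the module.

```python
import random

# Hypothetical excerpts standing in for english_letters.txt and
# scrabble_words.txt; the counts and words here are made up for illustration.
letters_text = """E 12
T 9
A 8
S 6
R 6
N 7"""
vocabulary = {"EARN", "NEAR", "RATE", "SEAT", "TEAR", "STARE", "EATEN"}

# Parse "LETTER COUNT" lines into parallel lists of faces and weights.
faces, weights = [], []
for line in letters_text.splitlines():
    letter, count = line.split()
    faces.append(letter)
    weights.append(float(count))

def count_words(n_dice, n_rolls, rng):
    """Roll n_dice letter dice n_rolls times and count distinct rolled words."""
    columns = [rng.choices(faces, weights=weights, k=n_rolls)
               for _ in range(n_dice)]
    permutations = {"".join(roll) for roll in zip(*columns)}
    # Distinct permutations that also appear in the vocabulary.
    return len(permutations & vocabulary)

rng = random.Random(5100)  # arbitrary fixed seed for repeatability
for n_dice in (4, 5):
    print(n_dice, "dice:", count_words(n_dice, 1000, rng))
```

Intersecting the set of rolled permutations with a vocabulary set keeps the lookup fast even for the full word list. Counting every matching roll rather than distinct words would be an equally reasonable reading of Task 3.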