##### The Game of Life: Analyzing the Impact of Initial Conditions on Game Duration 
- This project simulates John Conway's Game of Life across different matrix sizes and initial percentages of alive cells.
- The primary goal is to understand how these starting conditions affect the behavior of the simulation, including its duration and the end state of the game.
- By comparing groups of runs, the project tests whether differences in starting conditions produce statistically significant differences in outcomes.


#### Project File Contents
-  Module Folders
    - There are three module folders in the project: analysis, simulation, and visualization. Each contains the following functions:
      - analysis
        - cohens_f(f_stat, df_groups, df_obs) 
          - Calculates Cohen's f, a measure of effect size for ANOVA. Takes the F-statistic and degrees of freedom for groups and observations to compute the proportion of variance explained and return the corresponding effect size. 
    
        - anova_analysis(df) 
          - Performs one-way ANOVA for each unique matrix size in the dataset to test if the mean number of generations differs significantly between initial percent alive groups. Also calculates Cohen’s f for each matrix size and returns a summary DataFrame containing the F-statistic, p-value, and Cohen’s f. 

        - calc_matrix_size_and_initial_percent_alive_group_stats(df) 
          - Computes descriptive statistics including mean, standard deviation, standard error, and 95% confidence interval for the number of generations per matrix size and initial percent alive group based on simulation results. Returns a sorted DataFrame by confidence interval width. 

        - determine_sample_size(df) 
          - Uses the Cohen's f values from ANOVA results to calculate the minimum sample size required for each matrix size group to detect an effect with 90% power at a 5% significance level using F-test power analysis. Returns a dictionary mapping matrix size to required sample size. 
    
        - required_n_per_group(grouped_stats, min_sample_size, half_width=10, confidence=0.95)
          - Estimates the sample size each group needs for its confidence interval half-width to be no larger than a specified value (default: ±10), given the group's standard deviation and the desired confidence level. Returns a dictionary keyed by matrix size and initial percent alive, with the required sample size as each value. 
          - Selects the appropriate critical value based on the current sample size: 
            - If n >= 30, the z-statistic is used. 
            - If n < 30, the t-statistic with appropriate degrees of freedom is used. 
        - sample_v_final_stat_comparison(sample, final) 
          - Compares group statistics between sample and final simulation results. Calculates and returns deltas for means, standard deviations, standard errors, and confidence interval widths to observe and evaluate the impact of using optimized sample sizes. 
      - simulation
        - generate_rand_matrix(rows, cols, p_ones) 
          - Creates a binary matrix of size (rows × cols) where a proportion p_ones of cells are alive (1) and the rest are dead (0), randomly distributed. Used to simulate initial conditions. 
        - get_neighbors(mat) 
          - Builds a dictionary mapping each cell’s coordinates to a list of its neighbors' values. 
        - next_gen(mat) 
          - Generates the next state of the matrix using Conway’s Game of Life rules. Applies birth, survival, and death logic based on the sum of each cell's neighbors and returns the new matrix. 
        - matrix_data_collection(mat, gen_num, run_num, collection_dict, alive_percent, matrix_size) 
          - Records the number of alive cells and simulation description data at a given generation. Updates a shared dictionary with values for later aggregation and analysis. 
        - run_game(mat, run_num, matrix_size, master_data_collection_dict, alive_percent, max_steps=1000) 
          - Simulates Conway’s Game of Life from a given matrix until it reaches a steady state, all cells die, or the maximum number of steps is reached. Logs generation-level and run-level data into a shared dictionary. 
        - sample_runs(number_of_sample_per_matrix_size_and_percent_alive, max_steps) 
          - Executes a fixed number of simulations for every combination of matrix size (10×10 to 100×100) and initial alive percentage (5% to 95%). Returns a DataFrame with merged generation and run data across all trials. 
        - final_run(dicts, max_steps) 
          - Runs simulations using optimized sample sizes per condition (matrix size and initial percentage alive), as determined from prior analysis. Returns a DataFrame of merged results for final evaluation.

      - visualization
        - show_end_state_circle_chart(df) 
          - Creates a donut-style pie chart showing the distribution of how simulations terminated: 
            - dead – all cells died out
            - steady – reached a steady state
            - max – hit the maximum number of allowed generations 
        - show_random_sample_runs(df) 
          - Selects 6 random simulation runs with more than one generation and plots line charts of alive cell counts over generations. Subplots are labeled with matrix size, initial alive percentage, and termination reason. 
        - show_heat_map(df)
          - Generates a heatmap showing the average number of generations until termination for each combination of matrix size and initial percent alive group: 
            - matrix size (y-axis) 
            - Initial alive cell percentage (x-axis) 
        - show_initial_percent_alive_ci_avg(df) 
          - Calculates and plots the average 95% confidence interval width for each matrix size and initial percent alive group. Gives a sense of the variability across trials in each group. 
        - show_ci_delta(df, top_bottom, n) 
          - Plots a bar chart of the top or bottom n matrix size and initial alive percentage groups, ranked by the largest positive or negative change in confidence interval width between the sample and final simulations.
        - difference_in_average_ci_percent_alive_groups(df, df2) 
          - Compares the average confidence interval width of the sample outputs (df) with that of the final outputs (df2). 
          - Plots a bar chart for ease of comparison, along with a table of results.

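The generation rule and the three termination conditions described for the simulation module can be sketched in plain Python. This is a minimal illustration, not the project's code: the names `random_grid`, `step`, and `run` are hypothetical, cells beyond the edge of the grid are assumed to be dead (no wraparound), and a "steady state" is assumed to mean a grid that is identical from one generation to the next.

```python
import random


def random_grid(rows, cols, p_alive, rng):
    """Return a rows x cols grid with round(rows * cols * p_alive) alive cells."""
    n_alive = round(rows * cols * p_alive)
    cells = [1] * n_alive + [0] * (rows * cols - n_alive)
    rng.shuffle(cells)
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]


def step(grid):
    """Apply Conway's rules once; cells outside the grid count as dead (assumption)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            if grid[r][c] == 1 and total in (2, 3):
                new[r][c] = 1  # survival
            elif grid[r][c] == 0 and total == 3:
                new[r][c] = 1  # birth
    return new


def run(grid, max_steps=1000):
    """Return (generations, end_state), where end_state is 'dead', 'steady', or 'max'."""
    for gen in range(max_steps):
        if not any(any(row) for row in grid):
            return gen, "dead"
        nxt = step(grid)
        if nxt == grid:  # assumed definition of a steady state
            return gen, "steady"
        grid = nxt
    return max_steps, "max"


if __name__ == "__main__":
    start = random_grid(10, 10, 0.3, random.Random(0))
    print(run(start, max_steps=100))
```

Under these assumptions an oscillating pattern never counts as steady, so it runs until the step limit; the project's own `run_game()` may define a steady state differently.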
- simulation_run_notebook
  - A Jupyter Notebook that walks through the full workflow of the project:
    - Generates the initial set of simulation runs using sample_runs()
    - Analyzes results and determines optimal sample sizes per group using calc_matrix_size_and_initial_percent_alive_group_stats() and required_n_per_group()
    - Executes the final simulations with final_run()
    - Optionally visualizes and compares initial and final simulation results using functions from the visualization module
    
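The per-group statistics used in the analysis step (mean, standard deviation, standard error, and 95% confidence interval) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: `group_stats` and its tuple-based input are hypothetical, and a normal critical value is used for every group regardless of its size.

```python
import math
import statistics
from collections import defaultdict


def group_stats(runs, confidence=0.95):
    """Summarize generations per (matrix_size, percent_alive) group.

    runs: iterable of (matrix_size, percent_alive, generations) tuples
    (a hypothetical input format chosen for this sketch).
    """
    # Two-sided normal critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)

    groups = defaultdict(list)
    for size, pct, gens in runs:
        groups[(size, pct)].append(gens)

    out = {}
    for key, values in groups.items():
        n = len(values)
        mean = statistics.fmean(values)
        # Sample standard deviation is undefined for n == 1; report 0.0 here.
        sd = statistics.stdev(values) if n > 1 else 0.0
        se = sd / math.sqrt(n)
        out[key] = {
            "n": n,
            "mean": mean,
            "sd": sd,
            "se": se,
            "ci_low": mean - z * se,
            "ci_high": mean + z * se,
            "ci_width": 2 * z * se,
        }
    return out


if __name__ == "__main__":
    demo = [(10, 0.5, 10), (10, 0.5, 20), (10, 0.5, 30)]
    print(group_stats(demo)[(10, 0.5)])
```

Sorting the resulting groups by `ci_width` shows which starting conditions produce the most variable run lengths and therefore need the most additional runs.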

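For reference, Cohen's f can be recovered from a one-way ANOVA F-statistic by first computing the proportion of variance explained (eta squared). The sketch below uses that standard identity; the name `cohens_f_from_f` is hypothetical, `df_obs` is assumed to be the within-group (error) degrees of freedom, and whether the project's `cohens_f()` uses exactly this form is an assumption.

```python
import math


def cohens_f_from_f(f_stat, df_groups, df_obs):
    """Convert an ANOVA F-statistic to Cohen's f.

    eta_sq = (F * df_groups) / (F * df_groups + df_obs)
    f      = sqrt(eta_sq / (1 - eta_sq))

    df_obs is assumed to be the within-group (error) degrees of freedom.
    """
    eta_sq = (f_stat * df_groups) / (f_stat * df_groups + df_obs)
    return math.sqrt(eta_sq / (1 - eta_sq))


if __name__ == "__main__":
    # F = 4 with 2 and 32 degrees of freedom: eta_sq = 8 / 40 = 0.2
    print(cohens_f_from_f(4.0, 2, 32))
```

By Cohen's conventional benchmarks, f values near 0.10, 0.25, and 0.40 are read as small, medium, and large effects.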
##### How to Execute the Program
1. Download the Project
    - Ensure the simulation, analysis and visualization modules are downloaded into your directory
2. Create a new virtual environment using either venv or conda
    - Ensure Python 3.13.1 is used in your environment
3. Install all required Python packages using the requirements.txt file
    - `pip install -r requirements.txt`
4. Import the following essential functions from their respective modules into either a Jupyter Notebook or a Python file
    - `from simulation import generate_rand_matrix, get_neighbors, next_gen, matrix_data_collection, run_game, sample_runs, final_run`
    - `from analysis import calc_matrix_size_and_initial_percent_alive_group_stats, required_n_per_group`
5. Running the Simulation
    - Initial Sampling: 
      - Call sample_runs() with your desired number of samples per group and the maximum number of generations for each run. 
      - `sample_data = sample_runs(number_of_sample_per_matrix_size_and_percent_alive=10, max_steps=100)`
    - Calculate Group Statistics: 
      - Pass the simulation results to calc_matrix_size_and_initial_percent_alive_group_stats() to compute each group's mean, standard deviation, standard error, and 95% confidence interval. 
      - `stats = calc_matrix_size_and_initial_percent_alive_group_stats(sample_data)` 
    - Determine Sample Sizes: 
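      - Feed the statistics into required_n_per_group() to calculate how many runs each group needs to tighten its confidence interval to an acceptable width. Per the signature listed above, the function also expects a `min_sample_size` argument. 
      - `required_sample_sizes = required_n_per_group(stats, min_sample_size)`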
    - Final Simulation: 
      - Run final_run() with the optimized sample sizes and a maximum number of generations to generate the final output. 
      - `final_data = final_run(required_sample_sizes, max_steps=100)`
6. Optional: Visualizations and Analysis
    - You can use any of the available visualization and analysis functions by passing in the outputs from either the sample or final simulation. While these are not essential to running the simulation, they are useful for exploring and presenting the results. 

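The sample-size step rests on the usual relationship between a confidence interval's half-width and the number of runs: `n = (z * sd / half_width) ** 2`, rounded up. The sketch below shows only the large-sample (z) case; the name `required_n` is hypothetical, and treating the minimum sample size as a simple floor is an assumption about how `required_n_per_group()` uses its `min_sample_size` argument.

```python
import math
import statistics


def required_n(sd, half_width=10, confidence=0.95, min_n=2):
    """Runs needed so the CI half-width is at most `half_width`.

    Uses the normal critical value only; the t-based branch for small
    samples is not reproduced here. `min_n` acts as a floor (assumption).
    """
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    n = math.ceil((z * sd / half_width) ** 2)
    return max(n, min_n)


if __name__ == "__main__":
    # A group whose run lengths have a standard deviation of 50 generations
    # needs roughly 97 runs for a half-width of 10 at 95% confidence.
    print(required_n(sd=50, half_width=10))
```

Because the required count grows with the square of the standard deviation, the most variable groups dominate the total number of runs in the final simulation.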
