# 1. Population Genetics Simulation

Create a program that simulates how an allele's frequency changes in a finite diploid population over a given number of generations.

The program takes as input the initial allele frequency, the fitness of each genotype, the population size, and the number of generations. Because the simulations are stochastic, each run gives a different result. To show how the allele frequency behaves, your program should repeat the simulation many times for each parameter set and plot all the runs in a single graph. The number of simulations should also be set by the user. You can start your program using the variable definitions in the cell below.
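
One common way to formalize a single generation is sketched below. It assumes a Wright-Fisher model with random mating and viability selection. The symbols $p$, $q$, $w$, $N$, and $X$ are notation introduced here for illustration and are not part of the assignment text.

$$
\bar{w} = w_{AA}\,p^{2} + 2\,w_{Aa}\,p\,q + w_{aa}\,q^{2}, \qquad
p^{*} = \frac{w_{AA}\,p^{2} + w_{Aa}\,p\,q}{\bar{w}}, \qquad
p' = \frac{X}{2N}, \quad X \sim \mathrm{Binomial}(2N,\ p^{*})
$$

Here $p$ is the current frequency of allele A, $q = 1 - p$, $\bar{w}$ is the mean fitness, $p^{*}$ is the expected frequency of A after selection, and $p'$ is the frequency in the next generation after sampling the $2N$ allele copies carried by $N$ diploid individuals. With all fitness values equal, $p^{*} = p$ and only the random sampling step changes the frequency.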

Your program should output two graphs. The first should show the allele frequency at each generation, and the other should be a histogram with the final values of the allele frequency. Something like this:

![simulation](Sim1.png)

![histogram](Sim2.png)

Last year a student used this homework as the starting point for her project to create a population genetics simulator for BIOL040. You can see the final project here: http://dna.pomona.edu:5006/pop_gen_sim

In [154]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px


def a_simulator(number_of_generations, popsize, simulations, WAA, WAa, Waa, frequency_A):
    """Simulate the frequency of allele A under selection and drift, then plot it.

    Takes the number of generations, the population size, the number of
    simulations, the fitness of each genotype (WAA, WAa, Waa), and the initial
    frequency of A. Shows a line graph of the allele frequency at each
    generation and a histogram of the final allele frequencies.
    """
    simulation_results = []
    for sim in range(simulations):
        allele_freqs = [frequency_A] 
        for generation in range(1, number_of_generations):
            FA = allele_freqs[-1]
            Fa = 1 - FA
            
            # Genotype frequencies
            fAA = FA**2
            fAa = 2 * FA * Fa
            faa = Fa**2
            
            # Mean fitness of the population
            M = WAA * fAA + WAa * fAa + Waa * faa

            # Genotype frequencies after selection
            ffAA = (WAA * fAA) / M
            ffAa = (WAa * fAa) / M
            ffaa = (Waa * faa) / M
            
            # expected allele frequencies
            FA_freq = ffAA + (0.5 * ffAa)
            
            # Model drift with a binomial draw. A diploid population of
            # popsize individuals carries 2 * popsize allele copies.
            next_gen_freq = np.random.binomial(2 * popsize, FA_freq) / (2 * popsize)
            allele_freqs.append(next_gen_freq)
        simulation_results.append(allele_freqs)

    ##
    # FIGURES
    ##
        
    fig = go.Figure()

    # make a line for each sim
    for i, freqs in enumerate(simulation_results):
        fig.add_trace(go.Scatter(
            x=list(range(number_of_generations)),
            y=freqs,
            mode='lines',
            name=f"Simulation {i+1}"
        ))

    # formatting: label the axes and use a white background
    fig.update_layout(
        xaxis_title='Generation',
        yaxis_title='Frequency of A',
        plot_bgcolor='white',
        width=800,
        height=500
    )
    fig.show()


    # Histogram
    final_frequencies = [freqs[-1] for freqs in simulation_results]  # take the last freq from each simulation
    df_hist = pd.DataFrame({'A Frequency': final_frequencies})  # create df with final freqs
    fig_hist = px.histogram(df_hist, x='A Frequency', nbins=30)
    fig_hist.update_layout(
        width=800,
        height=500,
        plot_bgcolor='white'
    )
    fig_hist.show()

# Call the function with all fitness values equal to 1 (no selection, drift only)
a_simulator(100, 1000, 80, 1, 1, 1, 0.5)
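
For checking the numbers without opening any figures, it can help to separate the simulation from the plotting. The block below is a minimal sketch of that idea, not the notebook's own method: `simulate_allele_freqs` is a hypothetical helper, the `seed` parameter is an added assumption for reproducibility, and it assumes a diploid population (2 * popsize allele copies) with at least one positive fitness value so the mean fitness is never zero. It runs all replicates at once with NumPy arrays instead of looping over simulations.

```python
import numpy as np


def simulate_allele_freqs(generations, popsize, simulations,
                          w_AA, w_Aa, w_aa, freq_A, seed=None):
    """Return an array of shape (simulations, generations) with the frequency of A.

    Hypothetical helper for illustration: no plotting, all replicates at once.
    """
    rng = np.random.default_rng(seed)
    n_copies = 2 * popsize  # assumption: diploid, so 2N allele copies

    freqs = np.empty((simulations, generations))
    freqs[:, 0] = freq_A  # every replicate starts at the same frequency

    for t in range(1, generations):
        p = freqs[:, t - 1]
        q = 1.0 - p
        # Mean fitness under random mating (Hardy-Weinberg genotype frequencies)
        mean_fitness = w_AA * p**2 + w_Aa * 2 * p * q + w_aa * q**2
        # Expected frequency of A after selection
        expected_p = (w_AA * p**2 + w_Aa * p * q) / mean_fitness
        # Drift: one binomial draw per replicate
        freqs[:, t] = rng.binomial(n_copies, expected_p) / n_copies

    return freqs


if __name__ == "__main__":
    result = simulate_allele_freqs(100, 1000, 80, 1, 1, 1, 0.5, seed=42)
    print(result.shape)            # one row per simulation, one column per generation
    print(result[:, -1].mean())    # average final frequency across replicates
```

Each row of the returned array can be passed to `go.Scatter` as `y`, and the last column (`result[:, -1]`) holds the values for the histogram.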