# Generating Data
Let's say you want to generate synthetic data for the number of new visitors to an Emergency Department over time, based on the mean number of visitors.  
For that we need to:
- Create the date range
- Create the random data based on the Poisson distribution

In [None]:
import pandas as pd
import numpy as np

### The date range
Use the `date_range()` function; see the [docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html).


Be aware of American (month-first) date formatting: the end date below is read as 8 January 2018, not 1 August 2018.

In [None]:
pd.date_range(start='1/1/2018', end='1/08/2018')
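
To make the month-first behaviour concrete, here is a small sketch that parses the same ambiguous string in two ways. The `dayfirst=True` argument of `pd.to_datetime` is shown only as one possible way to get the day-first reading; it is not part of the original notebook.

```python
import pandas as pd

# Month-first (American) parsing is the default for ambiguous strings.
us_style = pd.Timestamp("1/08/2018")
print(us_style.month, us_style.day)

# dayfirst=True asks pandas to read the same string as 1 August 2018.
day_first = pd.to_datetime("1/08/2018", dayfirst=True)
print(day_first.month, day_first.day)

# A year-first (ISO 8601) string has only one possible reading.
iso = pd.Timestamp("2018-01-08")
print(iso == us_style)
```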

I would strongly suggest that you use the year-first format (`YYYY-MM-DD`).

In [None]:
pd.date_range(start='2023-01-01', end='2023-01-04')

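You can use `freq` to say that we want hourly data. (pandas 2.2 and later prefer the lowercase alias `"h"`; the uppercase `"H"` used here still works there but raises a deprecation warning.)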

In [None]:
pd.date_range(start='2023-01-01', end='2023-01-04', freq="H")

Let's store this in a variable that I will make the index of my DataFrame.

In [None]:
idx = pd.date_range(start='2023-01-01', end='2023-01-04', freq="H")
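
One detail worth checking is the length of this index: `date_range` includes both endpoints, so three days of hourly data gives 73 timestamps rather than 72. The sketch below shows this, along with `periods` as one possible way to get exactly three full days. Passing `pd.Timedelta(hours=1)` as `freq` is a choice made here to avoid differences in alias spelling between pandas versions; the variable names are illustrative.

```python
import pandas as pd

one_hour = pd.Timedelta(hours=1)

# Both endpoints are included: 3 days * 24 hours + 1 closing timestamp.
idx_inclusive = pd.date_range(start="2023-01-01", end="2023-01-04", freq=one_hour)
print(len(idx_inclusive))

# Asking for a number of periods instead gives exactly three full days.
idx_three_days = pd.date_range(start="2023-01-01", periods=3 * 24, freq=one_hour)
print(len(idx_three_days), idx_three_days[-1])
```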

OK, let's generate some random data (the same length as `idx`).


In [None]:
rng = np.random.default_rng()
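
Because `default_rng()` without arguments is seeded unpredictably, every run of the notebook produces different numbers. If repeatable output is wanted, a seed can be passed. The sketch below illustrates this; the seed value 42 is an arbitrary choice.

```python
import numpy as np

# Two generators created with the same seed yield identical draws.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

draws_a = rng_a.random(5)
draws_b = rng_b.random(5)
print(np.array_equal(draws_a, draws_b))
```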

In [None]:
df = pd.DataFrame(index=idx, data=rng.random(len(idx)), columns=["admissions"])
df.head(3)

OK, that does not look like proper data (hard to treat 0.3 of a person :) ).  
We need a discrete set of random numbers that match a given mean, so we use the *Poisson distribution*.  

See the [docs](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.poisson.html).  
More information: https://en.wikipedia.org/wiki/Poisson_distribution


In [None]:
mean_admissions = 15


We will use the Poisson distribution:  
`poisson(lam=the_mean, size=number_to_return)`

In [None]:
df = pd.DataFrame(index=idx, data=rng.poisson(lam=mean_admissions, size=len(idx)), columns=["admissions"])
df.head(3)
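
As a quick sanity check on why the Poisson distribution suits count data, the sketch below draws a large sample and looks at its basic properties: the values are non-negative integers, and both the sample mean and the sample variance sit close to `lam`. The seed and the sample size are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
lam = 15
sample = rng.poisson(lam=lam, size=10_000)

print(sample.dtype)   # an integer dtype
print(sample.min())   # counts are never negative
print(sample.mean())  # close to lam
print(sample.var())   # for a Poisson distribution the variance is also lam
```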

In [None]:
import seaborn as sns

In [None]:
sns.lineplot(data=df, x=df.index, y="admissions")

In [None]:
df["admissions"].mean()

#### This does not take into account the time of the day
We can pass an array into `lam` to reflect the changing averages throughout the day.


In [None]:
import matplotlib.pyplot as plt

# One mean per hour of the day: a sine wave centred on 15
x = np.linspace(-np.pi, np.pi, 24)
means = np.sin(x) + 15
plt.plot(x, means)

In [None]:
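# Note: `means` holds 24 values but `idx` has more timestamps than that,
# so this call raises a ValueError (the shapes cannot be broadcast together).
df = pd.DataFrame(index=idx, data=rng.poisson(lam=means, size=len(idx)), columns=["admissions"])
df.head(3)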

In [None]:
len(idx)
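
One way around the length mismatch, sketched here as an alternative rather than as the approach this notebook takes, is to keep the 24 hourly means and look up the right one for each timestamp using the hour of the index. The names `hourly_means`, `lam_per_row` and `df_hourly` are illustrative, and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # arbitrary seed
idx = pd.date_range(start="2023-01-01", end="2023-01-04", freq=pd.Timedelta(hours=1))

# One mean per hour of the day (24 values), as in the cell above.
hourly_means = np.sin(np.linspace(-np.pi, np.pi, 24)) + 15

# idx.hour is an integer from 0 to 23 for every timestamp, so indexing with it
# expands the 24 means to one value per row of the index.
lam_per_row = hourly_means[idx.hour.to_numpy()]

# With an array for lam, poisson returns one draw per element, so size is not needed.
df_hourly = pd.DataFrame(index=idx, data=rng.poisson(lam=lam_per_row), columns=["admissions"])
print(lam_per_row.shape, df_hourly.shape)
```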

In [None]:
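# Stretch the sine wave so that it repeats once per day across the whole index.
# The index includes both endpoints, so it spans (len(idx) - 1) hourly steps.
number_of_days = (len(idx) - 1) / 24
multiplier = (number_of_days * 2) - 1
x = np.linspace(-np.pi, multiplier * np.pi, len(idx))
# Hourly means swing between 0.5x and 1.5x mean_admissions
admissions_means = (np.sin(x) * (mean_admissions / 2)) + mean_admissions
plt.plot(x, admissions_means)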


In [None]:
df = pd.DataFrame(index=idx, data=rng.poisson(lam=admissions_means, size=len(idx)), columns=["admissions"])
df.head(3)

In [None]:
sns.lineplot(data=df, x=df.index, y="admissions")
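
To check that the generated data really follows a daily cycle, one option is to average the admissions by hour of day. The sketch below is self-contained and makes a few assumptions of its own: it uses 60 days instead of 3 so the averages are less noisy, it writes the sine wave directly in terms of the hour (approximately the same shape as above, with a trough around 06:00 and a peak around 18:00), and the seed is arbitrary.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)  # arbitrary seed
mean_admissions = 15

# 60 full days of hourly timestamps (longer than above, to smooth the averages).
idx = pd.date_range(start="2023-01-01", periods=60 * 24, freq=pd.Timedelta(hours=1))

# Sinusoidal mean written in terms of the hour of day.
hours = idx.hour.to_numpy()
lam = mean_admissions + (mean_admissions / 2) * np.sin(2 * np.pi * hours / 24 - np.pi)

df_check = pd.DataFrame(index=idx, data=rng.poisson(lam=lam), columns=["admissions"])

# Average admissions for each hour of the day (0 to 23).
by_hour = df_check.groupby(df_check.index.hour)["admissions"].mean()
print(by_hour.round(1))
```

The early-morning hours should show clearly lower averages than the early-evening hours, mirroring the shape of the means passed to `lam`.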

### References
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
- https://en.wikipedia.org/wiki/Poisson_distribution
- https://www.w3schools.com/python/numpy/numpy_random_poisson.asp
- https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.poisson.html