A small snippet showing how to take time-series data grouped by date, fill in the missing dates, and present the result as a calendar of weeks with _Monday_ as the first day of the week.

This can be done in SQL too, with similar logic. PostgreSQL gives you _values_ lists and the _generate_series_ function, either of which can build the left-hand pseudo-table of all dates in the range. It is harder in databases that don't support them: you need either a temporary table or an inner query that is one massive union, like below.
```sql
select dts.dt, x.v from 
(
select '2018-07-02' dt union
select '2018-07-03' dt union
...
) as dts left join (
select dt, sum(metric) as v from mytable group by dt
) as x on dts.dt = x.dt
```
Since I usually have to slice and dice the data several ways, and SQL can get expensive over large sets, I prefer to run just one group-by in SQL, export the result to a CSV, and then use pandas to work on it in various ways.

In [1]:
import pandas as pd
import numpy as np

In [2]:
data_file = 'pandas-timeseries-data.csv' 
# assumes two columns, dt and v: dt is the date and v is the value (here, number of visitors)
# holidays have no records, which makes this a good illustration of holes in the matrix
start_week_date='20180702'  #ensure this is a monday
how_many_weeks=6

### Make Data Frame
There is no need to have a separate key; we will use the date as the key.

In [3]:
# read data frame
dfi = pd.read_csv(data_file)  
# convert the input key from yymmdd to a proper date (yyyy-mm-dd)
# this could also be done while reading, via the date-parsing options of read_csv
dfi['dt'] = pd.to_datetime(dfi.dt, format='%y%m%d')
# and make that as the index; no need for pseudo index
dfi = dfi.set_index('dt')
dfi.head()

| dt | v |
|---|---|
| 2018-07-02 | 163 |
| 2018-07-03 | 220 |
| 2018-07-04 | 65 |
| 2018-07-05 | 70 |
| 2018-07-06 | 58 |
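
One thing to watch with yymmdd keys: if read_csv infers the column as integers, a key with a leading zero (say `050702`) loses that zero. A minimal sketch of a safer read, assuming the column layout described above; the inline CSV text is a stand-in for the real file:

```python
import io
import pandas as pd

# Stand-in for the real CSV file (first rows of the data shown above).
csv_text = "dt,v\n180702,163\n180703,220\n180704,65\n"

# Read dt as text so leading zeros survive, then parse with an explicit format.
dfi = pd.read_csv(io.StringIO(csv_text), dtype={'dt': str})
dfi['dt'] = pd.to_datetime(dfi['dt'], format='%y%m%d')
dfi = dfi.set_index('dt')
print(dfi)
```

For 2018 data the integer route works as well, so this only matters for years 2000 to 2009.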


### Left side of all dates

In [4]:
# fill up weekly matrix
# starting daterange has to be monday
dates = pd.date_range(start_week_date, periods=how_many_weeks*7)
dfd = pd.DataFrame(0, index=dates, columns=['v'])
dfd.index.name = 'dt'
dfd.head()

| dt | v |
|---|---|
| 2018-07-02 | 0 |
| 2018-07-03 | 0 |
| 2018-07-04 | 0 |
| 2018-07-05 | 0 |
| 2018-07-06 | 0 |
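
The Monday requirement on the start date is easy to get wrong by hand. A small sketch of a guard, plus a way to snap any date back to the Monday of its week; the helper name `monday_of` is made up for this illustration:

```python
import pandas as pd

def monday_of(date_like):
    """Return the Monday of the week that contains the given date."""
    d = pd.Timestamp(date_like)
    # Timestamp.weekday() is 0 for Monday ... 6 for Sunday
    return d - pd.Timedelta(days=d.weekday())

start_week_date = '20180702'
# fail early if the configured start is not a Monday
assert pd.Timestamp(start_week_date).weekday() == 0, "start_week_date must be a Monday"

# a Thursday snaps back to the Monday of the same week
print(monday_of('20180705'))
```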


### Merge data from actuals

In [5]:
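# now do a left join with the actual data
dfd = dfd.merge(dfi, how='left', on='dt')
# merge suffixes the clashing columns: v_x (placeholder zeros) and v_y (actuals)
del dfd['v_x']
dfd.columns = ['v']
# assign the result back; inplace fillna on a selected column is unreliable in newer pandas
dfd['v'] = dfd['v'].fillna(0).astype(int)   # NaN forces float, so cast back to int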
dfd.head()

| dt | v |
|---|---|
| 2018-07-02 | 163 |
| 2018-07-03 | 220 |
| 2018-07-04 | 65 |
| 2018-07-05 | 70 |
| 2018-07-06 | 58 |
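
The left join is not the only way to fill the holes. As an alternative sketch (not what the cell above does), `reindex` against the full date range fills missing dates in one step and keeps the integer dtype. The sample rows are the first five from the data above, with the weekend deliberately left out to create a hole:

```python
import pandas as pd

# Sample actuals: Mon-Fri only, so Sat and Sun are missing.
dfi = pd.DataFrame(
    {'v': [163, 220, 65, 70, 58]},
    index=pd.to_datetime(['2018-07-02', '2018-07-03', '2018-07-04',
                          '2018-07-05', '2018-07-06']))
dfi.index.name = 'dt'

# Every date of the week, starting on a Monday.
dates = pd.date_range('20180702', periods=7)

# Missing dates get 0 directly, so no NaN and no float round trip.
dfd = dfi.reindex(dates, fill_value=0)
dfd.index.name = 'dt'
print(dfd)
```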


### Make a new dataframe by week

In [6]:
# make a dataframe that has distribution by week
wkv = dfd.values.reshape(how_many_weeks,7) 
# weekday_name was removed in pandas 1.0; day_name() is its replacement
dfw = dfd[dfd.index.day_name() == 'Monday']
dfw = pd.DataFrame(wkv, index=dfw.index, columns=['mon','tue','wed','thu','fri','sat','sun'])
dfw.index.name='week'
dfw.sort_index(ascending=False, inplace=True) #want to see latest first

In [7]:
dfw

| week | mon | tue | wed | thu | fri | sat | sun |
|---|---|---|---|---|---|---|---|
| 2018-08-06 | 108 | 16 | 156 | 54 | 61 | 4 | 0 |
| 2018-07-30 | 313 | 189 | 29 | 38 | 50 | 13 | 8 |
| 2018-07-23 | 303 | 54 | 73 | 64 | 63 | 54 | 0 |
| 2018-07-16 | 45 | 194 | 119 | 88 | 111 | 2 | 0 |
| 2018-07-09 | 81 | 61 | 31 | 33 | 111 | 0 | 0 |
| 2018-07-02 | 163 | 220 | 65 | 70 | 58 | 1 | 0 |
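
The reshape above depends on the frame starting on a Monday and holding exactly `how_many_weeks*7` rows. A possible alternative, sketched here with made-up placeholder values (0 to 13 over two weeks), derives the week and the day of week from each date and pivots on them:

```python
import numpy as np
import pandas as pd

# Placeholder data: two full weeks with values 0..13.
dates = pd.date_range('2018-07-02', periods=14)
dfd = pd.DataFrame({'v': np.arange(14)}, index=dates)
dfd.index.name = 'dt'

dow = dfd.index.weekday                                   # 0 = Monday ... 6 = Sunday
week_start = dfd.index - pd.to_timedelta(dow, unit='D')   # Monday of each date's week

dfw = (dfd.assign(week=week_start, dow=dow)
          .pivot(index='week', columns='dow', values='v')
          .reindex(columns=range(7)))   # keep all seven day columns, in order
dfw.columns = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
dfw = dfw.sort_index(ascending=False)   # latest week first
print(dfw)
```

With incomplete weeks the missing cells come out as NaN rather than raising an error, so they would still need a fill step.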


### TODO: Add a weekly total
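One possible way to do this, as a sketch: sum across the seven day columns. The frame is rebuilt here from the weekly output above so the block runs on its own; the column name `total` is my choice.

```python
import pandas as pd

days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
weeks = pd.to_datetime(['2018-08-06', '2018-07-30', '2018-07-23',
                        '2018-07-16', '2018-07-09', '2018-07-02'])
# Values copied from the weekly table above.
dfw = pd.DataFrame(
    [[108, 16, 156, 54, 61, 4, 0],
     [313, 189, 29, 38, 50, 13, 8],
     [303, 54, 73, 64, 63, 54, 0],
     [45, 194, 119, 88, 111, 2, 0],
     [81, 61, 31, 33, 111, 0, 0],
     [163, 220, 65, 70, 58, 1, 0]],
    index=weeks, columns=days)
dfw.index.name = 'week'

# Sum only the day columns, so re-running does not add 'total' into itself.
dfw['total'] = dfw[days].sum(axis=1)
print(dfw)
```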

### TODO: Add average for weekdays
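"Average for weekdays" can be read two ways, so this sketch shows both: the mean of each day-of-week column across weeks, and the Monday-to-Friday mean within each week. Which one is wanted is an assumption on my part. The frame is rebuilt from the weekly output above.

```python
import pandas as pd

days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
weeks = pd.to_datetime(['2018-08-06', '2018-07-30', '2018-07-23',
                        '2018-07-16', '2018-07-09', '2018-07-02'])
# Values copied from the weekly table above.
dfw = pd.DataFrame(
    [[108, 16, 156, 54, 61, 4, 0],
     [313, 189, 29, 38, 50, 13, 8],
     [303, 54, 73, 64, 63, 54, 0],
     [45, 194, 119, 88, 111, 2, 0],
     [81, 61, 31, 33, 111, 0, 0],
     [163, 220, 65, 70, 58, 1, 0]],
    index=weeks, columns=days)
dfw.index.name = 'week'

# Reading 1: average per day of week, across all weeks.
per_day_avg = dfw.mean()

# Reading 2: average over the working days (Mon-Fri) of each week.
workweek_avg = dfw[days[:5]].mean(axis=1)

print(per_day_avg.round(1))
print(workweek_avg.round(1))
```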

### TODO: Detect Anomalies
e.g. in the week of 07-30 there were visitors on a Sunday, and that Saturday and the previous Saturday both had far more visitors than usual.
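
As a rough starting point, and only one of many possible approaches, each cell can be compared against the median of its own day-of-week column, flagging values that exceed the median by more than three times the median absolute deviation (MAD). The factor of 3 is an arbitrary choice for this sketch. The frame is rebuilt from the weekly output above.

```python
import pandas as pd

days = ['mon', 'tue', 'wed', 'thu', 'fri', 'sat', 'sun']
weeks = pd.to_datetime(['2018-08-06', '2018-07-30', '2018-07-23',
                        '2018-07-16', '2018-07-09', '2018-07-02'])
# Values copied from the weekly table above.
dfw = pd.DataFrame(
    [[108, 16, 156, 54, 61, 4, 0],
     [313, 189, 29, 38, 50, 13, 8],
     [303, 54, 73, 64, 63, 54, 0],
     [45, 194, 119, 88, 111, 2, 0],
     [81, 61, 31, 33, 111, 0, 0],
     [163, 220, 65, 70, 58, 1, 0]],
    index=weeks, columns=days)
dfw.index.name = 'week'

# Per-column robust baseline: median and median absolute deviation.
median = dfw.median()
mad = (dfw - median).abs().median()
threshold = median + 3 * mad          # 3 is an arbitrary cutoff for this sketch

# True where a day is unusually high for its day of week.
flags = dfw.gt(threshold, axis='columns')

for week, row in flags.iterrows():
    for day in days:
        if row[day]:
            print(week.date(), day, dfw.loc[week, day])
```

With only six weeks of data this is very noisy; it is meant as a first pass, not a tuned detector.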