There are classical, effective time series models for both seasonal and nonseasonal data, such as ARMA (autoregressive moving average) and ARIMA (autoregressive integrated moving average). In practice, I found these standard models hard to apply, partly because the objectives differ and, more seriously, because the time series data are very fragmented and nonuniform.

This showed up in two recent Kaggle competitions, [Santander](https://www.kaggle.com/c/santander-value-prediction-challenge/discussion) and [Home Credit](https://www.kaggle.com/c/home-credit-default-risk). For each customer, there are many time-stamped records, such as transaction amounts, payments, and overdue information. The goal can be, for example, to predict the next nonzero transaction amount, or whether the customer will pay the next debt on time. However, the data can be highly fragmented, i.e. available only for certain nonconsecutive past months. The data are also highly nonuniform, i.e. the amount of available data varies a lot across customers. This is either because some customers have more transactions to begin with, or because there were issues getting data for certain customers. In both competitions, only relative time is given. The current time may differ from sample to sample, but it is unknown.

I came up with a very simple way to extract features from such messy data. Not all of the features are necessarily useful, but machine learning models can automatically pick out the useful ones. What matters is that we can manufacture reasonably meaningful features very easily. These extra features allowed me to move up the Kaggle leaderboard by thousands of places.

---
The starting point is the basic idea that more recent data should matter more in these cases, so we weight the data exponentially, with larger weights for more recent months. To deal with the fragmentation and nonuniformity mentioned above, we compute the weighted average over only the available data, and we also keep its numerator and denominator as features in their own right. The denominator (the sum of the weights) is analogous to "count" in plain aggregation, and the numerator (the weighted sum of the data) is analogous to "sum".
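As a small illustration of these three quantities, the sketch below computes them for one hypothetical customer whose data exist for only three nonconsecutive months. The values, the decay rate of 0.9, and the variable names are all made up for the example.

```python
import numpy as np

# Hypothetical customer: data available only 1, 4 and 9 months back.
months_back = np.array([1, 4, 9])
values = np.array([100.0, 50.0, 200.0])
rate = 0.9  # assumed decay rate, for illustration only

weights = rate ** months_back             # more recent months get larger weights
weighted_sum = (weights * values).sum()   # numerator, analogous to "sum"
weight_total = weights.sum()              # denominator, analogous to "count"
weighted_average = weighted_sum / weight_total

print(weighted_sum, weight_total, weighted_average)
```

Missing months simply contribute nothing to either the numerator or the denominator, so no imputation is needed.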

The remaining problem is how to choose the weighting. Because the data are so badly behaved (many customers have only a few nonzero entries, while others have hundreds of records), it is not possible to fit the weighting with a typical time series model. Therefore we just handpick a set of decay constants and produce a set of features from each of them. In practice, I chose the following set:

`decay_rates = [0.99, 0.95, 0.90, 0.82, 0.67]`

We can compute their relevance range, i.e. how many months we need to go back before the weight drops below 0.1.

* $\frac{\log(0.1)}{\log(0.99)} \approx 229$ (so this is almost like a plain aggregation, which is not that useful)

* $\frac{\log(0.1)}{\log(0.95)} \approx 45$ (about 4 years back)

* $\frac{\log(0.1)}{\log(0.90)} \approx 22$ (about 2 years back)

* $\frac{\log(0.1)}{\log(0.82)} \approx 12$ (about 1 year back)

* $\frac{\log(0.1)}{\log(0.67)} \approx 6$ (about 6 months back)

Therefore they cover a wide range of relevant history lengths. They may all be useful to some extent, and each can interact differently with other features, for example in decision trees.
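The relevance ranges can be reproduced with a few lines of Python. This is only a sketch: the helper name `relevance_range` is made up here, and the 0.1 threshold is the one used in the text.

```python
import math

decay_rates = [0.99, 0.95, 0.90, 0.82, 0.67]

def relevance_range(rate, threshold=0.1):
    """Months back at which rate**months equals the threshold."""
    return math.log(threshold) / math.log(rate)

for r in decay_rates:
    print(r, round(relevance_range(r)))
```

Changing `threshold` is a quick way to check how sensitive the "relevant history" is to the cutoff before settling on a set of rates.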

---
Now we go to the implementation. We take advantage of pandas' great `groupby` method. Suppose our data contains the following columns:

* `CustomerID`, the unique ID of the customer associated to the row. There can be many rows corresponding to an ID.

* `Relative_Month`, the number of months back in time from the current time of that row.

* `Data`, the relevant data of that row. There can be multiple other data columns. *Assume that irrelevant zero rows have already been removed.*

In [1]:
import numpy as np
import pandas as pd
import warnings; warnings.simplefilter('ignore')

In [2]:
def compute_weighted_info(df, rates, name=None):
    # Work on a copy so the caller's DataFrame is not modified.
    df = df.copy()
    agg_list = []
    for r in rates:
        # Exponential decay: a row x months back gets weight r**x.
        # abs() accepts months stored either as 3 or as -3.
        df['Weight_'+str(r)] = r ** df['Relative_Month'].abs()
        df['Data_'+str(r)] = df['Data'] * df['Weight_'+str(r)]

        agg_list.append(('Weight_'+str(r), 'sum'))
        agg_list.append(('Data_'+str(r), 'sum'))
        # More items can be calculated, such as weighted max/min

    gp = df.groupby('CustomerID')
    result = pd.DataFrame(index=list(gp.groups.keys()))
    for key, func in agg_list:
        result[key+"_"+func] = gp[key].agg(func)
    for r in rates:
        # Weighted average = weighted sum / total weight
        result['Data_'+str(r)+'_Average'] = (result['Data_'+str(r)+'_sum']
                                             / result['Weight_'+str(r)+'_sum'])

    # Prefix the column names only when a name is given.
    if name is not None:
        result.rename(columns=lambda x: name+"_"+x, inplace=True)
    return result

In [None]:
decay_rates = [0.99, 0.95, 0.90, 0.82, 0.67]
df = pd.read_csv('chart1.csv')
aggregated = compute_weighted_info(df, decay_rates, name = 'chart1')
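
To see what the resulting features look like without a real CSV file, here is a self-contained sketch on made-up data. The helper `weighted_features` is a hypothetical, compact single-rate variant of the same idea, not the function above, and the toy values are invented.

```python
import pandas as pd

# Toy data (made up): customer "a" has three rows, customer "b" has one.
toy = pd.DataFrame({
    'CustomerID':     ['a', 'a', 'a', 'b'],
    'Relative_Month': [1, 4, 9, 2],
    'Data':           [100.0, 50.0, 200.0, 30.0],
})

def weighted_features(df, rate):
    """Hypothetical helper: per-customer weighted sum, total weight, weighted average."""
    weight = rate ** df['Relative_Month'].abs()
    tmp = pd.DataFrame({
        'CustomerID': df['CustomerID'],
        'Weight_sum': weight,            # denominator, analogous to "count"
        'Data_sum': weight * df['Data'], # numerator, analogous to "sum"
    })
    out = tmp.groupby('CustomerID').sum()
    out['Data_Average'] = out['Data_sum'] / out['Weight_sum']
    return out

features = weighted_features(toy, rate=0.9)
print(features)
```

For customer "b", who has a single row, the weighted average is just that row's value. The total weight still records how recent and how plentiful the data are, which is one reason to keep the numerator and denominator as separate features.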