# Sidekick - Formal Definition of the Problem

### Single-Project Regression
#### Model
We approach the problem as a time series regression on a single project. Our dataset $\mathcal{D} = \left\{ (x_i, y_i) \mid i = 1, ..., N \right\}$ consists of $N$ observations, where $x_i$ is the time index at which the amount of pledged money $y_i$ is observed. Hence, we have $X = [1, ..., N]^T$, an $(N \times 1)$ matrix of time indices, and $\mathbf{y} = [y_1, ..., y_N]^T$, a vector of observed values. We model the pledged money $f(\mathbf{x})$ at time indices $\mathbf{x}$ as a Gaussian Process:

$$f(\mathbf{x}) \sim GP \left( m(\mathbf{x}), k(\mathbf{x}, \mathbf{x'}) \right). $$

Our goal is to predict the future values of the pledged money $\mathbf{f}_* = \mathbf{f}_{t:N} = f(X_{t:N})$ at future time indices $X_* = X_{t:N} = [t, ..., N]^T$ after observing the values $\mathbf{y} = \mathbf{y}_{1:t} = [y_1, ..., y_t]^T$ at time indices $X = X_{1:t} = [1, ..., t]^T$. In the GP framework, we can compute this prediction using

$$
\begin{aligned}
\mathbf{f}_* \mid X, \mathbf{y}, X_* &\sim \mathcal{N}\left(\overline{\mathbf{f}}_*, \text{cov}(\mathbf{f}_*) \right), \\
\overline{\mathbf{f}}_* &= K(X_*, X) \left[ K(X, X) + \sigma_n^2I \right]^{-1}\mathbf{y}, \\
\text{cov}(\mathbf{f}_*) &= K(X_*, X_*) - K(X_*, X)\left[ K(X, X) + \sigma_n^2I \right]^{-1}K(X, X_*),
\end{aligned}
$$

where $\sigma_n^2$ is the variance of the observation noise.
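The predictive equations above can be sketched in NumPy as follows. This is a minimal illustration, not the implementation used in this project: it assumes a zero prior mean and a single squared-exponential kernel, and the names `se_kernel`, `gp_predict`, `noise_var` and the toy data are hypothetical. A Cholesky factorization replaces the explicit matrix inverse for numerical stability.

```python
import numpy as np

def se_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of time indices."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def gp_predict(X, y, X_star, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of f_* for a zero-mean GP (assumption)."""
    y = np.asarray(y, dtype=float)
    K = se_kernel(X, X, **kernel_args) + noise_var * np.eye(len(X))
    K_s = se_kernel(X_star, X, **kernel_args)        # K(X_*, X)
    K_ss = se_kernel(X_star, X_star, **kernel_args)  # K(X_*, X_*)
    L = np.linalg.cholesky(K)                        # K + sigma_n^2 I = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, K_s.T)
    mean = K_s @ alpha          # K(X_*, X) [K + sigma_n^2 I]^{-1} y
    cov = K_ss - V.T @ V        # K(X_*, X_*) - K(X_*, X) [K + sigma_n^2 I]^{-1} K(X, X_*)
    return mean, cov

# Toy data: observe time indices 1..10, predict 11..15.
X = np.arange(1, 11)
y = np.sin(X / 3.0)
X_star = np.arange(11, 16)
mean, cov = gp_predict(X, y, X_star, lengthscale=3.0)
```

The diagonal of `cov` gives the predictive variance at each future time index.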

Finally, the kernel's (hyper)parameters $\theta$ are learned by maximizing the *log marginal likelihood*

$$\theta^* = \underset{\theta}{\operatorname{argmax}} \log p(\mathbf{y} \mid X, \theta).$$
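As an illustration of this step, the sketch below evaluates the log marginal likelihood of a zero-mean GP with a squared-exponential kernel and picks the best hyperparameters from a small grid. The grid search is a stand-in chosen for simplicity (a gradient-based optimizer is the usual choice); the function names, the parametrization $\theta$ = (variance, length-scale, noise variance), the grid values and the toy data are all assumptions.

```python
import itertools
import numpy as np

def se_kernel(a, b, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of time indices."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def log_marginal_likelihood(X, y, variance, lengthscale, noise_var):
    """log p(y | X, theta) for a zero-mean GP (assumption)."""
    y = np.asarray(y, dtype=float)
    n = len(X)
    K = se_kernel(X, X, variance, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha                # data-fit term
            - np.sum(np.log(np.diag(L)))    # -0.5 * log det(K + sigma_n^2 I)
            - 0.5 * n * np.log(2.0 * np.pi))

def fit_by_grid_search(X, y, variances, lengthscales, noise_vars):
    """Return the grid point (variance, lengthscale, noise_var) with highest LML."""
    candidates = itertools.product(variances, lengthscales, noise_vars)
    return max(candidates, key=lambda th: log_marginal_likelihood(X, y, *th))

# Toy data and a hypothetical grid.
X = np.arange(1, 21)
y = np.sin(X / 4.0)
variances = [0.5, 1.0, 2.0]
lengthscales = [1.0, 4.0, 16.0]
noise_vars = [1e-3, 1e-1]
theta = fit_by_grid_search(X, y, variances, lengthscales, noise_vars)
```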

#### Results
The major problem in this setting is that the predictive mean $\overline{\mathbf{f}}_*$ falls back to the prior mean $m(\mathbf{x})$ very quickly. One remedy is to combine two squared-exponential kernels and to initialize one of them with a large length-scale in order to capture the global trend. This yields reasonable results for the project the model was fitted on. However, applying the same model to another project ($\theta$ learned on one project and used to predict another) gives very poor performance.
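For reference, one way to write the combined kernel is as a sum of two squared-exponential terms. The symbols $\sigma_1, \ell_1, \sigma_2, \ell_2$ are introduced here for illustration only, and the additive form is an assumption, since these notes do not state how the two kernels are combined:

$$k(x, x') = \sigma_1^2 \exp\left(-\frac{(x - x')^2}{2\ell_1^2}\right) + \sigma_2^2 \exp\left(-\frac{(x - x')^2}{2\ell_2^2}\right), \qquad \ell_1 \gg \ell_2.$$

With a single short length-scale $\ell$, the entries of $K(X_*, X)$ decay to zero once the future indices lie more than a few $\ell$ beyond the last observation, so the predictive mean $K(X_*, X) \left[ K(X, X) + \sigma_n^2I \right]^{-1}\mathbf{y}$ returns to the prior mean. The long length-scale term keeps $K(X_*, X)$ away from zero over the whole horizon, which lets the prediction follow the global trend.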


### Multi-Project Regression
#### Model
Hence, our next idea is to consider several projects at the same time and to learn the hyperparameters $\theta$ over several time series. Our dataset is now $\mathcal{D} = \left\{ (x_i, \mathbf{y}_i) \mid i = 1, ..., M \right\}$, where $\mathbf{y}_i$ collects the amounts pledged to all projects at time index $x_i$.









For a given project $n$, we try to predict the pledged money $\mathbf{f}_*^{(n)} = \mathbf{f}_{t:T}^{(n)}$ at future time indices $\mathbf{x}_*^{(n)} = \mathbf{x}_{t:T}^{(n)} = \mathbf{x}_{t:T} = [t,...,T]^T$ after observing the values $\mathbf{y}^{(n)} = \mathbf{y}_{1:t}^{(n)}$ at time indices $\mathbf{x} = \mathbf{x}_{1:t}$. The predictive distribution is

$$\mathbf{f}_{t:T}^{(n)} \mid \mathbf{x}_{1:t}, \mathbf{y}_{1:t}^{(n)}, \theta \sim \mathcal{N}\left(\overline{\mathbf{f}}_{t:T}^{(n)}, \text{cov}(\mathbf{f}_{t:T}^{(n)})\right).$$

The hyperparameters $\theta$ of the GP are learned over $X$, an $(M \times 1)$ matrix of input variables, and $Y$, an $(M \times N)$ matrix of observations corresponding to the $M$ inputs for $N$ projects, by maximizing the marginal log-likelihood:

$$\theta^* = \underset{\theta}{\operatorname{argmax}} \log p(Y \mid X, \theta),$$

where 

$$X = \mathbf{x} = [1, 2, 3, ..., M]^T,$$
$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1N} \\ \vdots & \vdots & \ddots & \vdots \\ y_{M1} & y_{M2} & \cdots & y_{MN} \end{bmatrix}.$$
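These notes do not spell out how $p(Y \mid X, \theta)$ factorizes. One common reading, assumed here, is that the $N$ projects are independent given $\theta$ and share the same time indices, so the joint log-likelihood is the sum of the per-project terms. The sketch below computes that sum with a single Cholesky factorization; the function names and the synthetic data are hypothetical.

```python
import numpy as np

def se_kernel(a, b, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of time indices."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def multi_project_lml(X, Y, variance, lengthscale, noise_var):
    """Sum over projects of log p(y^(n) | X, theta); Y has shape (M, N).

    Assumes a zero-mean GP and projects that are independent given theta.
    """
    Y = np.asarray(Y, dtype=float)
    M, N = Y.shape
    K = se_kernel(X, X, variance, lengthscale) + noise_var * np.eye(M)
    L = np.linalg.cholesky(K)
    A = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y, one column per project
    data_fit = -0.5 * np.sum(Y * A)                  # sum_n -0.5 * y_n^T K^{-1} y_n
    complexity = -N * np.sum(np.log(np.diag(L)))     # N * (-0.5 * log det K)
    constant = -0.5 * M * N * np.log(2.0 * np.pi)
    return data_fit + complexity + constant

# Synthetic data: N = 3 projects observed at M = 15 time indices.
rng = np.random.default_rng(0)
M, N = 15, 3
X = np.arange(1, M + 1)
Y = np.cumsum(rng.random((M, N)), axis=0)  # cumulative, like pledged money
total = multi_project_lml(X, Y, variance=1.0, lengthscale=5.0, noise_var=0.1)
```

Because all projects share $X$, the kernel matrix is factorized once and reused for every column of $Y$.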

Formally, we were computing

$$p(\mathbf{f}_{t:T}^{(n)} \mid \mathbf{x}_{1:t}, \mathbf{y}_{1:t}^{(n)}, \theta),$$

where $\theta$ was learned as described above.

We could not obtain good results with this approach; it seems that the hyperparameters $\theta$ are not learned well. Indeed, the predictions always fall back very quickly to the mean of the GP. By initializing one of the kernels with a large length-scale, we could capture the general trend of one project and hence obtain fairly good results, but this did not work when learning over several projects or when predicting for a different project.

Therefore, we tried a second approach, namely classifying projects as successful or failed. We train one GP on the successful projects and one on the failed projects, and try to determine whether a new project will be successful or not. To do so, we learn $\theta_s$, the hyperparameters of a GP over the *successful* projects only, and $\theta_f$, the hyperparameters of a GP over the *failed* projects. Formally, we maximize the marginal log-likelihoods

$$\theta_s^* = \underset{\theta}{\operatorname{argmax}} \log p(Y_s \mid X_s, \theta),$$
$$\theta_f^* = \underset{\theta}{\operatorname{argmax}} \log p(Y_f \mid X_f, \theta),$$

where the subscripts $s$ and $f$ denote the data of the successful and failed projects respectively. We then determine the class $c_*$ of a new, partially observed project as

$$c_{*} = \begin{cases} s & \text{if } \log p(\mathbf{y}_{1:t}^{(n)} \mid \mathbf{x}_{1:t}, \theta_s) > \log p(\mathbf{y}_{1:t}^{(n)} \mid \mathbf{x}_{1:t}, \theta_f), \\ f & \text{otherwise.} \end{cases}$$
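The decision rule can be sketched as a comparison of two log marginal likelihoods. The example assumes zero-mean GPs with a squared-exponential kernel and $\theta$ = (variance, length-scale, noise variance); the values of `theta_s` and `theta_f` below are made up for illustration and are not learned from data.

```python
import numpy as np

def se_kernel(a, b, variance, lengthscale):
    """Squared-exponential kernel between two 1-D arrays of time indices."""
    a = np.asarray(a, dtype=float).reshape(-1, 1)
    b = np.asarray(b, dtype=float).reshape(1, -1)
    return variance * np.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def log_marginal_likelihood(x, y, theta):
    """log p(y | x, theta) for a zero-mean GP (assumption)."""
    variance, lengthscale, noise_var = theta
    y = np.asarray(y, dtype=float)
    n = len(x)
    K = se_kernel(x, x, variance, lengthscale) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))
            - 0.5 * n * np.log(2.0 * np.pi))

def classify(x, y, theta_s, theta_f):
    """Return 's' if the partial series is more likely under theta_s, else 'f'."""
    lml_s = log_marginal_likelihood(x, y, theta_s)
    lml_f = log_marginal_likelihood(x, y, theta_f)
    return "s" if lml_s > lml_f else "f"

# Hypothetical hyperparameters and a partially observed project (t = 10).
theta_s = (1.0, 10.0, 0.01)
theta_f = (0.01, 10.0, 0.01)
x = np.arange(1, 11)
y_partial = np.linspace(0.1, 1.0, 10)
label = classify(x, y_partial, theta_s, theta_f)
```

Note that this rule compares likelihoods only; it implicitly assumes equal prior probabilities for the two classes.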




Instead of using the time as input and predicting the output at new time indices, we now consider the values at the observed time steps as input (`y` becomes `x`) and the value at the last time index as the output.
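A minimal sketch of this reformulation, assuming the observations are stored as an $(M \times N)$ matrix with one column per project: each project contributes one training example whose input is its first $t$ values and whose output is its final value. The function name `make_final_value_dataset` and the toy numbers are hypothetical.

```python
import numpy as np

def make_final_value_dataset(Y, t):
    """Build (inputs, targets) from an (M, N) matrix with one column per project.

    inputs has shape (N, t): the first t observed values of each project.
    targets has shape (N,): the value of each project at the last time index.
    """
    Y = np.asarray(Y, dtype=float)
    M, N = Y.shape
    if not 1 <= t < M:
        raise ValueError("t must satisfy 1 <= t < M")
    inputs = Y[:t, :].T
    targets = Y[-1, :]
    return inputs, targets

# Toy data: M = 4 time steps (rows), N = 2 projects (columns).
Y = np.array([[1.0, 2.0],
              [3.0, 2.0],
              [6.0, 5.0],
              [10.0, 5.0]])
inputs, targets = make_final_value_dataset(Y, t=2)
# inputs  -> [[1., 3.], [2., 2.]]
# targets -> [10., 5.]
```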