Yield curve interpolation using the Nelson-Siegel parametric family
====================================

This Jupyter notebook demonstrates the Nelson-Siegel parametric family and its use for interpolating the yield curve. The notebook is a supplement to the report by Ayliffe and Rubin [1], which grew out of Kelly Ayliffe's semester project at École Polytechnique Fédérale de Lausanne, Switzerland, under the supervision of Tomas Rubin.

Contacts:
- Kelly Ayliffe, kellyayliffe@gmail.com
- Tomas Rubin, tomas.rubin@gmail.com


Nelson-Siegel factor loading curves
-----------------

The Nelson-Siegel (Nelson and Siegel, 1987) parametric family consists of 3 functions, technically called factor loading curves, that are often used for parsimonious interpolation of yields.
The 3 Nelson-Siegel factor loading curves, which depend on the parameter \\(\lambda > 0\\) and the maturity \\(\tau\\), are defined as follows:
1. **The level function** is constant, equal to \\( 1 \\), and determines the long-run level of the yields.
2. **The slope function** is defined as \\(  (1-e^{-\lambda\tau})(\lambda\tau)^{-1} \\). It starts at one and decreases towards zero, so it mainly affects the short-term yields.
3. **The curvature function** is defined as \\(  (1-e^{-\lambda\tau})(\lambda\tau)^{-1} - e^{-\lambda\tau} \\). It increases from zero (thus not affecting the shortest maturities), peaks, and then decreases towards zero again. It is therefore responsible for the medium-term maturities.
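
The three loadings can be evaluated directly with NumPy. The following is a minimal sketch; the helper name `ns_loadings` is hypothetical and is not part of `yield_curve_functions`, whose implementation is not shown in this notebook.

```python
import numpy as np

def ns_loadings(tau, lam):
    """Return the level, slope and curvature loadings at maturities tau (hypothetical helper)."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    level = np.ones_like(x)              # constant loading
    slope = (1.0 - np.exp(-x)) / x       # tends to 1 as tau -> 0
    curvature = slope - np.exp(-x)       # tends to 0 as tau -> 0
    return level, slope, curvature

# Maturities in years; the grid starts slightly above zero to avoid dividing by zero
tau_grid = np.linspace(0.001, 30, 300)
level, slope, curvature = ns_loadings(tau_grid, lam=0.496)
```

The returned arrays can be passed to `plt.plot(tau_grid, ...)` to draw the loading curves for any chosen \\(\lambda\\).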

The three factor loading curves are visualised below for varying values of the parameter \\(\lambda > 0\\). We visualise three choices:
1. \\(\lambda^{user} \\) is a value that the user can set freely in the input cell to explore its impact.
2. \\( \lambda^{our} = 0.496 \\) is the value determined to be optimal in terms of the ordinary least squares fit. Details are presented in our report, Ayliffe and Rubin [1].
3. \\( \lambda^{DL} = 0.0609 \\) is the value recommended by Diebold and Li [2]. They chose it so that the curvature loading attains its maximum at a maturity of 30 months, with maturities measured in months in their paper.
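
To get a feel for how \\(\lambda\\) moves the hump of the curvature loading, one can locate its maximum numerically. The sketch below assumes a simple grid search; `curvature_loading` is an illustrative helper, and 0.1 for \\(\lambda^{user}\\) is just the default from the input cell. Setting the derivative of the curvature loading to zero shows that the peak sits where \\(\lambda\tau \approx 1.79\\), so a larger \\(\lambda\\) moves the hump towards shorter maturities. The peak is expressed in whatever unit \\(\tau\\) is measured in.

```python
import numpy as np

def curvature_loading(tau, lam):
    """Nelson-Siegel curvature loading (illustrative helper)."""
    x = lam * np.asarray(tau, dtype=float)
    return (1.0 - np.exp(-x)) / x - np.exp(-x)

# Fine grid of maturities, wide enough to contain the peak for all three lambdas
tau_grid = np.linspace(0.01, 60, 60000)

peaks = {}
for name, lam in [("user", 0.1), ("our", 0.496), ("DL", 0.0609)]:
    # Maturity at which the curvature loading is largest on the grid
    peaks[name] = tau_grid[np.argmax(curvature_loading(tau_grid, lam))]
    print(f"lambda^{name} = {lam}: curvature loading peaks near tau = {peaks[name]:.2f}")
```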

The yield curve expressed in the Nelson-Siegel parametric family is then of the form
\\[ y(\tau) = L + S \left( \frac{1-e^{-\lambda\tau}}{\lambda\tau} \right) + C \left( \frac{1-e^{-\lambda\tau}}{\lambda\tau} -e^{-\lambda\tau}\right) \\]
where the coefficients \\( L,S,C \\), called the factors, are referred to as **the level**, **the slope**, and **the curvature**, respectively.
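
As a quick check on this interpretation, consider the two ends of the curve. For \\( \tau \to 0^+ \\) the slope loading tends to one and the curvature loading tends to zero, while for \\( \tau \to \infty \\) both tend to zero:
\\[ \lim_{\tau \to 0^+} y(\tau) = L + S, \qquad \lim_{\tau \to \infty} y(\tau) = L. \\]
The factor \\( L \\) is therefore the long-end yield, and \\( -S \\) is the spread between the long end and the short end of the curve.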

Fitting the Nelson-Siegel parametric family
--------

The classical approach is to fit the Nelson-Siegel parametric family individually for each cross-section, i.e. the yields available on a given day.
Denote by \\( y(\tau_i) \\) the yield observed at maturity \\( \tau_i \\), where \\( i =1,\dots,n\\) ranges over the available maturities. For US Treasury yields, \\[ (\tau_1, \dots, \tau_{11}) = (1/12,3/12,6/12,1,2,3,5,7,10,20,30)  \\] expressed in years.
For a fixed \\( \lambda \\), the factors \\( L,S,C \\) are estimated by ordinary least squares:
\\[ (L,S,C) = \arg\min_{l,s,c} \sum_{i=1}^n \left( y(\tau_i) -
l - s \left( \frac{1-e^{-\lambda\tau_i}}{\lambda\tau_i} \right) - c \left( \frac{1-e^{-\lambda\tau_i}}{\lambda\tau_i} -e^{-\lambda\tau_i}\right)
\right)^2 \\]
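
Because \\( \lambda \\) is held fixed, this is a linear regression of the observed yields on the three loadings. The following is a minimal sketch of that regression using `np.linalg.lstsq`; the function `fit_ns_ols` is hypothetical and is not claimed to match the `DNS_OLS` function imported later, and the yields are synthetic values generated from made-up factors.

```python
import numpy as np

def fit_ns_ols(yields, tau, lam):
    """OLS estimate of (L, S, C) for one cross-section at a fixed lambda (hypothetical helper)."""
    tau = np.asarray(tau, dtype=float)
    x = lam * tau
    slope = (1.0 - np.exp(-x)) / x
    curvature = slope - np.exp(-x)
    # n x 3 design matrix: one column per factor loading
    X = np.column_stack([np.ones_like(tau), slope, curvature])
    coef, *_ = np.linalg.lstsq(X, np.asarray(yields, dtype=float), rcond=None)
    return coef

tau = np.array([1/12, 3/12, 6/12, 1, 2, 3, 5, 7, 10, 20, 30])  # maturities in years
lam = 0.496

# Synthetic cross-section built from known factors (illustrative values, not real data)
true_factors = np.array([4.0, -2.0, 1.5])
x = lam * tau
synthetic = (true_factors[0]
             + true_factors[1] * (1 - np.exp(-x)) / x
             + true_factors[2] * ((1 - np.exp(-x)) / x - np.exp(-x)))

estimated = fit_ns_ols(synthetic, tau, lam)  # recovers the factors on noise-free data
```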

References
-----
[1] Ayliffe, Kelly, and Rubin, Tomas (2020). "A Quantitative Comparison of Yield Curve Models in the MINT Economies." *EPFL Infoscience.* URL: https://infoscience.epfl.ch/record/279314

[2] Diebold, Francis X., and Li, Canlin (2006). "Forecasting the term structure of government bond yields." *Journal of Econometrics.*

In [None]:
# USER INPUT HERE
user_lambda = 0.1 # choose your value of lambda, should be in (0,+inf)

Import packages and data
---------------

In [None]:
# IMPORT PACKAGES
import numpy as np
import math
import pandas as pd
import matplotlib.pyplot as plt
from yield_curve_functions import DNS_OLS, DNS_formula # import our custom functions

In [None]:
# IMPORT DATA
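# names= relabels the 12 columns (assumes the sheet has a header row and this column order);
# the dates stay in column 0, which the code below relies on
y_df = pd.read_excel('US_daily.xlsx', names=['dates',1/12,3/12,6/12,1,2,3,5,7,10,20,30])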
y = y_df.to_numpy()
matu = np.array([[1/12,3/12,6/12,1,2,3,5,7,10,20,30]])

dates = y[:,0]

current=1 #this variable keeps track of the dates in the original dataset that have already been added. It is a row index in the original table.
currentDate = np.datetime64(dates[0]) # this variable keeps track of all dates that need to be added.

# The following two tables will be concatenated horizontally to create the full, new dataset
CompleteTable = np.array([y[0,1:]]) #Table with added yields (has copied lines where extra dates have been added)
CompleteDates = np.array([[currentDate]], dtype='datetime64') #Will be the full dates column

AddDay = np.timedelta64(1,'D')

cdnp = np.array([[currentDate]],dtype='datetime64') #single entry array. Used to have a compatible format (np.array) for adding the dates to CompleteDates.

while current<y_df.shape[0]:
    currentDate = currentDate + AddDay
    cdnp[0][0] = currentDate
    CompleteDates = np.hstack((CompleteDates,cdnp))
    dateInTable = np.datetime64(dates[current])
    
    if dateInTable != currentDate:
        CompleteTable = np.vstack((CompleteTable,CompleteTable[-1])) #copies last available line into the table
        
    if dateInTable == currentDate:
        CompleteTable = np.vstack((CompleteTable,y[current,1:])) #adds yield curve corresponding to currentDate
        current = current + 1

#Updating to full table
y = np.hstack((CompleteDates.transpose(), CompleteTable))
dates = np.array([y[:,0]])
y = np.delete(y,0,1) #separating dates and yields
y = np.array(y,dtype = float)
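
The day-by-day loop above fills calendar gaps such as weekends and holidays by repeating the last available curve. Assuming the dates are sorted and unique, pandas can express the same forward fill in a single call. This is a sketch on a small synthetic table, not a drop-in replacement for the cell above; the column labels and values are made up for illustration.

```python
import pandas as pd

# Small synthetic table with a missing weekend (illustrative values, not real data)
raw = pd.DataFrame({
    "dates": pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
    "1Y": [1.56, 1.55, 1.54],
    "10Y": [1.88, 1.80, 1.81],
})

# Reindex to every calendar day and carry the last observed curve forward
daily = raw.set_index("dates").asfreq("D", method="ffill")

dates_filled = daily.index.to_numpy()        # one entry per calendar day
yields_filled = daily.to_numpy(dtype=float)  # rows aligned with dates_filled
```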

Fit 4 randomly selected yield curves with the Nelson-Siegel family
----------

In [None]:
# SELECT RANDOMLY 4 DAYS TO DISPLAY, CALCULATE THE NELSON-SIEGEL FIT
t_display = np.sort( np.random.choice( y.shape[0], size=4, replace=False ) )
y_now = y[t_display,:]

# OLS fitting of the coefficients
user_ts = DNS_OLS(y_now,matu,user_lambda) 

our_lambda = 0.496 
our_ts = DNS_OLS(y_now,matu,our_lambda) 

DL_lambda = 0.0609
DL_ts = DNS_OLS(y_now,matu,DL_lambda)

In [None]:
# DRAW 4 RANDOMLY CHOSEN YIELD CURVES, INTERPOLATE WITH VARIOUS VALUES OF LAMBDA
# lambda^user .... the lambda defined by the user for exploration
# lambda^our = 0.496 the lambda used in our analysis, determined to minimise least squares
# lambda^DL = 0.0609  the value of lambda recommended by Diebold and Li (2006)

# visualise the static NS fit for fixed time
tau_grid = np.linspace(start=0.001, stop=30, num=100)

f, axarr = plt.subplots(4, 1, figsize=(15,15))
for ii in range(4):    
    #f, (ax1, ax2) = plt.subplot(4, 1, ii+1,figsize=(15,15))    
    axarr[ii].plot( tau_grid, DNS_formula( tau_grid, user_ts[ii,:], user_lambda ) )
    axarr[ii].plot( tau_grid, DNS_formula( tau_grid, our_ts[ii,:], our_lambda ) )
    axarr[ii].plot( tau_grid, DNS_formula( tau_grid, DL_ts[ii,:], DL_lambda ) )    
    axarr[ii].scatter(matu,y_now[ii])
    axarr[ii].set_title( dates[0,t_display[ii]].date() )
    axarr[ii].set_ylabel("yield [%]")
    axarr[ii].legend([r'NS with $\lambda^{user}=$'+str(user_lambda),r'NS with $\lambda^{our} = 0.496$',r'NS with $\lambda^{DL} = 0.0609$'])
    if ii == 3:
        axarr[ii].set_xlabel("maturity [years]")
