# Making LaTeX tables

The ``blue`` module doesn't (currently) offer any built-in methods to produce LaTeX tables.
One reason for this is that ``pandas`` already provides the `df.to_latex()` method, which is quite powerful on its own.
This notebook demonstrates some of the ways you can use it to make pretty LaTeX tables.

In [1]:
import pandas as pd

To begin with, we will not perform a BLUE combination. Instead, we will just load some input data into a pandas DataFrame and use it to make some LaTeX tables. We will use the ATLAS Run-1 top mass results for this example.

In [2]:
dilep = pd.read_csv('data/atlas_top_mass_dilep8.csv', index_col='Name')
ljets = pd.read_csv('data/atlas_top_mass_ljets8.csv', index_col='Name')
df = pd.concat([dilep, ljets])
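# Set single values with one .loc call; chained indexing
# (df.loc[row].loc[col] = ...) may silently modify a copy instead of df
df.loc['dilep_8', 'FakeShape'] = 0.07
df.loc['dilep_8', 'btagging'] = 0.04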
df.T

| Name | ljets_7 | dilep_7 | dilep_8 | ljets_8 |
|---|---|---|---|---|
| Result | 172.33 | 173.79 | 172.99 | 172.08 |
| Stats | 0.75 | 0.54 | 0.41 | 0.39 |
| Method | 0.11 | 0.09 | 0.05 | 0.13 |
| SignalMC | 0.22 | 0.26 | 0.09 | 0.16 |
| Hadronisation | 0.18 | 0.53 | 0.22 | 0.15 |
| IFSR | 0.32 | 0.47 | 0.23 | 0.08 |
| UE | 0.15 | 0.05 | 0.1 | 0.08 |
| CR | 0.11 | 0.14 | 0.03 | 0.19 |
| PDF | 0.25 | 0.11 | 0.05 | 0.09 |
| BackNorm | 0.1 | 0.04 | 0.03 | 0.08 |


Producing a LaTeX version of this table is as simple as calling ``to_latex()``. ``pandas`` uses the ``booktabs`` LaTeX package (the `\toprule`, `\midrule` and `\bottomrule` commands) to make your tables look pretty, so make sure you include this package in your LaTeX preamble.
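
To check that the output compiles, you can wrap the tabular string in a standalone document that loads booktabs. The sketch below does this. The `standalone_document` function and its `packages` parameter are hypothetical helpers introduced here and are not part of `blue` or `pandas`.

```python
def standalone_document(tabular: str, packages=("booktabs",)) -> str:
    """Wrap a tabular environment (e.g. the string returned by
    DataFrame.to_latex()) in a minimal compilable LaTeX document."""
    # One \usepackage line per requested package
    preamble = "\n".join(rf"\usepackage{{{pkg}}}" for pkg in packages)
    return "\n".join([
        r"\documentclass{article}",
        preamble,
        r"\begin{document}",
        tabular.rstrip(),
        r"\end{document}",
        "",
    ])


# Hypothetical usage (file name chosen for illustration):
# with open("table.tex", "w") as f:
#     f.write(standalone_document(df.T.to_latex()))
```

The siunitx-based table at the end of this notebook would also need `packages=("booktabs", "siunitx")`.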

In [3]:
print(df.T.to_latex())

\begin{tabular}{lrrrr}
\toprule
Name &  ljets\_7 &  dilep\_7 &  dilep\_8 &  ljets\_8 \\
\midrule
Result        &   172.33 &   173.79 &   172.99 &   172.08 \\
Stats         &     0.75 &     0.54 &     0.41 &     0.39 \\
Method        &     0.11 &     0.09 &     0.05 &     0.13 \\
SignalMC      &     0.22 &     0.26 &     0.09 &     0.16 \\
Hadronisation &     0.18 &     0.53 &     0.22 &     0.15 \\
IFSR          &     0.32 &     0.47 &     0.23 &     0.08 \\
UE            &     0.15 &     0.05 &     0.10 &     0.08 \\
CR            &     0.11 &     0.14 &     0.03 &     0.19 \\
PDF           &     0.25 &     0.11 &     0.05 &     0.09 \\
BackNorm      &     0.10 &     0.04 &     0.03 &     0.08 \\
WZShape       &     0.29 &     0.00 &     0.00 &     0.11 \\
FakeShape     &     0.05 &     0.01 &     0.07 &     0.00 \\
JES           &     0.58 &     0.75 &     0.54 &     0.54 \\
btolightJES   &     0.06 &     0.68 &     0.30 &     0.03 \\
JER           &     0.22 &     0.19 &     0.09 & 

This is a good start. You may be happy to take this and make some minor edits directly to the output, but we can go further and make it look even nicer. The column that holds the result and the systematic names is currently labelled "Name", which isn't very descriptive. Because the column mixes the result with the uncertainties, it is probably better not to label it at all. We can remove the label by passing ``index_names=False`` to ``to_latex()``.
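
The "Name" label comes from `index_col='Name'` in the `read_csv` calls. It is the name of the DataFrame's index, and after transposing it becomes the name of the column axis. The stand-in frame below uses a few values copied from the table above to show this. Dropping the name with `rename_axis` is an alternative to `index_names=False` that removes the label from the data itself. This is a sketch, and how each option renders the header row may vary between pandas versions.

```python
import pandas as pd

# Stand-in for the input data: rows are measurements, indexed by "Name"
df = pd.DataFrame(
    {"Result": [172.33, 173.79], "Stats": [0.75, 0.54]},
    index=pd.Index(["ljets_7", "dilep_7"], name="Name"),
)

table = df.T
# After transposing, the index name is now the name of the column axis
print(table.columns.name)  # Name

# Remove the column-axis name; this returns a new object, table is unchanged
unnamed = table.rename_axis(None, axis="columns")
print(unnamed.columns.name)  # None
```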

In [4]:
print(df.T.to_latex(index_names=False))

\begin{tabular}{lrrrr}
\toprule
{} &  ljets\_7 &  dilep\_7 &  dilep\_8 &  ljets\_8 \\
\midrule
Result        &   172.33 &   173.79 &   172.99 &   172.08 \\
Stats         &     0.75 &     0.54 &     0.41 &     0.39 \\
Method        &     0.11 &     0.09 &     0.05 &     0.13 \\
SignalMC      &     0.22 &     0.26 &     0.09 &     0.16 \\
Hadronisation &     0.18 &     0.53 &     0.22 &     0.15 \\
IFSR          &     0.32 &     0.47 &     0.23 &     0.08 \\
UE            &     0.15 &     0.05 &     0.10 &     0.08 \\
CR            &     0.11 &     0.14 &     0.03 &     0.19 \\
PDF           &     0.25 &     0.11 &     0.05 &     0.09 \\
BackNorm      &     0.10 &     0.04 &     0.03 &     0.08 \\
WZShape       &     0.29 &     0.00 &     0.00 &     0.11 \\
FakeShape     &     0.05 &     0.01 &     0.07 &     0.00 \\
JES           &     0.58 &     0.75 &     0.54 &     0.54 \\
btolightJES   &     0.06 &     0.68 &     0.30 &     0.03 \\
JER           &     0.22 &     0.19 &     0.09 &   

Another thing we might want to do is tidy up the "experiment names": "ljets_7", "dilep_7", etc. aren't very descriptive. We will use a multi-index to make this look nicer.

In [5]:
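# Tuples must follow the row order of df; raw strings (r'...') keep the
# LaTeX backslash without an "invalid escape sequence" warning
new_index = pd.MultiIndex.from_tuples(
    [('7 TeV', r'$\ell+$jets'), ('7 TeV', 'dilepton'),
     ('8 TeV', 'dilepton'), ('8 TeV', r'$\ell+$jets')],
    names=['CoM energy', 'Channel']
)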
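
The tuples above have to be listed in the same order as the rows of `df`, and a mismatch would silently mislabel the columns. One way to avoid depending on the order is to look up each row label in a dictionary. The sketch below does this; `LABELS` and `labelled_index` are hypothetical names chosen here. A missing label raises a `KeyError` instead of producing a wrong header.

```python
import pandas as pd

# Hypothetical mapping from short row labels to (energy, channel) pairs
LABELS = {
    "ljets_7": ("7 TeV", r"$\ell+$jets"),
    "dilep_7": ("7 TeV", "dilepton"),
    "dilep_8": ("8 TeV", "dilepton"),
    "ljets_8": ("8 TeV", r"$\ell+$jets"),
}


def labelled_index(index: pd.Index, labels: dict) -> pd.MultiIndex:
    """Build a MultiIndex by looking each row label up in `labels`,
    so the result follows the DataFrame's own row order."""
    return pd.MultiIndex.from_tuples(
        [labels[name] for name in index],
        names=["CoM energy", "Channel"],
    )


# Stand-in index deliberately in a different order from LABELS
rows = pd.Index(["dilep_8", "ljets_7"], name="Name")
idx = labelled_index(rows, LABELS)
print(list(idx))  # [('8 TeV', 'dilepton'), ('7 TeV', '$\\ell+$jets')]
```

In the notebook, this would be used as `df.set_index(labelled_index(df.index, LABELS))`.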

We can now replace our old index with this one and use the ``escape=False`` option to stop pandas from escaping the LaTeX special characters (such as ``$`` and ``\``) in our math labels.
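
The hypothetical `latex_escape` helper below shows roughly what escaping does to a label. It is an illustration only and not pandas' implementation, so the exact strings pandas produces may differ. It also shows the other side of `escape=False`. Plain-text labels that contain special characters, such as the underscore in `ljets_7`, are then passed through untouched and would break compilation, so they have to be escaped or renamed by hand.

```python
# Characters LaTeX treats specially in text mode, with a text-safe
# replacement for each (illustrative, not pandas' own table)
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text: str) -> str:
    """Escape one character at a time, so backslashes introduced by a
    replacement are never escaped a second time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


print(latex_escape("ljets_7"))       # ljets\_7
print(latex_escape(r"$\ell+$jets"))  # \$\textbackslash{}ell+\$jets
```

You could apply a helper like this only to the plain-text labels, leave the math labels alone, and then call `to_latex(escape=False)`.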

In [6]:
print(df.set_index(new_index).T.to_latex(escape=False, index_names=False))

\begin{tabular}{lrrrr}
\toprule
{} & \multicolumn{2}{l}{7 TeV} & \multicolumn{2}{l}{8 TeV} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets \\
\midrule
Result        &      172.33 &   173.79 &   172.99 &      172.08 \\
Stats         &        0.75 &     0.54 &     0.41 &        0.39 \\
Method        &        0.11 &     0.09 &     0.05 &        0.13 \\
SignalMC      &        0.22 &     0.26 &     0.09 &        0.16 \\
Hadronisation &        0.18 &     0.53 &     0.22 &        0.15 \\
IFSR          &        0.32 &     0.47 &     0.23 &        0.08 \\
UE            &        0.15 &     0.05 &     0.10 &        0.08 \\
CR            &        0.11 &     0.14 &     0.03 &        0.19 \\
PDF           &        0.25 &     0.11 &     0.05 &        0.09 \\
BackNorm      &        0.10 &     0.04 &     0.03 &        0.08 \\
WZShape       &        0.29 &     0.00 &     0.00 &        0.11 \\
FakeShape     &        0.05 &     0.01 &     0.07 &        0.00 \\
JES           &        0.58 &     0.7

This looks nice, but I personally prefer the multicolumn labels to be centered. We can use the `multicolumn_format` option to achieve this.

In [7]:
print(df
      .set_index(new_index)
      .T
      .to_latex(escape=False, index_names=False, multicolumn_format='c')
     )       

\begin{tabular}{lrrrr}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets \\
\midrule
Result        &      172.33 &   173.79 &   172.99 &      172.08 \\
Stats         &        0.75 &     0.54 &     0.41 &        0.39 \\
Method        &        0.11 &     0.09 &     0.05 &        0.13 \\
SignalMC      &        0.22 &     0.26 &     0.09 &        0.16 \\
Hadronisation &        0.18 &     0.53 &     0.22 &        0.15 \\
IFSR          &        0.32 &     0.47 &     0.23 &        0.08 \\
UE            &        0.15 &     0.05 &     0.10 &        0.08 \\
CR            &        0.11 &     0.14 &     0.03 &        0.19 \\
PDF           &        0.25 &     0.11 &     0.05 &        0.09 \\
BackNorm      &        0.10 &     0.04 &     0.03 &        0.08 \\
WZShape       &        0.29 &     0.00 &     0.00 &        0.11 \\
FakeShape     &        0.05 &     0.01 &     0.07 &        0.00 \\
JES           &        0.58 &     0.7

Much better! Papers with tables like this sometimes replace values of 0.00 with something else, such as '-'. We can do this by replacing values of 0 with `numpy.nan` and then using the `na_rep` keyword argument of `to_latex()` to display ``NaN`` values as '-'.
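
One subtlety is that `replace(0, np.nan)` only matches values that are exactly zero. A small but non-zero uncertainty would still be printed as 0.00 by a two-decimal format, not as '-'. The toy frame below has hypothetical column names (`SystA`, `SystB`) and made-up values. It compares replacing zeros before and after `DataFrame.round`.

```python
import numpy as np
import pandas as pd

# Made-up values: one exact zero, and one value that only *displays*
# as 0.00 once rounded to two decimal places
values = pd.DataFrame({"SystA": [0.29, 0.0], "SystB": [0.05, 0.004]},
                      index=["meas_1", "meas_2"])

# Only the exact zero becomes NaN; 0.004 survives
exact_only = values.replace(0, np.nan)

# Rounding first turns 0.004 into 0.0, which replace() then catches
rounded_first = values.round(2).replace(0, np.nan)

print(exact_only.isna().sum().sum())     # 1
print(rounded_first.isna().sum().sum())  # 2
```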

In [8]:
import numpy as np

In [9]:
print(df
      .set_index(new_index)
      .replace(0, np.nan)
      .T
      .to_latex(escape=False, 
                index_names=False, 
                multicolumn_format='c',
                na_rep='-')
     )    

\begin{tabular}{lrrrr}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets \\
\midrule
Result        &      172.33 &   173.79 &   172.99 &      172.08 \\
Stats         &        0.75 &     0.54 &     0.41 &        0.39 \\
Method        &        0.11 &     0.09 &     0.05 &        0.13 \\
SignalMC      &        0.22 &     0.26 &     0.09 &        0.16 \\
Hadronisation &        0.18 &     0.53 &     0.22 &        0.15 \\
IFSR          &        0.32 &     0.47 &     0.23 &        0.08 \\
UE            &        0.15 &     0.05 &     0.10 &        0.08 \\
CR            &        0.11 &     0.14 &     0.03 &        0.19 \\
PDF           &        0.25 &     0.11 &     0.05 &        0.09 \\
BackNorm      &        0.10 &     0.04 &     0.03 &        0.08 \\
WZShape       &        0.29 &        - &        - &        0.11 \\
FakeShape     &        0.05 &     0.01 &     0.07 &           - \\
JES           &        0.58 &     0.7

This is now a nice-looking table, in my opinion. Of course, we could improve it further by making the uncertainty names nicer, e.g.:

In [10]:
print(df
      .set_index(new_index)
      .replace(0, np.nan)
      .rename(columns={
          'Stats': 'Statistics',
          'UE': 'Underlying event',
          'CR': 'Colour reconnection',
          'btolightJES': r'$b \to$ light JES'})  # raw string, otherwise '\t' becomes a tab
      .T
      .to_latex(escape=False,
                index_names=False,
                multicolumn_format='c',
                na_rep='-')
     )

\begin{tabular}{lrrrr}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets \\
\midrule
Result              &      172.33 &   173.79 &   172.99 &      172.08 \\
Statistics          &        0.75 &     0.54 &     0.41 &        0.39 \\
Method              &        0.11 &     0.09 &     0.05 &        0.13 \\
SignalMC            &        0.22 &     0.26 &     0.09 &        0.16 \\
Hadronisation       &        0.18 &     0.53 &     0.22 &        0.15 \\
IFSR                &        0.32 &     0.47 &     0.23 &        0.08 \\
Underlying event    &        0.15 &     0.05 &     0.10 &        0.08 \\
Colour reconnection &        0.11 &     0.14 &     0.03 &        0.19 \\
PDF                 &        0.25 &     0.11 &     0.05 &        0.09 \\
BackNorm            &        0.10 &     0.04 &     0.03 &        0.08 \\
WZShape             &        0.29 &        - &        - &        0.11 \\
FakeShape           &        0.05 &   
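
The cells that follow repeat the same renaming dictionary each time. Below is a sketch of defining it once; `NICE_NAMES` is a name chosen here. It also shows why the `$b \to$` label needs a raw string. In an ordinary Python string, `\t` is a tab character, so `'$b \to$'` silently loses its arrow. The stand-in frame uses the `ljets_7` values from the table above.

```python
import pandas as pd

# One mapping shared by every table; the raw string keeps "\to" intact
NICE_NAMES = {
    "Stats": "Statistics",
    "UE": "Underlying event",
    "CR": "Colour reconnection",
    "btolightJES": r"$b \to$ light JES",
}

print("\t" in "$b \to$")   # True: "\t" was read as a tab
print("\t" in r"$b \to$")  # False

# Stand-in frame with the same column labels as the input data
frame = pd.DataFrame([[0.75, 0.15, 0.11, 0.06]],
                     columns=["Stats", "UE", "CR", "btolightJES"],
                     index=["ljets_7"])
pretty = frame.rename(columns=NICE_NAMES)
print(list(pretty.columns))
```

Each cell could then call `.rename(columns=NICE_NAMES)` before transposing.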

## Adding the combination result

We have nicely formatted our input data, but let's say that we now want to perform a combination and add a new column with the results of that to our table.

(Do the combination elsewhere and save the DataFrame to a CSV file.)

In [11]:
def my_array_helper(a, b, c, d, e, f):
    """This function helps us create correlations
    directly from the numbers in the paper"""
    out = np.zeros((4, 4))
    out[0, 1] = a
    out[0, 2] = b
    out[0, 3] = d
    # numbers no longer in order due to reading from two papers
    out[1, 2] = c
    out[1, 3] = e
    out[2, 3] = f
    out += out.T
    np.fill_diagonal(out, 1)
    return out
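
Before passing these matrices to `Blue`, it may be worth checking that each one is a valid correlation matrix. The `check_correlation_matrix` function below is a hypothetical helper and not part of `blue`. It checks symmetry, a unit diagonal, entries in [-1, 1] and positive semi-definiteness (using `numpy.linalg.eigvalsh`). The first example is the matrix that `my_array_helper(1, 1, 1, -1, -1, -1)` builds for 'Hadronisation'. The second is a made-up, inconsistent set of three pairwise anti-correlations.

```python
import numpy as np


def check_correlation_matrix(m, tol=1e-9):
    """Return a list of problems found in a candidate correlation matrix."""
    m = np.asarray(m, dtype=float)
    problems = []
    if not np.allclose(m, m.T, atol=tol):
        problems.append("not symmetric")
    if not np.allclose(np.diag(m), 1.0, atol=tol):
        problems.append("diagonal is not 1")
    if np.any(np.abs(m) > 1 + tol):
        problems.append("entry outside [-1, 1]")
    # eigvalsh assumes symmetry, so only test definiteness if all else passed
    if not problems and np.linalg.eigvalsh(m).min() < -tol:
        problems.append("not positive semi-definite")
    return problems


hadronisation = np.array([
    [1, 1, 1, -1],
    [1, 1, 1, -1],
    [1, 1, 1, -1],
    [-1, -1, -1, 1],
])
frustrated = np.array([
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
])
print(check_correlation_matrix(hadronisation))  # []
print(check_correlation_matrix(frustrated))     # ['not positive semi-definite']

# Possible use on the dict below (scalar entries are skipped):
# for name, value in correlations.items():
#     if np.ndim(value) == 2:
#         print(name, check_correlation_matrix(value))
```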

In [12]:
correlations = {
    'Stats': 0,
    'Method': 0,
    'SignalMC': 1,
    'Hadronisation': my_array_helper(1, 1, 1, -1, -1, -1),
    'IFSR': my_array_helper(-1, -1, 1, -1, 1, 1),
    'UE': my_array_helper(-1, -1, 1, -1, 1, 1),
    'CR': my_array_helper(-1, -1, 1, 1, -1, -1),
    'PDF': my_array_helper(0.57, -0.29, 0.03, 0.72, 0.72, -0.48),
    'BackNorm': my_array_helper(1, 0.23, 0.23, -0.74, -0.77, -0.06),
    'WZShape': 0,
    'FakeShape': my_array_helper(0.23, 0.20, -0.08, 0, 0, 0),
    'JES': my_array_helper(-0.23, 0.06, 0.35, -0.29, 0.18, -0.54),
    'btolightJES': 1,
    'JER': my_array_helper(-1, 0, 0, 0, 0, 0.22),
    'JetRecoEff': 1,
    'JVF': my_array_helper(-1, 1, -1, 1, -1, 1),
    'btagging': my_array_helper(-0.77, 0, 0, 0, 0, -0.23),
    'leptons': my_array_helper(-0.34, -0.52, 0.96, -0.17, -0.08, 0.11),
    'Etmiss': my_array_helper(-0.15, 0.25, -0.24, 0.22, -0.12, 0.97),
    'Pileup': 0,
}

In [13]:
from blue import Blue

In [14]:
combination = Blue(df, correlations)
combination.combined_result

172.49914976824516

In [15]:
x = combination.combined_uncertainties
x['Result'] = combination.combined_result

In [16]:
x = pd.Series(x, name='Combination')

In [17]:
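comb_df = df.copy()
# DataFrame.append() was removed in pandas 2.0; assigning with .loc adds
# the new row, matching the Series entries to the columns by name
comb_df.loc['Combination'] = x
comb_df.T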

| Name | ljets_7 | dilep_7 | dilep_8 | ljets_8 | Combination |
|---|---|---|---|---|---|
| Result | 172.33 | 173.79 | 172.99 | 172.08 | 172.49915 |
| Stats | 0.75 | 0.54 | 0.41 | 0.39 | 0.27303 |
| Method | 0.11 | 0.09 | 0.05 | 0.13 | 0.059551 |
| SignalMC | 0.22 | 0.26 | 0.09 | 0.16 | 0.137312 |
| Hadronisation | 0.18 | 0.53 | 0.22 | 0.15 | 0.060112 |
| IFSR | 0.32 | 0.47 | 0.23 | 0.08 | 0.071007 |
| UE | 0.15 | 0.05 | 0.1 | 0.08 | 0.049766 |
| CR | 0.11 | 0.14 | 0.03 | 0.19 | 0.083806 |
| PDF | 0.25 | 0.11 | 0.05 | 0.09 | 0.065972 |
| BackNorm | 0.1 | 0.04 | 0.03 | 0.08 | 0.027225 |


In [18]:
# There seems to be a bug in pandas to_latex() that doesn't recognise
# an empty MultiIndex label, so for now we use '{}' (an empty LaTeX group)
# as the first level of the combination column

final_index = pd.MultiIndex.from_tuples(
    [('7 TeV', r'$\ell+$jets'), ('7 TeV', 'dilepton'),
     ('8 TeV', 'dilepton'), ('8 TeV', r'$\ell+$jets'),
     ('{}', 'Combination')],
    names=['CoM energy', 'Channel']
)


print(comb_df
      .set_index(final_index)
      .replace(0, np.nan)
      .rename(columns={
          'Stats': 'Statistics',
          'UE': 'Underlying event',
          'CR': 'Colour reconnection',
          'btolightJES': r'$b \to$ light JES'})
      .T
      .to_latex(escape=False,
                index_names=False,
                multicolumn_format='c',
                na_rep='-')
     )

\begin{tabular}{lrrrrr}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} &          {} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets & Combination \\
\midrule
Result              &      172.33 &   173.79 &   172.99 &      172.08 &  172.499150 \\
Statistics          &        0.75 &     0.54 &     0.41 &        0.39 &    0.273030 \\
Method              &        0.11 &     0.09 &     0.05 &        0.13 &    0.059551 \\
SignalMC            &        0.22 &     0.26 &     0.09 &        0.16 &    0.137312 \\
Hadronisation       &        0.18 &     0.53 &     0.22 &        0.15 &    0.060112 \\
IFSR                &        0.32 &     0.47 &     0.23 &        0.08 &    0.071007 \\
Underlying event    &        0.15 &     0.05 &     0.10 &        0.08 &    0.049766 \\
Colour reconnection &        0.11 &     0.14 &     0.03 &        0.19 &    0.083806 \\
PDF                 &        0.25 &     0.11 &     0.05 &        0.09 &    0.065972 \\
BackNorm            &        

So we have added the combination column, but its values are printed with six decimal places. There are a couple of ways to fix the precision of the final column:

1) Use the `DataFrame.round` method to round the values in the DataFrame.
2) Use the LaTeX `siunitx` package to do the rounding (this lets us adjust the rounding directly in the LaTeX code).

Here is option 1:

In [19]:
print(comb_df
      .round(2)  # here we use the DataFrame.round() function
      .set_index(final_index)
      .replace(0, np.nan)
      .rename(columns={
          'Stats': 'Statistics',
          'UE': 'Underlying event',
          'CR': 'Colour reconnection',
          'btolightJES': r'$b \to$ light JES'})
      .T
      .to_latex(escape=False,
                index_names=False,
                multicolumn_format='c',
                na_rep='-')
     )

\begin{tabular}{lrrrrr}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} &          {} \\
{} & $\ell+$jets & dilepton & dilepton & $\ell+$jets & Combination \\
\midrule
Result              &      172.33 &   173.79 &   172.99 &      172.08 &      172.50 \\
Statistics          &        0.75 &     0.54 &     0.41 &        0.39 &        0.27 \\
Method              &        0.11 &     0.09 &     0.05 &        0.13 &        0.06 \\
SignalMC            &        0.22 &     0.26 &     0.09 &        0.16 &        0.14 \\
Hadronisation       &        0.18 &     0.53 &     0.22 &        0.15 &        0.06 \\
IFSR                &        0.32 &     0.47 &     0.23 &        0.08 &        0.07 \\
Underlying event    &        0.15 &     0.05 &     0.10 &        0.08 &        0.05 \\
Colour reconnection &        0.11 &     0.14 &     0.03 &        0.19 &        0.08 \\
PDF                 &        0.25 &     0.11 &     0.05 &        0.09 &        0.07 \\
BackNorm            &        
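
If you would rather leave the numbers in the DataFrame untouched, a related alternative to option 1 is the `float_format` parameter of `to_latex()`, which accepts a one-argument function. The `fixed_decimals` factory below is a hypothetical helper. Formatting only changes the printed text, so `replace(0, np.nan)` still sees the unrounded values, and a value like 0.004 would appear as 0.00 instead of '-'.

```python
def fixed_decimals(places: int):
    """Return a one-argument formatter, e.g. for to_latex(float_format=...)."""
    def fmt(value: float) -> str:
        return f"{value:.{places}f}"
    return fmt


two_dp = fixed_decimals(2)
print(two_dp(172.49914976824516))  # 172.50
print(two_dp(0.059551))            # 0.06
print(two_dp(0.004))               # 0.00

# Hypothetical usage (not executed here):
# print(comb_df.T.to_latex(float_format=fixed_decimals(2), na_rep='-'))
```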

And here is option 2. We use the `column_format` option of `to_latex` to write out `S` columns, the column type defined by `siunitx` (remember to load `siunitx` in your preamble). In my opinion, the `siunitx` package can make really nice-looking tables, but it requires some more work. Anything in an S column that is not a number needs to be wrapped in braces, `{...}`, so that `siunitx` treats it as text.
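
In `siunitx_index` the braces are added by hand. The sketch below derives such an index from an existing one using a hypothetical `brace_inner_level` helper. It wraps only the inner (channel) level. In the output, the outer labels are written inside `\multicolumn{2}{c}{...}`, which overrides the S column type for those cells, and the combination column's outer label is already `'{}'`.

```python
import pandas as pd


def brace_inner_level(index: pd.MultiIndex) -> pd.MultiIndex:
    """Wrap every label in the last level of `index` in braces so that
    siunitx treats the column headers as text, not numbers."""
    return pd.MultiIndex.from_tuples(
        [(*outer, "{" + inner + "}") for *outer, inner in index],
        names=index.names,
    )


# Stand-in for final_index defined earlier in this notebook
final_index = pd.MultiIndex.from_tuples(
    [("7 TeV", r"$\ell+$jets"), ("7 TeV", "dilepton"),
     ("8 TeV", "dilepton"), ("8 TeV", r"$\ell+$jets"),
     ("{}", "Combination")],
    names=["CoM energy", "Channel"],
)
siunitx_index = brace_inner_level(final_index)
print(siunitx_index[-1])  # ('{}', '{Combination}')
```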

In [20]:
siunitx_index = pd.MultiIndex.from_tuples(
    [('7 TeV', r'{$\ell+$jets}'), ('7 TeV', '{dilepton}'),
     ('8 TeV', '{dilepton}'), ('8 TeV', r'{$\ell+$jets}'),
     ('{}', '{Combination}')],
    names=['CoM energy', 'Channel']
)

print(r'\sisetup{round-mode=places, round-precision=2}')
print(comb_df
      .set_index(siunitx_index)
      .replace(0, np.nan)
      .rename(columns={
          'Stats': 'Statistics',
          'UE': 'Underlying event',
          'CR': 'Colour reconnection',
          'btolightJES': r'$b \to$ light JES'})
      .T
      .to_latex(escape=False,
                index_names=False,
                multicolumn_format='c',
                na_rep='{-}', column_format='lSSSSS')
     )

\sisetup{round-mode=places, round-precision=2}
\begin{tabular}{lSSSSS}
\toprule
{} & \multicolumn{2}{c}{7 TeV} & \multicolumn{2}{c}{8 TeV} &            {} \\
{} & {$\ell+$jets} & {dilepton} & {dilepton} & {$\ell+$jets} & {Combination} \\
\midrule
Result              &        172.33 &     173.79 &     172.99 &        172.08 &    172.499150 \\
Statistics          &          0.75 &       0.54 &       0.41 &          0.39 &      0.273030 \\
Method              &          0.11 &       0.09 &       0.05 &          0.13 &      0.059551 \\
SignalMC            &          0.22 &       0.26 &       0.09 &          0.16 &      0.137312 \\
Hadronisation       &          0.18 &       0.53 &       0.22 &          0.15 &      0.060112 \\
IFSR                &          0.32 &       0.47 &       0.23 &          0.08 &      0.071007 \\
Underlying event    &          0.15 &       0.05 &       0.10 &          0.08 &      0.049766 \\
Colour reconnection &          0.11 &       0.14 &       0.03 &          0

Et voilà! A nice-looking table, ready to be put into a paper!