# Tabulating Data for LaTeX

This script provides a function that takes data in Python and generates the TeX code for a LaTeX table.

It also provides a slightly different function that generates a similar table from a Pandas dataframe.

### Preamble

In [1]:
import pandas as pd

# 1 - `latextable`

This function takes data in the form(s) specified below and generates the TeX code for a LaTeX table.

### Required Arguments

- **`data`**

    `data` is, unsurprisingly, the data for the table. This should be a list of lists (or a numpy array of numpy arrays, or more generally any nested sequence that supports `len` and indexing). The first (sub)list should be the first row of data, the second should be the second row of data, and so on.
    

- **`rownames`** 

    `rownames` should be a list of names, one for each row. These can be numerical or string objects (the function converts everything to a string). With the default `top_left_cell_empty = True`, the function prepends an empty string via list concatenation, so a plain Python list is the safe choice.

    If the top-left cell of the table is to be populated, its content should be the first element of the list and `top_left_cell_empty` should be set to `False`. The second element will then be the name of the first row, and so on.
    

- **`colnames`**

    `colnames` should be a list or list-like object of names for each column. Again, these can be numerical or string objects.
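
As a minimal sketch of how the three required arguments fit together, the snippet below builds a small data set and checks its shape in the same spirit as the length checks inside the function. The helper name `shapes_agree` is hypothetical and is not part of this script; it assumes `top_left_cell_empty` is left at its default, so `rownames` does not include the top-left cell.

```python
def shapes_agree(data, rownames, colnames):
    """Return True when data has one row per row name and one cell per column name.

    Hypothetical helper for illustration only. It assumes rownames does NOT
    include the top-left cell (top_left_cell_empty left as True).
    """
    if len(data) != len(rownames):
        return False
    return all(len(row) == len(colnames) for row in data)


rownames = ["first", "second"]
colnames = ["a", "b", "c"]
data = [[1, 2, 3],
        [4, 5, 6]]

print(shapes_agree(data, rownames, colnames))      # shapes match
print(shapes_agree(data, rownames, ["a", "b"]))    # too few column names
```

If the shapes do not agree, `latextable` prints a message and returns without producing a table.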
    

### Additional Arguments

#### Decimal Places

`decimal_places` allows one to specify the number of decimal places for each datum. 
    
Three things to note:
    
   - This feature adds zeros to integers, as well as to floats with too few decimal places. If `decimal_places = 2` is passed, then `1` becomes `1.00` and `0.3` becomes `0.30`.
   - This feature works even when some data are non-numerical; the function distinguishes between numerical and non-numerical data before attempting to round.
   - This feature has no default; if nothing is specified the data will appear as given.
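
The behaviour described above can be sketched with Python's fixed-point format specifier. This is an illustrative alternative, not the round-and-pad approach the function itself uses, and the helper name `pad_decimals` is hypothetical.

```python
from numbers import Real


def pad_decimals(value, decimal_places):
    """Format numbers to a fixed number of decimal places; pass other values through as strings."""
    # bool is a subclass of int, so it is excluded explicitly (an assumption of this sketch).
    if isinstance(value, Real) and not isinstance(value, bool):
        return f"{value:.{decimal_places}f}"
    # Non-numerical data is returned unchanged, as text.
    return str(value)


print(pad_decimals(1, 2))          # 1.00
print(pad_decimals(0.3, 2))        # 0.30
print(pad_decimals("(2.099)", 3))  # (2.099)
```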
    
    
#### Populating the Top-left Cell

For `top_left_cell_empty`, see the above description of `rownames`. The default value is `True`.

#### Math Environments

By setting `rownames_math_env`, `colnames_math_env`, and/or `data_math_env` to `True`, each cell in the respective part(s) of the table is wrapped in a math environment (`$...$`).

Two things...

- For the function to print the LaTeX escape character `\`, one must include `\\` in the appropriate Python string. The function will print `"\\frac{\\mu}{n}"` as `\frac{\mu}{n}`. 
- The top-left cell, if populated, comes under `rownames_math_env`.
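
A small sketch of the escaping point: a doubled backslash in an ordinary Python string and a single backslash in a raw string give the same text, and either can be wrapped in `$...$`. The helper name `wrap_math` is hypothetical; it only mirrors the string concatenation the function performs.

```python
def wrap_math(cell):
    """Wrap a cell's text in inline math delimiters (hypothetical helper)."""
    return "$" + str(cell) + "$"


escaped = "\\frac{\\mu}{n}"  # doubled backslashes in an ordinary string
raw = r"\frac{\mu}{n}"       # raw string: backslashes are taken literally

print(escaped == raw)        # True
print(wrap_math(escaped))    # $\frac{\mu}{n}$
```

Raw strings are a convenient way to avoid doubling every backslash when writing row or column names by hand.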

#### Cell Alignments

- `default_alignment`

    `default_alignment` is the alignment for the whole table, set to `"c"` by default. Naturally, `"l"` and `"r"` can also be passed.
    
    
- `rownames_alignment`, `colnames_alignment`, and `top_left_cell_alignment`

    The row names, column names and top-left cell can be given different alignments from the rest of the table by specifying `"l"`, `"c"`, or `"r"` for any of them. 
    
    If nothing is specified, `rownames_alignment`, `colnames_alignment`, and/or `top_left_cell_alignment` are the same as `default_alignment`. 
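
As a rough sketch of how these options translate into TeX, the default alignment fills the `tabular` column specification, while a cell with its own alignment is wrapped in `\multicolumn{1}{...}{...}`. The helper names `column_spec` and `override_cell` are hypothetical; they only mirror the string concatenation used in the function below.

```python
def column_spec(default_alignment, ncol):
    """Build the tabular column specification, including the row-name column."""
    return "{|" + (default_alignment + "|") * (ncol + 1) + "}"


def override_cell(text, alignment, leftmost=False):
    """Wrap one cell in \\multicolumn so it can use its own alignment."""
    # The leftmost column needs its own left rule; other columns reuse
    # the right rule of the cell before them.
    left_rule = "|" if leftmost else ""
    return "\\multicolumn{1}{" + left_rule + alignment + "|}{" + str(text) + "}"


print(column_spec("c", 3))                           # {|c|c|c|c|}
print(override_cell("Alabama", "r", leftmost=True))  # \multicolumn{1}{|r|}{Alabama}
```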

In [2]:
def latextable(data, rownames, colnames, decimal_places = None, top_left_cell_empty = True, rownames_math_env = False, 
               colnames_math_env = False, data_math_env = False, default_alignment = 'c', rownames_alignment = None, 
               colnames_alignment = None, top_left_cell_alignment = None):
    
    ## SORTING TOP LEFT CELL
    
    if top_left_cell_empty == True:
        
        # If executed, this just adds an empty string to the rownames list.
        
        rownames = [""] + rownames
    
    ## CHECKING COLUMN LENGTHS
    
    if len(data) != (len(rownames) - 1):
        
        print("Length of columns of data does not match number of rows. Note that the top-left cell of the table is included in rownames, so rownames should have one more element than the number of elements in each column of data.")
        
        return
    
    ## CHECKING ROW LENGTHS
    
    rowlengths = set()
              
    for row in data:
              
        rowlengths.add(len(row))
        
    if rowlengths != set([len(colnames)]):
        
        # This will execute if rowlengths is not a one-element set containing the number of columns
        
        print("At least one row of data does not match number of columns.")
        
        return
    
    ## CHECKING ALIGNMENTS
    
    if colnames_alignment not in [None,'l','c','r']:
        
        print("colnames_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    if rownames_alignment not in [None,'l','c','r']:
        
        print("rownames_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    if top_left_cell_alignment not in [None,'l','c','r']:
        
        print("top_left_cell_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    ## STARTING THE TABLE
    
    master_string = "\\begin{tabular}{|" + (default_alignment + "|")*(len(colnames) + 1) + "} \n"
              
    master_string = master_string + "\\hline \n"
    
    ## TOP LEFT CELL
    
    # Below, note that rownames_math_env incorporates the top left cell. I figure if these differ it won't take much effort to
    # add/remove dollar signs from the top left cell manually; specifying the math environment of the top left cell seemed
    # like an unnecessary complication as a feature.
    
    if rownames_math_env == True and top_left_cell_alignment != None:
    
        master_string = master_string + "\\multicolumn{1}{|" + top_left_cell_alignment + "|}{" + "$" + str(rownames[0]) + "$} & "
    
    elif rownames_math_env == False and top_left_cell_alignment != None:
        
        master_string = master_string + "\\multicolumn{1}{|" + top_left_cell_alignment + "|}{" + str(rownames[0]) + "} & "
    
    elif rownames_math_env == True and top_left_cell_alignment == None:
        
        master_string = master_string + "$" + str(rownames[0]) + "$ & "
    
    else:
        
        master_string = master_string + str(rownames[0]) + " & "
    
    ## COLUMN NAMES
    
    ncol = len(colnames)
    
    for i in range(0,ncol):
        
        if i < (ncol - 1):
            
            # This executes for all but the rightmost column.
            
            if colnames_math_env == True and colnames_alignment != None:
            
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + "$" + str(colnames[i]) + "$} & "
                
            elif colnames_math_env == False and colnames_alignment != None:
                
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + str(colnames[i]) + "} & "
            
            elif colnames_math_env == True and colnames_alignment == None:
                
                master_string = master_string + "$" + str(colnames[i]) + "$ & "
                
            else:
                
                master_string = master_string + str(colnames[i]) + " & "
            
        else:
            
            # This executes for the rightmost column, where a newline has to be started.
            
            if colnames_math_env == True and colnames_alignment != None:
            
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + "$" + str(colnames[i]) + "$} \\\\ \n"
                
            elif colnames_math_env == False and colnames_alignment != None:
                
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + str(colnames[i]) + "} \\\\ \n"
            
            elif colnames_math_env == True and colnames_alignment == None:
                
                master_string = master_string + "$" + str(colnames[i]) + "$ \\\\ \n"
                
            else:
                
                master_string = master_string + str(colnames[i]) + " \\\\ \n"
    
    master_string = master_string + "\\hline \n"
    
    ## ROWNAMES AND DATA
    
    for i in range(0,len(data)):
        
        # This loop runs once for each row of data.
        
        # The below is for the row name
        
        if rownames_math_env == True and rownames_alignment != None:
            
            master_string = master_string + "\\multicolumn{1}{|" + rownames_alignment + "|}{" + "$" + str(rownames[i+1]) + "$} & "
        
        elif rownames_math_env == False and rownames_alignment != None:
            
            master_string = master_string + "\\multicolumn{1}{|" + rownames_alignment + "|}{" + str(rownames[i+1]) + "} & "
        
        elif rownames_math_env == True and rownames_alignment == None:
        
            master_string = master_string + "$" + str(rownames[i+1]) + "$ & "
        
        else:
            
            master_string = master_string + str(rownames[i+1]) + " & "
        
        # The below runs for the data
        
        for j in range(0,ncol):
            
            if decimal_places != None and data_math_env == True:
                
                # If executed this gives each datum the right number of decimal places. Note that it can also handle
                # non-numerical data.
                
                try:
                    
                    # This'll only run for numerical data
                    
                    add = "$" + str(round(data[i][j], decimal_places))
                
                    if "." in add:
                        
                        # Executes for non-integers
                        
                        split = add.split(".")
                    
                        after = len(split[1])
                    
                        if after < decimal_places:
                            
                            # If executed this adds some zeros to get a float up to the right number of decimal places
                            
                            add = add + "0"*(decimal_places - after) + "$"
                        
                        else:
                            
                            add = add + "$"
                        
                    else:
                        
                        # If executed this adds some decimal places to integers.
                        
                        add = add + "." + "0"*decimal_places + "$"
                
                    if j < (ncol - 1):
                        
                        # All but the rightmost data in the row.
                        
                        master_string = master_string + add + " & "
                
                    else:
                        
                        # The rightmost datum in the row.
                
                        master_string = master_string + add + " \\\\ \n"
            
                except:
                    
                    # Executes for non-numerical data
                    
                    add = "$" + str(data[i][j]) + "$"
            
                    if j < (ncol - 1):
                
                        master_string = master_string + add + " & "
                
                    else:
                
                        master_string = master_string + add + " \\\\ \n"
            
            elif decimal_places != None and data_math_env == False:
                
                # If executed this gives each datum the right number of decimal places. Note that it can also handle
                # non-numerical data.
                
                try:
                    
                    # This'll only run for numerical data
                    
                    add = str(round(data[i][j], decimal_places))
                
                    if "." in add:
                        
                        # Executes for non-integers
                
                        split = add.split(".")
                    
                        after = len(split[1])
                    
                        if after < decimal_places:
                            
                            # If executed this adds some zeros to get a float up to the right number of decimal places
                        
                            add = add + "0"*(decimal_places - after)
                        
                    else:
                        
                        # If executed this adds some decimal places to integers.
                    
                        add = add + "." + "0"*decimal_places
                
                    if j < (ncol - 1):
                        
                        # All but the rightmost data in the row.
                
                        master_string = master_string + add + " & "
                
                    else:
                        
                        # The rightmost datum in the row.
                
                        master_string = master_string + add + " \\\\ \n"
            
                except:
                    
                    # Executes for non-numerical data
                    
                    add = str(data[i][j])
            
                    if j < (ncol - 1):
                
                        master_string = master_string + add + " & "
                
                    else:
                
                        master_string = master_string + add + " \\\\ \n"
            
            elif decimal_places == None and data_math_env == True:
            
                add = "$" + str(data[i][j]) + "$"
                
                if j < (ncol - 1):
                    
                    # All but the rightmost data in the row.
                    
                    master_string = master_string + add + " & "
                
                else:
                    
                    # The rightmost datum in the row.
                
                    master_string = master_string + add + " \\\\ \n"
            
            else:
                
                add = str(data[i][j])
                
                if j < (ncol - 1):
                    
                    # All but the rightmost data in the row.
                
                    master_string = master_string + add + " & "
                
                else:
                    
                    # The rightmost datum in the row.
                
                    master_string = master_string + add + " \\\\ \n"
    
    ## TABLE END
    
    master_string = master_string + "\\hline \n\\end{tabular}"
    
    ## PRINT
    
    print(master_string)

# 2 - `latextablefrompandas`

This is nearly the same function as above; the key difference is that a Pandas dataframe `df` is passed in place of `data`, `rownames` and `colnames`.

Otherwise, the features are almost exactly the same as the function in section 1.

The only thing to note is that if `top_left_cell_empty = False` is specified, the name of the index in the dataframe (`df.index.name`) is used for the top left cell.
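
The way a dataframe maps onto the three arguments of section 1 can be sketched as follows. The index name `"State"` and the two-row dataframe are illustrative assumptions, and the results are converted to plain lists here purely so they print cleanly; the function below works with `df.columns` and `df.values` directly.

```python
import pandas as pd

df = pd.DataFrame(
    [["\\$47,735", 1.1], ["\\$76,220", 0.3]],
    index=["Alabama", "Alaska"],
    columns=["GDP per Capita", "Proportion of National GDP (\\%)"],
)
df.index.name = "State"  # illustrative name, assumed for this sketch

top_left_cell_empty = False

# The index name fills the top-left cell only when one is set and requested.
if top_left_cell_empty or df.index.name is None:
    rownames = [""] + list(df.index)
else:
    rownames = [df.index.name] + list(df.index)

colnames = list(df.columns)
data = df.values.tolist()

print(rownames)
print(colnames)
print(data)
```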

In [3]:
def latextablefrompandas(df, decimal_places = None, rownames_math_env = False, colnames_math_env = False, 
                         data_math_env = False, top_left_cell_empty = True, default_alignment = 'c', 
                         colnames_alignment = None, rownames_alignment = None, top_left_cell_alignment = None):
    
    ## GENERATING ROWNAMES, COLNAMES AND DATA
    
    if top_left_cell_empty == True or df.index.name == None:
        
        rownames = [""] + list(df.index)
    
    else:
        
        rownames = [df.index.name] + list(df.index)
    
    colnames = df.columns
    
    data = df.values
    
    ## CHECKING COLUMN LENGTHS
    
    if len(data) != (len(rownames) - 1):
        
        print("Length of columns of data does not match number of rows. Note that the top-left cell of the table is included in rownames, so rownames should have one more element than the number of elements in each column of data.")
        
        return
    
    ## CHECKING ROW LENGTHS
    
    rowlengths = set()
              
    for row in data:
              
        rowlengths.add(len(row))
        
    if rowlengths != set([len(colnames)]):
        
        # This will execute if rowlengths is not a one-element set containing the number of columns
        
        print("At least one row of data does not match number of columns.")
        
        return
    
    ## CHECKING ALIGNMENTS
    
    if colnames_alignment not in [None,'l','c','r']:
        
        print("colnames_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    if rownames_alignment not in [None,'l','c','r']:
        
        print("rownames_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    if top_left_cell_alignment not in [None,'l','c','r']:
        
        print("top_left_cell_alignment must be 'l', 'c', or 'r'.")
        
        return
    
    ## STARTING THE TABLE
    
    master_string = "\\begin{tabular}{|" + (default_alignment + "|")*(len(colnames) + 1) + "} \n"
              
    master_string = master_string + "\\hline \n"
    
    ## TOP LEFT CELL
    
    # Below, note that rownames_math_env incorporates the top left cell. I figure if these differ it won't take much effort to
    # add/remove dollar signs from the top left cell manually; specifying the math environment of the top left cell seemed
    # like an unnecessary complication as a feature.
    
    if rownames_math_env == True and top_left_cell_alignment != None:
    
        master_string = master_string + "\\multicolumn{1}{|" + top_left_cell_alignment + "|}{" + "$" + str(rownames[0]) + "$} & "
    
    elif rownames_math_env == False and top_left_cell_alignment != None:
        
        master_string = master_string + "\\multicolumn{1}{|" + top_left_cell_alignment + "|}{" + str(rownames[0]) + "} & "
    
    elif rownames_math_env == True and top_left_cell_alignment == None:
        
        master_string = master_string + "$" + str(rownames[0]) + "$ & "
    
    else:
        
        master_string = master_string + str(rownames[0]) + " & "
    
    ## COLUMN NAMES
    
    ncol = len(colnames)
    
    for i in range(0,ncol):
        
        if i < (ncol - 1):
            
            # This executes for all but the rightmost column.
            
            if colnames_math_env == True and colnames_alignment != None:
            
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + "$" + str(colnames[i]) + "$} & "
                
            elif colnames_math_env == False and colnames_alignment != None:
                
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + str(colnames[i]) + "} & "
            
            elif colnames_math_env == True and colnames_alignment == None:
                
                master_string = master_string + "$" + str(colnames[i]) + "$ & "
                
            else:
                
                master_string = master_string + str(colnames[i]) + " & "
            
        else:
            
            # This executes for the rightmost column, where a newline has to be started.
            
            if colnames_math_env == True and colnames_alignment != None:
            
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + "$" + str(colnames[i]) + "$} \\\\ \n"
                
            elif colnames_math_env == False and colnames_alignment != None:
                
                master_string = master_string + "\\multicolumn{1}{" + colnames_alignment + "|}{" + str(colnames[i]) + "} \\\\ \n"
            
            elif colnames_math_env == True and colnames_alignment == None:
                
                master_string = master_string + "$" + str(colnames[i]) + "$ \\\\ \n"
                
            else:
                
                master_string = master_string + str(colnames[i]) + " \\\\ \n"
    
    master_string = master_string + "\\hline \n"
    
    ## ROWNAMES AND DATA
    
    for i in range(0,len(data)):
        
        # This loop runs once for each row of data.
        
        # The below is for the row name
        
        if rownames_math_env == True and rownames_alignment != None:
            
            master_string = master_string + "\\multicolumn{1}{|" + rownames_alignment + "|}{" + "$" + str(rownames[i+1]) + "$} & "
        
        elif rownames_math_env == False and rownames_alignment != None:
            
            master_string = master_string + "\\multicolumn{1}{|" + rownames_alignment + "|}{" + str(rownames[i+1]) + "} & "
        
        elif rownames_math_env == True and rownames_alignment == None:
        
            master_string = master_string + "$" + str(rownames[i+1]) + "$ & "
        
        else:
            
            master_string = master_string + str(rownames[i+1]) + " & "
        
        # The below runs for the data
        
        for j in range(0,ncol):
            
            if decimal_places != None and data_math_env == True:
                
                # If executed this gives each datum the right number of decimal places. Note that it can also handle
                # non-numerical data.
                
                try:
                    
                    # This'll only run for numerical data
                    
                    add = "$" + str(round(data[i][j], decimal_places))
                
                    if "." in add:
                        
                        # Executes for non-integers
                        
                        split = add.split(".")
                    
                        after = len(split[1])
                    
                        if after < decimal_places:
                            
                            # If executed this adds some zeros to get a float up to the right number of decimal places
                            
                            add = add + "0"*(decimal_places - after) + "$"
                        
                        else:
                            
                            add = add + "$"
                        
                    else:
                        
                        # If executed this adds some decimal places to integers.
                        
                        add = add + "." + "0"*decimal_places + "$"
                
                    if j < (ncol - 1):
                        
                        # All but the rightmost data in the row.
                        
                        master_string = master_string + add + " & "
                
                    else:
                        
                        # The rightmost datum in the row.
                
                        master_string = master_string + add + " \\\\ \n"
            
                except:
                    
                    # Executes for non-numerical data
                    
                    add = "$" + str(data[i][j]) + "$"
            
                    if j < (ncol - 1):
                
                        master_string = master_string + add + " & "
                
                    else:
                
                        master_string = master_string + add + " \\\\ \n"
            
            elif decimal_places != None and data_math_env == False:
                
                # If executed this gives each datum the right number of decimal places. Note that it can also handle
                # non-numerical data.
                
                try:
                    
                    # This'll only run for numerical data
                    
                    add = str(round(data[i][j], decimal_places))
                
                    if "." in add:
                        
                        # Executes for non-integers
                
                        split = add.split(".")
                    
                        after = len(split[1])
                    
                        if after < decimal_places:
                            
                            # If executed this adds some zeros to get a float up to the right number of decimal places
                        
                            add = add + "0"*(decimal_places - after)
                        
                    else:
                        
                        # If executed this adds some decimal places to integers.
                    
                        add = add + "." + "0"*decimal_places
                
                    if j < (ncol - 1):
                        
                        # All but the rightmost data in the row.
                
                        master_string = master_string + add + " & "
                
                    else:
                        
                        # The rightmost datum in the row.
                
                        master_string = master_string + add + " \\\\ \n"
            
                except:
                    
                    # Executes for non-numerical data
                    
                    add = str(data[i][j])
            
                    if j < (ncol - 1):
                
                        master_string = master_string + add + " & "
                
                    else:
                
                        master_string = master_string + add + " \\\\ \n"
            
            elif decimal_places == None and data_math_env == True:
            
                add = "$" + str(data[i][j]) + "$"
                
                if j < (ncol - 1):
                    
                    # All but the rightmost data in the row.
                    
                    master_string = master_string + add + " & "
                
                else:
                    
                    # The rightmost datum in the row.
                
                    master_string = master_string + add + " \\\\ \n"
            
            else:
                
                add = str(data[i][j])
                
                if j < (ncol - 1):
                    
                    # All but the rightmost data in the row.
                
                    master_string = master_string + add + " & "
                
                else:
                    
                    # The rightmost datum in the row.
                
                    master_string = master_string + add + " \\\\ \n"
    
    ## TABLE END
    
    master_string = master_string + "\\hline \n\\end{tabular}"
    
    ## PRINT
    
    print(master_string)

# 3 - Examples

The rendered LaTeX output of these examples can be found in *table_examples.pdf* in this repository. 

### Example 3.1 - Table Sent to Me

The below is a replication of the table sent to me in the email motivating this repository. I artificially 'unrounded' the data to demonstrate the functionality of the `decimal_places` feature.

#### Data Generation

In [4]:
rows = ["x =", "mean(x)", "\\sigma_x / \\sigma_y", "corr(x,Y)"]

columns = ["Y","I","C","N","K","r"]

data = [[0.512354,0.423874,0.088458,0.336942,1.182005,0.042228],
        ["(2.099)",0.493421,4.018519,0.617801,0.54,0.421052],
       [1,0.87240,0.94878,0.926538,0.106055,0.678699]]

#### Use of Function

In [5]:
latextable(data, rows, columns, decimal_places = 3, rownames_math_env = True, colnames_math_env = True, 
           top_left_cell_empty = False, top_left_cell_alignment = 'r')

\begin{tabular}{|c|c|c|c|c|c|c|} 
\hline 
\multicolumn{1}{|r|}{$x =$} & $Y$ & $I$ & $C$ & $N$ & $K$ & $r$ \\ 
\hline 
$mean(x)$ & 0.512 & 0.424 & 0.088 & 0.337 & 1.182 & 0.042 \\ 
$\sigma_x / \sigma_y$ & (2.099) & 0.493 & 4.019 & 0.618 & 0.540 & 0.421 \\ 
$corr(x,Y)$ & 1.000 & 0.872 & 0.949 & 0.927 & 0.106 & 0.679 \\ 
\hline 
\end{tabular}


### Example 3.2

A multifaceted table.

#### Data Generation

In [11]:
rows = ["Alabama","Alaska","Arizona","Arkansas","California"]

columns = ["GDP per Capita", "Proportion of National GDP (\\%)", "Governor Affiliation"]

data = [["\\$47,735", 1.1, "Republican"],
        ["\\$76,220", 0.3, "Republican"],
        ["\\$51,179", 1.7, "Republican"],
        ["\\$44,808", 0.6, "Republican"],
        ["\\$80,563", 14.6, "Democrat"]]

df = pd.DataFrame(data, index = rows, columns = columns)

#### Use of Function

In [12]:
latextablefrompandas(df, rownames_alignment = 'r')

\begin{tabular}{|c|c|c|c|} 
\hline 
 & GDP per Capita & Proportion of National GDP (\%) & Governor Affiliation \\ 
\hline 
\multicolumn{1}{|r|}{Alabama} & \$47,735 & 1.1 & Republican \\ 
\multicolumn{1}{|r|}{Alaska} & \$76,220 & 0.3 & Republican \\ 
\multicolumn{1}{|r|}{Arizona} & \$51,179 & 1.7 & Republican \\ 
\multicolumn{1}{|r|}{Arkansas} & \$44,808 & 0.6 & Republican \\ 
\multicolumn{1}{|r|}{California} & \$80,563 & 14.6 & Democrat \\ 
\hline 
\end{tabular}
