# Demonstrating the TableOne package

In research papers, it is common for the first table ("Table 1") to display summary statistics of the study data.  

The `tableone` package can be used to create this table.

## Installation

The distribution is hosted on PyPI and directly installable via pip without needing to clone or download this repository. 

To install the package from PyPI, run the following command in your terminal:

``pip install tableone``
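To confirm that the installation worked, you can query the installed version from Python. This is a minimal sketch that uses only the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is our own and is not part of tableone.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("tableone")
    if found is None:
        print("tableone is not installed; run: pip install tableone")
    else:
        print(f"tableone {found} is installed")
```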

## Import libraries

Before using the package, we import its `TableOne` class. We also import pandas, a popular library for working with tabular data.

```python
# import libraries
from tableone import TableOne
import pandas as pd
```

## Load sample data

We begin by loading the data that we would like to summarize into a pandas DataFrame, in which:
- variables are in columns
- encounters/observations are in rows.

```python
# load sample data into a pandas DataFrame
url = "https://raw.githubusercontent.com/tompollard/data/master/primary-biliary-cirrhosis/pbc.csv"
data = pd.read_csv(url, index_col='id')
```

```python
# view the first five rows
data.head()
```

| id | time | status | trt | age | sex | ascites | hepato | spiders | edema | bili | chol | albumin | copper | alk.phos | ast | trig | platelet | protime | stage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 400 | 2 | 1.0 | 58.765229 | f | 1.0 | 1.0 | 1.0 | 1.0 | 14.5 | 261.0 | 2.6 | 156.0 | 1718.0 | 137.95 | 172.0 | 190.0 | 12.2 | 4.0 |
| 2 | 4500 | 0 | 1.0 | 56.44627 | f | 0.0 | 1.0 | 1.0 | 0.0 | 1.1 | 302.0 | 4.14 | 54.0 | 7394.8 | 113.52 | 88.0 | 221.0 | 10.6 | 3.0 |
| 3 | 1012 | 2 | 1.0 | 70.072553 | m | 0.0 | 0.0 | 0.0 | 0.5 | 1.4 | 176.0 | 3.48 | 210.0 | 516.0 | 96.1 | 55.0 | 151.0 | 12.0 | 4.0 |
| 4 | 1925 | 2 | 1.0 | 54.740589 | f | 0.0 | 1.0 | 1.0 | 0.5 | 1.8 | 244.0 | 2.54 | 64.0 | 6121.8 | 60.63 | 92.0 | 183.0 | 10.3 | 4.0 |
| 5 | 1504 | 1 | 2.0 | 38.105407 | f | 0.0 | 1.0 | 1.0 | 0.0 | 3.4 | 279.0 | 3.53 | 143.0 | 671.0 | 113.15 | 72.0 | 136.0 | 10.9 | 3.0 |
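The rows-as-observations, columns-as-variables layout does not depend on pandas. The sketch below uses only the standard library `csv` module on a few rows and columns copied from the sample data shown above. Keying each row by its `id` loosely mirrors `index_col='id'` in `pd.read_csv`, but this is only an illustration of the layout, not a replacement for pandas (for example, `csv` leaves every value as a string).

```python
import csv
import io

# A few rows and columns of the PBC sample data, copied from the data.head() output.
SAMPLE_CSV = """id,time,status,age,sex
1,400,2,58.765229,f
2,4500,0,56.44627,f
3,1012,2,70.072553,m
"""


def load_rows(text: str, index_col: str) -> dict:
    """Parse CSV text into {index value: {variable: value}}, one entry per observation."""
    rows = {}
    for record in csv.DictReader(io.StringIO(text)):
        key = record.pop(index_col)  # the index column identifies the observation
        rows[key] = dict(record)     # the remaining keys are the variables (columns)
    return rows


rows = load_rows(SAMPLE_CSV, index_col="id")
print(rows["3"])  # {'time': '1012', 'status': '2', 'age': '70.072553', 'sex': 'm'}
```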


## Example 1: Simple summary of data with Table 1

In this example we provide summary statistics across all of the data.

```python
# view the TableOne docstring
# (in Jupyter/IPython, `TableOne??` also shows the source code)
help(TableOne)
```

```python
# create an instance of TableOne with the input arguments
# firstly, with no grouping variable
overall_table = TableOne(data)
```

```python
# view first 10 rows of tableone
overall_table.tableone.head(10)
```

| variable | level | isnull | overall |
|---|---|---|---|
| n | | | 418 |
| age | | 0.0 | 50.74 (10.45) |
| albumin | | 0.0 | 3.5 (0.42) |
| alk.phos | | 106.0 | 1982.66 (2140.39) |
| ascites | 0.0 | 106.0 | 288 (92.31) |
| ascites | 1.0 | | 24 (7.69) |
| ast | | 106.0 | 122.56 (56.7) |
| bili | | 0.0 | 3.22 (4.41) |
| chol | | 134.0 | 369.51 (231.94) |
| copper | | 108.0 | 97.65 (85.61) |


**Summary of the table**:
- the first row ('`n`') displays a count of the encounters/observations in the input data.
- the '`isnull`' column displays the number of missing (null) values for each variable.
- categorical variables that are not specified in the arguments are detected automatically (here, '`ascites`' is treated as categorical).
- continuous variables (e.g. '`age`') are summarized by '`mean (std)`'.
- categorical variables (e.g. '`ascites`') are summarized by '`n (% of non-null values)`'.
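To make these cell formats concrete, here is a minimal sketch of how the `isnull`, `mean (std)` and `n (%)` entries could be computed with the standard library. It assumes missing values are represented as `None` or NaN, uses the sample standard deviation, and rounds with `round(..., 2)`; tableone's own implementation may differ in details. The helper names are hypothetical, and the `ascites` list is rebuilt from the counts shown in the table above.

```python
import math
import statistics
from collections import Counter
from typing import Dict, Optional, Sequence


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def count_missing(values: Sequence) -> int:
    """Number of missing values (the 'isnull' column)."""
    return sum(1 for v in values if is_missing(v))


def summarize_continuous(values: Sequence[Optional[float]]) -> str:
    """'mean (std)' over non-missing values, using the sample standard deviation."""
    observed = [v for v in values if not is_missing(v)]
    mean = statistics.mean(observed)
    std = statistics.stdev(observed)  # n - 1 denominator
    return f"{round(mean, 2)} ({round(std, 2)})"


def summarize_categorical(values: Sequence) -> Dict[object, str]:
    """'n (% of non-null values)' for each level; missing values are excluded."""
    observed = [v for v in values if not is_missing(v)]
    counts = Counter(observed)
    total = len(observed)
    return {level: f"{n} ({round(100 * n / total, 2)})"
            for level, n in sorted(counts.items())}


# Rebuilt from the 'ascites' rows above: 288 zeros, 24 ones, 106 missing.
ascites = [0.0] * 288 + [1.0] * 24 + [None] * 106
print(count_missing(ascites))          # 106
print(summarize_categorical(ascites))  # {0.0: '288 (92.31)', 1.0: '24 (7.69)'}
print(summarize_continuous([1.0, 2.0, 3.0, None]))  # 2.0 (1.0)
```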

## Example 2: Table 1 without stratification

In this example we provide summary statistics across all of the data, specifying columns, categorical variables, and non-normal variables.

```python
# list of columns to be included in tableone
columns = ['time', 'age', 'bili', 'chol', 'albumin', 'copper',
           'alk.phos', 'ast', 'trig', 'platelet', 'protime',
           'status', 'ascites', 'hepato', 'spiders', 'edema',
           'stage', 'sex', 'trt']

# list of columns containing categorical variables
categorical = ['status', 'ascites', 'hepato', 'spiders', 'edema',
               'stage', 'sex']

# optionally, a list of non-normal variables
nonnormal = ['bili']
```
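A typo in one of these lists is easy to miss, so it can help to check them against each other before building the table. The function below is a hypothetical helper (not part of tableone) that reports names in `categorical` or `nonnormal` that are absent from `columns`, and names listed as both categorical and non-normal, since the non-normal option is meant for continuous variables.

```python
from typing import List, Sequence


def check_variable_lists(columns: Sequence[str],
                         categorical: Sequence[str],
                         nonnormal: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the lists look consistent."""
    problems = []
    known = set(columns)
    for label, names in (("categorical", categorical), ("nonnormal", nonnormal)):
        unknown = [name for name in names if name not in known]
        if unknown:
            problems.append(f"{label} names not in columns: {unknown}")
    both = sorted(set(categorical) & set(nonnormal))
    if both:
        problems.append(f"listed as both categorical and nonnormal: {both}")
    return problems


columns = ['time', 'age', 'bili', 'chol', 'albumin', 'copper',
           'alk.phos', 'ast', 'trig', 'platelet', 'protime',
           'status', 'ascites', 'hepato', 'spiders', 'edema',
           'stage', 'sex', 'trt']
categorical = ['status', 'ascites', 'hepato', 'spiders', 'edema',
               'stage', 'sex']
nonnormal = ['bili']

print(check_variable_lists(columns, categorical, nonnormal))  # []
print(check_variable_lists(columns, categorical + ['stge'], nonnormal))
# ["categorical names not in columns: ['stge']"]
```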

```python
# create an instance of TableOne with the input arguments
# firstly, with no grouping variable
overall_table = TableOne(data, columns=columns, categorical=categorical,
                         nonnormal=nonnormal)
```

```python
# view first 10 rows of tableone
overall_table.tableone.head(10)
```

| variable | level | isnull | overall |
|---|---|---|---|
| n | | | 418 |
| age | | 0.0 | 50.74 (10.45) |
| albumin | | 0.0 | 3.5 (0.42) |
| alk.phos | | 106.0 | 1982.66 (2140.39) |
| ascites | 0.0 | 106.0 | 288 (92.31) |
| ascites | 1.0 | | 24 (7.69) |
| ast | | 106.0 | 122.56 (56.7) |
| bili | | 0.0 | 1.4 [0.8, 3.4] |
| chol | | 134.0 | 369.51 (231.94) |
| copper | | 108.0 | 97.65 (85.61) |


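**Summary of the table**:
- as before, except that the variables are explicitly defined in the input arguments.
- continuous variables listed in `nonnormal` (here, '`bili`') are now summarized by '`median [IQR]`' rather than '`mean (std)`', where the values in brackets are the 25th and 75th percentiles.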
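The `median [IQR]` cell can be sketched in the same spirit. This version assumes linear interpolation between data points for the quartiles (`method="inclusive"` in `statistics.quantiles`, which matches pandas' default `quantile` interpolation); tableone may compute or round quartiles differently, so treat it as an illustration rather than a reproduction of the `bili` values above. The helper name is hypothetical, and the usage uses only the five `bili` values shown by `data.head()`.

```python
import math
import statistics
from typing import Optional, Sequence


def summarize_nonnormal(values: Sequence[Optional[float]]) -> str:
    """'median [Q1, Q3]' over non-missing values (needs at least two observations)."""
    observed = sorted(v for v in values if v is not None and not math.isnan(v))
    # With n=4, quantiles() returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return f"{round(median, 2)} [{round(q1, 2)}, {round(q3, 2)}]"


print(summarize_nonnormal([1.0, 2.0, 3.0, 4.0, None]))  # 2.5 [1.75, 3.25]

# bili for the five rows shown by data.head(): one large value pulls the mean up,
# while the median and quartiles are barely affected.
bili_head = [14.5, 1.1, 1.4, 1.8, 3.4]
print(round(statistics.mean(bili_head), 2))  # 4.44
print(summarize_nonnormal(bili_head))        # 1.8 [1.4, 3.4]
```

This skew is the usual reason to list a variable as non-normal: a handful of extreme values can move the mean far from the bulk of the data.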

## Example 3: Table 1 with stratification

In this example, we stratify the summary by a categorical variable, '`trt`'.

```python
# optionally, a categorical variable for stratification
groupby = 'trt'
```

```python
# create an instance of TableOne with the input arguments
# keyword arguments make the call independent of the parameter order
grouped_table = TableOne(data, columns=columns, categorical=categorical,
                         groupby=groupby, nonnormal=nonnormal)
```

```python
# view first 10 rows of tableone
grouped_table.tableone.head(10)
```

| variable | level | isnull | trt=1.0 | trt=2.0 |
|---|---|---|---|---|
| n | | | 158 | 154 |
| age | | 0.0 | 51.42 (11.01) | 48.58 (9.96) |
| albumin | | 0.0 | 3.52 (0.44) | 3.52 (0.4) |
| alk.phos | | 106.0 | 2021.3 (2183.44) | 1943.01 (2101.69) |
| ascites | 0.0 | 106.0 | 144 (91.14) | 144 (93.51) |
| ascites | 1.0 | | 14 (8.86) | 10 (6.49) |
| ast | | 106.0 | 120.21 (54.52) | 124.97 (58.93) |
| bili | | 0.0 | 1.4 [0.8, 3.2] | 1.3 [0.72, 3.6] |
| chol | | 134.0 | 365.01 (209.54) | 373.88 (252.48) |
| copper | | 108.0 | 97.64 (90.59) | 97.65 (80.49) |


**Summary of the table**:
- the data are now summarized separately for each level of the `groupby` variable (here, '`trt=1.0`' and '`trt=2.0`').
- as before, the summary statistics are '`mean (std)`', '`median [IQR]`', or '`n (% of non-null values)`'.
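Conceptually, stratification splits the rows by the value of the `groupby` variable and summarizes each subset separately. The sketch below (hypothetical helper name and toy values, standard library only) assumes that rows with a missing grouping value are left out of every stratum. That is consistent with the grouped output above, where the group sizes (158 + 154 = 312) are smaller than the overall n of 418, but it is an inference rather than documented tableone behavior.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def stratify(rows: Iterable[Mapping], groupby: str) -> Dict[object, List[Mapping]]:
    """Split rows by their `groupby` value, skipping rows where it is missing (None)."""
    groups: Dict[object, List[Mapping]] = defaultdict(list)
    for row in rows:
        key = row.get(groupby)
        if key is None:
            continue  # no group to put this observation in
        groups[key].append(row)
    return dict(sorted(groups.items()))


# Toy rows (hypothetical values), one dict per observation.
rows = [
    {"trt": 1.0, "age": 50.0},
    {"trt": 1.0, "age": 60.0},
    {"trt": 2.0, "age": 40.0},
    {"trt": None, "age": 70.0},
]

for level, members in stratify(rows, "trt").items():
    ages = [r["age"] for r in members]
    print(f"trt={level}: n={len(members)}, mean age={statistics.mean(ages)}")
# trt=1.0: n=2, mean age=55.0
# trt=2.0: n=1, mean age=40.0
```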

## Compute p values

To test whether each variable differs between the groups, set the ``pval`` argument to ``True``.

```python
# create grouped_table with p values
grouped_table = TableOne(data, columns=columns, categorical=categorical,
                         groupby=groupby, nonnormal=nonnormal, pval=True)
```

```python
# view first 10 rows of tableone
grouped_table.tableone.head(10)
```

| variable | level | isnull | trt=1.0 | trt=2.0 | pval | ptest |
|---|---|---|---|---|---|---|
| n | | | 158 | 154 | | |
| age | | 0.0 | 51.42 (11.01) | 48.58 (9.96) | 0.018 | One_way_ANOVA |
| albumin | | 0.0 | 3.52 (0.44) | 3.52 (0.4) | 0.874 | One_way_ANOVA |
| alk.phos | | 106.0 | 2021.3 (2183.44) | 1943.01 (2101.69) | 0.747 | One_way_ANOVA |
| ascites | 0.0 | 106.0 | 144 (91.14) | 144 (93.51) | 0.567 | Chi-squared |
| ascites | 1.0 | | 14 (8.86) | 10 (6.49) | | |
| ast | | 106.0 | 120.21 (54.52) | 124.97 (58.93) | 0.46 | One_way_ANOVA |
| bili | | 0.0 | 1.4 [0.8, 3.2] | 1.3 [0.72, 3.6] | 0.842 | Kruskal-Wallis |
| chol | | 134.0 | 365.01 (209.54) | 373.88 (252.48) | 0.748 | One_way_ANOVA |
| copper | | 108.0 | 97.64 (90.59) | 97.65 (80.49) | 0.999 | One_way_ANOVA |


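**Summary of the table**:
- the '`ptest`' column displays the name of the test used to compare the groups: here, one-way ANOVA for continuous variables not listed in `nonnormal`, Kruskal-Wallis for non-normal variables ('`bili`'), and chi-squared for categorical variables.
- the '`pval`' column displays the p value from the test named in the '`ptest`' column, rounded to 3 decimal places.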
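As a cross-check on the `Chi-squared` row, the sketch below computes a Pearson chi-squared test for a 2×2 table using only the standard library. It assumes Yates' continuity correction is applied, as `scipy.stats.chi2_contingency` does by default when there is one degree of freedom; under that assumption, the `ascites` counts from the grouped table give a p value that rounds to the 0.567 shown above. The function name is hypothetical, and the closed-form p value via `math.erfc` holds only for one degree of freedom.

```python
import math
from typing import Sequence, Tuple


def chi2_2x2_yates(table: Sequence[Sequence[int]]) -> Tuple[float, float]:
    """Pearson chi-squared test with Yates' correction for a 2x2 table of counts.

    Returns (statistic, p_value). With 1 degree of freedom, the chi-squared
    survival function equals erfc(sqrt(x / 2)).
    """
    (a, b), (c, d) = table
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = a + b + c + d
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            # Yates' correction: shrink |O - E| by 0.5, but never below zero.
            diff = max(abs(observed - expected) - 0.5, 0.0)
            statistic += diff ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# ascites (rows: 0.0, 1.0) by trt (columns: 1.0, 2.0), from the grouped table above.
ascites_by_trt = [[144, 144],
                  [14, 10]]
stat, p = chi2_2x2_yates(ascites_by_trt)
print(round(p, 3))  # 0.567
```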

## Export the table to file (LaTeX, Markdown, CSV, etc.)

Tables can be exported to file in various formats, including:
- LaTeX
- Markdown
- CSV
- HTML

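Because `grouped_table.tableone` is a pandas DataFrame, it can be exported with the corresponding pandas method: `to_latex()`, `to_markdown()`, `to_csv()`, or `to_html()`. Note that `to_markdown()` requires the `tabulate` package.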

```python
# save table to LaTeX
fn = 'tableone.tex'
grouped_table.tableone.to_latex(fn)
```

```python
# save table to HTML
fn2 = 'tableone.html'
grouped_table.tableone.to_html(fn2)
```