Add the statistics of hypothesis testing #135

260147169 · 2022-10-07T16:27:21Z

Add an option 'test_stat' to display the test statistics from hypothesis testing (default: False). The statistics are already computed internally, so this option only controls whether they are displayed.

jraffa · 2022-10-11T18:42:51Z

Thanks for the idea and the PR. A couple of suggestions:

Using the README.md example:


import pandas as pd
from tableone import TableOne, load_dataset

# Load the sample dataset used in the README
data = load_dataset('pn2012')
columns = ['Age', 'SysABP', 'Height', 'Weight', 'ICU', 'death']
categorical = ['ICU', 'death']
groupby = ['death']
nonnormal = ['Age']
labels = {'death': 'mortality'}
# test_stat=True is the option added by this PR
mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=True, test_stat=True)

Works, but the table could use a little cleanup:

                         Grouped by mortality
                                      Missing           Overall                 0                 1 Test-stat P-Value
n                                                          1000               864               136
Age, median [Q1,Q3]                         0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)                         291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)                         475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)                         302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU                     0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                             202 (20.2)        194 (22.5)           8 (5.9)    20.093
                    MICU                             380 (38.0)        318 (36.8)         62 (45.6)    20.093
                    SICU                             256 (25.6)        215 (24.9)         41 (30.1)    20.093
mortality, n (%)    0                       0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                                136 (13.6)                         136 (100.0)   991.508

The Test-stat column is redundant: the statistic is repeated on every level row of a categorical variable. It should appear only once per variable, as the P-Value does.
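
One way to remove the repetition is to keep the statistic on the first row of each variable and blank the rest. The sketch below works on a small hand-built frame, not on tableone's internals; the index names `variable` and `level` and the helper `blank_repeats` are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature of the rendered table: one row per (variable, level).
index = pd.MultiIndex.from_tuples(
    [("Age", ""), ("ICU", "CCU"), ("ICU", "CSRU"), ("ICU", "MICU"), ("ICU", "SICU")],
    names=["variable", "level"],
)
table = pd.DataFrame(
    {
        "Test-stat": ["23.882", "20.093", "20.093", "20.093", "20.093"],
        "P-Value": ["<0.001", "<0.001", "", "", ""],
    },
    index=index,
)


def blank_repeats(df, column):
    """Keep `column` only on the first row of each variable; blank the others."""
    out = df.copy()
    # cumcount() numbers the rows within each variable, starting at 0
    first_row = out.groupby(level="variable").cumcount() == 0
    out[column] = out[column].where(first_row, "")
    return out


cleaned = blank_repeats(table, "Test-stat")
print(cleaned)
```

The input frame is left untouched, so the same helper could be applied to any other column that should show once per variable.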

Changing pval to False breaks it:

 mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 424, in __init__
    self.cat_table = self._create_cat_table(data, overall)
  File "/home/jraffa/temp/tableone/tableone/tableone.py", line 1348, in _create_cat_table
    table = table.join(self._htest_table[['Test-stat']])
AttributeError: 'TableOne' object has no attribute '_htest_table'
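
The traceback suggests that `_htest_table` is only created when pval=True, while the join runs whenever test_stat=True. A minimal sketch of one possible guard follows. `MiniTable` is a toy stand-in with placeholder values, not the real TableOne class, and only the attribute name `_htest_table` is taken from the traceback.

```python
import pandas as pd


class MiniTable:
    """Toy stand-in for TableOne; method names and values are illustrative."""

    def __init__(self, data, pval=False, test_stat=False):
        self._pval = pval
        self._test_stat = test_stat
        # Build the hypothesis-test table when either column is requested,
        # so the later join never touches a missing attribute.
        self._htest_table = None
        if pval or test_stat:
            self._htest_table = self._create_htest_table(data)
        self.table = self._create_table(data)

    def _create_htest_table(self, data):
        # Placeholder numbers; a real implementation would run the tests here.
        return pd.DataFrame(
            {"Test-stat": [1.0] * data.shape[1], "P-Value": [0.5] * data.shape[1]},
            index=data.columns,
        )

    def _create_table(self, data):
        table = pd.DataFrame({"n": data.count()})
        wanted = []
        if self._test_stat:
            wanted.append("Test-stat")
        if self._pval:
            wanted.append("P-Value")
        if wanted:
            table = table.join(self._htest_table[wanted])
        return table


demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
stat_only = MiniTable(demo, pval=False, test_stat=True)
print(stat_only.table)
```

With this structure, pval and test_stat select columns independently, and neither flag depends on the other being set.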

I also don't think the Fisher test is handled appropriately. There really isn't a test statistic for it, so the cell should be blank, but I believe it reports the chi-squared test statistic alongside the Fisher p-value:

td = pd.DataFrame({'a':[0,0,0,1]*10 + [1],'b':[1,1,1,1]*10 + [0]})
TableOne(td,columns=['a','b'],categorical=['a','b'],pval=True,groupby="b",test_stat=True)

           Grouped by b
                Missing    Overall          0           1 Test-stat P-Value
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)     0.280   0.268
         0               30 (73.2)              30 (75.0)     0.280
b, n (%) 0            0    1 (2.4)  1 (100.0)                 9.744   0.024
         1               40 (97.6)             40 (100.0)     9.744
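
To keep the statistic and the p-value from the same test, both can be returned from one place. The sketch below assumes SciPy is used for the tests; the helper `categorical_test` and its `use_fisher` flag are hypothetical, and how tableone actually decides between the two tests is not shown here.

```python
import numpy as np
import pandas as pd
from scipy import stats


def categorical_test(observed, use_fisher):
    """Return (test name, statistic, p-value) for a contingency table."""
    if use_fisher:
        # fisher_exact returns an odds ratio and a p-value; the odds ratio
        # is not a test statistic, so NaN is reported in its place.
        _, p = stats.fisher_exact(observed)
        return "Fisher's exact", np.nan, p
    chi2, p, _, _ = stats.chi2_contingency(observed)
    return "Chi-squared", chi2, p


td = pd.DataFrame({"a": [0, 0, 0, 1] * 10 + [1], "b": [1, 1, 1, 1] * 10 + [0]})
observed = pd.crosstab(td["a"], td["b"]).to_numpy()

name, stat, p = categorical_test(observed, use_fisher=True)
print(name, stat, round(p, 3))
```

For the `a` row, the Fisher p-value is 11/41, about 0.268, which matches the P-Value column above, while the statistic is NaN instead of a chi-squared value.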

I think the t-test, ANOVA, MW, and KW all have test statistics. @tompollard, are there any other tests we should worry about? I don't think the mode test is reported like this, so it should be safe.
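
As a quick check, the SciPy versions of these four tests all return a result with both a `statistic` and a `pvalue` attribute. The group values below are invented for the demonstration, and whether tableone calls these exact functions is an assumption.

```python
from scipy import stats

# Made-up samples for three groups, for demonstration only.
g0 = [66.0, 52.8, 78.0, 61.5, 70.2]
g1 = [75.0, 62.0, 83.0, 79.4, 68.1]
g2 = [58.3, 49.9, 64.7, 71.0, 55.6]

results = {
    "t-test": stats.ttest_ind(g0, g1),
    "ANOVA": stats.f_oneway(g0, g1, g2),
    "Mann-Whitney": stats.mannwhitneyu(g0, g1),
    "Kruskal-Wallis": stats.kruskal(g0, g1, g2),
}

for test_name, res in results.items():
    print(f"{test_name}: statistic={res.statistic:.3f}, p={res.pvalue:.3f}")
```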

260147169 · 2022-10-22T12:55:40Z

Thanks so much to collaborator @jraffa and owner @tompollard for the help and suggestions.
The update contains the following:

1. The redundancy has been cleaned up. The test statistic now appears once per variable, as the p-value does.
The code is the same as above. The results are as follows:

                         Missing           Overall                 0                 1 Test-stat P-Value
n                                             1000               864               136
Age, median [Q1,Q3]            0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882  <0.001
SysABP, mean (SD)            291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510   0.134
Height, mean (SD)            475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030   0.304
Weight, mean (SD)            302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277   0.782
ICU, n (%)          CCU        0        162 (16.2)        137 (15.9)         25 (18.4)    20.093  <0.001
                    CSRU                202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0          0        864 (86.4)       864 (100.0)                     991.508  <0.001
                    1                   136 (13.6)                         136 (100.0)

2. With pval=False, it no longer breaks.

mytable = TableOne(data, columns=columns, categorical=categorical, groupby=groupby, nonnormal=nonnormal, rename=labels, pval=False,test_stat=True)

                         Missing           Overall                 0                 1 Test-stat
n                                             1000               864               136
Age, median [Q1,Q3]            0  68.0 [53.0,79.0]  66.0 [52.8,78.0]  75.0 [62.0,83.0]    23.882
SysABP, mean (SD)            291      114.3 (40.2)      115.4 (38.3)      107.6 (49.4)     1.510
Height, mean (SD)            475      170.1 (22.1)      170.3 (23.2)      168.5 (11.3)     1.030
Weight, mean (SD)            302       82.9 (23.8)       83.0 (23.6)       82.3 (25.4)     0.277
ICU, n (%)          CCU        0        162 (16.2)        137 (15.9)         25 (18.4)    20.093
                    CSRU                202 (20.2)        194 (22.5)           8 (5.9)
                    MICU                380 (38.0)        318 (36.8)         62 (45.6)
                    SICU                256 (25.6)        215 (24.9)         41 (30.1)
mortality, n (%)    0          0        864 (86.4)       864 (100.0)                     991.508
                    1                   136 (13.6)                         136 (100.0)

3. Fisher's exact test does not compute a test statistic, so its Test-stat value is set to None (displayed as nan), and a warning message notifies the user.

                Missing    Overall          0           1 Test-stat
n                               41          1          40
a, n (%) 1            0  11 (26.8)  1 (100.0)   10 (25.0)       nan
         0               30 (73.2)              30 (75.0)
b, n (%) 0            0    1 (2.4)  1 (100.0)                   nan
         1               40 (97.6)             40 (100.0)

[1] Fisher's test did not caompute statistics of hypothesis testing. The following variables are affected: a, b.
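
A footnote of this kind can be assembled from a record of which test was run for each variable. The sketch below is pure Python; the function `fisher_footnote`, the test labels, and the footnote wording are all illustrative and are not taken from the PR's code.

```python
def fisher_footnote(tests_by_variable):
    """Build a footnote listing variables tested with Fisher's exact test.

    `tests_by_variable` maps a variable name to the name of the test used.
    Returns None when no variable used Fisher's exact test.
    """
    affected = [
        variable
        for variable, test in tests_by_variable.items()
        if test == "Fisher's exact"
    ]
    if not affected:
        return None
    return (
        "Fisher's exact test does not produce a test statistic. "
        "The following variables are affected: " + ", ".join(affected) + "."
    )


note = fisher_footnote({"a": "Fisher's exact", "b": "Fisher's exact"})
print(note)
```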

Add the statistics of hypothesis testing

cad10c5

tompollard mentioned this pull request Oct 7, 2022

Statistics of hypothesis testing in tableone #136

Open

Update test_stat

849f56f
