# Estimating the total biomass of terrestrial protists
After searching the literature, we could not find a comprehensive account of the biomass of protists in soils. We generated a crude estimate of the total biomass of protists in soil based on five methodologies. The first two methodologies are based on direct counts of protists in soils, whereas the last three are based on molecular techniques. We detail below the calculation of the global protist biomass using each method. Our best estimate for the total biomass of soil protists is the geometric mean of the estimates from the five methodologies.

## Direct biomass density measurements
Our first method for estimating the total biomass of protists relies on measurements of the average biomass density of protists in soils. We collected data from several studies, which we list below:

In [50]:
import pandas as pd
import numpy as np
from scipy.stats import gmean
import sys
sys.path.insert(0,'../../statistics_helper/')
from fraction_helper import *
from CI_helper import *
pd.options.display.float_format = '{:,.1f}'.format
data = pd.read_excel('terrestrial_protist_data.xlsx')
data

Unnamed: 0,Reference,DOI,Method,Group,Habitat,Site,Biomass density [g C m^-2],Remarks
0,Schröter et al.,http://dx.doi.org/10.1034/j.1600-0579.2003.120...,Direct counts,Testate amoebae,Coniferous forest,North Sweden,0.2,Taken from table 3
1,Schröter et al.,http://dx.doi.org/10.1034/j.1600-0579.2003.120...,Direct counts,Testate amoebae,Coniferous forest,South Sweden,0.6,Taken from table 3
2,Schröter et al.,http://dx.doi.org/10.1034/j.1600-0579.2003.120...,Direct counts,Testate amoebae,Coniferous forest,Germany,1.0,Taken from table 3
3,Schröter et al.,http://dx.doi.org/10.1034/j.1600-0579.2003.120...,Direct counts,Testate amoebae,Coniferous forest,France,0.6,Taken from table 3
4,Zwart et al.,http://dx.doi.org/10.1016/0167-8809(94)90043-4,Direct counts,Ameboa and Flagellates,Cropland,Netherlands,1.2,"Top 25 cm, Taken from table 1"
5,De Ruiter et al.,http://dx.doi.org/10.2307/2404274,Direct counts,Ameboa and Flagellates,Cropland,Netherlands,0.6,"Top 25 cm (85% in top 10 cm), Taken from Table 1"
6,Schaefer,http://dx.doi.org/10.1007/BF00318544,Direct counts,"Flagellates, Ameboa, Testate amoebae",Beech forest,Germany,0.8,Taken from table 1 assuming 50% carbon content
7,Stapleton et al.,http://dx.doi.org/10.1016/j.soilbio.2005.03.016,Direct counts,Heterotrophic flagellates and Testate amoeba,Tundra,Svalbard,8.3,Values extracted from Figure 2 in the control ...
8,Stapleton et al.,http://dx.doi.org/10.1016/j.soilbio.2005.03.016,Direct counts,Heterotrophic flagellates and Testate amoeba,Tundra,Svalbard,8.5,Values extracted from Figure 2 in the control ...
9,Stapleton et al.,http://dx.doi.org/10.1016/j.soilbio.2005.03.016,Direct counts,Heterotrophic flagellates and Testate amoeba,Tundra,Svalbard,2.5,Values extracted from Figure 2 in the control ...


To generate our best estimate based on this method, we first calculate the geometric mean of values for each study. We then calculate the geometric mean for each habitat, and then calculate the geometric mean of the average values from different habitats:

In [63]:
# Define the function to calculate the geometric mean for each study
def groupby_gmean(input):
    mean = gmean(input['Biomass density [g C m^-2]'])
    habitat = np.unique(input['Habitat'])[0]
    return pd.Series({'Habitat': habitat, 'Biomass density [g C m^-2]': mean})

# Calculate the geometric mean for each study
study_mean = data.groupby('Reference').apply(groupby_gmean)

# Calculate the geometric mean of the biomass density at each habitat
habitat_mean = data.groupby('Habitat')['Biomass density [g C m^-2]'].apply(gmean)

# Calculate the geometric mean of biomass densities from different habitats
direct_biomass_mean = gmean(habitat_mean)

print('Our best estimate for the biomass density of protists in soil based on direct biomass density measurements is ≈%.1f g C m^-2' % direct_biomass_mean)

Our best estimate for the biomass density of protists in soil based on direct biomass density measurements is ≈1.2 g C m^-2


To generate our estimate for the total biomass of protists using the direct biomass density measurement method, we multiply our best estimate for the biomass density by the total area of ice-free land surface, which is ≈$1.3×10^{14}$ m$^2$:

In [3]:
ice_free_area = 1.3e14

# Calculate the total biomass of soil protists
method1_estimate = direct_biomass_mean*ice_free_area 

print('Our best estimate for the biomass of soil protists using direct biomass density measurements is ≈%.1f Gt C' % (method1_estimate/1e15))

Our best estimate for the biomass of soil protists using direct biomass density measurements is ≈0.2 Gt C
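The nested averaging and the scaling to the global land area can be checked by hand. The sketch below is only an illustration: it hard-codes the rounded biomass densities from the table above, and uses the standard-library `statistics.geometric_mean` as a stand-in for `scipy.stats.gmean` so that it runs on its own. Because the inputs are rounded, the result may differ slightly from the notebook output.

```python
from statistics import geometric_mean

# Rounded biomass densities [g C m^-2] from the table above, grouped by habitat
densities_by_habitat = {
    "Coniferous forest": [0.2, 0.6, 1.0, 0.6],
    "Cropland": [1.2, 0.6],
    "Beech forest": [0.8],
    "Tundra": [8.3, 8.5, 2.5],
}

# Geometric mean within each habitat, then across habitats
habitat_means = {h: geometric_mean(v) for h, v in densities_by_habitat.items()}
density = geometric_mean(list(habitat_means.values()))

# Scale by the ice-free land area [m^2] and convert grams to Gt (1 Gt = 1e15 g)
ice_free_area_m2 = 1.3e14
total_gt = density * ice_free_area_m2 / 1e15

print("Biomass density ≈ %.1f g C m^-2" % density)
print("Total biomass ≈ %.2f Gt C" % total_gt)
```

With these rounded inputs the density comes out close to 1.2 g C m^-2, in line with the value reported above.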


## Number of individuals and carbon content
In this method, to calculate the total biomass of soil protists, we calculate a characteristic number of individual protists in a gram of soil for each of the morphological groups of protists (flagellates, ciliates, and naked and testate amoebae). We combine these estimates with estimates for the carbon content of each morphological group.

### Characteristic carbon content of protists
We estimate the characteristic carbon content of a single protist from each of the morphological groups of protists based on data from three sources.

The first source is [Finlay & Fenchel](http://dx.doi.org/10.1078/1434-4610-00060). We calculate the average cell length for each group. 

For flagellates, the estimates of the number of individuals per gram of soil distinguish between small and large flagellates (defined as flagellates below or above 15 µm in diameter). We thus estimate the average length of small and large flagellates separately, by dividing the data into these two size categories.

In [4]:
# Load data from Finlay & Fenchel
ff_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Finlay & Fenchel', skiprows=1)

# Define the function to calculate the weighted average for each group of protists
def weighted_av_groupby(input):
    return np.average(input['Length [µm]'],weights=input['Abundance [# g^-1]'])

cell_lengths = ff_data.groupby('Protist type').apply(weighted_av_groupby)

We convert the cell length to biovolume according to the allometric relation described in Figure 10 of Finlay & Fenchel. The relation between cell volume and cell length is given by the equation:
$$V = 0.6×L^{2.36}$$
Where V is the cell volume in $µm^3$ and L is the cell length in µm.

In [5]:
cell_volumes = 0.6*cell_lengths**2.36
cell_volumes

Protist type
All protozoa         593.2
Ciliate            5,404.3
Large Flagellate   1,085.1
Naked amoebae      1,355.9
Small Flagellate      87.5
Testate amoebae    3,634.9
dtype: float64

We convert cell volumes to carbon content assuming ≈150 fg C µm$^{-3}$:

In [37]:
ff_carbon_content = cell_volumes*150e-15
pd.options.display.float_format = '{:,.1e}'.format
ff_carbon_content

Protist type
All protozoa       8.9e-11
Ciliate            8.1e-10
Large Flagellate   1.6e-10
Naked amoebae      2.0e-10
Small Flagellate   1.3e-11
Testate amoebae    5.5e-10
dtype: float64
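As a worked example of the two conversions above (length to volume, then volume to carbon), the sketch below applies $V = 0.6×L^{2.36}$ and the ≈150 fg C µm$^{-3}$ factor to a single cell. The function names are hypothetical and chosen only for illustration; the 10 µm cell is an arbitrary example, not a value from the data.

```python
FG_TO_G = 1e-15  # femtograms to grams

def cell_volume_um3(length_um):
    """Cell volume [µm^3] from cell length [µm], using V = 0.6 * L**2.36."""
    return 0.6 * length_um ** 2.36

def cell_carbon_g(volume_um3, carbon_fg_per_um3=150.0):
    """Carbon content [g C] of a cell, assuming ≈150 fg C per µm^3."""
    return volume_um3 * carbon_fg_per_um3 * FG_TO_G

def cell_length_um(volume_um3):
    """Inverse of the allometric relation: cell length [µm] from volume [µm^3]."""
    return (volume_um3 / 0.6) ** (1 / 2.36)

# An arbitrary 10 µm cell
example_volume = cell_volume_um3(10.0)
example_carbon = cell_carbon_g(example_volume)
print("Volume ≈ %.0f µm^3, carbon ≈ %.1e g C" % (example_volume, example_carbon))

# The ciliate volume reported above (≈5,404 µm^3) converted to carbon
ciliate_carbon = cell_carbon_g(5404.3)
print("Ciliate carbon ≈ %.1e g C" % ciliate_carbon)
```

The second call reproduces the ≈8.1e-10 g C listed for ciliates, which is a quick way to confirm the unit handling.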

Our second source for estimating the carbon content of soil protists is [Persson et al.](http://www.jstor.org/stable/20112829), which reports the dry weight of individuals from different morphological types:

In [7]:
persson_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Persson', skiprows=1)
persson_data

Unnamed: 0,Morphological type,Mean body dry weight [g]
0,Cilliates,1.5e-09
1,Flagellates,4e-10
2,Rhizopoda,8e-10


Our third source is [Schaefer](http://dx.doi.org/10.1007/BF00318544), which reports the total number of cells and the total biomass for three morphological groups of protists - Flagellates, Amoebae and Testate amoebae. We calculate the characteristic carbon content for each group by dividing the total biomass by the total number of individuals:

In [41]:
# Load the data from Schaefer
schaefer_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Schaefer', skiprows=1,index_col='Group')

# Calculate the characteristic carbon content for each of the groups of protists
schaefer_cc = schaefer_data['Biomass density [g C m^-2]']/schaefer_data['Number of individuals (# m^-2) ']
schaefer_cc

Group
Flagellates       1.0e-11
Naked amoebae     1.6e-10
Testate amoebae   2.0e-09
dtype: float64

Our best estimate for the carbon content of each morphological group is the geometric mean of the estimates from Finlay & Fenchel, Persson et al. and Schaefer. Persson et al. report dry weights, which we divide by two to obtain carbon content (i.e. assuming a carbon content of ≈50% of dry weight). Persson et al. report values for Rhizopoda, which include naked amoebae, and do not report values for testate amoebae. Schaefer reports values for naked and testate amoebae but not for ciliates. For flagellates, we use only the data from Finlay & Fenchel, as it distinguishes between small and large flagellates.

In [44]:
carbon_content = pd.DataFrame()

ciliate_cc = gmean([ff_carbon_content['Ciliate'],persson_data.loc[0]['Mean body dry weight [g]']/2])
small_flagellate_cc = ff_carbon_content['Small Flagellate']
large_flagellate_cc = ff_carbon_content['Large Flagellate']
naked_amoebae_cc = gmean([ff_carbon_content['Naked amoebae'],persson_data.loc[2]['Mean body dry weight [g]']/2,schaefer_cc.loc['Naked amoebae']])
testate_amoebae_cc = gmean([ff_carbon_content['Testate amoebae'],schaefer_cc.loc['Testate amoebae']])

carbon_content['Carbon content [g C]'] = pd.Series([ciliate_cc,large_flagellate_cc,naked_amoebae_cc,small_flagellate_cc,testate_amoebae_cc])
carbon_content.set_index(ff_carbon_content.index[1:],inplace=True)
carbon_content

Protist type,Carbon content [g C]
Ciliate,7.8e-10
Large Flagellate,1.6e-10
Naked amoebae,2.4e-10
Small Flagellate,1.3e-11
Testate amoebae,1.1e-09


### Number of individuals
We rely on two main sources for our estimate. The first is [Adl & Coleman](http://dx.doi.org/10.1007/s00374-005-0009-x). The second source is [Finlay & Fenchel](http://dx.doi.org/10.1078/1434-4610-00060). For Adl & Coleman, we calculate the geometric mean of the measurements for each protist group. For Finlay & Fenchel, we sum the reported abundances within each protist group:

In [45]:
ac_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Adl & Coleman', skiprows=1)

def groupby_mean(input):
    return gmean(input.dropna())
ac_mean = ac_data[['Small flagellates','Large flagellates','Gymnamoebae', 'Ciliates']].apply(groupby_mean)

ff_mean = ff_data.groupby('Protist type')['Abundance [# g^-1]'].apply(sum)

As our best estimate we use the geometric mean of values from Adl & Coleman and from Finlay & Fenchel. For flagellates, we use only the values from Adl & Coleman, as Finlay & Fenchel rely on the Most Probable Number method to measure the number of flagellates. This method is based on culturing of protists, which is known to underestimate the actual number of protists. Adl & Coleman do not report a value for testate amoebae, and thus we rely on the values from Finlay & Fenchel.

In [46]:
abund_mean = pd.DataFrame()
ciliate_abun = gmean([ac_mean['Ciliates'], ff_mean['Ciliate']])
naked_amoebae_abun = gmean([ac_mean['Gymnamoebae'],ff_mean['Naked amoebae']])
abund_mean['Abundance [# g^-1]'] = pd.Series([ciliate_abun,ac_mean['Large flagellates'],naked_amoebae_abun,ac_mean['Small flagellates'], ff_mean['Testate amoebae']])
abund_mean.set_index(ff_mean.index[1:],inplace=True)
abund_mean

Protist type,Abundance [# g^-1]
Ciliate,740.0
Large Flagellate,23000.0
Naked amoebae,42000.0
Small Flagellate,990000.0
Testate amoebae,10000.0


To calculate the total biomass of protists per gram of soil, we multiply the total number of individuals of each group of protists by their respective carbon content, and sum over all protist groups:

In [47]:
tot_biomass_density = (carbon_content['Carbon content [g C]']*abund_mean['Abundance [# g^-1]']).sum()
print('Our best estimate for the biomass of protists per gram of soil is ≈%.1e g C' % tot_biomass_density)

Our best estimate for the biomass of protists per gram of soil is ≈3.8e-05 g C


To convert the biomass density per gram of soil to units of biomass per area, we use a soil bulk density of ≈1.5 g cm$^{-3}$. We assume that most biomass is concentrated in the top 20 cm of soil (see the section on terrestrial protists in the Supplementary Information for details).

In [48]:
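bulk_density = 1.5e6  # soil bulk density [g m^-3], i.e. ≈1.5 g cm^-3
biomass_depth = 0.2   # depth of the soil layer holding most of the biomass [m]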
biomass_per_m2 = tot_biomass_density*bulk_density*biomass_depth
print('Our best estimate for the biomass of protists per m^2 of soil is ≈%.0f g C' % biomass_per_m2)
carbon_content['Carbon content [g C]']*abund_mean['Abundance [# g^-1]']*bulk_density*biomass_depth/biomass_per_m2

Our best estimate for the biomass of protists per m^2 of soil is ≈11 g C


Protist type
Ciliate            1.5e-02
Large Flagellate   9.9e-02
Naked amoebae      2.6e-01
Small Flagellate   3.4e-01
Testate amoebae    2.9e-01
dtype: float64
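The per-gram and per-area numbers above can be retraced from the two tables. The following sketch hard-codes the rounded carbon contents and abundances shown earlier (so small differences from the notebook output are expected) and repeats the unit conversion: g C per g soil × g soil per m$^3$ × m of depth gives g C per m$^2$.

```python
# Rounded per-cell carbon content [g C] and abundance [# per g soil] from the tables above
carbon_content_g = {
    "Ciliate": 7.8e-10,
    "Large Flagellate": 1.6e-10,
    "Naked amoebae": 2.4e-10,
    "Small Flagellate": 1.3e-11,
    "Testate amoebae": 1.1e-9,
}
abundance_per_g = {
    "Ciliate": 740.0,
    "Large Flagellate": 23000.0,
    "Naked amoebae": 42000.0,
    "Small Flagellate": 990000.0,
    "Testate amoebae": 10000.0,
}

# Biomass per gram of soil for each group, and the total over groups
biomass_per_g = {g: carbon_content_g[g] * abundance_per_g[g] for g in carbon_content_g}
total_per_g = sum(biomass_per_g.values())

# Convert to biomass per unit area: bulk density [g m^-3] times depth [m]
bulk_density_g_per_m3 = 1.5e6
depth_m = 0.2
biomass_per_m2 = total_per_g * bulk_density_g_per_m3 * depth_m

# Fractional contribution of each group to the total
shares = {g: b / total_per_g for g, b in biomass_per_g.items()}

print("Total ≈ %.1e g C per g soil, ≈ %.0f g C m^-2" % (total_per_g, biomass_per_m2))
for group, share in shares.items():
    print("%-17s %.2f" % (group, share))
```

The shares show that small flagellates, naked amoebae and testate amoebae together account for most of the estimate, while ciliates contribute only a few percent.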

To calculate the total biomass of protists based on measurements of number of individuals and characteristic carbon contents per individual, we multiply the biomass density per unit area by the total ice-free land surface, which is ≈$1.3×10^{14}$ m$^2$:

In [13]:
method2_estimate = biomass_per_m2*ice_free_area
print('Our best estimate for the biomass of soil protists using measurements of number of individuals and carbon content is ≈%.1f Gt C' % (method2_estimate/1e15))

Our best estimate for the biomass of soil protists using measurements of number of individuals and carbon content is ≈1.4 Gt C


The next three methods for estimating the total biomass of protists are based on molecular surveys of the abundance of protists in soils. The methods we use are 18S rDNA sequencing, 18S rRNA sequencing and metatranscriptomics.

The molecular techniques we rely on measure the relative fraction of protists out of the total population of soil eukaryotes. Estimating biomass based on molecular techniques assumes a correlation between the number of reads of a specific taxon and its biomass. Even though this procedure is not well established, we rely on it as one of our sources due to the scarcity of data.

To generate our estimate of the total biomass of soil protists using these molecular techniques, we multiply the fraction of protists out of the total biomass of soil eukaryotes by our estimate for the total biomass of soil fungi, which we assume dominate the biomass of soil eukaryotes.

## 18S rDNA sequencing
To estimate the total biomass of soil protists from 18S rDNA sequencing data, we calculate the fraction of protists out of the total population of soil eukaryotes based on data from forests ([Tedersoo et al.](http://dx.doi.org/10.1038/ismej.2015.116)), grasslands and croplands ([Chen et al.](http://dx.doi.org/10.3389/fmicb.2015.01149)). Below is a sample of the data:


In [14]:
# Load the data from Chen et al.
chen_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Chen',skiprows=1)
chen_data

Unnamed: 0,Site,Fungi,Protists,Habitat
0,G-0,0.7,0.19,Grassland
1,G-7,0.53,0.29,Grassland
2,G-30,0.61,0.17,Grassland
3,A-0,0.6,0.27,Cropland
4,A-7,0.64,0.22,Cropland
5,A-30,0.69,0.14,Cropland
6,G-F-0,0.58,0.29,Grassland
7,A-F-0,0.61,0.24,Cropland


We first calculate the geometric mean of the values in Chen et al. within each habitat, and then the geometric mean across habitats:

In [15]:
chen_mean = frac_mean(chen_data.groupby('Habitat')['Protists'].apply(frac_mean))

As our best estimate for the fraction of protists out of the population of soil eukaryotes we use the geometric mean of the value from Chen et al. and the value reported in Tedersoo et al. of ≈6%. We calculate the total biomass of soil protists by multiplying the fraction of protists out of the total population of soil eukaryotes by our estimate of the total biomass of soil fungi:

In [16]:
# The fraction of protists out of the population of soil eukaryotes reported in Tedersoo et al.
tedersoo_frac = 0.06

# Calculate our best estimate for the fraction of soil protists
rDNA_frac = frac_mean(np.array([chen_mean,tedersoo_frac]))

# Our best estimate for the biomass of soil fungi
fungi_biomass = 12e15

# Calculate the total biomass of soil protists based on 18S rDNA sequencing data
method3_estimate = rDNA_frac*fungi_biomass

print('Our best estimate for the biomass of soil protists based on 18S rDNA sequencing data is ≈%.1f Gt C' %(method3_estimate/1e15))

Our best estimate for the biomass of soil protists based on 18S rDNA sequencing data is ≈1.4 Gt C
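The arithmetic of this method can be sketched in a few lines. Note the assumptions: the implementation of the `frac_mean` helper is not shown in this notebook, so the sketch uses a plain geometric mean (`statistics.geometric_mean`) as a stand-in, and it hard-codes the Chen et al. fractions from the table above. It is meant as a plausibility check, not as a reproduction of the helper.

```python
from statistics import geometric_mean

# Fraction of protists among soil eukaryotes in Chen et al., by habitat (table above)
chen_fractions = {
    "Grassland": [0.19, 0.29, 0.17, 0.29],
    "Cropland": [0.27, 0.22, 0.14, 0.24],
}

# Stand-in for frac_mean: geometric mean within each habitat, then across habitats
chen_mean = geometric_mean([geometric_mean(v) for v in chen_fractions.values()])

# Combine with the ≈6% reported by Tedersoo et al.
tedersoo_frac = 0.06
rdna_frac = geometric_mean([chen_mean, tedersoo_frac])

# Multiply by the estimated biomass of soil fungi [g C] and convert to Gt C
fungi_biomass_g = 12e15
protist_biomass_gt = rdna_frac * fungi_biomass_g / 1e15

print("Fraction of protists ≈ %.2f" % rdna_frac)
print("Biomass of soil protists ≈ %.1f Gt C" % protist_biomass_gt)
```

Under these assumptions the fraction comes out at roughly 11%, which gives a value close to the ≈1.4 Gt C reported above.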


## 18S rRNA sequencing
To estimate the total biomass of soil protists from 18S rRNA sequencing data, we calculate the fraction of protists out of the total population of soil eukaryotes based on data from beech and spruce forests ([Damon et al.](http://dx.doi.org/10.1371/journal.pone.0028967)). Below is a sample of the data:

In [17]:
# Load the data from Damon et al.
damon_data = pd.read_excel('terrestrial_protist_data.xlsx', 'Damon', skiprows=1)

# Use the data based on 18S rRNA sequencing
rRNA_data = damon_data[damon_data['Method'] == '18S rRNA']
rRNA_data

Unnamed: 0,Sample,Fraction of protists,Method
0,Beech 1A,0.12,18S rRNA
3,Beech 1B,0.12,18S rRNA
6,Spruce 1A,0.12,18S rRNA
9,Spruce 1B,0.12,18S rRNA


We calculate the geometric mean of the values from Damon et al. as our best estimate for the fraction of protists out of the total population of soil eukaryotes. We calculate the total biomass of soil protists by multiplying this fraction by our estimate of the total biomass of soil fungi:

In [18]:
# Calculate the geometric mean of the values from Damon et al.
rRNA_frac = frac_mean(rRNA_data['Fraction of protists'])

# Calculate the total biomass of soil protists based on 18S rRNA sequencing data
method4_estimate = rRNA_frac*fungi_biomass

print('Our best estimate for the biomass of soil protists based on 18S rRNA sequencing data is ≈%.1f Gt C' %(method4_estimate/1e15))

Our best estimate for the biomass of soil protists based on 18S rRNA sequencing data is ≈1.5 Gt C


## Metatranscriptomics
To estimate the total biomass of soil protists from metatranscriptomics data, we calculate the fraction of protists out of the total population of soil eukaryotes based on data from beech and spruce forests ([Damon et al.](http://dx.doi.org/10.1371/journal.pone.0028967)). Below is a sample of the data:

In [23]:
# Use the data based on metatranscriptomics
meta_trans_data = damon_data[damon_data['Method'] == 'Metatranscriptomics']
meta_trans_data

Unnamed: 0,Sample,Fraction of protists,Method
1,Beech 2A,0.036,Metatranscriptomics
2,Beech 3A,0.038,Metatranscriptomics
4,Beech 2B,0.052,Metatranscriptomics
5,Beech 3B,0.036,Metatranscriptomics
7,Spruce 2A,0.029,Metatranscriptomics
8,Spruce 3A,0.026,Metatranscriptomics
10,Spruce 2B,0.043,Metatranscriptomics
11,Spruce 3B,0.034,Metatranscriptomics


We calculate the geometric mean of the values from Damon et al. as our best estimate for the fraction of protists out of the total population of soil eukaryotes. We calculate the total biomass of soil protists by multiplying this fraction by our estimate of the total biomass of soil fungi:

In [24]:
# Calculate the geometric mean of the values from Damon et al.
meta_trans_frac = frac_mean(meta_trans_data['Fraction of protists'])

# Calculate the total biomass of soil protists based on metatranscriptomics data
method5_estimate = meta_trans_frac*fungi_biomass

print('Our best estimate for the biomass of soil protists based on metatranscriptomics data is ≈%.1f Gt C' %(method5_estimate/1e15))

Our best estimate for the biomass of soil protists based on metatranscriptomics data is ≈0.4 Gt C


As our best estimate for the biomass of soil protists, we use the geometric mean of the five estimates we generated from the five different methodologies:

In [26]:
# Calculate the geometric mean of the five different estimates we generated
best_estimate = gmean([method1_estimate,method2_estimate,method3_estimate,method4_estimate,method5_estimate])

print('Our best estimate for the biomass of terrestrial protists is ≈%.1f Gt C' %(best_estimate/1e15))

Our best estimate for the biomass of terrestrial protists is ≈0.7 Gt C
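To see how the five estimates combine, the sketch below takes the rounded values printed above (in Gt C) and computes their geometric mean with the standard-library `statistics.geometric_mean`. Since the inputs are rounded to one decimal, the result is only expected to agree with the notebook output to about that precision.

```python
from statistics import geometric_mean

# Rounded estimates from the five methods above [Gt C]
method_estimates_gt = {
    "Direct biomass density": 0.2,
    "Individuals and carbon content": 1.4,
    "18S rDNA sequencing": 1.4,
    "18S rRNA sequencing": 1.5,
    "Metatranscriptomics": 0.4,
}

best_estimate_gt = geometric_mean(list(method_estimates_gt.values()))

# The geometric mean is less sensitive to the largest values than the arithmetic mean
arithmetic_mean_gt = sum(method_estimates_gt.values()) / len(method_estimates_gt)

print("Geometric mean ≈ %.1f Gt C" % best_estimate_gt)
print("Arithmetic mean ≈ %.1f Gt C" % arithmetic_mean_gt)
```

The geometric mean sits below the arithmetic mean because the estimates span almost an order of magnitude, which is the reason a multiplicative average is used throughout.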


# Uncertainty analysis
To assess the uncertainty associated with our estimate of the total biomass of terrestrial protists, we collect available uncertainties for the values reported within studies, between studies using the same method, and between methods. We use the highest uncertainty out of this collection of uncertainties as our best projection for the uncertainty associated with the estimate of the total biomass of terrestrial protists.

## Intra-study uncertainty
For each study that reports more than one value, we calculate the 95% confidence interval of the geometric mean of those values.
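The uncertainties in this section are reported as fold changes (multiplicative factors). The implementation of the `geo_CI_calc` helper is not shown in this notebook, so the sketch below is only one common way to obtain such a factor, not necessarily what the helper does: take the standard error of the log10-transformed values, multiply by 1.96, and transform back. The function name `mul_ci_of_geomean` and the use of the sample standard deviation are assumptions made for illustration.

```python
import math
from statistics import stdev

def mul_ci_of_geomean(values, z=1.96):
    """Multiplicative (fold) 95% confidence interval of a geometric mean.

    Hypothetical illustration: uses the sample standard deviation of the
    log10-transformed values and a normal approximation (z = 1.96).
    """
    logs = [math.log10(v) for v in values]
    if len(logs) < 2:
        return float("nan")  # no spread can be estimated from a single value
    standard_error = stdev(logs) / math.sqrt(len(logs))
    return 10 ** (z * standard_error)

# Values spanning two orders of magnitude give a wide interval
wide = mul_ci_of_geomean([1.0, 10.0, 100.0])
# Identical values give a factor of exactly 1 (no uncertainty)
tight = mul_ci_of_geomean([0.12, 0.12, 0.12])

print("Fold uncertainty (wide): %.1f" % wide)
print("Fold uncertainty (tight): %.1f" % tight)
```

A factor of, say, 2 is read as "the geometric mean is known to within a factor of two in either direction", which is how the fold values in the outputs of this section should be interpreted.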

### Direct biomass measurement

In [53]:
# Calculate the 95% confidence interval of the geometric mean for each study
biomass_study_CI = data.groupby('Reference')['Biomass density [g C m^-2]'].apply(geo_CI_calc)
biomass_study_CI

Reference
Bouwman & Zwart    nan
De Ruiter et al.   nan
Schaefer           nan
Schröter et al.    1.8
Stapleton et al.   1.7
Zwart et al.       nan
Name: Biomass density [g C m^-2], dtype: float64

### Carbon content and number of individuals
We calculate the intra-study 95% confidence interval around the estimate of the total number of protists per gram of soil from Adl & Coleman:

In [73]:
ac_CI = ac_data[['Small flagellates','Large flagellates','Gymnamoebae', 'Ciliates']].apply(geo_CI_calc)
ac_CI

Small flagellates   4.4
Large flagellates   2.8
Gymnamoebae         1.8
Ciliates            2.3
dtype: float64

### 18S rDNA sequencing
We calculate the 95% confidence interval for the geometric mean of the values from Chen et al.:

In [56]:
print('The intra-study uncertainty of the value from Chen et al. is ≈%.1f-fold' %frac_CI(chen_data['Protists']))

The intra-study uncertainty of the value from Chen et al. is ≈1.2-fold


### 18S rRNA sequencing

In [60]:
print('The intra-study uncertainty associated with the fraction of protists based on 18S rRNA sequencing data of Damon et al. is ≈%.2f-fold' %frac_CI(rRNA_data['Fraction of protists']))

The intra-study uncertainty associated with the fraction of protists based on 18S rRNA sequencing data of Damon et al. is ≈1.01-fold


### Metatranscriptomics

In [62]:
print('The intra-study uncertainty associated with the fraction of protists based on metatranscriptomics data of Damon et al. is ≈%.1f-fold' %frac_CI(meta_trans_data['Fraction of protists']))

The intra-study uncertainty associated with the fraction of protists based on metatranscriptomics data of Damon et al. is ≈1.2-fold


## Intra-method uncertainty
For each method that relies on more than one study, we calculate the 95% confidence interval of the geometric mean of the values from the different studies. The methods that are based on more than one study are the direct biomass measurement-based method, the method based on carbon content and number of individuals, and the 18S rDNA sequencing-based method.

### Direct biomass measurement
To calculate our best estimate for the biomass of terrestrial protists based on direct biomass density measurements, we first calculated the geometric mean of values from the same habitat, generating characteristic values for each habitat. We then calculated the geometric mean of the characteristic values for each habitat.

As a measure of the interstudy uncertainty associated with the estimate based on direct biomass density measurements, we first calculate the 95% confidence interval of the characteristic values for each habitat, and then calculate the 95% confidence interval around the geometric mean of the characteristic values from each habitat.

#### Uncertainty within habitats

In [152]:
biomass_intra_habitat_CI = data.groupby('Habitat')['Biomass density [g C m^-2]'].apply(geo_CI_calc)
print('The interstudy uncertainty for studies within the same habitat:')
biomass_intra_habitat_CI

The interstudy uncertainty for studies within the same habitat:


Habitat
Beech forest        nan
Coniferous forest   1.8
Cropland            1.8
Tundra              1.7
Name: Biomass density [g C m^-2], dtype: float64

#### Uncertainty between habitats

In [70]:
biomass_inter_habitat_CI = geo_CI_calc(habitat_mean)
print('The 95 percent confidence interval of the geometric mean of the characteristic biomass densities from each habitat is ≈%.1f-fold' %biomass_inter_habitat_CI)

The 95 percent confidence interval of the geometric mean of the characteristic biomass densities from each habitat is ≈2.8-fold


### Carbon content and number of individuals
As a measure of the interstudy uncertainty associated with the estimate of the biomass of terrestrial protists based on the characteristic carbon content of soil protists and the number of individuals per gram of soil, we first calculate the interstudy uncertainty for the characteristic carbon content of each type of protist:

#### Carbon content of groups of protists
For each group of protists, we calculate the 95% confidence interval around our estimate of the characteristic carbon content of single protists from that group. For flagellates, we rely only on a single source, and thus for the estimate of the carbon content of flagellates we are not able to project an uncertainty.

In [74]:
# Calculate the interstudy 95% confidence interval around the estimate of the carbon content of each group
ciliate_cc_CI = geo_CI_calc([ff_carbon_content['Ciliate'],persson_data.loc[0]['Mean body dry weight [g]']/2])
naked_amoebae_cc_CI = geo_CI_calc([ff_carbon_content['Naked amoebae'],persson_data.loc[2]['Mean body dry weight [g]']/2,schaefer_cc.loc['Naked amoebae']])
testate_amoebae_cc_CI = geo_CI_calc([ff_carbon_content['Testate amoebae'],schaefer_cc.loc['Testate amoebae']])

Our best projection for the uncertainty associated with the estimate of the carbon content of a single protist is ≈3.6-fold


Next, we calculate the interstudy uncertainty associated with the estimate of the total number of individual protists per gram of soil:

#### Number of individuals of groups of protists
For each group of protists, we calculate the 95% confidence interval around our estimate of the number of individuals from that group per gram of soil. For flagellates and testate amoebae, we rely only on a single source, and thus we are not able to project an uncertainty.

In [77]:
# Calculate the interstudy 95% confidence interval around the estimate of the number of individuals
# per gram of soil for each group
ciliate_abun_CI = geo_CI_calc([ac_mean['Ciliates'], ff_mean['Ciliate']])
naked_amoebae_abund_CI = geo_CI_calc([ac_mean['Gymnamoebae'],ff_mean['Naked amoebae']])

We propagate the uncertainties associated with the carbon content and number of individuals per gram of soil for each group into our final estimate of the biomass of soil protists. In cases where we could not calculate the uncertainty associated with an estimate, we use the mean of the uncertainties from the other groups.

In [151]:
# Calculate the average uncertainty associated with the estimate of the carbon content 
# and number of individuals per gram of soil
average_cc_CI = np.mean([ciliate_cc_CI,naked_amoebae_cc_CI, testate_amoebae_cc_CI])
average_abund_CI = np.mean([ciliate_abun_CI,naked_amoebae_abund_CI])

# Propagate the uncertainty in the carbon content and number of individuals for each group
# For cases where no uncertainty projection is available, use the average uncertainty
# calculated above
ciliate_CI = CI_prod_prop(np.array([ciliate_cc_CI, ciliate_abun_CI]))
naked_amoebae_CI = CI_prod_prop(np.array([naked_amoebae_cc_CI,naked_amoebae_abund_CI]))
flagellate_CI = CI_prod_prop(np.array([average_cc_CI,average_abund_CI]))
testate_amoebae_CI = CI_prod_prop(np.array([testate_amoebae_cc_CI,average_abund_CI]))

# Propagate the uncertainties for each group to the total estimate of the biomass of soil protists
method2_inter_CI = CI_sum_prop(estimates= (carbon_content['Carbon content [g C]']*abund_mean['Abundance [# g^-1]']), 
                               mul_CIs=np.array([ciliate_CI,flagellate_CI,naked_amoebae_CI,flagellate_CI,testate_amoebae_CI]))
print('Our best projection for the interstudy uncertainty associated with the estimate of the total biomass of soil protists based on estimates of carbon content and number of individuals is ≈%.0f-fold' % method2_inter_CI)

Our best projection for the interstudy uncertainty associated with the estimate of the total biomass of soil protists based on estimates of carbon content and number of individuals is ≈9-fold
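The `CI_prod_prop` and `CI_sum_prop` helpers come from `CI_helper`, and their implementation is not shown in this notebook. As a rough illustration of what propagating fold uncertainties through a product can look like, the sketch below combines multiplicative uncertainties in quadrature in log space. This is a standard approach for independent multiplicative errors, but it is an assumption here and may differ from what the helpers do; the function name `combine_fold_cis` is hypothetical.

```python
import math

def combine_fold_cis(fold_cis):
    """Combine fold (multiplicative) uncertainties of independent factors in a product.

    Hypothetical illustration: log10 of each fold uncertainty is treated as an
    additive error, and the errors are added in quadrature.
    """
    log_errors = [math.log10(ci) for ci in fold_cis]
    combined_log_error = math.sqrt(sum(e ** 2 for e in log_errors))
    return 10 ** combined_log_error

# Two factors, each known to within 2-fold, give a product known to within ≈2.7-fold
product_ci = combine_fold_cis([2.0, 2.0])
# A single factor is returned unchanged
single_ci = combine_fold_cis([3.6])

print("Combined fold uncertainty: %.2f" % product_ci)
print("Single-factor fold uncertainty: %.1f" % single_ci)
```

The combined value is smaller than the naive product of the two factors (4-fold), because independent errors partly cancel.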


### 18S rDNA sequencing
Our estimate of the biomass of soil protists based on 18S rDNA sequencing relies on data from two studies (Tedersoo et al. and Chen et al.). We calculate the 95% confidence interval around the geometric mean of values from the two studies as our best projection of the interstudy uncertainty associated with the estimate of the total biomass of terrestrial protists based on 18S rDNA sequencing.

In [153]:
# Calculate the 95% confidence interval around the estimate for the fraction of soil protists
rDNA_frac_CI = frac_CI(np.array([chen_mean,tedersoo_frac]))

print('Our best projection for the interstudy uncertainty associated with the estimate of the total biomass of soil protists based on 18S rDNA sequencing is ≈%.0f-fold' % rDNA_frac_CI)

Our best projection for the interstudy uncertainty associated with the estimate of the total biomass of soil protists based on 18S rDNA sequencing is ≈3-fold


## Inter-method uncertainty
As our best estimate of the total biomass of soil protists we use the geometric mean of the estimates from the five independent methods. As a measure of the uncertainty associated with the geometric mean of estimates from different methods, we calculate the 95% confidence interval around the geometric mean of the estimates.

Because we are less confident in our estimates based on molecular techniques, we first calculate the geometric mean of the estimates based on the three molecular techniques, and then calculate the 95% confidence interval of the geometric mean of the estimates from the first two methods and the mean of the estimates based on molecular techniques:

In [155]:
# Calculate the geometric mean of estimates based on molecular techniques
mol_estimate = gmean([method3_estimate,method4_estimate,method5_estimate])

# Calculate the 95% confidence interval around the geometric mean of values from 
# the two estimates based on direct measurements and the mean value from molecular
# techniques
inter_method_CI = geo_CI_calc(np.array([method1_estimate,method2_estimate,mol_estimate]))

print('Our best projection for the inter-method uncertainty associated with the estimate of the total biomass of soil protists is ≈%.0f-fold' % inter_method_CI)

Our best projection for the inter-method uncertainty associated with the estimate of the total biomass of soil protists is ≈4-fold


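As our best projection for the uncertainty associated with the estimate of the total biomass of terrestrial protists, we use the highest uncertainty out of the collection of uncertainties calculated above:

In [None]:
# Assumed completion of this cell, following the rule stated at the start of the
# uncertainty analysis: take the highest of the uncertainties stored above
mul_CI = np.max([biomass_study_CI.max(), ac_CI.max(), biomass_intra_habitat_CI.max(),
                 biomass_inter_habitat_CI, method2_inter_CI, rDNA_frac_CI, inter_method_CI])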