<h1>Analysis of the Pdi1p substrate proteome in yeast</h1>
<p>The aim of this analysis is to estimate the number of disulfide bonds processed by the oxidative protein folding (OPF) pathway in yeast under standard growth conditions (YPD or SC, 30&deg;C). This potentially includes disulfide bonds in all proteins passing through the ER, i.e. proteins with final destinations in the ER, Golgi, endosome, vacuole, cell wall, or extracellular space.</p>
<h2>Identification of relevant gene products</h2>
<p>The main data source used to identify this set of proteins is Wiederholt <i>et al.</i> (2010), who review and summarise mass spectrometry-based subcellular localisation experiments from 18 studies published before 2010. Several later studies were used to complete this set.</p>
<table>
    <tr>
        <th>Location</th>
        <th>Reference</th>
        <th>Dataset</th>
        <th>File</th>
    </tr>
    <tr>
        <td>All</td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2849710/" target=_blank>Wiederholt <i>et al.</i> 2010</a></td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2849710/bin/supp_R900002-MCP200_ST2.xls" target=_blank>Supplemental Table S2</a></td>
        <td>Dataset_Wiederholt_Supp_2.csv</td>
    </tr>   
    <tr>
        <td>Tubular ER</td>
        <td><a href="https://elifesciences.org/articles/23816" target=_blank>Wang <i>et al.</i> 2017</a></td>
        <td><a href="https://elifesciences.org/download/aHR0cHM6Ly9jZG4uZWxpZmVzY2llbmNlcy5vcmcvYXJ0aWNsZXMvMjM4MTYvZWxpZmUtMjM4MTYtZmlnMy1kYXRhMy12Mi54bHN4/elife-23816-fig3-data3-v2.xlsx?_hash=sDEEHMP7tu5HySCb5TBd4t0h0tq%2F7T9eieWkX%2FB8DgY%3D" target=_blank>Source data 3 for Figure 3</a></td>
        <td>Dataset_Wang_Fig_3.csv</td>
    </tr>
    <tr>
        <td>Post Golgi Vesicles</td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926324/" target=_blank>Forsmark <i>et al.</i> 2011</a></td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926324/table/T2/" target=_blank>Table 2</a></td>
        <td>Dataset_Forsmark_Table_2_contaminants_removed.csv</td>
    </tr>
    <tr>
        <td>Cell Wall</td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535956/" target=_blank>Hsu <i>et al.</i> 2015</a></td>
        <td><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4535956/table/pone.0135174.t001/" target=_blank>Table 1</a></td>
        <td>Dataset_Hsu_Table1_Gene_names_cleaned.csv</td>
    </tr>
    <tr>
        <td>Secreted</td>
        <td><a href="https://pubs.acs.org/doi/full/10.1021/acs.jproteome.6b00953" target=_blank>Smeekens <i>et al.</i> 2017</a></td>
        <td><a href="http://pubs.acs.org/doi/suppl/10.1021/acs.jproteome.6b00953/suppl_file/pr6b00953_si_002.xlsx" target=_blank>Supplemental Table S1</a></td>
        <td>Dataset_Smeekens_Supp_1.csv</td>
    </tr>
</table>
<p>Source data are mostly available as .xls files. The Wiederholt <i>et al.</i> set was downloaded and proteins with relevant annotations were extracted manually. All other datasets were downloaded, and the gene names they contain were added to the reference set if not already present there.</p>
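<p>The merging step described above could be sketched roughly as follows. This is an illustration only, not the procedure actually used: the function name <code>merge_orf_sets</code> and the example lists are hypothetical, and in practice the names would be read from the CSV files listed in the table. The sketch also assumes that names should be compared after trimming whitespace and converting to upper case.</p>

```python
def merge_orf_sets(reference, *additional):
    """Return the reference ORFs followed by any new ORFs from the other sets.

    Order is preserved and duplicates are dropped, so an ORF that is already
    in the reference set is not added a second time.
    """
    merged = []
    seen = set()
    for orf_list in (reference, *additional):
        for orf in orf_list:
            orf = orf.strip().upper()  # assumed normalisation of gene names
            if orf and orf not in seen:
                seen.add(orf)
                merged.append(orf)
    return merged


# Hypothetical example lists; they do not reflect the real datasets' contents.
reference_set = ['YGL212W', 'YGR199W']
later_study = ['ygr199w', 'YKL175W ']
print(merge_orf_sets(reference_set, later_study))
```

<p>Keeping the reference set first means that later studies only ever extend it, which matches the "add if not already contained" rule in the text.</p>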

<h2>Gene expression parameters</h2>
<p>ORF names in the PDI substrate set were used to look up the number of Cys residues per gene in <a href="https://yeastmine.yeastgenome.org/" target=_blank>YeastMine</a>. Intracellular protein abundance data were imported from the curated dataset by <a href="https://www.sciencedirect.com/science/article/pii/S240547121730546X?via%3Dihub" target=_blank>Ho <i>et al.</i> 2018</a> (table S4, "mean molecules per cell"). Protein turnover rates (which in the measured form should also account for export rates) were from <a href="http://www.cell.com/cell-reports/fulltext/S2211-1247(14)00934-6?_returnURL=https%3A%2F%2Flinkinghub.elsevier.com%2Fretrieve%2Fpii%2FS2211124714009346%3Fshowall%3Dtrue" target=_blank>Christiano <i>et al.</i> 2014</a> (table S1, "Degradation rates (min-1)").</p>
<p>Overall, this analysis identified 859 potential substrates for PDI, of which 829 had associated protein abundance information and 622 had experimentally measured protein turnover information. Genes without associated abundance information were discarded (none of these had associated turnover information, suggesting that they are genuinely low-abundance proteins). For proteins with abundance information but without turnover information, the median turnover of the proteins with turnover information was substituted.</p>
<p>The combined dataset is saved in file 'PDI_substrates.csv'.</p>
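<p>The filtering and median substitution described above could look roughly like the sketch below. It is not the code that produced 'PDI_substrates.csv': the function name <code>filter_and_impute</code> and the toy values are hypothetical, and the sketch assumes that missing measurements are stored as NaN in columns named as in the combined dataset.</p>

```python
import pandas as pd


def filter_and_impute(df):
    """Drop genes lacking abundance, then fill missing turnover with the median."""
    # Discard genes with no abundance measurement.
    kept = df.dropna(subset=['proteins_per_cell']).copy()
    # Median of the measured degradation rates (NaN values are skipped).
    median_rate = kept['degradation'].median()
    kept['degradation'] = kept['degradation'].fillna(median_rate)
    return kept.reset_index(drop=True)


# Toy table with made-up values; None marks a missing measurement.
toy = pd.DataFrame({
    'ORF': ['gene_a', 'gene_b', 'gene_c', 'gene_d'],
    'Cys': [2, 4, 6, 8],
    'proteins_per_cell': [1000.0, 2000.0, None, 500.0],
    'degradation': [0.001, None, None, 0.003],
})
print(filter_and_impute(toy))
```

<p>In the toy table, gene_c is dropped because it has no abundance value, and gene_b receives the median of the two measured rates.</p>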

In [49]:
import pandas as pd
dataset = pd.read_csv('PDI_substrates.csv')
dataset.head()

Unnamed: 0,ORF,Cys,proteins_per_cell,degradation
0,YGL212W,1,2424,0.0
1,YGR199W,13,1524,0.0001
2,YKL175W,14,1526,0.0001
3,YIL169C,20,3007,0.0001
4,YGL156W,17,3325,0.0001


<h2>Calculation of Oxidation rates</h2>
<p>Overall rates of cysteine oxidation (in disulfides per minute) were calculated as
$$k_{Cys} = \sum_{i=1}^{n} A_i \cdot \left \lfloor \frac{Cys_i}{2} \right \rfloor \cdot  \big(k_i + \frac{\ln{2}}{t_d}\big)$$
where A<sub>i</sub> is the abundance of the <i>i</i>th protein, Cys<sub>i</sub> is the number of cysteines it contains, k<sub>i</sub> is its degradation rate, and t<sub>d</sub> is the doubling time of the growing culture (assumed to be 120 minutes, typical for lab strains grown in minimal medium). It is assumed that cysteines are oxidised only in pairs, so a protein with an odd number of cysteines contributes &lfloor;Cys<sub>i</sub>/2&rfloor; disulfides.</p>
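<p>As a worked example of a single term of this sum, the sketch below evaluates the formula for one protein. The helper name <code>disulfides_per_minute</code> is hypothetical, and the input values are those displayed for YGR199W in the dataset preview above, where the degradation rate may have been rounded for display.</p>

```python
import math


def disulfides_per_minute(abundance, cys, degradation, doubling_time=120.0):
    """Disulfide formation rate for one protein species (per cell, per minute)."""
    pairs = cys // 2  # odd cysteine counts are rounded down
    dilution = math.log(2) / doubling_time  # growth dilution term, ln(2)/t_d
    return abundance * pairs * (degradation + dilution)


# YGR199W: 1524 molecules per cell, 13 cysteines, degradation rate 0.0001 per minute.
rate = disulfides_per_minute(abundance=1524, cys=13, degradation=0.0001)
print(round(rate, 1))
```

<p>With these inputs the 13 cysteines count as 6 disulfides, and the growth dilution term ln(2)/120 (about 0.0058 per minute) is much larger than the degradation rate, so dilution by growth dominates the result for this protein.</p>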

In [75]:
import math
import pandas as pd

# Calculate the rate of disulfide formation contributed by each gene.
doubling_time = 120  # culture doubling time in minutes
dilution_rate = math.log(2) / doubling_time  # loss through growth dilution (per minute)

# Disulfides per protein: cysteines are counted in pairs, odd counts rounded down.
disulfides_per_protein = dataset['Cys'] // 2
cystines_per_minute = (
    dataset['proteins_per_cell']
    * disulfides_per_protein
    * (dataset['degradation'] + dilution_rate)
)
total = cystines_per_minute.sum()
print(f'At a doubling time of {doubling_time} minutes, yeast PDI catalyses {total:.1f} disulfide bonds per minute')

At a doubling time of 120 minutes, yeast PDI catalyses 269189.2 disulfide bonds per minute
