In [1]:
%matplotlib inline

In [2]:
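import sys
sys.path.append('src')

# np and plt are used throughout the notebook; import them explicitly
# rather than relying on the star imports below to provide them.
import numpy as np
import matplotlib.pyplot as plt

from markov import *
from figutils import *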

# 2-state system

Consider a system that has two states (State 0 and State 1). This could be a protein with bound and unbound states, or ATP in a hydrolyzed or unhydrolyzed state.

Let $X=0$ when the system is in State 0, and let $X=1$ when the system is in State 1.

## Problem 1

a. Run the code below to generate a time trace when Keq = 1. Save the figure. Repeat for Keq = 5 and Keq = 10.

b. Give a qualitative description of the differences between these conditions.

c. For a two state system, over time, we expect $\frac{p_1}{p_0} = K_{eq}$. Solve for the absolute (not relative) value of $p_0$ as a function of Keq.

d. Run the code and plots below with Keq = 1 and time = 100. What *actual* fraction of time is spent in State 0? State 1? (this will be indicated on the plot). How does the Keq=1 simulation of part (a) compare to the simulation of part (d)? How do these relate to your answer in (c)? Explain any deviations.

e. The variance (the standard deviation squared) of a two-state random variable is given by $V = p_0(1-p_0)$. So the standard deviation of the two-state variable $X$ is $\sigma_X = \sqrt{p_0(1-p_0)}$. What is the $\sigma_X$ when Keq = 1? Repeat for Keq=5. This quantity represents how much $X$ fluctuates over time.
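
The relations in (c) and (e) can be wrapped in two small helpers for checking hand calculations. This is a minimal sketch, assuming $p_1/p_0 = K_{eq}$ and $p_0 + p_1 = 1$; the function names `two_state_probs` and `sigma_x` are hypothetical and not part of the `markov` module, and `Keq = 3` is only an illustrative value.

```python
import numpy as np

def two_state_probs(Keq):
    """Return (p0, p1), assuming p1 / p0 = Keq and p0 + p1 = 1."""
    p0 = 1.0 / (1.0 + Keq)
    return p0, 1.0 - p0

def sigma_x(Keq):
    """Standard deviation of the two-state variable X, sqrt(p0 * (1 - p0))."""
    p0, _ = two_state_probs(Keq)
    return np.sqrt(p0 * (1.0 - p0))

# Illustrative value only (not one of the values asked for above)
p0, p1 = two_state_probs(3)
print(f'p0 = {p0:.3f}, p1 = {p1:.3f}, sigma_X = {sigma_x(3):.3f}')
```

Because $p_0(1-p_0)$ is symmetric in $p_0$ and $p_1$, the same $\sigma_X$ results for `Keq` and `1/Keq`.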

In [3]:
# forward rate
kf = 3
# backward rate
kb = 3
# equilibrium constant, Keq = kf / kb
Keq = 1
# scale the forward rate so that kf / kb equals Keq
kf *= Keq

# Rate matrix
Q = np.array([[-kf, kf], [kb, -kb]])

In [4]:
time = 10
tr, t = simulate_cmc(Q, time) # returns transitions, times

In [5]:
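tr, t = np.array(tr), np.array(t)
# time spent in state 1: each state value weighted by the dwell time until the next recorded time
t1 = (tr[:-1] * np.diff(t)).sum()
tt = t[-1]    # last recorded time, used as the total time
t0 = tt - t1  # time spent in state 0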

In [6]:
plt.figure(figsize=(8, 3))
plt.step(t, tr, where='post')
plt.xlabel('Time')
plt.ylabel('State')
plt.yticks([0, 1])
plt.title(f't0: {t0/tt:.02f}   t1: {t1/tt:.02f}')
plt.tight_layout()
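
The `simulate_cmc` helper comes from the `markov` module in `src`, whose implementation is not shown here. As a rough illustration of how a trace like the one above could be generated, the sketch below draws exponentially distributed dwell times from the rate matrix and flips the state after each one. The function name `sketch_two_state_trace`, the choice to start in State 0, and closing the trace at `t_max` are assumptions made for illustration, not the module's actual behavior.

```python
import numpy as np

def sketch_two_state_trace(Q, t_max, rng=None):
    """Hypothetical stand-in for simulate_cmc: returns (states, times)."""
    rng = np.random.default_rng() if rng is None else rng
    state, t = 0, 0.0                 # assumption: start in State 0 at t = 0
    states, times = [state], [t]
    while True:
        rate_out = -Q[state, state]   # total rate of leaving the current state (must be > 0)
        t += rng.exponential(1.0 / rate_out)  # exponentially distributed dwell time
        if t >= t_max:
            break
        state = 1 - state             # with two states, the only jump is to the other state
        states.append(state)
        times.append(t)
    # close the trace at t_max so the final dwell is counted
    states.append(state)
    times.append(t_max)
    return np.array(states), np.array(times)

Q_demo = np.array([[-3.0, 3.0], [3.0, -3.0]])  # kf = kb = 3, i.e. Keq = 1
states, times = sketch_two_state_trace(Q_demo, 200, rng=np.random.default_rng(0))
frac1 = (states[:-1] * np.diff(times)).sum() / times[-1]
print(f'fraction of time in State 1: {frac1:.2f}')
```

The time fraction is computed the same way as in the cell above: each state value is weighted by the dwell time that follows it.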

# Ensemble of N independent 2-state systems

Now we consider $N$ independent 2-state systems. This could be N independent proteins that could be bound or unbound, or N ATP molecules that could be hydrolyzed or unhydrolyzed.


## Problem 2

Make sure you have reset `Keq` to 1 in the code cell above before answering the questions below.

a. Run `liveCMC` (in code cell below) with `nx=5`. Save a snapshot of the simulation. Repeat with `nx=300`. Save a snapshot.

b. Let the total number of blue tiles (state 1) be $N_1$. What do you expect to be the value of $N_1$ in both conditions? Explain.

c. Let $X$ be the state value (0 or 1) of a single tile, and define the ensemble mean state value as $\overline{X} = \frac{N_1}{N_\text{total}}$. What do you expect the value of $\overline{X}$ to be in both conditions? Explain.

d. Run `CMC` with `nx=10` and `time=200`. Plot the total number of blue tiles, $N_1$, of the ensemble over time. Save this figure. 

e. For (d), when does $N_1$ first exceed 90% of its maximum value? (Look at the printed output of the code cells below.)

f. To capture the fluctuations over time, plot the histogram and Gaussian fit of $N_1$ (the number of tiles in state 1) after the system has approached 90% of its maximum. 

g. What is the typical size of a fluctuation in this system (d)? That is, what is $\sigma_{N_1}$? (Look at the printed output of the code cells below.)

h. Rerun `CMC` with `nx=100` and the same time. Plot and save the time trace and histogram.

i. What is the typical size of a fluctuation of $N_1$ in (h)? (i.e., what is the standard deviation?). How does this compare to (g, `nx=10`)? Does the standard deviation of $N_1$ increase or decrease with system size? Show that the data follow the relationship below (approximately):

$$\sigma_{N_1} = \sqrt{N_{total}}\ \sigma_X$$

Hint: Recall that $N_{total}$ is `nx` squared. Also, the ratio $\frac{\sigma_{N_1}}{\sqrt{N_{total}}}$ should be constant. Take the values for $\sigma_{N_1}$ and $\sqrt{N_{total}}$ from the two simulations (from d. and h.) and show this is the case.
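
The scaling relation can also be checked without running the full simulation. The sketch below is not the `CMC` routine: it assumes each tile is independently in state 1 with probability `p1 = 0.5` (the `Keq = 1` case) and draws equilibrium snapshots of $N_1$ from a binomial distribution. The helper name `sampled_sigma_n1`, the grid sizes, and the number of snapshots are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p1 = 0.5                                  # assumed probability of state 1 (Keq = 1)
sigma_X_theory = np.sqrt(p1 * (1 - p1))   # single-tile standard deviation

def sampled_sigma_n1(n_total, n_snapshots=20000):
    """Standard deviation of N_1 over independent equilibrium snapshots."""
    # each snapshot: number of tiles in state 1 among n_total independent tiles
    n1 = rng.binomial(n_total, p1, size=n_snapshots)
    return n1.std()

for nx_demo in (4, 30):                   # hypothetical grid side lengths
    n_total = nx_demo * nx_demo
    s = sampled_sigma_n1(n_total)
    print(f'N_total={n_total:4d}  sigma(N_1)={s:6.2f}  '
          f'sigma(N_1)/sqrt(N_total)={s / np.sqrt(n_total):.3f}  '
          f'sigma_X={sigma_X_theory:.3f}')
```

Independent snapshots ignore the time correlations present in a real trajectory, so this only checks the equilibrium spread, not how quickly the system samples it.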

In [7]:
nx = 5
time = 3.5
fa, num = liveCMC(Q, time=time, nx=nx, sr=15, ticks=False)

In [8]:
nx = 10
time = 200
ts, num = CMC(Q, time=time, nx=nx, sr=15, ticks=False)

In [9]:
plt.plot(ts, num, label=r'$N_1$')
# plt.fill_between(t, u-d, u+d, color='C0', alpha=0.3, label=r'$N_1 \pm \sigma$')
plt.xlabel('Time (sec)')
plt.ylabel('$N_1$')

f = 0.9
# index of the first sample where N1 exceeds f * max(N1)
start = (num > f * num.max()).argmax()
n = num[start:]
print(f'Time at which >{f:.02%} max N1: {ts[start]:.02f}')

Time at which >90.00% max N1: 10.29


In [10]:
from scipy.stats import norm

b = np.arange(n.min(), n.max()+1)
_ = plt.hist(n, bins=b, density=1, label='Data', align='left')
plt.xlabel('Number of Systems with State 1')
plt.ylabel('Probability')

loc, scale = norm.fit(n)
xs = np.linspace(n.min(), n.max(), 150)
plt.plot(xs, norm(loc=loc, scale=scale).pdf(xs), label='Gaussian Fit')

plt.legend()

print('sigma(N_1):   {:.02f}'.format(n.std()))
p = 1/(1 + Keq)
sx = np.sqrt(p * (1-p))
# print(f'sigma(X):     {sx}')

sigma(N_1):   4.99


## Problem 3

In Problem 2 we looked at fluctuations in **the total** number of systems with state 1 (i.e. $\sigma_{N_1}$). Now we will look at fluctuations in **the ensemble mean** (i.e. $\sigma_{\overline{X}}$) as a function of system size. Recall that the ensemble mean is defined as $\overline{X} = \frac{N_1}{N_\text{total}}$.

a. Run the code below and plot a scatter plot of 'system size' vs 'fluctuation size'. Save the figure.

b. What is the general trend observed? Give a qualitative explanation for why fluctuations of the **total**, $N_1$,  increase with system size, but fluctuations of the **mean**, $\overline{X}$, decrease with system size?

c. A common statistical tool to analyze deviations is the *standard error of the mean* which is the following:

$$\sigma_{\overline{X}} = \frac{\sigma_X}{\sqrt{n}}$$

Here $n$ is the number of systems in the ensemble, $N_\text{total}$. Do the numbers calculated for the scatter plot in (a) agree with this relationship? Show using two data points.

d. Are the observed deviations errors? Explain. 

e. Plot the log-log scatter plot. Save a figure.

f. Interpret the log-log plot. What is the approximate slope? How does this relate to the equation in (c)?

g. GRB2 is an important membrane-binding protein. Assume the following: it is in thermodynamic equilibrium with its partner on the interior of the plasma membrane, its Keq for the bound state relative to the unbound state is 0.1, and there are 1000 copies of GRB2 in the cell.

  * (i) What is the total probability that GRB2 is bound? Follow the logic of 1c. How much GRB2 do we expect to be bound at the membrane?
  * (ii) How much does the state of a single GRB2 molecule vary over time? That is, solve for $\sigma_X$ using the equation in 1e.
  * (iii) How much does the total amount of GRB2 bound to the membrane vary over time? That is, solve for $\sigma_{N_1}$ using the equation in 2i.
  * (iv) How much does the ensemble mean state of GRB2 vary over time? That is, solve for $\sigma_{\overline{X}}$ using the equation in 3c.
  * (v) Speculate on which of these quantities (total bound or ensemble mean) is more influential to cellular response and why.
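
For (e) and (f), one way to read a slope off a log-log plot is a straight-line fit with `np.polyfit`. The sketch below uses synthetic data rather than `CMC` output: it assumes independent tiles with `p1 = 0.5` and draws equilibrium snapshots of $\overline{X}$ from a binomial distribution. The system sizes, the number of snapshots, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
p1 = 0.5                                  # assumed probability of state 1
sizes = np.array([25, 100, 400, 1600])    # hypothetical N_total values

# Synthetic stand-in for the CMC output: equilibrium snapshots of N_1,
# converted to the ensemble mean X-bar = N_1 / N_total.
sd_mean = np.array([
    (rng.binomial(n_total, p1, size=20000) / n_total).std()
    for n_total in sizes
])

# Straight-line fit of log(sigma) against log(N_total). polyfit returns
# coefficients from highest degree down, so the slope comes first.
slope, intercept = np.polyfit(np.log(sizes), np.log(sd_mean), 1)
print(f'slope: {slope:.3f}   prefactor exp(intercept): {np.exp(intercept):.3f}')
```

A power law $\sigma_{\overline{X}} = c\,N_\text{total}^{a}$ becomes the line $\log \sigma_{\overline{X}} = a \log N_\text{total} + \log c$, so the fitted slope estimates the exponent and `exp(intercept)` estimates the prefactor. The same two lines of fitting code could be applied to the `ns` and `sd` arrays computed below.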

In [11]:
ns = []
ps = []
sd = []
time = 50
for side in [1, 5, 10, 20, 40, 80]:  # grid side length; avoids overwriting the array `n` from the histogram cell
    ns.append(side * side)
    ts, num = CMC(Q, time=time, nx=side, sr=15, ticks=False)
    start = (num > 0.90 * num.max()).argmax()
    nu = num[start:]
    p = nu / side / side
    ps.append(p.mean())
    sd.append(p.std())
ns = np.array(ns)
ps = np.array(ps)
sd = np.array(sd)

In [12]:
np.set_printoptions(precision=4)
print('Number of tiles: ', ns)
print('Mean of X:       ', ps)
print('Fluctuations:    ', sd)

Number of tiles:  [   1   25  100  400 1600 6400]
Mean of X:        [0.4986 0.5133 0.5068 0.4979 0.4998 0.4993]
Fluctuations:     [0.5    0.0968 0.0493 0.0255 0.0125 0.0063]


In [13]:
plt.scatter(ns, sd)
plt.xlabel("System Size (N)")
plt.ylabel("Standard Deviation")


In [14]:
lns, lsd = np.log(ns), np.log(sd)
plt.figure(figsize=(5, 3))
plt.scatter(lns, lsd)
plt.xlabel("System Size log $N_{total}$")
plt.ylabel(r"log $\sigma_{\overline{X}}$")
plt.tight_layout()