Reducing Stochastic Discrete Models of Biochemical Networks

Show more

1. Introduction

Modelling and simulation of cellular processes are subjects of significant interest in fields such as Computational and Systems Biology. Biological processes at the level of a single cell are commonly portrayed as systems of biochemical reactions. The dynamics of various biochemically reacting systems may exhibit random fluctuations, due to some molecular species which exist in low amounts. These random fluctuations could have critical implications in biology [1] [2]. Then, stochastic models are essential for an accurate description of the system’s dynamics. An extensively used stochastic model of well-stirred biochemical systems is the Chemical Master Equation (CME) [3]. This model represents the state of the system as a Markov process. An exact stochastic simulation algorithm for the Chemical Master Equation was proposed by Gillespie [4] [5].

Biochemical processes at the cellular level may entail many biochemical species undergoing a significant number of reactions. The mathematical models of such biochemical networks have high dimension, and thus their numerical simulation is quite demanding. Moreover, numerous biochemical systems evolve on multiple time scales, leading to stiffness. Stiffness is a challenge for numerical simulations. A large number of the species and/or reactions of some biochemical systems encountered in applications imply that their mathematical models have a significant number of parameters. Usually, these parameters cannot be measured accurately. For many biochemical networks, it is difficult to determine the parts of the system which are critical in determining their behaviour. Therefore, it is critical to design accurate and robust strategies to simplify these complex biochemical systems, which maintain the key properties of the original network. The reduced models will be easier to analyze and simulate numerically. In addition, they can be utilized to predict and control the behaviour of the system. Furthermore, the simplified models have a lower number of parameters, which thus become easier to determine. Reduction techniques for deterministic models of biochemical systems include species and reaction lumping methods [6] [7], sensitivity analysis based schemes [8] and time-scale analysis based strategies [9] [10]. Lumping schemes cause a loss of the physical interpretation of the elementary reactions and of information on some species or reactions. These schemes are useful when poor data exist about certain reactions. Time-scale analysis applies to systems with reactions that can be partitioned into fast and slow events, for which the fast components are in a quasi-steady state. For complex networks of biochemical reactions, finding the constraints for the fast dynamics may be a challenging task. Sensitivity analysis examines the dependence of the system’s behaviour on its parameters. In particular, local sensitivity analysis of models of biochemical systems measures the variation of the system’s state with small perturbations in model’s parameters [11]. Sensitivity analysis makes it possible to identify and discard the reactions which are not essential, those with a negligible parametric sensitivity with respect to their rate parameters [8]. Few reduction techniques for stochastic models of biochemical systems exist in the literature. These include [12] [13] [14] for discrete models and [15] [16] for continuous ones (see also references therein).

This paper proposes a new technique for reducing the complexity of stochastic discrete models of homogeneous biochemical networks. The discrete stochastic model under consideration is the Chemical Master Equation. The Chemical Master Equation accurately describes the dynamics of a wide range of well-stirred, realistic biochemical systems. The reduced reaction mechanisms generated with the new technique are easier to understand and manipulate and have fewer parameters.

Local sensitivity analysis quantifies the variations in the system’s behaviour caused by small changes in its parameters. Our method relies on estimating (local) parametric sensitivities of the Chemical Master Equation model. These parametric sensitivities are approximated using the Coupled Finite Difference (CFD) scheme [17] and the Common Random Number (CRN) method [18]. Sensitivity analysis is performed to detect which parameters, and thus which reactions, are good candidates for elimination. It should be noted that local sensitivity analysis alone may not be a reliable tool for model reduction. It may indicate that a fast reaction can be eliminated, although that reaction is important. Elimination of a fast reaction means that the value of its rate parameter changes significantly. Nevertheless, local sensitivity analysis provides no information on the perturbations in the system’s dynamics induced by large variations in its parameters. To correct this, local sensitivity analysis is coupled with a global approach to model reduction, which requires solving an optimization problem. For achieving this, we shall apply a strategy akin to that proposed in [8] for reducing continuous deterministic models of chemical reactions. This strategy ensures that the simplified biochemical system retains the stability and nonlinear properties of the original network.

The paper is organized as follows. In Section 2, we present the background on stochastic discrete models of homogeneous biochemical systems, and methods to simulate and estimate parametric sensitivities for these models. In Section 3, we propose a new model reduction strategy for the Chemical Master Equation. The advantages of the proposed strategy are illustrated on three critical models arising in applications, namely the infection, the epidermal growth factor receptor signalling pathway and the gemcitabine biochemical systems, in Section 4.

2. Background

2.1. Chemical Master Equation

The evolution in time of homogeneous biochemical systems is governed by the Chemical Master Equation. This model has been successfully utilized to study vital biological processes, such as gene expression and regulation [1] [2].

Consider *N* biochemical species
${S}_{1}\mathrm{,}\cdots \mathrm{,}{S}_{N}$ interacting through *M* chemical reactions,
${R}_{1}\mathrm{,}\cdots ,{R}_{M}$. Assume that the system is homogeneous, and at constant volume and thermal equilibrium. It is described by the state vector:

$X\left(t\right)={\left[{X}_{1}\left(t\right),{X}_{2}\left(t\right),\cdots ,{X}_{N}\left(t\right)\right]}^{\text{T}}$

where
${X}_{i}\left(t\right)$ indicates the amount of
${S}_{i}$ molecules at time *t*. The biochemical system state,
$X\left(t\right)$, is a Markov process continuous in time and discrete in space. To each reaction
${R}_{j}$ it corresponds a state change vector
${\nu}_{j}\equiv {\left({\nu}_{1j}\mathrm{,}\cdots \mathrm{,}{\nu}_{Nj}\right)}^{\text{T}}$, with
${\nu}_{ij}$ representing the number of
${S}_{i}$ molecules consumed or produced when a reaction
${R}_{j}$ fires. The matrix
$\nu ={\left\{{\nu}_{ij}\right\}}_{1\le i\le N\mathrm{,1}\le j\le M}$, where *N* is the number of reactants and *M* is the number of reactions, represents the stoichiometric matrix. In addition, a propensity function
${a}_{j}\left(x\right)$ is used for describing a reaction
${R}_{j}$. By definition,
${a}_{j}\left(x\right)dt$ is the probability of one reaction
${R}_{j}$ firing during
$\left[t\mathrm{,}t+dt\right]$, provided that at time *t* the system was in state
$x$.

Each propensity function follows the mass-action kinetics; thus, for the first order reaction ${S}_{m}\stackrel{{c}_{j}}{\to}\text{products}$, it is ${a}_{j}\left(X\left(t\right)\right)={c}_{j}{X}_{m}\left(t\right)$, and for the second order reaction ${S}_{m}+{S}_{n}\stackrel{{c}_{j}}{\to}\text{products}$, with $m\ne n$, the propensity is ${a}_{j}\left(X\left(t\right)\right)={c}_{j}{X}_{m}\left(t\right){X}_{n}\left(t\right)$. Lastly, the propensity of ${S}_{m}+{S}_{m}\stackrel{{c}_{j}}{\to}\text{products}$ may be expressed as ${a}_{j}\left(X\left(t\right)\right)={c}_{j}{X}_{m}\left(t\right)\left({X}_{m}\left(t\right)-1\right)/2$.

For
$t\ge {t}_{0}$, let
$P\left(x\mathrm{,}t\mathrm{|}{x}_{0}\mathrm{,}{t}_{0}\right)$ be the conditional probability that the state vector at time *t* is
$X\left(t\right)=x$, given that at time
${t}_{0}$ it was
$X\left({t}_{0}\right)={x}_{0}$. Then,
$P\left(x\mathrm{,}t\mathrm{|}{x}_{0}\mathrm{,}{t}_{0}\right)$ satisfies the Chemical Master Equation [3]

$\frac{\partial P\left(x,t|{x}_{0},{t}_{0}\right)}{\partial t}={\displaystyle \underset{j=1}{\overset{M}{\sum}}}\left[{a}_{j}\left(x-{\nu}_{j}\right)P\left(x-{\nu}_{j},t|{x}_{0},{t}_{0}\right)-{a}_{j}\left(x\right)P\left(x,t|{x}_{0},{t}_{0}\right)\right],$ (1)

where ${\nu}_{j}$ is the state-change vector for reaction ${R}_{j}$. The Chemical Master Equation is a refined stochastic discrete model of homogeneous biochemical systems. It is worth mentioning that, generally, the CME is a high dimensional model. Exact Monte Carlo techniques for the Chemical Master Equation model were developed in the literature. They include Gillespie’s algorithm [4] and the Random Time-Change (RTC) algorithm [17] [18] based on the RTC representation of Kurtz [19].

2.2. Gillespie’s Algorithm

The Chemical Master Equation (1) is computationally intractable for the majority of biochemical systems encountered in applications. To deal with this difficulty, Gillespie [4] [5] proposed an exact Monte Carlo strategy for the Chemical Master Equation, referred to as the stochastic simulation algorithm (SSA).

Gillespie’s algorithm proceeds as follows:

1) Initialize $t\leftarrow {t}_{0}$, $X\left({t}_{0}\right)\leftarrow {x}_{0}$.

2) At time *t*, evaluate

${a}_{sum}\left(X\left(t\right)\right):={\displaystyle \underset{k=1}{\overset{M}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}_{k}\left(X\left(t\right)\right).$

3) Draw a unit uniform random number ${r}_{1}$, and take

$\tau =\left[1/{a}_{sum}\left(X\left(t\right)\right)\right]\mathrm{ln}\left(1/{r}_{1}\right)\mathrm{.}$

4) Draw a unit uniform random number
${r}_{2}$ and denote by *j* the smallest integer satisfying

$\underset{k=1}{\overset{j}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{a}_{k}\left(X\left(t\right)\right)>{r}_{2}{a}_{sum}\left(X\left(t\right)\right).$

5) Update $X\left(t+\tau \right)\leftarrow X\left(t\right)+{\nu}_{j}$ and $t\leftarrow t+\tau $.

6) Return to step 2 or else stop.

2.3. Sensitivity Analysis

Sensitivity analysis is a key tool for investigating the robustness properties of a model with respect to perturbations in its parameters, for estimating these parameters or for guiding model reduction. Models of biochemically reacting systems depend on several parameters, such as the reaction rate parameters and the initial conditions. In many instances, the values of several such parameters are not available or cannot be evaluated accurately. For this reason, it is critical to investigate how variations in model parameters influence the behaviour of the system. Local sensitivity analysis examines how the system dynamics is influenced by small changes in model parameters. A model is sensitive with respect to one of its parameters if the behaviour of the system varies significantly with minor perturbations in that parameter; else it is robust with respect to variations in that parameter. Parametric sensitivities help determine which components of the biochemical network control the model’s behaviour. If a reaction is such that some parametric sensitivities with respect to its kinetic parameter are large, then it is considered important.

For a deterministic system, the parametric sensitivity of the state vector,
$X\left(t\mathrm{,}c\right)$, with respect to a parameter *c* can be computed by
$S=\partial X\left(t\mathrm{,}c\right)/\partial c$. Estimating parametric sensitivities of a stochastic discrete model can be a daunting task. In this work, we employ finite-difference approximations of the local parametric sensitivities for the Chemical Master Equation model. The sample paths necessary for the finite-difference estimations are generated with Monte Carlo simulations. In the discrete stochastic setting, if
$\mathbb{E}$ is the expected value,
$f$ is a function of interest,
$X\left(t\mathrm{,}c\right)$ represents the system state at time *t*, corresponding to the parameter *c*, then a finite-difference estimation of the sensitivity of
$\mathbb{E}\left(f\left(X\left(t,c\right)\right)\right)$ with respect to *c* is

$S\approx \frac{\mathbb{E}\left[f\left(X\left(t,c+h\right)\right)\right]-\mathbb{E}\left[f\left(X\left(t,c\right)\right)\right]}{h}.$ (2)

Here *h* is a small perturbation of the parameter *c*. In what follows, *c* represents a kinetic rate parameter.

Let
${X}^{\left(n\right)}\left(t\mathrm{,}c+h\right)$ and
${X}^{\left(n\right)}\left(t\mathrm{,}c\right)$ be trajectories generated by Monte Carlo simulations, for the parameters
$c+h$ and *c*, respectively. Here *n* is the trajectory index (
$1\le n\le L$ ) and *L* is the total number of trajectories. In this case, a finite-difference numerical approximation of the sensitivity, utilizing Monte Carlo simulations, is

$S\approx \frac{1}{L}{\displaystyle \underset{n=1}{\overset{L}{\sum}}}\frac{1}{h}\left[f\left({X}^{\left(n\right)}\left(t\mathrm{,}c+h\right)\right)-f\left({X}^{\left(n\right)}\left(t\mathrm{,}c\right)\right)\right].$ (3)

It is widely known that variance reduction strategies allow more precise estimations of the sensitivity, with a smaller numbers of simulations. To reduce the variance of the estimator, finite-difference sensitivity estimators couple the perturbed and unperturbed trajectories. Among the most effective and accurate finite-difference parametric estimators for stochastic discrete models of biochemical systems are the Common Random Number (CRN) method due to Rathinam *et al.* [18] and the Coupled Finite-Difference (CFD) scheme proposed by Anderson [17]. For further information on the CFD method, using the random time change representation [19] of the Markov process
$X\left(t\right)$, we refer the reader to [17]. The CRN method is presented below. Notice that the CRN and CFD techniques are based on exact Monte Carlo simulation schemes for the Chemical Master Equation.

Common Random Number (CRN) The common random number technique is utilized for estimating parametric sensitivities of stochastic discrete biochemical kinetic models. The CRN estimator has a low variance [18]. To estimate the sensitivity of the mean
$\mathbb{E}\left(X\left(t\mathrm{,}c\right)\right)$, with respect to a parameter *c*, two systems will be considered: the unperturbed system
$X\left(t\mathrm{,}c\right)$ and the perturbed counterpart
$X\left(t\mathrm{,}c+h\right)$. The CRN method applies Gillespie’s algorithm with the same string of common of random numbers to generate the coupled perturbed and unperturbed paths, corresponding to the parameters
$c+h$ and *c*, respectively. The use of a common random number sequence reduces the variance of the estimator, thus increases its accuracy for the same number of sample paths. The CRN algorithm consists of the following steps:

1) Take
$c={c}_{j}$. For each
$n\le L$, with *L* being the number of sample paths,

2) Generate an SSA path for the unperturbed system (for parameter *c*), with a sequence of independent uniform (0, 1) random numbers,
${X}^{\left(n\right)}\left(t\mathrm{,}c\right)$.

3) Generate an SSA path for the perturbed system (for parameter $c+h$ ), with the same sequence of independent uniform (0, 1) random number, ${X}^{\left(n\right)}\left(t\mathrm{,}c+h\right)$.

4) Calculate ${s}^{\left(n\right)}\left(t\right)=\left(f\left({X}^{\left(n\right)}\left(t,c+h\right)\right)-f\left({X}^{\left(n\right)}\left(t,c\right)\right)\right)/h$.

5) End loop over *n*.

6) Compute the mean of ${\left\{{s}^{\left(n\right)}\left(t\right)\right\}}_{n\le L}$.

Remark that the perturbation *h* should be small enough to decrease the truncation error associated with finite-difference approximations of derivatives, but large enough to increase the convergence rate of the sensitivity estimation [17]. The CRN algorithm is a popular technique for sensitivity estimation, being efficient and easy to implement. The parametric sensitivity estimators discussed above is utilized to simplify complex stochastic models of homogeneous biochemical networks. Alternatively, a finite-difference scheme may be employed for estimating sensitivities by approximate Monte Carlo simulation techniques [20] [21], such as adaptive tau-leaping methods; these strategies are efficient on moderately stiff to very stiff models.

3. Model Reduction

In this section, we introduce a new procedure to simplify stochastic discrete biochemical networks, which are modelled with the Chemical Master Equation. The goal of this procedure is to determine a reduced biochemical reaction system which retains the essential features of the full system, such as its stability features, nonlinear behaviour and physical interpretation of the elementary reactions. The reduced biochemical model will be faster to solve numerically and easier to investigate and understand. Local sensitivity analysis is applied to uncover the unimportant reactions of the full biochemical network. Also, it seeks to find the essential reactions, which should be retained in a simplified model. Further, our strategy extends the global approach for continuous deterministic chemical system reduction proposed in [8]. Remark that the reduced network generated with the procedure presented below is expected to perform well over a variety of parameter values, in a neighbourhood of the values for which it was obtained.

Initially, a local sensitivity analysis is performed for the stochastic discrete model of the original biochemical system. Of interest are the kinetic parameters,
$c={c}_{i}$. Finite-difference methods, such as the CFD or the CRN algorithms, are employed to estimate the local parametric sensitivities. This allows a *preselection* of the crucial reactions in the network, those for which the molecular amounts of all, or of the important species, are very sensitive with respect to small variations in the associated rate parameter. Let
${\left\{{R}_{i}\right\}}_{i\in {I}_{cr}}$, for some subset
${I}_{cr}$ of indexes
$\left\{\mathrm{1,}\cdots \mathrm{,}M\right\}$, be these crucial reactions. These reaction channels should be kept in the system. Moreover, the sensitivity analysis also helps determine the reactions for which small perturbations in their rate parameters cause insignificant changes in the overall behaviour of the system. More precisely, the kinetic parameter
${c}_{i}$ of such a reaction satisfies

$\left|\frac{{c}_{i}}{\mathbb{E}\left({X}_{k}\left(t,{c}_{i}\right)\right)}\frac{\partial}{\partial {c}_{i}}\mathbb{E}\left({X}_{k}\left(t,{c}_{i}\right)\right)\right|\ll 1,$

for all the quantities of interest ${X}_{k}$. These reactions could be unimportant, and may be considered for deletion.

Stability is a desired property of a simplified reaction network and a reduction technique based on local sensitivity analysis of the original system alone cannot guarantee it. As a consequence, we shall use a global approach to ensure that our strategy leads to a stable reduced system. For this, we consider the following *continuous* *optimization* *problem* for deterministic models (see also [8] ),

$\begin{array}{l}\mathrm{min}\Vert x-z\Vert \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{suchthat}\\ {x}^{\prime}=\nu F\left(x\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\left(0\right)={x}_{0}\\ {z}^{\prime}=\nu DF\left(z\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}z\left(0\right)={x}_{0},\text{\hspace{0.17em}}0\le t\le T\\ \mathcal{l}\le {\displaystyle \underset{i=1}{\overset{M}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{d}_{i}\le u,\text{\hspace{0.17em}}\text{\hspace{0.17em}}0\le {d}_{i}\le 1,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{for}\text{\hspace{0.17em}}1\le i\le M,\\ g\left({d}_{1},\cdots ,{d}_{M}\right)\le r.\end{array}$ (4)

Note that
$x$ and
$z$ are the state vectors of the original and the reduced biochemical systems, respectively, modelled with the reaction rate equations. Also,
$\nu $ is the stoichiometric matrix,
$F(\cdot )$ represents the vector of propensities and
$D$ is a diagonal matrix, with
${d}_{i}$ being the *i*-th row diagonal entry. Observe that
${d}_{i}=1$ if the reaction is kept in the reduced system and
${d}_{i}=0$, if it is removed.

Here
$\mathcal{l}$ and *u* are the simplified model’s lower and upper bounds for the total reaction count, and
$r>0$ is a small constant. Also, *g* denotes a non-linear function which becomes 0 when
${d}_{i}$ are either 0 or 1, for all
$1\le i\le M$. We consider the following nonlinear function

$g={\displaystyle \underset{i=1}{\overset{M}{\sum}}}{\left({d}_{i}-{d}_{i}^{2}\right)}^{\beta},$ (5)

with $\beta \ge 2$ being a parameter. In our implementation, $\beta $ is set to 2. By relaxing the condition

$g\left({d}_{1}\mathrm{,}\cdots \mathrm{,}{d}_{M}\right)=0\text{\hspace{1em}}\text{to}\text{\hspace{1em}}g\left({d}_{1}\mathrm{,}\cdots \mathrm{,}{d}_{M}\right)\le r\mathrm{,}$

the discrete optimization is simplified to a continuous optimization problem. Once a solution of the continuous optimization problem is found, the coefficients
${d}_{i}$ are rounded to 0 if they are very small; otherwise they are rounded to 1. This result may yield a local rather than a global minimum. Still, for models with many reactions, *i.e.*, a large parameter space, a local minimum would be considered a satisfactory solution for the simplifying strategy. Also, the choice of the parameters (e.g., *u*, *r*), would affect how closely the dynamics of the reduced system matches that of the original version. There is a trade-off between improving the accuracy of the simplified mechanism and eliminating many reactions. To obtain a simpler optimization problem, after we identified the important reactions,
${\left\{{R}_{i}\right\}}_{i\in {I}_{cr}}$, and we assigned their
${d}_{i}=1$, we solve (4) over a smaller set of parameters,
${\left\{{d}_{j}\right\}}_{j\notin {I}_{cr}}$.

Remark that the optimization problem is formulated in a continuous deterministic framework. Empirically, we found that posing an optimization problem for the reaction rate equations, rather than for the much more challenging Chemical Master Equation model, yields an accurate reduction mechanism for biochemical systems with a low to moderate level of noise. Moreover, this procedure can handle efficiently large biochemical networks as well.

For solving the optimization problem (4), we utilize GEKKO [22], an optimizer toolbox. The modes of operation of GEKKO include parameter regression, data reconciliation, real-time optimization, dynamic simulation, and nonlinear predictive control. For our reduction procedure, we apply dynamic estimation with the APOPT solver. After the optimization problem is solved, we isolate the coefficients ${d}_{i}$ which are very small and set them to 0. This means that the associated reactions will be deleted from the biochemical system. The remaining coefficients ${d}_{i}$ are rounded to 1. In our experiments, we observed that a good number of unimportant reactions identified with the optimization approach are among those selected by the local sensitivity analysis as candidates for elimination. The results are validated by comparing the behaviours of the reduced and full models.

4. Numerical Examples

In this section, we illustrate the advantages of the proposed model reduction technique on three systems of biochemical reactions: one is a low size but key test problem, while the other two are high-dimension biochemical networks, with applications to cancer research. In each case, sensitivity analysis is performed to identify which parts of the network do not play a significant role in its overall behaviour. More importantly, the analysis finds the reactions which are critical for the dynamics of the biochemical system and set their ${d}_{i}=1$. After this preselection, the global optimization strategy is employed to select the reactions which can be removed from the system. Finally, the reduced and full models are simulated with Gillespie’s algorithm over a large ensemble of trajectories. The results are used for comparing the evolution of the mean and standard deviation of the molecular amounts for each species, in the original and simplified models.

4.1. Infectious Disease Model

The infectious disease model is a simple example of a two species reaction network [23]. The reaction channels of the infectious disease model and their rate parameters are given in Table 1. The model is simulated on the interval $\left[\mathrm{0,10}\right]$ with initial conditions ${S}_{1}\left(0\right)=20$ and ${S}_{2}\left(0\right)=40$.

The parametric sensitivities with respect to *c*_{2} are estimated using the CRN and the CFD methods on 10,000 sample paths. Figure 1 depicts the evolution in time of the fully normalized sensitivities of the species *S*_{1} and *S*_{2}, with respect to the kinetic parameter *c*_{2}. The approximations obtained with the two finite-difference sensitivity estimators, the CFD and the CRN algorithms, are in good agreement. We note that, for *i* = 1 and 2, the normalized sensitivities with respect to *c*_{2} are very small. Thus, reaction *R*_{2} is a good candidate for elimination, and this is validated by solving the optimization problem.

The full and reduced (without reaction *R*_{2}) models are simulated with the SSA on 10,000 trajectories. The means and standard deviations of the molecular counts of *S*_{1} and *S*_{2} as functions of time, for the full and reduced models, are plotted in Figure 2 and Figure 3, respectively. The match between the numerical results for the reduced, in which reaction *R*_{2} is deleted, and the full models is excellent in each case.

Table 1. The infectious disease model: the reaction channels and their rate parameters.

Figure 1. The infectious disease model: the evolution in time of the normalized sensitivities for *S*_{1} and *S*_{2} with respect to *c*_{2}, namely
$\left|\frac{{c}_{2}}{\mathbb{E}\left({X}_{i}\left(t\right)\right)}\frac{\partial}{\partial {c}_{2}}\mathbb{E}\left({X}_{i}\left(t\right)\right)\right|$, for
$i=1,2$. The simulations are done over 10,000 paths, with the CRN and the CDF schemes. The integration interval is
$\left[\mathrm{0,10}\right]$.

Figure 2. The infectious disease model: the comparison of the evolution in time of the means for the molecular amounts of the species *S*_{1} and *S*_{2}, for the full system and the reduced one, in which the reaction *R*_{2} is eliminated. The simulations are over 10,000 paths and the integration interval is
$\left[\mathrm{0,10}\right]$.

Figure 3. The infectious disease model: the comparison of the evolution in time of the standard deviations of the molecular amounts of the species *S*_{1} and *S*_{2}, for the full system and the reduced one, in which the reaction *R*_{2} is eliminated. The simulations are over 10,000 paths and the integration interval is
$\left[\mathrm{0,10}\right]$.

4.2. Epidermal Growth Factor Receptor (EGFR) Model

The epidermal growth factor receptor signalling pathway participates in cell differentiation and proliferation. The role of the epidermal growth factor in cancer cell proliferation has recently been the subject of intense research. The biochemical reaction network which models the EGFR signalling pathway consists of 23 molecular species undergoing 47 reaction channels [24].

The EGFR model reactions and their rate parameter values are given in Table 2. The molecular species and their initial molecular counts are listed in Table 3. Assume a cellular volume of 3 × 10^{−12} liters, with a nano-Mole concentration = 1800 (*i.e.*,
$6.023\times {10}^{+23}\times 3.00\times {10}^{-12}\times {10}^{-9}$ ) molecules per cell. For computational purposes, we shall consider molecular amounts rather than concentrations of the species. This requires rescaling the rate parameters. The problem is solved numerically on the time interval
$\left[\mathrm{0,100}\right]$. Remark that the EGFR model is stiff, given that it spans multiple time-scales; indeed, some reactions are slow others are fast. Stiffness is a major challenge for numerical simulation [25].

Initially, we use finite-difference approximations of the sensitivity by employing the CRN scheme with 2000 trajectories. We identify a set of reactions with small normalized parametric sensitivities for all molecular species. A sample plot of the evolution in time of the normalized sensitivities of various species with respect to one of the parameters, *c*_{21}, is displayed in Figure 4. The set identified at this step constitutes an initial selection of reactions which are good candidates

Table 2. The epidermal growth factor receptor (EGFR) signaling pathway model: the reaction channels and the rate parameters. First- and second-order rate parameter values are expressed in s^{−1} and nM^{−1}∙s^{−1}. Here [MM] denotes the reaction rate parameters for Michaelis-Menten kinetics (
${V}_{m}\left[\text{1}/\text{s}\right]$ and
${K}_{m}\left[\text{nM}\right]$ ).

Table 3. The epidermal growth factor receptor (EGFR) signaling pathway model: the molecular species and their initial amounts.

Figure 4. The epidermal growth factor receptor (EGFR) signaling pathway model: the evolution in time of the normalized sensitivities for several species with respect to *c*_{21}.

for deletion. Given the large dimension of the EGFR model, we also carry a sensitivity analysis to choose several reactions which are very important. We assign the value of their coefficients as
${d}_{i}=1$. At this step, we selected a group of 17 such reactions from a total of 47 reactions. This reduces the dimension of the optimization problem. Subsequently, we solve the optimization problem (4) to find a reduced biochemical model. We used the following parameter values:
$\mathcal{l}=2$,
$u=42$ and *r* close to 1. As a result, the following reactions are selected for elimination
${R}_{6}\mathrm{,}{R}_{19}\mathrm{,}{R}_{20}\mathrm{,}{R}_{21}\mathrm{,}{R}_{22}\mathrm{,}{R}_{34}$ and
${R}_{35}$.

Finally, we run Gillespie’s algorithm over 2000 trajectories for the original and the reduced model and compare the output. The means and standard deviations of the molecular amounts of various species, computed for the full and reduced models, are plotted as functions of time in Figure 5 and Figure 6, respectively. Similar results are obtained for the remaining species (not shown, for brevity). Notice the excellent agreement between the results produced with the reduced (with reactions ${R}_{6}\mathrm{,}{R}_{19}\mathrm{,}{R}_{20}\mathrm{,}{R}_{21}\mathrm{,}{R}_{22}\mathrm{,}{R}_{34}$ and ${R}_{35}$ turned off) and the full systems. Consequently, the proposed model reduction strategy provides very good results on this large size, stiff biochemical reaction network.

4.3. Gemcitabine Model

The final numerical experiment is performed on the gemcitabine (2,2-difluorodeoxycytidine, dFdC) biochemical network, which is a critical, real-world model [26] [27] designed to study the mechanisms of resistance to gemcitabine effectiveness. Gemcitabine is a chemotherapy drug, commonly administered for the

Figure 5. The epidermal growth factor receptor (EGFR) signaling pathway model: the mean number of molecules for different species as a function of time, for the full and reduced model. The following reactions are eliminated: ${R}_{6}\mathrm{,}{R}_{19}\mathrm{,}{R}_{20}\mathrm{,}{R}_{21}\mathrm{,}{R}_{22}\mathrm{,}{R}_{34}$ and ${R}_{35}$. The species shown are: (a) EGFR, (b) R, (c) Ra, (d) R2, (e) RP, (f) PLCg, (g) RPL, (h) RPLP and (i) PLCgP. The simulation of 2000 trajectories is performed.

Figure 6. The epidermal growth factor receptor (EGFR) signaling pathway model: the standard deviation of the number of molecules for several species, as a function of time, for the full and reduced model. The reactions ${R}_{6}\mathrm{,}{R}_{19}\mathrm{,}{R}_{20}\mathrm{,}{R}_{21}\mathrm{,}{R}_{22}\mathrm{,}{R}_{34}$ and ${R}_{35}$ are removed from the system. The species shown are: (a) EGFR, (b) R, (c) Ra, (d) R2, (e) RP, (f) PLCg, (g) RPL, (h) RPLP and (i) PLCgP. The computations are over 2000 trajectories.

treatment of non-small cell lung cancer, breast, pancreatic and prostate cancers [26]. Reactions 1, 2, 3 and 4 illustrate how the gemcitabine is carried into cells. Then, in reaction 5, it is phosphorylated by the enzyme (dCK) to become the gemcitabine monophosphate (dFdC-MP). The species dFdC-MP is phosphorylated to its diphosphorylated form (dFdC-DP) in reactions 7 and 8. The triphosphorylated form (dFdC-TP) is created in reactions 9 and 10. Gemcitabine is metabolized to 2,2-difluorodeoxycytidine (dFdU), which can be phosphorylated to its diphosphate (dfdU-DP) and triphosphate (dFdU-TP), these processes are depicted in reactions 11 - 16. Reaction 19 is the DNA incorporation of dFdC-TP, while reaction 21 indicates the synthesis of CDP. Reaction 23 is nucleoside diphosphate kinase. The inhibition of dCK by binding to dCTP is presented in the 25-th reaction.

This is a large model, consisting of 22 species undergoing 29 reactions. The reactions and their rate parameters are listed in Table 4. The initial molecular counts of all species are zero, except for dCK = 1000, RR = 1000, dCMPD = 1000, CDP = 2000 and dFdC-out = 10,000. The model is studied on the time interval $\left[\mathrm{0,20}\right]$.

We begin by performing a local sensitivity analysis of the Gemcitabine model. To approximate its parametric sensitivities, we apply the common reaction number (CRN) algorithm with 1000 trajectories. At this step, we select some reactions which local sensitivity analysis predicts to be possible choices for deletion. These reactions are then considered an initial collection, when solving the optimization problem for this system. The model has many parameters and finding the global minimum in its large-dimensional parameter space is a challenge. Therefore, we use sensitivity analysis to determine a set of reactions deemed critical in driving the behaviour of the system. For this model, we selected 14 important reactions (and set their ${d}_{i}=1$ ) out of a total of 29. Also, we choose the following parameter values for the optimizer: $\mathcal{l}=2$ and $u=23$ and $r\approx 1$. Solving the optimization problem suggests that reactions ${R}_{2}\mathrm{,}{R}_{3}\mathrm{,}{R}_{12}\mathrm{,}{R}_{14}\mathrm{,}{R}_{16}$ and ${R}_{18}$ may be eliminated without a significant impact on the accuracy and stability properties of the model.

To validate the solution of the optimizer, we apply the SSA with 1000 runs to simulate both the reduced (resulted after turning off reactions ${R}_{2}\mathrm{,}{R}_{3}\mathrm{,}{R}_{12}\mathrm{,}{R}_{14}\mathrm{,}{R}_{16}$ and ${R}_{18}$ ) and the original biochemical network. In Figure 7 and Figure 8, we compare the plots of the time-evolution of the the means and standard deviations for the molecular counts predicted by the full and the reduced model. In the reduced network, the following reactions were eliminated: ${R}_{2}\mathrm{,}{R}_{3}\mathrm{,}{R}_{12}\mathrm{,}{R}_{14}\mathrm{,}{R}_{16}$ and ${R}_{18}$. As with the previous models, the proposed reduction technique generates a very accurate simplified model, when compared to the original one. The Gemcitabine model exhibits stiffness and has many reactions and species. Hence, it is quite challenging to simulate and analyze in its original formulation. In conclusion, the proposed simplifying strategy performs very well on this complex and computationally demanding biochemical model.

Figure 7. The gemcitabine biochemical reaction model: the time-evolution of the mean number of molecules for several species, for the full and reduced systems. The following reactions are eliminated according to the proposed reduction scheme: ${R}_{2}\mathrm{,}{R}_{3}\mathrm{,}{R}_{12}\mathrm{,}{R}_{14}\mathrm{,}{R}_{16}$ and ${R}_{18}$. The species shown are: (a) dFdC, (b) dFdU, (c) dFdU-out, (d) dCK, (e) dFdC-MP, (f) dFdC-DP, (g) dFdC-TP, (h) dFdU-MP and (i) RR. The simulation is performed on 1000 trajectories.

Figure 8. The gemcitabine biochemical reaction model: the time-evolution of the standard deviation of the number of molecules of different reacting species, for the full and reduced system. Reactions ${R}_{2}\mathrm{,}{R}_{3}\mathrm{,}{R}_{12}\mathrm{,}{R}_{14}\mathrm{,}{R}_{16}$ and ${R}_{18}$ are turned off in the reduced model. The species shown are: (a) dFdC, (b) dFdU, (c) dFdU-out, (d) dCK, (e) dFdC-MP, (f) dFdC-DP, (g) dFdC-TP, (h) dFdU-MP and (i) RR. The simulations are done on 1000 trajectories.

Table 4. The gemcitabine biochemical reaction model: The reactions and the values of their rate parameters.

5. Conclusions

In the present work, we proposed a numerical procedure for model reduction of homogeneous, discrete stochastic biochemical networks. The dynamics of these networks are governed by the widely used model of the Chemical Master Equation. This model has a diverse range of critical practical applications. Numerous biochemical networks arising in practice are complex, consisting of a large number of species subject to many reaction channels. In addition, the network of interactions of some components of the biochemical system may be quite complicated. Furthermore, biochemical processes present at the cellular level typically entail slow and fast reactions, meaning that their mathematical models are stiff. Stiffness and/or high dimensionality constitute significant challenges for numerical simulation and analysis of these models.

The model reduction procedure developed in this paper utilizes sensitivity analysis of stochastic discrete models of biochemical systems and requires solving a nonlinear optimization problem. The procedure is expected to have an excellent performance on a broad class of biochemical systems, with moderate levels of noise, and various degrees of stiffness. We tested the reduction methodology described above on three realistic biochemical networks, two of these having applications to cancer research. The numerical experiments carried out on these models show that the reduced biochemical network retains the properties of the original version while preserving its overall behaviour.

Acknowledgements

This work was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).

References

[1] McAdams, H. and Arkin, A. (1999) It’s a Noisy Business, Genetic Regulation at the Nanomolar Scale. Trends in Genetics, 15, 65-69.

https://doi.org/10.1016/S0168-9525(98)01659-X

[2] Raser, J.M. and O’Shea, E.K. (2004) Control of Stochasticity in Eukaryotic Gene Expression. Science, 304, 1811-1814.

https://doi.org/10.1126/science.1098641

[3] Gillespie, D.T. (1992) A Rigorous Derivation of the Chemical Master Equation. Physica A: Statistical Mechanics and its Applications, 188, 402-425.

https://doi.org/10.1016/0378-4371(92)90283-V

[4] Gillespie, D.T. (1976) A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions. Journal of Computational Physics, 22, 403-434.

https://doi.org/10.1016/0021-9991(76)90041-3

[5] Gillespie, D.T. (1977) Exact Stochastic Simulation of Coupled Chemical Reactions. The Journal of Physical Chemistry, 81, 2340-2361.

https://doi.org/10.1021/j100540a008

[6] Huang, H., Fairweather, M., Griffiths, J.F., Tomlin, A.S. and Brad, R.B. (2005) A Systematic Lumping Approach for the Reduction of Comprehensive Kinetic Models. Proceedings of the Combustion Institute, 30, 1309-1316.

https://doi.org/10.1016/j.proci.2004.08.001

[7] Li, G. and Rabitz, H. (1989) A General Analysis of Exact Lumping in Chemical Kinetics. Chemical Engineering Science, 44, 1413-1430.

https://doi.org/10.1016/0009-2509(89)85014-6

[8] Petzold, L.R. and Zhu, W.J. (1999) Model Reduction for Chemical Kinetics: An Optimization Approach. AIChE Journal, 45, 869-886.

https://doi.org/10.1002/aic.690450418

[9] Kumar, A., Christofides, P.D. and Daoutidis, P. (1998) Singular Perturbation Modeling of Non-linear Processes with Non-Explicit Time-Scale Separation. Chemical Engineering Science, 53, 1491-1504.

https://doi.org/10.1016/S0009-2509(98)00006-2

[10] Vora, N. and Daoutidis, P. (2001) Nonlinear Model Reduction of Chemical Reaction Systems. AIChE Journal, 47, 2320-2332.

https://doi.org/10.1002/aic.690471016

[11] Varma, A., Morbidelli, M. and Wu, H. (1999) Parametric Sensitivity in Chemical Systems. Cambridge University Press, Cambridge.

https://doi.org/10.1017/CBO9780511721779

[12] Cao, Y., Gillespie, D.T. and Petzold, L. (2005) The Slow-Scale Stochastic Simulation Algorithm. The Journal of Chemical Physics, 122, 014116-014134.

https://doi.org/10.1063/1.1824902

[13] Rao, C.V. and Arkin, A.P. (2003) Stochastic Chemical Kinetics and the Quasi-steady-state Assumption: Application to the Gillespie Algorithm. The Journal of Chemical Physics, 118, 4999-5010.

https://doi.org/10.1063/1.1545446

[14] Samant, A. and Vlachos, D. (2005) Overcoming Stiffness in Stochastic Simulation Stemming from Partial Equilibrium: A Multiscale Monte-Carlo Algorithm. Journal of Chemical Physics, 123, 144114-144122.

https://doi.org/10.1063/1.2046628

[15] Sotiropoulos, V., Contou-Carrere, M.N., Daoutidis, P. and Kaznessis, Y.N. (2009) Model Reduction of Multiscale Chemical Langevin Equations: A Numerical Case Study, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 6, 470-482.

https://doi.org/10.1109/TCBB.2009.23

[16] Ilie, S. and Gholami, S. (2013) Simplifying Stochastic Mathematical Models of Biochemical Systems. Applied Mathematics, 4, 248-256.

https://doi.org/10.4236/am.2013.41A038

[17] Anderson, D.F. (2012) An Efficient Finite Difference Method for Parameter Sensitivities of Continuous Time Markov Chains. SIAM Journal on Numerical Analysis, 50, 2237-2258.

https://doi.org/10.1137/110849079

[18] Rathinam, M., Sheppard, P.W. and Khammash, M. (2010) Efficient Computation of Parameter Sensitivities of Discrete Stochastic Chemical Reaction Networks. The Journal of Chemical Physics, 132, Article No. 034103.

https://doi.org/10.1063/1.3280166

[19] Kurtz, T.G. (1982) Representation and Approximation of Counting Processes. In: Fleming, W.H. and Gorostiza, L.G., Eds., Advances in Filtering and Optimal Stochastic Control, Vol. 42, Springer, Berlin, 177-191.

https://doi.org/10.1007/BFb0004537

[20] Morshed, M., Ingalls, B. and Ilie, S. (2017) An Efficient Finite-Difference Strategy for Sensitivity Analysis of Stochastic Models of Biochemical Systems. BioSystems, 151, 43-52.

https://doi.org/10.1016/j.biosystems.2016.11.006

[21] Morshed, M., Ingalls, B. and Ilie, S. (2018) An Effective Implicit Finite-difference Method for Sensitivity Analysis of Stiff Stochastic Discrete Biochemical Systems. IET Systems Biology, 12, 123-130.

https://doi.org/10.1049/iet-syb.2017.0048

[22] Beal, L.D.R., Hill, D., Martin, R.A. and Hedengren, J.D. (2018) GEKKO Optimization Suite. Processes, 6, Article No. 106.

https://doi.org/10.3390/pr6080106

[23] Janke, T. (2011) On Reduced Models for Chemical Master Equation. Multiscale Modeling & Simulation, 9, 1646-1676.

https://doi.org/10.1137/110821500

[24] Pettigrew, M.F. and Resat, H. (2007) Multinomial Tau-leaping Method for Stochastic Kinetic Simulations. The Journal of Chemical Physics, 126, Article ID: 084101.

https://doi.org/10.1063/1.2432326

[25] Sayyidmousavi, A. and Ilie, S. (2017) An Efficient Hybrid Method for Stochastic Reaction-Diffusion Biochemical Systems with Delay. AIP Advances, 7, Article ID: 125305.

https://doi.org/10.1063/1.5001760

[26] Kahramanogullari, O., Fantaccini, G., Lecca, P., Morpurgo, D. and Priami, C. (2012) Algorithmic Modeling Quantifies the Complementary Contribution of Metabolic Inhibitions to Gemcitabine Efficacy. PLoS ONE, 7, e50176.

https://doi.org/10.1371/journal.pone.0050176

[27] Thanh, V.H. (2019) A Critical Comparison of Rejection-Based Algorithms for Simulation of Large Biochemical Reaction Networks. Bulletin of Mathematical Biology, 81, 3053-3073.

https://doi.org/10.1007/s11538-018-0462-y