 Research article
 Open Access
Stochastic parameter search for events
 Min K Roh^{1} and
 Philip Eckhoff^{1}
https://doi.org/10.1186/s12918-014-0126-y
© Roh and Eckhoff; licensee BioMed Central Ltd. 2014
 Received: 15 April 2014
 Accepted: 22 August 2014
 Published: 8 November 2014
Abstract
Background
With the recent increase in affordability and accessibility of high-performance computing (HPC), the use of large stochastic models has become increasingly popular for their ability to accurately mimic the behavior of the represented biochemical system. One important application of such models is to predict parameter configurations that yield an event of scientific significance. Due to the high computational requirements of Monte Carlo simulations and the dimensionality of parameter space, brute-force search is computationally infeasible for most large models.
Results
We have developed a novel parameter estimation algorithm—Stochastic Parameter Search for Events (SParSE)—that automatically computes parameter configurations for propagating the system to produce an event of interest at a user-specified success rate and error tolerance. Our method is highly automated and parallelizable. In addition, its computational complexity does not scale with the number of unknown parameters; all reaction rate parameters are updated concurrently at the end of each iteration in SParSE. We apply SParSE to three systems of increasing complexity: birth-death, reversible isomerization, and Susceptible-Infectious-Recovered-Susceptible (SIRS) disease transmission. Our results demonstrate that SParSE substantially accelerates computation of the parametric solution hyperplane compared to uniform random search. We also show that the novel heuristic for handling over-perturbing parameter sets enables SParSE to compute biasing parameters for a class of rare events that is not amenable to current algorithms based on importance sampling.
Conclusions
SParSE provides a novel, efficient, event-oriented parameter estimation method for computing parametric configurations that can be readily applied to any stochastic system obeying the chemical master equation (CME). Its usability and utility do not diminish with large systems, as the algorithmic complexity for a given system is independent of the number of unknown reaction rate parameters.
Keywords
 Stochastic simulation
 Parameter estimation
 Rare event
 Optimization
Background
Stochastic modeling of biochemical and ecological systems has become increasingly popular due to its ability to represent system dynamics correctly at a detailed level, especially when species are present in low numbers. Deterministic models, on the other hand, are easier to analyze, yet they may fail to capture even the average behavior when the represented system exhibits nonlinearity [1] or is near extinction. Recent advancements in cloud computing platforms [2],[3] and GPU computing [4]-[7] have significantly increased the affordability of computational resources. This enables development and use of stochastic algorithms that would have been deemed computationally infeasible in the past. However, there is still a void in stochastic methods that can answer scientifically interesting questions. One such application is in determining reaction rate configurations that yield an event of interest with a set success probability. Most parameter estimation algorithms in the stochastic chemical kinetics setting take time-series data as an input and compute a set of reaction rate parameters that most closely reproduce the data. Methods used to determine these reaction rate parameters include maximum likelihood ratio [8]-[10], gradient descent [11], and moment closure [12]. While these algorithms are useful in their own right, scientists are often interested in knowing all parameter combinations that yield a specific event of interest. For gene regulatory models, knowledge of all pathways to achieve a specific event, such as the bistable transition of the lac operon in E. coli [13]-[15], may be used to guide laboratory experiments. In epidemiological models, all intervention parameter combinations that achieve eradication can be combined with econometrics to compute the most cost-effective strategy for eradicating a disease [16]. To the authors' knowledge, no algorithm has been developed in the stochastic chemical kinetics setting that computes such parameter combinations.
In this paper, we present Stochastic Parameter Search for Events (SParSE), which finds a parametric hyperplane of reaction rates conferring a user-specified event with a prescribed success rate and error tolerance. Our algorithm is robust in that it accurately computes the solution hyperplane for low probability events as well as high probability events. It is also trivial to parallelize; initial parameter sets do not need to communicate with each other to find the direction to the unknown solution hyperplane. Once the algorithm finds a point in the solution hyperplane, the ratio between the initial and final rates can be used as the biasing parameters by the doubly weighted stochastic simulation algorithm (dwSSA) [17] to compute the probability of observing the target event with its success rate under the original system description. This allows calculation of target event probabilities under the original parameters as a powerful side benefit of the algorithm. Lastly, the SParSE runtime per parameter sample is of the same order as that of the stochastic simulation algorithm (SSA), i.e., the algorithmic complexity is independent of the number of unknown parameters for a given system. This is achieved by combining a novel modification of the dwSSA [17], Rubinstein's cross-entropy method [18], and exponential interpolation of biasing parameters. This feature provides substantial benefits when searching multidimensional parameter space.
Methods
Doubly weighted stochastic simulation
with $\gamma \equiv \{\gamma_{1},\dots,\gamma_{M}\}$ the set of biasing parameters and $n$ the total number of reactions fired in $[0,t]$.
It is straightforward to confirm that the product of (3) and (4) equals (1).
where N is the total number of trajectories, J_i represents the i^{th} simulated dwSSA trajectory, and the indicator takes a value of 1 if $\mathcal{E}$ is visited by J_i and 0 otherwise. The quantity in (5) can be interpreted as the weighted average of successful trajectories, i.e., trajectories reaching $\mathcal{E}$, where the weight is computed according to (4). A good set of biasing parameters would yield successful trajectories with weights close to the true probability and thus reduce variance in the probability estimator. The dwSSA computes low variance biasing parameters by minimizing cross entropy using a modified version of Rubinstein's multilevel cross-entropy method [17],[18]. The advantage of minimizing cross entropy over minimizing variance is that the former yields biasing parameters with a closed-form solution for (2), whereas the latter does not. Having a closed-form solution is of practical necessity, as the alternative would be to solve a large set of nonlinear equations, significantly decreasing the efficiency of the algorithm if not making the simulation infeasible.
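The estimator in (5) reduces to a weighted average over successful trajectories. A minimal Python sketch of this estimator (the function name and inputs are illustrative; the per-trajectory weights are assumed to have been computed as in (4)):

```python
def dwssa_estimate(reached_event, weights):
    """Weighted Monte Carlo estimator of the target event probability, as in (5).

    reached_event[i] -- 1 if the i-th dwSSA trajectory visited the event, else 0
    weights[i]       -- likelihood ratio of the i-th trajectory, as in (4)
    """
    N = len(reached_event)
    # Unsuccessful trajectories contribute 0; successful ones contribute their weight.
    return sum(r * w for r, w in zip(reached_event, weights)) / N

# Toy usage: three of five trajectories reach the event, each with a small weight.
p_hat = dwssa_estimate([1, 0, 1, 0, 1], [1e-3, 0.5, 2e-3, 0.1, 3e-3])
```

With good biasing parameters, the weights of successful trajectories cluster near the true probability, which keeps the variance of this estimator low.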
where n_{ij} is the total number of times reaction j fires in the i^{th} trajectory, the sum iterates only over trajectories reaching ξ^{(l)}, and l is the stage index in the multilevel cross-entropy method. Computation terminates when the intermediate rare event ξ^{(l)} reaches $\mathcal{E}$, at which point we set ξ^{(l)} = $\mathcal{E}$.
where $\mathcal{P}_{\mathcal{E}}$ is the desired probability of observing $\mathcal{E}$ by time t, the indicator function records whether $\mathcal{E}$ is observed during the i^{th} trajectory, and $\epsilon$ is a user-specified absolute error tolerance on $\mathcal{P}_{\mathcal{E}}$. Unlike the dwSSA, where the biasing parameters are updated at each level of the multilevel cross-entropy method according to (6), in SParSE the reaction rates are updated instead. We note that it is possible to use the dwSSA Monte Carlo estimator (5) in (7) and update γ^{(l)} instead of k^{(l)}. However, unless k^{(0)} is sufficiently close to k^{*}, the likelihood ratio (4) may become extremely small, i.e., degenerate, and updating reaction rates avoids this problem. We discuss the criteria for updating k in the following section.
Multilevel cross-entropy method
The modification of the multilevel cross-entropy method in SParSE is similar to that of Ref [17]. However, there are three major differences between the multilevel cross-entropy method employed by dwSSA and by SParSE: (i) dwSSA computes only a single intermediate event ξ^{(l)} and the corresponding set of biasing parameters γ^{(l)}, while SParSE may compute multiple such quantities, (ii) SParSE can calculate biasing parameters for initial reaction rates that either over- or under-perturb the system with respect to $\mathcal{E}$. For an over-perturbed system, it applies inverse biasing to the reaction rates to convert $\mathcal{E}$ from a "sure event" to a "rarer event", and (iii) dwSSA updates biasing parameters γ^{(l)} while SParSE updates reaction rates k^{(l)}. The following subsection explains the first two differences and highlights how SParSE achieves the same time complexity as dwSSA for computing the quantities in (i). The next subsection focuses on (iii) and its effect on simulation details.
Concurrent computation of multiple intermediate events and biasing parameters
In dwSSA, N trajectories in level l of the multilevel cross-entropy method are run until either the final simulation time t_final or the first occurrence of ξ^{(l)}, where ξ^{(l)} is an intermediate rare event chosen by the top ⌈ρN⌉ trajectories that evolve farthest in the direction of $\mathcal{E}$. The typical value of ρ used in dwSSA is 0.01 for N=10^{5}, although any value ρ ∈ (0,1) can be used in theory [17]. The role of ρ can be thought of as a knob that controls the tradeoff between the speed of convergence to $\mathcal{E}$ and the accuracy of the biasing parameters. For ρ′ < ρ, the resulting intermediate event ξ′^{(l)} is at least as extreme as ξ^{(l)}; thus smaller values of ρ can potentially drive the system toward $\mathcal{E}$ faster. However, the number of trajectories reaching ξ′^{(l)} is less than the number of trajectories reaching ξ^{(l)} since ⌈ρ′N⌉ < ⌈ρN⌉. Having fewer data with which to compute γ^{(l)} reduces the confidence in the estimate; therefore it is advised to keep ⌈ρN⌉ above a set threshold (e.g., 200) in practice. On the other hand, a larger value of ρ (e.g., ρ > 0.3) implies a less selective intermediate rare event. The resulting biasing parameters may not push the system closer to $\mathcal{E}$, causing a failure to converge to the target event. In our experience, ρ < 0.2 and ⌈ρN⌉ > 100 yield both reliable computation of biasing parameters and acceptable convergence to $\mathcal{E}$.
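The selection of an intermediate event from the top ⌈ρN⌉ trajectories can be sketched as follows (a Python illustration; the function name is ours, and the per-trajectory extreme values of the event function are assumed precomputed):

```python
import math

def intermediate_event(extremes, rho, phi_type):
    """Pick the intermediate event xi^(l) attained by the top ceil(rho*N)
    trajectories that evolve farthest toward the target event.

    extremes -- per-trajectory extreme values of the event function f(x(t))
    phi_type -- +1 if larger f(x(t)) is closer to the event, -1 if smaller is
    """
    n_top = math.ceil(rho * len(extremes))
    # Sort descending for phi_type == +1, ascending for phi_type == -1.
    v = sorted(extremes, reverse=(phi_type == 1))
    # The least extreme value among the top ceil(rho*N) trajectories.
    return v[n_top - 1]
```

With ρ′ < ρ the returned threshold is at least as extreme, illustrating the speed/accuracy tradeoff described above.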
where f(x(t)) is an event function. Two requirements for f(x(t)) are that it takes x(t) as an input and that it can be used to evaluate the distance between the current state and $\mathcal{E}$ (i.e., it can be used to compute extreme values of each trajectory to determine the next intermediate event). The value of φ_type indicates the initial position of x(t) with respect to $\mathcal{E}$ at t=0. When φ_type is equal to 1, the maximum value of f(x(t)) in each trajectory is recorded, and the N such values are sorted in descending order at the end of the simulation. The reasoning is that, since the system starts below $\mathcal{E}$, we need to encourage higher f(x(t)) values to get closer to $\mathcal{E}$. Similarly, minimum f(x(t)) values are recorded and sorted in ascending order when φ_type is −1. For convenience we refer to the sorted array of extreme f(x(t)) values as v_k, where k is the set of reaction rates used to generate x(t).
where f_i(x(t; k)) is the event function evaluated along the i^{th} SSA trajectory generated with reaction rates k, and the indicator takes a value of 1 if f_i(x(t; k)) reaches $\mathcal{E}$ and 0 otherwise. Once N trajectories are simulated, we can expect one of the following outcomes: (a) the inequality in (7) is satisfied, (b) the estimate falls below $\mathcal{P}_{\mathcal{E}} - \epsilon$, or (c) the estimate exceeds $\mathcal{P}_{\mathcal{E}} + \epsilon$.
In the first case, SParSE exits and returns k as a successful output, i.e., a point in the solution hyperplane (k ≡ k^{*}). In the second case, we need to choose extreme values of f(x(t)) evolving furthest toward $\mathcal{E}$, and we can view $\mathcal{E}$ as a "rarer event" as in the dwSSA. Thus intermediate events and their respective biasing parameters are computed iteratively, each time taking the system closer to $\mathcal{E}$ with success rate $\mathcal{P}_{\mathcal{E}}$. The last case corresponds to parameter sets that "over-perturb" the system, as $\mathcal{E}$ was reached with probability greater than $\mathcal{P}_{\mathcal{E}} + \epsilon$. The method used to determine an intermediate event in the classical multilevel cross-entropy method cannot be applied here because we do not want trajectories that produce extreme values of f(x(t)). However, the information gathered from such trajectories can be used to quantify the behavior we do not want to observe. We achieve this by collecting the extreme values of f(x(t)) as in case (b), except that each SSA simulation is run until the final simulation time without stopping when $\mathcal{E}$ is observed. Once intermediate events are chosen and their corresponding biasing parameters are computed, we update the j^{th} reaction rate k_j with 1/γ_j. This inverse biasing discourages over-perturbation with respect to $\mathcal{E}$. Algorithms 2 and 3 in Appendix B of Additional file 1 contain pseudocode for (b) and (c), respectively.
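The inverse biasing update for case (c) amounts to replacing each rate k_j using the reciprocal of its biasing parameter. A one-line Python sketch (illustrative naming):

```python
def inverse_bias(rates, gammas):
    """Inverse biasing for an over-perturbing parameter set: the j-th
    reaction rate k_j is updated with 1/gamma_j, discouraging the system
    from reaching the target event more often than the desired rate."""
    return [k / g for k, g in zip(rates, gammas)]

# A reaction with gamma_j > 1 (previously encouraged toward the event)
# is now slowed down, and vice versa.
```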
Unlike the multilevel cross-entropy method used by dwSSA, where only one intermediate event is computed in each level, SParSE may choose multiple intermediate events. While it is not necessary to compute multiple intermediate events to reach the solution hyperplane, doing so greatly improves algorithm efficiency. The caveat is that the efficiency gain occurs only when the biasing parameters for the multiple intermediate events are computed simultaneously. We start by describing the method SParSE uses to choose multiple intermediate events, all of which can be reached by N trajectories with sufficient frequency. This is attained by choosing multiple values of ρ as a function of the distance to the desired target event probability $\mathcal{P}_{\mathcal{E}}$. Denoting this distance by δ, we have two different methods for choosing ρ(δ): one for case (b) and the other for case (c). The handling of the two cases differs, as the result of inverse biasing is not as predictable as that of the normal biasing used for case (b). In the normal biasing strategy, updating intermediate reaction rates with a particular set of biasing parameters redistributes v_k such that the corresponding intermediate event becomes the mode. The inverse biasing, however, operates on the heuristic of discouraging over-perturbation without knowing the exact effect on v_k. Thus more conservative values of ρ are used in (11) to compensate for this difference. Lastly, we note that the two cases can be distinguished by comparing the sign of δ to the value of φ_type, where equality represents case (b).
As the distance to the target event decreases, SParSE selects less extreme values for intermediate events, and vice versa. This reduces the risk of over- and under-perturbation. We note that the number of elements in ρ(δ) does not necessarily correspond to the number of intermediate events chosen. For example, the elements corresponding to positions ⌈0.005·N⌉ and ⌈0.01·N⌉ of v_k may be the same. We also note that a custom function can be used to compute ρ to better suit a specific system; however, the above default values work well for all examples presented in this paper. Lastly, N can be chosen as a function of min(ρ) and c, where c is the minimum number of data points desired to reliably compute γ^{(l)}, i.e., N ≥ c/min(ρ).
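Since several values of ρ can select the same entry of v_k, duplicate intermediate events are collapsed. A sketch of this selection (illustrative naming; v_k is assumed already sorted with the most extreme values first):

```python
import math

def unique_intermediate_events(v_sorted, rhos):
    """Return the unique intermediate events selected by a list of rho values.

    v_sorted -- sorted array of extreme event-function values, most extreme first
    rhos     -- candidate rho values, e.g. produced by the rho(delta) schedule
    """
    N = len(v_sorted)
    events = []
    for rho in sorted(rhos):
        xi = v_sorted[math.ceil(rho * N) - 1]
        # Positions such as ceil(0.005*N) and ceil(0.01*N) may coincide.
        if xi not in events:
            events.append(xi)
    return events
```

Iterating over ρ in ascending order yields the events ordered from rarest to most probable, which is the ordering used in the next paragraph.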
Once intermediate events are computed, they are sorted in ascending order of their probabilities, i.e., Prob(ξ^{(l,1)}) ≤ … ≤ Prob(ξ^{(l,q)}), where q is the number of unique intermediate events chosen at level l. We note that this sorting happens automatically if the elements of ρ are sorted in ascending order, which (10) and (11) are.
Now we describe how biasing parameters for all intermediate events are computed concurrently in a single ensemble of N simulations. In each simulation, we first check for ξ^{(l,q)}. If ξ^{(l,q)} is observed, the statistics gathered up to the time at which ξ^{(l,q)} was reached are used to compute γ^{(l,q)}. The trajectory then continues its course of simulation, this time checking for ξ^{(l,q−1)} while keeping the cumulative statistics. This process repeats until the earlier of t_final and the time at which ξ^{(l,1)} is observed (i.e., all intermediate events are observed). When q=1, this method is identical to the one used by dwSSA. Although a single trajectory runtime for q>1 is slightly longer than the runtime for q=1, the additional resources spent on concurrent computation are negligible compared to the savings of (q−1)·N simulations. We note that this process yields biasing parameter sets that are correlated, because γ^{(l,i)} is computed with a subset of the data used to compute γ^{(l,i+1)}. However, this correlation does not affect the validity or the accuracy of the final output, as only one set is selected at each level to update the reaction rates, a process we explain in the next section.
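This concurrent bookkeeping can be pictured as a single pass over each trajectory that snapshots the cumulative statistics at the first crossing of each intermediate event. A simplified Python sketch (our naming; it assumes φ_type = +1, scalar thresholds, and events supplied from most to least probable, i.e., ξ^{(l,q)} first, which is the order a trajectory crosses them):

```python
def collect_concurrent_stats(trajectory, events_in_crossing_order):
    """Record cumulative statistics at the first crossing of each
    intermediate event during one trajectory.

    trajectory -- iterable of (event_function_value, cumulative_stats)
    events_in_crossing_order -- thresholds from most to least probable
    Returns {event: stats at first crossing}, later used for gamma^(l,i).
    """
    pending = list(events_in_crossing_order)
    snapshots = {}
    for f_value, stats in trajectory:
        # Cross as many remaining thresholds as this state reaches.
        while pending and f_value >= pending[0]:
            snapshots[pending[0]] = stats
            pending.pop(0)
        if not pending:
            break  # all intermediate events observed; stop this trajectory early
    return snapshots
```

One ensemble of N such passes thus yields the statistics for all q biasing parameter sets, instead of q separate ensembles.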
Updating intermediate reaction rates
SParSE propagates the system towards the solution hyperplane by iteratively updating reaction rates during the modified multilevel cross-entropy method. The update process requires choosing one set of biasing parameters from possibly many sets, where the number of sets is determined by the number of unique intermediate events. The current intermediate reaction rates are then multiplied element-wise by the chosen set to produce the next intermediate reaction rates. The criterion SParSE adopts is straightforward; at level l it chooses the biasing parameter set that, when multiplied with the current intermediate reaction rates, takes the system closest to $\mathcal{P}_{\mathcal{E}}$ while preserving the sign of δ(k^{(g)}), g=0,…,l.
Without loss of generality, we define k^{(cur)} as the intermediate reaction rates at an arbitrary level l. In order to update the intermediate reaction rates for the next stage, we evaluate how each candidate biasing parameter set γ^{(l,i)} performs with respect to the update criterion. We define k^{(int,i)} as the element-wise product of k^{(cur)} and γ^{(l,i)} when sgn(δ(k^{(cur)})) = φ_type, and as the element-wise product of k^{(cur)} and 1/γ^{(l,i)} otherwise, for i = 1,…,q, where q is the number of unique intermediate events. We recall that sgn(δ(k^{(cur)})) ≠ φ_type corresponds to the case of over-perturbation with respect to $\mathcal{E}$, which requires inverse biasing to reduce the over-perturbation.
Starting with i=1, we compute the SParSE estimate under k^{(int,i)}. If the estimate lies within $\epsilon$ of $\mathcal{P}_{\mathcal{E}}$, then the algorithm exits with k^{*} = k^{(int,i)}. Otherwise, we traverse the available sets of biasing parameters to find the candidate whose estimate is closest to $\mathcal{P}_{\mathcal{E}}$ while preserving the sign of δ. Since the ξ^{(l,·)} are sorted in ascending order of their probabilities, k^{(int,i)} is expected to produce more extreme f(x(t)) values than k^{(int,i+1)}. Thus it is not necessary to evaluate all q candidates. For the case of under-perturbation we can stop the evaluation at the first occurrence of k^{(int,i)} that satisfies the update criterion and set k^{(l+1)} ← k^{(int,i)}. For the case of over-perturbation, however, we stop at the first occurrence of k^{(int,i)} that violates the criterion and set k^{(l+1)} ← k^{(int,i−1)}.
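The update criterion can be sketched as a linear scan over the candidates, assuming their SParSE estimates have already been computed (all names are illustrative; `underperturbed` stands for sgn(δ(k^{(cur)})) = φ_type):

```python
def choose_next_rates(candidates, estimates, target, eps, underperturbed):
    """Select the next intermediate reaction rates from k^(int,1..q).

    candidates[i] -- candidate rates k^(int,i+1), most extreme first
    estimates[i]  -- SParSE estimate obtained under candidates[i]
    Returns (rates, found); found=True means the estimate is within eps
    of the target, i.e., a point on the solution hyperplane.
    """
    for i, (k, p) in enumerate(zip(candidates, estimates)):
        if abs(p - target) <= eps:
            return k, True  # candidate already satisfies (7)
        crossed = p > target if underperturbed else p < target
        if underperturbed and not crossed:
            return k, False  # first candidate that does not overshoot the target
        if not underperturbed and crossed:
            return candidates[max(i - 1, 0)], False  # last candidate before crossing
    return candidates[-1], False  # fall back to the least extreme candidate
```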
It is possible that all candidate biasing parameter sets fail to satisfy the update criterion. The failure indicates that $\mathcal{P}_{\mathcal{E}}$ lies between the estimates produced by two consecutive candidates. Furthermore, this failure is a direct result of the many-to-one relationship between reaction rates and intermediate events. Trajectories simulated with two different sets of reaction rates k and k′ = k + є are likely to differ from each other, resulting in v_k ≠ v_{k′}. However, both v_k and v_{k′} may yield the same intermediate events, since these are determined solely by the values of the sorted arrays at positions ⌈ρN⌉. Despite the identical ξ, the SParSE estimates computed with k and k′ will differ if the proportion of trajectories reaching $\mathcal{E}$ is not the same in the two arrays.
In summary, the modified multilevel cross-entropy method for SParSE comprises three steps. First, we determine intermediate events for the current reaction rates using the SSA. We then employ dwSSA simulations to compute biasing parameters for each of the intermediate events. Lastly, we follow the steps described in this section to choose one set of biasing parameters to update the reaction rates for the next iteration. This process repeats until either k^{*} is found or the intermediate reaction rates cannot be updated any more. For computational efficiency, we can combine the first and the last steps by computing the next set of intermediate events at the same time as the SParSE estimate under k^{(int,i)}. We discard the intermediate events if the estimate does not satisfy the required inequality or if k^{(int,i)} is not the best candidate for the next intermediate reaction rates.
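The three steps can be tied together in a high-level loop. The sketch below uses placeholder callables for the SSA/dwSSA machinery described in this section (all names are ours; this is an outline of the control flow, not the paper's implementation):

```python
def sparse_iteration(rates, estimate, choose_events, compute_gammas,
                     apply_gammas, target, eps, max_levels=10):
    """Modified multilevel cross-entropy loop: (1) determine intermediate
    events with the SSA, (2) compute biasing parameters with dwSSA
    simulations, (3) choose one set and update the reaction rates.
    The four callables stand in for the machinery described above."""
    for _ in range(max_levels):
        p = estimate(rates)
        if abs(p - target) <= eps:
            return rates, True  # a point k* on the solution hyperplane
        events = choose_events(rates, p)
        gammas = compute_gammas(rates, events)
        new_rates = apply_gammas(rates, gammas)
        if new_rates == rates:
            break  # no candidate satisfies the update criterion
        rates = new_rates
    return rates, False  # hand off to the exponential interpolation stage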
Exponential interpolation of biasing parameters
Iteratively updating intermediate reaction rates via the modified multilevel cross-entropy method described in the previous section may not find a k that satisfies (7). Possible reasons for the failure include a poor choice of ρ, insufficient N, and the nonexistence of candidate intermediate reaction rates that satisfy the update criterion. The first two can lead to slow convergence to $\mathcal{E}$, especially for systems near a deterministic regime or for simulations that demand high accuracy (i.e., small values of $\epsilon$). Setting a limit on the maximum number of iterations for the multilevel cross-entropy method can detect slow or non-converging reaction rates, and increasing N and/or modifying ρ will increase the rate of convergence in most cases. The last phenomenon occurs when no suitable biasing parameters exist to update the reaction rates. Here we have two intermediate reaction rate sets, k^{(u)} and k^{(v)}, whose SParSE estimates bracket the target probability from below and above, respectively. The target probability lies between these two estimates, and the multilevel cross-entropy method is unable to fine-tune the intermediate reaction rates to achieve $\mathcal{P}_{\mathcal{E}}$ within the specified error tolerance $\epsilon$.
It is reasonable to assume that a linear combination of k^{(u)} and k^{(v)} may result in k^{*}. A more sophisticated method for approximating k^{*} is to fit an interpolant through past intermediate reaction rates. By making the following two assumptions, SParSE computes candidate biasing parameters such that, when multiplied by k^{(0)}, they may satisfy (7).
Assumption 1.
k^{*} exists such that each of its components lies between the corresponding components of k^{(u)} and k^{(v)}, i.e., min(k_j^{(u)}, k_j^{(v)}) ≤ k_j^{*} ≤ max(k_j^{(u)}, k_j^{(v)}), j ∈ {1,…,M}.
Assumption 2.
k_j^{*} can be computed independently from k_h^{*} for j ≠ h.
where p_j and q_j are constants and the inputs are normalized versions of the intermediate biasing parameters used to compute past SParSE estimates. The output data used in constructing the interpolants are the corresponding SParSE estimates multiplied by N, the total number of simulations. This particular form allows for fast solving of p_j and q_j with a first-order polynomial curve fitting method. We first transform the data to logarithmic scale, solve for the two coefficients of the first-order polynomial, and then re-transform the output by exponentiation. The reason for scaling the output data by N is to preserve as many significant digits as possible, as a logarithmic y-scale is used to compute the polynomial coefficients. While other forms of interpolant may yield more accurate interpolation, (15) allows for fast computation while satisfying Assumption 1, as an exponential function is monotonic.
The number of past intermediate reaction rates available for interpolation varies with factors such as k^{(0)}, $\mathcal{P}_{\mathcal{E}}$, and N. Although all past estimates can be used for interpolation, confining the data to the X estimates closest to $\mathcal{P}_{\mathcal{E}}$ (e.g., X=5), while keeping at least one estimate on either side of the target probability, is recommended, as the accuracy of interpolation may degrade with estimates that are far from the target probability. By construction of the algorithm, there exists at least one estimate on either side of $\mathcal{P}_{\mathcal{E}}$ when the algorithm enters the exponential interpolation stage. However, we note that the total number of past intermediate reaction rates can be as few as 2. Once the values of p_j and q_j in (15) are determined for all M interpolants, SParSE executes the following steps to further increase the efficiency of simulation.
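Since (15) is not reproduced in this excerpt, the sketch below assumes an interpolant of the form g(x) = p·e^{qx}; the coefficients are obtained by the log-transform-and-fit procedure described above (Python, illustrative naming):

```python
import math

def fit_exponential(xs, scaled_estimates):
    """Fit p, q in g(x) = p * exp(q * x) by a first-order polynomial fit
    of log(y) against x.  scaled_estimates are SParSE estimates multiplied
    by N to preserve significant digits on the logarithmic y-scale."""
    ys = [math.log(y) for y in scaled_estimates]
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    q = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
         / sum((x - xbar) ** 2 for x in xs))  # slope of the log-linear fit
    p = math.exp(ybar - q * xbar)             # intercept, re-exponentiated
    return p, q

def project_to_x(p, q, y):
    """Project a desired (scaled) probability back onto the x axis:
    solve p * exp(q * x) = y for x, yielding a candidate biasing parameter."""
    return (math.log(y) - math.log(p)) / q
```

With only two data points the fit is exact, matching the note that as few as two past estimates suffice.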
Step 1: For each j ∈ {1,…,M}, project $\mathcal{P}_{\mathcal{E}}$ and its neighboring probability values onto the x axis of the j^{th} interpolant to compute candidate biasing parameters, where the first element (·) in the superscript is the interpolation iteration index and s ∈ {1,…,7} is the index of the candidates.
Step 2: Compute candidate intermediate reaction rates by multiplying k^{(0)} element-wise by the candidate biasing parameters from Step 1.
Step 3: If necessary, constrain each candidate to satisfy the bounds of Assumption 1 for j ∈ {1,…,M}. Reverse the signs in the inequalities for sgn(δ(k)) ≠ φ_type.
Starting with s=4, we compute the SParSE estimate for the candidate reaction rates. We note that s=4 corresponds to the reaction rates computed from projecting the exact target probability onto the interpolating function. If executing Step 3 results in duplicate candidates, we eliminate the duplicate set(s) and reassign the starting index s to the candidate derived from the exact target probability.
If the candidate confers the target event probability within $\epsilon$, SParSE exits with k^{*} set to that candidate. Otherwise, we compute the next estimate with s incremented by 1 if the current estimate falls short of $\mathcal{P}_{\mathcal{E}}$, and with s decremented by 1 if it overshoots. The interpolation stage continues until either k^{*} is found or the end of the candidate reaction rates is reached, at which point an additional interpolation may be executed with updated data. On a rare occasion, k^{*} lies between two candidate reaction rates without satisfying the error tolerance. This can lead to an infinite loop of incrementing and decrementing s by 1 without converging to k^{*}, but the cycle can easily be detected with a mask vector. SParSE implements one by creating a zero vector whose size is equal to the number of candidate biasing parameter sets. Every time a SParSE estimate is computed with candidate reaction rates at index s, we increment the s^{th} position of the mask vector. Once the magnitude of any mask vector position becomes greater than 2, we conclude that k^{*} lies between two candidate reaction rates. At this point, we have refined upper and lower bounds on k^{*}, as all candidate biasing parameters computed satisfy the inequality in Assumption 1. Once the bounds are sufficiently small, an alternative to an additional interpolation is to take a weighted average of the two candidate reaction rates, where the weight is based on the distance between each SParSE estimate and $\mathcal{P}_{\mathcal{E}}$. For the examples presented in the following section, we did not encounter any initial reaction rates that required such treatment.
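The weighted-average fallback can be sketched as follows, weighting each bracketing candidate by the other candidate's distance to the target probability so that the closer estimate dominates (an illustrative helper under that weighting choice, not necessarily the paper's exact rule):

```python
def weighted_average_rates(k_lo, k_hi, p_lo, p_hi, target):
    """Combine two candidate reaction rate vectors whose estimates
    p_lo and p_hi bracket the target probability.  Each candidate is
    weighted by the *other* candidate's distance to the target."""
    d_lo, d_hi = abs(p_lo - target), abs(p_hi - target)
    w_lo = d_hi / (d_lo + d_hi)  # closer candidate receives the larger weight
    return [w_lo * a + (1.0 - w_lo) * b for a, b in zip(k_lo, k_hi)]
```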
Results and discussion
We illustrate SParSE performance on the following three examples of increasing complexity: a birth-death process, a reversible isomerization process, and a Susceptible-Infectious-Recovered-Susceptible (SIRS) disease transmission model. The first two examples were chosen to demonstrate the algorithm's accuracy against the exact solution, which for these examples can be computed using the master equation or the infinitesimal generator matrix [1]. We then progress to a more complex SIRS model, which has no closed-form analytical solution. For each system, we analyze the SParSE performance on all possible combinations of $\mathcal{P}_{\mathcal{E}}$ ∈ {0.40, 0.60, 0.80} and $\epsilon$ ∈ {0.01, 0.05, 0.10}, where $\mathcal{P}_{\mathcal{E}}$ and $\epsilon$ denote the desired probability for event $\mathcal{E}$ and its error tolerance, respectively. We then compare the results with those from comparable SSA simulations whose reaction rates are selected using uniform random sampling (URS). SParSE also employs URS, but only to generate a number of initial reaction rates as a starting point, here set to 30. The number of simulations, N, used to compute an estimate per parameter sample is set to 5 × 10^{4} unless mentioned otherwise. We also test the robustness of SParSE by assessing its performance on a low probability event, $\mathcal{P}_{\mathcal{E}}$ = 0.010 with $\epsilon$ = 0.001, for the birth-death process, and a high probability event, $\mathcal{P}_{\mathcal{E}}$ = 0.95 with $\epsilon$ = 0.005, for the reversible isomerization process. The number of samples generated for SSA simulations with URS equals the total number of SParSE ensembles computed for a specific simulation scenario, which is the sum of the following quantities: the number of intermediate event computations, the number of estimates computed for each intermediate event, and the number of estimates computed in the exponential interpolation stage. Since the same number of trajectories is used for computing an intermediate event and a SParSE estimate, it is straightforward to compare the two strategies with computational fairness.
For each simulation scenario, we provide four metrics on performance: the total number of SParSE estimates needed for all 30 initial parameter samples, the number of initial parameters that did not reach the solution hyperplane within 10 iterations of the multilevel cross-entropy method or 3 iterations of exponential interpolation, the number of parameter sets that required interpolation in addition to the multilevel cross-entropy method, and the number of successful parameter sets generated by SSA simulations using URS for sampling reaction rates. Lastly, we provide movie files of SParSE ensemble simulations for two test scenarios: one for the birth-death process and one for the SIRS model.
All computations were run on a desktop with an Intel® Xeon® CPU E5-2680 (2.70 GHz, 8 cores) and 32 GB of RAM. We utilized Matlab's Parallel Computing Toolbox™ (PCT) and Matlab Coder™. The PCT was used to simulate 8 SParSE ensembles in parallel, while the Coder was used to convert frequently used custom Matlab functions into low-level C functions for faster computation.
Birth-death process
Table 1. Results of SParSE applied to the birth-death process

$\mathcal{P}_{\mathcal{E}}$ | $\epsilon$ | SParSE samples | Interpolations | Failures | Successful URS
0.40 | 0.01 | 286 (35) | 29 [1, 7, 19, 3] | 0 | 3
0.40 | 0.05 | 182 (33) | 23 [7, 22, 1, 0] | 0 | 12
0.40 | 0.10 | 133 (29) | 19 [11, 19, 0, 0] | 0 | 11
0.60 | 0.01 | 319 (36) | 30 [0, 3, 18, 9] | 2 | 2
0.60 | 0.05 | 164 (33) | 26 [4, 25, 0, 1] | 0 | 8
0.60 | 0.10 | 120 (30) | 18 [12, 18, 0, 0] | 0 | 12
0.80 | 0.01 | 240 (51) | 28 [2, 17, 10, 1] | 0 | 2
0.80 | 0.05 | 137 (43) | 15 [15, 15, 0, 0] | 0 | 6
0.80 | 0.10 | 108 (37) | 1 [29, 1, 0, 0] | 0 | 12
Table 2. Results of SParSE applied to the birth-death process

$\mathcal{P}_{\mathcal{E}}$ | $\epsilon$ | SParSE samples | Interpolations | Failures | Successful URS
0.010 | 0.001 | 251 (36) | 27 [3, 17, 9, 1] | 0 | 3
Reversible isomerization process
Table 3. Results of SParSE applied to the reversible isomerization process

$\mathcal{P}_{\mathcal{E}}$ | $\epsilon$ | SParSE samples | Interpolations | Failures | Successful URS
0.40 | 0.01 | 269 (50) | 26 [4, 9, 14, 3] | 0 | 2
0.40 | 0.05 | 181 (48) | 18 [12, 17, 1, 0] | 0 | 6
0.40 | 0.10 | 148 (44) | 13 [17, 13, 0, 0] | 0 | 8
0.60 | 0.01 | 313 (58) | 28 [2, 6, 16, 6] | 0 | 4
0.60 | 0.05 | 198 (51) | 18 [12, 14, 4, 0] | 0 | 8
0.60 | 0.10 | 156 (47) | 13 [17, 13, 0, 0] | 0 | 11
0.80 | 0.01 | 290 (75) | 27 [3, 17, 5, 5] | 0 | 1
0.80 | 0.05 | 201 (67) | 12 [18, 11, 1, 0] | 0 | 6
0.80 | 0.10 | 165 (61) | 4 [26, 4, 0, 0] | 0 | 18
Table 4. Results of SParSE applied to the reversible isomerization process

$\mathcal{P}_{\mathcal{E}}$ | $\epsilon$ | SParSE samples | Interpolations | Failures | Successful URS
0.95 | 0.005 | 302 (109) | 10 [20, 3, 4, 3] | 1 | 2
Table 5. Results of SParSE applied to the SIRS model

$\mathcal{P}_{\mathcal{E}}$ | $\epsilon$ | SParSE samples | Interpolations | Failures | Successful URS
0.4 | 0.01 | 467 (162) | 26 [4, 15, 8, 3] | 0 | 5
0.4 | 0.05 | 282 (149) | 8 [22, 6, 1, 1] | 0 | 6
0.4 | 0.10 | 246 (142) | 4 [26, 4, 0, 0] | 0 | 14
0.6 | 0.01 | 318 (63) | 28 [2, 8, 18, 2] | 0 | 3
0.6 | 0.05 | 206 (59) | 20 [10, 19, 0, 1] | 0 | 8
0.6 | 0.10 | 166 (57) | 10 [20, 9, 0, 1] | 0 | 8
0.8 | 0.01 | 328 (113) | 8 [22, 4, 3, 1] | 0 | 4
0.8 | 0.05 | 224 (90) | 1 [29, 0, 0, 1] | 0 | 26
0.8 | 0.10 | 177 (73) | 0 [30, 0, 0, 0] | 0 | 41
Simple SIRS disease dynamics
As with the previous examples, we tested SParSE on all possible combinations of $\mathcal{P}_{\mathcal{E}}$ and $\epsilon$ and measured the same quantities as in Table 1. Table 5 summarizes the results. We see that SParSE achieved a 100% success rate for all scenarios tested. However, the statistics in column 1 demonstrate that the total number of estimates computed for any SIRS scenario is greater than those for the first two examples with the same target probability and error tolerance. SIRS ensembles required up to 198 more samples, except for $\mathcal{P}_{\mathcal{E}}$ = 0.6 and $\epsilon$ = 0.01, which required one fewer sample than the birth-death process. If we ignore the intermediate event computations, the numbers of samples required by the three examples are comparable to each other (mean difference of 17.7 samples). In addition, the quantities in column 4 of Tables 1, 3 and 5 indicate that SParSE required fewer interpolations on the SIRS model than it did on the other two examples. These results imply that the multilevel cross-entropy method applied to the SIRS model made conservative moves toward the solution hyperplane; the algorithm required many intermediate event computations to approach the vicinity of $\mathcal{P}_{\mathcal{E}}$ but fewer fine-tuning steps (i.e., exponential interpolations). Although the same ρ(δ) values were used for all three examples, we see that their effect differs depending on the underlying system.
Two expected trends emerge from Table 5: the total number of SParSE samples and the total number of exponential interpolations required to reach the solution hyperplane both decrease with increasing error tolerance. Although the sample and interpolation counts differ among Tables 1, 3, and 5, the qualitative algorithmic behavior as a function of the error tolerance remains the same for all three examples. As for performance, SParSE outperformed SSA-URS (by a factor of 1.15 to 10) on all scenarios except one. For a target probability of 0.8 with error tolerance 0.10, SSA with URS yielded 41 successful sets, while SParSE yielded 30. We note that the number of successful sets for SParSE cannot exceed the number of initial parameter sets, which is 30 for all examples presented in this paper. Also, the parameter ranges we chose for the SIRS model result in an uneven distribution of the target probability. From Figure 7, we see that a significant portion of the probability volume belongs to the high (>0.8) or low (<0.2) probability region. Since the SSA-URS success probability is determined solely by the ratio between the volume of the solution hyperplane and the total volume defined by the specified parameter ranges, this particular scenario is biased in favor of SSA-URS. For general applications involving a target event, however, we cannot expect the solution hyperplane to lie within the user-specified parameter ranges, to which SSA-URS samples are confined. If this region does not contain the solution hyperplane, SSA-URS is unable to produce k ^{*} regardless of the number of samples generated. The current implementation of SParSE, on the other hand, is highly likely to find the closest point (in the cross-entropy metric) on the solution hyperplane through the multilevel cross-entropy and exponential interpolation stages, neither of which is limited by the user-specified parameter ranges.
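For comparison, the SSA-URS baseline discussed above amounts to drawing parameter sets uniformly from the user-specified ranges and keeping those whose estimated event probability falls within the error tolerance of the target. A rough sketch, with a hypothetical `simulate_event` callback and illustrative ensemble sizes:

```python
import random

def urs_search(simulate_event, ranges, p_target, eps, n_sets=30, n_traj=200):
    """Uniform random search (SSA-URS) baseline, sketched.

    `simulate_event(rates)` runs one SSA trajectory and returns True if the
    target event occurred; the name and ensemble sizes are illustrative.
    Returns parameter sets whose estimated event probability lies within
    the absolute error tolerance `eps` of `p_target`.
    """
    successes = []
    for _ in range(n_sets):
        # Draw each reaction rate uniformly from its user-specified range.
        rates = [random.uniform(lo, hi) for lo, hi in ranges]
        # Monte Carlo estimate of the event probability at these rates.
        p_hat = sum(simulate_event(rates) for _ in range(n_traj)) / n_traj
        if abs(p_hat - p_target) <= eps:
            successes.append(rates)
    return successes
```

Unlike SParSE, this search never moves a sample toward the solution hyperplane, so its success rate is simply the hyperplane's volume fraction within the ranges.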
In practical situations, the user is unlikely to have enough systematic insight to identify a region that contains the solution hyperplane for a particular target event. We expect SParSE to be more efficient than SSA-URS by orders of magnitude in such settings, as the performance of SParSE is far less sensitive to the dimensionality of the search space and to the volume of the solution hyperplane within it than the performance of SSA-URS is.
Conclusions
In this paper, we presented SParSE, a novel stochastic parameter estimation algorithm for events. SParSE contains two main research contributions. First, it introduces a novel modification of the multilevel cross-entropy method that (1) concurrently computes multiple intermediate events as well as their corresponding biasing parameters, and (2) handles over-perturbing initial reaction rates as well as under-perturbing ones. Second, it uses information from past simulations to automatically find a path to the parametric hyperplane corresponding to the target event with user-specified probability and absolute error tolerance.
By introducing a novel heuristic for handling reaction rates that over-perturb the system, SParSE can handle target events whose probability need not be rare with respect to the initial reaction rates k ^{(0)}. If the user wishes to compute the probability of observing the target event with respect to k ^{(0)}, this can be done by simply running the dwSSA with biasing parameters given by the ratio between the final reaction rates k ^{*} from SParSE and k ^{(0)}. No additional multilevel cross-entropy simulations are required by the dwSSA to determine biasing parameters, since the final set of reaction rates computed by SParSE already contains this information. For this reason, SParSE improves upon the dwSSA in that it can handle an additional type of rare event. The only class of rare events whose probability the dwSSA can estimate is the one that is seldom reached by the system using the original reaction rates. SParSE, on the other hand, can also compute the probability of events that are reached too often with respect to the target probability using the original reaction rates. The average frequency of observing such a target event with k ^{(0)} is much higher than the desired frequency, and therefore the probability of observing the event at the desired success rate with reaction rates k ^{(0)} is very small; its biasing parameters are uncomputable with the dwSSA but are computable with SParSE.
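The ratio construction described above is direct to compute once SParSE has finished; a trivial sketch (the function name and the list representation of rate vectors are ours):

```python
def dwssa_biasing_from_sparse(k_star, k0):
    """dwSSA biasing parameters from SParSE output, as described in the text:
    the per-reaction ratio between the final rates k* and the original
    rates k^(0). Rate vectors are represented here as plain lists."""
    return [ks / k for ks, k in zip(k_star, k0)]
```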
It is important to note that the computational complexity of SParSE is independent of the number of parameters to be estimated. Like the dwSSA [17], SParSE utilizes the information-theoretic concept of cross-entropy to concurrently compute biasing parameters for all reactions in the system. Moreover, SParSE avoids serial computation of biasing parameters for multiple intermediate events at any given stage of the multilevel cross-entropy method by introducing a clever ordering of intermediate events and data management. Figures 4, 5 and 8 illustrate that SParSE is not only more efficient than SSA-URS in finding k ^{*} but also gives a better resolution of the area near the solution hyperplane. This is because intermediate reaction rates computed by SParSE are guaranteed to be closer to k ^{*} than k ^{(0)} is. Thus intermediate reaction rates near k ^{*} can be used to improve the quality of interpolation in constructing the solution hyperplane. Another computational asset of SParSE is that it is highly parallelizable. In large-scale applications, multiple sets of initial reaction rates can be dispatched separately, since each set finds its way to the solution hyperplane independently of the others. At a smaller scale, the SParSE estimate computation or an ensemble of multilevel cross-entropy method simulations can also be parallelized. In simulating the examples presented in this paper, we chose the latter method; each set of N simulations was distributed among 8 cores using the Parallel Computing Toolbox™ in Matlab. Lastly, a single SParSE trajectory from the multilevel cross-entropy method without any biasing (i.e., with all biasing parameters equal to 1) generates the same number of uniform random numbers as the SSA does. The only difference is that SParSE requires additional data management for recording biasing parameter information (two floating point numbers for each reaction [17]), which is used in the next round of the multilevel cross-entropy method.
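The per-reaction cross-entropy update that underlies this concurrent computation can be sketched as follows, following the dwSSA-style rule in [17]: each biasing parameter is a weighted ratio of observed reaction firings to time-integrated propensities over trajectories that reached the current intermediate event. The trajectory record format below is our illustrative assumption:

```python
def ce_biasing_parameters(trajectories, n_reactions):
    """One multilevel cross-entropy update (sketch of the dwSSA-style rule).

    Each trajectory is assumed to be a dict with:
      'reached'  : whether it reached the current intermediate event,
      'weight'   : its importance-sampling weight (1.0 when unbiased),
      'firings'  : per-reaction firing counts,
      'integral' : per-reaction time-integrated propensities.
    All reactions are updated concurrently from the same ensemble.
    """
    num = [0.0] * n_reactions
    den = [0.0] * n_reactions
    for traj in trajectories:
        if not traj['reached']:
            continue  # only trajectories reaching the event contribute
        w = traj['weight']
        for j in range(n_reactions):
            num[j] += w * traj['firings'][j]
            den[j] += w * traj['integral'][j]
    # Reactions with no accumulated propensity keep a neutral parameter of 1.
    return [n / d if d > 0 else 1.0 for n, d in zip(num, den)]
```

Because the update is a single pass over the ensemble, its cost does not grow with the number of unknown parameters beyond the per-reaction bookkeeping.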
It is difficult to compare the exact computational cost of the two methods when SParSE applies biasing; depending on the amount of bias applied per reaction, the number of random numbers generated per trajectory will differ between the two methods even if the same reaction rates are used. For the exponential interpolation stage in SParSE, the unbiased SSA is used to compute the event probability estimates, so the computational costs of a SParSE trajectory and an SSA trajectory are identical for a given set of reaction rates.
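One plausible form of such an interpolation step, purely as an illustration: fit the estimated event probability exponentially along a scalar coordinate between two evaluated rate sets and solve for the coordinate that hits the target probability. The functional form is our assumption, not the paper's exact formula:

```python
import math

def exponential_interpolation(s1, p1, s2, p2, p_target):
    """Sketch of an exponential-interpolation step (assumed functional form).

    Given two points (s1, p1) and (s2, p2), where p is the SSA-estimated
    event probability at interpolation coordinate s along a path between
    reaction-rate sets, fit p(s) = a * exp(b * s) and solve p(s*) = p_target.
    Requires p1, p2 > 0 and p1 != p2.
    """
    b = (math.log(p2) - math.log(p1)) / (s2 - s1)   # exponential slope
    a = p1 * math.exp(-b * s1)                       # prefactor
    return (math.log(p_target) - math.log(a)) / b    # coordinate hitting target
```

In SParSE the probabilities at the interpolation points come from unbiased SSA ensembles, which is why the per-trajectory cost matches the SSA at this stage.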
One of the inputs required by SParSE is a range of values each parameter can take. There is no theoretical limit on the parameter range SParSE can manage; however, a range is required for the following practical reasons. First, the volume of the solution hyperplane could be infinite if the parameter ranges were not confined. For the reversible isomerization process presented in the previous section, all solution hyperplanes from the 9 standard test scenarios are defined by the ratio between the two reaction rate parameters; infinitely many pairs exist that conserve this ratio. In addition, a range is required to sample initial reaction rates. If a user wishes to use a distribution other than the uniform distribution to generate initial reaction rates, different statistics (mean, standard deviation, etc.) may be needed.
We remind readers that although parameter ranges are used to constrain the position of initial reaction rates, the same ranges are not enforced on the final reaction rates on the solution hyperplane. The main reason is that there is no guarantee the solution hyperplane intersects the volume defined by the user-specified parameter ranges. By not limiting the final reaction rates to the user-specified region, SParSE is able to find a set of reaction rates on the solution hyperplane that is close to the user-specified parameter ranges. For example, in Figure 3, white dashed lines represent parameter ranges specified prior to the simulation. We see that 3 of 30 initial sets reached the solution hyperplane but lie outside this region. We also see that some intermediate reaction rates (white squares) escape the region but return to it by the time k ^{*} (red square) is found. For most practical applications, we know neither the curvature of the solution hyperplane nor whether it exists within the prescribed parameter ranges. The parameter ranges for all examples in this paper were chosen such that all possible target probabilities in (0, 1) are captured while the volume of a solution hyperplane for any particular target probability is well defined within this region. Therefore we expect the computational gain from employing SParSE over SSA-URS to be much higher for an arbitrary problem where the user is unable to provide informative parameter ranges for the target event of interest and its desired probability.
Future work will focus on two main areas whose improvement would substantially benefit the algorithm. First, the multilevel cross-entropy method in SParSE can benefit from an adaptive ρ(δ) function, whose values for determining intermediate events would change as the simulation progresses. While SParSE proved computationally efficient for all three examples presented in this paper, the results demonstrated that the same ρ(δ) function can produce qualitatively different behavior in how the system approaches the solution hyperplane. We can use past values of ρ(δ) and their effect on the intermediate events to estimate the speed of convergence toward the solution hyperplane. This can potentially reduce the number of multilevel cross-entropy method iterations, where each iteration saved avoids 2×N simulations. The second area of future research is efficient sampling of initial reaction rates. Once SParSE finishes simulating the first sets of k ^{(0)}, the positions of the resulting k ^{*} may be far from each other and thus insufficient to construct an accurate picture of the solution hyperplane. Instead of randomly sampling the next set of initial reaction rates, we can utilize information from the prior ensemble of SParSE simulations to improve the positioning of the next set of k ^{(0)}. For example, we can construct a rough interpolation (e.g., linear interpolation) of the solution hyperplane using the k ^{*}s from the first ensemble and sample the next set from the estimated solution hyperplane, constrained by the user-specified parameter ranges if necessary. A more sophisticated method would be required for high-dimensional systems or for target events with a discontinuity in the solution hyperplane.
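The resampling idea proposed above can be illustrated with the simplest possible interpolant: draw a random convex combination of two previously found solutions k ^{*}, optionally clipped to the user-specified ranges. This is purely a sketch of the proposal, not an implemented feature of SParSE:

```python
import random

def sample_from_estimated_hyperplane(k_stars, ranges=None):
    """Draw a new initial rate set from a rough linear interpolation of
    previously found solutions k* (sketch of the proposed future work).

    A random convex combination of two known solutions approximates a point
    on a chord of the solution hyperplane; if `ranges` is given, the point
    is clipped componentwise to the user-specified parameter ranges.
    """
    a, b = random.sample(k_stars, 2)                  # pick two solutions
    w = random.random()                               # random mixing weight
    point = [w * x + (1 - w) * y for x, y in zip(a, b)]
    if ranges is not None:
        point = [min(max(v, lo), hi) for v, (lo, hi) in zip(point, ranges)]
    return point
```

A curved or disconnected solution hyperplane would require a richer interpolant, as the text notes.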
Authors’ contributions
MR conceived of the method, coded the algorithm to carry out numerical experiments, and prepared figures and tables. PE participated in the design of the numerical experiments and revising the manuscript. Both authors read and approved the final manuscript.
Declarations
Acknowledgements
The authors would like to thank Bill and Melinda Gates for their active support of this work and their sponsorship through the Global Good Fund. Writing assistance from Christopher Lorton and productive discussions with colleagues at the Institute for Disease Modeling are likewise greatly appreciated.
References
 1. van Kampen NG: Stochastic Processes in Physics and Chemistry. 3rd edition. Amsterdam: Elsevier; 2007.
 2. Aldinucci M, Torquati M, Spampinato C, Drocco M, Misale C, Calcagno C, Coppo M: Parallel stochastic systems biology in the cloud. Brief Bioinform. 2013, 15 (5): 798-813. 10.1093/bib/bbt040.
 3. Bunch C, Chohan N, Krintz C, Shams K: Neptune: a domain specific language for deploying HPC software on cloud platforms. In Proceedings of the 2nd International Workshop on Scientific Cloud Computing. New York: ACM; 2011: 59-68.
 4. Klingbeil G, Erban R, Giles M, Maini PK: Fat versus thin threading approach on GPUs: application to stochastic simulation of chemical reactions. IEEE Trans Parallel Distrib Syst. 2012, 23 (2): 280-287. 10.1109/TPDS.2011.157.
 5. Dematté L, Prandi D: GPU computing for systems biology. Brief Bioinform. 2010, 11 (3): 323-333. 10.1093/bib/bbq006.
 6. Li H, Petzold LR: Efficient parallelization of the stochastic simulation algorithm for chemically reacting systems on the graphics processing unit. Int J High Perform Comput Appl. 2010, 24 (2): 107-116. 10.1177/1094342009106066.
 7. Komarov I, D’Souza RM: Accelerating the Gillespie exact stochastic simulation algorithm using hybrid parallel execution on graphics processing units. PLoS ONE. 2012, 7 (11): e46693. 10.1371/journal.pone.0046693.
 8. Daigle B, Roh M, Petzold L, Niemi J: Accelerated maximum likelihood parameter estimation for stochastic biochemical systems. BMC Bioinformatics. 2012, 13 (1): 68. 10.1186/1471-2105-13-68.
 9. Poovathingal S, Gunawan R: Global parameter estimation methods for stochastic biochemical systems. BMC Bioinformatics. 2010, 11: 414. 10.1186/1471-2105-11-414.
 10. Horvath A, Manini D: Parameter estimation of kinetic rates in stochastic reaction networks by the EM method. BMEI. 2008, 1 (1): 713-717.
 11. Wang Y, Christley S, Mjolsness E, Xie X: Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent. BMC Syst Biol. 2010, 4: 99. 10.1186/1752-0509-4-99.
 12. Hasenauer J, Wolf V, Kazeroonian A, Theis FJ: Method of conditional moments (MCM) for the chemical master equation: a unified framework for the method of moments and hybrid stochastic-deterministic models. J Math Biol. 2013, 69 (3): 687-735. 10.1007/s00285-013-0711-5.
 13. Yildirim N, Mackey MC: Feedback regulation in the lactose operon: a mathematical modeling study and comparison with experimental data. Biophys J. 2003, 84 (5): 2841-2851. 10.1016/S0006-3495(03)70013-7.
 14. Griffith JS: Mathematics of cellular control processes II. Positive feedback to one gene. J Theor Biol. 1968, 20 (2): 209-216. 10.1016/0022-5193(68)90190-2.
 15. Vilar JMG, Guet CC, Leibler S: Modeling network dynamics: the lac operon, a case study. J Cell Biol. 2003, 161 (3): 471-476. 10.1083/jcb.200301125.
 16. Klein DJ, Baym M, Eckhoff P: The separatrix algorithm for synthesis and analysis of stochastic simulations with applications in disease modeling. PLoS ONE. 2014, 9 (7): e103467. 10.1371/journal.pone.0103467.
 17. Daigle BJ Jr, Roh MK, Gillespie DT, Petzold LR: Automated estimation of rare event probabilities in biochemical systems. J Chem Phys. 2011, 134 (4): 044110. 10.1063/1.3522769.
 18. Rubinstein RY: Optimization of computer simulation models with rare events. Eur J Oper Res. 1997, 99 (1): 89-112. 10.1016/S0377-2217(96)00385-2.
 19. Gillespie DT: Exact stochastic simulation of coupled chemical reactions. J Phys Chem. 1977, 81 (25): 2340-2361. 10.1021/j100540a008.
 20. Gillespie DT, Roh M, Petzold LR: Refining the weighted stochastic simulation algorithm. J Chem Phys. 2009, 130 (17): 174103. 10.1063/1.3116791.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.