This tool allows users to load SBML models and carry out stochastic
simulation by specifying a set number of runs.
The capabilities provided by this application are shown by mean of an example network that has three internal species and two boundary species.
Data Generation
In the current version of the user interface, the user can generate
data for a selected number of runs once the SBML model is loaded.
The model queries the stochastic simulator for data between a
starting time and ending time. There are two ways in which such data
can be generated. The default option is a grid based data generation
method, in which the simulator computes the species numbers on a
uniform time grid. This is required in order to compute ensemble
averages and power spectral densities. The grid interval is
therefore the parameter that controls the resolution of the results.
The second option for data generation is based on species sample
sizes, where the solution at each time step is advanced until one or
more species sizes reach the target sample size. (It can be seen
that setting the sample size to 1 recovers the raw data for a run)
The user is also provided with an option to ignore the transient
data when performing analysis on the generated data. This is
controlled by means of a scale factor that multiplies the minimum
time estimated by taking inverse of the fastest reaction rate. The
true end time thus a sum of the desired end time and the time
computed as needed to reach steady state. This can also be compared
by plotting the deterministic results over the stochastic data.
Further, the user can continue the selected run again, which
increments the current time, and uses the species numbers at the end
of the previous run to initialize the current run, or alternatively,
the simulation can be reset to original initial conditions.
Probability Density
Each data set for a given run and a selected species has a
probability density from which it is realized. The tool computes the
histogram for each run as well for the entire population of runs for
each species in the network. Users can look at the histograms of the
data for each run, as they are being generated, or alternatively,
the population histogram after all the runs have been generated.
Often, structure in the time history of the data can be seen in the
shape of the histogram, and is an indicator of the dynamics
involved. Further, the bin size can be altered, leading to more
informative distributions if there are large number of data points,
as is the case when large numbers of species are involved.
Ensemble Averages
The ensemble averages for all the species participating in the
network are computed upon completion of the data generation. These
can be accessed in the Ensemble Averages tab, and show the ensemble
average in the top plot and the ensemble standard deviation in the
bottom plot. Additional details for the species of interest
(selected in the display options panel) such as population mean,
standard deviation, variance and the coefficient of variation are
displayed in the panel below the plots. The coefficient of variation
is an entity of particular interest to biologists, and can be used
to infer spread in the standard deviation across the population.
Further, it should be noted that, upon reaching steady state, the
values of variance should roughly correspond to the values of the
population mean, reflecting the underlying Possionian assumption the
stochastic simulation.
Auto and Cross Correlations
Any statistical analysis on data should involve the study of
correlations, as they contain information about how various
participants in the network are coupled dynamically. For any given
data run, one can construct auto-correlations of various species as
well as the cross-correlations between them. These are computed for
each run, and then averaged across the population to yield averaged
correlation matrices. for a network with M species, there are in
total M auto-correlation matrices, and M(M-1) cross-correlation
matrices. The correlations are computed from the covariance matrix
by subtracting the ensemble means, and scaling by the ensemble
standard deviations. It can also be noted that the
auto-correlations are symmetric about the origin, while the
cross-correlations do not share this property. The user has the
option to enable or disable plotting of auto-correlations for each
run as they are computed.
Power Spectral Densities
The power spectral densities (PSDs) are a good indicator of the
frequency content of the network. In other words, if a network has
an oscillatory behavior, the PSDs have a peak at the frequency of
oscillation. These are computed by taking the Fast Fourier Transform
(FFT) of the auto or cross-correlation matrix. In general, the PSD
is a complex valued entity, whose absolute magnitude can be obtained
by taking the magnitude of each of its elements. In the case of
auto-correlations, the symmetry involved results in a real valued
power spectral density function, which has no imaginary values. The
cross-spectrum, which is the FFT of the cross-correlation, can on
the other hand, have complex entries, and hence also involves a
phase angle. The tool computes the PSDs by calling the FFT routines
available in the publicly available Exocortex.DSP library. The attached figure shows the power spectral densities for a linear chain model where a boundary node has been converted to a floating node.
The PSDs are a very useful means of finding feedbacks in biochemical
networks. It has been shown that negative feedbacks
offer the advantage of making a network more robust by pushing noise
at low frequencies into the high frequencies, which can filtered in
downstream circuits. This can be verified by testing a small two
gene network that involves two mRNAs and two proteins. If the second
protein has an inhibitory effect on the expression of the first
gene, it can be observed that the Power Spectral Density develops a
peak, indicating that the noise has been shifted to the higher
frequencies. This example can be easily demonstrated using this
application.
It should be pointed out that the user has the option of iteratively
refining the power spectra, as each correlation set is computed from
each data run. This is a useful option when the data generated is
large, and involve a large number of populations. and yields the
broad picture of the power spectral densities with the first few
data runs, with subsequent runs reducing noise at higher
frequencies.
Transfer Functions
The transfer function for a given network can be defined as the gain
between a selected input and output nodes. This is sometimes easy to
compute analytically for small networks, but for larger networks,
one has to take recourse to methods that generate the power spectrum
and thereby compute the transfer function. Transfer functions have
long been studied in biological systems, but only
recently has it possible to study them at the level of gene protein
networks. Much of the design of electrical circuits involves
designing filters that behave according to a specified transfer
function between input and output, and it is thus hoped that this
tool will enable such a design process, especially, with the advent
and growth of Synthetic Biology. The transfer functions for the
presented example are shown in the attached figure. It can be seen
that there is a fair amount of noise at high frequencies. This can
be reduced further by using a larger population size (here only 50
runs have been used), as well as a finer grid size (here 1) to bring
out finer details in the power spectra as well as the transfer
functions.
Injecting Noise
Often events of interest happen at the boundaries of any network,
and hence it is desirable to know the relationship between an
interior node (where the output is measured) and the boundary node
(which can be treated as input). This relationship is given by te
transfer function, and is given a ratio involving the power spectral
densities and cross-spectra at the input and output. However, in the
case of stochastic simulation, the limitations imposed by SBML do
not allow a boundary node to be treated directly as an internal
species. A boundary node thus stays constant, and hence cannot have
a power spectrum. We have therefore developed a simple SBW module
(SBMLModifier) that interfaces to libSBML and converts one or more
boundary nodes in a network to floating type, without affecting the
states of any other nodes in the network. This is done by adding a
fictitious boundary node, along with associated reactions to keep
the converted boundary node at the same level. Additionally, the
user has the option to control the mean level of noise at this
converted node. The attached figure details how the
SBML modification works on a simple example, and shows how the
transfer functions can be computed for this example.
Testing Application
This tool for stochastic simulation and analysis also has a
companion user interface to test the exactness of the stochastic
simulator being used. This is done by making use of the Discrete
Stochastic Model Test Suite (DSMTS) that involves a set of models
that extensively test the capabilities of the simulator.
Documentation for this test suite can be obtained from
http://www.calibayes.ncl.ac.uk/Resources/dsmts/overview.
This test suite has been implemented in a testing application module
called edu.kgi.StochSimTest that is part of the Systems Biology
Workbench. Current implementation is restricted to the Gillespie
simulator bundled with SBW, but future version will allow users to
test the exactness of their own simulators, and thereby identify
problems.