Volume 7 Supplement 6

## Selected articles from the 24th International Conference on Genome Informatics (GIW2013)

- Research
- Open Access

# An approach for dynamical network reconstruction of simple network motifs

- Masahiko Nakatsui^{1}
- Michihiro Araki^{2}
- Akihiko Kondo^{1}

*BMC Systems Biology* **7 (Suppl 6)**:S4

https://doi.org/10.1186/1752-0509-7-S6-S4

© Nakatsui et al.; licensee BioMed Central Ltd. 2013

**Published:** 13 December 2013

## Abstract

### Background

One of the most important projects in the post-genome era is the systematic identification of biological networks. Most studies of network identification have focused on improving computational efficiency in the large-scale inference of complex systems with cyclic relations, and few attempts have been made to address practical problems arising in real biological systems. In this study, we evaluated the inference performance of our previously proposed method for inferring biological networks on simple network motifs.

### Results

We evaluated the inference accuracy and efficiency of our previously proposed network inference algorithm using six highly significant, repeatedly appearing network motifs in the regulatory network of *E. coli* proposed by Shen-Orr *et al.* and Herrgård *et al.*, and two network motifs in *S. cerevisiae* proposed by Lee *et al.* As a result, our method reconstructed about 40% of the interactions in the network motifs from time-series data. Moreover, adding time-series data from one-factor disrupted models remarkably improved the performance of network inference.

### Conclusions

The results of network inference on the *E. coli* network motifs show that our network inference algorithm can be applied to typical topologies of biological networks. Continued examination of the inference of well-established network motifs in biology would strengthen the applicability of our algorithm to realistic biological networks.

## Keywords

- Gene Regulatory Network
- Biological Network
- Synthetic Process
- Network Motif
- Network Inference

## Background

The investigation of network dynamics is a major issue in systems and synthetic biology. Recent advances in high-throughput technologies for comprehensive observation of cells produce large amounts of data for analyzing the dynamics of complex systems such as gene regulatory networks and metabolic pathways. Time-series data capturing dynamic behavior are one such type of data, containing an enormous amount of information about the regulation of biological networks *in vivo*. However, because this information is entirely implicit, adequate analytic and computational methods must be developed to reconstruct biological systems from it. The keys to developing such methods are to build a reliable mathematical model for analyzing biological networks and to explore the parameter values of that model within a vast search space. Tominaga *et al.* and Maki *et al.* developed a method [1, 2] for inferring conceptual biological networks by combining a dynamical network model called the S-system [3] with traditional parameter estimation based on simple genetic algorithms [4, 5]. The S-system is a set of ordinary differential equations in which the time-dependent dynamics of system components are characterized by a power-law formalism. The S-system is suitable for conceptual modeling and for describing complex systems with loops or cyclic interactions, because the dynamic behavior of the network can easily be obtained by numerical integration customized for this formalism [6]. The values of the interrelated coefficients in the formalism are directly or indirectly related to the regulatory mechanisms in the network model, so the network structure obtained from the estimated parameters provides one of the best candidates for the biological network structure. However, the S-system requires a large number of parameters to be estimated to identify a dynamical biological network: the number of estimated parameters is 2*n*(*n* + 1), where *n* is the number of system components.
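The count 2*n*(*n* + 1) comes from two rate constants per component plus two *n* × *n* matrices of interrelated coefficients, so it grows quadratically with network size. The short Python sketch below (the function name is illustrative only) evaluates it for the four-node models used in this paper and for two larger, hypothetical network sizes.

```python
def s_system_parameter_count(n: int) -> int:
    """Number of S-system parameters for n components.

    Each component i has two rate constants (alpha_i, beta_i) and two rows
    of n interrelated coefficients (g_ij, h_ij): 2n + 2n^2 = 2n(n + 1).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    return 2 * n * (n + 1)


# Four-node motif models, plus two larger (hypothetical) network sizes.
for n in (4, 10, 100):
    print(n, s_system_parameter_count(n))  # prints: 4 40 / 10 220 / 100 20200
```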

We previously proposed efficient procedures for inferring biological networks from experimentally observed time-series data of mRNA or metabolites [7–10] using the S-system and real-coded genetic algorithms (RCGAs) [11] with a combination of unimodal normal distribution crossover (UNDX) [12] and minimal generation gap (MGG) [13]. Other groups have also developed several methods to optimize S-system parameters [14–19]. Besides S-system modeling, many algorithms for reconstructing networks from time-series data have been developed [20–27]. However, most of these works focused on improving computational efficiency in the large-scale inference of complex systems with cyclic relations, and few attempts have been made to address practical problems arising in real biological systems. Herrgård *et al.*, Shen-Orr *et al.*, and Lee *et al.* proposed that the experimentally identified gene regulatory networks of *Escherichia coli* and *Saccharomyces cerevisiae* are composed of a limited number of network motifs, each with a simple form of relationships between transcription factors and genes [28–30]. Little attention has been paid to evaluating the performance of network inference with a dynamical model such as the S-system on such simple network motifs. In this paper, to evaluate the inference performance of our previously proposed algorithms, we applied our algorithm to eight simple network motifs proposed by Shen-Orr *et al.* [29], Herrgård *et al.* [28], and Lee *et al.* [30]. Shen-Orr *et al.* and Herrgård *et al.* reported highly significant motifs that appear repeatedly, whereas Lee *et al.* identified network motifs based on genome-wide location data.

## Results and discussion

### Results of network identification

Lower precision values were often observed when our previous methods were applied to other types of networks, so we have already developed a parallel-computing-based method for removing false-positive interactions [7, 8]. Although that method could be applied here to improve precision, our aim is to examine how both precision and recall can be improved by changing the information content of the time-series data.

*i* (*α*_{i}) set to 0.0. We inferred 8 network candidates from five time-series data sets, comprising the wild type (see Figure 3(E)) and one-factor disrupted strains. Figure 7 compares the inference accuracy and efficiency obtained from a single time-series with those obtained from the five time-series. Performance improved remarkably compared with the single time-series case. We applied the same type of data to other motifs (data not shown) and found that introducing time-series data from one-factor disrupted models improves the performance of our algorithm.
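Building the rate-constant sets for the one-factor disrupted models is a small bookkeeping step. The Python sketch below assumes, following the text, that disrupting a component is modeled by setting its synthesis rate constant *α*_{i} to 0.0; it further assumes that the five data sets for a four-node model are the wild type plus one disruption per node. The function name and the example rate values are hypothetical.

```python
from typing import List, Sequence


def disruption_rate_sets(alpha: Sequence[float]) -> List[List[float]]:
    """Wild-type alpha followed by one copy per component with alpha_i set to 0.0."""
    sets = [list(alpha)]
    for i in range(len(alpha)):
        disrupted = list(alpha)
        disrupted[i] = 0.0  # no synthesis of X_i in this one-factor disrupted model
        sets.append(disrupted)
    return sets


# Hypothetical wild-type synthesis rate constants for a four-node model.
for rates in disruption_rate_sets([1.0, 0.8, 1.2, 0.5]):
    print(rates)
```

Each rate vector would then be simulated with the remaining parameters unchanged to produce one time-series.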

## Conclusions

We applied our previously proposed algorithm to the network motifs proposed by Herrgård, Shen-Orr, and Lee. As a result, the efficiency (recall) of our method was relatively high for most network motifs. In particular, we reconstructed about 68% of the interactions in the Regulatory Interaction (RI) model. Interestingly, the inference performance for complex regulatory networks including cyclic interactions (AR and ML) was better than that for the simpler networks analyzed in this study. It is likely that the abundant information on dynamic behavior contained in the time-series data of complex regulatory networks constrains the degrees of freedom of the S-system model, thereby reducing false-positive and false-negative interactions.

To examine how both accuracy and efficiency could be improved, we inferred network candidates from five time-series data sets, including those of one-factor disrupted models. In this setting, both inference accuracy and efficiency increased remarkably. This result suggests that inference performance can be improved by adding other kinds of time-series data.

Note that the present performance was evaluated on data generated from arbitrarily chosen parameter values. We should test the performance of our method on networks of various structures with different parameters, as well as on observed data. From a practical point of view, various kinds of data have accumulated under different experimental conditions, and the differing information content of such data is expected to further improve the performance of our method. Continued examination of the inference of well-established network motifs in biology would strengthen the applicability of our algorithm to realistic biological networks, including gene regulatory networks and metabolic pathways.

## Methods

### Material

To evaluate the applicability of our inference algorithm, we prepared eight artificial network models: Regulatory Interaction (RI), Regulator Module (RM), Target Module (TM), Feed-Forward (FF), Single Input Module (SIM), Dense Overlapping Regulation (DOR), Autoregulation (AR), and Multicomponent Loop (ML). Each network model contains a significant network motif of the regulatory network of *Escherichia coli* proposed by Shen-Orr *et al.* and Herrgård *et al.* [28, 29], or of *Saccharomyces cerevisiae* proposed by Lee *et al.* [30]. We adapted the eight network motifs into network models consisting of four nodes (*X*_{1}, *X*_{2}, *X*_{3}, and *X*_{4}) while preserving each network topology. Figure 1 shows the structure of each network analyzed in this paper.

Subsequently, we prepared artificial time-series data containing 40 sampling points for each network model by numerical integration [6]. The reference time-series data of the eight network models are shown in Figure 2.
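The data-generation step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it integrates the S-system equations $\frac{dX_i}{dt}=\alpha_i\prod_j X_j^{g_{ij}}-\beta_i\prod_j X_j^{h_{ij}}$ with a generic fixed-step fourth-order Runge-Kutta scheme rather than the S-system-specific integrator of [6], and the four-node cascade, its parameter values, the initial state, and the time span are hypothetical, not one of the eight models in Figure 1.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def s_system_rhs(x: Sequence[float], alpha: Sequence[float], beta: Sequence[float],
                 g: Matrix, h: Matrix) -> List[float]:
    """dX_i/dt = alpha_i * prod_j X_j**g_ij - beta_i * prod_j X_j**h_ij (needs X_j > 0)."""
    n = len(x)
    rates = []
    for i in range(n):
        synthesis = alpha[i]
        degradation = beta[i]
        for j in range(n):
            synthesis *= x[j] ** g[i][j]
            degradation *= x[j] ** h[i][j]
        rates.append(synthesis - degradation)
    return rates


def rk4_step(x: List[float], dt: float, rhs) -> List[float]:
    """One classical fourth-order Runge-Kutta step (a generic stand-in for [6])."""
    k1 = rhs(x)
    k2 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k1)])
    k3 = rhs([xi + 0.5 * dt * k for xi, k in zip(x, k2)])
    k4 = rhs([xi + dt * k for xi, k in zip(x, k3)])
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def simulate(x0, alpha, beta, g, h, t_end, n_samples=40, substeps=50):
    """Return n_samples equally spaced states on [0, t_end], starting with x0."""

    def rhs(x):
        return s_system_rhs(x, alpha, beta, g, h)

    dt = t_end / (n_samples - 1) / substeps
    x = list(x0)
    samples = [list(x)]
    for _ in range(n_samples - 1):
        for _ in range(substeps):
            x = rk4_step(x, dt, rhs)
        samples.append(list(x))
    return samples


# Hypothetical cascade X1 -> X2 -> X3 -> X4 with first-order self-degradation.
alpha = [1.0, 1.0, 1.0, 1.0]
beta = [1.0, 1.0, 1.0, 1.0]
g = [[0.0, 0.0, 0.0, 0.0],
     [0.5, 0.0, 0.0, 0.0],
     [0.0, 0.5, 0.0, 0.0],
     [0.0, 0.0, 0.5, 0.0]]
h = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

series = simulate([0.1, 0.1, 0.1, 0.1], alpha, beta, g, h, t_end=20.0)
print(len(series), [round(v, 3) for v in series[-1]])  # 40 samples, values near 1.0
```

Every component of this toy cascade relaxes to the steady state 1.0, which makes the output easy to sanity-check before swapping in a real motif topology.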

### S-system formalism

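The S-system is described by the following set of power-law differential equations:

$$\frac{d{X}_{i}}{dt}={\alpha}_{i}\prod_{j=1}^{n}{X}_{j}^{{g}_{ij}}-{\beta}_{i}\prod_{j=1}^{n}{X}_{j}^{{h}_{ij}},\quad i=1,2,\dots,n \qquad (1)$$

where *n* is the number of system components (genes or metabolites) in the network under investigation, *X*_{i} is the experimentally observed response (the gene expression level for a gene expression network, or the metabolite concentration for a metabolic pathway), *α*_{i} and *β*_{i} are apparent positive rate constants, and *g*_{ij} and *h*_{ij} are interrelated coefficients between the *X*_{i}s.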

The first term on the right-hand side of Eq. (1) corresponds to the synthetic process of *X*_{i}, and the second term expresses the degradation process of *X*_{i}. The value of *g*_{ij} (*h*_{ij}) expresses the interactive effect of *X*_{j} on the synthetic (degradation) process of *X*_{i}, and thus also determines the structure of the interaction between *X*_{i} and *X*_{j}. When *g*_{ij} (*h*_{ij}) is positive, *X*_{j} induces the synthetic (degradation) process of *X*_{i}. On the other hand, when *g*_{ij} (*h*_{ij}) is negative, *X*_{j} suppresses the synthetic (degradation) process of *X*_{i}. When *g*_{ij} (*h*_{ij}) is zero, *X*_{j} has no effect on the synthetic (degradation) process of *X*_{i}.
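These sign rules can be applied mechanically to estimated *g*_{ij} and *h*_{ij} matrices. The following Python sketch is illustrative only: the function name, the tuple format, and the optional `tol` argument (for treating near-zero values as zero) are assumptions rather than part of the original method.

```python
from typing import List, Sequence, Tuple

Interaction = Tuple[str, str, str, str]  # (regulator X_j, target X_i, process, effect)


def interactions_from_orders(g: Sequence[Sequence[float]],
                             h: Sequence[Sequence[float]],
                             tol: float = 0.0) -> List[Interaction]:
    """List every nonzero g_ij / h_ij as a signed effect of X_j on X_i."""
    result = []
    n = len(g)
    for i in range(n):
        for j in range(n):
            for process, orders in (("synthesis", g), ("degradation", h)):
                value = orders[i][j]
                if abs(value) > tol:  # zero means X_j has no effect on this process
                    effect = "induces" if value > 0 else "suppresses"
                    result.append((f"X{j + 1}", f"X{i + 1}", process, effect))
    return result


# X1 suppresses synthesis of X2; each component induces its own degradation.
g = [[0.0, 0.0], [-0.8, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
for edge in interactions_from_orders(g, h):
    print(edge)
```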

The biological network can be inferred by estimating *α*_{i}, *β*_{i}, *g*_{ij}, and *h*_{ij} in the S-system formula. A representation of the S-system parameters to be estimated is shown in Figure 1.
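For a numerical optimizer, the 2*n*(*n* + 1) parameters are usually packed into a single real-valued vector. The Python sketch below shows one such packing; the ordering [*α*, *β*, *g* row by row, *h* row by row] and the function names are assumptions made for illustration and need not match the layout in Figure 1.

```python
from typing import List, Sequence, Tuple

Params = Tuple[List[float], List[float], List[List[float]], List[List[float]]]


def encode(alpha: Sequence[float], beta: Sequence[float],
           g: Sequence[Sequence[float]], h: Sequence[Sequence[float]]) -> List[float]:
    """Flatten (alpha, beta, g, h) into one vector of length 2n(n + 1)."""
    n = len(alpha)
    vector = list(alpha) + list(beta)
    for matrix in (g, h):
        for row in matrix:
            vector.extend(row)
    if len(vector) != 2 * n * (n + 1):
        raise ValueError("inconsistent parameter shapes")
    return vector


def decode(vector: Sequence[float], n: int) -> Params:
    """Inverse of encode for an n-component S-system."""
    if len(vector) != 2 * n * (n + 1):
        raise ValueError(f"expected {2 * n * (n + 1)} values, got {len(vector)}")
    alpha = list(vector[:n])
    beta = list(vector[n:2 * n])
    offset = 2 * n
    g = [list(vector[offset + i * n:offset + (i + 1) * n]) for i in range(n)]
    offset += n * n
    h = [list(vector[offset + i * n:offset + (i + 1) * n]) for i in range(n)]
    return alpha, beta, g, h


alpha, beta = [1.0, 2.0], [0.5, 0.5]
g = [[0.0, 0.0], [-0.8, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0]]
vec = encode(alpha, beta, g, h)
print(len(vec))  # 12 = 2 * 2 * (2 + 1)
```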

### Real-coded genetic algorithms

The S-system is a formalism of nonlinear ordinary differential equations, and the system can easily be solved numerically using a numerical integration algorithm customized specifically for this formalism [6]. However, even when adequate time-courses of the relevant state variables are given, the parameter values *α*_{i}, *β*_{i}, *g*_{ij}, and *h*_{ij} are in many cases not uniquely determined, because other sets of parameter values may well produce similar time-courses. Therefore, any set of parameter values found to explain the observed time-courses is only one of the best candidates. Our strategy is to explore and exploit these candidates within the huge search space of parameter values.
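A compact sketch of how UNDX [12] and the MGG model [13] can search such a space is shown below in Python. It is a generic textbook-style version written for illustration, not the authors' implementation: the UNDX spread constants (0.5 and 0.35, values commonly cited for UNDX), the population size, the number of children, the rank-based roulette used to pick the second survivor, the absence of bound handling, and the toy sphere objective are all assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def undx(p1: Sequence[float], p2: Sequence[float], p3: Sequence[float],
         rng: random.Random, a: float = 0.5, b: float = 0.35) -> Vector:
    """Unimodal normal distribution crossover: one child from three parents."""
    n = len(p1)
    mid = [(x + y) / 2.0 for x, y in zip(p1, p2)]
    d = [y - x for x, y in zip(p1, p2)]
    d_len = sum(v * v for v in d) ** 0.5
    if d_len == 0.0:
        return mid
    u = [v / d_len for v in d]
    # Distance D from the third parent to the line through p1 and p2.
    w = [z - x for x, z in zip(p1, p3)]
    w_u = sum(wi * ui for wi, ui in zip(w, u))
    dist = sum((wi - w_u * ui) ** 2 for wi, ui in zip(w, u)) ** 0.5
    xi = rng.gauss(0.0, a)                          # spread along the main axis
    eta = [rng.gauss(0.0, b / n ** 0.5) for _ in range(n)]
    eta_u = sum(e * ui for e, ui in zip(eta, u))    # keep only the orthogonal part
    return [m + xi * dv + dist * (e - eta_u * ui)
            for m, dv, e, ui in zip(mid, d, eta, u)]


def mgg_minimize(f: Callable[[Sequence[float]], float], lower: Sequence[float],
                 upper: Sequence[float], pop_size: int = 50, n_children: int = 20,
                 generations: int = 2000, seed: int = 0) -> Tuple[Vector, float]:
    """Minimal generation gap: only two population members are replaced per step."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in zip(lower, upper)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        i1, i2, i3 = rng.sample(range(pop_size), 3)
        children = [undx(pop[i1], pop[i2], pop[i3], rng) for _ in range(n_children)]
        family = [(fit[i1], pop[i1]), (fit[i2], pop[i2])] + [(f(c), c) for c in children]
        family.sort(key=lambda member: member[0])
        elite, rest = family[0], family[1:]
        # Rank-based roulette: better-ranked family members are more likely to survive.
        weights = [len(rest) - r for r in range(len(rest))]
        second = rng.choices(rest, weights=weights, k=1)[0]
        fit[i1], pop[i1] = elite
        fit[i2], pop[i2] = second
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def sphere(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


best_x, best_f = mgg_minimize(sphere, lower=[-5.0] * 3, upper=[5.0] * 3)
print(best_x, best_f)
```

Because the best member of each family always survives, the best objective value in the population never gets worse; the roulette-selected second survivor preserves diversity.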

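Let ${X}_{d,i,t}^{\mathsf{\text{CAL}}}$ represent the calculated time-course at time *t* of state variable *X*_{i} in the *d*-th data-set, and ${X}_{d,i,t}^{\mathsf{\text{EXP}}}$ represent the experimentally observed time-course at time *t* of *X*_{i} in the *d*-th data-set. The squared relative errors between ${X}_{d,i,t}^{\mathsf{\text{CAL}}}$ and ${X}_{d,i,t}^{\mathsf{\text{EXP}}}$ are summed to give the total relative error *E*:

$$E=\sum_{d=1}^{D}\sum_{i=1}^{N}\sum_{t=1}^{T}{\left(\frac{{X}_{d,i,t}^{\mathsf{\text{CAL}}}-{X}_{d,i,t}^{\mathsf{\text{EXP}}}}{{X}_{d,i,t}^{\mathsf{\text{EXP}}}}\right)}^{2}$$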

where *D* is the total number of data-sets observed under different experimental conditions, such as gene disruption or inhibition of kinase activity, *N* is the total number of experimentally observable state variables, and *T* is the total number of sampling points over time in one experimental condition. The computational task is to find a set of parameter values that minimizes the objective function *E*. We have developed an efficient computational technique based on real-coded genetic algorithms (RCGAs) as a nonlinear numerical optimization method that is much less likely to become stranded in local minima. This technique combines the crossover operator called *unimodal normal distribution crossover* (UNDX) [12] with the generation-alternation model called the *minimal generation gap* (MGG) model [13]. Furthermore, to find the skeletal structure (a small-size system) of the S-system formalism that explains the experimentally observed response, parameters *g*_{ij} and *h*_{ij} whose absolute values are less than a given threshold are removed (reset to zero) during the optimization procedure.
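The objective function *E* and the threshold-based pruning step can be written in a few lines of Python. This is an illustrative sketch: the nested-list layout `series[d][t][i]`, the function names, and the threshold in the example are assumptions. The relative error requires every ${X}_{d,i,t}^{\mathsf{\text{EXP}}}$ to be nonzero, which holds for the positive state variables of an S-system.

```python
from typing import List, Sequence

# Hypothetical layout: series[d][t][i] is X_i at sampling point t in data-set d.
Series = Sequence[Sequence[Sequence[float]]]


def total_relative_error(calc: Series, exp: Series) -> float:
    """E = sum over d, i, t of ((X_cal - X_exp) / X_exp) ** 2."""
    if len(calc) != len(exp):
        raise ValueError("calc and exp must contain the same number of data-sets")
    error = 0.0
    for calc_d, exp_d in zip(calc, exp):           # inner shapes are assumed to match
        for calc_t, exp_t in zip(calc_d, exp_d):
            for x_cal, x_exp in zip(calc_t, exp_t):
                error += ((x_cal - x_exp) / x_exp) ** 2
    return error


def prune(coefficients: Sequence[Sequence[float]], threshold: float) -> List[List[float]]:
    """Reset interrelated coefficients (g_ij or h_ij) with |value| < threshold to zero."""
    return [[0.0 if abs(v) < threshold else v for v in row] for row in coefficients]


exp = [[[1.0, 2.0], [2.0, 4.0]]]     # D = 1 data-set, T = 2 points, N = 2 variables
calc = [[[1.1, 2.0], [2.0, 3.0]]]
print(total_relative_error(calc, exp))                    # about 0.0725 (0.1**2 + 0.25**2)
print(prune([[0.05, -0.3], [0.0, 1.2]], threshold=0.1))  # [[0.0, -0.3], [0.0, 1.2]]
```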

### Evaluation of identified network

where TP_{i} is the number of true-positive interactions in the *i*-th network candidate, FP_{i} is the number of false-positive interactions in the *i*-th network candidate, and *n* is the number of inferred network candidates. The value of precision indicates the inference accuracy of the biological network candidates. We also used recall, which indicates the inference efficiency of the network candidates, as follows:

where FN_{i} is the number of false-negative interactions in the *i*-th network candidate. Both precision and recall range from 0.0 to 1.0, and the best value of each is 1.0.
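The counts TP_{i}, FP_{i}, and FN_{i} can be obtained by comparing each inferred candidate with the true network as sets of interactions. In the Python sketch below, interactions are encoded as hypothetical `(source, target, sign)` tuples. Because the exact way the precision and recall equations aggregate over the *n* candidates is not recoverable from this text, the sketch computes two common choices: a pooled ratio and a per-candidate mean.

```python
from typing import Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # hypothetical encoding: (source, target, sign)


def confusion(candidate: Set[Edge], truth: Set[Edge]) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) for one inferred network candidate."""
    return len(candidate & truth), len(candidate - truth), len(truth - candidate)


def pooled_precision_recall(candidates: Sequence[Set[Edge]], truth: Set[Edge]):
    """Sum TP, FP, FN over all candidates, then take the ratios."""
    tp = fp = fn = 0
    for c in candidates:
        a, b, m = confusion(c, truth)
        tp, fp, fn = tp + a, fp + b, fn + m
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def mean_precision_recall(candidates: Sequence[Set[Edge]], truth: Set[Edge]):
    """Average the per-candidate precision and recall."""
    ps, rs = [], []
    for c in candidates:
        tp, fp, fn = confusion(c, truth)
        ps.append(tp / (tp + fp) if tp + fp else 0.0)
        rs.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(ps) / len(ps), sum(rs) / len(rs)


truth = {("X1", "X2", "+"), ("X2", "X3", "+"), ("X1", "X3", "-")}
candidates = [
    {("X1", "X2", "+"), ("X2", "X3", "+"), ("X3", "X1", "+")},  # TP=2, FP=1, FN=1
    {("X1", "X2", "+")},                                        # TP=1, FP=0, FN=2
]
print(pooled_precision_recall(candidates, truth))  # (0.75, 0.5)
print(mean_precision_recall(candidates, truth))
```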

## Declarations

### Acknowledgements

This work was partly supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan. This work was also partly supported by JSPS KAKENHI Grant Number 23700358, Grant-in-Aid for Young Scientists (B), "Development of high accurate method for parameter estimation by the combination of numerical optimization and symbolic computation".

**Declarations**

Publication of this supplement was supported by the commission for Development of Artificial Gene Synthesis Technology for Creating Innovative Biomaterial from the Ministry of Economy, Trade and Industry (METI), Japan.

This article has been published as part of *BMC Systems Biology* Volume 7 Supplement 6, 2013: Selected articles from the 24th International Conference on Genome Informatics (GIW2013). The full contents of the supplement are available online at http://www.biomedcentral.com/bmcsystbiol/supplements/7/S6.

## Authors’ Affiliations

## References

- Tominaga D, Koga N, Okamoto N: Efficient numerical optimization algorithm based on genetic algorithm for inverse problem. Proceedings of the Genetic and Evolutionary Computation Conference. 2000, 251-258.
- Maki Y, Tominaga D, Okamoto M, Watanabe S, Eguchi Y: Development Of A System For The Inference Of Large Scale Genetic Networks. 2001.
- Savageau MA: Biochemical Systems Analysis: A study of function and design in molecular biology. 1976, Reading: Addison-Wesley.
- Holland JH: Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. 1992, Cambridge, MA, USA: MIT Press.
- Goldberg DE: Genetic Algorithms in Search, Optimization and Machine Learning. 1989, Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.
- Irvine DH, Savageau MA: Efficient solution of nonlinear ordinary differential equations expressed in S-system canonical form. SIAM Journal on Numerical Analysis. 1990, 27 (3): 704-735. 10.1137/0727042.
- Nakatsui M, Ueda T, Maki Y, Ono I, Okamoto M: Method for inferring and extracting reliable genetic interactions from time-series profile of gene expression. Mathematical Biosciences. 2008, 215: 105-114. 10.1016/j.mbs.2008.06.007.
- Nakatsui M, Ueda T, Maki Y, Ono I, Okamoto M: Efficient inferring method of genetic interactions based on time-series of gene expression profile. Proceedings of 13th International Symposium on Artificial Life and Robotics. 2008, 71-76.
- Shikata N, Maki Y, Nakatsui M, Mori M, Noguchi Y, Yoshida S, Takahashi M, Kondo N, Okamoto M: Determining important regulatory relations of amino acids from dynamic network analysis of plasma amino acids. Amino Acids. 2010. 10.1007/s00726-008-0226-3.
- Komori A, Maki Y, Nakatsui M, Ono I, Okamoto M: Efficient Numerical Optimization Algorithm Based on New Real-Coded Genetic Algorithm, AREX + JGG, and Application to the Inverse Problem in Systems Biology. Applied Mathematics. 2012, 3: 1463-1470. 10.4236/am.2012.330205.
- Janikow CZ, Michalewicz Z: An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms. Proceedings of the 4th International Conference on Genetic Algorithms. Edited by: Belew RK, Booker LB. 1991, Morgan Kaufmann, 151-157.
- Ono I, Sato H: A Real-Coded Genetic Algorithm for Function Optimization Using Unimodal Distribution Crossover. Proceedings of the 7th ICGA. 1997, 249-253.
- Sato H, Ono I, Kobayashi S: A New Generation Alternation Model of Genetic Algorithms and Its Assessment. Journal of Japanese Society for Artificial Intelligence. 1997, 734-744.
- Voit EO, Almeida J: Decoupling dynamical systems for pathway identification from metabolic profiles. Bioinformatics. 2004, 20 (11): 1670-1681. 10.1093/bioinformatics/bth140.
- Tucker W, Kutalik Z, Moulton V: Estimating parameters for generalized mass action models using constraint propagation. Mathematical Biosciences. 2007, 208 (2): 607-620. 10.1016/j.mbs.2006.11.009.
- Gonzalez OR, Küper C, Jung K, Naval PC, Mendoza E: Parameter estimation using Simulated Annealing for S-system models of biochemical networks. Bioinformatics. 2007, 23 (4): 480-486. 10.1093/bioinformatics/btl522.
- Naval PC Jr, Sison LG, Mendoza E: Metabolic Network Parameter Inference using Particle Swarm Optimization. Proceedings of International Conference on Molecular Systems Biology. 2006.
- Maki Y, Takahashi Y, Arikawa Y, Watanabe S, Aoshima K, Eguchi Y, Ueda T, Aburatani S, Kuhara S, Okamoto M: An Integrated Comprehensive Workbench for Inferring Genetic Networks: VoyaGene. Journal of Bioinformatics and Computational Biology. 2004, 2 (3): 533-550. 10.1142/S0219720004000727.
- Chou I, Voit EO: Recent developments in parameter estimation and structure identification of biochemical and genomic systems. Mathematical Biosciences. 2009, 219 (2): 57-83. 10.1016/j.mbs.2009.03.002.
- Luque B, Lacasa L, Ballesteros F, Luque J: Horizontal visibility graphs: Exact results for random time series. Physical Review E. 2009, 80 (4): 046103.
- Moles CG, Mendes P, Banga JR: Parameter estimation in biochemical pathways: a comparison of global optimization methods. Genome Research. 2003, 13 (11): 2467-2474. 10.1101/gr.1262503.
- Nelander S, Wang W, Nilsson B, She QB, Pratilas C, Rosen N, Gennemark P, Sander C: Models from experiments: combinatorial drug perturbations of cancer cells. Molecular Systems Biology. 2008, 4.
- Zhang J, Small M: Complex Network from Pseudoperiodic Time Series: Topology versus Dynamics. Physical Review Letters. 2006, 96: 238701. 10.1103/PhysRevLett.96.238701.
- Bezsudnov IV, Gavrilov SV, Snarskii AA: From time series to complex networks: the Dynamical Visibility Graph. ArXiv e-prints. 2012.
- Donges JF, Donner RV, Kurths J: Testing time series irreversibility using complex network methods. EPL (Europhysics Letters). 2013, 102: 10004. 10.1209/0295-5075/102/10004.
- Holme P, Saramäki J: Temporal networks. Physics Reports. 2012, 519: 97-125. 10.1016/j.physrep.2012.03.001.
- Csermely P, Korcsmaros T, Kiss HJM, London G, Nussinov R: Structure and dynamics of molecular networks: A novel paradigm of drug discovery. A comprehensive review. ArXiv e-prints. 2012.
- Herrgård MJ, Covert MW, Palsson BO: Reconciling Gene Expression Data with Known Genome-Scale Regulatory Network Structure. Genome Research. 2003, 13: 2423-2434. 10.1101/gr.1330003.
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nature Genetics. 2002, 31: 64-68.
- Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science. 2002, 298 (5594): 799-804. 10.1126/science.1075090.

## Copyright

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.