On Linking Genotype to Phenotype

Well-informed linkages between genotype and individual phenotypes will require a completely new generation of simulation models that are suitable for rigorous experimentation.

Rational development of new therapeutic interventions, including new drugs, requires understanding the functional interactions between subcellular networks, the functional units of cells, organs and systems, and how the emergence of disease alters them. If the required information is somehow encoded in the genome, it is currently invisible. Some of it may be hidden within the protocols used by macromolecules to interact within networks and modules (Csete & Doyle 2002). Absent the needed information, our current best alternative is to “copy” nature (build models that reflect our current state of knowledge) and compute these interactions (i.e., simulate) to determine the logic of healthy and diseased states. The impressive growth in bioinformatics databases and the relentless growth in computer power have helped open a door to new methods to explore functionality hierarchically from genes to individual patients. Meanwhile, the rapid accumulation of biological information and data is overwhelming our ability to understand it. Many of our newer ideas cannot be easily tested.

Genomics has provided us with a massive “parts catalogue.” The details about the individual “parts” and their structures are emerging from proteomics. That is a good start. However, there are very few entries in our “user's guide” for how these parts interact to sustain life or cause disease. Too frequently, the cellular, organ and system functions of these parts are unknown. Clues are emerging from homologies in the gene sequences and elsewhere, but that is not enough. Successful development of rational therapeutic interventions will require knowledge of how the parts behave in context, as they interact with the rest of the relevant cellular machinery to generate functions that only become evident at higher levels. However, there are simply not enough time, people, or resources to get the needed information using the traditional approach of well-controlled experiments that carefully and systematically manipulate one or a few variables at a time. Without this integrative knowledge, we will likely be left in the dark as to which parts are most relevant in disease states.

Searching for patterns in genome and gene expression databases alone will not get us very far in addressing these particular problems. There is a fundamental reason. Genes code for RNA and protein sequences. They do not explicitly code for the blueprint of interactions between macromolecules and other cell components. Nor do they indicate which proteins occupy critical nodes in the hierarchical web of events supporting cell, organelle, and system function in health and disease. Much of the logic within the dynamic network of interactions in living systems is implicit. Wherever possible, nature leaves much of the detail to be “engineered” by context-specific designs or the refinement of the molecules themselves, and to the exceedingly complex way in which their properties have been influenced and exploited during evolution. There is no genetic code for the properties and roles played by water, yet these properties, like many other naturally occurring physicochemical properties, are essential to life on earth. As Noble observed (Noble 2002a), “It is as though the function of the genetic code, viewed as a program, is to build the components of a computer, which then self-assembles [in precise sequences] to run programs about which the genetic code knows nothing.” Similarly, Sydney Brenner (1998) observed that “Genes can only specify the properties of the proteins they code for, and any integrative properties of the system [of which they are a part] must be ‘computed’ by their interactions.” In order to discover and understand these interactions we need to compute them, and so Brenner concluded, “this provides a framework for analysis by simulation.”

Important lessons have been learned from over a decade of research in metabolic engineering. They clearly teach that when the complexity of a system is too great to grasp intuitively, we must use computer models to make further real progress (Bailey 1999). Depending on time and place, individual macromolecules may participate in multiple pathways. Individual differences in sex, age, disease, and even internal and external environmental factors can dynamically alter the background against which macromolecular function is expressed. In such a context, an important added value of modeling and simulation is that they can be used to hypothesize new approaches and to identify where gaps in knowledge exist. One can determine whether or not existing data are sufficient to generate the system output under study. When they are not, the models can be used to suggest possible directions for further study and to offer predictions about possible results. For such a process to be successful, it is essential to have an iterative interaction between modeling, simulation, and experimentation. We already know that computational modeling and simulation of biological systems can add significant value in the discovery and development of new therapeutic agents (Noble & Colatsky 2000, Noble 2002b). Within such virtual environments the researcher may also conduct experiments to systematically test the possible impact of different conditions. The results allow one to select the best overall design principles in advance of real-life studies. They may also be used to help the researcher conduct virtual genetic studies in which cellular components are 'knocked out' or 'knocked in'. The resulting information may then be used to design new drugs, to carry out a more advanced research plan, or to define the optimal therapeutic profile of a new drug prior to chemical synthesis.
The researcher can even explore in advance, in a rational and systematic way, whether the most effective treatment is a drug that acts specifically on a single target or one that acts at multiple targets (as is the case for the potent antibiotic pristinamycin), and in what relative proportion these additional activities might be expected to occur. Finally, one can envisage that by combining multiple models one can prospectively investigate issues of clinical safety and efficacy to answer questions about toxicology and pharmacodynamics at the level of the individual.
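
To make the idea of a virtual knock-out concrete, here is a minimal sketch in Python. It is purely illustrative and is not drawn from any of the cited models: the three components (A, B, C), their wiring, and every rate constant are hypothetical. Component A activates C both directly and through B, so the system-level effect of removing B depends on context that is written nowhere in the description of B itself. It has to be computed.

```python
def simulate(knockout=None, steps=5000, dt=0.01):
    """Euler-integrate a toy linear network and return the final levels.

    All names and numbers below are hypothetical, chosen only to illustrate
    a virtual knock-out; they do not describe any real pathway.
    """
    basal = {"A": 1.0, "B": 0.0, "C": 0.0}   # basal synthesis rates
    decay = {"A": 0.5, "B": 0.5, "C": 0.5}   # first-order decay rates
    # (source, target): activation strength
    activation = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 0.5}

    level = {name: 0.0 for name in basal}
    for _ in range(steps):
        rate = {}
        for name in level:
            if name == knockout:
                rate[name] = 0.0  # knocked-out component is never produced
                continue
            production = basal[name] + sum(
                strength * level[source]
                for (source, target), strength in activation.items()
                if target == name
            )
            rate[name] = production - decay[name] * level[name]
        # Update all components together so each step uses a consistent state.
        for name in level:
            level[name] += dt * rate[name]
    return level


if __name__ == "__main__":
    wild_type = simulate()
    b_knockout = simulate(knockout="B")
    print("wild type  :", {k: round(v, 3) for k, v in wild_type.items()})
    print("B knock-out:", {k: round(v, 3) for k, v in b_knockout.items()})
```

With these assumed parameters the level of C settles near 10 in the intact network and near 2 when B is knocked out: the output is reduced but not abolished, because the direct route from A remains. Even in so small a system, the consequence of removing one part is a property of the interactions rather than of the part.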


Bailey JE, 1999. Lessons from metabolic engineering for functional genomics and drug discovery. Nat. Biotechnol. 17:616-18.

Brenner S, 1998. Biological computation. In: The limits of reductionism in biology. Wiley, Chichester (Novartis Found. Symp. 213), pp. 106-16.

Csete ME, Doyle JC, 2002. Reverse engineering of biological complexity. Science 295:1664-69.

Noble D, Colatsky TJ, 2000. A return to rational drug discovery: computer-based models of cells, organs and systems in drug target identification. Emerg. Therap. Targets 4:39-49.

Noble D, 2002a. Modeling the heart—from genes to cells to the whole organ. Science 295:1678-82.

Noble D, 2002b. The rise of computational biology. Nat. Rev. Mol. Cell Biol. 3:460-63.

C. Anthony Hunt, PhD
The University of California, San Francisco
© 2003

