Eric Fontain Homepage

Eric Fontain

Protein folding

Proteins fold into specific three-dimensional structures to perform their diverse biological functions. Based on Christian Anfinsen's groundbreaking work, it is now well established that, for small proteins, the information contained in the amino acid sequence is sufficient to determine the folded structure. Understanding protein folding and the sequence-structure relationship of proteins is a long-standing goal in structural biology. This understanding will play a pivotal role in the post-genomic era and will have a great impact on genetics, biochemistry, and pharmaceutical chemistry.

The ProtEA project uses computational and theoretical models to simulate and understand the Protein folding process by means of Evolutionary Algorithms. ProtEA defines a formal protein folding language that is used to describe tertiary structures. A large number of folding programs undergo an optimization process driven by a genetic algorithm. The selection process is controlled by a set of heuristic rules covering, for example, hydrophobic core formation, hydrogen bonding, and charge distribution. The primary objective of the ProtEA project is to confine the structure build-up to a small number of steps based on the well-known secondary and super-secondary structure elements of proteins.

The folding language is used to build up and define the protein's tertiary structure. It resembles a formal computer language containing commands to create e.g. helices, strands, turns, sheets, hairpins, bends etc. The command interpreter applies these folding macros to an extended conformation of a protein directly derived from the amino acid sequence.
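The idea of a command interpreter acting on an extended chain can be illustrated with a minimal sketch. The command names, the one-command-per-line syntax, and the representation of a conformation as per-residue (phi, psi) backbone torsion angles are assumptions made for this illustration only; they are not ProtEA's actual folding language. The torsion values are the usual textbook approximations for an alpha-helix and a beta-strand.

```python
# Hypothetical mini folding language (NOT ProtEA's real syntax).
# Each line reads: <command> <first residue> <last residue>, 1-based, inclusive.
IDEAL_TORSIONS = {
    "helix": (-57.0, -47.0),    # approximate alpha-helix (phi, psi) in degrees
    "strand": (-120.0, 120.0),  # approximate beta-strand (phi, psi) in degrees
}
EXTENDED = (180.0, 180.0)       # fully extended backbone


def extended_conformation(sequence):
    """Start from an extended chain: one (phi, psi) pair per residue."""
    return [EXTENDED for _ in sequence]


def run_folding_program(sequence, program):
    """Apply each folding command to the extended chain and return the torsions."""
    torsions = extended_conformation(sequence)
    for line in program.strip().splitlines():
        name, first, last = line.split()
        first, last = int(first), int(last)
        if name not in IDEAL_TORSIONS:
            raise ValueError(f"unknown folding command: {name}")
        if not 1 <= first <= last <= len(sequence):
            raise ValueError(f"residue range out of bounds: {line}")
        for i in range(first - 1, last):
            torsions[i] = IDEAL_TORSIONS[name]
    return torsions


torsions = run_folding_program("ACDEFGHIKL", "helix 2 5\nstrand 7 9")
```

A real interpreter would also need commands for turns, hairpins, and sheets, and would have to convert the torsion angles into atomic coordinates; both are left out here.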

The ProtEA genetic algorithm maintains a large population of integer vectors. First, all vectors are initialized with random numbers. Each vector (individual) is translated into a protein folding program, which is applied to the unfolded sequence and finally assigned a fitness value obtained from the heuristic analysis of the resulting tertiary structure. After all individuals have been analyzed, the evolutionary process improves the overall fitness of the population by selective reproduction, crossing over, mutation, etc. After a series of generations, those folding programs emerge that produce the tertiary structures with the highest scores in the evaluation process. If the evaluation process uses the proper criteria [;-)], the resulting tertiary structures should resemble the native proteins.
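The translation step from an integer vector to a folding program, followed by scoring, might look roughly like the sketch below. Everything specific in it is an assumption for illustration: the three-integers-per-command encoding, the command set, the per-residue labels standing in for a tertiary structure, and the toy fitness (which merely counts hydrophobic residues placed in a structured segment) are hypothetical and do not reproduce ProtEA's encoding or its heuristic rules.

```python
import random

COMMANDS = ("helix", "strand", "turn")  # hypothetical command set
HYDROPHOBIC = set("AVILMFWC")           # one-letter amino acid codes


def random_population(size, length, seed=None):
    """Initialize all integer vectors with random numbers."""
    rng = random.Random(seed)
    return [[rng.randrange(1000) for _ in range(length)] for _ in range(size)]


def decode(vector, n_residues):
    """Translate an integer vector into (command, start, length) triples."""
    program = []
    for k in range(0, len(vector) - 2, 3):
        op, pos, span = vector[k:k + 3]
        start = pos % n_residues
        length = 1 + span % (n_residues - start)  # keeps the segment in range
        program.append((COMMANDS[op % len(COMMANDS)], start, length))
    return program


def apply_program(sequence, program):
    """Apply the program to the unfolded chain; later commands overwrite earlier ones."""
    labels = ["coil"] * len(sequence)
    for command, start, length in program:
        for i in range(start, start + length):
            labels[i] = command
    return labels


def toy_fitness(sequence, labels):
    """Placeholder heuristic: hydrophobic residues inside structured segments."""
    return sum(1 for aa, lab in zip(sequence, labels)
               if aa in HYDROPHOBIC and lab != "coil")


def evaluate_population(sequence, population):
    """Assign a fitness value to every individual."""
    return [toy_fitness(sequence, apply_program(sequence, decode(v, len(sequence))))
            for v in population]


scores = evaluate_population("GAVILKDEFG", random_population(20, 6, seed=1))
```

Taking every gene modulo the valid range means that any integer vector decodes to a legal program, so mutation and crossover can never produce an invalid individual.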

Genetic algorithms

As early as the 1970s, John Holland and his co-workers at the University of Michigan developed a basic concept for optimization procedures that rely entirely on the rules of natural evolution. They called these procedures genetic algorithms (GAs).
A GA describes rules for selective reproduction and for mutation and crossover events within an artificial population of genome vectors. In most cases, these vectors are bit strings constructed by linear concatenation of all independent parameters that describe the problem to be optimized.
The population of vectors is initialized with random sequences and then undergoes an evaluation process that applies defined fitness criteria. Individuals are reproduced according to their relative fitness within the population. Low fitness may result in "no reproduction". This scheme corresponds to the Darwinian "survival of the fittest".
New genetic information is introduced by random mutations in the bit vectors. Crossover events ensure the efficient mixing and spreading of genes. After these steps, the population re-enters the generation cycle.
In addition, other features observed in natural evolution, such as niche building, sub-populations, and multi-populations, may also be applied.
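The generation cycle described above can be sketched in a few lines. This is a generic textbook-style GA on bit strings, not the implementation used in RAIN or ProtEA; the function name `evolve`, its parameters and defaults, and the choice of fitness-proportional ("roulette wheel") selection with one-point crossover are assumptions made for the example. The fitness function must return non-negative values.

```python
import random


def evolve(fitness, n_bits, pop_size=40, generations=60, p_mut=0.01, seed=0):
    """Minimal GA sketch: evaluation, selective reproduction, crossover, mutation."""
    if pop_size % 2 or n_bits < 2:
        raise ValueError("pop_size must be even and n_bits at least 2")
    rng = random.Random(seed)
    # Initialize the population with random bit strings.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Evaluation: individuals reproduce according to relative fitness.
        # The tiny offset keeps the weights valid if every score is zero.
        weights = [fitness(ind) + 1e-9 for ind in population]
        parents = rng.choices(population, weights=weights, k=pop_size)
        offspring = []
        for a, b in zip(parents[0::2], parents[1::2]):
            cut = rng.randrange(1, n_bits)  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Mutation: flip each bit with probability p_mut.
                offspring.append([bit ^ 1 if rng.random() < p_mut else bit
                                  for bit in child])
        population = offspring  # the population re-enters the cycle
        best = max(population + [best], key=fitness)
    return best


# Toy problem: maximize the number of 1-bits in a 20-bit string.
best = evolve(sum, n_bits=20)
```

An individual with very low fitness is rarely or never drawn as a parent, which is the "no reproduction" case mentioned above.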

We were among the first to apply GAs in the field of formal computer chemistry. They were used to calculate constitutional similarity in our reaction generation program RAIN. Minimizing the chemical distance (a problem of atom-to-atom mapping) is a graph-theoretical problem that has the adverse property of NP-completeness (a class of algorithmically very hard problems). The implicit parallelism of GAs makes it possible to solve these problems on a time scale that allows target-directed reaction generation in synthesis planning.
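To make the mapping problem concrete, the sketch below finds an optimal atom-to-atom mapping by brute force. It uses a simplified stand-in for chemical distance, namely the sum of absolute bond-order differences over all atom pairs; free valence electrons are ignored, and the function names are hypothetical. Because it tries every permutation of the atoms, its cost grows factorially with molecule size, which is exactly the situation in which a heuristic such as a GA becomes attractive.

```python
from itertools import permutations


def bond_matrix(n_atoms, bonds):
    """Symmetric bond-order matrix from {(i, j): order}."""
    m = [[0] * n_atoms for _ in range(n_atoms)]
    for (i, j), order in bonds.items():
        m[i][j] = m[j][i] = order
    return m


def distance(educt, product, mapping):
    """Sum of bond-order changes when educt atom i becomes product atom mapping[i]."""
    n = len(educt)
    return sum(abs(educt[i][j] - product[mapping[i]][mapping[j]])
               for i in range(n) for j in range(i + 1, n))


def best_mapping(educt_elements, educt, product_elements, product):
    """Exhaustive search over all element-preserving atom-to-atom mappings."""
    n = len(educt)
    best = None
    for mapping in permutations(range(n)):
        if any(educt_elements[i] != product_elements[mapping[i]] for i in range(n)):
            continue
        d = distance(educt, product, mapping)
        if best is None or d < best[0]:
            best = (d, mapping)
    return best


# HCN -> HNC: one H-C bond is broken and one H-N bond is made.
hcn = bond_matrix(3, {(0, 1): 1, (1, 2): 3})  # atoms: H, C, N
hnc = bond_matrix(3, {(0, 1): 1, (1, 2): 3})  # atoms: H, N, C
print(best_mapping(["H", "C", "N"], hcn, ["H", "N", "C"], hnc))  # (2, (0, 2, 1))
```

In a GA formulation, each individual would encode one candidate mapping and the distance would serve as the quantity to be minimized.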

References:
E. Fontain, "Application of Genetic Algorithms in the Field of Constitutional Similarity", J. Chem. Inf. Comput. Sci. 32, 748-752 (1992).
E. Fontain, "The Problem of Atom-to-Atom Mapping. An Application of Genetic Algorithms", Anal. Chim. Acta 265, 227-232 (1992).
E. Fontain, "Kombinatorik und chemische Metrik formaler Reaktions- und Strukturgenerierung", Habilitationsschrift, Technische Universität München (1995).

Reaction generation

Since the mid-1980s, we have been developing the PC program RAIN (Reactions And Intermediates Networks). It produces reaction pathways using a formal reaction generator together with a reaction network management system. RAIN's reaction generator is guided by a set of formal constraints that mainly limit the complexity of electron redistribution and bond breaking/making processes. It produces isomeric ensembles of molecules (EM) from a starting ensemble. This starting ensemble can be either the educt or the product of the chemical reaction pathway under investigation. Thus RAIN produces pathways that link educt and product in a bilateral manner. The program can be used for mechanistic studies as well as for synthesis planning.
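The bilateral idea, growing a network from the educt and from the product until the two meet, can be sketched as a bidirectional breadth-first search. This is an abstract illustration, not RAIN's algorithm: ensembles are reduced to opaque labels, the `neighbors` callback stands in for the formal reaction generator, each step is assumed to be reversible, and the function names are hypothetical. RAIN manages whole reaction networks, whereas this sketch returns only one connecting pathway.

```python
def _join(meet, parents):
    """Assemble educt -> ... -> meet -> ... -> product from both parent maps."""
    path, node = [], meet
    while node is not None:
        path.append(node)
        node = parents[0][node]
    path.reverse()
    node = parents[1][meet]
    while node is not None:
        path.append(node)
        node = parents[1][node]
    return path


def bilateral_search(educt, product, neighbors, max_depth=6):
    """Expand alternately from educt and product until the two networks share a node."""
    if educt == product:
        return [educt]
    parents = [{educt: None}, {product: None}]  # side 0: educt, side 1: product
    frontiers = [[educt], [product]]
    for depth in range(max_depth):
        side = depth % 2
        next_frontier = []
        for node in frontiers[side]:
            for nxt in neighbors(node):
                if nxt in parents[side]:
                    continue
                parents[side][nxt] = node
                if nxt in parents[1 - side]:
                    return _join(nxt, parents)
                next_frontier.append(nxt)
        frontiers[side] = next_frontier
    return None  # no link found within max_depth generation steps


# Toy network: labels stand for ensembles of molecules.
toy = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}
print(bilateral_search("A", "E", toy.__getitem__))  # ['A', 'B', 'D', 'E']
```

Expanding from both ends keeps each half-network shallow, which matters because the number of generated ensembles typically grows rapidly with every additional generation step.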

References:
E. Fontain, J. Bauer, I. Ugi, "Computer Assisted Bilateral Generation of Reaction Networks from Educts and Products", Chem. Lett. 3, 37-40 (1987).
E. Fontain, "Die bilaterale Generierung von Reaktionsnetzwerken", Dissertation, Technische Universität München (1987).
E. Fontain, K. Reitsam, "The Generation of Reaction Networks with RAIN. 1. The Reaction Generator", J. Chem. Inf. Comput. Sci. 31, 96-101 (1991).
E. Fontain, "The Generation of Reaction Networks with RAIN. 2. Resonance Structures and Tautomerism", Tetrahedron Comput. Methodol. 3, 469-477 (1990).
E. Fontain, "Kombinatorik und chemische Metrik formaler Reaktions- und Strukturgenerierung", Habilitationsschrift, Technische Universität München (1995).

Structure generation

The RAIN reaction generator can also be used for the generation of conceivable structural isomers. If all the constraints that limit the amount of electron redistribution are set to an infinite value, RAIN generates the complete family of isomeric EM (FIEM) from a collection of atoms, multiatomic systems, or multivalent fragments. Any list of forbidden, allowed, or required substructures can be set up to control the generation process. The structure-generating abilities of RAIN were used, for example, to build up a complete catalogue of conceivable isomers with the molecular formula B₆H₁₄.

References:
E. Fontain, "The B₆H₁₄-Problem: Generation of a Catalogue of Conceivable Isomers", Heteroat. Chem. 5, 61-64 (1994).
E. Fontain, "Kombinatorik und chemische Metrik formaler Reaktions- und Strukturgenerierung", Habilitationsschrift, Technische Universität München (1995).

Structure-activity relations

The program CORREL by J. Friedrich produces a complete tree of the substructures contained in a set of molecules. Using this program, we studied the ability of a substructure-based qualitative structure-activity approach to recognize substructures that are responsible for bioaccumulation in fish. A second test set consisted of molecules with sensory (taste) qualities that were to be related to molecular substructures.

References:
E. Fontain, "Qualitative Struktur-Wirkungs-Korrelation auf der Basis von Substrukturen an ausgewählten Beispielen", Diplomarbeit, Technische Universität München (1983).