Eric Fontain
Protein folding | Genetic algorithms | Reaction generation | Structure generation | Structure-activity relations | Publications
Proteins fold into specific three-dimensional structures to perform their diverse biological functions. Based on Christian Anfinsen's groundbreaking work, it is now well established that for small proteins the information contained in the amino acid sequence is sufficient to determine the folded structure. Understanding protein folding and the sequence-structure relationship of proteins is a long-standing goal in structural biology. It will play a pivotal role in the post-genomic era and will have great impact on genetics, biochemistry, and pharmaceutical chemistry.

The ProtEA project involves computational and theoretical models to simulate and understand the protein folding process by means of evolutionary algorithms. ProtEA defines a formal protein folding language that is used to describe tertiary structures. A large number of folding programs undergo an optimization process using a genetic algorithm. The selection process is controlled by a set of heuristic rules such as hydrophobic core formation, hydrogen bonding, and charge distribution. The primary objective of the ProtEA project is to concentrate the structure build-up into a small number of steps based on the well-known secondary and super-secondary structure elements of proteins.

The folding language is used to build up and define the protein's tertiary structure. It resembles a formal computer language, with commands to create, for example, helices, strands, turns, sheets, hairpins, and bends. The command interpreter applies these folding macros to an extended conformation of the protein derived directly from the amino acid sequence.

The ProtEA genetic algorithm keeps a large population of integer vectors. First, all vectors are initialized with random numbers. Each vector (individual) is translated into a protein folding program, applied to the unfolded sequence, and finally assigned a fitness value obtained from the heuristic analysis of the resulting tertiary structure. After all individuals have been analyzed, the evolutionary process improves the overall fitness of the population by selective reproduction, crossing over, mutation, etc. After a series of generations, those folding programs emerge that produce tertiary structures with the highest scores in the evaluation process. If the evaluation process used the proper criteria [;-)], the obtained tertiary structures should resemble the native proteins.
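The loop described above (integer vectors, translation into folding programs, heuristic scoring, then selection, crossing over, and mutation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ProtEA implementation: the command names, the three-integers-per-step encoding, the toy fitness function, and all parameter values are hypothetical stand-ins, since the actual folding language and heuristics are not specified here.

```python
import random

# Hypothetical command set; the real ProtEA folding language is not given in the text.
COMMANDS = ["helix", "strand", "turn", "hairpin", "bend"]
HYDROPHOBIC = set("AVLIMFWC")


def decode(vector):
    """Translate an integer vector into folding steps (command, start, length).

    Assumed encoding: three consecutive integers describe one step.
    """
    program = []
    for i in range(0, len(vector) - 2, 3):
        op, start, length = vector[i:i + 3]
        program.append((COMMANDS[op % len(COMMANDS)], start, 1 + length % 10))
    return program


def apply_program(program, sequence):
    """Apply the steps to an extended chain; returns one state label per residue."""
    states = ["extended"] * len(sequence)
    for command, start, length in program:
        begin = start % len(sequence)
        for pos in range(begin, min(begin + length, len(sequence))):
            states[pos] = command
    return states


def fitness(vector, sequence):
    """Toy heuristic (an assumption): hydrophobic residues inside helices or strands."""
    states = apply_program(decode(vector), sequence)
    return sum(1 for res, st in zip(sequence, states)
               if res in HYDROPHOBIC and st in ("helix", "strand"))


def evolve(sequence, pop_size=40, genes=12, generations=30, seed=0):
    """Evolve integer vectors; the better half survives unchanged each generation."""
    rng = random.Random(seed)
    population = [[rng.randrange(100) for _ in range(genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda v: fitness(v, sequence), reverse=True)
        parents = ranked[:pop_size // 2]            # selective reproduction
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genes)           # one-point crossing over
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:                  # occasional point mutation
                child[rng.randrange(genes)] = rng.randrange(100)
            children.append(child)
        population = parents + children
    return max(population, key=lambda v: fitness(v, sequence))


if __name__ == "__main__":
    seq = "MKVLAAGIVEFWLKRC"  # made-up sequence for illustration
    best = evolve(seq)
    print(decode(best), fitness(best, seq))
```

Because the better half of each generation is carried over unchanged, the best score in this sketch never decreases from one generation to the next. A realistic evaluation would build actual coordinates and score them with rules such as hydrogen bonding and charge distribution.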
In the early 1970s, John Holland and his co-workers at the University of Michigan
invented a basic concept for optimization procedures that rely completely on the rules of natural evolution.
They called these procedures genetic algorithms (GAs). We were among the first to apply GAs in the field of formal computer chemistry. They were used for the calculation of constitutional similarity in our reaction generation program RAIN. The minimization of chemical distance (a problem of atom-to-atom mapping) is a graph-theoretical problem that has the adverse property of NP-completeness (a class of algorithmically very tough problems). The implicit parallelism of GAs makes it possible to solve these problems on a time scale that allows target-directed reaction generation in synthesis planning.
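To make the atom-to-atom mapping problem concrete, here is a small sketch that finds the mapping of minimal chemical distance by exhaustive search. It assumes that chemical distance is the sum of absolute differences between corresponding bond-matrix entries (as in the Dugundji-Ugi model), uses bond orders between heavy atoms only, and tries every element-preserving permutation. The function names and the keto-enol toy input are illustrative assumptions; this brute-force search is not the GA used in RAIN, and its factorial cost is exactly why a heuristic such as a GA becomes attractive for larger molecules.

```python
from itertools import permutations


def chemical_distance(educt, product, mapping):
    """Sum of |educt[i][j] - product[mapping[i]][mapping[j]]| over all atom pairs."""
    n = len(educt)
    return sum(abs(educt[i][j] - product[mapping[i]][mapping[j]])
               for i in range(n) for j in range(n))


def minimal_chemical_distance(elements_e, educt, elements_p, product):
    """Try every element-preserving atom mapping; cost grows factorially."""
    n = len(educt)
    best = None
    for mapping in permutations(range(n)):
        # Skip mappings that would turn one element into another.
        if any(elements_e[i] != elements_p[mapping[i]] for i in range(n)):
            continue
        d = chemical_distance(educt, product, mapping)
        if best is None or d < best[0]:
            best = (d, mapping)
    return best


# Toy input (hydrogens omitted): acetaldehyde C-C=O and vinyl alcohol O-C=C.
educt_elements = ["C", "C", "O"]
educt_bonds = [[0, 1, 0],
               [1, 0, 2],
               [0, 2, 0]]
product_elements = ["O", "C", "C"]
product_bonds = [[0, 1, 0],
                 [1, 0, 2],
                 [0, 2, 0]]

if __name__ == "__main__":
    print(minimal_chemical_distance(educt_elements, educt_bonds,
                                    product_elements, product_bonds))
```

For this toy input, two mappings preserve the elements. The one that keeps the carbonyl carbon attached to oxygen gives a distance of 4, while the alternative gives 8, so the search returns the former.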
Since the mid-1980s, we have been developing the PC program RAIN (Reactions And Intermediates Networks). It produces reaction pathways using a formal reaction generator together with a reaction network management system. RAIN's reaction generator is guided by a set of formal constraints that mainly limit the complexity of electron redistribution and of bond breaking and bond making. It produces isomeric ensembles of molecules (EM) from a starting ensemble. This starting ensemble can be either the educt or the product of the chemical reaction pathway under investigation. RAIN thus produces pathways that link educt and product in a bilateral manner. The program can be used for mechanistic studies as well as for synthesis planning.
The RAIN reaction generator can also be used for the generation of conceivable structural isomers. If all the constraints that limit the amount of electron redistribution are set to an infinite value, RAIN generates the complete family of isomeric EM (FIEM) from a collection of atoms, multiatomic systems, or multivalent fragments. Any list of forbidden, allowed, or required substructures can be set up to control the generating process. The structure-generating abilities of RAIN were used, for example, to build up a complete catalogue of conceivable isomers with the molecular formula B6H14.
The program CORREL by J. Friedrich produces a complete tree of the substructures contained in a set of molecules. Using this program, we studied the ability of a substructure-based qualitative structure-activity approach to recognize substructures that are responsible for bioaccumulation in fish. A second test set consisted of molecules with sensory (taste) qualities that were to be related to molecular substructures.