MoFa - The Molecular Fragment Miner


Questions

Answers

1. What is MoFa?

MoFa is a program for finding frequent, discriminative molecular substructures in a set of molecules. The name MoFa is an acronym for Molecular Fragment Miner.

2. What are frequent, discriminative fragments?

Frequent fragments are parts of molecular structures that occur in many molecules of a database. Consider the molecules in the following example:

[Figure: Database of molecules.]

Among others, the (sub-)structures given in the following figure are frequent fragments in this database. They are part of all—or at least "many"—molecules of the database.

[Figure: Some frequent fragments in the database.]

Frequent, discriminative fragments are substructures of molecules that occur frequently in one set of molecules (the so-called focus set) but infrequently in a second, mostly disjoint, set of molecules (the complement set).

3. Why should somebody be interested in those fragments?

Frequent, discriminative fragments can be of great interest to the chemist for various reasons. One field of application is drug discovery research. For instance, frequent fragments may be the cause of the activity of certain drug candidates (i.e., their ability to inhibit a virus or to cure a disease) when they occur often in these candidates but only infrequently in the inactive ones. Special attention can thus be paid to these fragments when synthesizing new candidates.

4. What is MoFa's input?

The input parameters of the program are:

  • A set of focus molecules. The available implementation of MoFa expects SLN or Smiles descriptions of molecules.
  • A set of complement molecules. (Also in SLN or Smiles description.)
  • A minimum support boundary for the focus set. A fragment is considered to be frequent in the focus set when the ratio of molecules containing the fragment to all molecules in this set is above this value. Only such fragments are candidates for reporting. Note: a reported fragment must also satisfy the maximum complement support and the size constraints.
  • A maximum support boundary for the complement set. A fragment is considered to be infrequent in the complement set when the ratio of the molecules containing the fragment to all molecules in this set is below this value.
  • A minimum and a maximum size constraint for the fragments, that is, a lower and an upper limit for the size (as atom count) of the fragments to be reported.
  • A core structure to be extended. This optional parameter offers the opportunity to provide an initial seed. Providing a core may speed up the algorithm considerably; only fragments containing this core structure are then reported.

The accompanying implementation also accepts a few flags that control how bonds and atoms in rings are treated. See the command-line help for details (call the program with no arguments).
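
The support and size constraints listed above can be illustrated with a small Python sketch. This is not MoFa's implementation: it performs no substructure search and assumes that, for every molecule, the set of fragments it contains is already known. The function names, the fragment labels and the toy data are hypothetical, and treating the two support boundaries as inclusive is an assumption as well.

```python
from typing import Dict, List, Set


def support(fragment: str, molecules: List[Set[str]]) -> float:
    """Return the fraction of molecules that contain the fragment."""
    if not molecules:
        return 0.0
    return sum(fragment in mol for mol in molecules) / len(molecules)


def select_fragments(
    focus: List[Set[str]],
    complement: List[Set[str]],
    fragment_sizes: Dict[str, int],
    min_focus_support: float,
    max_complement_support: float,
    min_size: int,
    max_size: int,
) -> List[str]:
    """Keep fragments that pass the size and both support constraints."""
    selected = []
    for fragment, size in sorted(fragment_sizes.items()):
        if not (min_size <= size <= max_size):
            continue  # violates the size constraint (atom count)
        if support(fragment, focus) < min_focus_support:
            continue  # not frequent enough in the focus set
        if support(fragment, complement) > max_complement_support:
            continue  # too frequent in the complement set
        selected.append(fragment)
    return selected


# Toy data: each molecule is the set of (hypothetical) fragment labels it contains.
focus = [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "C"}]
complement = [{"B"}, {"B", "C"}, {"C"}, set()]
fragment_sizes = {"A": 6, "B": 3, "C": 12}  # made-up atom counts

result = select_fragments(focus, complement, fragment_sizes, 0.5, 0.25, 2, 10)
print(result)  # ['A']
```

In this toy data, fragment A occurs in every focus molecule and in no complement molecule, so it is reported. Fragment B is frequent in the focus set but also occurs in half of the complement set, and fragment C exceeds the maximum size.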

5. What is an SLN description?

SLN, the SYBYL™ Line Notation, is a description language for 2D molecular structures. It is commonly used in software developed at Tripos, Inc. and is included here as Tripos took part in the development of MoFa.

The SLN for the structure in the following picture, for example, is:
C[1]:C:C:C:C[2]:C:@1-N[3]-C(-N-@2)=C(-C(-N-C-@3=O)=O)-C#N .

[Figure: The example molecular structure.]

6. What is a Smiles description?

Smiles is another description language for 2-dimensional molecular structures. Unlike SLN, Smiles is free and therefore commonly used. The Smiles representation of the molecular structure shown above, for example, is [c]1[c][c][c]c3c1N2C(=C(C(=O)[N]C2=O)C#N)[N]3 .
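
Since fragment sizes are given as atom counts, it can be useful to count the atoms written in a Smiles string. The following Python sketch does this with a regular expression. It is not part of MoFa and is not a full Smiles parser: it only recognises bracket atoms and the usual organic-subset symbols, it ignores implicit hydrogens, and it would count an explicit [H] as an atom. Whether MoFa counts hydrogens towards fragment size is not stated here, so treat the result as a count of the explicitly written atoms only. The names `ATOM_TOKEN` and `count_atoms` are hypothetical.

```python
import re

# One match per atom: a bracket atom, a two-letter halogen, or a one-letter
# organic-subset symbol (upper case = aliphatic, lower case = aromatic).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]")


def count_atoms(smiles: str) -> int:
    """Count the atoms written explicitly in a Smiles string."""
    return len(ATOM_TOKEN.findall(smiles))


example = "[c]1[c][c][c]c3c1N2C(=C(C(=O)[N]C2=O)C#N)[N]3"
print(count_atoms(example))  # 17
```

The count of 17 agrees with the number of atoms written in the SLN description of the same structure.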

7. How can I get an image of a Smiles or SLN description?

SLN is used mainly at Tripos, Inc., so public drawing tools that render an SLN description are not available. For Smiles, however, you can use the Gif Generator webpage at the University of Erlangen to create *.gif or *.png images from a Smiles descriptor.

Copyright © 2004-07 Nycomed Chair for Applied Computer Science, University of Konstanz
All rights reserved.