Molecule mining


Molecule mining is the process of data mining, or extracting and discovering patterns, as applied to molecules. Since molecules may be represented by molecular graphs, this is strongly related to graph mining and structured data mining. The main problem is how to represent molecules in a way that still discriminates between the data instances. One way to do this is to use chemical similarity metrics, which have a long tradition in the field of cheminformatics.
Typical approaches to calculating chemical similarity use chemical fingerprints, but this loses the underlying information about the molecule's topology. Mining the molecular graphs directly avoids this problem, as does the inverse QSAR problem, which is preferable for vectorial mappings.
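As a minimal sketch of the fingerprint-based approach, assuming a fingerprint is modelled as the set of bit positions that are switched on (the bit values below are invented for illustration and do not come from any real fingerprinting scheme), the widely used Tanimoto coefficient compares two molecules by their shared bits:

```python
def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity |A & B| / |A | B| of two sets of 'on' bits."""
    union = fp_a | fp_b
    if not union:  # two empty fingerprints: treat them as identical
        return 1.0
    return len(fp_a & fp_b) / len(union)

# Hypothetical fingerprints, written as the positions of their set bits.
mol_a = {1, 4, 7, 9}
mol_b = {1, 4, 8, 9, 12}
print(tanimoto(mol_a, mol_b))  # 3 shared bits / 6 bits set in either -> 0.5
```

Only the bit sets enter the comparison, so two molecules with different topologies that happen to set the same bits get a similarity of 1.0. This is the loss of structural information described above.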

Coding(Molecule_i, Molecule_j), j ≠ i

Kernel methods

  • Marginalized graph kernel
  • Optimal assignment kernel
  • Pharmacophore kernel
  • Combining:
      ◦ the marginalized graph kernel between labeled graphs
      ◦ extensions of the marginalized kernel
      ◦ Tanimoto kernels
      ◦ graph kernels based on tree patterns
      ◦ kernels based on pharmacophores for the 3D structure of molecules
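Kernels of this kind score a pair of molecular graphs directly instead of comparing precomputed fingerprints. The following is a minimal sketch of a truncated random-walk kernel, K = Σ_{k=0}^{L} λ^k · (number of length-k walks in the direct product graph), which counts label-matching walks shared by both molecules. It is related in spirit to the marginalized graph kernel but is not that formulation. The representation (atom labels plus adjacency lists, hydrogens omitted, bond orders ignored) and the parameters `lam` and `max_len` are assumptions made for illustration.

```python
def random_walk_kernel(labels1, adj1, labels2, adj2, lam=0.5, max_len=3):
    """Truncated random-walk kernel: sum over k = 0..max_len of
    lam**k * (number of length-k walks in the direct product graph)."""
    # Product-graph nodes: pairs of atoms (i in G1, j in G2) with equal labels.
    nodes = [(i, j) for i in range(len(labels1)) for j in range(len(labels2))
             if labels1[i] == labels2[j]]
    index = {pair: k for k, pair in enumerate(nodes)}
    # (i, j) and (i2, j2) are adjacent iff i-i2 is a bond in G1 and j-j2 a bond in G2.
    nbrs = [[index[(i2, j2)] for i2 in adj1[i] for j2 in adj2[j] if (i2, j2) in index]
            for i, j in nodes]
    walks = [1] * len(nodes)  # number of length-0 walks ending at each node
    total = float(sum(walks))
    for k in range(1, max_len + 1):
        # Length-k walks ending at v extend length-(k-1) walks ending at a neighbour.
        walks = [sum(walks[u] for u in nbrs[v]) for v in range(len(nodes))]
        total += lam ** k * sum(walks)
    return total

# Ethanol (C-C-O) and dimethyl ether (C-O-C), hydrogens omitted.
ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ether = (["C", "O", "C"], [[1], [0, 2], [1]])
print(random_walk_kernel(*ethanol, *ether, lam=0.5, max_len=2))  # 8.5
```

The decay factor `lam` keeps long walks from dominating the score; the kernel is symmetric in its two arguments, as a kernel must be.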

Maximum common graph methods
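Methods in this family compare two molecules through the largest substructure they share. As an illustrative sketch only, assuming a hypothetical representation of atom labels plus adjacency lists (bond orders ignored), an exhaustive backtracking search for the maximum common induced subgraph looks like this. Its running time is exponential, so it only suits very small molecules, and unlike many chemistry-oriented definitions it does not require the common subgraph to be connected.

```python
def max_common_induced_subgraph(labels1, adj1, labels2, adj2):
    """Exhaustive search for the largest mapping atom_in_G1 -> atom_in_G2 that
    preserves atom labels and, for every mapped pair, presence/absence of a bond."""
    n1 = len(labels1)
    best = {}

    def search(i, mapping, used):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        # Stop when all atoms are decided or the remaining atoms cannot beat `best`.
        if i == n1 or len(mapping) + (n1 - i) <= len(best):
            return
        for j in range(len(labels2)):
            if j in used or labels2[j] != labels1[i]:
                continue
            # Induced subgraph: bonded in G1 exactly when bonded in G2.
            if all((u in adj1[i]) == (v in adj2[j]) for u, v in mapping.items()):
                mapping[i] = j
                used.add(j)
                search(i + 1, mapping, used)
                del mapping[i]
                used.remove(j)
        search(i + 1, mapping, used)  # alternatively, leave atom i unmapped

    search(0, {}, set())
    return best

ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ether = (["C", "O", "C"], [[1], [0, 2], [1]])
mcs = max_common_induced_subgraph(*ethanol, *ether)
print(len(mcs), mcs)  # 2 {1: 0, 2: 1}  (the shared C-O unit)
```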

Coding(Molecule_i)

Molecular query methods

  • Warmr
  • AGM
  • PolyFARM
  • FSG
  • MolFea
  • MoFa/MoSS
  • Gaston
  • LAZAR
  • ParMol
  • optimized gSpan
  • SMIREP
  • DMax
  • SAm/AIm/RHC
  • AFGen
  • gRed
  • G-Hash
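Methods in this group code each molecule on its own, typically by mining substructures that occur frequently across a dataset and using them as features. The sketch below is a deliberately simplified illustration, assuming that only linear (path-shaped) fragments are mined, by brute-force enumeration; tools such as gSpan or Gaston mine general subgraphs with far more efficient pattern-growth and pruning strategies. The `min_support` threshold, the helper names and the three-molecule dataset are invented for the example.

```python
from collections import Counter

def linear_fragments(labels, adj, max_bonds=2):
    """Canonical label strings of all simple paths with up to max_bonds bonds."""
    frags = set()

    def extend(path):
        seq = [labels[i] for i in path]
        # A path and its reverse describe the same fragment; keep one spelling.
        frags.add(min("-".join(seq), "-".join(reversed(seq))))
        if len(path) - 1 < max_bonds:
            for nxt in adj[path[-1]]:
                if nxt not in path:  # no repeated atoms
                    extend(path + [nxt])

    for start in range(len(labels)):
        extend([start])
    return frags

def frequent_fragments(molecules, min_support):
    """Sorted fragments that occur in at least min_support molecules."""
    support = Counter()
    for labels, adj in molecules:
        support.update(linear_fragments(labels, adj))  # counted once per molecule
    return sorted(f for f, n in support.items() if n >= min_support)

def encode(mol, features):
    """Binary feature vector: 1 if the molecule contains the fragment."""
    present = linear_fragments(*mol)
    return [int(f in present) for f in features]

dataset = [
    (["C", "C", "O"], [[1], [0, 2], [1]]),  # ethanol
    (["C", "O", "C"], [[1], [0, 2], [1]]),  # dimethyl ether
    (["C", "C", "C"], [[1], [0, 2], [1]]),  # propane
]
features = frequent_fragments(dataset, min_support=2)
print(features)  # ['C', 'C-C', 'C-O', 'O']
print([encode(m, features) for m in dataset])  # [[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 0]]
```

The resulting vectors can be fed to any standard learner, which is the sense in which each molecule is coded independently of the others.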

Methods based on special architectures of neural networks

  • BPZ
  • ChemNet
  • CCS
  • MolNet
  • Graph machines
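A common idea behind neural approaches to molecular graphs is that the same parametrized function is applied at every atom, so the network's shape follows the molecule and the output does not depend on how the atoms are numbered. The sketch below is a generic message-passing forward pass illustrating that idea; it is not an implementation of any of the listed methods, and the atom types, weight values and number of rounds are arbitrary, untrained assumptions.

```python
import math

# Arbitrary, untrained parameters (hypothetical); atoms are one-hot over ATOM_TYPES.
ATOM_TYPES = ["C", "O"]
W_SELF = [[0.5, -0.2], [0.1, 0.4]]
W_NBR = [[0.3, 0.2], [-0.1, 0.6]]
W_OUT = [1.0, -0.5]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def predict(labels, adj, rounds=2):
    """Apply the same update at every atom, then sum-pool to one number."""
    h = [[1.0 if t == lab else 0.0 for t in ATOM_TYPES] for lab in labels]
    for _ in range(rounds):
        new_h = []
        for i in range(len(labels)):
            # Sum of neighbour states: independent of atom numbering.
            agg = [sum(h[j][d] for j in adj[i]) for d in range(len(ATOM_TYPES))]
            pre = [a + b for a, b in zip(matvec(W_SELF, h[i]), matvec(W_NBR, agg))]
            new_h.append([math.tanh(x) for x in pre])
        h = new_h
    pooled = [sum(col) for col in zip(*h)]  # sum over atoms
    return sum(w * x for w, x in zip(W_OUT, pooled))

ethanol = (["C", "C", "O"], [[1], [0, 2], [1]])
ethanol_renumbered = (["O", "C", "C"], [[1], [0, 2], [1]])
print(predict(*ethanol), predict(*ethanol_renumbered))  # equal up to rounding
```

In a real model the weights would be fitted to measured properties; here they only serve to show that renumbering the atoms leaves the prediction unchanged.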