[an error occurred while processing this directive] [an error occurred while processing this directive] spacer

Seminar Wissensentdeckung in der Bioinformatik

Blockseminar: 16.01., 30.01., 06.02., 13.02., Fr 8:30-10:00; Raum: Z 613

Dozenten:

Thomas Gabriel, Michael Berthold

Termine:

  • Vortrag 16.01.: Georg, F.
    Mining Association Rules between Sets of Items in Large Databases
  • Vortrag 16.01.: Schmidt, F.
    New Algorithms for Fast Discovery of Association Rules

  • Vortrag 30.01.: Memmel, Th.
    Molecular Feature Mining in HIV Data
  • Vortrag 30.01.: Barro, H.
    Structural Mining of Molecular Biology Data

  • Vortrag 06.02: Scherer, B.
    Pharmacophore Discovery using the Inductive Logic Programming System Progol

  • Vortrag 13.02.: Graf, S.
    Mining Molecular Fragments: Finding Relevant Substructures of Molecules
  • Vortrag 13.02.: Gundelsweiler, F.
    CloseGraph: Mining Closed Frequent Graph Patterns

  • Themengebiet:

    Angewandte Informatik

    Adressaten:

    Studierende des Information Engineering im Bachelor- und Masterstudium. Es besteht die Möglichkeit dieses Seminar im Rahmen eines Projektpraktikums durchzuführen.

    Inhalt:

    Gegenstand dieses Seminars ist die Entdeckung von Wissen speziell im Gebiet der Bioinformatik. Eine konkrete Vorstellung der Inhalte wird beim ersten Seminartermin gegeben.

    Literatur:

    Mining Association Rules between Sets of Items in Large Databases (1993)
    Rakesh Agrawal, Tomasz Imielinski, and Arun Swami

    Abstract. We are given a large database of customer transactions. Each transaction consists of items purchased by a customer in a visit. We present an efficient algorithm that generates all significant association rules between items in the database. The algorithm incorporates buffer management and novel estimation and pruning techniques. We also present results of applying this algorithm to sales data obtained from a large retailing company, which shows the effectiveness of the algorithm.
    more.

    New Algorithms for Fast Discovery of Association Rules (1997)
    Mohammed Javeed Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, and Wei Li

    Abstract. Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The related database ...
    more.

    Molecular Feature Mining in HIV Data (2001)
    Stefan Kramer, Luc De Raedt, and Christoph Helma

    Abstract. We present the application of Feature Mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. The database consists of 43576 compounds, which were measured for their capability to protect human cells from HIV-1 infection. According to these measurements, the compounds were classified as either active, moderately active or inactive. The distribution of classes is extremely skewed: Only 1.3 % of the molecules is known to be active, and 2.7 % is known to be moderately active.Given this database, we were interested in molecular substructures (i.e., features) that are frequent in the active molecules, and infrequent in the inactives. In data mining terms, we focused on features with a minimum support in active compounds and a maximum support in inactive compounds. We analyzed the database using the levelwise version space algorithm that forms the basis of the inductive query and database system MOLFEA (Molecular Feature Miner). Within this framework, it is possible to declaratively specify the features of interest, such as the frequency of features on (possibly different) datasets as well as on the generality and syntax of them. Assuming that the detected substructures are causally related to biochemical mechanisms, it should be possible to facilitate the development of new pharmaceuticals with improved activities.
    more.

    Mining Molecular Fragments: Finding Relevant Substructures of Molecules (2002)
    Christian Borgelt and Michael R. Berthold

    Abstract. We present an algorithm to find fragments in a set of molecules that help to discriminate between different classes of, for instance, activity in a drug discovery context. Instead of carrying out a brute-force search, our method generates fragments by embedding them in all appropriate molecules in parallel and prunes the search tree based on a local order of the atoms and bonds, which results in substantially faster search by eliminating the need for frequent, computationally expensive reembeddings and by suppressing redundant search. We prove the usefulness of our algorithm by demonstrating the discovery of activity-related groups of chemical compounds in the well-known National Cancer Institute’s HIV-screening dataset.
    more.

    Large Scale Mining of Molecular Fragments with Wildcards (2003)
    Heiko Hofer, Christian Borgelt, and Michael Berthold

    Abstract. The main task of drug discovery is to find novel bioactive molecules, i.e., chemical compounds that, for example, protect human cells against a virus. One way to support solving this task is to analyze a database of known and tested molecules with the aim to build a classifier that predicts whether a novel molecule will be active or inactive, so that future chemical tests can be focused on the most promising candidates. In [1] an algorithm for constructing such a classifier was proposed that uses molecular fragments to discriminate between active and inactive molecules. In this paper we present two extensions of this approach: A special treatment of rings and a method that finds fragments with wildcards based on chemical expert knowledge.
    more.

    CloseGraph: Mining Closed Frequent Graph Patterns (2003)
    Xifeng Yan and Jiawei Han

    Abstract. Recent research on pattern discovery has progressed from mining frequent itemsets and sequences to mining structured patterns including trees, lattices, and graphs. As a general data structure, graph can model complicated relations among data with wide applications in bioinformatics, Web exploration, and etc. However, mining large graph patterns is challenging due to the presence of an exponential number of frequent subgraphs. Instead of mining all the subgraphs, we propose to mine closed frequent graph patterns. A graph g is closed in a database if there exists no proper subergraph of g that has the same support as g. A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods. Our performance study shows that CloseGraph not only dramatically reduces unnecessary subgraphs to be generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns.
    more.

    gSpan: Graph-Based Substructure Pattern Mining (2002)
    Xifeng Yan and Jiawei Han

    Abstract. We investigate new approaches for frequent graph-based pattern mining in graph datasets and propose a novel algorithm called gSpan (graph-based Substructure pattern mining) , which discovers frequent substructures without candidate generation. gSpan builds a new lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Our performance study shows that gSpan substantially outperforms previous algorithms, sometimes by an order of magnitude.
    more.

    CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets (2000)
    Jian Pei, Jiawei Han, and Runying Mao

    Abstract. Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules to be presented. In this paper, we propose an efficient algorithm, CLOSET, for mining closed itemsets, with the development of three techniques: (1) applying a compressed, frequent ...
    more.

    An Efficient Algorithm for Discovering Frequent Subgraphs (2002)
    Michihiro Kuramochi and George Karypis

    Abstract. Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to non-traditional domains, existing frequent pattern discovery approach cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph databases. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 100,000 graph transactions and scales linearly with respect to the size of the database.
    more.

    Structural Mining of Molecular Biology Data (2001)
    Diane J. Cook, Lawrence B. Holder, Shaobing Su, Ron Maglothin, and Istvan Jonyer

    Abstract. The increasing amount and complexity of molecular biology data evokes a need to focus on automated methods for mining this data. In addition, molecular biology data is frequently structural in nature, or is composed of parts and relations between the parts. Hence, there exists a need to develop tools to analyze and discover concepts in structural databases. The goal of this research is to provide a system that mines databases represented as graphs. We demonstrate how the Subdue system can ...
    more.

    Pharmacophore Discovery using the Inductive Logic Programming System Progol (1998)
    Paul Finn, et al.

    Abstract. This paper presents a case study of a machine-aided knowledge discovery process within the general area of drug design. Within drug design, the particular problem of pharmacophore discovery is isolated, and the Inductive Logic Programming (ILP) system progol is applied to the problem of identifying potential pharmacophores for ACE inhibition. The case study reported in this paper supports four general lessons for machine learning and knowledge discovery, as well as more specific lessons for pharmacophore discovery, for Inductive Logic Programming, and for ACE inhibition. The general lessons for machine learning and knowledge discovery are as follows.
    more.

    Leistungsnachweis:

    Schriftliche Ausarbeitung und Präsentation eines Konferenzpapiers, wobei die Gesamtnote aus dem arithmetischen Mittel beider Leistungen gebildet wird.

    Leistungspunkte:

    Bei Bestehen des Leistungsnachweises können 3 Punkte erworben werden.

    Pool

    bioinf_W03 [an error occurred while processing this directive]