Max-Planck-Institut für Informatik

Gene Expression Data

Our goal is to improve predictions from gene expression data by integrating other data types. Microarray experiments allow the simultaneous monitoring of mRNA transcript levels for thousands of genes in a single experiment. The resulting expression profiles reflect gene activity at specific time points or under particular conditions, and they are the starting point for investigating the underlying biology. Typically, researchers analyse data sets containing long lists of up-regulated or down-regulated genes. Taking the known structural, regulatory or enzymatic roles of the corresponding proteins into account can significantly improve the functional interpretation of the results. Of particular interest are the biological processes in which these proteins play a role.

We develop methods for scoring Gene Ontology (GO) groups by their relevance in microarray experiments. We exploit the hierarchical graph structure of the GO annotation to cope with the large number of GO groups. Related biological terms often receive similar statistical significance, because neighbouring terms in the graph share many of their annotated genes. Our methods analyse the GO graph structure, localise dependencies between GO terms and remove them. As a result, two neighbouring GO terms that both score as significant in our analysis are enriched with distinct genes. Our algorithms can accommodate a large class of group-based test statistics, making them suitable for a broad range of biological problems. The algorithms have been applied to several gene expression data sets from prostate cancer patients. The corresponding software package, topGO, is written in the statistical language R and is part of the Bioconductor repository. This research has been published in the journals 'Bioinformatics' and 'Molecular Cancer'.
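The decorrelation idea described above can be illustrated with a small sketch. It is not the topGO implementation (topGO is an R package); it is a simplified Python illustration under stated assumptions: each term is scored with a one-sided hypergeometric test, terms are visited from the most specific to the most general, and the genes of every significant term are removed from all of its ancestors before those are scored. The function names, the `alpha` threshold and the toy data are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n_group, n_sig, n_total):
    """P(X >= k) for X = number of significant genes among n_group genes
    drawn without replacement from n_total genes, n_sig of them significant."""
    upper = min(n_group, n_sig)
    # Sum exact integer counts first, then divide once.
    favourable = sum(
        comb(n_sig, i) * comb(n_total - n_sig, n_group - i)
        for i in range(k, upper + 1)
    )
    return favourable / comb(n_total, n_group)


def decorrelated_scores(children, genes, significant, universe, alpha=0.01):
    """Score every term bottom-up, hiding genes of significant descendants.

    children:    term -> list of more specific child terms (a DAG)
    genes:       term -> set of annotated genes (must include every child term)
    significant: set of differentially expressed genes (subset of universe)
    universe:    set of all genes measured in the experiment
    alpha:       hypothetical cut-off deciding when a term "claims" its genes
    """
    pvals, blocked = {}, {}

    def visit(term):
        if term in pvals:
            return
        inherited = set()
        for child in children.get(term, ()):
            visit(child)                      # children are scored first
            inherited |= blocked[child]
        # Ignore genes already explained by significant descendants.
        remaining = (genes[term] & universe) - inherited
        k = len(remaining & significant)
        p = hypergeom_tail(k, len(remaining), len(significant), len(universe))
        pvals[term] = p
        # A significant term passes its own genes up to its ancestors.
        blocked[term] = (inherited | remaining) if p < alpha else inherited

    for term in genes:
        visit(term)
    return pvals


# Toy example: 20 genes, 4 of them significant, all inside the child term.
universe = {f"g{i}" for i in range(20)}
significant = {"g0", "g1", "g2", "g3"}
genes = {
    "parent": {f"g{i}" for i in range(10)},
    "child": {f"g{i}" for i in range(5)},
}
children = {"parent": ["child"]}
pvals = decorrelated_scores(children, genes, significant, universe)
```

On this toy input the child term keeps its small p-value, while the parent, scored only on the genes it does not share with the child, is no longer significant. Any other group-based test statistic could be substituted for `hypergeom_tail` in this sketch.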

We have developed ScorePAGE, a statistical approach to scoring changes in the activity of metabolic pathways from gene expression data. The method identifies the biologically relevant pathways and assigns each a statistical significance. Including information about pathway topology in the score further improves the sensitivity of the method. This research has been published in the journal 'Statistical Applications in Genetics and Molecular Biology'.
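As a rough illustration of pathway scoring with topology, the sketch below makes several assumptions that are not taken from the text above: the pathway score is the mean pairwise Pearson correlation between the expression profiles of the pathway's genes, gene pairs that lie closer together in the pathway receive a larger weight (1/distance), and significance is estimated by placing randomly drawn genes on the same pathway positions. It is not the ScorePAGE implementation, and all names and parameters are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long profiles (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def pathway_score(profiles, genes, distance=None):
    """Weighted mean pairwise correlation of the genes placed on a pathway.

    genes:    ordered list of at least two genes; position i is a pathway node
    distance: optional dict (i, j) -> positive path length between positions
              i < j; assumed weighting, closer pairs count more
    """
    num = den = 0.0
    for i, j in combinations(range(len(genes)), 2):
        w = 1.0 if distance is None else 1.0 / distance[(i, j)]
        num += w * pearson(profiles[genes[i]], profiles[genes[j]])
        den += w
    return num / den


def permutation_pvalue(profiles, genes, distance=None, n_perm=1000, seed=0):
    """Compare the observed score with scores of random gene sets that are
    placed on the same pathway positions, so the topology stays fixed."""
    rng = random.Random(seed)
    observed = pathway_score(profiles, genes, distance)
    all_genes = sorted(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(all_genes, len(genes))
        if pathway_score(profiles, sample, distance) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

A call such as `permutation_pvalue(profiles, ["geneA", "geneB", "geneC"], distance={(0, 1): 1, (0, 2): 2, (1, 2): 1})` would score a linear three-step pathway, where `profiles` maps each gene name to its list of expression values.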


  1. Wolfgang A Schulz, Adrian Alexa, Volker Jung, Christiane Hader, Michele J Hoffmann, Masanori Yamanaka, Sandy Fritzsche, Agnes Wlazlinski, Mirko Muller, Thomas Lengauer, Rainer Engers, Andrea R Florl, Bernd Wullich, Jörg Rahnenführer
    Factor interaction analysis for chromosome 8 and DNA methylation alterations highlights innate immune response suppression and cytoskeletal changes in prostate cancer
    Molecular Cancer 6, 2007
    (Abstract at journal homepage)
  2. Adrian Alexa, Jörg Rahnenführer, Thomas Lengauer
    Improved scoring of functional groups from gene expression data by decorrelating GO graph structure
    Bioinformatics 22(13), 1600-1607, 2006
    (Abstract at journal homepage)
  3. Jörg Rahnenführer, Francisco S. Domingues, Jochen Maydt, Thomas Lengauer
    Calculating the statistical significance of changes in pathway activity from gene expression data
    Statistical Applications in Genetics and Molecular Biology 3, No.1, Article 16, 2004
    (Abstract at journal homepage)