
Seminar "Selected Topics in Information Extraction", Winter 2013/2014


  • This seminar is organized by Rainer Gemulla and Luciano Del Corro.
  • Regular meetings take place every Tuesday, 10:15-11:45, in Room 021, Building E1.4 (MPI-INF).

How to register?

Registration is closed.


Recent topics in the area of information extraction


In this seminar, you will
  • Read, understand, and explore scientific literature
  • Summarize a current research topic in a concise report (5 pages)
  • Give a full presentation about your topic (45 minutes)
  • Give a flash presentation about your topic (5 minutes)
  • Moderate a scientific discussion about a topic of one of your fellow students

Requirements for the Certificate

  • All deadlines are firm.
  • Pick a topic from the list below. Prepare a 45-minute presentation about your topic to introduce it to your fellow students. For each topic, we provide you with a main paper. To put the main paper into context, we expect you to (briefly) present at least one self-selected, related paper in your presentation.
  • Make a first appointment with your tutor (who will be announced along with the topics) to discuss the outline of your presentation at least 4 weeks in advance of your presentation. You are responsible for scheduling meetings with your tutor.
  • Point out advantages or potential weaknesses of the work covered in your presentation. If you are unsure about what to present, talk to your tutor. Note that, even though relevant presentations may be available on the web, we expect you to prepare your own slides (which may, of course, be inspired by the original slides). Send your slides to your tutor and discuss them at least 2 weeks before your talk; otherwise, your talk may be cancelled. Send the final version of your slides to both your tutor and the moderator on the Friday before your presentation.
  • Each presentation is followed by approximately 15 minutes of discussion. The discussion is moderated by a second student. The moderator's role is to provide interesting input (such as observations, questions, related work) for the discussion and, in general, to enable a constructive discussion. A preliminary version of the presenter's slides will be sent to the moderator on the Friday before the presentation.
  • Three weeks after your talk, submit a short report (not longer than 5 pages) about your topic. The report should concisely summarize the article and point out strengths and weaknesses.
  • In our last meeting, give a 5-minute flash presentation about your topic. As before, discuss your slides with your tutor at least 2 weeks before the presentation.
  • Attend all presentations, not just your own. If you are ill, let us know in advance.
  • Actively participate in the discussions.
  • Slides, presentations, and reports must be prepared in English.
  • Your final grade is influenced by: your oral presentations, your knowledge about your topic (e.g., as shown in the discussion after your presentation), your performance as a moderator, your general participation in the seminar, and your written report.

Tentative Schedule

  • Oct. 15, 2013: Introduction
    Materials: Organization (pdf), Giving Conference Talks (pptx)

  • Nov. 5, 2013: Linguistic Resources
    Materials: slides (pdf)

  • Nov. 12, 2013: Knowledge Bases
    Main reference: J. Hoffart, F. Suchanek, K. Berberich, G. Weikum
    YAGO2: A Spatially and Temporally Enhanced Knowledge Base from Wikipedia
    Special issue of the Artificial Intelligence Journal, 2012
    Presenter: Abdul Sattar
    Moderator: Christian Schulte
    Tutor: Luciano Del Corro
    Materials: slides (pdf)

  • Nov. 19, 2013: Table Extraction
    Main reference: M. J. Cafarella, A. Halevy, D. Zhe Wang, E. Wu, Y. Zhang
    WebTables: exploring the power of tables on the web
    Proceedings of the VLDB Endowment, 2008
    Presenter: Alexandr Chernov
    Moderator: Sugavanesh Sadasivam Nagarathinam
    Tutor: Luciano Del Corro
    Materials: slides (pdf)

  • Nov. 26, 2013: Named Entity Recognition and Disambiguation
    Main reference: J. Hoffart, M. A. Yosef, I. Bordino, H. Fürstenau, M. Pinkal, M. Spaniol, B. Taneva, S. Thater, G. Weikum
    Robust disambiguation of named entities
    Conference on Empirical Methods in Natural Language Processing, 2011
    Presenter: Abdalghani Abujabal
    Moderator: Ramkumar Aruchamy
    Tutor: Rainer Gemulla

  • Dec. 3, 2013: Rule-Based Information Extraction
    Main reference: R. Krishnamurthy, Y. Li, S. Raghavan, F. Reiss, S. Vaithyanathan, H. Zhu
    SystemT: a system for declarative information extraction
    ACM SIGMOD, 2008
    Presenter: Cholpon Degenbaeva
    Moderator: Amir Hossein Baradaran
    Tutor: Rainer Gemulla

  • Dec. 10, 2013: Relation Extraction
    Main reference: M. Mintz, S. Bills, R. Snow, D. Jurafsky
    Distant supervision for relation extraction without labeled data
    Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, 2009
    Presenter: Christian Schulte
    Moderator: Dilafruz Amanova
    Tutor: Luciano Del Corro

  • Dec. 17, 2013: Open Information Extraction
    Main reference: A. Fader, S. Soderland, O. Etzioni
    Identifying relations for open information extraction
    Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011
    Presenter: Sugavanesh Sadasivam Nagarathinam
    Moderator: Susanne Fertmann
    Tutor: Rainer Gemulla

  • Jan. 7, 2014: Relation Clustering
    Main reference: S. Riedel, L. Yao, A. McCallum, B. M. Marlin
    Relation Extraction with Matrix Factorization and Universal Schemas
    Conference of the North American Chapter of the Association for Computational Linguistics, 2013
    Presenter: Amir Hossein Baradaran
    Moderator: Abdul Sattar
    Tutor: Rainer Gemulla

  • Jan. 14, 2014: Semantic Role Labeling
    Main reference: L. Márquez, X. Carreras, K. C. Litkowski, S. Stevenson
    Semantic Role Labeling: An Introduction to the Special Issue
    Journal Computational Linguistics, 2008
    Presenter: Susanne Fertmann
    Moderator: Alexandr Chernov
    Tutor: Luciano Del Corro

  • Jan. 21, 2014: Question Answering
    Main reference: A. Fader, L. Zettlemoyer, O. Etzioni
    Paraphrase-Driven Learning for Open Question Answering
    Association for Computational Linguistics, 2013
    Presenter: Ramkumar Aruchamy
    Moderator: Cholpon Degenbaeva
    Tutor: Rainer Gemulla

  • Jan. 28, 2014: Reasoning
    Main reference: G. Kasneci, J. Van Gael, R. Herbrich, T. Graepel
    Bayesian knowledge corroboration with logical rules and user feedback
    ECML PKDD, 2010
    Presenter: Dilafruz Amanova
    Moderator: Abdalghani Abujabal
    Tutor: Luciano Del Corro

  • Feb. 4, 2014: Flash presentations