Seminar "Information Extraction & Knowledge Harvesting", Winter 2010/2011
Dr. Martin Theobald, Dr. Rainer Gemulla
Organization
- Regular meetings take place on Wednesdays from 16:15 to 17:15 in Room 021, Building E1.4 (MPI-INF).
Contents of the Seminar
The seminar provides an overview of information extraction (IE) techniques, the design and creation of knowledge bases, and their manual and automatic population.
We start with a number of classic topics, such as Hearst-pattern-based IE and Wrapper Induction, before we move on to more recent trends with statistical models, declarative IE approaches, and open information extraction.
Requirements for the Certificate
- Attend all talks - not just your own. If you are ill, please let us know in advance by writing a short mail.
- Prepare a 45-minute talk about your topic that introduces the matter to your fellow students. Talks will be followed by approximately 15 minutes of discussion.
- Make a first appointment with your tutor (who will be announced along with the topics) to go through the outline of your talk a few weeks in advance. You are responsible for scheduling the meetings with your tutor.
- You are very welcome to point out the advantages or potential weaknesses of the paper(s) in your talk. If you are unsure about what to present, ask your tutor.
Note that, even though presentations of some papers are available on the Web, we expect you to prepare your own slides (which may, of course, be inspired by the original slides).
You must send your slides to your tutor and discuss them with him by the Monday before your talk (4pm) at the latest; otherwise, your talk will be cancelled.
- Both the slides and the presentation itself must be given in English.
Otherwise, some students would not be able to follow all talks, and following them is one of the main purposes of the seminar. After the presentations, there will be a discussion in which all fellow students are encouraged to ask questions.
- For each talk, a second student will be preselected as an opponent. His or her role is to prepare tough questions to challenge the paper presented in the talk (not the talk itself or the speaker!).
To make life a little easier, the preliminary version of the slides will be sent to the opponent on the Monday before the talk. However, as interaction is an important part of science, we expect every participant to contribute actively to the discussions.
- Two weeks after the talk, the presenter and the opponent together have to submit a short report (usually not longer than 5 pages) about the topic of the talk. The focus of this report should be on pointing out strengths and weaknesses of the approach presented in the paper(s), not just on summarizing the paper(s).
- Your final grade will be influenced by the following components: your oral presentation, your knowledge of your topic (your answers to questions after the presentation), the questions you ask as opponent, your general participation in the seminar, and your two written reports (one in the role of presenter, one in the role of opponent).
Overview of Information Extraction & Knowledge Harvesting
- Wednesday, 3.11.2010, 16:15
- Speaker: Martin Theobald
- Opponent: Rainer Gemulla
- Slides: [PPT] [PDF]
Early Pattern-based Information Extraction
- Wednesday, 10.11.2010, 16:15
- Speaker: Niket Tandon
- Opponent: Natalia Prytkova
- Tutor: Martin Theobald
- Slides: [PPT] [PDF]
- Report: [PDF]
Hand-built Knowledge Representation Frameworks
- Wednesday, 17.11.2010, 16:15
- Speaker: Hossein Khoshnevis
- Opponent: Andreas Frische
- Tutor: Rainer Gemulla
- Slides: [PPT] [PDF]
- Report: [PDF]
Wrapper Induction
- Wednesday, 24.11.2010, 16:15
- Speaker: Hanna Mousa
- Opponent: Ma Jianan
- Tutor: Rainer Gemulla
- Slides: [PDF]
- Report: [PDF]
DBpedia & YAGO
- Wednesday, 1.12.2010, 16:15
- DBpedia: A nucleus for a web of open data.
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, Z. Ives. ISWC, 2007.
- Linked Data - The Story So Far.
C. Bizer, T. Heath, T. Berners-Lee. Int. J. Semantic Web Inf. Syst, 2009.
- YAGO: a core of semantic knowledge.
F. M. Suchanek, G. Kasneci, G. Weikum. WWW, 2007.
- Speaker: Ma Jianan
- Opponent: Lars Reiter
- Tutor: Martin Theobald
- Slides: [PPTX] [PDF]
- Report: [PDF]
Statistical Relational Learning (I): Markov Logic Networks & Applications
- Wednesday, 15.12.2010, 16:15
- Markov Logic Networks.
M. Richardson, P. Domingos. Mach. Learn., 2006.
- StatSnowball: a Statistical Approach to Extracting Entity Relationships.
J. Zhu, Z. Nie, X. Liu, B. Zhang, J.-R. Wen. WWW, 2009.
- BioSnowball: Automated Population of Wikis.
X. Liu, Z. Nie, N. Yu, J.-R. Wen. KDD, 2010.
- Speaker: Luciano Del Corro
- Opponent: Niket Tandon
- Tutor: Rainer Gemulla
- Slides: [PPTX] [PDF]
- Report: [PDF]
Wikipedia-based Information Extraction Frameworks (Kylin/KOG)
- Wednesday, 12.01.2011, 16:15
- Speaker: Madina Boshtayeva
- Opponent: Hanna Mousa
- Tutor: Martin Theobald
- Slides: [PPTX] [PDF]
- Report: [PDF]
Segmentation & Disambiguation
- Wednesday, 19.01.2011, 16:15
- Speaker: Aliaksandr Talaika
- Opponent: Matej Korvas
- Tutor: Martin Theobald
- Slides: [PPTX] [PDF]
- Report: [PDF]
Declarative IE: SystemT
- Wednesday, 26.01.2011, 16:15
- SystemT: A System for Declarative Information Extraction.
R. Krishnamurthy, Y. Li, S. Raghavan, F. Reiss, S. Vaithyanathan, H. Zhu. SIGMOD Rec., 2008.
- An algebraic approach to rule-based information extraction.
F. Reiss, S. Raghavan, R. Krishnamurthy, H. Zhu, S. Vaithyanathan. ICDE, 2008.
- Uncertainty management in rule-based information extraction systems.
E. Michelakis, R. Krishnamurthy, P. J. Haas, S. Vaithyanathan. SIGMOD, 2009.
- Speaker: Lars Reiter
- Opponent: Aliaksandr Talaika
- Tutor: Rainer Gemulla
- Slides: [PPTX] [PDF]
- Report: [PDF]
Open IE
- Wednesday, 02.02.2011, 16:15
- Open information extraction from the web.
M. Banko, M. J. Cafarella, S. Soderland, M. Broadhead, O. Etzioni. IJCAI, 2007.
- Extracting and Querying a Comprehensive Web Database.
M. J. Cafarella. CIDR, 2009.
- Coupled semi-supervised learning for information extraction.
A. Carlson, J. Betteridge, R. C. Wang, E. R. Hruschka Jr., T. M. Mitchell. WSDM, 2010.
- Speaker: Matej Korvas
- Opponent: Hossein Khoshnevis
- Tutor: Rainer Gemulla
- Slides: [PDF]
- Report: [PDF]
Iterative IE & Provenance
- Wednesday, 02.02.2011, 17:15
- Speaker: Natalia Prytkova
- Opponent: Madina Boshtayeva
- Tutor: Martin Theobald
- Slides: [PPT] [PDF]
- Report: [PDF]