Max Planck Institut Informatik

# Core Lecture "Information Retrieval and Data Mining" WS 2013/14

## Lectures

• Tuesdays 16:00–18:00 and Thursdays 14:00–16:00, Building: E1.3, HS-002.

## Office Hours

• Klaus Berberich, Tuesdays 14:00–16:00.
• Pauli Miettinen, Tuesdays 14:00–16:00.

## News & Announcements

• Paper certificates (only for those not participating in HISPOS) can be picked up in E1 4 / R402.

• The re-exam results are online. Check them here. The inspection will be on March 27th from 2 PM to 4 PM in the rotunda of the Databases and Information Systems Department, Campus E1 4, 4th floor.

• The final exam results are online. Check them here.

• The inspection for the final exam will be on March 4th from 2 PM to 4 PM in the rotunda of the Databases and Information Systems Department, Campus E1 4, 4th floor.

• Please check your bonus points and eligibility to take the final exam.

• Homework 13 is online.

• There will be a question-and-answer session at the end of the last lecture. Students can send their questions to the lecturers in advance (by Wednesday at the latest).

• The results of midterm 3 have been announced.
Students who obtained 7 or more points pass; those with 16 or more points earn a bonus point. The inspection for midterm 3 will be on February 4 from 2 PM to 4 PM in the rotunda of the Databases and Information Systems Department, Campus E1 4, 4th floor.
Check the results here.

• The solution to Problem 1 of Homework 11 has been updated to show the correct information gains for the second split.

• Homework 12 is online. Compared to the printed version, Problem 2 now specifies minsup = 1.

• The solution to Problem 2 of Homework 9 has been updated to fix a typo in the item names in the FP tree.

• Homework 11 is online.

• irdm-7-3-4.pdf has been updated to fix a typo in the equations on slide 32. Note that there is another typo in Equations (9.3) and (9.4) on page 281 of Zaki & Meira: the exponents should read |X \ W| instead of |X \ Y|.

• Homework 10 is online.

• Homework 9 has been updated; the minimum support has been added to Problem #7.

• Homework 9 is online.

• There is no tutorial on Monday, 23 December 2013, due to the Christmas holidays.

• The results of midterm 2 have been announced.
Students who obtained 7 or more points pass; those with 16 or more points earn a bonus point. The exam inspection will be on December 19 from 4 PM to 6 PM in the rotunda of the Databases and Information Systems Department, Campus E1 4, 4th floor.
Check the results here.

• Homework 8 is online.

• Homework 7 is online.

• Final exam: February 13, 2014, from 2 PM to 5 PM. Place: E2.2, Guenter Hotz Lecture Hall.

• Re-exam: March 17, 2014, from 2 PM to 5 PM. Place: E2.2, Guenter Hotz Lecture Hall.

• Homework 6 is online.

• Homework 5 is online.

• The results of midterm 1 have been announced.
Students who obtained 7 or more points pass; those with 16 or more points earn a bonus point. The exam inspection will be on November 26 from 2 PM to 4 PM in the rotunda of the Databases and Information Systems Department, Campus E1 4, 4th floor. Check the results here.

• Homework 4 is online. Problem 4 of Homework 4 has been clarified.

• Exam rules are here.

• Homework 3 is online.

• The slides of the lecture on October 29 have been updated to fix small miscalculations on slides #17 and #21.

• The tutorial groups have been announced. Check them here.

• Exercise 3.c in Homework 1 has changed. Please download the new exercise sheet and solve it accordingly.

• The tutorials scheduled for Friday, 1 November 2013, will be held on Tuesday, 5 November 2013, at 18:00 in R021 and R023 (E1.4, MPI-INF).

• Registration is closed.

• Please register before 22nd October 2013, 23:59 (Berlin Time).

• There is no class on 17 October 2013.

## Tutoring Groups

• Group A: Monday, 12:00–14:00, R021 (E1.4, MPI-INF)
• Group B: Monday, 14:00–16:00, R021 (E1.4, MPI-INF)
• Group C: Monday, 16:00–18:00, R021 (E1.4, MPI-INF)
• Group D: Friday, 12:00–14:00, R021 (E1.4, MPI-INF)
• Group E: Friday, 14:00–16:00, R021 (E1.4, MPI-INF)

## Content

The lecture teaches mathematical models and algorithms that form the basis of search engines for the Web, intranets, and digital libraries, and of data mining and analysis tools. Information Retrieval and Data Mining are technologies for searching, analyzing, and automatically organizing text documents, multimedia documents, and structured or semi-structured data.

## Prerequisites

Students planning to attend the course should be familiar with basic models and methods from linear algebra (e.g. singular-value decomposition), probability theory and statistics (e.g. Bayesian networks and Markov chains), and combinatorics.

## Literature

• Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. (Website)
• Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology behind Search. Addison-Wesley, 2010.
• W. Bruce Croft, Donald Metzler, Trevor Strohman. Search Engines: Information Retrieval in Practice. Addison-Wesley, 2009. (Website)
• Mohammed J. Zaki, Wagner Meira Jr. Fundamentals of Data Mining Algorithms. Manuscript. (PDF, requires username and password)
• Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining. Addison-Wesley, 2006. (Website)