Chapter I: Introduction

October 18 IR overview: [PDF] [PPTX]; part on data mining: [PDF] IRDM applications & demos Pauli & Martin Not relevant for the exam

Chapter II: Basics from probability theory and statistics

October 20 [PDF] [PPTX] Events, probabilities, RVs, limit theorems LW: Ch. 1-5 Martin  
October 25 [PDF] [PPTX] Sampling & statistical inference, max. likelihood, EM LW: Ch. 6,7,9 Martin  
October 27 [PDF] Hypothesis testing, regression, logistic regression LW: Ch. 10 Pauli  

Chapter III: Ranking principles

November 3 [PDF] & [PDF] Boolean IR, TF-IDF, IR evaluation MRS: Ch. 1,2,6,8 Pauli  
November 8 [PDF] [PPTX] Probabilistic IR, BM25 MRS: Ch. 11 Martin  
November 10 [PDF] [PPTX] Statistical language models MRS: Ch. 12 Martin  
November 15 [PDF] [PPTX] Relevance feedback, XML-IR MRS: Ch. 9,10; BY: Ch. 5,13 Martin  

Chapter IV: Link analysis

November 17 [PDF] PageRank MRS: Ch. 21 Pauli 1st short test
November 22 [PDF] [PPTX] HITS, topic-specific & personalized link analysis MRS: Ch. 21; see also lecture slides
November 24 [PDF] [PPTX] Spam detection, distributed link analysis, social search see lecture slides Martin  

Chapter V: Indexing & searching

November 29 [PDF] [PPTX] Inverted lists, merging vs. hashing MRS: Ch. 4,5; BY: Ch. 9; BCC: Ch. 5 Martin  
December 1 [PDF] [PPTX] Index compression, top-k query processing MRS: Ch. 5; BY: Ch. 9; BCC: Ch. 6 Martin  
December 6 [PDF] [PPTX] Top-k cont'd, open-source search engines, efficient similarity search & hashing, LSH BCC: Ch. 5; see also lecture slides

Chapter VI: Information extraction

December 8 [PDF] [PPTX] Similarity search cont'd, IE overview & motivation   Martin
December 13 [PDF] [PPTX] NLP basics, rule- and learning-based extraction, HMMs see lecture slides Martin  
December 15 [PDF] [PPTX] Entity reconciliation, knowledge base construction, Open-IE see lecture slides Martin  

Chapter VII: Frequent itemsets and association rules

December 20 [PDF] Frequent itemsets & association rules ZM: Ch. 6 Pauli 2nd short test
December 22 [PDF] [PPTX] Apriori, association rule mining, quality measures ZM: Ch. 6 Martin

No lectures from Dec. 23 - Jan. 6

Chapter VIII: Clustering

January 10 [PDF] Representative-based clustering ZM: Ch. 16, 17; TSK: Ch. 8 Pauli
January 12 [PDF] Hierarchical clustering and co-clustering Pauli  

Chapter IX: Latent topics and dimensionality reduction

January 17 [PDF] Matrix factorizations ZM: Ch. 8; TSK: App. B; MRS: Ch. 18; Extra reading: GL Pauli  
January 19 [PDF] Matrix factorizations & Latent topic models   Pauli  
January 24 [PDF] Latent topic models & dimensionality reduction ZM: Ch. 6, 8 Pauli

Chapter X: Classification

January 26 [PDF] Decision trees ZM: Ch. 24, 26, 28, 29; TSK: Ch. 4, 5.3 - 5.6 Pauli  
January 31 [PDF] Naive Bayes classification ZM: Ch. 26; TSK: Ch. 5.3 Pauli 3rd short test
February 2 [PDF] Support vector machines ZM: Ch. 5, 28; TSK: Ch. 5.5; B: Ch. 7.1 Pauli

Chapter XI: Selected topics in DM

February 7 [PDF (part 1)] [PDF (part 2)] Ensemble methods & Data Mining Outro ZM: Ch. 29; TSK: Ch. 5.6; B: Ch. 14.2-3 Pauli
February 9   Wrap-up & summary   Pauli & Martin Room changed: E2.5 (Math building) HS2

Final Exam, February 21


LW: Larry Wasserman. All of Statistics, Springer, 2004. (Website)
MRS: Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. (Website)
BY: Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2010.
BCC: Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. (Website)
ZM: Mohammed J. Zaki, Wagner Meira Jr. Fundamentals of Data Mining Algorithms, manuscript. (PDF script; username and password will be announced in the lecture)
TSK: Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining, Addison-Wesley, 2006. (Website)
GL: Gene H. Golub, Charles F. Van Loan. Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.
B: Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.