Chapter I: Introduction

October 15 IR overview: [PDF]; part on data mining: [PDF] IRDM applications & demos Pauli & Klaus Not relevant for the exam

Chapter II: Basics from probability theory, statistics, and linear algebra

October 17 No class ... ... Pauli  
October 22 [PDF] Linear algebra ZM: Ch. 7; TSK: App. B; MRS: Ch. 18; Extra reading: GL Pauli  
October 24 [PDF] Events, probabilities, limit theorems LW: Ch. 1-5 Pauli  
October 29 [PDF] Parameter estimation, confidence intervals, hypothesis testing LW: Ch. 6-7, 9, 10 Klaus  

Chapter III: Ranking principles

October 31 [PDF] Boolean IR, TF-IDF, IR evaluation MRS: Ch. 1,2,6,8 Klaus  
November 5 [PDF] Probabilistic IR, BM25 MRS: Ch. 11 Klaus  
November 7 [PDF] Statistical language models, latent topic models MRS: Ch. 12 Klaus  
November 12 [PDF] Relevance feedback, novelty & diversity MRS: Ch. 9,10; BY: Ch. 5,13 Klaus 1st short test

Chapter IV: Link analysis

November 14 [PDF] The World Wide Web as a graph, PageRank MRS: Ch. 21; RU: Ch. 5 Klaus  
November 19 [PDF] HITS MRS: Ch. 21; RU: Ch. 5 Klaus  
November 21 [PDF] Topic-specific & personalized PageRank, online link analysis, spam detection, social networks see lecture slides Klaus  

Chapter V: Indexing & searching

November 26 [PDF] Inverted lists, merging vs. hashing MRS: Ch. 4,5; BY: Ch. 9; BCC: Ch. 5 Klaus  
November 28 [PDF] Index compression, top-k query processing MRS: Ch. 5; BY: Ch. 9; BCC: Ch. 6 Klaus  
December 3 [PDF] MapReduce, open-source search engines, efficient similarity search & hashing, LSH BCC: Ch. 5; see also lecture slides

Chapter VI: Information extraction

December 5 [PDF] IE overview & motivation, NLP basics   Klaus  
December 10 [PDF] Rule- and learning-based extraction, HMMs see lecture slides Klaus  
December 12 Entity reconciliation, knowledge base construction, Open-IE see lecture slides Klaus 2nd short test

Chapter VII: Frequent itemsets and association rules

December 17 [PDF] Frequent itemsets & association rules ZM: Ch. 10; TSK: Ch. 6 Pauli  
December 19 [PDF] Association rules and summarizing itemsets ZM: Ch. 10, 11; TSK: Ch. 6 Pauli

No lectures from December 23 to January 3

Chapter VIII: Clustering

January 7 [PDF] Representative-based clustering ZM: Ch. 13; TSK: Ch. 8 Pauli  
January 9 [PDF] Hierarchical, density-based, and co-clustering ZM: Ch. 14&15; TSK: Ch. 8 Pauli  

Chapter IX: Classification

January 14 [PDF] Decision trees and Naïve Bayes ZM: Ch. 18, 19; TSK: Ch. 4, 5.3 - 5.6 Pauli  
January 16 [PDF] Support vector machines and ensemble techniques ZM: Ch. 21, 22; TSK: Ch. 5.3 Pauli  

Chapter X: Graph mining

January 21 [PDF] Centrality, random graphs, and frequent subgraph mining ZM: Ch. 4 & 11 Pauli  
January 23 [PDF] Graph clustering ZM: Ch. 16 Pauli  

Chapter XI: Two matrix factorization methods

January 28 [PDF] Two matrix factorization methods Pauli 3rd short test

Chapter XII: Data pre- and post-processing

January 30 [PDF] Curse of dimensionality and data pre-processing ZM: Ch. 2.4, 6 & 8 Pauli  
February 4 [PDF] Analyzing and visualizing results & tales from real life ZM: Ch. 2.2 Pauli  

Chapter XIII: Summary

February 6 [PDF-dm] & [PDF-ir] Wrap-up, summary, and Q&A [PDF-qa] Pauli & Klaus  

Final Exam: February 13, 2014, from 2 PM to 5 PM. Place: E 2.2, Guenter Hotz Lecture Hall

Re-Exam: March 17, 2014, from 2 PM to 5 PM. Place: E 2.2, Guenter Hotz Lecture Hall


Larry Wasserman. All of Statistics, Springer, 2004. (Website)
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. (Website)
R. Baeza-Yates, R. Ribeiro-Neto. Modern Information Retrieval: The concepts and technology behind search, Addison-Wesley, 2010. (Website)
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. (Website)
Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011. (Website)
Mohammed J. Zaki, Wagner Meira Jr. Fundamentals of Data Mining Algorithms, manuscript. (PDF script; username and password will be announced in the lecture)
Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining, Addison-Wesley, 2006. (Website)
Golub, Van Loan. Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.
Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.