Date Slides Topics Literature Lecturer Comments

Chapter I: Introduction

October 18 IR overview: [PDF] [PPTX]; part on data mining: [PDF] IRDM applications & demos Pauli & Martin Not relevant for the exam

Chapter II: Basics from probability theory and statistics

October 20 [PDF] [PPTX] Events, probabilities, RVs, limit theorems LW: Ch. 1-5 Martin  
October 25 [PDF] [PPTX] Sampling & statistical inference, max. likelihood, EM LW: Ch. 6,7,9 Martin  
October 27 [PDF] Hypothesis testing, regression, logistic regression LW: Ch. 10 Pauli  

Chapter III: Ranking principles

November 3 [PDF] & [PDF] Boolean IR, TF-IDF, IR evaluation MRS: Ch. 1,2,6,8 Pauli  
November 8 [PDF] [PPTX] Probabilistic IR, BM25 MRS: Ch. 11 Martin  
November 10 [PDF] [PPTX] Statistical language models MRS: Ch. 12 Martin  
November 15 [PDF] [PPTX] Relevance feedback, XML-IR MRS: Ch. 9,10; BY: Ch. 5,13 Martin  

Chapter IV: Link analysis

November 17 [PDF] PageRank MRS: Ch. 21 Pauli 1st short test
November 22 [PDF] [PPTX] HITS, topic-specific & personalized link analysis MRS: Ch. 21; see also lecture slides Martin
November 24 [PDF] [PPTX] Spam detection, distributed link analysis, social search see lecture slides Martin  

Chapter V: Indexing & searching

November 29 [PDF] [PPTX] Inverted lists, merging vs. hashing MRS: Ch. 4,5; BY: Ch. 9; BCC: Ch. 5 Martin  
December 1 [PDF] [PPTX] Index compression, top-k query processing MRS: Ch. 5; BY: Ch. 9; BCC: Ch. 6 Martin  
December 6 [PDF] [PPTX] Top-k cont'd, open-source search engines, efficient similarity search & hashing, LSH BCC: Ch. 5; see also lecture slides Martin

Chapter VI: Information extraction

December 8 [PDF] [PPTX] Similarity search cont'd, IE overview & motivation   Martin
December 13 [PDF] [PPTX] NLP basics, rule- and learning-based extraction, HMMs see lecture slides Martin  
December 15 [PDF] [PPTX] Entity reconciliation, knowledge base construction, Open-IE see lecture slides Martin  

Chapter VII: Frequent itemsets and association rules

December 20 [PDF] Frequent itemsets & association rules ZM: Ch. 6 Pauli 2nd short test
December 22 [PDF] [PPTX] Apriori, association rule mining, quality measures ZM: Ch. 6 Martin

No lectures from Dec. 23 - Jan. 6

Chapter VIII: Clustering

January 10 [PDF] Representation clustering ZM: Ch. 16, 17; TSK: Ch. 8 Pauli  
January 12 [PDF] Hierarchical clustering and co-clustering Pauli  

Chapter IX: Latent topics and dimensionality reduction

January 17 [PDF] Matrix factorizations ZM: Ch. 8; TSK: App. B; MRS: Ch. 18; Extra reading: GL Pauli  
January 19 [PDF] Matrix factorizations & latent topic models   Pauli
January 24 [PDF] Latent topic models & dimensionality reduction ZM: Ch. 6, 8 Pauli

Chapter X: Classification

January 26 [PDF] Decision trees ZM: Ch. 24, 26, 28, 29; TSK: Ch. 4, 5.3 - 5.6 Pauli  
January 31 [PDF] Naive Bayes classification ZM: Ch. 26; TSK: Ch. 5.3 Pauli 3rd short test
February 2 [PDF] Support vector machines ZM: Ch. 5, 28; TSK: Ch. 5.5; B: Ch. 7.1 Pauli

Chapter XI: Selected topics in DM

February 7 [PDF (part 1)] [PDF (part 2)] Ensemble methods & Data Mining Outro ZM: Ch. 29; TSK: Ch. 5.6; B: Ch. 14.2-3 Pauli
February 9   Wrap-up & summary   Pauli & Martin Room changed: E2.5 (Math building), HS2

Final Exam, February 21

Literature

[LW]
Larry Wasserman. All of Statistics, Springer, 2004. (Website)
[MRS]
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008. (Website)
[BY]
R. Baeza-Yates, R. Ribeiro-Neto. Modern Information Retrieval: The concepts and technology behind search, Addison-Wesley, 2010.
[BCC]
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010. (Website)
[ZM]
Mohammed J. Zaki, Wagner Meira Jr. Fundamentals of Data Mining Algorithms, manuscript. (PDF script; username and password will be announced in the lecture)
[TSK]
Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining, Addison-Wesley, 2006. (Website)
[GL]
Gene H. Golub, Charles F. Van Loan. Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.
[B]
Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.