Date Slides Topics Literature Lecturer Comments

Chapter I: Introduction

October 15 IR overview: [PDF], part on data mining: [PDF] IRDM applications & demos Pauli & Klaus Not relevant for the exam

Chapter II: Basics from probability theory, statistics, and linear algebra

October 17 No class ... ... Pauli  
October 22 [PDF] Linear algebra ZM: Ch. 7; TSK: App. B; MRS: Ch. 18; Extra reading: GL Pauli
October 24 [PDF] Events, probabilities, limit theorems LW: Ch. 1-5 Pauli  
October 29 [PDF] Parameter estimation, confidence intervals, hypothesis testing LW: Ch. 6-7, 9, 10 Klaus  

Chapter III: Ranking principles

October 31 [PDF] Boolean IR, TF-IDF, IR evaluation MRS: Ch. 1, 2, 6, 8 Klaus
November 5 [PDF] Probabilistic IR, BM25 MRS: Ch. 11 Klaus  
November 7 [PDF] Statistical language models, latent topic models MRS: Ch. 12 Klaus  
November 12 [PDF] Relevance feedback, novelty & diversity MRS: Ch. 9, 10; BY: Ch. 5, 13 Klaus 1st short test

Chapter IV: Link analysis

November 14 [PDF] The World Wide Web as a graph, PageRank MRS: Ch. 21; RU: Ch. 5 Klaus  
November 19 [PDF] HITS MRS: Ch. 21; RU: Ch. 5 Klaus  
November 21 [PDF] Topic-specific & personalized PageRank, online link analysis, spam detection, social networks see lecture slides Klaus  

Chapter V: Indexing & searching

November 26 [PDF] Inverted lists, merging vs. hashing MRS: Ch. 4, 5; BY: Ch. 9; BCC: Ch. 5 Klaus
November 28 [PDF] Index compression, top-k query processing MRS: Ch. 5; BY: Ch. 9; BCC: Ch. 6 Klaus  
December 3 [PDF] MapReduce, open-source search engines, efficient similarity search & hashing, LSH BCC: Ch. 5; see also lecture slides Klaus

Chapter VI: Information extraction

December 5 [PDF] IE overview & motivation, NLP basics   Klaus  
December 10 [PDF] Rule- and learning-based extraction, HMMs see lecture slides Klaus  
December 12 Entity reconciliation, knowledge base construction, Open-IE see lecture slides Klaus 2nd short test

Chapter VII: Frequent itemsets and association rules

December 17 [PDF] Frequent itemsets & association rules ZM: Ch. 10; TSK: Ch. 6 Pauli
December 19 [PDF] Association rules and summarizing itemsets ZM: Ch. 10, 11; TSK: Ch. 6 Pauli

No lectures from December 23 to January 3

Chapter VIII: Clustering

January 7 [PDF] Representative-based clustering ZM: Ch. 13; TSK: Ch. 8 Pauli
January 9 [PDF] Hierarchical, density-based, and co-clustering ZM: Ch. 14, 15; TSK: Ch. 8 Pauli

Chapter IX: Classification

January 14 [PDF] Decision trees and Naïve Bayes ZM: Ch. 18, 19; TSK: Ch. 4, 5.3-5.6 Pauli
January 16 [PDF] Support vector machines and ensemble techniques ZM: Ch. 21, 22; TSK: Ch. 5.3 Pauli  

Chapter X: Graph mining

January 21 [PDF] Centrality, random graphs, and frequent subgraph mining ZM: Ch. 4, 11 Pauli
January 23 [PDF] Graph clustering ZM: Ch. 16 Pauli  

Chapter XI: Two matrix factorization methods

January 28 [PDF] Two matrix factorization methods Pauli 3rd short test

Chapter XII: Data pre- and post-processing

January 30 [PDF] Curse of dimensionality and data pre-processing ZM: Ch. 2.4, 6, 8 Pauli
February 4 [PDF] Analyzing and visualizing results & tales from real life ZM: Ch. 2.2 Pauli

Chapter XIII: Summary

February 6 [PDF-dm] & [PDF-ir] Wrap-up, summary, and Q&A [PDF-qa] Pauli & Klaus

Final exam: February 13, 2014, 2 PM to 5 PM. Place: E 2.2, Guenter Hotz Lecture Hall

Re-exam: March 17, 2014, 2 PM to 5 PM. Place: E 2.2, Guenter Hotz Lecture Hall

Literature

[LW]
Larry Wasserman. All of Statistics, Springer, 2004.
[MRS]
Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze. Introduction to Information Retrieval, Cambridge University Press, 2008.
[BY]
R. Baeza-Yates, B. Ribeiro-Neto. Modern Information Retrieval: The Concepts and Technology behind Search, Addison-Wesley, 2010.
[BCC]
Stefan Büttcher, Charles L. A. Clarke, Gordon V. Cormack. Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010.
[RU]
Anand Rajaraman, Jeffrey D. Ullman. Mining of Massive Datasets, Cambridge University Press, 2011.
[ZM]
Mohammed J. Zaki, Wagner Meira Jr. Fundamentals of Data Mining Algorithms, manuscript (PDF script; the username and password will be announced in the lecture).
[TSK]
Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Introduction to Data Mining, Addison-Wesley, 2006.
[GL]
Gene H. Golub, Charles F. Van Loan. Matrix Computations, 3rd ed., Johns Hopkins University Press, 1996.
[B]
Christopher M. Bishop. Pattern Recognition and Machine Learning, Springer, 2006.