Data Mining and Matrices, Summer 2013
Lecturers: Rainer Gemulla, Pauli Miettinen
News
The re-exam will take place on Tuesday, 8 October, from 10 am until noon in room 023, building E1 4.
You are allowed to use a printout of the lecture slides as well as handwritten notes (but nothing else).
Content
Many data mining tasks operate on dyadic data, i.e., data involving two types
of entities (e.g., users and products, or objects and attributes); such data
can be naturally represented in terms of a matrix. Matrix decompositions, where
we (approximately) represent the data matrix as a product of two (or more) factor
matrices, can be used to perform many common data mining tasks.
In this lecture we explore the
use of matrix decompositions for denoising, discovery of
latent structure, and visualization, among others. We cover data mining tasks such as
prediction, clustering and pattern mining and application areas such as
recommender systems and topic modelling.
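
To make the idea of representing a data matrix as a product of factor matrices concrete, here is a minimal sketch using a truncated singular value decomposition in NumPy. The matrix is the document-term example from this page; the function name `truncated_svd`, the factor names `L` and `R`, and the choice of rank k = 2 are assumptions made for illustration only and are not taken from the lecture material.

```python
import numpy as np

# Document-term matrix from the example on this page
# (rows: Book 1-3, columns: Data, Matrix, Mining).
A = np.array([[5.0, 0.0, 3.0],
              [0.0, 0.0, 7.0],
              [4.0, 6.0, 5.0]])

def truncated_svd(A, k):
    """Return factors L (m x k) and R (k x n) such that L @ R is a
    rank-k approximation of A obtained from the top k singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    L = U[:, :k] * s[:k]   # scale each left singular vector by its singular value
    R = Vt[:k, :]          # top k right singular vectors (as rows)
    return L, R

# Illustrative choice: keep two of the three latent dimensions.
L, R = truncated_svd(A, k=2)
A_hat = L @ R                        # approximate reconstruction of A
error = np.linalg.norm(A - A_hat)    # Frobenius norm of the residual

print(np.round(A_hat, 2))
print("reconstruction error:", error)
```

With k = 2 the reconstruction error equals the smallest singular value of A, since that is the only part of the decomposition that was dropped; with k = 3 the product `L @ R` recovers A up to rounding.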

A document–term matrix

          Data   Matrix   Mining
Book 1       5        0        3
Book 2       0        0        7
Book 3       4        6        5

An incomplete rating matrix (? = missing rating)

          Avatar   The Matrix   Up
Alice          ?            4    2
Bob            3            2    ?
Charlie        5            ?    3


A student–course matrix

            Hot Topics in IR   IR & DM   DM & Matrices
Student A                  1         1               0
Student B                  1         1               1
Student C                  0         1               1

Cities and their average minimum temperatures (°C)

              Jan.   June   Sept.
Saarbrücken   -1     11     10
Helsinki      -6.5   10.9    8.7
Cape Town     15.7    7.8    8.7
List of topics (tentative):
 Singular value decomposition (SVD)
 Nonnegative matrix factorization (NMF)
 Semidiscrete decomposition (SDD)
 Boolean matrix factorization (BMF)
 Independent component analysis (ICA)
 Matrix completion
 Probabilistic matrix factorization
 Graphs
 Tensors
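
As a small illustration of one topic on this list, Boolean matrix factorization, the sketch below checks that the student-course example matrix from this page can be written as a Boolean product of two smaller binary matrices. The factors `B` and `C` were chosen by hand for this example and the helper `boolean_product` is a hypothetical name; neither comes from the lecture material, and no factorization algorithm is run here.

```python
import numpy as np

# Student-course matrix from the example on this page
# (rows: Student A-C; columns: Hot Topics in IR, IR & DM, DM & Matrices).
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)

# Hand-picked factors (an assumption for illustration):
# B assigns students to two latent groups, C assigns groups to courses.
B = np.array([[1, 0],
              [1, 1],
              [0, 1]], dtype=bool)
C = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=bool)

def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is the OR over k of
    (B[i, k] AND C[k, j])."""
    # Count matching k with an integer product, then threshold at zero.
    return (B.astype(int) @ C.astype(int)) > 0

A_rec = boolean_product(B, C)
num_errors = int(np.sum(A_rec != A))   # number of mismatched entries

print(A_rec.astype(int))
print("mismatched entries:", num_errors)
```

Under ordinary arithmetic the product of these factors would contain a 2 in the entry for Student B and IR & DM; the Boolean product caps it at 1, which is why the reconstruction matches the binary data exactly.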
Organization
 The course (5 credit points) has 2 hours of lectures per week and take-home assignments, but no homework groups.
 Lecture: Thursdays at 10:15 am, room 021, building E1 4 (MPI-INF), starting Apr. 18, 2013.
 Contact: dmm13@mpiinf.mpg.de
Registration
You must register in HISPOS. Please also register via email to receive news and updates from us.
Prerequisites
Basic knowledge of linear algebra.
Requirements for the certificate
 You must successfully participate in an exam at the end of the semester.
 There will be four regular assignments plus one bonus assignment (e.g., analysing a dataset or writing a short essay) in parallel to the lecture. You must pass at least 3 assignments to qualify for the exam. If an assignment is graded "excellent," you will receive bonus points, which can be used to improve your final grade.
Lecture notes
 00: Organization (pdf)
 01: Introduction (pdf)
 02: Linear algebra refresher (pdf)
 03: Singular value decomposition (pdf)
 04: Matrix completion (pdf)
 05: Semidiscrete decomposition (pdf)
 06: Nonnegative matrix factorization (pdf)
 07: Graphs I (pdf)
 08: Boolean matrix factorization (pdf)
 09: Introduction to tensors (pdf)
 10: Graphs II (pdf)
 11: Tensor applications (pdf)
 12: Probabilistic matrix factorization (pdf)
Assignments
 01: Singular Value Decomposition (pdf, data) (due May 12, 2013)
 02: Matrix Completion (pdf, data) (due June 2, 2013)
 03: NMF and Spectral Clustering (pdf, data) (due July 2, 2013)
 Bonus: Independent Component Analysis (pdf) (due July 7, 2013)
 04: Tensors (pdf, data) (due July 28, 2013)
Suggested reading
 David Skillicorn, Understanding Complex Datasets: Data Mining with Matrix Decompositions, Chapman & Hall, 2007
 See lecture notes for additional references.