Data Mining and Matrices, Summer 2013
Lecturers: Rainer Gemulla, Pauli Miettinen
News
The re-exam will take place on Tuesday, 8 October, from 10 am until noon in room 023, building E1 4.
You may bring a printout of the lecture slides and handwritten notes, but nothing else.
Content
Many data mining tasks operate on dyadic data, i.e., data involving two types
of entities (e.g., users and products, or objects and attributes); such data
can be naturally represented in terms of a matrix. Matrix decompositions, where
we (approximately) represent the data matrix as a product of two (or more) factor
matrices, can be used to perform many common data mining tasks.
In this lecture we explore the
use of matrix decompositions for denoising, discovery of
latent structure, and visualization, among others. We cover data mining tasks such as
prediction, clustering and pattern mining and application areas such as
recommender systems and topic modelling.
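As a sketch of the notation only: the approximate factorization described above can be written as follows. The symbols $A$, $B$, $C$, $m$, $n$, and $k$ are assumptions introduced here for illustration and are not taken from the lecture notes.

```latex
% A: m-by-n data matrix (e.g., rows = objects, columns = attributes)
% B, C: factor matrices; k: assumed number of latent factors
\[
  A \approx B C, \qquad
  A \in \mathbb{R}^{m \times n},\quad
  B \in \mathbb{R}^{m \times k},\quad
  C \in \mathbb{R}^{k \times n},\quad
  k \le \min(m, n)
\]
```

Individual decompositions differ mainly in the constraints they place on the factors (for example, non-negativity in NMF) and in how the approximation error is measured.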
A document–term matrix:

|        | Data | Matrix | Mining |
|--------|------|--------|--------|
| Book 1 | 5    | 0      | 3      |
| Book 2 | 0    | 0      | 7      |
| Book 3 | 4    | 6      | 5      |

An incomplete rating matrix:

|         | Avatar | The Matrix | Up |
|---------|--------|------------|----|
| Alice   |        | 4          | 2  |
| Bob     | 3      | 2          |    |
| Charlie | 5      |            | 3  |

A student–course matrix:

|           | Hot Topics in IR | IR & DM | DM & Matrices |
|-----------|------------------|---------|---------------|
| Student A | 1                | 1       | 0             |
| Student B | 1                | 1       | 1             |
| Student C | 0                | 1       | 1             |

Cities and their average minimum temperatures:

|             | Jan. | June | Sept. |
|-------------|------|------|-------|
| Saarbrücken | -1   | 11   | 10    |
| Helsinki    | -6.5 | 10.9 | 8.7   |
| Cape Town   | 15.7 | 7.8  | 8.7   |

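
To make the idea of a matrix decomposition concrete, here is a minimal sketch that factors the document-term matrix shown above into two smaller factor matrices using a truncated SVD. It assumes NumPy is available. The helper name `truncated_svd`, the factor names `B` and `C`, and the choice of rank `k = 2` are illustrative assumptions, not taken from the course material.

```python
import numpy as np

# Document-term matrix from the example above
# (rows: Book 1-3, columns: "Data", "Matrix", "Mining").
A = np.array([
    [5.0, 0.0, 3.0],
    [0.0, 0.0, 7.0],
    [4.0, 6.0, 5.0],
])


def truncated_svd(A, k):
    """Return factors B (m x k) and C (k x n) with A ~ B @ C, plus all singular values.

    Hypothetical helper for illustration: keeps only the k largest singular
    values and the corresponding singular vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    B = U[:, :k] * s[:k]  # scale each kept left singular vector by its singular value
    C = Vt[:k, :]         # kept right singular vectors as rows
    return B, C, s


k = 2  # assumed number of latent factors, chosen only for illustration
B, C, s = truncated_svd(A, k)
A_hat = B @ C  # rank-k approximation of A

# Frobenius norm of the residual; for a truncated SVD this equals the
# square root of the sum of the squared discarded singular values.
error = np.linalg.norm(A - A_hat, "fro")

print("singular values:", s)
print("rank-2 approximation:\n", np.round(A_hat, 2))
print("approximation error:", error)
```

Because only one singular value is discarded here, the printed approximation error coincides with the smallest singular value of the matrix. Other decompositions from the topic list would replace the SVD step with a different factorization and different constraints on the factors.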
List of topics (tentative):
- Singular value decomposition (SVD)
- Non-negative matrix factorization (NMF)
- Semi-discrete decomposition (SDD)
- Boolean matrix factorization (BMF)
- Independent component analysis (ICA)
- Matrix completion
- Probabilistic matrix factorization
- Graphs
- Tensors
Organization
- The course consists of 2 hours of lectures per week plus take-home assignments; there are no homework groups. It is worth 5 credit points.
- Lecture: Thursdays at 10:15 am, room 021, building E1 4 (MPI-INF), starting April 18, 2013.
- Contact: dmm13@mpi-inf.mpg.de
Registration
You must register in HISPOS. Please also register via e-mail to receive news and updates from us.
Prerequisites
Basic knowledge of linear algebra.
Requirements for the certificate
- You must pass the exam at the end of the semester.
- There will be 4 regular assignments plus 1 bonus assignment (e.g., analysing a dataset or writing a short essay) alongside the lectures. You must pass at least 3 assignments to qualify for the exam. If your assignment is graded "excellent," you will receive bonus points, which can be used to improve your final grade.
Lecture notes
- 00: Organization (pdf)
- 01: Introduction (pdf)
- 02: Linear algebra refresher (pdf)
- 03: Singular value decomposition (pdf)
- 04: Matrix completion (pdf)
- 05: Semi-discrete decomposition (pdf)
- 06: Non-negative matrix factorization (pdf)
- 07: Graphs I (pdf)
- 08: Boolean matrix factorization (pdf)
- 09: Introduction to tensors (pdf)
- 10: Graphs II (pdf)
- 11: Tensor applications (pdf)
- 12: Probabilistic matrix factorization (pdf)
Assignments
- 01: Singular Value Decomposition (pdf, data) (due May 12, 2013)
- 02: Matrix Completion (pdf, data) (due June 2, 2013)
- 03: NMF and Spectral Clustering (pdf, data) (due July 2, 2013)
- Bonus: Independent Component Analysis (pdf) (due July 7, 2013)
- 04: Tensors (pdf, data) (due July 28, 2013)
Suggested reading
- David Skillicorn, Understanding Complex Datasets: Data Mining with Matrix Decompositions, Chapman & Hall, 2007
- See lecture notes for additional references.