Max Planck Institut Informatik

MVDM WS 2014–15

Block Seminar on Multi-View Data Mining (7 credits)

Lecturer

Dr. Pauli Miettinen



Content

Do polar bears live in cold areas? Do some mammals frequently co-inhabit some areas, and can those areas' bioclimatic conditions be summarized succinctly? Do political candidates with specific socioeconomic backgrounds have opinions that set them apart from their peers? How much is the discussion in different discussion threads influenced by shared topics, and how much do the threads have their own individual topics (and what are those topics)?

Answering these and many other questions requires multi-view data mining techniques, that is, techniques that take multiple data sets describing the same set of entities and find interesting patterns in them. In this block seminar, we are going to study two types of such algorithms: redescription mining and shared subspace factorizations.
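
To make the polar bear question concrete, here is a minimal sketch of what a redescription looks like: two queries, one per view, that select roughly the same entities. The data, the attribute name `jan_max_temp`, the threshold, and the helper functions are all made up for illustration. The Jaccard coefficient is used here as the similarity measure, as is common in the redescription mining literature; the seminar papers cover how such query pairs are actually searched for.

```python
# Toy illustration only: all areas, species, values, and thresholds are invented.
# View 1 describes which mammals live in each area.
fauna = {
    "area_a": {"polar_bear"},
    "area_b": {"polar_bear", "arctic_fox"},
    "area_c": {"moose"},
    "area_d": {"red_fox"},
}
# View 2 describes the bioclimatic conditions of the same areas.
climate = {
    "area_a": {"jan_max_temp": -20.0},
    "area_b": {"jan_max_temp": -12.0},
    "area_c": {"jan_max_temp": -6.0},
    "area_d": {"jan_max_temp": 3.0},
}


def support(view, predicate):
    """Return the set of entities whose record in `view` satisfies `predicate`."""
    return {entity for entity, record in view.items() if predicate(record)}


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# One query per view; together they form a candidate redescription.
left = support(fauna, lambda species: "polar_bear" in species)
right = support(climate, lambda c: c["jan_max_temp"] <= -5.0)
score = jaccard(left, right)
print(sorted(left), sorted(right), round(score, 3))
```

In this toy data the two queries agree on two of the three areas that either query selects, so the pair is a plausible but imperfect redescription.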


Registering

The seminar has a limited number of slots, and registration is first-come, first-served. To register, send an email to the lecturer listing at least 3 preferred papers. You will be informed whether you got a slot in the seminar before the kick-off meeting. The deadline for registration will be announced later.

If you wish to be notified by email as soon as the list of papers is published, you can email the lecturer.


Schedule

Month     Day    Hour         Topic                                Location
October   27     14:15–16:00  Kick-off meeting                     Room 024, building E1.4 (slides)
December  1      16:00        Written report first draft deadline
December  15     16:00        Slides first draft deadline
January   8      16:00        Written report hand-in deadline
January   15–16               Seminar days                         Room 630, building E1.5


Papers

Below is the list of the papers. Papers 1–6 are about redescription mining and related topics, while papers 7–12 are about shared subspace matrix and tensor factorizations.

All papers are taken. Some papers might be re-released if a student drops out before the kick-off meeting. If you're interested in the seminar but didn't secure a spot, you can come to the kick-off meeting to see if there are any such papers.

  1. Naren Ramakrishnan, Deept Kumar, Bud Mishra, Malcolm Potts, and Richard F Helm. “Turning CARTwheels: an alternating algorithm for mining redescriptions”. In KDD '04, pp. 266-275, 2004. [pdf]
  2. Mohammed J Zaki and Naren Ramakrishnan. “Reasoning about sets using redescription mining”. In KDD '05, pp. 364-373, 2005. [pdf]
  3. Deept Kumar, Naren Ramakrishnan, Richard F Helm, and Malcolm Potts. “Algorithms for Storytelling”. IEEE Trans Knowl Data En 20(6), pp. 736-751, 2008. [pdf]
  4. Esther Galbrun and Pauli Miettinen. “From black and white to full color: Extending redescription mining outside the Boolean world”. Stat Anal Data Min 5(4), pp. 284-303, 2012. [pdf]
  5. Esther Galbrun and Angelika Kimmig. “Finding relational redescriptions”. Mach Learn 96(3), pp. 225-248, 2014. [pdf]
  6. Hao Wu, Jilles Vreeken, Nikolaj Tatti, and Naren Ramakrishnan. “Uncovering the plot: detecting surprising coalitions of entities in multi-relational schemas”. Data Min Knowl Discov 28(5-6), pp. 1398-1428, 2014. [pdf]
  7. Shuiwang Ji, Lei Tang, Shipeng Yu, and Jieping Ye. “A shared-subspace learning framework for multi-label classification”. ACM Trans Knowl Discov Data 4(2), article 8, 2010. [pdf]
  8. Pauli Miettinen. “On Finding Joint Subspace Boolean Matrix Factorizations”. In SDM '12, pp. 954-965, 2012. [pdf]
  9. Sunil Kumar Gupta, Dinh Phung, and Svetha Venkatesh. “A Bayesian Nonparametric Joint Factor Model for Learning Shared and Individual Subspaces from Multiple Data Sources”. In SDM '12, pp. 200-211, 2012. [pdf]
  10. Sunil Kumar Gupta, Dinh Phung, Brett Adams, and Svetha Venkatesh. “Regularized nonnegative shared subspace learning”. Data Min Knowl Discov 26(1), pp. 57-97, 2013. [pdf]
  11. Haiping Lu. “Learning canonical correlations of paired tensor sets via tensor-to-vector projection”. In IJCAI '13, pp. 1516-1522, 2013. [pdf]
  12. Suleiman A Khan and Samuel Kaski. “Bayesian Multi-view Tensor Factorization”. In ECML PKDD '14, pp. 656-671, 2014. [pdf]

Prerequisites

Students should know the basic ideas of data mining and machine learning, e.g., from having successfully completed the Information Retrieval and Data Mining or Machine Learning core lectures.

Seminar Format

This is a block seminar. We will have one meeting at the beginning of the semester and one or two days of presentations at the end of it. In addition to their presentation, every participant must also hand in a short essay on their topic. The essays and the preliminary versions of the presentations need to be handed in to the lecturer during the semester (exact date TBA), and the essays will be distributed to the other participants. Meeting all the deadlines and attending all presentations (including the kick-off meeting) is mandatory. Grading is based on the essay, the presentation, your knowledge of the subject (as evidenced in the discussion after your presentation), and your activity in the discussions.