4th assignment (DL 24 July)

Return the assignments by email to tada14@mpi-inf.mpg.de by 24 July, 1600 hours. The subject of the email must start with [TADA]. The assignment must be returned as a PDF and the PDF must contain your name, matriculation number, and e-mail address together with the exact topic of the assignment.

Topic 4 is hard. Grading of this topic takes this hardness into account.

You will need a username and password to access the papers outside the MPI network. Contact the lecturers if you don't know the username or password.

For the topic of the assignment, choose one of the following:

Storylines
Read both [1] and [2]. These papers consider the problem of making sense of large collections of text documents. Analyse both, and discuss. Are the goals clearly defined? Do the authors make principled, or ad-hoc, choices to define the solution? Are the results convincing? Are the algorithms sufficiently evaluated? What extra experiments would you have liked to see? Can you identify possible improvements for the algorithms, how would you approach the problem?
Growing Graphs
Generating synthetic graphs that exhibit realistic properties is a difficult problem with many applications. In the last few years the stochastic Kronecker approach [3,4] got very popular, receiving a lot of praise left and right. Recently, however, there have been a few papers that are more critical about the model [5].

Read [3] and [5] and form your own opinion. Optionally, read (the relevant parts of) [4] for a more in-depth discussion.
Featuring Recursion and Aggregation
In the last twenty years a small group of European researchers has been working on multi-relational data mining: mining data over multiple relations. One key problem is how to generate informative local 'multi relational features'. One of the main solutions is called 'propositionalization'. Read [6] as an example.

In recent years mining graph-structured data has become very popular, in particular in the USA. Initially only simple undirected unweighted and unlabelled graphs were considered. These days more and more work appears on more complex graphs, considering more interesting problem settings. Read [7] as an example.

Discuss the relation between graph mining and multi-relational data mining. Are they identical, similar, or ultimately different? How similar are the problems and the solutions these two papers propose? Is graph mining reinventing propositionalization? Why (not)? Is graph mining reinventing multi-relational data mining? Why (not)?
Here be Dragons — (Hard)
Read [8] and [9]. Try not to panic. Explain what the papers are about. Are the problems interesting, or completely esoteric? Are the papers completely unconnected, superficially connected, somewhat connected, or is [8] just a re-invention of McSherry's work?

Warning: this assignment is open ended, it considers a research question we are currently looking into ourselves. If you do well, you are invited to join our efforts and share in the eternal glory that awaits us.

References

Shahaf, Dafna and Guestrin, Carlos. 2010. Connecting the Dots. In Proceedings of KDD 2010. PDF
Shahaf, Dafna, Guestrin, Carlos and Horvitz, Eric. 2012. Metro Maps of Science. In Proceedings of KDD 2012. PDF
Leskovec, Jure and Faloutsos, Christos. 2005. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. In Proceedings of PKDD 2005. PDF
Leskovec, Jure, Chakrabarti, Deepayan, Kleinberg, Jon, Faloutsos, Christos and Ghahramani, Zoubin. 2010. Kronecker Graphs: An Approach to Modeling Networks. Journal of Machine Learning Research 11(Feb):985–1042. PDF
Seshadhri, C., Pinar, A. and Kolda, T.G. 2013. An in-depth analysis of stochastic Kronecker graphs. Journal of the ACM (JACM) 60(2):13-32. PDF
Knobbe, Arno, de Haas, Marc and Siebes, Arno. 2001. Propositionalisation and Aggregates. In Proceedings of PKDD 2001. PDF
Henderson, Keith, Gallagher, Brian, Li, Lei, Akoglu, Leman, Eliassi-Rad, Tina, Tong, Hanghang and Faloutsos, Christos. 2011. It's who you know: graph mining using recursive structural features. In Proceedings of KDD 2011. PDF
Ramon, Jan, Miettinen, Pauli and Vreeken, Jilles. 2013. Finding Bicliques in GF[q]. In Proceedings of ECMLPKDD 2013. PDF
McSherry, Frank. 2001. Spectral Partitioning of Random Graphs. In Proceedings of FOCS 2001. pp. 529-537. PDF