Return the assignments by email to tada14@mpi-inf.mpg.de by 24 July, 1600 hours. The subject of the email must start with [TADA]. The assignment must be returned as a PDF and the PDF must contain your name, matriculation number, and e-mail address together with the exact topic of the assignment.
Topic 4 is hard. Grading of this topic takes this hardness into account.
You will need a username and password to access the papers outside the MPI network. Contact the lecturers if you don't know the username or password.
For the topic of the assignment, choose one of the following:
Read both [1] and [2]. These papers consider the problem of making sense of large collections of text documents. Analyse both, and discuss. Are the goals clearly defined? Do the authors make principled, or ad-hoc, choices to define the solution? Are the results convincing? Are the algorithms sufficiently evaluated? What extra experiments would you have liked to see? Can you identify possible improvements for the algorithms, how would you approach the problem?
Generating synthetic graphs that exhibit realistic properties is a difficult problem with many applications. In the last few years the stochastic Kronecker approach [3,4] got very popular, receiving a lot of praise left and right. Recently, however, there have been a few papers that are more critical about the model [5].
Read [3] and [5] and form your own opinion. Optionally, read (the relevant parts of) [4] for a more in-depth discussion.
In the last twenty years a small group of European researchers has been working on multi-relational data mining: mining data over multiple relations. One key problem is how to generate informative local 'multi relational features'. One of the main solutions is called 'propositionalization'. Read [6] as an example.
In recent years mining graph-structured data has become very popular, in particular in the USA. Initially only simple undirected unweighted and unlabelled graphs were considered. These days more and more work appears on more complex graphs, considering more interesting problem settings. Read [7] as an example.
Discuss the relation between graph mining and multi-relational data mining. Are they identical, similar, or ultimately different? How similar are the problems and the solutions these two papers propose? Is graph mining reinventing propositionalization? Why (not)? Is graph mining reinventing multi-relational data mining? Why (not)?
Read [8] and [9]. Try not to panic. Explain what the papers are about. Are the problems interesting, or completely esoteric? Are the papers completely unconnected, superficially connected, somewhat connected, or is [8] just a re-invention of McSherry's work?
Warning: this assignment is open ended, it considers a research question we are currently looking into ourselves. If you do well, you are invited to join our efforts and share in the eternal glory that awaits us.