Return the assignments by email to tada14@mpi-inf.mpg.de by 8 May, 1600 hours. The subject of the email must start with [TADA]. The assignment must be returned as a PDF and it must contain your name, matriculation number, and email address together with the exact topic of the assignment.
Topic 4 is hard and contains an optional extra question. The grading of this topic takes this difficulty into account.
You will need a username and password to access the papers outside the MPI network. Contact the lecturers if you don't know the username or password.
For the topic of the assignment, choose one of the following:
Read [1] and discuss how exploratory data analysis relates to data mining.
Read [2]. The authors introduce a method for detecting correlation in data. They present their approach very confidently. How does it relate to data mining? How strong are their claims? Is the method earth-shattering or not? Read [3]. Try to identify and discuss as many strong and weak points of [2], both practical and theoretical, as you can find.
Read [4, 5, 6, 7, 8]. Is Big Data worth all the hype? What are the prospects? What are the (potential) problems? Are these problems insurmountable? What are your opinions about Big Data?
The standard approach to mine frequent itemsets is to
The authors of [10] claim that their method can mine frequent itemsets without candidate generation. This raises the question: where did the candidates go? Discuss whether this claim is valid or not, and why.
(optional) TreeProjection [11] was proposed before [10]. The authors of [10] argue, almost aggressively, that FPGrowth is really different from TreeProjection. Are the two really different? Why (not)? Discuss, and if possible, give an example where they are (not) different.