max planck institut

informatik

- The exam and re-exam will be in **room 021 at the MPII building (E1.4)**, ground floor.
- There is **no** lecture on 27 November. The course schedule has been updated.
- Students must register for the final exam in HISPOS by **4th of November**.
- The final exam will be held on 19th of February; the re-exam will be on 19th of March.

Two hours per week, on Tuesdays, from noon till 2 pm. Place: room 007 in building E2 1 (bioinformatics).

The schedule of the course and the slides are here.

The articles related to each topic are listed below. To access the PDFs, you need the username and password.

The essay topics are here.

The essays must be returned in PDF format via e-mail to the lecturer (see
slides or lecturer's home page for the address). The deadline for the essays,
unless otherwise specified, is two weeks from the date the topics were given
**at 14:00 hours (2 pm)**. Failure to submit the essay on time
*will result in a failed grade*. The time of submission is the
timestamp of the mail as shown by the lecturer's e-mail system. It is advisable
to send the essays before noon, so that if you have not received a response,
you can ask about the status of your essay before the lecture.

Every essay you return must have the following information:

- Your name
- Your matriculation number
- Your e-mail address
- The topic of the essay (even if there was only one given)

In addition, it is advisable to start the subject line of the mail with "DTDM" and have the word "essay" somewhere in the subject. This helps me to notice the purpose of the mail and (hopefully) prevents the spam filters from filtering the mails.

There are no page limits for the essays, but I expect a good essay to take between two and five A4 pages in 10pt font with 2.5cm margins all around (you are free to use other font sizes and margins as long as the text stays legible).

The essays must follow normal scientific citation practices. Substantial failure to do so will result in a failed grade for the essay. The essays may contain (numbered) section and subsection headings if the author so prefers.

The course will provide an overview of some important topics in data mining. The purpose of the course is to concentrate on the ideas and intuition behind these topics, with the aim that after the course, the students can follow the current research on the topics.

The exact topics covered in this course will be announced later (and students' preferences can be considered), but tentatively we will cover at least pattern set mining, graph mining, and significance testing (in pattern set mining).

The course will have two hours of lectures every week. There will not be any homework sessions. Instead, students have to write longer essays/reports on the topics covered in the lectures.

Students are expected to have passed either the *Information Retrieval &
Data Mining* or the *Machine Learning* core lecture, or to have equivalent
knowledge.

- At least four (4) passing grades from the essays (out of five essays)
- Final exam

The essays are graded as failed, passed, or excellent. Out of the five essays,
you need a passing grade in at least four to be allowed to take the
final exam. If you are allowed to take the final exam and you pass it, then each
*excellent* essay grade will improve your final grade by one grade step (1/3 of a
grade) relative to your exam grade. That is, if you got 2.0 from the final exam
and you have one excellent grade, your final grade will be 1.7; if you have three or more excellent grades, it will be 1.0.
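
The grading rule above can be sketched in a few lines of Python. This is only an illustration, not an official calculator: the list of passing grades is assumed to be the usual German scale (the page itself only mentions 2.0, 1.7, and 1.0), an excellent essay is assumed to count as passed, and the names `PASSING_GRADES`, `may_take_exam`, and `final_grade` are made up for this example.

```python
# Passing grades, best first. ASSUMPTION: the usual German university scale;
# the course page only shows the values 2.0, 1.7, and 1.0.
PASSING_GRADES = [1.0, 1.3, 1.7, 2.0, 2.3, 2.7, 3.0, 3.3, 3.7, 4.0]


def may_take_exam(essay_grades):
    """Return True if at least four essays have a passing grade.

    ASSUMPTION: an "excellent" essay also counts as passed.
    """
    passing = sum(1 for g in essay_grades if g in ("passed", "excellent"))
    return passing >= 4


def final_grade(exam_grade, essay_grades):
    """Improve a passed exam grade by one step per excellent essay, capped at 1.0."""
    if not may_take_exam(essay_grades):
        raise ValueError("fewer than four passing essays: not admitted to the exam")
    if exam_grade not in PASSING_GRADES:
        raise ValueError("exam grade must be a passing grade on the assumed scale")
    steps = essay_grades.count("excellent")
    index = PASSING_GRADES.index(exam_grade)
    # Moving towards index 0 means a better grade; never go past 1.0.
    return PASSING_GRADES[max(0, index - steps)]


# The two cases given in the text: one excellent essay, then three.
print(final_grade(2.0, ["excellent", "passed", "passed", "passed", "failed"]))  # 1.7
print(final_grade(2.0, ["excellent", "excellent", "excellent", "passed", "failed"]))  # 1.0
```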

Topic I, Pattern Set Mining

- Geerts, F., Goethals, B. & Mielikäinen, T., 2004. Tiling databases. In 7th International Conference on Discovery Science. pp. 77–122. PDF
- Gionis, A., Mannila, H. & Seppänen, J.K., 2004. Geometric and Combinatorial Tiles in 0–1 Data. In 8th European Conference on Principles and Practice of Knowledge Discovery in Databases. pp. 173–184. PDF
- Tatti, N. & Vreeken, J., 2012. Discovering Descriptive Tile Trees By Mining Optimal Geometric Subtiles. In 2012 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases. pp. 9–24. PDF
- Vreeken, J., van Leeuwen, M. & Siebes, A., 2011. Krimp: mining itemsets that compress. Data Mining and Knowledge Discovery 23(1), pp. 169–214. PDF

Topic II, Graph Mining

- Inokuchi, A., Washio, T. & Motoda, H., 2002. An Apriori-Based Algorithm for Mining Frequent Substructures from Graph Data. In 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 13–23. PDF
- Yan, X. & Han, J., 2002. gSpan: Graph-Based Substructure Pattern Mining. In 2nd IEEE International Conference on Data Mining, pp. 721–724. PDF (extended technical report PDF)
- Shahaf, D. & Guestrin, C., 2010. Connecting the Dots Between News Articles. In 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 623–632. PDF
- Shahaf, D. & Guestrin, C., 2012. Connecting Two (or Less) Dots: Discovering Structure in News Articles. ACM Transactions on Knowledge Discovery from Data 5(4), article 24. PDF
- Shahaf, D., Guestrin, C. & Horvitz, E., 2012a. Trains of Thought: Generating Information Maps. In 21st International World Wide Web Conference, pp. 899–908. PDF
- Shahaf, D., Guestrin, C. & Horvitz, E., 2012b. Metro Maps of Science. In 18th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1122–1130. PDF

Topic III, Significance Testing

- Kirsch, A., Mitzenmacher, M., Pietracaprina, A., Pucci, G., Upfal, E. & Vandin, F., 2012. An Efficient Rigorous Approach for Identifying Statistically Significant Frequent Itemsets. Journal of the ACM, 59(3), article 12. PDF
- Gionis, A., Mannila, H., Mielikäinen, T. & Tsaparas, P., 2007. Assessing data mining results via swap randomization. ACM Transactions on Knowledge Discovery from Data, 1(3), article 14. PDF
- Ojala, M., Vuokko, N., Kallio, A., Haiminen, N. & Mannila, H., 2009. Randomization methods for assessing data analysis results on real-valued matrices. Statistical Analysis and Data Mining, 2, pp. 209–230. PDF
- Ojala, M., 2010. Assessing Data Mining Results on Matrices with Randomization. In 10th IEEE International Conference on Data Mining, pp. 959–964. PDF
- De Bie, T., 2010. Maximum entropy models and subjective interestingness: An application to tiles in binary databases. Data Mining and Knowledge Discovery, 23(3), pp. 407–446. PDF
- Kontonasios, K.-N. & De Bie, T., 2010. An information-theoretic approach to finding informative noisy tiles in binary databases. In 2010 SIAM International Conference on Data Mining, pp. 153–164. PDF
- Kontonasios, K.-N., Vreeken, J. & De Bie, T., 2011. Maximum Entropy Modelling for Assessing Results on Real-Valued Data. In 11th IEEE International Conference on Data Mining, pp. 350–359. PDF

Topic IV, Tensors

- Kolda, T. G. & Bader, B. W., 2009. Tensor Decompositions and Applications. SIAM Review, 51(3), pp. 455–500. PDF
- Cerf, L., Besson, J., Robardet, C. & Boulicaut, J.-F., 2009. Closed patterns meet n-ary relations. ACM Transactions on Knowledge Discovery from Data, 3(1), article 3. PDF
- Cerf, L., Besson, J., Nguyen, K.-N. T. & Boulicaut, J.-F., 2013. Closed and noise-tolerant patterns in n-ary relations. Data Mining and Knowledge Discovery, 26(3), pp. 574–619. PDF
- Miettinen, P., 2011. Boolean Tensor Factorizations. In 11th IEEE International Conference on Data Mining. pp. 447–456. PDF
- Nickel, M., Tresp, V. & Kriegel, H.-P., 2011. A Three-Way Model for Collective Learning on Multi-Relational Data. In 28th International Conference on Machine Learning. pp. 809–816. PDF
- Nickel, M., Tresp, V. & Kriegel, H.-P., 2012. Factorizing YAGO: Scalable Machine Learning for Linked Data. In 21st International World Wide Web Conference. pp. 271–280. PDF

- Mohammed J. Zaki, Wagner Meira Jr. *Fundamentals of Data Mining Algorithms*, manuscript (pdf, requires username and password)
- Pang-Ning Tan, Michael Steinbach, Vipin Kumar. *Introduction to Data Mining*, Addison-Wesley, 2006. (Website)
- Jiawei Han, Micheline Kamber, Jian Pei. *Data Mining - Concepts and Techniques*, 3rd ed., Morgan Kaufmann, 2011. (Website)
