# 2nd assignment (DL 5 June)

Return the assignments by email to tada14@mpi-inf.mpg.de by 5 June, 1600 hours. The subject of the email must start with [TADA]. The assignment must be returned as a PDF and the PDF must contain your name, matriculation number, and e-mail address together with the exact topic of the assignment.

Topic 4 is hard. Grading of this topic takes this hardness into account.

You will need a username and password to access the papers outside the MPI network. Contact the lecturers if you don't know the username or password.

For the topic of the assignment, choose one of the following:

1. All Those Different Models

The two survey articles on tensor factorizations [1, 2] list a dozen different tensor decomposition models (even without counting the non-negative variants). Why are there so many different models? Are they all needed? Are some just trivial variants of others? What are the most and least important ones (in your opinion) and why?

2. A Tensor or Just a Bunch of Matrices

Read [3] and [4]. These papers present two different ways of analysing data using tensors. But do they really need tensors, or are they just working on a collection of matrices? [3] uses time in the third mode: is their use of tensor factorization still justified? Or do they ignore the temporal order? Try to identify other potential weaknesses in these papers.

3. Eigen and Tensor Faces

Read [5]. How does TensorFaces utilize the tensor decompositions? How does it improve over Eigenfaces [6] (if at all)? Could you replicate TensorFaces using just Eigenfaces? Is TensorFaces scalable (in time and space)? Is TensorFaces a special application, or does it exhibit some more general theme when moving from matrices (2-way tensors) to N-way tensors?

4. Noise-tolerant N-way Tiling (Hard)

In [7] the authors present an idea to tile a database, that is, to represent a binary matrix as a union of all-1s submatrices. The authors of [8] extend this idea to 3-way binary tensors. Both of these methods, however, aim to build exact representations, and they use all-1s submatrices and tensors. In [9] the authors present a way to mine noise-tolerant patterns in N-ary relations. Could we use these noise-tolerant patterns to find an approximate representation of the given binary tensor as a union of dense subtensors? How would that look as a tensor factorization? Both [7] and [8] use the Set Cover algorithm to select the subtensors. Could we also use Set Cover here? Why or why not? Would the Positive-Negative Partial Set Cover problem [10] be more appropriate? Why or why not?
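To fix intuition for the selection step discussed above, here is a minimal sketch of the greedy Set Cover heuristic in the style [7] and [8] use to pick tiles. All names and the toy data are illustrative, not taken from the papers: each candidate "tile" is represented simply as the set of (row, column) cells it covers.

```python
# Greedy Set Cover sketch: repeatedly pick the candidate set that
# covers the most still-uncovered elements. In a tiling setting,
# each candidate would be the cells of an all-1s submatrix and the
# universe would be the 1-cells of the binary matrix.

def greedy_set_cover(universe, candidates):
    """Return indices of chosen candidate sets that cover `universe`."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Index of the candidate with the largest marginal coverage.
        best = max(range(len(candidates)),
                   key=lambda i: len(candidates[i] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:          # remaining elements are not coverable
            break
        chosen.append(best)
        uncovered -= gain
    return chosen

# Toy example: three hypothetical tiles over a 3x3 binary matrix.
tiles = [{(0, 0), (0, 1), (1, 0), (1, 1)},  # a 2x2 all-1s block
         {(1, 1), (2, 2)},
         {(2, 2)}]
ones = {(0, 0), (0, 1), (1, 0), (1, 1), (2, 2)}
print(greedy_set_cover(ones, tiles))  # -> [0, 1]
```

Note that this greedy rule only rewards covering 1s; the Positive-Negative Partial Set Cover variant [10] would additionally penalize candidates that cover 0s, which is why it may matter once the tiles are allowed to be noisy.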

## References

1. Acar, Evrim, and B Yener. 2009. Unsupervised Multiway Data Analysis: A Literature Survey. IEEE Transactions on Knowledge and Data Engineering 21(1): 6–20.
2. Kolda, Tamara G, and Brett W Bader. 2009. Tensor Decompositions and Applications. SIAM Review 51(3): 455–500.
3. Dunlavy, Daniel, Tamara G Kolda, and Evrim Acar. 2011. Temporal Link Prediction Using Matrix and Tensor Factorizations. ACM Transactions on Knowledge Discovery from Data 5(2): article 10.
4. Nickel, Maximilian, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: Scalable Machine Learning for Linked Data. In WWW '12, 271–280.
5. Vasilescu, M Alex O, and Demetri Terzopoulos. 2002. Multilinear Analysis of Image Ensembles: TensorFaces. In ECCV '02, 447–460.
6. Turk, Matthew, and Alex Pentland. 1991. Eigenfaces for Recognition. Journal of Cognitive Neuroscience 3(1): 71–86.
7. Geerts, Floris, Bart Goethals, and Taneli Mielikäinen. 2004. Tiling Databases. In DS '04, 278–289.
8. Bělohlávek, Radim, and Vilém Vychodil. 2010. Factorizing Three-Way Binary Data with Triadic Formal Concepts. In KES '10, 471–480.
9. Cerf, Loïc, Jérémy Besson, Kim-Ngan T Nguyen, and Jean-François Boulicaut. 2013. Closed and Noise-Tolerant Patterns in N-ary Relations. Data Mining and Knowledge Discovery 26(3): 574–619.
10. Miettinen, Pauli. 2008. On the Positive-Negative Partial Set Cover Problem. Information Processing Letters 108(4): 219–221.