General Guidelines on Topics
Select one of the given topics for each part of the course. The
questions listed under each topic are meant to give you an idea of what to
study. You do not need to answer all of them, and you can (and should)
consider other questions as well.
The essays are not about the answers; they are about justifying the
answers. Simple lists of answers with no justification are not enough (even
with a citation). Rather, you must explain why (you think) the answer is
what you say (for example, explain in your own words how your sources
justify the answer; if there is no source, explain why you think that is
the answer). For some questions, justification can be as simple as a one-line
equation; for others, more argumentation is required.
Essay Topics
Warm-up Essay (DL 30 October)
- What is Data Mining?
- Is Data Mining a Science?
Pattern Set Mining (DL 20 November)
- 0/1 Tiling versus Density Tiling
  - Pros and cons of each method?
  - When can they be used?
  - When should they be used?
  - When is one better than the other?
  - Can we use one to obtain the other?
  - Better algorithms?
- 0/1 Tiling versus Krimp
  - Same questions as above
  - Can we use parts of one in the other (e.g. MDL in tiling or Set Cover in Krimp)? Would that be useful?
- MDL versus Bayesian Information Criterion (BIC)
  - This topic requires reading beyond the scope of the lectures
  - Differences/similarities?
  - Pros and cons?
  - When is one better than the other?
  - Which one should I use?
Graph Mining (DL 18 December)
- Applications of Frequent Subgraph Mining
  - Explore some applications studied in the scientific literature
  - What is the data and how is it modelled as a graph?
  - What are the frequent subgraphs? Why are they interesting?
  - Are there restrictions on the type of subgraphs (trees, DAGs, etc.)? Why?
- Metro Maps of Science
  - Read Metro Maps of Science (PDF)
  - Explain the work
  - Relations to other work?
  - Your opinion about it (Interesting? Useful? Boring?)
- Parameters in Connecting the Dots and Trains of Thought
  - What are the user-supplied parameters?
  - What do they do?
  - Are their effects intuitive?
  - How to select good values for them?
  - Too many? Too few?
  - Your opinion about user-supplied parameters in general
Significance Testing (DL 29 January)
- Swap-based methods vs. maximum entropy methods
  - What are they and how do they work?
  - What are their similarities and differences?
  - Is one clearly better than the other, perhaps in some specific application? If so, when?
  - Consider both binary and continuous data.
- Method for finding a frequency threshold for significant itemsets vs. other methods for significance testing
  - Consider the method of Kirsch et al. (2012) (PDF)
  - How does it relate to swap-based methods?
  - What about MaxEnt methods?
  - Consider only binary data.
Tensors (DL 12 February)
- N-way itemset mining vs. normal itemset mining
  - What's so hard with tensors?
  - Why not use N-way Apriori?
  - Do the definitions of maximal and non-derivable itemsets also generalize to N modes?
- Noise-tolerant N-way itemsets
  - Consider the method by Cerf et al. (2013) (PDF)
  - Explain the (main) ideas and algorithm.
  - Can this method be used to compute a Boolean CP decomposition? How?
  - Will the BU problem be a problem?
- Applications of tensor decompositions in data mining
  - Present some work that applies tensor decompositions in data mining.
  - Explain the ideas.
  - Are tensors necessary for the idea?
  - Is the work good/relevant/interesting?
  - Kolda & Bader (2009) list a number of applied works (PDF); more recent work can be found, for example, using Google.