Seminar "Probabilistic Databases"
Dr. Martin Theobald
Dr. Ralf Schenkel
- Certificates are available in room 402 in the MPI building (E1.4).
- The first meeting took place on
Tuesday, April 21, at 14 c.t. in room 433, building E 1.4.
- Regular meetings
are on Tuesdays at 14 c.t. in room 433 (rotunda, 4th floor),
building E 1.4, starting May 12.
- The prerequisite for attending the
seminar is some general knowledge of database systems. We
expect that participants have successfully completed the basic course
"Informationssysteme" or another database systems course.
- We checked
that all linked papers are accessible from the MPI-INF
network (last check: April 20, 2009, 2pm). If you encounter any
problems accessing a paper, please
Contents of the Seminar
The seminar discusses solutions for the management
of probabilistic and uncertain data, with a focus on current system approaches.
Requirements for the Certificate
- Attend all talks - not just your own. We will keep track of
participation! If you are sick, please let us know in advance by
writing a short mail.
- Read your papers and other related literature.
- Contact your tutor before May 19 and present a brief draft of your intended talk.
- Prepare a 45-minute talk about your topic that introduces the
matter to your fellow students. This is about twice the length of a
conference talk, so there should be enough time to present some
background information on the topic. Even though there are usually
papers listed for each topic, you are not expected to talk in detail
about all of them; in fact, this is usually impossible given the time
limit. Instead, try to pick the most interesting,
challenging or futuristic contribution(s) from at least one of them.
You are very welcome to discuss any potential weaknesses or problems of
the paper(s) in your talk. If you are unsure about what to present, ask
your tutor. Note that, even though the conference slides of some papers
are available on the Web, we expect that you prepare your own slides
(which may be, of course, inspired by the original slides).
You must send
your slides to your tutor and discuss them with him or her by the Friday
before your talk (4pm) at the latest; otherwise your talk will be cancelled
(this is a hard deadline).
Both the slides and the presentation
itself must be in English.
Otherwise, some students will not be able to follow all talks, which is
one of the main purposes of the seminar. After the presentations, there
will be a discussion in which all fellow students are encouraged to ask
questions. We will keep track of your participation (i.e., if you ask
questions) and, of course, the answers of the presenter.
- For each talk, a second student will be preselected as an
opponent. His or her role is to prepare tough questions to challenge
the paper presented in the talk (not the talk itself or the speaker!).
To make life a little easier, the preliminary version of the
slides will be sent to the opponent on the Friday before the talk.
However, as interaction is an important part of science, we expect that
every participant actively participates in the discussions.
- Two weeks after the talk, the presenter and the opponent together have to submit
a short (usually not longer than 5
pages) summary of the topic of the talk. The focus of this report
should be on pointing out strengths
and weaknesses of the approach presented in the paper(s), not just
summarizing the paper(s).
- After your talk, there will be another meeting with your tutor
and Martin and/or Ralf to give feedback on the talk and the report.
- In other words: Your final grade will be influenced by the
following components: Your oral presentation, the knowledge about your
topic (your answers to questions after the presentation), the questions
you asked as opponent, your general
participation in the seminar, and your two written reports (one in the
role of presenter, one in the role of opponent).
- Tuesday, May 19, 2009, 14:15: Martin Theobald (opponent: Ralf Schenkel)
- Overview of Probabilistic Databases and Uncertain Data Management
- Slides [PPT | PDF]
- Tuesday, May 26, 2009, 14:15: Yanchuan Li (tutor Martin Theobald, opponent Christian Fechner)
- Tuesday, June 2, 2009, 14:15: Florian Gross (tutor Ralf Schenkel, opponent Donjete Ibrahimi)
- Tuesday, June 9, 2009, 14:15: Mohamed Yahya (tutor Martin Theobald, opponent Christina Teflioudi)
- Omar Benjelloun, Anish Das Sarma, Alon Y. Halevy, Jennifer Widom:
ULDBs: Databases with Uncertainty and Lineage.
In VLDB 2006, pp. 953-964
- Michi Mutsuzaki, Martin Theobald, Ander de Keijzer, Jennifer Widom, Parag Agrawal, Omar Benjelloun, Anish Das Sarma, Raghotham Murthy, Tomoe Sugihara:
Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS (demo paper).
In CIDR 2007
- Slides [PDF]
- Report [PDF]
- Tuesday, June 16, 2009, 14:15: Christian Fechner (tutor Ralf Schenkel, opponent Mohamed Yahya)
- Tuesday, June 16, 2009, 15:15: Christina Teflioudi (tutor Ralf Schenkel, opponent Martin Vasileski)
- Reynold Cheng, Yuni Xia, Sunil Prabhakar, Rahul Shah, Jeffrey Scott Vitter:
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data.
In VLDB 2004, pp. 876-887
- Sarvjeet Singh, Chris Mayfield, Sagar Mittal, Sunil Prabhakar, Susanne E. Hambrusch, Rahul Shah:
Orion 2.0: native support for uncertain data (demo paper).
In SIGMOD 2008, pp. 1239-1242
- Slides [PPT | PDF]
- Report [PDF]
Efficient Query Processing
- Tuesday, June 23, 2009, 14:15: Stefan Richter (tutor Klaus Berberich, opponent Florian Gross)
- Tuesday, June 30, 2009, 14:15: Timm Meiser (tutor Srikanta Bedathur, opponent Stefan Richter)
- Tuesday, July 7, 2009, 14:15: Donjeta Ibrahimi (tutor Martin Theobald, opponent Sarath Kumar Kondreddi)
- Tuesday, July 14, 2009, 14:15: David Philippi (tutor Ralf Schenkel, opponent Timm Meiser)
- Tuesday, July 21, 2009, 14:15: Sarath Kumar Kondreddi (tutor Martin Theobald, opponent Yanchuan Li)
- Tuesday, July 21, 2009, 15:15: Martin Vasileski (tutor Maya Ramanath, opponent David Philippi)
- Tuesday, July 28, 2009, 14:15: Jörg Schad (tutor Martin Theobald, opponent Florian Gross)