Scalable Uncertainty Management, Summer 2011
Lecturer: Dr. Rainer Gemulla
Teaching assistants: Luciano del Corro, Maximilian Dylla
News
Content
This lecture covers techniques to manage massive amounts of uncertain and inconsistent data (e.g., data obtained from potentially distributed, heterogeneous, and conflicting sources on the Web). The lecture focuses on modeling, semantics, and efficient algorithms. We will touch on a number of different research areas, including databases, semantic web, machine learning, and artificial intelligence.
List of topics (tentative):
- Incomplete data
- Inconsistent data
- Probabilistic data
- Query evaluation
- Complexity of query evaluation
- Approximate query evaluation
- Data mining on uncertain data
- Probabilistic graphical models
- Distributed processing
- Applications
Organization
- This is a 2+2 lecture (6 credit points).
- There is no need to pre-register; registration is performed during or after the first lecture.
- Lecture: Friday, 2:15PM-3:45PM, room 021, building E1.4 (MPI-INF), starting April 15, 2011
- Exercises: Friday, 4:00PM-5:30PM, room 021, building E1.4 (MPI-INF), starting May 6, 2011
Prerequisites
All necessary concepts and techniques will be introduced in the lecture. Basic knowledge of database systems and probability theory is advantageous.
Requirements for the certificate
Details about the exam can be found here. Please select time slots suitable for you immediately. The deadline for registration is July 8, 2011.
- You must successfully participate in an oral exam at the end of the semester.
- You must present a convincing solution of at least one exercise in the exercise group (no prior hand-in necessary).
- You should actively participate in the exercise group. If you want, you can hand in your solutions to get some feedback. To do so, bring your solutions on paper to the lecture before the exercise group. It is OK to hand in solutions created by a group of students.
Lecture notes
- 00: Organization (pdf)
- 01: Introduction (pdf)
- 02: Incomplete databases (pdf); last updated: May 5, 14:00
- 03: Datalog & provenance (pdf); last updated: June 2, 18:00
- 04: Probabilistic databases (pdf); last updated: June 6, 10:00
- 05: Query evaluation on probabilistic databases (pdf); last updated: July 1, 13:00
- 06: Markov logic (pdf); last updated: July 15, 16:30
Exercises
- 01: Relational algebra & representation systems (pdf)
- 02: Queries on incomplete databases (pdf)
- 03: Datalog and provenance (pdf)
- 04: Finite probability (pdf)
- 05: Probabilistic databases (pdf); bugs fixed in Exercise 5d and 6
- 06: Query evaluation (pdf); updated with new exercises
- 07: Exact intensional query evaluation (pdf)
- 08: Approximate intensional query evaluation & Markov logic (pdf)
- 09: Probabilistic graphical models (pdf); this exercise is covered in the next two exercise groups
Suggested reading
- Dan Suciu, Dan Olteanu, Christopher Ré, Christoph Koch, Probabilistic Databases, Morgan & Claypool, 2011.
- Charu C. Aggarwal (Ed.), Managing and Mining Uncertain Data, Springer, 2009.
- Daphne Koller, Nir Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
- See lecture notes for additional references.