Scalable Uncertainty Management, Summer 2012
Lecturer: Dr. Rainer Gemulla
Teaching assistants: Kaustubh Beedkar, Luciano Del Corro
Content
This lecture covers techniques for managing massive amounts of uncertain and inconsistent data (e.g., data obtained from potentially distributed, heterogeneous, and conflicting sources on the Web). The lecture focuses on modeling, semantics, and efficient algorithms. We will touch on a number of different research areas, including databases, the semantic web, machine learning, and artificial intelligence.
List of topics (tentative):
- Incomplete data
- Inconsistent data
- Probabilistic data
- Query evaluation
- Complexity of query evaluation
- Approximate query evaluation
- Data mining on uncertain data
- Probabilistic graphical models
- Distributed processing
- Applications
Organization
- This is a 2+2 lecture (6 credit points).
- Lecture: Fri, 2:15PM-3:45PM, room 021, building E1.4 (MPI-INF), starting April 20, 2012.
- Exercises: Wed, 2:15PM-3:45PM (E2.4, R216); Thu, 10:15AM-11:45AM (E1.3, R107); starting May 9, 2012.
- Contact: sum12@mpi-inf.mpg.de
Registration
You must register in HISPOS. Please also register via e-mail to receive news and updates from us.
Prerequisites
Basic knowledge of probability theory is required. Basic knowledge of database systems is advantageous.
Requirements for the certificate
- You must successfully participate in an oral exam at the end of the semester.
- You must present convincing solutions to at least two exercises in the exercise group. You must hand in the solutions to the exercises you want to present by Tuesday, 9AM (i.e., the day before the exercise group) via e-mail. You gain bonus points for the exam by presenting additional exercises (up to 2) and by solving bonus exercises (up to 3; send them in by Wed, 2PM).
- You should actively participate in the exercise group. If you want, you can hand in your solutions via e-mail to get feedback. It is OK to hand in solutions created by a group of students.
Lecture notes
- 00: Organization (pdf)
- 01: Introduction (pdf)
- 02: Incomplete databases (pdf)
- 03: Datalog & provenance (pdf)
- 04: Probabilistic databases (pdf)
- 05: Query evaluation on probabilistic databases (pdf)
- 06: Markov logic (pdf, Jul 20, 1PM)
Assignments
- 01: Relational algebra, incomplete databases, representation systems (pdf)
- 02: Query evaluation in incomplete databases (pdf)
- 03: Datalog (pdf)
- 04: Provenance and finite probability (pdf, psql); see probability refresher in 04
- 05: Relational calculus and probabilistic databases (pdf)
- 06: Extensional query evaluation I (pdf)
- 07: Extensional query evaluation II (pdf, psql)
- 08: Intensional query evaluation (pdf, Jun 29, 5PM)
- 09: Intensional query evaluation II (pdf, Jul 6, 4PM)
- 10: Markov logic networks I (pdf, MLN training data)
- 11: Markov logic networks II (pdf)
Suggested reading
- Dan Suciu, Dan Olteanu, Christopher Ré, Christoph Koch, Probabilistic Databases, Morgan & Claypool, 2011.
- Charu C. Aggarwal (Ed.), Managing and Mining Uncertain Data, Springer, 2009.
- Daphne Koller, Nir Friedman, Probabilistic Graphical Models: Principles and Techniques, The MIT Press, 2009.
- See lecture notes for additional references.