Decoration
max planck institut
informatik
mpii logo Minerva of the Max Planck Society
 

Algorithms on Strings

  • Register for the course discussion group
  • The final ninth problemset is available. The deadline is 26.07.

Basic Information

Lectures: Wednesday 10:15-12 (first lecture) or 14:15-16:00 (all remaining lectures), room 029 (ground floor of the MPI-SWS building E1.5)
Lecturers: Paweł Gawrychowski
Textbooks:
"Jewels of Stringology" by Crochemore and Rytter
"Algorithms on Strings, Trees, and Sequences" by Gusfield
"Pattern Matching Algorithms" by Apostolico and Galil
Tutorials: Monday 10:15-12:00 rotunda at 3rd floor (of the MPI building), Monday 14:15-16:00 room 022 (ground floor of the MPI building), Wednesday 10:15-12:00 room 023 (ground floor of the MPI building)
Tutors: Blaga Davidova and Igor Stassiy
Exam dates: Exam: between 29.07 and 09.08, see the questions; Re-exam: 26.09 (by appointment, write me an email)

Description

Don't know what algorithms on strings are? Look here to get some idea.

This will be an introduction to the world of stringology, or algorithms on strings. Strings are one of the basic data types, and the ability to quickly process them is crucial given the massive use of text processing in many applications, especially those connected to computational biology. Moreover, efficient solutions to many problems in this area are full of simple yet beutiful ideas, and one can find many algorithmic jewels there.

The aim of the course will be to present a basic string processing toolkit, including:

  1. basic pattern matching algorithms,
  2. structures for text indexing (suffix arrays, suffix trees, DAWGs),
  3. edit distance, longest common subsequence,
  4. pattern matching with don't care symbols, errors, and mistakes,
  5. text compression (Huffman encoding and Lempel-Ziv method),
  6. two dimensional pattern matching,
  7. multiple pattern matching,
  8. approximations for the shortest superstring problem.

While the course will be geared towards the more theoretical aspects, we will try to see how some of the solutions we discussed work in practice. Hence you should have at least a basic familiarity with some programming language.

Lectures

Date Topics Lecture notes
Apr 17, 10:15-12:00 Introduction, basic concepts, Karp-Rabin algorithm. Slides
Apr 24 Knuth-Morris-Pratt algorithm and some of its extensions. Problemset 1 Hints for problemset 1 Slides
May 1 No lecture.
May 8 Porat-Porat algorithm. Longest common subsequence vs edit distance. Problemset 2 Hints for problemset 2 Slides
May 15 Longest common subsequence in linear space. Edit distance in O(nd) as used in diff. Problemset 3 Hints for problemset 3 Slides
May 22 Bit packing in LCS. Introduction to suffix arrays. Problemset 4 Hints for problemset 4 Slides
May 29 Suffix arrays in linear time. Problemset 5 Hints for problemset 5 Slides
June 5 No lecture.
June 12 Decreasing the search time to O(m+lgn). Range minimum queries in constant time. Problemset 6 Hints for problemset 6 Slides
June 19 Pattern matching with don't cares/mismatches via FFT and kangaroo hopping. Slides
June 26 Introduction to equations on words. Problemset 7 Hints for problemset 7 Slides
July 3 Lempel-Ziv compression with applications to detecting regularities. Problemset 8 Hints for problemset 8 Slides
July 10 Burrows-Wheeler transform with applications to indexing. Problemset 9 Hints for problemset 9 Slides
July 17 Approximating the shortest superstring. Recap. Slides
Prerequisites: An undergraduate course in algorithms is a must. Additionally, some basic familiarity with programming (in any reasonable language) would be welcome.
Class Info: This is a 6-credit-point advanced lecture. There will be one 90 minutes lecture and one 90 minutes tutorial session per week.
Grading policy: Your final grade will be determined by your performance in homework problems, programming assignments, and final exam:
Homeworks: 40%
Programming assignments: 20%
Exam: 40%

More Courses of the Algorithms and Complexity Group

Search MPII (type ? for help)