Seminar on Massive-Scale Graph Analysis (SS 2015)
Dr. Vinay Setty, Dr. Stephan Seufert, Mohamed Yahya, Sairam Gurajada
Contents
The seminar covers state-of-the-art topics related to paradigms, systems, and algorithms for large-scale graph processing. All talks will be based on research papers from both academia and industry, published in reputable venues such as VLDB, SIGMOD, KDD, and OSDI.
Organization
 Regular meetings take place on Fridays at 10:00-11:30 in Room 021, Building E1.4 (MPI-INF).
 The kickoff meeting will take place on April 24th. If you want to participate in the seminar, you have to attend this meeting to register!
 Students who want to attend the seminar should have some background knowledge in Databases, Algorithms and Distributed Systems.
Requirements for the Certificate
 Attend all talks, not just your own. We will keep track of participation! If you are sick, please let us know in advance by writing a short mail.
 Read your paper and other related literature.
 Contact your tutor at least two weeks before your talk and present an intended outline of your talk.
 Prepare a 45-minute talk about your topic that introduces the matter to your fellow students. This is about twice the length of a regular conference talk, so there should be enough time to present some background information on the topic. Try to pick the most interesting, challenging, or futuristic contribution(s) from the paper. You are very welcome to discuss any potential weaknesses or problems of the paper in your talk. If you are unsure about what to present, ask your tutor. Note that, even though the conference slides of some papers are available on the Web, we expect you to prepare your own slides.
 You must send your slides to and discuss them with your tutor by the Monday before your talk (by 16:00) at the latest, otherwise your talk will be canceled (this is a hard deadline).
 Both the slides and the presentation itself must be in English; otherwise, some students will not be able to follow all talks, and following them is one of the main purposes of the seminar. After the presentation, there will be a discussion in which all fellow students are encouraged to ask questions. We will keep track of your participation (i.e., whether you ask questions) and, of course, of the presenter's answers.
 For each talk, two fellow students will be preselected as opponents. Their role is to prepare tough questions to challenge the paper presented in the talk (not the talk itself or the speaker!). To make life a little easier, the preliminary version of the slides will be sent to the opponents on the Monday before the talk. However, as interaction is an important part of science, we expect every participant to take an active part in the discussions.
 Four weeks after the talk, the presenter has to submit a written summary of the topic of the talk. The focus of this report should be on pointing out strengths and weaknesses of the approach presented in the paper, not just on summarizing the paper. Reports must be written using this template and may be up to 8 pages long.
 Finally, there will be another meeting to give feedback on your talk and report.
 Your final grade will be influenced by the following components: your oral presentation, your knowledge of the topic (your answers to questions after the presentation), the questions you asked as an opponent, your general participation in the seminar, and your written report.
Agenda

24/04/2015 (Kickoff Meeting)

08/05/2015 (Map/Reduce for Graphs)
 Cohen: Graph twiddling in a MapReduce world, Computing in Science & Engineering 2009, [Paper][pdf]
 Lin and Schatz: Design patterns for efficient graph algorithms in MapReduce, Workshop on Mining and Learning with Graphs 2010, [Paper] [pdf]
 Additional Reference (to introduce Map/Reduce): Lin et al.: Data-intensive text processing with MapReduce, 2010, [Book]
 Speaker: Ankur Sharma / Opponents: Laurent Linden, Maha Aburahma / Tutor: Sairam Gurajada
 [Slides] [Report]

15/05/2015 (Graph Analysis Using Map/Reduce)
 Kang et al.: PEGASUS: A peta-scale graph mining system implementation and observations, ICDM 2009, [Paper][pdf]
 Kang et al.: PEGASUS: mining peta-scale graphs, Knowledge and Information Systems 2011, [Paper][pdf]
 Speaker: Laurent Linden / Opponents: Ankur Sharma, Helge Dombrowski / Tutor: Sairam Gurajada
 [Slides] [Report]

22/05/2015 (Pregel)
 Malewicz et al.: Pregel: a system for large-scale graph processing, SIGMOD 2010, [Paper][pdf]
 Salihoglu and Widom: Optimizing Graph Algorithms on Pregel-like Systems, VLDB 2014, [Paper]
 Speaker: Ali Shah / Opponents: Margarita Salyaeva, Beata Wojciak / Tutor: Sairam Gurajada
 [Slides] [Report]

29/05/2015 (GraphLab)
 Gonzalez et al.: PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs, OSDI 2012, [Paper/pdf]
 Low et al.: Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, VLDB 2012, [Paper][pdf]
 Speaker: Susu Sun / Opponents: Ankur Sharma, Peter Matthias Manderscheid / Tutor: Stephan Seufert
 [Slides] [Report]

05/06/2015 (Graph Partitioning)
 Stanton and Kliot: Streaming Graph Partitioning for Large Distributed Graphs, KDD 2012, [Paper][pdf]
 Tsourakakis et al.: FENNEL: streaming graph partitioning for massive scale graphs, WSDM 2014, [Paper][pdf]
 Speaker: Daniel Spanier / Opponents: Maha Aburahma, Susu Sun / Tutor: Mohamed Yahya
 [Slides] [Report]

12/06/2015 (Large-Scale Graph Engines)
 Kyrola et al.: GraphChi: Large-Scale Graph Computation on Just a PC, OSDI 2012, [Paper][pdf]
 Shao et al.: Trinity: A Distributed Graph Engine on a Memory Cloud, SIGMOD 2013, [Paper][pdf]
 Speaker: Margarita Salyaeva / Opponents: Daniel Spanier, Laurent Linden / Tutor: Vinay Setty
 [Slides] [Report]

19/06/2015 (Comparison of Approaches)
 Lu et al.: Large-Scale Distributed Graph Computing Frameworks: An Experimental Evaluation, VLDB 2014, [Paper/pdf]
 McCune et al.: Thinking Like a Vertex: a Survey of Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing, ACM Computing Surveys 2015, [Paper/pdf]
 Speaker: Maha Aburahma / Opponents: Ali Shah, Helge Dombrowski / Tutor: Vinay Setty
 [Slides] [Report]

26/06/2015 (RDF Graph Processing)
 Neumann et al.: RDF-3X: a RISC-style engine for RDF, VLDB 2008, [Paper/pdf]
 Huang et al.: Scalable SPARQL Querying of Large RDF Graphs, VLDB 2011, [Paper/pdf]
 Speaker: Helge Dombrowski / Opponents: Susu Sun, Margarita Salyaeva / Tutor: Mohamed Yahya
 [Slides] [Report]

03/07/2015 (Graph Streams) (Different location only for this talk: E1.5, Room 005)
 Aggarwal et al.: On dense pattern mining in graph streams, VLDB 2010, [Paper][pdf]
(The speaker will focus entirely on the second paper, since covering both papers would take too long.)
 Chen et al.: Continuous Subgraph Pattern Search over Certain and Uncertain Graph Streams, TKDE 2010, [Paper]
 Speaker: Peter Matthias Manderscheid / Opponents: Beata Wojciak, Ali Shah / Tutor: Vinay Setty
 [Slides] [Report]

10/07/2015 (Graph Algorithms: Dense Subgraphs and Graph Sketches)
 Angel et al.: Dense subgraph maintenance under streaming edge weight updates for real-time story identification, VLDB 2014, [Paper][pdf]
 Ahn et al.: Graph sketches: sparsification, spanners, and subgraphs, PODS 2012, [Paper][pdf]
 Speaker: Beata Wojciak / Opponents: Peter Matthias Manderscheid, Daniel Spanier / Tutor: Mohamed Yahya
 [Slides] [Report]