Max Planck Institut Informatik

Seminar

Scene Understanding Techniques for Geometric and Visual Data Sets
(Block Seminar, Summer Semester 2011)

Scope

Content

Scene understanding is a hot topic in Computer Vision and, more recently, an emerging topic in Computer Graphics. The goal is to teach a computer how to "understand" images and/or 3D scenes. A typical task is to automatically recognize and label objects in images, where the object classes of interest could include, for example, "dogs", "people", "cars", or different types of road signs. Typically, the algorithm is trained on a set of labeled data and then has to recognize similar objects in previously unseen scenes.

This seminar will introduce fundamental techniques that are commonly used to model and solve such recognition problems, and it will look at a number of example systems. We will cover topics from both computer vision (image understanding) and computer graphics (geometry understanding). We will see connections between these two areas, which can serve as preparation for further research at their intersection.

Teaching Goals

The seminar addresses Master's and Bachelor's students (after their second year).

You can benefit by:

  • Gaining insight into an important and recent research field.
  • Learning how to read and work with raw scientific material.
  • Practicing presentation skills.

Participation is a good starting point for a subsequent thesis, in particular at the intersection of graphics and vision ("geometry understanding").

Organization

In the starting phase, we will give an overview of the entire research area, assign topics and advisors, and provide guidelines for preparing your own talk and write-up.

The preparation phase is used to share knowledge among the participants through reading groups.

In the presentation phase, all talks will be held within one week; afterwards, the write-ups will be prepared and handed in.

Requirements:

  • Registration for the seminar!
  • Registration in the HISPOS system by mid-August!
  • Active participation in the reading groups!
  • Giving a talk on the assigned topic! About 30 min + 5 min for questions!
  • Preparing a write-up of approximately 8-10 pages summarizing the assigned topic!


Contact

Instructor:

Teaching Assistants:

Seminar Topics

We group all topics into two classes: image-related and geometry-related techniques. The papers are mostly publications in the areas of computer vision and computer graphics, respectively. Here is how we are going to handle the assignment of topics:

Here is the list of topics. Please refer to "References" below for citations and download links.

Reading Group Topics

The following papers are discussed in the reading group. They are not available as seminar topics. Participants should read these papers before the reading group meeting, where we will discuss their content in a moderated discussion. Active participation counts positively toward the final grade.

(R1) Feature Detection in Images: SIFT Features. (22.06.2011)

Paper: Distinctive image features from scale-invariant keypoints [Lowe, 2004].

This paper introduces one of the most frequently used feature detectors and descriptors for 2D images. It serves as an example of how feature detection techniques work.
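To make the matching step concrete, here is a minimal Python sketch of nearest-neighbour descriptor matching with the distance-ratio test described in [Lowe, 2004]: a match is kept only if the closest database descriptor is clearly closer than the second closest (the paper rejects matches whose distance ratio exceeds 0.8). The function names and the toy 2D "descriptors" are illustrative assumptions; real SIFT descriptors have 128 entries, and the paper uses an approximate nearest-neighbour search instead of the brute-force loop shown here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query, database, ratio=0.8):
    """Return (query_index, database_index) pairs that pass the ratio test."""
    matches = []
    if len(database) < 2:
        return matches  # the test needs a nearest and a second-nearest neighbour
    for qi, q in enumerate(query):
        # Brute-force search: sort all database descriptors by distance to q.
        dists = sorted((euclidean(q, d), di) for di, d in enumerate(database))
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if the nearest neighbour is clearly closer
        # than the second-nearest one, i.e. the match is distinctive.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


# Toy 2D "descriptors" (hypothetical data).
query = [[0.0, 0.0], [5.0, 5.0]]
database = [[0.1, 0.0], [3.0, 0.0], [5.0, 4.9], [5.0, 5.1]]
print(ratio_test_matches(query, database))  # [(0, 0)]: query 1 is ambiguous
```

The second query point lies exactly between two equally close database entries, so its match is rejected as ambiguous even though both candidates are very close.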

(R2) Matching geometry: Robust global registration (29.06.2011)

Paper: Robust global registration [Gelfand et al., 2005].

This paper looks at the geometry side: how can we define feature detectors and feature descriptors for 3D geometry? Furthermore, it uses a quadratic assignment model for matching geometry, solved with a branch-and-bound method. This approach lays the foundation for later, more complex 3D matching techniques. Many methods in both 2D and 3D recognition are based on a similar fundamental structure.
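The matching problem can be illustrated with a small branch-and-bound search. This is a simplified sketch under our own assumptions, not the exact formulation of [Gelfand et al., 2005]: each source feature point may be matched to one of a few candidate target points (hypothetically, those with similar descriptors), the assignment must be one-to-one, and the cost of an assignment is the summed discrepancy between corresponding pairwise distances. This objective scores pairs of matches, which is what makes it a quadratic assignment problem. Because the cost can only grow as more points are assigned, any partial assignment that already costs at least as much as the best complete one can be pruned.

```python
import math


def branch_and_bound_match(src, tgt, candidates):
    """One-to-one assignment of source points to candidate target points that
    minimises the summed pairwise-distance distortion (search with pruning).

    candidates[i] lists the target indices allowed for source point i.
    Returns (assignment, cost); assignment[i] is the target index of src[i].
    """
    n = len(src)
    best = {"cost": math.inf, "assign": None}

    def recurse(i, assign, cost):
        # Bound: the cost never decreases as more points are assigned,
        # so a partial solution that is already too expensive is discarded.
        if cost >= best["cost"]:
            return
        if i == n:
            best["cost"], best["assign"] = cost, list(assign)
            return
        for t in candidates[i]:
            if t in assign:  # each target point may be used only once
                continue
            # Distortion added by the pairs (j, i) for all already-assigned j.
            extra = sum(
                abs(math.dist(src[j], src[i]) - math.dist(tgt[assign[j]], tgt[t]))
                for j in range(i)
            )
            assign.append(t)
            recurse(i + 1, assign, cost + extra)
            assign.pop()

    recurse(0, [], 0.0)
    return best["assign"], best["cost"]


# A triangle and a translated copy whose points are listed in a different order.
src = [(0, 0), (1, 0), (0, 2)]
tgt = [(10, 12), (10, 10), (11, 10)]
print(branch_and_bound_match(src, tgt, [[0, 1, 2]] * 3))  # ([1, 2, 0], 0.0)
```

Because the cost depends only on distances, the match is invariant to rigid motions; restricting each point to a short candidate list is what keeps the search tractable for larger inputs.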

(R3) Example of constellation models: Pictorial structures (selected chapters, 06.07.2011)

Paper: Pictorial Structures for Object recognition [Felzenszwalb and Huttenlocher, 2005].

This paper discusses how to find arrangements of features connected by a tree-structured graph. It serves as an example of how several feature points can be combined into a more complex constellation, and of how such constellations are learned from training data and detected fully automatically in new images. Please note that this paper is rather lengthy, so you should start reading it early.
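For tree-structured models, the best placement of all parts can be found exactly by dynamic programming from the leaves to the root. The following Python sketch shows this min-sum recursion under simplifying assumptions of our own: parts live at discrete 1D positions, match_cost[part][l] is a hypothetical appearance cost for placing a part at position l, and spring is a hypothetical deformation cost between a parent and a child. The sketch enumerates all position pairs per edge, which is quadratic in the number of positions; [Felzenszwalb and Huttenlocher, 2005] use generalized distance transforms to make this step much faster.

```python
def best_configuration(children, root, match_cost, deform_cost):
    """Exact min-sum placement of a tree of parts.

    children:    dict mapping a part to the list of its child parts
    match_cost:  dict mapping a part to a list of costs, one per position
    deform_cost: function (parent, child, parent_pos, child_pos) -> cost
    Returns (total cost, dict part -> position).
    """
    n = len(match_cost[root])  # number of candidate positions per part

    def solve(part):
        # cost[l]: best cost of the subtree below `part` when `part` sits at l.
        cost = list(match_cost[part])
        choice = {}  # choice[child][l]: best child position given parent at l
        for child in children.get(part, []):
            child_cost, child_choice = solve(child)
            choice.update(child_choice)
            best_pos = []
            for l in range(n):
                c, best_k = min((child_cost[k] + deform_cost(part, child, l, k), k)
                                for k in range(n))
                cost[l] += c
                best_pos.append(best_k)
            choice[child] = best_pos
        return cost, choice

    root_cost, choice = solve(root)
    total, root_pos = min((c, l) for l, c in enumerate(root_cost))
    # Backtrack from the root, reading off each child's best position.
    placement = {root: root_pos}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            placement[child] = choice[child][placement[part]]
            stack.append(child)
    return total, placement


# Hypothetical toy model: a torso with a head to its left and legs to its right.
children = {"torso": ["head", "legs"]}
match_cost = {"torso": [5, 5, 0, 5, 5],
              "head": [0, 0, 5, 5, 5],  # ambiguous on its own
              "legs": [5, 5, 5, 0, 5]}
offset = {"head": -1, "legs": +1}  # preferred position relative to the torso


def spring(parent, child, parent_pos, child_pos):
    return (child_pos - parent_pos - offset[child]) ** 2


total, placement = best_configuration(children, "torso", match_cost, spring)
print(total, placement)  # total 0; torso at 2, head at 1, legs at 3
```

The head's appearance cost alone cannot decide between positions 0 and 1; the spring to the torso resolves the ambiguity, which is exactly the benefit of combining features into a constellation.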


Feature Detection and Descriptors

The first set of papers deals with finding interesting feature points (a.k.a. keypoints) and with defining descriptors of their small local neighborhoods in order to retrieve similar features efficiently. We will look at techniques targeting both images and geometry:

Images:

(FI-1) Histogram of Oriented Gradients [Dalal and Triggs, 2005] (taken)

(FI-2) Real-time feature matching [Viola and Jones, 2001] (taken)

(FI-3) More general features: self-similarity [Shechtman and Irani, 2007] (taken)

Geometry:

(FG-1) Rotation invariant spherical harmonic descriptors [Kazhdan et al., 2003]

(FG-2) SIFT on point clouds [Li and Guskov, 2005]

 

Bag-of-Words Techniques

Bag-of-words techniques describe an image or a 3D object by the set of its feature descriptors, (more or less) irrespective of their spatial arrangement.

Images:

(BWI-1) Pyramid match kernels [Grauman and Darrell, 2005] (taken)

(BWI-2) Spatial pyramid matching [Lazebnik et al., 2006] (taken)

Geometry:

(BWG-1) Probabilistic fingerprints for shapes [Mitra et al., 2006] (taken)

(BWG-2) Pyramid matching for range images [Li and Guskov, 2007]
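The bag-of-words idea can be sketched in a few lines of Python: each descriptor is quantized to its nearest "visual word" in a codebook, the image (or shape) is represented by the histogram of word counts, and two histograms are compared with histogram intersection, the similarity that pyramid matching [Grauman and Darrell, 2005; Lazebnik et al., 2006] evaluates at several resolutions. Here the codebook is assumed to be given (in practice it is often learned, e.g. by k-means clustering), and the function names and toy data are illustrative.

```python
import math


def bag_of_words(descriptors, codebook):
    """Histogram of visual-word counts: each descriptor votes for its nearest codeword."""
    hist = [0] * len(codebook)
    for d in descriptors:
        nearest = min(range(len(codebook)), key=lambda k: math.dist(d, codebook[k]))
        hist[nearest] += 1
    return hist


def histogram_intersection(h1, h2):
    """Similarity of two histograms: the sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical 2D codebook and descriptor sets.
codebook = [(0, 0), (10, 0), (0, 10)]
image_a = [(1, 0), (9, 1), (0, 9)]
image_b = [(0, 9), (1, 0), (9, 1)]  # same descriptors, different order
image_c = [(1, 1), (0, 1), (2, 0)]
ha, hb, hc = (bag_of_words(x, codebook) for x in (image_a, image_b, image_c))
print(ha, hb, hc)  # [1, 1, 1] [1, 1, 1] [3, 0, 0]
print(histogram_intersection(ha, hb), histogram_intersection(ha, hc))  # 3 1
```

Reordering the descriptors of image A does not change its histogram, which is the sense in which the representation ignores spatial arrangement.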

Conditional Random Fields

Conditional random field techniques label the pixels of images by combining descriptor responses with prior assumptions on the regularity of labels (nearby labels are likely to be similar). Discriminative learning is used to train the local classifiers and/or the neighborhood interactions.

Images:

(CRFI-1) Discriminative random fields [Kumar and Hebert, 2003]

(CRFI-2) Multiscale conditional random fields [He et al., 2004]

(CRFI-3) Hierarchical fields [Kumar and Hebert, 2005]

(CRFI-4) Dynamic Scenes [Wojek and Schiele, 2008] (taken)

Geometry:

(CRFG-1) Learning 3D scan segmentation [Anguelov et al., 2005] (taken)

(CRFG-2) Learning 3D mesh segmentation and labeling [Kalogerakis et al., 2010] (taken)
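The interplay between descriptor responses and label regularity can be illustrated with a toy energy minimization. The sketch assumes a grid of per-pixel label costs (standing in for classifier responses) and a Potts smoothness term that charges beta for every pair of 4-neighbours with different labels, and it minimizes the sum with iterated conditional modes (ICM), which greedily updates one pixel at a time. ICM is chosen only for brevity and finds a local minimum; the referenced papers use learned potentials and their own inference procedures.

```python
def icm_labeling(unary, beta, n_iters=10):
    """Approximately minimise
        sum_p unary[p][label_p] + beta * #(4-neighbour pairs with different labels)
    with iterated conditional modes. unary[y][x] lists one cost per label."""
    h, w, n_labels = len(unary), len(unary[0]), len(unary[0][0])
    # Initialise each pixel with its best label while ignoring the neighbours.
    labels = [[min(range(n_labels), key=lambda l: unary[y][x][l]) for x in range(w)]
              for y in range(h)]
    for _ in range(n_iters):
        changed = False
        for y in range(h):
            for x in range(w):
                nbrs = [labels[yy][xx]
                        for yy, xx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                        if 0 <= yy < h and 0 <= xx < w]
                # Local energy: data cost plus a penalty per disagreeing neighbour.
                best = min(range(n_labels),
                           key=lambda l: unary[y][x][l] + beta * sum(n != l for n in nbrs))
                if best != labels[y][x]:
                    labels[y][x] = best
                    changed = True
        if not changed:  # no pixel changed: a local minimum has been reached
            break
    return labels


# Hypothetical costs: every pixel prefers label 0, except the centre,
# which weakly prefers label 1 (e.g. a noisy classifier response).
unary = [[[0, 2] for _ in range(3)] for _ in range(3)]
unary[1][1] = [1, 0]
print(icm_labeling(unary, beta=1.0))  # smoothing removes the isolated label
print(icm_labeling(unary, beta=0.0))  # without smoothing it is kept
```

With beta = 1, switching the centre to label 1 would cost four disagreeing neighbours against a data-cost gain of only 1, so the regularity prior overrides the weak local response.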

 

Part-Based Models

Images

(PI-1) Object class recognition by unsupervised scale-invariant learning [Fergus et al., 2003]

(PI-2) Object detection using the statistics of parts [Schneiderman and Kanade, 2004] (taken)

(PI-3) Implicit Shape Models [Leibe et al., 2004] (taken)

(PI-4) Learning Hierarchical Models of Scenes, Objects, and Parts [Sudderth et al., 2005]

(PI-5) Deformable Part Models [Felzenszwalb et al., 2008]

Symmetry

Symmetry is an important cue that reveals structural properties of images and geometric objects. We have picked two example papers that examine this aspect:

Images:

(SYMI-1) Structural regularity in Images [Liu and Collins, 2000] (Journal version: [Liu et al., 2004]) (taken)

Geometry:

(SYMG-1) Structural regularity in Point Clouds [Pauly et al., 2008]
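A simple way to detect translational regularity is to look for a shift under which a pattern maps onto itself. As a toy 1D illustration (our own simplification, not the algorithms of [Liu and Collins, 2000] or [Pauly et al., 2008], which work with 2D lattices in images and with transformations of 3D point clouds), the sketch below picks the lag with the largest autocorrelation of a mean-removed signal. The correlation is summed only over the overlapping samples, so for a cleanly repeating signal the multiples of the true period score lower and the fundamental period wins.

```python
def dominant_period(signal):
    """Return the lag (1 <= lag <= len(signal) // 2) with the largest
    autocorrelation of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    s = [v - mean for v in signal]
    best_lag, best_score = None, float("-inf")
    for lag in range(1, n // 2 + 1):
        # Unnormalised sum over the n - lag overlapping samples.
        score = sum(s[i] * s[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# A hypothetical repeated motif of length 4.
print(dominant_period([3, 1, 0, 1] * 6))  # 4
```

Detecting a 2D lattice in an image extends this idea to shifts in two directions, where the two shortest shift vectors with high correlation span the repeating structure.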

Geometry Understanding

Finally, a number of recent techniques attempt an automatic interpretation of geometric objects. Again, we have picked example papers, three in this case, to look at what is coming up in this area:

Geometry

(UG-1) Understanding mechanical assemblies [Mitra et al., 2010] (taken)

(UG-2) Exploring model spaces [Ovsjanikov et al., 2011] (taken)

(UG-3) Inverse procedural modeling [Bokeloh et al., 2010] (taken)

 

References

[Anguelov et al., 2005]   Anguelov, D., Taskar, B., Chatalbashev, V., Koller, D., Gupta, D., Heitz, G., and Ng, A. (2005). Discriminative learning of Markov random fields for segmentation of 3D scan data. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Bokeloh et al., 2010]   Bokeloh, M., Wand, M., and Seidel, H.-P. (2010). A connection between partial symmetry and inverse procedural modeling. ACM Trans. Graph. (Proc. Siggraph), 29(4). Link: Paper on the web

[Dalal and Triggs, 2005]   Dalal, N. and Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Felzenszwalb and Huttenlocher, 2005]   Felzenszwalb, P. and Huttenlocher, D. (2005). Pictorial structures for object recognition. Intl. J. Computer Vision, 61(1):55–79. Link: Paper on the web

[Felzenszwalb et al., 2008]   Felzenszwalb, P., McAllester, D., and Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Fergus et al., 2003]   Fergus, R., Perona, P., and Zisserman, A. (2003). Object class recognition by unsupervised scale-invariant learning. In IEEE Conference on Computer Vision and Pattern Recognition. Link: Paper on the web

[Gelfand et al., 2005]   Gelfand, N., Mitra, N. J., Guibas, L. J., and Pottmann, H. (2005). Robust global registration. In Proc. Symp. Geometry Processing, pages 197–206. Link: Paper on the web

[Grauman and Darrell, 2005]   Grauman, K. and Darrell, T. (2005). The pyramid match kernel: Discriminative classification with sets of image features. In IEEE International Conference on Computer Vision. Link: Paper on the web

[He et al., 2004]   He, X., Zemel, R.S., Carreira-Perpiñán, M.A. (2004). Multiscale conditional random fields for image labeling. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Kalogerakis et al., 2010]   Kalogerakis, E., Hertzmann, A., and Singh, K. (2010). Learning 3D mesh segmentation and labeling. ACM Trans. Graph. (Proc. Siggraph), 29(4). Link: Paper on the web

[Kazhdan et al., 2003]   Kazhdan, M., Funkhouser, T., and Rusinkiewicz, S. (2003). Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing (SGP), pages 167–175. Link: Paper on the web

[Kumar and Hebert, 2003]   Kumar, S. and Hebert, M. (2003). Discriminative random fields: A discriminative framework for contextual interaction in classification. In Proceedings of the Ninth IEEE International Conference on Computer Vision - Volume 2, ICCV ’03, pages 1150–, Washington, DC, USA. IEEE Computer Society. Link: Paper on the web

[Kumar and Hebert, 2005]   Kumar, S. and Hebert, M. (2005). A hierarchical field framework for unified context-based classification. In IEEE International Conference on Computer Vision (ICCV). Link: Paper on the web

[Lazebnik et al., 2006]   Lazebnik, S., Schmid, C., and Ponce, J. (2006). Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Leibe et al., 2004]   Leibe, B., Leonardis, A., and Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In Workshop on Statistical Learning in Computer Vision, ECCV. Link: Paper on the web

[Li and Guskov, 2005]   Li, X. and Guskov, I. (2005). Multiscale features for approximate alignment of point-based surfaces. In Symposium on Geometry Processing, pages 217–226. Link: Paper on the web

[Li and Guskov, 2007]   Li, X. and Guskov, I. (2007). 3D object recognition from range images using pyramid matching. In IEEE International Conference on Computer Vision, Workshop on 3D Representation for Recognition. Link: Paper on the web

[Liu and Collins, 2000]   Liu, Y. and Collins, R. (2000). A computational model for repeated pattern perception using frieze and wallpaper groups. In 2000 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2000), volume 1, pages 537–544. Link: Paper on the web

[Liu et al., 2004]   Liu, Y., Collins, R., and Tsin, Y. (2004). A computational model for periodic pattern perception based on frieze and wallpaper groups. IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(1):354–371. Link: Paper on the web

[Lowe, 2004]   Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. Intl. J. Computer Vision, 60(2):91–110. Link: Paper on the web

[Mitra et al., 2006]   Mitra, N. J., Guibas, L., Giesen, J., and Pauly, M. (2006). Probabilistic fingerprints for shapes. In Proc. Symposium on Geometry Processing (SGP). Link: Paper on the web

[Mitra et al., 2010]   Mitra, N. J., Yang, Y.-L., Yan, D.-M., Li, W., and Agrawala, M. (2010). Illustrating how mechanical assemblies work. ACM Trans. Graph. (Proc. Siggraph), 29(4). Link: Paper on the web

[Ovsjanikov et al., 2011]   Ovsjanikov, M., Li, W., Guibas, L., and Mitra, N. J. (2011). Exploration of continuous variability in collections of 3D shapes. ACM Trans. Graph. (Proc. Siggraph), 30(3), to appear. Link: Paper on the web

[Pauly et al., 2008]   Pauly, M., Mitra, N. J., Wallner, J., Pottmann, H., and Guibas, L. (2008). Discovering structural regularity in 3D geometry. ACM Trans. Graph., 27(3). Link: Paper on the web

[Schneiderman and Kanade, 2004]   Schneiderman, H. and Kanade, T. (2004). Object detection using the statistics of parts. International Journal of Computer Vision, 56(3):151–177. Link: Paper on the web

[Shechtman and Irani, 2007]   Shechtman, E. and Irani, M. (2007). Matching local self-similarities across images and videos. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Sudderth et al., 2005]   Sudderth, E. B., Torralba, A., Freeman, W. T., and Willsky, A. S. (2005). Learning hierarchical models of scenes, objects, and parts. In IEEE International Conference on Computer Vision (ICCV). Link: Paper on the web

[Viola and Jones, 2001]   Viola, P. and Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Link: Paper on the web

[Wojek and Schiele, 2008]   Wojek, C. and Schiele, B. (2008). A dynamic conditional random field model for joint labeling of object and scene classes. In European Conference on Computer Vision (ECCV). Link: Paper on the web