
Computational Photography

Ideas for projects

Many of these project ideas could be turned into research papers at international conferences. In fact, some of the projects in this course will probably end up being submitted for publication! To keep you and your fellow students from being scooped, please don't distribute this handout, or the URL of this web page, or its ideas, beyond the Saarbruecken campus.
Some of these project ideas might present a rather far-reaching goal. Any reasonable step towards addressing these problems might make a good project as well, if the full problem is too tough.


    Combining multiple images

  1. Imagine a camera that sweeps quickly through a range of focus settings, taking an image at every setting. This would allow you to select the focus and depth of field afterwards using an interactive graphics program. Design and implement such a program (see the first sketch after this list for a simple per-pixel compositing idea). What additional creative effects can you achieve using this technique? For this project, mount your camera on a tripod to avoid the need to first stabilize the imagery.
  2. Image stabilization, which is a mature technology in video cameras, could have many uses in still photography. Implement an image stabilization algorithm from the vision literature, then use it to stabilize and combine a set of images shot using a hand-held camera (see the alignment-and-averaging sketch after this list). This should allow you to shorten exposure times in low-light scenes without enhancing noise, and to produce cleaner shadows in high-contrast scenes. Using non-linear image combinations, you might be able to remove cars from a freeway, or crowds from a plaza.
  3. Looking through dense occluders. Using synthetic aperture photography one can combine images from different viewpoints in order to focus through an occluder. Matting out the occluder allows for even clearer pictures but may also introduce high-frequency artifacts. How could one remove a semi-transparent occluder, such as a curtain, by combining multiple images? There is some interesting work by [Bernards and Irani, ECCV 2003].
  4. Removing multiple occluders. In a similar fashion, assume you take multiple images of a crowded place from the same viewpoint, e.g. a building behind a market place. The probability of capturing images where parts of the building are occluded is then extremely high. There might even be parts of the images which are almost always occluded, sometimes even by the same object (a parked car present in 80% of all images). Find a robust measure to estimate a clear picture of the background (a per-pixel median baseline is sketched after this list). Compare to [Alexa 2007].
  5. Combining a set of photographs taken under different conditions, such as at different times, or under different exposure, focus, or illumination (e.g. with and without flash) has become a major theme at Siggraph, but we've surely barely scratched the surface in this area. Look at the Digital Photomontage paper (Agarwala, Siggraph 2004) or the several flash-noflash papers (Petschnigg, Siggraph 2004, Raskar, Siggraph 2004, Agrawal, Siggraph 2005), then think up something they haven't tried.
  6. Feature enhancement. Given a set of images of the same object from the same viewpoint but under different illumination, how could the images be combined so as to optimally convey the shape of the object? Some features are only visible under grazing illumination, but such illumination might cast long shadows somewhere else. What would be a good metric?
  7. View interpolation using image statistics. Fitzgibbon et al. (ICCV 2003) proposed a method for image-based rendering using image-based priors, in which view interpolation even of objects with complex geometry yields decent images by ensuring that every pixel in a reconstructed view is plausible in the context of its reconstructed neighbor pixels. The distribution of reasonable neighborhoods is derived directly from the input images. Implement this technique and extend it by applying acceleration techniques found in current texture synthesis papers.
  8. Image statistics for refocusing. In a series of photographs of a scene taken with a hand-held point-and-shoot camera, there are typically one or two images which captured the right moment in time but are just out of focus, while other, more boring shots might be perfectly sharp. By using image statistics (see above), i.e. knowledge about the distribution of pixel neighborhoods derived from the sharp images, it should be possible to derive a refocused image for the point in time you are actually interested in (see [Freeman et al. 2002]).
  9. Removing camera shake from a single picture. Given a long-exposure image which has been taken without a tripod, the image is likely to suffer from camera shake, i.e. a single dot in the scene might not be imaged as a dot but rather as some blob with a structure that depends on the actual motion, the so-called point spread function. The observed image is basically a convolution of the original scene with the point spread function. The paper by [Fergus et al. 2006] addresses the problem by first identifying this point spread function and then deconvolving the image using the Lucy-Richardson algorithm (a sketch of the basic iteration is given after this list). There are, however, better deconvolution methods based on wavelets which could be used to reduce the remaining artifacts (see [Stark 2007] or [Stark et al. 2002]).
  10. Removing camera shake by sensor fusion. How could the same problem be addressed if two long-exposure images are captured in succession? Is it possible to derive a better PSF using a sensor fusion approach? How would one combine a long-exposure image with a short-exposure one, where the PSF can be neglected but the image is rather badly exposed? Are there other hints that could be used?
  11. Video enhancement by space-time filtering and motion compensation. Videos captured in low-light conditions are typically underexposed or rather noisy. By applying an adaptive filter which averages over time for static pixels and over neighboring pixels within the same frame for moving pixels, previous work was able to enhance and tonemap low dynamic range videos into decently exposed videos. While that filter is oriented either along the time or the spatial domain, better results might be achieved by performing motion compensation, i.e. orienting the filter along some curve in the space-time continuum.
  12. Noise Characterization and HDR. In order to create a high dynamic range image from low dynamic range input, one can combine multiple images taken at different exposures, i.e. varying the exposure time. Assuming a linear response of the camera, one can easily compute the radiance in each image and then compute a weighted sum of all rescaled input images (see Robertson); a minimal weighted-merge sketch is given after this list. The weights should account for non-linearities in the camera's response but do not consider camera noise. Estimate the different camera noise parameters (especially quantization noise) for a given camera and try to obtain an HDR image with as little noise as possible. What would be the best set of exposure times in order to minimize noise (see also [Grossberg and Nayar 2003])?
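For project 1, here is a minimal sketch of the kind of per-pixel compositing an interactive focus-selection tool could build on. It assumes a tripod-mounted focus sweep already loaded as a list of aligned frames; the Laplacian-based sharpness measure and the smoothing radius are illustrative choices, not part of the project statement.

    import cv2
    import numpy as np

    def all_in_focus(images):
        """Pick, per pixel, the frame with the highest local sharpness.
        images: list of aligned BGR uint8 frames from a focus sweep."""
        stack = np.stack(images)                          # (N, H, W, 3)
        sharp = []
        for img in images:
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            lap = cv2.Laplacian(gray.astype(np.float32), cv2.CV_32F, ksize=3)
            # smooth the absolute Laplacian so the per-pixel choice is less noisy
            sharp.append(cv2.GaussianBlur(np.abs(lap), (0, 0), 2.0))
        best = np.argmax(np.stack(sharp), axis=0)         # per-pixel index of sharpest frame
        rows, cols = np.indices(best.shape)
        return stack[best, rows, cols], best              # composite and per-pixel focus index

The returned index map is exactly what an interactive tool could let the user edit: painting a region with a different index selects a different focus plane there.
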
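For project 2, a minimal sketch of one possible stabilize-and-combine pipeline, assuming hand-held frames of roughly the same view; ORB features, a homography motion model, and plain averaging are illustrative choices, not the required algorithm.

    import cv2
    import numpy as np

    def align_to_reference(ref, img):
        """Warp img onto ref using an ORB-feature homography (one simple
        choice of stabilization from the vision literature)."""
        orb = cv2.ORB_create(4000)
        k1, d1 = orb.detectAndCompute(cv2.cvtColor(ref, cv2.COLOR_BGR2GRAY), None)
        k2, d2 = orb.detectAndCompute(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), None)
        matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
        src = np.float32([k2[m.trainIdx].pt for m in matches])
        dst = np.float32([k1[m.queryIdx].pt for m in matches])
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        return cv2.warpPerspective(img, H, (ref.shape[1], ref.shape[0]))

    def stabilize_and_average(frames):
        """Align all hand-held frames to the first one and average them,
        trading several short, noisy exposures for one cleaner image."""
        ref = frames[0]
        aligned = [ref] + [align_to_reference(ref, f) for f in frames[1:]]
        return np.mean(np.stack(aligned).astype(np.float32), axis=0).astype(np.uint8)

Replacing the mean by a per-pixel median already gives the kind of non-linear combination that removes moving cars or people, which is also the idea behind the next sketch.
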
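For project 4, the simplest robust estimate of the unoccluded background is a per-pixel median over the registered image stack; a stronger project would replace it by something that also survives the "occluded in 80% of the shots" case, e.g. a per-pixel mode or an MRF-based labeling. A minimal baseline sketch, assuming the images are already registered:

    import numpy as np

    def median_background(aligned_images):
        """Per-pixel median over a stack of registered photographs.
        Occluders covering a pixel in fewer than half of the images vanish;
        a car parked in 80% of the shots will, however, survive this estimator."""
        stack = np.stack([img.astype(np.float32) for img in aligned_images])
        return np.median(stack, axis=0).astype(np.uint8)
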
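For project 9, the Lucy-Richardson iteration used by [Fergus et al. 2006] is only a few lines once the PSF has been estimated. The sketch below assumes a grayscale image scaled to [0, 1] and an already known blur kernel; it is the baseline that a wavelet-regularized deconvolution would be compared against.

    import numpy as np
    from scipy.signal import fftconvolve

    def lucy_richardson(observed, psf, iterations=30, eps=1e-7):
        """Basic Lucy-Richardson deconvolution.
        observed: blurred grayscale image in [0, 1]; psf: estimated blur kernel."""
        observed = np.asarray(observed, dtype=np.float64)
        psf = np.asarray(psf, dtype=np.float64)
        psf /= psf.sum()                                  # kernel must sum to one
        psf_mirror = psf[::-1, ::-1]
        estimate = np.full(observed.shape, 0.5)           # flat initial guess
        for _ in range(iterations):
            blurred = fftconvolve(estimate, psf, mode='same') + eps
            estimate *= fftconvolve(observed / blurred, psf_mirror, mode='same')
        return np.clip(estimate, 0.0, 1.0)
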
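For project 12, a minimal weighted HDR merge under the stated linear-response assumption. The hat-shaped weight is one common choice (it down-weights under- and over-exposed pixels) and is exactly the place where a noise-aware weighting derived from your camera measurements would be substituted.

    import numpy as np

    def merge_hdr(images, exposure_times):
        """Weighted average of radiance estimates from differently exposed images.
        images: linear-response images scaled to [0, 1]; exposure_times in seconds."""
        num = np.zeros(images[0].shape, dtype=np.float64)
        den = np.zeros(images[0].shape, dtype=np.float64)
        for img, t in zip(images, exposure_times):
            w = 1.0 - np.abs(2.0 * img - 1.0)             # hat weight: trust mid-range pixels most
            num += w * (img / t)                          # per-image radiance estimate
            den += w
        return num / np.maximum(den, 1e-8)                # HDR radiance map
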
    Projects involving projectors and cameras

  13. Automatic shaped illumination. Given a set of uncalibrated video projectors and a camera observing a particular unknown object, what illumination patterns should be loaded into the projectors to best visualize the surface features? One might run an optimization that tries to maximize the local contrast in the observed images (a simple local-contrast metric is sketched after this list).
  14. Active depth-from-defocus. By projecting synthetically focused textures onto an object, existing shape-from-focus algorithms should perform better. In fact, there are several kinds of texture-enhanced vision algorithms that could benefit from having some active illumination. Try implementing one of these algorithms (e.g. [Zhang and Nayar 2006]).
  15. Single shot 3D capture. Current 3D scanning techniques use a time-varying code to determine the disparity between a camera and a projector. Even space-time stereo approaches ([Davis et al. 2003]), which use two cameras for triangulation and active high-frequency illumination, typically rely on multiple patterns. Try to acquire a depth map in a single shot by projecting a single high-frequency pattern and combining stereo matching with decorrelation of the projected pattern.
  16. 3D scanning with multiple projectors. 3D scanning with a single projector is an almost solved problem. However, the number of pixels seen by the camera which cannot be reliably illuminated by a single projector is typically significant due to occlusion. Using multiple projectors (more than the two called for in the previous project) it should be possible to illuminate many more surface points even for highly occluded scenes. Unfortunately, the light patterns typically used in a single-projector setup turn out to be unsuitable, since multiple projectors might illuminate the same surface points, leading to disturbed readings. Design a 3D scanning setup where multiple projectors illuminate the scene at the same time (projector and camera calibration code is given).
  17. 3D scanning of refracting objects. Capturing the environment matte of a glass object allows one to reproduce the appearance of the object for arbitrary planar backgrounds. For refracting objects, environment matting typically yields a one-to-one mapping from camera pixels to locations on a monitor where the viewing ray, after interacting with the object, would intersect the monitor plane. By capturing environment mattes from multiple viewpoints, it should be possible to reconstruct the 3D shape of refracting objects, at least for those objects with relatively smooth geometry.
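For project 13, one simple objective an optimization over projector patterns could try to maximize is the average local contrast of the camera image. Local standard deviation over a small window is an illustrative choice of metric, not the one the project prescribes.

    import cv2
    import numpy as np

    def mean_local_contrast(camera_image, window=9):
        """Average local standard deviation of the observed image -- a crude
        score of how well the current projector patterns reveal surface detail."""
        gray = cv2.cvtColor(camera_image, cv2.COLOR_BGR2GRAY).astype(np.float32) / 255.0
        mean = cv2.blur(gray, (window, window))
        mean_sq = cv2.blur(gray * gray, (window, window))
        local_std = np.sqrt(np.maximum(mean_sq - mean * mean, 0.0))
        return float(local_std.mean())

    # an outer loop would perturb the projector patterns and keep those
    # changes that increase this score for the observed object.
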
    Purely analytical or design projects

  18. Analyze the applicability of an exotic or computational imaging technique from microscopy, astronomy, or another field to photography, i.e. at the human scale using combinations of cameras, projectors, or other optical components. An example from microscopy might be phase contrast illumination; an example from astronomy might be coded-aperture imaging (using an array of pinholes on a mask).
  19. Super-resolution in reflectance fields. Assume you are given two 4D reflectance fields of a scene from two projectors and a single camera, and in addition an individual picture of the scene illuminated by a single point light source positioned in between the projectors. What techniques could be used to derive the 4D reflectance field of a virtual projector at the location of the point light source? Transfer techniques for computing super-resolution in images and videos to the light domain.
As long as this list is, don't consider these your only options. If you have an idea for another project, or a variation on one of these ideas, come talk to us; we'd love to brainstorm with you about it!


© 2007 Hendrik Lensch
Last update: May 28, 2007
lensch@mpi-inf.mpg.de