New Measurements Reveal Weaknesses of Image Quality Metrics in Evaluating Graphics Artifacts

ABSTRACT

Reliable detection of global illumination and rendering artifacts in the form of localized distortion maps is important for many graphics applications. Although many quality metrics have been developed for this task, they are often tuned for compression and transmission artifacts and have not been evaluated in the context of synthetic CG images. In this work, we run two experiments in which observers use a brush-painting interface to directly mark image regions: those with noticeable distortions when a high-quality reference image is present, and those with objectionable distortions when it is absent. The collected data shows a relatively high correlation between the with-reference and no-reference observer markings. Our demanding per-pixel image-quality datasets also reveal weaknesses of both simple (PSNR, MSE, sCIE-Lab) and advanced (SSIM, MS-SSIM, HDR-VDP-2) quality metrics. The most problematic weaknesses are excessive sensitivity to brightness and contrast changes, calibration for near-visibility-threshold distortions, lack of discrimination between plausible and implausible illumination, and poor spatial localization of distortions for multi-scale metrics. We believe that our datasets have further potential not only for improving existing quality metrics, but also for analyzing the saliency of rendering distortions and for investigating visual equivalence given our with- and no-reference data.
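
To make the idea of a localized distortion map concrete for the simplest metrics named above, the following is a minimal sketch of a per-pixel squared-error map together with the usual MSE and PSNR summaries. It is an illustration only and not the evaluation code used for the datasets: the function names, the assumption that images are floating-point arrays in the range [0, 1], and the averaging over color channels are all choices made here for the example.

```python
import numpy as np

def squared_error_map(reference, test):
    """Per-pixel squared error between two images of equal shape.

    Hypothetical helper: color images (H x W x C) are averaged over
    channels so that the result is a single H x W distortion map.
    """
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    if ref.shape != tst.shape:
        raise ValueError("reference and test images must have the same shape")
    err = (ref - tst) ** 2
    if err.ndim == 3:
        err = err.mean(axis=2)
    return err

def mse(reference, test):
    """Mean squared error: the average of the per-pixel error map."""
    return float(squared_error_map(reference, test).mean())

def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB, assuming pixel values in [0, peak]."""
    m = mse(reference, test)
    if m == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / m))

# Usage: a 4 x 4 gray image with a single distorted pixel.
reference = np.zeros((4, 4))
test = reference.copy()
test[1, 2] = 0.5
error_map = squared_error_map(reference, test)  # non-zero only at (1, 2)
print(error_map[1, 2], mse(reference, test), psnr(reference, test))
```

The error map keeps the location of the distortion, whereas MSE and PSNR collapse it into a single number per image. A map of this kind is what can be compared, pixel by pixel, against observer markings.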

ADDITIONAL MATERIALS

[Paper (pdf)]
[Supplementary Material - distortions, more results (pdf)]
[Supplementary Material - per-scene result analysis (HTML)]
[Dataset and Perceptual Experiment]
[Presentation slides (pdf)]
[bibTeX entry (bib)]

Acknowledgements and Credits: the presented dataset should not be used for commercial purposes without our explicit permission. Please acknowledge the use of the dataset by citing the publication.