Image warping, the per-pixel deformation of one image into another, is an essential component of immersive visual experiences such as virtual and augmented reality.
The primary issue with image warping is disocclusions, regions where occluded (and hence unknown) parts of the input image would be required to compose the output image.
We introduce a new image warping method, metameric inpainting: an approach to real-time hole-filling with foundations in human visual perception.
Our method estimates image feature statistics of disoccluded regions from their neighbours.
These statistics are inpainted and used to synthesise, in real time, visuals that are less noticeable to human observers, in particular in peripheral vision.
Our method improves on the speed of common structured image inpainting, and on the realism of colour-based inpainting such as pull-push.
This paves the way for future applications such as depth image-based rendering, 6-DoF 360° rendering, and remote rendering and streaming.
Here we show an example of our approach compared to a baseline pull-push inpainting implementation.
In the Disoccluded Input video, disocclusions are shown as a grey checkerboard pattern.
This section gives a brief overview of the stages involved in our approach.
For more detail, please see the paper (source code is also available).
Input
The input to the approach is a warped colour frame, and the associated depth map (z-buffer).
Steerable Pyramid
We convert the input image to a decorrelated colour space (YCrCb) and then construct a steerable pyramid from it.
This is a decomposition of the image into frequency bands of different orientations and scales.
We use a GPU-accelerated decomposition that makes use of smaller kernel sizes and MIP mapping to improve efficiency.
Below we show two of the bands (horizontal and vertical) at one scale.
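For illustration, here is a minimal NumPy sketch of extracting one oriented band by masking in the frequency domain. It is a stand-in for one steerable pyramid level, not our GPU MIP-map decomposition; the radial and angular windows are simplifying assumptions.

```python
import numpy as np

def oriented_band(img, scale, orientation, n_orient=4):
    """Extract one oriented frequency band of a single-channel image via
    frequency-domain masking - a simplified steerable pyramid level."""
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    r = np.sqrt(fx**2 + fy**2)            # radial frequency
    theta = np.arctan2(fy, fx)            # orientation of each frequency

    # Radial band-pass: one octave per scale.
    lo, hi = 0.25 / 2**scale, 0.5 / 2**scale
    radial = ((r >= lo) & (r < hi)).astype(float)

    # Angular window centred on the requested orientation.
    centre = orientation * np.pi / n_orient
    d = np.angle(np.exp(1j * (theta - centre)))   # wrapped angle difference
    angular = np.cos(d) ** 2 * (np.abs(d) < np.pi / 2)

    return np.real(np.fft.ifft2(np.fft.fft2(img) * radial * angular))
```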
Inpainting Statistics
We use a depth-aware pull-push approach to inpaint these oriented bands, generating maps of the local statistics of each band in the disoccluded region.
We compute the first and second moments (means and standard deviations) of each band - these are shown for the horizontal band in the videos below.
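As a sketch of the statistics themselves, local mean and standard deviation maps of a band can be computed from its first and second moments with a box filter (the window size and the use of scipy here are our assumptions; the paper computes these on the GPU):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_moments(band, size=15):
    """Local mean and standard deviation maps of one pyramid band,
    using Var = E[X^2] - E[X]^2 over a box window."""
    mean = uniform_filter(band, size)
    mean_sq = uniform_filter(band**2, size)
    var = np.maximum(mean_sq - mean**2, 0.0)  # clamp negative rounding error
    return mean, np.sqrt(var)
```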
Synthesis
We finally synthesise the output.
We use a steerable pyramid constructed from white noise, and weight each band based on the local statistics in the disoccluded region.
This fills the disoccluded region with oriented noise that matches these statistics.
The result is converted back to RGB for display.
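A minimal sketch of the per-band synthesis, assuming the inpainted mean and standard deviation maps from the previous stage and a same-sized white-noise band (the global normalisation of the noise is a simplification):

```python
import numpy as np

def synthesise_band(noise_band, mean_map, std_map, hole_mask, warped_band):
    """Weight a white-noise pyramid band so its statistics match the
    inpainted mean/std maps, then use it only inside the holes."""
    # Normalise the noise to zero mean, unit variance so the target
    # statistics are reproduced rather than merely approached.
    noise = (noise_band - noise_band.mean()) / (noise_band.std() + 1e-6)
    synth = mean_map + std_map * noise
    return np.where(hole_mask, synth, warped_band)
```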
Depth-Aware Pull-Push
We use a modified warping approach and a depth-aware variant of pull-push when inpainting colour and statistics values.
Compared to regular pull-push, our depth-aware version ensures that only the background is sampled, giving improved results.
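A minimal CPU sketch of pull-push hole filling; the depth-aware idea enters through the weights, where foreground pixels around a disocclusion would be given zero weight so only the background is sampled (the exact depth test of the paper is not reproduced here):

```python
import numpy as np

def pull_push(values, weights):
    """Minimal pull-push hole filling for one channel (odd sizes are
    cropped for brevity). `weights` is 0 in holes and positive elsewhere;
    a depth-aware variant would also zero the weight of foreground pixels
    around a disocclusion so only the background is ever sampled."""
    if min(values.shape) <= 1:
        return values
    h, w = values.shape[0] // 2 * 2, values.shape[1] // 2 * 2
    v, wgt = values[:h, :w], weights[:h, :w]
    # Pull: weighted 2x2 average down to the next-coarser level.
    wv = (v * wgt).reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    ws = wgt.reshape(h // 2, 2, w // 2, 2).sum(axis=(1, 3))
    coarse = np.where(ws > 0, wv / np.maximum(ws, 1e-9), 0.0)
    coarse = pull_push(coarse, np.minimum(ws, 1.0))   # fill remaining holes
    # Push: upsample the coarse level back into the holes only.
    up = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
    out = values.copy()
    hole = wgt == 0
    out[:h, :w][hole] = up[hole]
    return out
```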
One application of the approach is warping and inpainting based on a motion field.
This can accelerate rendering: frames are rendered once, then warped and re-used over multiple subsequent frames.
This reduces compute requirements by decreasing the frequency at which expensive rendering operations need to be carried out.
Here we show this approach in action - in each group of eight frames of the video below, the first is rendered and the following seven are warped.
We compare our approach to using pull-push alone.
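The scheduling itself is simple. Below is a sketch of the render-once, warp-seven pattern; render, motion_field and warp_and_inpaint are hypothetical stand-ins for the application's renderer and our warping and inpainting pass:

```python
def next_frame(i, keyframe, render, motion_field, warp_and_inpaint, reuse=8):
    """One step of the reuse loop: the first of every `reuse` frames is
    fully rendered; the remaining reuse-1 frames are warped from it.
    Returns the displayed frame and the (possibly updated) keyframe."""
    if i % reuse == 0:
        keyframe = render(i)                      # expensive full render
        return keyframe, keyframe
    flow = motion_field(keyframe, i)              # per-pixel motion to frame i
    return warp_and_inpaint(keyframe, flow), keyframe
```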
If the scene to be rendered is static, we can instead move just the viewpoint based on the depth map and a 6-DoF transform.
This allows us to display new views of the scene without the need to re-render it.
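A toy NumPy version of such a warp: unproject each pixel using the depth map, apply the 6-DoF transform, reproject, and mark unfilled pixels as holes for the inpainting stage. This is a nearest-pixel painter's splat under assumed pinhole intrinsics K and view transform T, not the paper's GPU warp:

```python
import numpy as np

def forward_warp(colour, depth, K, T):
    """Splat an RGBD frame (colour: h x w x 3, depth: h x w) into a new
    view given 3x3 intrinsics K and a 4x4 view transform T. Returns the
    warped colour and a hole mask for the inpainting stage."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    cam = (np.linalg.inv(K) @ pix) * depth.ravel()      # unproject with depth
    proj = K @ (T @ np.vstack([cam, np.ones(h * w)]))[:3]
    z = np.maximum(proj[2], 1e-6)
    u = np.round(proj[0] / z).astype(int)
    v = np.round(proj[1] / z).astype(int)
    ok = (proj[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    src = np.flatnonzero(ok)[np.argsort(-z[ok])]        # paint far-to-near
    out = np.zeros_like(colour)
    hole = np.ones((h, w), bool)
    out[v[src], u[src]] = colour.reshape(-1, 3)[src]    # nearer overwrites farther
    hole[v[src], u[src]] = False                        # unhit pixels are holes
    return out, hole
```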
Using the same approach as for the transform-based inpainting, we can synthesise one image of a stereo pair from the other.
This allows an application to render only the left-eye image, for example, then warp and inpaint it to obtain the right-eye image.
This can effectively halve rendering requirements for stereo displays.
Our approach can also be applied to real-world data - for example, inpainting an RGBD video so that it can be rendered from another viewpoint without disocclusions.
Here we apply our inpainting in real time to a 360° RGBD video.
Our approach can be modified so the noise follows the motion of the background content, avoiding the "screen door" effect that would otherwise occur.
This is achieved by warping the noise maps based on the depth map & transform, or the motion field, depending on the application.
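As a sketch, a noise map can be backward-warped by the same per-pixel flow that drives the image warp (the bilinear sampling and border wrapping here are our choices, not necessarily the paper's):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def advect_noise(noise, flow):
    """Backward-warp a noise map by a per-pixel flow field (h x w x 2,
    x/y in pixels) so the synthesised texture follows the background
    rather than sticking to the screen. Bilinear sampling; wrapping at
    the borders keeps the noise field dense."""
    h, w = noise.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = [ys - flow[..., 1], xs - flow[..., 0]]
    return map_coordinates(noise, coords, order=1, mode='wrap')
```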
Our approach requires input to which anti-aliasing has not been applied; however, anti-aliasing can be applied as a post-process. Here we compare the output of the approach with and without 4x SSAA.
Rafael Kuffner Dos Anjos, David R. Walton, Sebastian Friston, David Swapp, Kaan Akşit, Anthony Steed and Tobias Ritschel. Metameric Inpainting. TVCG 2022.