DeRF: Decomposed Radiance Fields
10 Minutes Paper Review
Before starting the review: if you are interested in this paper, I recommend reading the detailed summary I’ve written on my blog.
NeRF, short for Neural Radiance Fields, has shown great performance in rendering tasks. However, its slow inference has limited its use in real-world tasks. DeRF addresses this problem by decomposing a scene into smaller regions and rendering each region individually.
Introduction
Neural rendering methods have shown great potential for rendering unseen views. However, neural rendering is far from a fully developed technology. The two development axes are generalizability and performance. This paper focuses on performance, i.e., increasing the efficiency of training and/or inference. NeRF has to be improved since its inference is incredibly slow. As illustrated in the figure below, increasing a network's capacity yields diminishing returns in final rendering quality.
To take advantage of this phenomenon, our method divides the scene into multiple regions and employs a smaller network in each of them. For the decomposition, we use a Voronoi diagram, rendered with the Painter’s Algorithm. To summarize our contributions:
- We highlight the presence of diminishing returns for network capacity in NeRF and propose spatial decompositions to address this issue.
- We demonstrate how a decomposition based on Voronoi Diagrams may be learned to represent a scene optimally.
- We show how this decomposition allows the whole scene to be rendered by rendering each part independently and compositing the final image via the Painter’s Algorithm.
- In comparison to the NeRF baseline, these modifications result in improved rendering quality for the same computational budget or the faster rendering of images given the same visual quality.
Neural Radiance Fields (NeRFs)
NeRF, which is an ancestor of DeRF, is well explained in my other article. For details, refer to the link below:
Decomposed Radiance Fields
For the convenience of writing equations in Medium, I’ve used images. Our model decomposes a scene into multiple regions and assigns a weight to each region. In DeRF, we write:
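A minimal sketch of the weighted combination described above, assuming each head predicts its own density and color at a point and the per-region weights are non-negative and sum to 1. The function name `combine_heads` and the example values are hypothetical, chosen only for illustration.

```python
def combine_heads(weights, sigmas, colors):
    """Blend the outputs of N heads at a single 3D point.

    weights: N non-negative floats that sum to 1 (one per head)
    sigmas:  N densities, one predicted by each head
    colors:  N (r, g, b) tuples, one predicted by each head
    """
    # Density is the weighted sum of the per-head densities.
    sigma = sum(w * s for w, s in zip(weights, sigmas))
    # Color is the weighted sum of the per-head colors, channel by channel.
    color = tuple(
        sum(w * c[k] for w, c in zip(weights, colors)) for k in range(3)
    )
    return sigma, color


# Two heads: the second one dominates at this point.
sigma, color = combine_heads(
    weights=[0.25, 0.75],
    sigmas=[2.0, 4.0],
    colors=[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
)
# sigma == 3.5, color == (0.25, 0.75, 0.0)
```

When a weight is exactly zero, the corresponding head does not contribute, so in principle it does not need to be evaluated at that point.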
Voronoi Learnable Decomposition
We seek a decomposition that is differentiable, forms a spatial partition, and is GPU-friendly. We’ve selected the Voronoi decomposition; its formal equation is:
As \beta approaches infinity, the soft weights approach a hard assignment in which each point belongs to exactly one Voronoi cell, which is the efficient case.
The Painter’s Algorithm is one of the most elementary rendering techniques. It renders objects from back to front into the output buffer. This technique works only if the objects can be sorted by depth, so it is not appropriate when the scene is arranged as in the figure below:
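As a rough illustration, a common way to make a Voronoi partition differentiable is a softmax over negative distances to the cell sites: w_n(x) = exp(-\beta ‖x − \phi_n‖) / Σ_j exp(-\beta ‖x − \phi_j‖). The sketch below assumes this form; the function name `voronoi_weights` and the sample sites are hypothetical.

```python
import math


def voronoi_weights(x, sites, beta):
    """Soft Voronoi weights of point x with respect to a list of sites.

    Computes a softmax over -beta * (Euclidean distance to each site).
    A larger beta gives a sharper, more nearly one-hot assignment.
    """
    logits = [-beta * math.dist(x, s) for s in sites]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


sites = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
x = (0.2, 0.0, 0.0)  # closer to the first site

soft = voronoi_weights(x, sites, beta=1.0)    # smooth blend of both cells
hard = voronoi_weights(x, sites, beta=100.0)  # nearly all weight on cell 0
```

With a small \beta the weights vary smoothly across cell borders, which keeps the site positions trainable by gradient descent; with a large \beta the weights are close to the hard nearest-site assignment.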
In the solution proposed here, the scene can be rendered part-by-part, one Voronoi cell at a time, without causing memory cache incoherence, leading to better GPU throughput. The correctness of the Voronoi Painter algorithm is proved in the link below:
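The cell-by-cell rendering described above can be sketched for a single pixel as follows. This is an illustrative sketch, not the paper's implementation: it assumes that cells can be ordered by the distance from the camera to each cell's site, and it uses the standard "over" operator with non-premultiplied colors. The names `painter_order` and `composite_pixel` are hypothetical.

```python
import math


def painter_order(camera, sites):
    """Indices of Voronoi cells sorted back to front (farthest site first).

    Assumption: distance from the camera to a cell's site is used as the
    depth key for that cell.
    """
    return sorted(
        range(len(sites)),
        key=lambda i: math.dist(camera, sites[i]),
        reverse=True,
    )


def composite_pixel(layers, order):
    """Composite per-cell contributions for one pixel, back to front.

    layers[i] = ((r, g, b), alpha), rendered independently for cell i.
    """
    out = (0.0, 0.0, 0.0)  # start from a black background
    for i in order:
        rgb, a = layers[i]
        # "Over" operator: the nearer layer covers what is already drawn.
        out = tuple(a * c + (1.0 - a) * o for c, o in zip(rgb, out))
    return out


camera = (0.0, 0.0, 0.0)
sites = [(0.0, 0.0, 1.0), (0.0, 0.0, 3.0)]  # cell 0 is nearer
layers = [((1.0, 0.0, 0.0), 0.5),            # cell 0: half-transparent red
          ((0.0, 0.0, 1.0), 1.0)]            # cell 1: opaque blue

order = painter_order(camera, sites)    # [1, 0]
pixel = composite_pixel(layers, order)  # (0.5, 0.0, 0.5)
```

Because each cell is rendered on its own before compositing, only one head's network needs to be active at a time.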
Implementation Details
Results
Evaluation Metrics
- Peak Signal to Noise Ratio (PSNR): A classic metric to measure the corruption of a signal.
- Structural Similarity Index Measure (SSIM): A perceptual image quality assessment based on the degradation of structural information.
- Learned Perceptual Image Patch Similarity (LPIPS): A perceptual metric based on the deep features of a trained network that is more consistent with human judgment.
- Frames-per-TeraFLOP (theoretical performance) and frames-per-second (practical performance)
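Of the metrics above, PSNR is simple enough to compute by hand: it is 10 · log10(MAX² / MSE), where MSE is the mean squared error between the reference and the rendered image. A minimal sketch, assuming pixel values are given as flat sequences in the range [0, 1]; the function name `psnr` and the sample values are hypothetical.

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak Signal to Noise Ratio in dB between two equally sized sequences."""
    if len(reference) != len(rendered):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


reference = [0.0, 0.5, 1.0, 0.5]
rendered = [0.1, 0.5, 0.9, 0.5]
score = psnr(reference, rendered)  # MSE = 0.005, so about 23.01 dB
```

Higher PSNR and SSIM are better, while lower LPIPS is better.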
Because DeRF is interested in the relationship between rendering quality and computational budget in a practical scenario, it uses the “Real-Forward-Facing” dataset from NeRF. Training uses a batch size of 512, 128 samples per ray, and 300k iterations. Other settings are the same as reported in NeRF.
To avoid confusion, here is a short summary of the terms used in the result tables.
- Units: The number of hidden units in the network.
- Heads: The number of partitioned areas.
Increasing the number of heads is much more efficient than increasing the number of units. Given the same render cost, more fine-grained decompositions improve rendering quality across all metrics. Regardless of the computational budget, a finer decomposition leads to better rendering quality.