3D reconstruction is a core problem in computer vision. The goal is to infer the true geometry of an object or scene from image observations, often captured from unknown camera viewpoints and/or under unknown lighting conditions. The task matters for many applications, such as autonomous driving, augmented reality content placement, and robot navigation.
Traditionally, 3D reconstruction starts by estimating 2D depth maps with multi-view stereo (MVS). These depth maps are then fused into a 3D representation of the imaged surface.
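As a rough illustration of the fusion step, the sketch below lifts each depth map to 3D points with a pinhole camera model and merges the views by concatenating the points in world coordinates. The function names, the intrinsics (fx, fy, cx, cy), and the camera-to-world pose convention are assumptions made for this example. Plain concatenation is a deliberately simple stand-in; practical pipelines typically use more robust fusion, such as TSDF integration.

```python
def backproject_depth(depth, fx, fy, cx, cy):
    """Lift a depth map (list of rows, depth along the camera z-axis)
    to 3D points in camera coordinates using a pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # treat non-positive depth as invalid
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points, R, t):
    """Apply an assumed camera-to-world pose: X_world = R * X_cam + t."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def fuse_views(views):
    """Merge several views into one point cloud by simple concatenation.
    Each view is a tuple (depth, (fx, fy, cx, cy), R, t)."""
    cloud = []
    for depth, intrinsics, R, t in views:
        cloud.extend(to_world(backproject_depth(depth, *intrinsics), R, t))
    return cloud
```

Calling fuse_views with one entry per keyframe yields a single list of (x, y, z) tuples that can be handed to a meshing or visualization step.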
Recently, a family of deep learning-based methods has emerged that performs reconstruction directly in a 3D volumetric feature space. The key component of these methods is 3D convolution. Although they produce outstanding reconstructions, their practicality in real-world scenarios is limited by the cost of 3D convolutional layers.
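To give a feel for why 3D convolutions are costly, the sketch below counts multiply-accumulate operations (MACs) for a single stride-1 convolution layer with "same" padding, in 2D and in 3D. The tensor sizes in the usage note are hypothetical and are not taken from the paper; the point is only that, for equal channel counts, the 3D layer costs D * k times more than the 2D one, where D is the number of depth slices and k the kernel size.

```python
def conv2d_macs(h, w, c_in, c_out, k):
    """MACs for one stride-1, same-padded 2D convolution layer."""
    return h * w * c_in * c_out * k * k


def conv3d_macs(d, h, w, c_in, c_out, k):
    """MACs for one stride-1, same-padded 3D convolution layer."""
    return d * h * w * c_in * c_out * k ** 3


def cost_ratio(d, k):
    """How many times more MACs the 3D layer needs than the 2D layer
    when spatial size and channel counts are the same."""
    return d * k
```

For example, with a hypothetical volume of 64 depth slices and 3x3(x3) kernels, cost_ratio(64, 3) gives 192, so a single 3D layer would need roughly two orders of magnitude more arithmetic than its 2D counterpart, before even counting the extra activation memory.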
This is where SimpleRecon comes in. Instead of relying on memory-hungry and computationally expensive 3D convolutions, its authors go back to basics. They show that accurate depth estimation is possible with a 2D CNN augmented with a cost volume.
SimpleRecon sits between monocular depth estimation and plane-sweep MVS. A depth-prediction encoder-decoder architecture is augmented with a cost volume. An image encoder extracts matching features from the reference and source images and passes them to the cost volume. The output of the cost volume is then processed by a 2D convolutional encoder-decoder network, which also receives image-level features.
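The cost volume idea can be sketched as follows: for each candidate depth, every reference pixel is back-projected to 3D, moved into the source camera, projected to a source pixel, and the two feature vectors are compared. This is a minimal plane-sweep sketch, not the authors' implementation. It assumes shared pinhole intrinsics K = (fx, fy, cx, cy), a reference-to-source pose (R, t), nearest-neighbour sampling instead of bilinear interpolation, and a plain dot product as the matching score.

```python
def plane_sweep_cost_volume(ref_feat, src_feat, K, R, t, depths):
    """Return cost[d][v][u]: the dot product between the reference feature
    at pixel (u, v) and the source feature at the pixel it maps to if the
    surface lies at depths[d]. Features are indexed as feat[v][u][channel].
    Pixels that project outside the source image keep a cost of 0.0."""
    fx, fy, cx, cy = K
    h, w = len(ref_feat), len(ref_feat[0])
    volume = []
    for d in depths:
        plane = [[0.0] * w for _ in range(h)]
        for v in range(h):
            for u in range(w):
                # Back-project the reference pixel to a 3D point at depth d.
                p = ((u - cx) * d / fx, (v - cy) * d / fy, d)
                # Move the point into the source camera frame.
                q = [sum(R[i][j] * p[j] for j in range(3)) + t[i]
                     for i in range(3)]
                if q[2] <= 0:  # point is behind the source camera
                    continue
                # Project into the source image (nearest pixel).
                us = round(fx * q[0] / q[2] + cx)
                vs = round(fy * q[1] / q[2] + cy)
                if 0 <= us < w and 0 <= vs < h:
                    plane[v][u] = sum(
                        a * b for a, b in zip(ref_feat[v][u], src_feat[vs][us])
                    )
        volume.append(plane)
    return volume
```

The result has one 2D score map per depth hypothesis, so it can be treated as a multi-channel image and processed with ordinary 2D convolutions, which is what makes the approach cheaper than operating on a 3D feature grid.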
SimpleRecon has two main contributions that make it a state-of-the-art multi-view depth estimator.
The first contribution is a carefully designed 2D CNN that combines strong image priors with a plane-sweep feature volume and geometric losses. The network follows a 2D convolutional encoder-decoder design. To keep it lightweight, the authors avoid computationally expensive structures such as LSTMs.
The second contribution is the integration of keyframe and geometric metadata into the cost volume, a low-cost operation that yields a significant performance gain. Traditional stereo setups carry important information that is usually ignored. In this work, readily available metadata is included in the cost volume, allowing the network to aggregate information across views in an informed way. This can be done in two ways: explicitly, by adding more feature channels, or implicitly, by enforcing a particular feature ordering.
Metadata is injected into the network by extending the image-level features with additional metadata channels. Because these channels encode the 3D relationship between the images, they help the network reason about how much each source image should contribute to the depth estimate at a given pixel.
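A minimal sketch of what such metadata channels could look like for one pixel and one depth hypothesis is given below. The specific channels chosen here (feature dot product, angle between the two viewing rays, depth in each camera, baseline length, and a validity flag) are assumed examples of quantities that describe the 3D relationship between views; the function name and the exact channel set are illustrative and should not be read as the paper's definition.

```python
import math


def build_metadata_channels(ref_f, src_f, point_ref, R, t):
    """Build an illustrative per-pixel, per-depth feature vector.
    ref_f, src_f: matching features from the reference and source image.
    point_ref: the hypothesised 3D point in reference camera coordinates.
    R, t: assumed reference-to-source camera transform."""
    point_src = [sum(R[i][j] * point_ref[j] for j in range(3)) + t[i]
                 for i in range(3)]
    # Visual similarity of the two features.
    dot = sum(a * b for a, b in zip(ref_f, src_f))
    # Both viewing rays expressed in the source frame: the reference ray is
    # R * point_ref = point_src - t, the source ray is point_src itself.
    ray_ref = [point_src[i] - t[i] for i in range(3)]
    norm_ref = math.sqrt(sum(c * c for c in ray_ref))
    norm_src = math.sqrt(sum(c * c for c in point_src))
    if norm_ref == 0.0 or norm_src == 0.0:
        angle = 0.0
    else:
        cos = sum(a * b for a, b in zip(ray_ref, point_src)) / (norm_ref * norm_src)
        angle = math.acos(max(-1.0, min(1.0, cos)))  # clamp for safety
    baseline = math.sqrt(sum(c * c for c in t))       # distance between cameras
    valid = 1.0 if point_src[2] > 0 else 0.0          # in front of source camera
    return [dot, angle, point_ref[2], point_src[2], baseline, valid]
```

In a learned system, a small network would map a vector like this to a single matching score, so that, for instance, a source view with a very small ray angle or an invalid projection can be down-weighted instead of being treated the same as every other view.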
SimpleRecon produces accurate depth estimates in a variety of scenarios while remaining lightweight enough for practical use. The authors describe their approach as going “back to basics” and show that high-quality depth maps are key to high-quality reconstructions.
This article is a research summary written by Marktechpost Staff based on the research paper 'SimpleRecon: 3D Reconstruction Without 3D Convolutions'. All credit for this research goes to the researchers on this project.
Ekrem Chetinkaya received a bachelor’s degree in 2018 and an M.Sc. in 2019 from Ozyegin University, Istanbul, Turkey. He wrote his M.Sc. thesis on image denoising using deep convolutional networks. He is currently pursuing a Ph.D. at the University of Klagenfurt, Austria, and works as a researcher on the ATHENA project. His research interests include deep learning, computer vision, and multimedia networks.