
2D to stereoscopic 3D with AI: Depth map from a single image

Created
Aug 15, 2023 11:52 PM
Welcome to our series on 2D to Stereoscopic 3D conversions. In this segment, we explore how modern AI techniques can assist in creating immersive 3D visuals from 2D images. Our focus will be on depth maps - understanding what they are, their importance in the 2D to 3D process, and different ways to generate depth maps.

So what is a depth map?

A depth map is a 2D representation of a 3D scene where each pixel contains information about the distance or depth of the corresponding point in the scene from the camera or observer. It provides a measure of the relative distances between objects in the scene, allowing for depth perception and the creation of 3D effects or reconstructions.
Put simply, a depth map records how far each pixel of an image is from the viewer. To give you a clearer picture, let's consider two examples of depth maps processed by Owl3D:
(The original image is on the left, and the depth map is on the right)
In these images, you can see that the depth map is usually encoded as a grayscale image. In an 8-bit image, each pixel holds a value from 0 to 255: the higher the value, the brighter the pixel, and the lower the value, the darker the pixel. In terms of depth, brighter pixels mark points closer to the viewer, while darker pixels mark points that are further away.
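The encoding described above can be sketched in a few lines of Python. This is only an illustration, not Owl3D's actual method: the function name `depth_to_grayscale`, the linear scaling, and the sample distances are all assumptions, and the input is assumed to hold distances where a larger number means further away.

```python
def depth_to_grayscale(distances):
    """Map per-pixel distances (larger = further) to 8-bit values (255 = nearest)."""
    flat = [d for row in distances for d in row]
    near, far = min(flat), max(flat)
    span = far - near
    if span == 0:
        # Every pixel is equally distant, so fall back to a single mid-gray.
        return [[128 for _ in row] for row in distances]
    # Invert the scale so the nearest point becomes white and the furthest black.
    return [[round(255 * (far - d) / span) for d in row] for row in distances]


# Hypothetical 2x3 scene, distances in meters.
distances_m = [
    [1.0, 1.0, 5.0],
    [1.0, 3.0, 5.0],
]
gray = depth_to_grayscale(distances_m)
```

Here the pixels at 1 m come out white (255), the pixels at 5 m come out black (0), and the pixel at 3 m lands in the middle of the gray range.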

How is a depth map used in the 2D to stereoscopic 3D process?

Understanding how depth maps contribute to the 2D to 3D conversion process is key. At its core, the conversion aims to create a distinct left-eye and right-eye view from a single image. When shown on a 3D-capable device, such as a VR headset or 3D TV, each eye sees the scene from its own slightly different angle, and your brain fuses the two views into a single 3D image.
A depth map is instrumental in achieving this conversion. It acts as a displacement guide for computing the left and right eye views: each pixel is shifted slightly to the left or right of its original position, by an amount derived from the depth map. The depth map also determines which pixels are drawn in front and which are hidden behind them when several pixels land on the same spot, so that occlusion is handled properly.
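As a rough sketch of the displacement guide idea, the snippet below turns an 8-bit depth value into a horizontal shift. The names `pixel_shift` and `eye_positions`, the `max_shift_px` parameter, and the linear mapping are assumptions made for illustration. It also assumes the furthest plane (depth 0) stays in place, so all nearer content appears in front of the screen; real converters may place the zero-shift plane elsewhere.

```python
def pixel_shift(depth_value, max_shift_px=8):
    """Horizontal offset in pixels for a pixel with an 8-bit depth value."""
    # Linear mapping (an assumption): depth 0 does not move, depth 255 moves the most.
    return round(max_shift_px * depth_value / 255)


def eye_positions(x, depth_value, max_shift_px=8):
    """Return the pixel's column in the (left-eye, right-eye) views."""
    shift = pixel_shift(depth_value, max_shift_px)
    # Near content moves right in the left-eye view and left in the right-eye view.
    return x + shift, x - shift


left_x, right_x = eye_positions(100, 255)
```

With these defaults, a pixel at column 100 with the maximum depth value of 255 ends up at column 108 in the left-eye view and column 92 in the right-eye view, while a pixel with depth 0 stays at column 100 in both.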
Consider the scenario of two pixels, A and B, with depth values of 230 and 20 respectively. If these pixels attempt to occupy the same location in the shifted view, the depth map ensures that the closer pixel - pixel A in this instance - assumes the spot. This rule is applied across the image to generate a correctly occluded view.
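The rule above is essentially a depth buffer test. The following minimal sketch applies it to a single image row; the function `shift_row`, the `direction` convention, and the tiny example row are hypothetical, and the shift formula is an assumed linear mapping rather than any particular converter's implementation.

```python
def shift_row(colors, depths, direction, max_shift_px=8):
    """Shift one image row horizontally; the nearer pixel wins any collision.

    direction is +1 for the left-eye view and -1 for the right-eye view.
    """
    width = len(colors)
    out_colors = [None] * width  # None marks a hole that no source pixel landed on
    out_depths = [-1] * width    # nearest (brightest) depth written to each column so far
    for x in range(width):
        shift = round(max_shift_px * depths[x] / 255)
        target = x + direction * shift
        # Write only if the target is inside the row and this pixel is nearer
        # than whatever already occupies that column.
        if 0 <= target < width and depths[x] > out_depths[target]:
            out_colors[target] = colors[x]
            out_depths[target] = depths[x]
    return out_colors


# Pixel "A" (depth 230) and pixel "B" (depth 20) compete for the same column.
colors = ["A", "x", "B", "y"]
depths = [230, 0, 20, 0]
left_view = shift_row(colors, depths, direction=+1, max_shift_px=2)
```

In the left-eye view, A moves two columns to the right and lands on B's position; because 230 is greater than 20, A takes the spot. The column A vacated is left as a hole (`None`), which a real converter would still need to fill in; that step is outside this sketch.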
The subsequent images demonstrate shifted left and right perspectives with accurate occlusion handling:
(Images: the shifted left-eye and right-eye views)
On close observation, you'll notice that the left and right images, though similar, have slightly differently shifted and occluded pixels. It's these subtle differences that enable our brains to create the stereoscopic vision when displayed correctly.

Various Approaches to Generating a Depth Map

The right depth map generation method depends largely on your specific objectives and resources. Here are a few potential options:

Open-source models - local PC

For those comfortable with writing Python code and desiring a hands-on approach with full control over the generation process, open-source depth estimation models offer a compelling solution. These models, developed in the PyTorch framework, can be run on a compatible PC. Some notable models include:

Open-source models - Google Colab

If your PC lacks a suitable GPU, Google Colab offers a beneficial alternative. This free tool allows for the execution of arbitrary Python code through your browser. There's no need for setup, and Colab generously provides access to computational resources, including GPUs to run the depth models. All the aforementioned projects support Google Colab. Simply navigate to their respective project pages and click on the Colab icon. More information on Google Colab is available here: https://research.google.com/colaboratory/faq.html

Desktop software

For those who have a GPU-equipped PC and prefer unlimited and private conversion, desktop converters are worth considering.
  • Owl3D: Owl3D is an AI-powered 2D to 3D converter that uses cutting-edge machine learning models to compute stereoscopic 3D from 2D images or videos. With a user-friendly interface, you simply need to select your media and click convert. The resulting output is compatible with various display devices such as AR and VR headsets, 3D TVs, 3D projectors, and Looking Glass Factory products.

Online Conversion

  • LeiaPix Converter
  • Looking Glass Factory's 2D to 3D converter (Powered by Owl3D)

Summary

In closing, the journey from 2D to stereoscopic 3D is made possible through depth maps, which are becoming easier to generate thanks to open-source models and user-friendly software like Owl3D. As we push the boundaries of VR, AR, and beyond, depth map creation is leading the charge.
With a depth map, the third dimension is within reach, making the world of 3D imaging more accessible and immersive than ever. Stay tuned to our blog for the latest on 2D to 3D conversion with AI. Here's to bringing depth to life with each pixel!