|Process type||digital and print|
|Industrial sector(s)||Film and television, print production|
|Main technologies or sub-processes||Computer software|
|Product(s)||Movies, television shows, social media, printed images|
2D to 3D video conversion (also called 2D to stereo 3D conversion and stereo conversion) is the process of transforming 2D ("flat") film to 3D form, which in almost all cases is stereo, so it is the process of creating imagery for each eye from one 2D image.
2D-to-3D conversion adds the binocular disparity depth cue to digital images perceived by the brain, thus, if done properly, greatly improving the immersive effect while viewing stereo video in comparison to 2D video. However, in order to be successful, the conversion should be done with sufficient accuracy and correctness: the quality of the original 2D images should not deteriorate, and the introduced disparity cue should not contradict other cues used by the brain for depth perception. If done properly and thoroughly, the conversion produces stereo video of similar quality to "native" stereo video which is shot in stereo and accurately adjusted and aligned in post-production.
Two approaches to stereo conversion can be loosely defined: quality semiautomatic conversion for cinema and high quality 3DTV, and low-quality automatic conversion for cheap 3DTV, VOD and similar applications.
Computer animated 2D films made with 3D models can be re-rendered in stereoscopic 3D by adding a second virtual camera if the original data is still available. This is technically not a conversion; therefore, such re-rendered films have the same quality as films originally produced in stereoscopic 3D. Examples of this technique include the re-release of Toy Story and Toy Story 2. Revisiting the original computer data for the two films took four months, as well as an additional six months to add the 3D. However, not all CGI films are re-rendered for the 3D re-release because of the costs, time required, lack of skilled resources or missing computer data.
With the increase of films released in 3D, 2D to 3D conversion has become more common. The majority of non-CGI stereo 3D blockbusters are converted fully or at least partially from 2D footage. Even Avatar contains several scenes shot in 2D and converted to stereo in post-production. The reasons for shooting in 2D instead of stereo are financial, technical and sometimes artistic:
Even in the case of stereo shooting, conversion can frequently be necessary. Besides the mentioned hard-to-shoot scenes, there are situations when mismatches in stereo views are too big to adjust, and it is simpler to perform 2D to stereo conversion, treating one of the views as the original 2D source.
Without respect to particular algorithms, all conversion workflows should solve the following tasks:
High quality conversion methods should also deal with many typical problems including:
Most semiautomatic methods of stereo conversion use depth maps and depth-image-based rendering.
The idea is that a separate auxiliary picture known as the "depth map" is created for each frame or for a series of homogenous frames to indicate depths of objects present in the scene. The depth map is a separate grayscale image having the same dimensions as the original 2D image, with various shades of gray to indicate the depth of every part of the frame. While depth mapping can produce a fairly potent illusion of 3D objects in the video, it inherently does not support semi-transparent objects or areas, nor does it represent occluded surfaces; to emphasize this limitation, depth-based 3D representations are often explicitly referred to as 2.5D. These and other similar issues should be dealt with via a separate method. 
The major steps of depth-based conversion methods are:
Stereo can be presented in any format for preview purposes, including anaglyph.
Time-consuming steps are image segmentation/rotoscoping, depth map creation and uncovered area filling. The latter is especially important for the highest quality conversion.
There are various automation techniques for depth map creation and background reconstruction. For example, automatic depth estimation can be used to generate initial depth maps for certain frames and shots.
People engaged in such work may be called depth artists.
A development on depth mapping, multi-layering works around the limitations of depth mapping by introducing several layers of grayscale depth masks to implement limited semi-transparency. Similar to a simple technique, multi-layering involves applying a depth map to more than one "slice" of the flat image, resulting in a much better approximation of depth and protrusion. The more layers are processed separately per frame, the higher the quality of 3D illusion tends to be.
3D reconstruction and re-projection may be used for stereo conversion. It involves scene 3D model creation, extraction of original image surfaces as textures for 3D objects and, finally, rendering the 3D scene from two virtual cameras to acquire stereo video. The approach works well enough in case of scenes with static rigid objects like urban shots with buildings, interior shots, but has problems with non-rigid bodies and soft fuzzy edges.
Another method is to set up both left and right virtual cameras, both offset from the original camera but splitting the offset difference, then painting out occlusion edges of isolated objects and characters. Essentially clean-plating several background, mid ground and foreground elements.
Binocular disparity can also be derived from simple geometry.
It is possible to automatically estimate depth using different types of motion. In case of camera motion, a depth map of the entire scene can be calculated. Also, object motion can be detected and moving areas can be assigned with smaller depth values than the background. Occlusions provide information on relative position of moving surfaces.
Approaches of this type are also called "depth from defocus" and "depth from blur". On "depth from defocus" (DFD) approaches, the depth information is estimated based on the amount of blur of the considered object, whereas "depth from focus" (DFF) approaches tend to compare the sharpness of an object over a range of images taken with different focus distances in order to find out its distance to the camera. DFD only needs two or three at different focus to properly work, whereas DFF needs 10 to 15 images at least but is more accurate than the previous method.
If the sky is detected in the processed image, it can also be taken into account that more distant objects, besides being hazy, should be more desaturated and more bluish because of a thick air layer.
The idea of the method is based on the fact that parallel lines, such as railroad tracks and roadsides, appear to converge with distance, eventually reaching a vanishing point at the horizon. Finding this vanishing point gives the farthest point of the whole image.
The more the lines converge, the farther away they appear to be. So, for depth map, the area between two neighboring vanishing lines can be approximated with a gradient plane.
PQM mimic the HVS as the results obtained aligns very closely to the Mean Opinion Score (MOS) obtained from subjective tests. The PQM quantifies the distortion in the luminance, and contrast distortion using an approximation (variances) weighted by the mean of each pixel block to obtain the distortion in an image. This distortion is subtracted from 1 to obtain the objective quality score.
HV3D quality metric has been designed having the human visual 3D perception in mind. It takes into account the quality of the individual right and left views, the quality of the cyclopean view (the fusion of the right and left view, what the viewer perceives), as well as the quality of the depth information.
The VQMT3D project  includes several developed metrics for evaluating the quality of 2D to 3D conversion
|Cardboard effect||Advanced||Qualitative||2D-to-3D conversion|
|Edge-sharpness mismatch||Unique||Qualitative||2D-to-3D conversion|
|Stuck-to-background objects||Unique||Qualitative||2D-to-3D conversion|
|Comparison with the 2D version||Unique||Qualitative||2D-to-3D conversion|