The Cardboard Effect - A real person or a paper figure?

A mini lecture I gave for the class "Seeing in Time, Space and Color".

Cecilia Zhang ☕

Most contemporary 3D image systems, such as 3D movie theaters, rely on binocular disparity (here I use the word 'image' to refer to overall visible appearance, including video). I won't go into detail about disparity, but long story short, it's the difference between the left and right retinal images of the same real-world point that provides the sense of relative depth. Most 3D content (e.g. 3D movies) is shot with a pair of cameras that function like the left and right human eyes. For these recording and projection systems to really work, meaning that the projected 3D content has no visible distortion and makes viewers feel as if they are standing exactly where the cameras were placed, the cameras need precisely the right separation, focal length, and other imaging parameters. Apparently these requirements are so demanding that current technology cannot achieve such perfection. That is quite unfortunate, as a psychovisual (psychology of vision) study found that "only when no artifacts were visible, was stereoscopic imagery preferred".

The distortion discussed in this blog is called the "cardboard cutout effect". As its name suggests, it's a phenomenon in which 3D objects set at different depths each appear flat to viewers, as if they were made of cardboard or paper, although their relative depth distances from one another are somewhat maintained. It's interesting to think that this distortion is actually "man-made". It's not present in the real, natural world, nor is it caused by our visual system interacting with photons. The cardboard effect only appears when we try to re-create the 3D world and put it onto a screen to view (it sounds like we are making trouble for ourselves ☺). Therefore most discussions around it are related to 3D film-making, and mostly to how to avoid such distortion.

Fig.1 - 3D photos that exhibit the cardboard effect. Use a pair of 3D green-red glasses to view these two examples. You may need to tilt your head a bit, as the disparity is not purely horizontal in the left example. Do the figures seem to have so little thickness that they look like 2D paper figures cut out of cardboard? (Relative depth is preserved, e.g. you can tell that the subjects are located at different depths.)

It's reported that this "flat" feeling of 3D content is present in almost all feature movies, including Avatar, Pirates of the Caribbean (On Stranger Tides), and Titanic in 3D. These three movies are representative of the wide range of sources that cause the cardboard effect. Titanic was shot in 2D and converted to 3D. This indicates that the cardboard effect can originate from the post-processing stage, where the technology fails to create the correct disparity at each pixel. This makes sense, as the 3D is simulated. By the way, it's a great pleasure to read this article about how James Cameron and his team of 450 artists converted his masterpiece from 2D to 3D. Such dedication and enthusiasm! Pirates of the Caribbean was shot with stereo cameras, implying that the image recording phase plays a role in this cardboard artifact as well. Avatar is a largely computer-generated film in which scenes are rendered with full modeling of the 3D space, which seems to rule out the recording and post-processing stages. The remaining "culprit" is the stage of display and viewing, which also affects the presence of the cardboard effect. In summary, the cardboard effect may come from all three stages of the pipeline: recording, post-processing and display/viewing, as illustrated below.

Fig.2 - An illustration of the "culprits" of the cardboard effect.

I will focus on the recording stage, first outlining imaging principles that can theoretically alleviate the cardboard effect, and then showing how researchers ran a controlled experiment to validate these hypotheses.


    ▶ Stereo Basis / Camera Baseline:
    Camera baseline denotes the distance between the pair of cameras used to shoot the scene, corresponding to the interocular distance between the centers of the pupils of our eyes (63-66 mm for a typical adult). This distance determines the visual angle at which the camera pair views an object. The figure below draws two pairs of cameras with different stereo bases viewing a near object and a far object. The visual angles subtended by a pair of cameras differ in 1) their size when viewing the same object, and 2) the rate at which they decrease as objects get farther away.

    Fig.3 - Two pairs of cameras with different stereo bases (S1 and S2) and their corresponding visual angles when viewing a near and a far object. The visual angle decreases with the distance of the object. For the cameras with the larger stereo basis (S2), it tends to decrease at a slower rate.

    It seems that the bigger the stereo basis, the more three-dimensional an object appears (we are able to see more of the back of the object). However, it's of course not a good idea to make the baseline too big. First, a large camera baseline corresponds to a large retinal divergence, which causes viewing discomfort. Second, a large camera baseline makes farther objects appear three-dimensional. That is not bad in itself, but it contradicts our common sense. We are used to farther things looking flatter, and thus our brain interprets the three-dimensional-looking objects as being rather close, and therefore small. This illusion of unnaturally small objects is named the "puppet-theater effect" (different from the miniature effect that is used artistically in photography with tilt-shift lenses).
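To make the relation between stereo basis and visual angle concrete, here is a minimal sketch. It assumes the object sits on the perpendicular bisector of the baseline, so the angle subtended by the two camera centers is 2·atan(S / 2d). The function name and the two baselines are hypothetical choices for illustration only: S1 is roughly the interocular distance, and S2 is an arbitrary wider rig.

```python
import math


def visual_angle_deg(baseline_m: float, distance_m: float) -> float:
    """Angle (degrees) subtended at an object by the two camera centers.

    Assumes the object lies on the perpendicular bisector of the baseline.
    """
    return math.degrees(2.0 * math.atan(baseline_m / (2.0 * distance_m)))


# Hypothetical values: S1 is roughly the human interocular distance,
# S2 is an arbitrary wider rig chosen only for comparison.
S1_M = 0.065
S2_M = 0.30
DISTANCES_M = [0.5, 1.0, 2.0, 5.0, 10.0]

print(f"{'distance (m)':>12} {'S1 (deg)':>10} {'S2 (deg)':>10}")
for d in DISTANCES_M:
    a1 = visual_angle_deg(S1_M, d)
    a2 = visual_angle_deg(S2_M, d)
    print(f"{d:>12.1f} {a1:>10.2f} {a2:>10.2f}")
```

Printing the two columns side by side lets you compare how each angle shrinks with distance for the two baselines.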

    ▶ Focal Length / Perspective:
    The larger the focal length, the flatter the scene appears. A small focal length corresponds to a large field of view, with fish-eye lenses at the extreme. A large focal length gives a small field of view, with telephoto lenses at the other extreme. From basic geometry, the perspective from the 3D world to the 2D image plane changes when we use different focal lengths to capture an object (keeping it the same size in the frame). Think of how a camera can make a face look so different (from very thin to flat) when varying the focal length. The same rule applies here: a shorter focal length conveys more depth perception than a longer focal length, which makes objects at different depths look flat, as if they sit on the same plane (see figure below). However, a short focal length is not always the right choice either, as a wide field of view often makes objects near the image boundary look visually distorted, and sometimes the depth perception can be so prominent that regions are undesirably occluded by front objects (notice how the man at the back becomes more occluded under shorter focal lengths).

    Fig.4 - The depth perception between the two subjects changes due to different focal lengths.
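The flattening can be sketched numerically with a pinhole camera model. The code below assumes a 36 mm wide sensor, a 1.7 m tall near subject that is always framed to the same image height (so the camera moves back as the focal length grows), and a second subject standing a fixed gap behind the first. All of these numbers and function names are illustrative assumptions, not values from the figure.

```python
import math

SENSOR_WIDTH_MM = 36.0  # assumption: full-frame sensor width


def horizontal_fov_deg(focal_mm: float, sensor_width_mm: float = SENSOR_WIDTH_MM) -> float:
    """Horizontal field of view of a pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(sensor_width_mm / (2.0 * focal_mm)))


def far_to_near_size_ratio(focal_mm: float, gap_m: float,
                           subject_height_m: float = 1.7,
                           image_height_mm: float = 12.0) -> float:
    """Image size of the far subject relative to the near one.

    The near subject is kept at a constant image height, so its distance
    is focal * height / image_height. Both subjects are assumed equally tall.
    """
    near_distance_m = focal_mm * subject_height_m / image_height_mm
    return near_distance_m / (near_distance_m + gap_m)


GAP_M = 2.0  # hypothetical distance between the two subjects
for focal in (24.0, 50.0, 100.0, 200.0):
    fov = horizontal_fov_deg(focal)
    ratio = far_to_near_size_ratio(focal, GAP_M)
    print(f"f = {focal:5.0f} mm  FOV = {fov:5.1f} deg  far/near size = {ratio:.2f}")
```

A ratio close to 1 means the two subjects are imaged at nearly the same size despite the gap between them, which is the flat, "same plane" look of long focal lengths.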

    ▶ Aperture / Depth of Field:
    The larger the aperture, the more a subject appears to pop out. A larger aperture produces a very shallow depth of field (DOF), blurring everything in the scene except the in-focus plane. This is undesirable in most movie-making, as backgrounds are often important to the story as well. Large apertures are mostly used for close-up shots and for artistic bokeh (the aesthetic quality of the blur).

    Fig.5 - Aperture and Depth of Field. Large aperture makes objects appear more "popped out".
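As a rough illustration of how aperture controls depth of field, the sketch below uses the standard thin-lens approximation with a hyperfocal distance H = f²/(N·c) + f. The 0.03 mm circle of confusion, the 50 mm lens, and the 3 m focus distance are assumptions picked for the example, and the function name is hypothetical.

```python
import math

COC_MM = 0.03  # assumption: circle of confusion often quoted for full-frame sensors


def depth_of_field_mm(focal_mm: float, f_number: float, focus_mm: float,
                      coc_mm: float = COC_MM) -> tuple[float, float]:
    """Return (near_limit, far_limit) of acceptable sharpness, in mm."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = focus_mm * (hyperfocal - focal_mm) / (hyperfocal + focus_mm - 2.0 * focal_mm)
    if focus_mm >= hyperfocal:
        far = math.inf  # everything beyond the near limit is acceptably sharp
    else:
        far = focus_mm * (hyperfocal - focal_mm) / (hyperfocal - focus_mm)
    return near, far


FOCAL_MM = 50.0     # hypothetical lens
FOCUS_MM = 3000.0   # hypothetical subject distance (3 m)
for n in (1.8, 4.0, 8.0, 16.0):
    near, far = depth_of_field_mm(FOCAL_MM, n, FOCUS_MM)
    print(f"f/{n:<4} sharp from {near / 1000:.2f} m to {far / 1000:.2f} m")
```

A small f-number (large aperture) gives a thin sharp zone around the subject, while stopping down widens it so that more of the background stays in focus.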

    ▶ Lighting / Shadow:
    Shadows convey the 3D shape of a subject or a scene. Area or planar lighting makes objects look uniformly lit and thus flatter. This means that if we cannot change our camera or lens choices, we may adjust the lighting of the environment to compensate for potential distortion of 3D perception.


To bridge the gap between theoretical hypotheses and actual human perception, a group of researchers from Japan conducted a controlled experiment on how image recording affects stereoscopic perception, especially the cardboard effect. 27 subjects participated in the experiment and gave feedback on their 3D perception of a controlled scene. A quantitative metric named "spatial thickness" is used to evaluate how much depth is perceived by the subject. It's measured by the ratio between the widthwise factor and the depthwise factor in the reproduced 3-D space (please refer to Fig.2 in the paper, reference included at the bottom). This number would be one only when the inter-camera distance equals the interocular distance of a human (63-66 mm). I think of this measurement as a relative difference from the actual depth perceived by our eyes.

In this experiment, they presented images captured under shooting conditions that varied spatial thickness, focal length and lighting in a controlled manner. The participants then rated how much 3D they perceived in the scene on a scale from 1 to 5, ranging from "no 3D at all" to "thickness is strongly perceived". They also considered adding 2D and 3D backgrounds to the scene and, on top of that, rotational motion to introduce motion parallax (the effect of seeing farther objects move slower than closer objects). You may refer to the paper for details of the experiments. In summary, the results are the following:

    ▹ Spatial thickness is the most important factor in how humans perceive 3D content. This finding appears quite obvious, in that the subjects respond in line with how the scene is presented. When the scene is presented with small spatial thickness, people find it less 3D, and vice versa.
    ▹ Other factors such as focal length and lighting have little effect on 3D perception. There is also little correlation between the variables.
    ▹ Adding backgrounds, either 2D or 3D, makes the scene appear flatter. This indicates that it's easier for the viewer to notice depth discontinuity when there are varying depths in the scene.
    ▹ Adding rotational motion to the object mitigates the cardboard effect, which makes sense, as motion parallax can be an additional depth cue that helps the viewer with 3D perception.

Overall, this experiment helps to justify some hypotheses about the 3D cardboard effect, but further research is still required to understand how the human visual system interprets such distortions and what would be effective in alleviating them. I'm also unsure whether this experiment tells the whole story. The scene presented in the experiment is a tree, a very familiar object about which people have a lot of prior knowledge and assumptions that might affect how they respond. Again, the cardboard effect is caused by imperfect technology, and it would take more study to reach firm conclusions. I'm positive that as our 3D projection techniques advance (maybe light field?), this 3D distortion will have little impact on how we view 3D content.

One more interesting thing I've learned through reading about 3D content creation is that there is no single optimal spot in a theater to view 3D movies. Theoretically, the optimal spot is the location where the cameras were placed when shooting the film. However, since different scenes are shot with different camera settings (e.g. different focal lengths), one would need to move around the theater to find the optimal spot for each scene. So, don't bother, just enjoy the show. There is generally little noticeable difference as long as you're not sitting in the front two rows.


    ▹ Stereoscopic Perceptions of Size, Shape, Distance and Direction, by D. L. MacAdam, 1954. The interview between MacAdam and a filmmaker in the attachment is a particular pleasure to read.
    ▹ Perceptual Issues in Stereoscopic Signal Processing, by Scott J. Daly et al. I found Fig.3 to be a good illustration of the different causes of mis-perception.
    ▹ A study on the relationship between shooting conditions and cardboard effect of stereoscopic images, by H. Yamanoue et al. This paper describes the aforementioned controlled experiment.
    ▹ I used XstereO Player to display the 3D images shown in this blog.
    ▹ A Bachelor thesis on the cardboard effect.
    ▹ Steven McQuinn's Quora answer on "Where is the best place to sit in a movie theater for a 3D movie?"