
2020, 2(1): 1-11. Published Date: 2020-2-20

DOI: 10.1016/j.vrih.2019.10.004

Abstract

Background
Depth sensors are an essential element of virtual and augmented reality devices, digitizing the user’s environment in real time. The currently popular technologies are stereo, structured light, and Time-of-Flight (ToF). The stereo and structured light methods require a baseline separation between multiple sensors for depth sensing, and both suffer from a limited measurement range. ToF depth sensors have the largest depth range but the lowest depth map resolution. To overcome these problems, we propose a co-axial depth map sensor that is potentially more compact and cost-effective than conventional structured light depth cameras. It extends the depth range while maintaining a high depth map resolution, and it provides a high-resolution 2D image along with the 3D depth map.
Methods
This depth sensor is constructed with a projection path and an imaging path. Those two paths are combined by a beamsplitter for a co-axial design. In the projection path, a cylindrical lens is inserted to add extra power in one direction which creates an astigmatic pattern. For depth measurement, the astigmatic pattern is projected onto the test scene, and then the depth information can be calculated from the contrast change of the reflected pattern image in two orthogonal directions. To extend the depth measurement range, we use an electronically focus tunable lens at the system stop and tune the power to implement an extended depth range without compromising depth resolution.
Results
In the depth measurement simulation, we project a resolution target onto a white screen which is moving along the optical axis and then tune the focus tunable lens power for three depth measurement subranges, namely, near, middle and far. In each sub-range, as the test screen moves away from the depth sensor, the horizontal contrast keeps increasing while the vertical contrast keeps decreasing in the reflected image. Therefore, the depth information can be obtained by computing the contrast ratio between features in orthogonal directions.
Conclusions
The proposed depth map sensor could implement depth measurement for an extended depth range with a co-axial design.

Content

1 Introduction
Depth sensors have become essential sub-systems in many modern virtual reality (VR) and augmented reality (AR) devices to scan, digitize, and understand the user’s environment in real time. The scanned depth map can be used for 3D scene reconstruction as input content for VR/AR systems, or for gesture recognition in human-computer interaction. These two prominent uses require different depth sensing ranges: environmental scanning requires a more extensive range, typically from 60cm to 5m, while gesture recognition needs a much closer range, from about 20cm to 100cm. These functionalities can potentially boost VR and AR applications in gaming, education, and training.
The most popular technologies currently available for depth sensing in VR and AR devices include stereo vision[1], structured light[2,3] and Time of Flight[4,5,6,7,8,9]. The stereo and structured light methods are based on triangulation, which requires a baseline separation between multiple views captured by multiple sensors or between the structured light and camera paths. As shown in Figure 1a, the passive stereo method implements the depth measurement with two separated cameras. The depth map is produced from the difference in pixel position between the two captured images. The structured light depth sensor (Figure 1b) is constructed with a projection path and an imaging path with baseline separation. In the depth measurement, the projector projects a designed pattern onto the target scene, and the camera captures the image of the deformed pattern for the depth information. The Time-of-Flight (ToF) depth sensor (Figure 1c) measures the depth information directly through the time offset or the phase shift between the emitted signal and the received signal. For the time offset method, the ToF system uses a laser source to send out a pulse and then detects the reflected pulse to measure the time of flight. For high accuracy, the laser pulse must be short, which leads to higher cost. Therefore, in AR/VR devices, the commonly used ToF sensors are based on a modulated signal and measure the phase shift for depth information at a lower cost.
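The two ToF read-out schemes described above reduce to two short formulas. The following is a minimal sketch for illustration only and is not taken from any particular sensor: the function names and the 20 MHz modulation frequency are assumptions chosen for the example.

```python
import math

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def depth_from_time_offset(delta_t_s: float) -> float:
    """Pulsed ToF: the pulse travels to the scene and back, so halve the path."""
    return C_M_PER_S * delta_t_s / 2.0


def depth_from_phase_shift(phase_rad: float, f_mod_hz: float) -> float:
    """Modulated ToF: depth from the phase shift of a signal modulated at f_mod_hz."""
    return C_M_PER_S * phase_rad / (4.0 * math.pi * f_mod_hz)


def unambiguous_range(f_mod_hz: float) -> float:
    """Depth at which the measured phase wraps around (phase = 2*pi)."""
    return C_M_PER_S / (2.0 * f_mod_hz)


if __name__ == "__main__":
    f_mod = 20e6  # hypothetical modulation frequency, not from the paper
    print(depth_from_time_offset(10e-9))            # 10 ns round trip
    print(depth_from_phase_shift(math.pi / 2, f_mod))
    print(unambiguous_range(f_mod))
```

A 10 ns round trip corresponds to roughly 1.5 m, so centimeter-level accuracy with the time offset method requires timing well below a nanosecond, which is consistent with the cost argument above.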
Table 1 compares the depth sensing technologies mentioned above. The stereo and structured light methods can potentially achieve a high depth map resolution of up to several million pixels, depending on the camera resolution and the projected pattern, respectively. However, both methods require a significant baseline separation for high-quality measurement. Furthermore, due to the nature of triangulation, both approaches suffer from occlusion shadowing effects. In addition, the passive stereo method requires massive computation power, performs poorly in low light, and has difficulty matching features in a uniform test scene. For the structured light method, the depth sensing range is restricted by the illumination power and the background brightness, in addition to the requirement for a baseline separation between camera and projector. As a result, the typical working distance is up to 4m to 5m. ToF devices can achieve a more extensive depth measurement range with low computation power, and the sensor can be small. However, their most significant limitation is the limited depth map resolution. Finally, the laser pulse might suffer from interference due to occlusion effects.
Table 1 Popular depth sensing technologies used in VR/AR devices

| | Stereo | Structured Light | Time of Flight |
|---|---|---|---|
| Depth map resolution | Several Mpix | Max. 1-3 Mpix | Max. VGA |
| Depth range | Short | Short | Long |
| Depth accuracy | mm to cm | mm to cm | mm to cm |
| Hardware | Simple cameras | Demanding illumination, complex system | Simple illumination, complex sensors |
| Size | Large | Large | Small |
| Computation power | High | Medium | Low |
| Low light performance | Weak | Good | Good |
| Bright light performance | Good | Weak | Medium |
| Speed | Medium | Medium | Fast |
To overcome the limitations of the existing methods summarized above, we propose a co-axial depth map sensor with an extended depth range for AR/VR devices. This work makes three significant contributions. Firstly, the proposed depth sensor has a co-axial design for the projection and imaging paths, which could potentially be more compact than conventional structured light 3D cameras. Secondly, this depth sensor can achieve a more extended depth range than a conventional structured light method while maintaining a high-resolution depth map. Finally, this design can capture a high-resolution 2D image along with a 3D depth map.
In Section 2, we introduce the method of the proposed depth sensor, including its depth measurement concept and the concept used to extend the depth measurement range. Section 3 focuses on the optical design of the proposed depth sensor and its performance. In Section 4, we demonstrate a depth measurement simulation based on the designed depth sensor.
2 Methods
Similar to a conventional structured-light depth sensor, the proposed depth sensor is constructed with a projection path and an imaging path. These two paths are combined by a beam splitter to implement a co-axial configuration for a compact size. It implements depth measurement based on the controlled aberration method[10], and we further extend this method by using an electronically focus-tunable lens in the system to extend the depth measurement range. In the following subsections, we will introduce the details of the depth measurement concept first and then illustrate the principle to extend the depth measurement range.
2.1 Depth measurement concept
The proposed depth sensor measures the depth of a scene from the reflected projection pattern based on the controlled aberration method[10]. Figure 2 shows a schematic layout of our proposed co-axial depth sensor, which mainly consists of a structured illumination unit, an aberration-encoding unit, and an imaging optics unit shared by the projection path and the imaging path. In the structured illumination unit, a collimated near-infrared (NIR) light source is encoded by a digital micromirror device (DMD) to create a desired light pattern. The pattern is then projected through an aberration-encoding unit to induce a controlled aberration. Here the aberration-encoding unit mainly consists of a cylindrical lens (labeled in yellow) inserted into the projection path. It adds extra power in one direction of the projection path, which we call the extraordinary projection path; the orthogonal direction is the ordinary projection path. The aberrated light pattern is then projected onto a test scene through the projection optics. Finally, the light reflected by the scene is captured by the shared imaging optics to form a reflected pattern on the image sensor. In this design, the projection path and imaging path are combined by a beam splitter to form a co-axial configuration.
As noted above, the aberration-encoding unit creates two projection paths: an extraordinary projection path and an ordinary projection path. The ordinary projection path focuses the horizontal direction of the light patterns at the ordinary focal plane. The extraordinary path focuses the vertical direction at the extraordinary focal plane which is shifted along the optical axis from the ordinary focal plane. In this sense, the depth map sensor creates two focal depths along the optical axis. The separation between the two focal depths is defined as the depth measurement range $\Delta z$. This depth range is determined by the cylindrical lens power and the projection optics power. For a test scene located between the two focal planes, the image contrast of the reflected pattern in the extraordinary projection direction keeps decreasing while the contrast in the ordinary projection direction keeps increasing as the target moves away from the depth sensor. Therefore, by measuring the reflected image contrast in those two orthogonal directions, the depth information of the scene can be extracted.
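The contrast measurement in two orthogonal directions can be sketched as follows. This is an illustrative simplification rather than the authors' processing pipeline: it assumes a small grayscale patch stored as a list of rows, uses the Michelson contrast of the averaged intensity profiles, and the function names are hypothetical. Which image axis corresponds to the ordinary or the extraordinary path depends on the orientation of the cylindrical lens.

```python
from typing import List, Sequence, Tuple


def michelson(profile: Sequence[float]) -> float:
    """Michelson contrast (max - min) / (max + min) of a 1D intensity profile."""
    hi, lo = max(profile), min(profile)
    return 0.0 if hi + lo == 0 else (hi - lo) / (hi + lo)


def directional_contrast(patch: List[List[float]]) -> Tuple[float, float]:
    """Return (contrast along x, contrast along y) for a 2D intensity patch.

    Averaging over rows keeps the variation along x (produced by vertical bars);
    averaging over columns keeps the variation along y (horizontal bars).
    """
    n_rows, n_cols = len(patch), len(patch[0])
    profile_x = [sum(patch[r][c] for r in range(n_rows)) / n_rows
                 for c in range(n_cols)]
    profile_y = [sum(patch[r]) / n_cols for r in range(n_rows)]
    return michelson(profile_x), michelson(profile_y)


def contrast_ratio(patch: List[List[float]]) -> float:
    """Ratio of the x contrast to the y contrast (infinite if y is uniform)."""
    cx, cy = directional_contrast(patch)
    return float("inf") if cy == 0 else cx / cy


if __name__ == "__main__":
    # Synthetic patch: strong modulation along x, weak modulation along y.
    patch = [[(1.0 if c % 2 else 0.2) * (1.0 if r % 2 else 0.8)
              for c in range(8)] for r in range(8)]
    print(directional_contrast(patch), contrast_ratio(patch))
```

Between the two focal planes one of the two contrasts rises while the other falls, so their ratio changes with depth more strongly than either contrast alone.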
Since the depth range is determined by the cylindrical lens and the projection optics, it is important to find the relationship between the cylindrical lens power and the depth-sensing range $\Delta z$, and then to choose a cylindrical lens power for a given projection optics. To further illustrate the depth map resolution, we calculate the image size of the designed pattern projected onto the test scene at different distances, taking into account the defocus of both the ordinary and extraordinary projection paths.
Figures 3a through 3c show the simplified first-order layouts of the ordinary projection path, the extraordinary projection path, and the imaging path, respectively, with the optics represented simply by their cardinal points and planes. Specifically, in Figure 3a, we use the principal planes $P_o$ and $P_o'$ to represent the ordinary projection optics. $z_o$ and $z_o'$ are the object and image distances, respectively, and $\Phi_o$ is the power of the ordinary projection lens. The stop of the system is placed at the rear focal plane for a telecentric design, and the stop diameter is defined as $D_{XP}$. As mentioned above, the distance between the extraordinary image plane and the ordinary image plane is defined as the depth measurement range $\Delta z$. It can be expressed as
$\Delta z = \dfrac{z_e}{1 + z_e \left( \Phi_o + \Phi_c - \Phi_o \Phi_c t \right)} - \dfrac{\Phi_c}{\Phi_o + \Phi_c - \Phi_o \Phi_c t}\, t - z_o'$
where $z_e$ is the object distance for the extraordinary projection path, $\Phi_c$ is the power of the cylindrical lens, and $t$ is the directional distance from the rear principal plane of the cylindrical lens to the front principal plane of the ordinary projection lens.
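The expression for the depth measurement range can be read as a two-element combination: the cylindrical lens and the projection lens form a system with combined power $\Phi_o + \Phi_c - \Phi_o \Phi_c t$, whose rear principal plane is shifted by $-(\Phi_c / \Phi) t$. The sketch below evaluates it under that reading. It assumes paraxial imaging in air with object distances negative to the left, lengths in mm and powers in 1/mm (diopters divided by 1000). The function names and the numbers in the demo are hypothetical and are not the values of Table 2.

```python
def combined_power(phi_o: float, phi_c: float, t: float) -> float:
    """Power of the cylindrical lens plus projection lens, separated by t."""
    return phi_o + phi_c - phi_o * phi_c * t


def image_distance(z: float, phi: float) -> float:
    """Gaussian imaging in air: 1/z' = 1/z + phi, so z' = z / (1 + z * phi)."""
    return z / (1.0 + z * phi)


def depth_range(z_e: float, z_o: float, phi_o: float, phi_c: float, t: float) -> float:
    """Separation between the extraordinary and ordinary image planes."""
    phi = combined_power(phi_o, phi_c, t)
    z_o_img = image_distance(z_o, phi_o)          # ordinary image distance z_o'
    principal_plane_shift = -(phi_c / phi) * t    # rear principal plane shift
    return image_distance(z_e, phi) + principal_plane_shift - z_o_img


if __name__ == "__main__":
    # Hypothetical example: 20 mm focal length lens, 5 diopter cylinder.
    print(depth_range(z_e=-20.2, z_o=-20.5, phi_o=0.05, phi_c=0.005, t=0.5))
```

Because the term $1 + z \Phi$ is close to zero when the object sits near the front focal plane, the result is very sensitive to small changes in the object distances and powers, so inputs should carry enough significant digits.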
The image size on the sensor for a designed projection pattern, which determines the depth map resolution, can then be obtained. Assume a target plane is located at a distance $z_M$ from the ordinary image plane. Then, for a pixel on the DMD with size $h_0$, the projected pattern size on the target plane is defined as $h_{M2}'$. Based on geometric optics, the pattern size $h_p$ on the test plane is composed of the chief ray height and the circle of confusion, and can be expressed as
$h_p = \dfrac{h_0}{1 + z_o \Phi_o} \cdot \left( 1 + \dfrac{\Phi_o \cdot z_M}{z_o' \Phi_o - 1} \right) + \dfrac{D_{XP}}{2} \cdot \dfrac{z_M}{z_o' - 1/\Phi_o}$
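As a hedged illustration of the pattern-size expression for the ordinary path, the sketch below evaluates its two contributions separately: the chief ray height, which pivots about the stop at the rear focal plane, and the blur from the converging cone of diameter $D_{XP}$. It assumes the same paraxial sign convention as the text (lengths in mm, powers in 1/mm, so heights are signed). The function name and the demo values are hypothetical.

```python
def projected_pattern_size(h0: float, z_o: float, phi_o: float,
                           z_m: float, d_xp: float) -> float:
    """Pattern size on a plane z_m beyond the ordinary image plane."""
    z_o_img = z_o / (1.0 + z_o * phi_o)           # ordinary image distance z_o'
    in_focus_height = h0 / (1.0 + z_o * phi_o)    # image height at z_o'
    # Chief ray passes through the stop center at the rear focal point.
    chief_ray = in_focus_height * (1.0 + phi_o * z_m / (z_o_img * phi_o - 1.0))
    # Half-width of the defocus blur from the cone that converges at z_o'.
    blur = d_xp / 2.0 * z_m / (z_o_img - 1.0 / phi_o)
    return chief_ray + blur


if __name__ == "__main__":
    # Hypothetical values: 20 mm focal length, image plane at 100 mm.
    for z_m in (0.0, 40.0, 80.0):
        print(z_m, projected_pattern_size(0.01, -25.0, 0.05, z_m, 5.0))
```

At $z_M = 0$ the blur term vanishes and the expression reduces to the in-focus image height $h_0 / (1 + z_o \Phi_o)$, which is a convenient sanity check.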
For the extraordinary path, repeat the same calculation for the projected pattern size on the test plane
Then the corresponding image size for the extraordinary path projected pattern is
To numerically illustrate the depth range and the image size, we apply the calculation to a specific application. The system parameters are given in Table 2. The projection lens power is 51.58 diopters, and the object size is the DMD pixel pitch, which is 7.68μm. The remaining parameters match the system parameters in the following section on the optical system design.
Table 2 System parameters

| $\Phi_o$ (diopters) | $h_0$ (μm) | $z_o$ (mm) | (mm) | (mm) | $t$ (mm) |
|---|---|---|---|---|---|
| 51.58 | 7.68 | -19.75 | 6.46 | -17.7 | 0.46 |
Figure 4a plots the depth measurement range $\Delta z$ as a function of the cylindrical lens power $\Phi_c$. According to the plot, for the given projection lens with a power of around 50 diopters, a depth measurement range of 600mm requires a cylindrical lens power of around 5.9 diopters. Figure 4b shows the image size of one DMD pixel for the ordinary and extraordinary projection paths, based on the system parameters in Table 2 and a cylindrical lens power of 5.9 diopters. When the test plane coincides with the ordinary projection image plane, the image size for the ordinary projection path is the same as the object size because the imaging path is the reverse of the ordinary projection path. As the test plane distance increases, the image sizes for both the extraordinary and ordinary paths increase due to defocus.
Additionally, we choose an image sensor that works in both the visible (VIS) and NIR ranges. The imaging path can therefore capture a 2D RGB background image in the VIS range while capturing the depth map from the NIR channel. The 2D image of the test scene can offer extra information for further processing. Also, in this configuration, we chose a DMD to generate the projected pattern for flexibility. We use a grid pattern with bars in orthogonal directions as an example to illustrate the depth measurement concept based on controlled astigmatism. The available patterns, however, are not limited to a grid: any other pattern with orthogonal features corresponding to the ordinary and extraordinary paths, such as a cross pattern or a checkerboard pattern, is applicable. Additionally, the features in the orthogonal directions should be comparable, in other words have similar spatial frequencies, because the pattern density determines the depth map resolution. By combining the depth map results from different projected patterns, the depth sensor could potentially improve the depth map resolution.
2.2 Method for extended range of depth measurement
As described in the previous section, the depth measurement range $\Delta z$ is determined by the cylindrical lens power and the projection optics power, and it is limited by the selected combination of the two components in order to achieve high-resolution depth mapping. To extend the depth measurement range without compromising depth resolution, an electronically focus-tunable lens is used in the projection optics. Because the same projection optics is shared by the projection path and the imaging path in our co-axial configuration, the focus-tunable lens shifts the focal depths of the projection and imaging paths simultaneously along the axial direction. To maintain the same image size on the image sensor while shifting the focal depth, the projection optics should be designed to be object-space telecentric, and the electronically focus-tunable lens is placed at the system stop. With this setup, the image size remains the same when the power of the focus-tunable lens is tuned. Changing the power of the focus-tunable lens shifts the focal depths of both the extraordinary and ordinary projection paths (Figure 5). Therefore, this depth sensor can extend the depth measurement range through focus scanning.
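One way to see why a tunable lens at the stop leaves the image size unchanged is a thin-lens idealization (an assumption made here for illustration, not the actual multi-element design): the stop sits at the focal plane of the projection lens, and a thin element placed at a focal plane does not change the combined power. The sketch below checks this with the combination formula for two thin lenses in air, using the projection lens power from Table 2 and the three tunable-lens powers from Table 3. The function names are hypothetical.

```python
def combined_power(phi_1: float, phi_2: float, d: float) -> float:
    """Power of two thin lenses in air separated by d (Gullstrand's equation)."""
    return phi_1 + phi_2 - phi_1 * phi_2 * d


PHI_PROJECTION = 51.58e-3               # projection lens power, in 1/mm
STOP_DISTANCE = 1.0 / PHI_PROJECTION    # stop at the focal plane, in mm


def power_with_tunable_lens(tl_diopters: float) -> float:
    """System power when a thin tunable lens sits at the focal plane."""
    return combined_power(PHI_PROJECTION, tl_diopters * 1e-3, STOP_DISTANCE)


if __name__ == "__main__":
    for tl in (0.5, 0.0, -0.35):
        print(tl, power_with_tunable_lens(tl) * 1e3, "diopters")
```

In this idealization the tunable lens moves the principal planes, and therefore the focus position, while the focal length stays fixed.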
Based on the system described in Table 2, the depth measurement range is around 600mm. Adding the electronically focus-tunable lens to the system and tuning its power creates multiple sub-ranges. All the sub-ranges are merged to implement an extended range of depth measurement for the proposed depth sensor. Although the depth range can be tuned continuously, Table 3 shows an example that uses the focus-tunable lens at three different powers, 0.5 diopters, 0 diopters, and -0.35 diopters, with the corresponding depth range tuned to the near range (340mm to 550mm), middle range (400mm to 1000mm), and far range (700mm to 2000mm), respectively.
Table 3 Parameter changes with the tunable lens

| Depth Range | Near | Middle | Far |
|---|---|---|---|
| TL CVR (1/mm) | 0.001709 | 0 | -0.0012 |
| TL power (diopter) | 0.5 | 0 | -0.35 |
| Ordinary focus distance (mm) | 550 | 1000 | 2000 |
| Extraordinary focus distance (mm) | 340 | 400 | 700 |
| XP position (mm) | 1.4e+5 | 1e+10 | -1.7e+5 |

Notes: TL: Tunable lens.
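The focus-scanning scheme can be sketched as a lookup over the three sub-ranges of Table 3. The data below are the tabulated focus distances and tunable-lens powers; the function names and the selection logic are hypothetical and only illustrate how overlapping sub-ranges merge into one extended range.

```python
from typing import Dict, List, Tuple

# name -> (tunable-lens power in diopters,
#          extraordinary focus distance in mm, ordinary focus distance in mm)
SUB_RANGES: Dict[str, Tuple[float, float, float]] = {
    "near": (0.5, 340.0, 550.0),
    "middle": (0.0, 400.0, 1000.0),
    "far": (-0.35, 700.0, 2000.0),
}


def settings_for_depth(depth_mm: float) -> List[str]:
    """Names of all sub-ranges whose two focal planes bracket the depth."""
    return [name for name, (_, lo, hi) in SUB_RANGES.items()
            if lo <= depth_mm <= hi]


def merged_coverage() -> Tuple[float, float]:
    """Overall depth interval, raising an error if the sub-ranges leave a gap."""
    spans = sorted((lo, hi) for _, lo, hi in SUB_RANGES.values())
    start, end = spans[0]
    for lo, hi in spans[1:]:
        if lo > end:
            raise ValueError(f"gap between {end} mm and {lo} mm")
        end = max(end, hi)
    return start, end


if __name__ == "__main__":
    print(merged_coverage())
    for depth in (450.0, 900.0, 1500.0):
        print(depth, settings_for_depth(depth))
```

Depths that fall into two overlapping sub-ranges can be measured with either setting, which leaves room to pick the one where the contrast ratio varies most steeply.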
Meanwhile, the wavefront quality of the focus-tunable lens affects the astigmatic pattern image. Therefore, to achieve better depth measurement accuracy, we need to consider the residual aberrations of the liquid focus-tunable lens, in which the spherical membrane surfaces change their shape with the applied electrical power. In the optical system design, optimizing the system together with the tunable lens compensates for most of the residual aberrations, such as spherical aberration. Some aberrations, however, such as coma, change not only with the applied voltage but also with gravity, that is, with the orientation of the lens. The gravity-induced coma may shorten the depth measurement range. One way to deal with the gravity effect is to mount the lens with its optical axis along the direction of gravity when possible. Otherwise, the coma induced by gravity acting on the liquid can be handled through careful calibration, since it is a systematic aberration. Take the Optotune electronically focus-tunable lens EL-16-40-TC-VIS-5D as an example. It has an RMS wavefront error of less than 0.25 waves over 80% of the 16mm clear aperture. Compared with the wavefront error from the added astigmatism, which can be more than 1.2 waves, the residual wavefront error without the gravity effect can be well compensated during optimization and can be neglected.
3 Optical system design
In this section, we illustrate the optical design of the proposed depth sensor. In the design example, the projection lens power $\Phi_o$ is 51.58 diopters (Table 2), the cylindrical lens power is set to 5.9 diopters, and the focus-tunable lens power used for optimization is set for three ranges: 0.5 diopters, 0 diopters, and -0.35 diopters (Table 3). For convenience of optimization, the projection lens was modeled in reverse order, from the scene of interest to the DMD. As listed in Table 4, the cylindrical lens power is 5.9 diopters, the FOV is around 30°, and the image space F-number is 3. The wavelengths used in the design are 0.486μm, 0.588μm, 0.656μm and 0.810μm.
Table 4 Specification of the designed optical system

| FOV | Image space F/# | Wavelengths | Cylindrical lens power $\Phi_c$ |
|---|---|---|---|
| $\pm 15.6°$ | 3 | 0.810μm (NIR), 0.486μm, 0.588μm, 0.656μm | 5.9 diopters |

| Pattern generator | Resolution | Pixel pitch | Working wavelength |
|---|---|---|---|
| DLP4500NIR | 912×1140 | 7.6μm | 700nm-2000nm |

| Image sensor | Spectral range | Spatial resolution | Pixel size |
|---|---|---|---|
| Imec RGB-NIR | 400nm-700nm, 750nm-850nm | 1024×544 | 5.5×5.5μm |

| Focus tunable lens | Optical power | Cover glass coating |
|---|---|---|
| Optotune EL-16-40-TC-VIS-5D | -2 to +3 diopters | 420nm-1500nm |
In this design, we use an NIR DMD chip[11] (DLP4500NIR) to generate the designed projection pattern. It is a 0.45 WXGA near-infrared DMD with a 912×1140 array and a pixel pitch of 7.6 microns, and its window transmission efficiency is around 96% for the wavelength range from 700nm to 2000nm. For the image sensor, to capture both the projected pattern in the NIR for the depth map and a 2D RGB image of the background at the same time, we choose the Imec RGB-NIR multispectral image sensor[12], which covers the RGB color channels and a narrow NIR channel around 810nm. To extend the depth measurement range, an Optotune electronically focus-tunable lens[13] is placed at the system stop. This focus-tunable lens can change its shape from a flat zero state into a plano-concave or plano-convex lens, resulting in a focal tuning range from -2 diopters to 3 diopters.
Based on the system specifications summarized in Table 4, we designed and optimized a system consisting of 11 stock lenses, 1 commercially available focus-tunable lens, and a cylindrical lens. Figure 6 shows the optical layout for the extraordinary projection path. The cylindrical lens is cemented onto the beamsplitter to add extra power onto the extraordinary projection path. Figure 7 is the optical layout of the imaging path. The optical layout for the ordinary projection path is the same as the reverse of the imaging path and thus is omitted.
For the three settings of the focus-tunable lens power in Table 3, Figure 8 plots the polychromatic modulation transfer function (MTF) of the imaging path. The results clearly show that the design satisfies the resolution requirements of the selected sensor.
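The source does not state which spatial frequency defines the resolution requirement. A common choice, assumed here purely for illustration, is the sampling (Nyquist) limit set by the pixel pitch. The sketch below computes it for the sensor pixel size and the DMD pixel pitch listed in Table 4; the function name is hypothetical.

```python
def nyquist_cycles_per_mm(pixel_pitch_um: float) -> float:
    """Highest spatial frequency sampled without aliasing: 1 / (2 * pitch)."""
    return 1000.0 / (2.0 * pixel_pitch_um)


if __name__ == "__main__":
    print("sensor (5.5 um):", nyquist_cycles_per_mm(5.5), "cycles/mm")
    print("DMD (7.6 um):", nyquist_cycles_per_mm(7.6), "cycles/mm")
```

Under this assumption, the MTF curves would be judged at about 91 cycles/mm for the 5.5μm sensor pixels.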
4 Depth measurement simulation
4.1 Depth measurement simulation system setup
To demonstrate the depth measurement capability of the proposed depth sensor design, a depth measurement simulation was implemented. The simulation setup is illustrated in Figure 9. In this simulation, a resolution target was used as the projection pattern to clearly show the astigmatism effects of the structured light projection, and a white screen was used as the test scene for depth sensing, with the screen position moving along the optical axis. With the three power settings of the focus-tunable lens listed in Table 3, the depth sensor creates three sub-ranges of depth measurement, namely near, middle, and far, which are illustrated as the red, blue, and green ranges in Figure 9. Within each sub-range, five depths were sampled by placing the test scene at the corresponding depths, and the resulting image patterns captured by the NIR-VIS sensor were simulated for each sampled depth using the image simulation function available in the Code V® optical design software. The overall sampled depth range extends from 340mm to 2000mm.
Figure 10 shows an array of simulated image patterns obtained by the image sensor. The patterns are grouped into three rows according to the corresponding depth ranges, near, middle, and far, and the five simulated patterns in each row correspond to the five sampled depths within each range. Each resolution target used for the simulation consists of four groups of tri-bars, corresponding to angular frequencies of 10, 15, 20, and 30 cycles/degree, or equivalently angular resolutions of 3, 2, 1.5, and 1 arcmin, respectively, for the bar widths and gaps. Each group of tri-bars consists of a vertical and a horizontal element to demonstrate their different contrast changes due to the induced astigmatism. Overall, when comparing the contrast change of the simulated patterns reflected by the test scene in each row, the vertical bars gradually gain sharpness and image contrast while the horizontal bars become blurrier and lose contrast as the test depth is shifted axially from near to far within each depth range determined by the setting of the focus-tunable lens power. The same observation can be made for the simulated patterns in all three rows, corresponding to the three depth ranges configured by the tunable lens. The simulated results show an adequate image contrast difference between the vertical and horizontal features at different spatial frequencies: even the smallest tri-bar groups show a high contrast difference between the vertical and horizontal directions at each of the 15 sampled depths, which is essential for the proposed depth mapping method. The simulated patterns also clearly demonstrate the high optical resolution of the system, as predicted by the MTF assessment in Figure 8 for the three depth ranges.
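The correspondence between the angular frequencies and the bar widths quoted above follows from one cycle containing one bar and one gap of equal width. The short sketch below reproduces the conversion and, as an additional illustration, converts an angular bar width into a physical width on a screen at a given distance. The function names and the 1000 mm example distance are assumptions made for the example.

```python
import math


def bar_width_arcmin(cycles_per_degree: float) -> float:
    """Width of one bar (half a cycle) in arcminutes."""
    return 60.0 / (2.0 * cycles_per_degree)


def bar_width_mm(cycles_per_degree: float, distance_mm: float) -> float:
    """Physical width of one bar on a screen at the given distance."""
    angle_rad = math.radians(bar_width_arcmin(cycles_per_degree) / 60.0)
    return distance_mm * math.tan(angle_rad)


if __name__ == "__main__":
    for freq in (10, 15, 20, 30):
        print(freq, bar_width_arcmin(freq), bar_width_mm(freq, 1000.0))
```

For example, the finest group at 30 cycles/degree has 1 arcmin bars, which is about 0.29 mm wide on a screen 1000 mm away.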
To quantify the image contrast difference between vertical and horizontal features, we selected the tri-bar elements in group 20 as examples, marked with red dashed boxes in Figure 10, and calculated the ratio between the tangential contrast and the sagittal contrast (i.e., of the vertical and horizontal bars). Figure 11 plots the contrast ratio as a function of scene depth for each of the three sampled depth ranges. A large variation in contrast ratio, from 0.5 to about 1.8, is observed over the sampled depths within each depth range, and such a large variation between horizontal and vertical features ensures reliable extraction of depth information with the proposed method. Furthermore, the relationship in Figure 11 can be used as a depth calibration reference: the depth of an object may be inferred by measuring the contrast ratio of orthogonal features and comparing it with a reference table extracted from reference targets.
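The table-lookup step can be sketched as linear interpolation over a calibration table. This is a minimal illustration only: the calibration pairs below are HYPOTHETICAL placeholders, not values read from Figure 11, and the sketch assumes the contrast ratio increases monotonically with depth within one sub-range. The function name is also hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def depth_from_ratio(ratio: float, calib: List[Tuple[float, float]]) -> float:
    """Interpolate depth (mm) from a contrast ratio.

    calib holds at least two (contrast_ratio, depth_mm) pairs sorted by
    strictly increasing contrast ratio.
    """
    ratios = [r for r, _ in calib]
    if not ratios[0] <= ratio <= ratios[-1]:
        raise ValueError("contrast ratio outside the calibrated range")
    # Index of the upper neighbor, clamped so the last point is included.
    i = min(bisect_right(ratios, ratio), len(ratios) - 1)
    (r0, d0), (r1, d1) = calib[i - 1], calib[i]
    return d0 + (ratio - r0) * (d1 - d0) / (r1 - r0)


if __name__ == "__main__":
    # HYPOTHETICAL calibration table for one sub-range (not measured data).
    calibration = [(0.5, 400.0), (0.8, 550.0), (1.2, 750.0), (1.8, 1000.0)]
    print(depth_from_ratio(1.0, calibration))
```

In practice one such table would be needed per tunable-lens setting, and possibly per field position, since the contrast ratio at a given depth can vary across the field.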
In future experimental work, the major challenge affecting the depth measurement accuracy is the calibration of contrast ratio to depth. For a given depth, the corresponding contrast ratio varies with field position. Therefore, to cover the whole field in small increments, the calibration process might need to be repeated multiple times with shifted projection patterns. In addition, because the measurement relies on the calibration result, the repeatability of the focus-tunable lens might affect the measurement accuracy.
6 Conclusion
In this paper, we presented the design of a co-axial depth map sensor adapted from a structured-light method in which a controlled astigmatic aberration is induced in the projected pattern. This depth sensor has a co-axial configuration, which can be more compact than conventional stereo and structured light depth sensors while achieving higher-resolution depth mapping than a ToF method. Finally, the proposed depth sensor design can achieve a significantly extended range of depth measurement by introducing an electronically focus-tunable lens into the shared projection/imaging path. These functionalities and features have been demonstrated through image simulation and analysis.

References

1. Davis J, Ramamoorthi R, Rusinkiewicz S. Spacetime stereo: a unifying framework for depth from triangulation. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Madison, WI, USA, IEEE, 2003 DOI:10.1109/cvpr.2003.1211491
2. Valkenburg R J, McIvor A M. Accurate 3D measurement using a structured light system. Image and Vision Computing, 1998, 16(2): 99–110 DOI:10.1016/S0262-8856(97)00053-X
3. Geng J. Structured-light 3D surface imaging: a tutorial. Advances in Optics and Photonics, 2011, 3(2): 128–160 DOI:10.1364/aop.3.000128
4. Kollorz E, Penne J, Hornegger J, Barke A. Gesture recognition with a Time-Of-Flight camera. International Journal of Intelligent Systems Technologies and Applications, 2008, 5(3/4): 334 DOI:10.1504/ijista.2008.021296
5. Yahav G, Iddan G J, Mandelboum D. 3D imaging camera for gaming application. In: 2007 Digest of Technical Papers International Conference on Consumer Electronics. Las Vegas, NV, USA, IEEE, 2007, 1–2 DOI:10.1109/ICCE.2007.341537
6. Lange R, Seitz P. Solid-state time-of-flight range camera. IEEE Journal of Quantum Electronics, 2001, 37(3): 390–397 DOI:10.1109/3.910448
7. Kawahito S, Halin I A, Ushinaga T, Sawada T, Homma M, Maeda Y. A CMOS time-of-flight range image sensor with Gates-on-field-oxide structure. IEEE Sensors Journal, 2007, 7(12): 1578–1586 DOI:10.1109/jsen.2007.907561
8. Ganapathi V, Plagemann C, Koller D, Thrun S. Real time motion capture using a single time-of-flight camera. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. San Francisco, CA, USA, IEEE, 2010, 755–762 DOI:10.1109/CVPR.2010.5540141
9. Gokturk S B, Yalcin H, Bamji C. A time-of-flight depth sensor-system description, issues and solutions. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop. Washington, DC, USA, IEEE, 2004, 35 DOI:10.1109/CVPR.2004.291
10. Birch G C, Tyo J S, Schwiegerling J. Depth measurements through controlled aberrations of projected patterns. Optics Express, 2012, 20(6): 6561 DOI:10.1364/oe.20.006561
11. DLP4500NIR. http://www.ti.com/product/DLP4500NIR
12. Imec RGB-NIR image sensor. https://www.imec-int.com/en/hyperspectral-imaging
13. EL-16-40-TC. https://www.optotune.com/products/focus-tunable-lenses/electrical-lens-el-16-40-tc