## Abstract

This paper presents a time-of-flight (ToF) measurement method for use in foggy weather. The depth measured by a ToF camera is greatly distorted in fog because the light scattered in the fog reaches the camera much faster than the target reflection. We reveal that multi-frequency measurements contain a cue as to whether two arbitrary pixels have the same depth. After clustering same-depth pixels using this cue, the original depth can be recovered for each cluster by line fitting in the Cartesian coordinate frame. The effectiveness of this method is evaluated numerically via real-world and road-scale experiments.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Depth measurement plays an important role in many applications such as autonomous driving, manufacturing, and scientific research. Time-of-flight (ToF) is a common and widely used depth sensing method. It is, however, still difficult to use in complicated scenes, such as through smoke or fog, because the light is scattered and the signal is contaminated.

Figure 1 shows the actual measurement of a ToF camera with and without fog. The ToF camera can capture the amplitude (which corresponds to the reflected intensity) and the phase (which corresponds to the depth) with $2\pi $ ambiguity, as shown in Fig. 1(a) and (c). In each region of the signboards, the phase is uniform because the depth is the same. In contrast, both the amplitude and phase are greatly distorted through the fog as shown in Fig. 1(e) and (f). Interestingly, the phase reflects the reflectance of the object despite the depth being the same, and the order of the amplitude is not preserved.

In this paper, the aim is to correctly measure the depth through fog using a ToF camera. A key idea is that if two points at the same depth can be measured, the effect of the fog is easily canceled because the fog effect can be interpreted as just a translation of phasors in the Cartesian coordinate frame. The major challenge is how to extract those pixels from the distorted observation. Pixels that are at almost the same depth can be extracted using multiple modulation frequencies. We reveal that the difference of the phasor is constant with respect to the modulation frequency if the points are at the same depth. Otherwise, it varies along with the modulation frequency regardless of the density of the fog. Using this difference, pixels at a similar depth can be clustered. Once it is determined that the pixels are at the same depth, the correct depth of the cluster can be calculated by line fitting.

The contributions of this paper are twofold. First, we reveal that multiple frequency observations indicate whether the depths of two arbitrary pixels are equal. This requires no hardware modification; hence, off-the-shelf ToF cameras such as the Kinect can be used to implement this method. Second, it is shown that once the pixels are known to be at the same depth, the scene depth can be recovered using a pipeline similar to that of ordinary ToF unwrapping. The method is evaluated with real-world and road-scale experiments using different ToF cameras, and its effectiveness is confirmed.

## 2. Related work

In the computer vision field, solutions for recovering a clear view through bad weather such as haze and fog have been actively developed and yet, it remains an open problem. A typical approach is to use a dark channel prior [1, 2], which assumes one of the RGB channels should be dark in natural objects. Berman *et al*. [3] remove haze effects by clustering similar colors throughout the image and fitting haze-lines in the RGB space. While color-based methods sometimes struggle with artificial objects that are white or gray, our method does not depend on color information but instead uses temporal information. Another approach for seeing through fog is to use a geometric constraint. O’Toole *et al*. [4] optically mask the light rays using a synchronized projector-camera system so that only the reflection from a designated depth is captured. A similar idea is implemented on a galvo and linear sensor system [5]. While these methods capture an image slice at designated depth, our method recovers the depths of a foggy scene.

Light-in-flight imaging, also known as transient imaging, conveys rich information of the imaged scene. The reader is referred to the recent survey by Jarabo *et al*. [6] for a comprehensive overview. Light-in-flight imagery can be obtained by an interferometer [7], holography [8, 9], femtosecond-pulsed laser [10–12], and single photon avalanche diode sensor (SPAD) [13]. Light-in-flight can be also recovered using the ToF camera, with which the cost can be significantly reduced while the temporal resolution becomes lower. Light-in-flight can be obtained by frequency and phase sweep [14–17] and optical coding [18, 19]. When the ultra-high speed response is obtained, removing the fog effect is much easier to achieve. Indeed, SPAD observations are applied for seeing through fog [20] by analyzing the distribution of photon arrivals. Our method bypasses the exact recovery of the time domain responses and simply uses a ToF camera without any hardware modifications.

Multi-path interference is a problem of ToF measurement. Waves travelling along multiple paths caused by scattering and inter-reflection interfere and superpose into a single wave; hence the measured depth becomes unreliable. Recovering the correct depth of a multi-path scene is of broad interest and has been studied using the 2-bounce model [21–23], parametric models [24, 25], *K*-sparsity [26, 27], a simplified indirect reflection model [28], frequency analysis [29], consistency between ToF and stereo [30], and a material database [31]. Our method falls into this class but is unique in using both temporal and spatial information.

## 3. Time-of-Flight observations in the fog

An amplitude-modulated continuous-wave (AMCW) ToF camera illuminates the scene using light whose amplitude is modulated at the frequency *f*. It measures the amplitude *a* and phase delay *ϕ* of the returned wave for each pixel *x* as

$$a\left(f,x\right)=r\left(x\right),\qquad \phi \left(f,x\right)=\frac{4\pi f d\left(x\right)}{c} \bmod 2\pi,$$

where *r* is the intensity of the reflected light, *d* is the depth of the scene, and *c* is the speed of light. Ideally, the amplitude is the same regardless of the frequency, and the phase is proportional to the depth and the frequency.

Following the phasor representation [32], these measurements can be combined into a single phasor $p\left(f,x\right)\in \mathbb{C}$, as shown in Fig. 2:

$$p\left(f,x\right)=a\left(f,x\right)\,{e}^{i\phi \left(f,x\right)}.$$
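As a minimal numerical sketch of this representation (the function name and values are illustrative assumptions, not from the paper):

```python
import numpy as np

C = 3e8  # speed of light [m/s]

def ideal_phasor(reflectance, depth, freq):
    """Fog-free phasor: amplitude r(x), phase 4*pi*f*d(x)/c (mod 2*pi)."""
    phase = 4 * np.pi * freq * depth / C
    return reflectance * np.exp(1j * phase)

# a target at 5 m with unit reflectance, modulated at 10 MHz
p = ideal_phasor(1.0, 5.0, 10e6)
amplitude, phi = np.abs(p), np.angle(p) % (2 * np.pi)
```

The amplitude is frequency-independent while the phase scales linearly with the modulation frequency, which is the property the unwrapping and clustering steps below rely on.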

When the scene has volume scattering caused by fog or smoke, the observed phasor is the sum of the contributions of all light paths. In this paper, we assume that the fog is spatially uniform. In this case, the observed phasor *p* is distorted as

$$p\left(f,x\right)={p}_{s}\left(f,x\right)+{p}_{t}\left(f,x\right),$$

$${p}_{s}\left(f,x\right)=\sum _{l\in \mathcal{L}\left(d\left(x\right)\right)}s\left(l\right)\,{e}^{i\frac{2\pi f d\left(l\right)}{c}},\qquad {p}_{t}\left(f,x\right)=\tilde{a}\left(x\right)\,{e}^{i\frac{4\pi f d\left(x\right)}{c}},$$

where ${p}_{s}$ and ${p}_{t}$ are the phasors of the volume scattering and target object components, respectively; $\mathcal{L}\left(d\left(x\right)\right)$ is the set of light paths travelling in the fog in front of the object; $s\left(l\right)$ and $d\left(l\right)$ are respectively the intensity and the travel distance of the path *l*; and $\tilde{a}\left(x\right)$ is the decayed amplitude of the object reflection, which is not a function of the frequency. From a geometric perspective, a phasor corresponds to a vector in the Cartesian coordinate frame, and the observation corresponds to the sum of vectors as shown in Fig. 3. The observed phasor is distorted, hence the depth cannot be accurately recovered from foggy scenes. We call this distortion the *phasor distortion*. When the amplitude of the original phasor changes, the distorted phasor varies accordingly, hence the phase reflects the distribution of the reflectance as shown in Fig. 1(f).
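A rough numerical sketch of this distortion, assuming a crude single-scattering discretization of the path sum (the names, the density value, and the exponential decay model are all illustrative assumptions, not the paper's):

```python
import numpy as np

C = 3e8  # speed of light [m/s]

def fog_phasor(freq, depth, density=0.05, n_steps=200):
    """p_s: discretized sum over fog slabs in front of the object.
    A slab at range z returns intensity ~ density * exp(-2*density*z)
    with round-trip phase 4*pi*f*z/c."""
    z = np.linspace(0.0, depth, n_steps, endpoint=False)
    dz = depth / n_steps
    s = density * np.exp(-2 * density * z) * dz
    return np.sum(s * np.exp(1j * 4 * np.pi * freq * z / C))

def observed(reflectance, depth, freq, density=0.05):
    """p = p_s + p_t with an attenuated object return p_t."""
    decayed = reflectance * np.exp(-2 * density * depth)
    p_t = decayed * np.exp(1j * 4 * np.pi * freq * depth / C)
    return fog_phasor(freq, depth, density) + p_t
```

Because the fog term is added before the phase is taken, the observed phase is pulled toward the fog component, and the size of the pull depends on the object reflectance, which is consistent with the reflectance pattern leaking into the phase image in Fig. 1(f).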

#### Multiple frequency measurements

The phase measured by a ToF camera has the $2\pi $ ambiguity known as the wrapping problem. The typical approach to unwrapping the phase is to use different modulation frequencies and extend the unambiguous range to that of a lower frequency, the greatest common divisor of the multiple frequencies. We also utilize multiple frequencies for defogging.
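For context, ordinary two-frequency unwrapping can be sketched as a brute-force lookup (the frequencies, grid, and helper name are hypothetical choices for illustration):

```python
import numpy as np

C = 3e8  # speed of light [m/s]

def unwrap_two_freq(phi1, phi2, f1, f2, d_max):
    """Find the depth consistent with both wrapped phases by scanning
    candidates; the residual is zero when each phase matches mod 2*pi."""
    d = np.linspace(0.0, d_max, 100001)
    err = (np.abs(np.exp(1j * (4 * np.pi * f1 * d / C - phi1)) - 1)
           + np.abs(np.exp(1j * (4 * np.pi * f2 * d / C - phi2)) - 1))
    return d[np.argmin(err)]

true_d = 11.0  # beyond the 7.5 m unambiguous range of 20 MHz alone
phi1 = (4 * np.pi * 20e6 * true_d / C) % (2 * np.pi)
phi2 = (4 * np.pi * 16e6 * true_d / C) % (2 * np.pi)
d_hat = unwrap_two_freq(phi1, phi2, 20e6, 16e6, 30.0)
```

With 20 MHz and 16 MHz (GCD 4 MHz), the combined unambiguous range grows well beyond that of either frequency alone.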

If the target object is clearly seen, the amplitude of the phasor is the same for all frequencies and the phase is proportional to the frequency:

$$p\left({f}_{k},x\right)=a\left(x\right)\,{e}^{i\frac{4\pi {f}_{k}d\left(x\right)}{c}},$$

where ${f}_{k}$ are the different modulation frequencies. A complicating factor in the foggy scene is that the phasors measured with different frequencies do not obey this relationship, because the phasor of the fog component is the sum over all possible paths. Figure 4 gives a geometric illustration. When the frequency is doubled, the amplitude of the object component stays the same and its phase doubles, as shown in Fig. 4(a). This relationship, however, does not hold for the fog component, as shown in Fig. 4(b), and therefore does not hold for the observation either, as shown in Fig. 4(c). Recovering the original depth of the object is an ill-posed problem: there are $2n$ observations ($n$ phasors) for $n$ frequencies, while there are $2n+2$ unknowns ($n$ fog phasors plus the object phasor).

## 4. Depth recovery method

Our goal is to recover the depth of the scene through fog from phasor observations at multiple frequencies. As explained above, recovering the correct depth pixel by pixel is an ill-posed problem. To make the problem tractable, we use multiple pixels to recover the depth. The key idea is that if we can find multiple pixels whose phasor distortions due to fog are the same, the number of unknowns becomes less than the number of observations. We adopt a two-step approach: first, clusters of same-depth pixels are identified; second, the depth of each cluster is recovered.

#### 4.1. Clustering pixels

A key observation of this work is that the distance between the phasors of two pixels is constant regardless of frequency if the depths of those two pixels are equal; otherwise, the distance varies with respect to the frequency. The distance $\delta \left(f,{x}_{1},{x}_{2}\right)$ of two phasors $p\left(f,{x}_{1}\right)$ and $p\left(f,{x}_{2}\right)$ is represented as

$$\delta \left(f,{x}_{1},{x}_{2}\right)=\left|p\left(f,{x}_{1}\right)-p\left(f,{x}_{2}\right)\right|.$$

If the depths are the same, $d\left({x}_{1}\right)=d\left({x}_{2}\right)=\tilde{d}$, the fog components are cancelled because they are assumed to be equal. The distance is now expressed as

$$\delta \left(f,{x}_{1},{x}_{2}\right)=\left|\tilde{a}\left({x}_{1}\right)-\tilde{a}\left({x}_{2}\right)\right|,$$

which is independent of the frequency *f* and the fog effect, as shown in Fig. 5(a) and (b).

On the other hand, if the depth values of two pixels *x*_{1} and *x*_{3} are different, the distance of the two phasors varies with respect to the frequency as

$$\delta \left(f,{x}_{1},{x}_{3}\right)=\left|\tilde{a}\left({x}_{1}\right)\,{e}^{i\frac{4\pi f d\left({x}_{1}\right)}{c}}-\tilde{a}\left({x}_{3}\right)\,{e}^{i\frac{4\pi f d\left({x}_{3}\right)}{c}}+{p}_{s}\left(f,{x}_{1}\right)-{p}_{s}\left(f,{x}_{3}\right)\right|,$$

which is a function of the frequency *f*.

To evaluate the constancy of the distance, its standard deviation over the *n* frequencies is used:

$$\sigma \left({x}_{1},{x}_{2}\right)=\sqrt{\frac{1}{n}\sum _{k=1}^{n}{\left(\delta \left({f}_{k},{x}_{1},{x}_{2}\right)-\bar{\delta }\left({x}_{1},{x}_{2}\right)\right)}^{2}},$$

where $\bar{\delta }\left({x}_{1},{x}_{2}\right)$ is the mean of the distances over the frequencies.

All pixels are then clustered by thresholding this deviation: two pixels ${x}_{1}$ and ${x}_{2}$ are assigned to the same cluster ${\mathcal{C}}_{k}$ if $\sigma \left({x}_{1},{x}_{2}\right)<t$, where *t* is a threshold.
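The clustering cue can be sketched on synthetic phasors (all names are hypothetical; for simplicity the same fog term is applied to the different-depth pixel as well, although in the paper's model its fog term would also differ, which only strengthens the cue):

```python
import numpy as np

C = 3e8  # speed of light [m/s]
FREQS = np.array([10e6, 13e6, 16e6, 19e6, 22e6])  # modulation frequencies [Hz]

def observed_phasor(reflectance, depth, fog):
    """Fog-distorted phasor per frequency: object term plus fog term."""
    obj = reflectance * np.exp(1j * 4 * np.pi * FREQS * depth / C)
    return obj + fog

def distance_std(p1, p2):
    """Constancy cue: std over frequencies of the phasor distance."""
    return np.std(np.abs(p1 - p2))

rng = np.random.default_rng(0)
# one fog phasor per frequency, shared by the pixels (spatially uniform fog)
fog = 0.5 * np.exp(1j * rng.uniform(0, 2 * np.pi, FREQS.size))

p_a = observed_phasor(1.0, 5.0, fog)  # 5 m, bright
p_b = observed_phasor(0.3, 5.0, fog)  # 5 m, dark: same depth, other reflectance
p_c = observed_phasor(0.8, 9.0, fog)  # 9 m: different depth

t = 1e-6  # threshold on the standard deviation
same_cluster = distance_std(p_a, p_b) < t   # constant distance -> clustered
split = distance_std(p_a, p_c) >= t         # varying distance -> separated
```

For the same-depth pair the fog terms cancel and the distance is exactly |ã(x1) − ã(x2)| at every frequency, so the standard deviation vanishes; for the different-depth pair the object phasors rotate at different rates, so the distance varies.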

#### 4.2. Depth recovery for each cluster

Let the depth of each cluster ${\mathcal{C}}_{k}$ be $d\left({x}_{i}\right)=d\left({x}_{j}\right)=\tilde{d}$, where ${x}_{i},{x}_{j}\in {\mathcal{C}}_{k}$. The original phase of each cluster can be recovered by line fitting, because the difference vector of the phasors is parallel to the original phasor as shown in Fig. 6(a). The slope *θ _{k}* of a line that connects two phasors in the Cartesian coordinate frame can be written as

$${\theta }_{k}\left(f\right)=\mathrm{arg}\left(p\left(f,{x}_{i}\right)-p\left(f,{x}_{j}\right)\right)=\frac{4\pi f\tilde{d}}{c} \bmod \pi .$$

*θ _{k}* has *π* ambiguity because we do not know the original amplitudes of the two pixels. In practice, the line is fit to multiple pixels in the cluster. The slope for each cluster and frequency can be estimated using any line fitting algorithm in the Cartesian coordinate frame, as shown in Fig. 6(b).

Estimating the depth from multiple slopes is the same problem encountered in ordinary ToF unwrapping. The depth of the *k*-th cluster $\widehat{d}\left({\mathcal{C}}_{k}\right)$ can be estimated using a lookup table search as

$$\widehat{d}\left({\mathcal{C}}_{k}\right)=\underset{d}{\mathrm{argmin}}\sum _{f}\left|{\theta }_{k}\left(f\right)-\left(\frac{4\pi fd}{c} \bmod \pi \right)\right|,$$

where ${\theta }_{k}\left(f\right)$ is the slope at frequency *f* for the *k*-th cluster obtained by line fitting.
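The two-step recovery can be sketched as below on synthetic data (PCA-based line fitting and a brute-force candidate grid are illustrative choices, not the paper's exact implementation):

```python
import numpy as np

C = 3e8  # speed of light [m/s]
FREQS = np.array([10e6, 13e6, 16e6, 19e6, 22e6])  # Hz

def slope_angle(points):
    """Angle (mod pi) of the best-fit line through 2D points, via PCA."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered)
    direction = vt[0]  # principal direction of the point cloud
    return np.arctan2(direction[1], direction[0]) % np.pi

def depth_lookup(thetas, candidates):
    """Lookup-table search: candidate depth whose predicted slopes best
    match the fitted slopes; the comparison is done modulo pi."""
    pred = (4 * np.pi * FREQS[None, :] * candidates[:, None] / C) % np.pi
    err = np.abs(np.exp(2j * pred) - np.exp(2j * thetas[None, :])).sum(axis=1)
    return candidates[np.argmin(err)]

# synthetic cluster: four pixels at 5 m with different reflectances,
# all shifted by the same fog phasor at each frequency
true_depth = 5.0
amps = np.array([0.2, 0.5, 0.8, 1.1])
fog = 0.4 * np.exp(1j * np.linspace(0.3, 1.2, FREQS.size))
thetas = np.empty(FREQS.size)
for k, f in enumerate(FREQS):
    phasors = amps * np.exp(1j * 4 * np.pi * f * true_depth / C) + fog[k]
    thetas[k] = slope_angle(np.column_stack([phasors.real, phasors.imag]))

grid = np.linspace(0.1, 15.0, 14901)  # 1 mm candidate grid
d_hat = depth_lookup(thetas, grid)
```

Because the same-depth phasors lie on a line through the common fog offset, the fitted slope equals the object phase modulo π at each frequency, and the multi-frequency lookup resolves the ambiguity exactly as in ordinary unwrapping.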

### Masking background

Neither our method nor an ordinary ToF camera can recover the depth of very dark regions. Traditional applications construct a mask by thresholding the amplitude as a confidence measure. It is, however, difficult to mask low-confidence regions with the amplitude because the fog tends to increase the image brightness.

If there is no returned light from the target object, the observed phasor consists of the fog component only, i.e., $p\left(f,x\right)={p}_{s}\left(f,x\right)$. When one of the pixels is in the background region, the distance of the phasors can be written as

$$\delta \left(f,{x}_{1},{x}_{2}\right)=\left|p\left(f,{x}_{1}\right)-{p}_{s}\left(f,{x}_{2}\right)\right|,$$

so such background pixels can be detected and masked.

### Depth resolution and the threshold

The depth resolution is determined by the threshold, the original amplitude, and the bandwidth of the multiple frequencies. From Eq. (7), the minimum and maximum values ${v}_{\mathrm{min}}$ and ${v}_{\mathrm{max}}$ of the distance for two pixels whose depths differ slightly by ${\mathrm{\Delta}}_{d}$ can be derived, where ${\mathrm{\Delta}}_{d}$ is assumed to be sufficiently small such that $\frac{4\pi {f}_{max}{\mathrm{\Delta}}_{d}}{c}$ is less than *π*.

If the distance *δ* is uniformly distributed between ${v}_{\mathrm{min}}$ and ${v}_{\mathrm{max}}$, a threshold *t* that is less than the standard deviation of this uniform distribution,

$$t<\frac{{v}_{\mathrm{max}}-{v}_{\mathrm{min}}}{2\sqrt{3}},$$

separates pixels whose depth difference exceeds ${\mathrm{\Delta}}_{d}$.
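The threshold rule follows directly from the standard deviation of a uniform distribution, (v_max − v_min)/√12; a short sketch with hypothetical bounds:

```python
import math

def uniform_std(v_min, v_max):
    """Standard deviation of a uniform distribution on [v_min, v_max]."""
    return (v_max - v_min) / math.sqrt(12)

# hypothetical distance bounds for the smallest depth offset to resolve
limit = uniform_std(0.70, 0.90)
t = 0.9 * limit  # any threshold below the standard deviation works
```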

## 5. Experiments

We conduct two experiments with different ToF cameras and different depth scales to show that our method works on any AMCW ToF camera, in both lab-scale and road-scale environments.

First, a paper-craft scene is measured with a fog machine that generates fog by combining chemical substances and water. We use a ToF camera manufactured by Texas Instruments (OPT8241-CDK-EVM) for this experiment, with modulation frequencies from 10 MHz to 22 MHz in steps of 3 MHz. The scene is shown in Fig. 7(a). The ground-truth depth is measured before generating the smoke. While the depth measured by the ordinary method is distorted and the reflectance appears in the depth image as shown in Fig. 7(d), our method recovers a globally plausible depth pattern and a smooth depth on the car region as shown in Fig. 7(e). However, the right side of the image is not recovered well because the visibility in this region is extremely low due to the low illumination power, as shown in Fig. 7(c). While our method assumes that the amplitude is constant across all frequencies, we observed that the measured amplitude decreases at higher frequencies. We did not carry out any calibration for this effect, hence the result contains error due to the lack of calibration.

Second, a road-scale scene is captured at a car testing site as shown in Fig. 8(a). Fog is generated from water and its density is controllable. A Kinect v2 is used as the ToF camera for this experiment, on which three modulation frequencies are available (these data can be extracted using a public tool [31]). Experimental results are shown in Fig. 8. Printed traffic signs were placed alongside a car. The ground-truth depth is captured before the fog is generated, as shown in Fig. 8(b) and (f). The depth of the car body is not measured correctly because ToF cameras suffer from its weak reflectance. The fog is then generated as shown in Fig. 8(c). Through the fog, the ordinary depth measurement is completely disturbed as shown in Fig. 8(d); in particular, the traffic signs are affected by their reflectance and a completely distorted depth is recovered. In contrast, our method recovers a plausible depth as shown in Fig. 8(e), and a uniform depth is correctly estimated on the traffic signs. The black pixels show the estimated mask, where the car region cannot be recovered due to its low reflectance, as shown in Fig. 8(b). The numerical results are summarized in Table 1, which lists the depths of the five traffic signs. The error is much smaller than that of the ordinary ToF measurement.

## 6. Conclusion

A method is presented for measuring scene depth through fog using off-the-shelf ToF cameras. A key finding is that the original phase can be easily recovered when a set of pixels at the same depth is known. To find same-depth pixels, the constancy of the phasor difference across multiple frequencies is evaluated. The effectiveness of the proposed method is quantitatively and qualitatively evaluated via real-world experiments.

We assume that objects at the same depth have different reflectances; otherwise, the line fitting does not work and the depth cannot be recovered. We believe this assumption is not critical for real-world scenes, which are usually colorful.

We do not address the selection of the frequencies; currently, they are arbitrarily selected. The only constraint is that the frequencies must be available on the device. For example, the frequency set of the Kinect is not selectable, so we use its preset frequencies. Investigating the best frequencies for defogging is an interesting research direction.

A remaining problem is the computational cost. The clustering step is computationally expensive because it compares every pixel against all other pixels. Although parallel processing is easily employed, improving the algorithmic efficiency is an important topic for future research.

We expect our method to work for any thin scattering medium. In this paper, we show two different fog-like media; underwater imaging is another interesting application of this method. A limitation is that the density of the medium must be low enough that the reflectance variation is visible, and the maximum range of the method is limited by the visibility of the medium.

While homogeneous fog is assumed for the entire scene, it is possible to extend the method to heterogeneous fog scenes, as the clustering step can be interpreted as finding pixels at the same depth with the same fog effect. In other words, pixels at the same depth but with different fog are detected as different clusters. This is a viable future direction for the development of this method.

## Funding

Japan Science and Technology Agency (JST) CREST JPMJCR1764; Japan Society for the Promotion of Science (JSPS) KAKENHI JP18H03265; and Koito Manufacturing.

## References

**1. **K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. **33**, 2341–2353 (2011). [CrossRef]

**2. **K. Nishino, L. Kratz, and S. Lombardi, “Bayesian defogging,” Int. J. Comput. Vis. **98**, 263–278 (2012). [CrossRef]

**3. **D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 1674–1682.

**4. **M. O’Toole, S. Achar, S. G. Narasimhan, and K. N. Kutulakos, “Homogeneous codes for energy-efficient illumination and imaging,” ACM Trans. Graph. **34**, 35 (2015).

**5. **J. Wang, J. Bartels, W. Whittaker, A. C. Sankaranarayanan, and S. G. Narasimhan, “Programmable triangulation light curtains,” in Proceedings of European Conference on Computer Vision, (Springer, 2018), pp. 19–34.

**6. **A. Jarabo, B. Masia, J. Marco, and D. Gutierrez, “Recent advances in transient imaging: A computer graphics and vision perspective,” Vis. Informatics **1**, 65–79 (2017). [CrossRef]

**7. **I. Gkioulekas, A. Levin, F. Durand, and T. Zickler, “Micron-scale light transport decomposition using interferometry,” ACM Trans. Graph. **34**, 37 (2015). [CrossRef]

**8. **N. Abramson, “Light-in-flight recording by holography,” Opt. Lett. **3**, 121 (1978). [CrossRef] [PubMed]

**9. **T. Kakue, K. Tosa, J. Yuasa, T. Tahara, Y. Awatsuji, K. Nishio, S. Ura, and T. Kubota, “Digital light-in-flight recording by holography by use of a femtosecond pulsed laser,” IEEE J. Sel. Top. Quantum Electron. **18**, 479–485 (2012). [CrossRef]

**10. **A. Velten, D. Wu, A. Jarabo, B. Masia, C. Barsi, C. Joshi, E. Lawson, M. Bawendi, D. Gutierrez, and R. Raskar, “Femto-photography: Capturing and visualizing the propagation of light,” ACM Trans. Graph. **32**, 44 (2013). [CrossRef]

**11. **A. Kirmani, T. Hutchison, J. Davis, and R. Raskar, “Looking around the corner using ultrafast transient imaging,” Int. J. Comput. Vis. **95**, 13–28 (2011). [CrossRef]

**12. **A. Velten, T. Willwacher, O. Gupta, A. Veeraraghavan, M. G. Bawendi, and R. Raskar, “Recovering three-dimensional shape around a corner using ultrafast time-of-flight imaging,” Nat. Commun. **3**, 745 (2012). [CrossRef] [PubMed]

**13. **M. O’Toole, F. Heide, D. Lindell, K. Zang, S. Diamond, and G. Wetzstein, “Reconstructing transient images from single-photon sensors,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2017), pp. 2289–2297.

**14. **F. Heide, M. B. Hullin, J. Gregson, and W. Heidrich, “Low-budget transient imaging using photonic mixer devices,” ACM Trans. Graph. **32**, 45 (2013). [CrossRef]

**15. **J. Lin, Y. Liu, M. B. Hullin, and Q. Dai, “Fourier analysis on transient imaging with a multifrequency time-of-flight camera,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2014), pp. 3230–3237.

**16. **C. Peters, J. Klein, M. B. Hullin, and R. Klein, “Solving trigonometric moment problems for fast transient imaging,” ACM Trans. Graph. **34**, 220 (2015). [CrossRef]

**17. **K. Kitano, T. Okamoto, K. Tanaka, T. Aoto, H. Kubo, T. Funatomi, and Y. Mukaigawa, “Recovering temporal PSF using ToF camera with delayed light emission,” IPSJ Trans. Comput. Vis. Appl. **9**, 15 (2017). [CrossRef]

**18. **A. Kadambi, R. Whyte, A. Bhandari, L. Streeter, C. Barsi, A. Dorrington, and R. Raskar, “Coded time of flight cameras: Sparse deconvolution to address multipath interference and recover time profiles,” ACM Trans. Graph. **32**, 1–10 (2013). [CrossRef]

**19. **M. O’Toole, F. Heide, L. Xiao, M. B. Hullin, W. Heidrich, and K. N. Kutulakos, “Temporal frequency probing for 5d transient analysis of global light transport,” ACM Trans. Graph. **33**, 1–11 (2014). [CrossRef]

**20. **G. Satat, M. Tancik, and R. Raskar, “Towards photography through realistic fog,” in Proceedings of IEEE International Conference on Computational Photography, (IEEE, 2018), pp. 1–10.

**21. **S. Fuchs, “Multipath interference compensation in time-of-flight camera images,” in International Conference on Pattern Recognition, (IEEE, 2010), pp. 3583–3586.

**22. **A. A. Dorrington, J. P. Godbaz, M. J. Cree, A. D. Payne, and L. V. Streeter, “Separating true range measurements from multi-path and scattering interference in commercial range cameras,” in Proceedings of SPIE7864, (SPIE, 2011).

**23. **D. Jimenez, D. Pizarro, M. Mazo, and S. Palazuelos, “Modelling and correction of multipath interference in time of flight cameras,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2012), pp. 893–900.

**24. **F. Heide, L. Xiao, A. Kolb, M. B. Hullin, and W. Heidrich, “Imaging in scattering media using correlation image sensors and sparse convolutional coding,” Opt. Express **22**, 26338–50 (2014). [CrossRef] [PubMed]

**25. **A. Kirmani, A. Benedetti, and P. A. Chou, “Spumic: Simultaneous phase unwrapping and multipath interference cancellation in time-of-flight cameras using spectral methods,” in Proceedings of IEEE International Conference on Multimedia and Expo, (IEEE, 2013), pp. 1–6.

**26. **D. Freedman, E. Krupka, Y. Smolin, I. Leichter, and M. Schmidt, “SRA: fast removal of general multipath for ToF sensors,” in Proceedings of European Conference on Computer Vision, (Springer, 2014), pp. 1–15.

**27. **H. Qiao, J. Lin, Y. Liu, M. B. Hullin, and Q. Dai, “Resolving transient time profile in tof imaging via log-sum sparse regularization,” Opt. Lett. **40**, 918–921 (2015). [CrossRef] [PubMed]

**28. **N. Naik, A. Kadambi, C. Rhemann, S. Izadi, R. Raskar, and S. Bing Kang, “A light transport model for mitigating multipath interference in time-of-flight sensors,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2015), pp. 73–81.

**29. **A. Kadambi, J. Schiel, and R. Raskar, “Macroscopic interferometry: Rethinking depth estimation with frequency-domain time-of-flight,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 2016), pp. 893–902.

**30. **S. Lee and H. Shim, “Skewed stereo time-of-flight camera for translucent object imaging,” Image Vis. Comput. **43**, 27–38 (2015). [CrossRef]

**31. **K. Tanaka, Y. Mukaigawa, T. Funatomi, H. Kubo, Y. Matsushita, and Y. Yagi, “Material classification from time-of-flight distortions,” IEEE Trans. Pattern Anal. Mach. Intell. (2018). [PubMed]

**32. **M. Gupta, S. K. Nayar, M. B. Hullin, and J. Martin, “Phasor imaging: a generalization of correlation-based time-of-flight imaging,” ACM Trans. Graph. **34**, 156 (2015). [CrossRef]