Consistent depth of moving objects in video
Author(s)
Zhang, Zhoutong; Cole, Forrester; Tucker, Richard; Freeman, William T.; Dekel, Tali
Download: 3450626.3459871.pdf (24.42 MB)
Publisher with Creative Commons License
Creative Commons Attribution
Abstract
We present a method to estimate the depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera. We seek a geometrically and temporally consistent solution to this under-constrained problem: the depth predictions of corresponding points across frames should induce plausible, smooth motion in 3D. We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video. By recursively unrolling the scene-flow prediction MLP over varying time steps, we compute both short-range scene flow, to impose local smooth-motion priors directly in 3D, and long-range scene flow, to impose multi-view consistency constraints over wide baselines. We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion. Our depth maps give rise to a number of depth- and motion-aware video editing effects, such as object and lighting insertion.
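The core mechanism described in the abstract, recursively unrolling a one-step scene-flow predictor to obtain both short-range and long-range 3D flow, can be sketched in a few lines. The following is a minimal toy illustration based only on the abstract, not the authors' implementation: the network sizes, the input encoding (a 3D point plus a frame index), and all weights are placeholder assumptions, with a tiny NumPy MLP standing in for the paper's scene-flow prediction MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the scene-flow MLP (hypothetical sizes and encoding):
# input = 3D point + scalar frame index, output = 3D displacement to the next frame.
W1 = rng.standard_normal((32, 4)) * 0.1
b1 = np.zeros(32)
W2 = rng.standard_normal((3, 32)) * 0.1
b2 = np.zeros(3)

def scene_flow_step(p, t):
    """Predict the one-frame 3D displacement of point p at frame t."""
    x = np.concatenate([p, [t]])
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def unroll(p, t, k):
    """Recursively apply the one-step predictor k times, advecting the
    point from frame t to frame t + k (long-range scene flow)."""
    for i in range(k):
        p = p + scene_flow_step(p, t + i)
    return p

p0 = np.array([0.1, -0.2, 1.5])  # a back-projected 3D point at frame t = 0
p_short = unroll(p0, 0, 1)       # short-range flow: local smooth-motion prior
p_long = unroll(p0, 0, 5)        # long-range flow: wide-baseline consistency
```

Because the long-range flow is the composition of one-step predictions, a single small MLP can supply both the local smoothness term and the wide-baseline multi-view constraints; unrolling two frames is, by construction, the same as unrolling one frame twice.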
Date issued
2021-08
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Publisher
Association for Computing Machinery (ACM)
Citation
ACM Transactions on Graphics, Volume 40, Issue 4, August 2021, Article No. 148, pp. 1–12
Version: Final published version
ISSN
0730-0301
1557-7368
Keywords
Computer Graphics and Computer-Aided Design