Show simple item record

dc.contributor.author: Yen-Chen, Lin
dc.contributor.author: Florence, Pete
dc.contributor.author: Barron, Jonathan T.
dc.contributor.author: Lin, Tsung-Yi
dc.contributor.author: Rodriguez, Alberto
dc.contributor.author: Isola, Phillip
dc.date.accessioned: 2024-03-08T16:58:27Z
dc.date.available: 2024-03-08T16:58:27Z
dc.date.issued: 2022-05-23
dc.identifier.uri: https://hdl.handle.net/1721.1/153644
dc.description: 2022 International Conference on Robotics and Automation (ICRA), 23-27 May 2022 en_US
dc.description.abstract: Thin, reflective objects such as forks and whisks are common in our daily lives, but they are particularly challenging for robot perception because it is hard to reconstruct them using commodity RGB-D cameras or multi-view stereo techniques. While traditional pipelines struggle with objects like these, Neural Radiance Fields (NeRFs) have recently been shown to be remarkably effective for performing view synthesis on objects with thin structures or reflective materials. In this paper, we explore the use of NeRF as a new source of supervision for robust robot vision systems. In particular, we demonstrate that a NeRF representation of a scene can be used to train dense object descriptors. We use an optimized NeRF to extract dense correspondences between multiple views of an object, and then use these correspondences as training data for learning a view-invariant representation of the object. NeRF's use of a density field allows us to reformulate the correspondence problem with a novel distribution-of-depths formulation, as opposed to the conventional approach of using a depth map. Dense correspondence models supervised with our method significantly outperform off-the-shelf learned descriptors by 106% (PCK@3px metric, more than doubling performance) and outperform our baseline supervised with multi-view stereo by 29%. Furthermore, we demonstrate that the learned dense descriptors enable robots to perform accurate 6-degree-of-freedom (6-DoF) pick-and-place of thin and reflective objects. en_US
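The distribution-of-depths idea described in the abstract can be sketched as follows: rather than committing to a single depth-map value per pixel, the volume-rendering weights along a NeRF ray define a probability distribution over candidate depths, from which a depth is sampled and the pixel reprojected into a second view to form a correspondence. This is a minimal illustrative sketch under standard pinhole-camera assumptions, not the authors' implementation; the function names, camera matrices, and sample values below are hypothetical.

```python
import numpy as np

def render_weights(sigmas, deltas):
    # Standard NeRF volume-rendering weights: w_i = T_i * (1 - exp(-sigma_i * delta_i)),
    # where T_i is the accumulated transmittance up to sample i.
    alphas = 1.0 - np.exp(-sigmas * deltas)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    return trans * alphas

def sample_depth(depths, weights, rng):
    # Treat the normalized weights as a categorical distribution over depth
    # (the "distribution of depths"), instead of collapsing to one depth value.
    p = weights / weights.sum()
    return rng.choice(depths, p=p)

def reproject(uv, depth, K, T_src_to_tgt):
    # Back-project pixel uv at the sampled depth, transform into the target
    # camera frame, and project to get the corresponding target pixel.
    ray = np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])
    point_src = ray * depth
    point_tgt = (T_src_to_tgt @ np.append(point_src, 1.0))[:3]
    proj = K @ point_tgt
    return proj[:2] / proj[2]

# Toy example: a sharp density spike at the third sample dominates the
# depth distribution, so the sampled depth lands there.
rng = np.random.default_rng(0)
depths = np.array([1.0, 2.0, 3.0, 4.0])
sigmas = np.array([0.0, 0.0, 50.0, 0.0])   # density spike at depth 3.0
deltas = np.full(4, 0.1)                   # sample spacing along the ray
weights = render_weights(sigmas, deltas)
d = sample_depth(depths, weights, rng)

K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])            # toy intrinsics
uv_tgt = reproject((10.0, 20.0), d, K, np.eye(4))  # identity pose for sanity check
```

With an identity relative pose the reprojection must return the original pixel, which is a useful sanity check before plugging in real camera poses.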
dc.language.iso: en_US
dc.publisher: IEEE en_US
dc.relation.isversionof: 10.1109/icra46639.2022.9812291 en_US
dc.rights: Creative Commons Attribution-Noncommercial-ShareAlike en_US
dc.rights: Attribution-NonCommercial-ShareAlike 4.0 International *
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ en_US
dc.source: IEEE en_US
dc.title: NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields en_US
dc.type: Article en_US
dc.identifier.citation: Yen-Chen, Lin, Florence, Pete, Barron, Jonathan T., Lin, Tsung-Yi, Rodriguez, Alberto et al. 2022. "NeRF-Supervision: Learning Dense Object Descriptors from Neural Radiance Fields."
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.contributor.department: Massachusetts Institute of Technology. Department of Mechanical Engineering
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.eprint.version: Author's final manuscript en_US
dc.type.uri: http://purl.org/eprint/type/ConferencePaper en_US
eprint.status: http://purl.org/eprint/status/NonPeerReviewed en_US
dspace.date.submission: 2024-03-08T16:55:47Z
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed en_US

