3D Object-Oriented Learning: An End-to-end Transformation-Disentangled 3D Representation

Liao, Qianli; Poggio, Tomaso

Download: CBMM-Memo-075.pdf (980.4Kb)

Terms of use

Attribution-NonCommercial-ShareAlike 3.0 United States http://creativecommons.org/licenses/by-nc-sa/3.0/us/

Abstract

We provide a more detailed explanation of the ideas behind a recent paper on “Object-Oriented Deep Learning” [1] and extend it to handle 3D inputs/outputs. Similar to [1], every layer of the system takes in a list of “objects/symbols”, processes it and outputs another list of objects/symbols. In this report, the properties of the objects/symbols are extended to contain 3D information, including 3D orientations (i.e., a rotation quaternion or yaw, pitch and roll) and one extra coordinate dimension (z-axis or depth). The resultant model is a novel end-to-end interpretable 3D representation that systematically factors out common 3D transformations such as translation and 3D rotation. As first proposed by [1] and discussed in more detail in [2], it offers a “symbolic disentanglement” solution to the problem of transformation invariance/equivariance. To demonstrate the effectiveness of the model, we show that it achieves perfect performance on the task of 3D invariant recognition when trained on one rotation of a 3D object and tested on 3D rotations at arbitrary angles of yaw, pitch and roll. Furthermore, in a more realistic case where depth information is not given (similar to viewpoint-invariant object recognition from 2D vision), our model generalizes reasonably well to novel viewpoints, while ConvNets fail to generalize.
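
As an informal illustration of what it means to factor out translation and 3D rotation from a list of objects, the sketch below stores each object as a position plus a unit rotation quaternion and re-expresses the whole list in the frame of one reference object. It is not the model described in the memo, which is learned end to end; the names Obj3D, transform and canonicalize, the ZYX Euler convention, and the choice of reference object are all assumptions made only for this example.

```python
import math
from dataclasses import dataclass


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate; for a unit quaternion this is the inverse rotation."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * conj(q)."""
    _, x, y, z = q_mul(q_mul(q, (0.0, *v)), q_conj(q))
    return (x, y, z)


def q_from_euler(yaw, pitch, roll):
    """Unit quaternion from yaw (z), pitch (y), roll (x); ZYX order is assumed."""
    cy, sy = math.cos(yaw / 2), math.sin(yaw / 2)
    cp, sp = math.cos(pitch / 2), math.sin(pitch / 2)
    cr, sr = math.cos(roll / 2), math.sin(roll / 2)
    return (
        cr * cp * cy + sr * sp * sy,
        sr * cp * cy - cr * sp * sy,
        cr * sp * cy + sr * cp * sy,
        cr * cp * sy - sr * sp * cy,
    )


@dataclass(frozen=True)
class Obj3D:  # hypothetical container, not a type from the memo
    label: str
    pos: tuple  # (x, y, z)
    rot: tuple  # unit quaternion (w, x, y, z)


def transform(objs, q, t):
    """Apply a global rotation q followed by a translation t to every object."""
    return [
        Obj3D(o.label,
              tuple(a + b for a, b in zip(q_rotate(q, o.pos), t)),
              q_mul(q, o.rot))
        for o in objs
    ]


def canonicalize(objs, ref=0):
    """Express every object relative to the pose of objs[ref]."""
    r = objs[ref]
    inv = q_conj(r.rot)
    return [
        Obj3D(o.label,
              q_rotate(inv, tuple(a - b for a, b in zip(o.pos, r.pos))),
              q_mul(inv, o.rot))
        for o in objs
    ]


def same_pose(a, b, tol=1e-9):
    """Compare two objects up to floating-point error."""
    return a.label == b.label and all(
        abs(x - y) <= tol for x, y in zip(a.pos + a.rot, b.pos + b.rot)
    )


scene = [
    Obj3D("corner", (0.0, 0.0, 0.0), q_from_euler(0.2, 0.0, 0.0)),
    Obj3D("edge", (1.0, 0.5, -0.3), q_from_euler(0.0, 0.4, 0.1)),
]
moved = transform(scene, q_from_euler(0.7, -0.3, 1.1), (1.0, -2.0, 0.5))
invariant = all(
    same_pose(a, b) for a, b in zip(canonicalize(scene), canonicalize(moved))
)
```

Here the raw poses in scene and moved differ, but their canonicalized descriptions agree up to floating-point error, so invariant evaluates to True. The global pose is carried entirely by the reference object, which is one simple way to keep "what" and "where" in separate symbols.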

Date issued

2017-12-31

URI

http://hdl.handle.net/1721.1/113002

Series/Report no.

CBMM Memo Series;075

Collections

CBMM Memo Series
