| dc.contributor.author | Zhang, Chiyuan | |
| dc.contributor.author | Evangelopoulos, Georgios | |
| dc.contributor.author | Voinea, Stephen Constantin | |
| dc.contributor.author | Rosasco, Lorenzo Andrea | |
| dc.contributor.author | Poggio, Tomaso A. | |
| dc.date.accessioned | 2016-05-13T18:51:27Z | |
| dc.date.available | 2016-05-13T18:51:27Z | |
| dc.date.issued | 2014-05 | |
| dc.identifier.isbn | 978-1-4799-2893-4 | |
| dc.identifier.issn | 1520-6149 | |
| dc.identifier.uri | http://hdl.handle.net/1721.1/102485 | |
| dc.description.abstract | Representations in the auditory cortex might be based on mechanisms similar to the visual ventral stream: modules for building invariance to transformations and multiple layers for compositionality and selectivity. In this paper we propose the use of such computational modules for extracting invariant and discriminative audio representations. Building on a theory of invariance in hierarchical architectures, we propose a novel, mid-level representation for acoustical signals, using the empirical distributions of projections on a set of templates and their transformations. Under the assumption that, by construction, this dictionary of templates is composed from similar classes and samples the orbit of variance-inducing signal transformations (such as shift and scale), the resulting signature is theoretically guaranteed to be unique, invariant to transformations, and stable to deformations. Modules of projection and pooling can then constitute layers of deep networks for learning composite representations. We present the main theoretical and computational aspects of a framework for unsupervised learning of invariant audio representations, empirically evaluated on music genre classification. | en_US |
| dc.description.sponsorship | National Science Foundation (U.S.) (STC Center for Brains, Minds and Machines Award CCF-1231216) | en_US |
| dc.description.sponsorship | Italian Ministry of Education (University and Research FIRB Project RBFR12M3AC) | en_US |
| dc.language.iso | en_US | |
| dc.publisher | Institute of Electrical and Electronics Engineers (IEEE) | en_US |
| dc.relation.isversionof | http://dx.doi.org/10.1109/ICASSP.2014.6854954 | en_US |
| dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
| dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
| dc.source | arXiv | en_US |
| dc.title | A deep representation for invariance and music classification | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Zhang, Chiyuan, Georgios Evangelopoulos, Stephen Voinea, Lorenzo Rosasco, and Tomaso Poggio. “A Deep Representation for Invariance and Music Classification.” 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (May 2014). | en_US |
| dc.contributor.department | Center for Brains, Minds, and Machines | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US |
| dc.contributor.department | McGovern Institute for Brain Research at MIT | en_US |
| dc.contributor.mitauthor | Zhang, Chiyuan | en_US |
| dc.contributor.mitauthor | Evangelopoulos, Georgios | en_US |
| dc.contributor.mitauthor | Voinea, Stephen Constantin | en_US |
| dc.contributor.mitauthor | Rosasco, Lorenzo Andrea | en_US |
| dc.contributor.mitauthor | Poggio, Tomaso A. | en_US |
| dc.relation.journal | Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | en_US |
| dc.eprint.version | Author's final manuscript | en_US |
| dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
| eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
| dspace.orderedauthors | Zhang, Chiyuan; Evangelopoulos, Georgios; Voinea, Stephen; Rosasco, Lorenzo; Poggio, Tomaso | en_US |
| dspace.embargo.terms | N | en_US |
| dc.identifier.orcid | https://orcid.org/0000-0001-8467-1888 | |
| dc.identifier.orcid | https://orcid.org/0000-0002-3944-0455 | |
| dc.identifier.orcid | https://orcid.org/0000-0001-6376-4786 | |
| dc.identifier.orcid | https://orcid.org/0000-0003-2240-1801 | |
| dc.identifier.orcid | https://orcid.org/0000-0002-5727-9941 | |
| mit.license | OPEN_ACCESS_POLICY | en_US |
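
The abstract describes a projection-and-pooling module: project the signal onto each template's orbit under a transformation group, then pool the projections into an empirical distribution. A minimal sketch of that idea, under simplifying assumptions not taken from the paper (circular shifts as the transformation group, unit-norm signals and templates, a normalized histogram as the pooled distribution; the function name `invariant_signature` is hypothetical):

```python
import numpy as np

def invariant_signature(x, templates, n_bins=10):
    """Pool projections of x onto each template's full orbit of
    circular shifts into a normalized histogram (one per template)."""
    sig = []
    for t in templates:
        # Orbit of the template under the group of circular shifts.
        orbit = np.stack([np.roll(t, s) for s in range(len(t))])
        proj = orbit @ x  # projections of x on the orbit elements
        # Pooling: empirical distribution of the projections.
        # Unit-norm x and t keep dot products in [-1, 1].
        hist, _ = np.histogram(proj, bins=n_bins, range=(-1.0, 1.0))
        sig.append(hist / len(proj))
    return np.concatenate(sig)
```

Because the full shift orbit is sampled, shifting the input only permutes the set of projections, so the histogram (and hence the signature) is exactly shift-invariant in this toy setting.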