
dc.contributor.author	Mlynarski, Wiktor
dc.contributor.author	McDermott, Josh
dc.date.accessioned	2017-01-25T20:38:48Z
dc.date.available	2017-01-25T20:38:48Z
dc.date.issued	2017-01-25
dc.identifier.uri	http://hdl.handle.net/1721.1/106624
dc.description.abstract	Interaction with the world requires an organism to transform sensory signals into representations in which behaviorally meaningful properties of the environment are made explicit. These representations are derived through cascades of neuronal processing stages in which neurons at each stage recode the output of preceding stages. Explanations of sensory coding may thus involve understanding how low-level patterns are combined into more complex structures. Although models exist in the visual domain to explain how mid-level features such as junctions and curves might be derived from oriented filters in early visual cortex, little is known about analogous grouping principles for mid-level auditory representations. We propose a hierarchical generative model of natural sounds that learns combinations of spectrotemporal features from natural stimulus statistics. In the first layer the model forms a sparse convolutional code of spectrograms using a dictionary of learned spectrotemporal kernels. To generalize from specific kernel activation patterns, the second layer encodes patterns of time-varying magnitude of multiple first-layer coefficients. Because second-layer features are sensitive to combinations of spectrotemporal features, the representation they support encodes more complex acoustic patterns than the first layer. When trained on corpora of speech and environmental sounds, some second-layer units learned to group spectrotemporal features that occur together in natural sounds. Others instantiate opponency between dissimilar sets of spectrotemporal features. Such groupings might be instantiated by neurons in the auditory cortex, providing a hypothesis for mid-level neuronal computation.	en_US
dc.description.sponsorship	This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Center for Brains, Minds and Machines (CBMM), arXiv	en_US
dc.relation.ispartofseries	CBMM Memo Series;060
dc.rights	Attribution-NonCommercial-ShareAlike 3.0 United States	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/us/	*
dc.subject	Auditory Representations	en_US
dc.subject	auditory cortex	en_US
dc.subject	spectrotemporal features	en_US
dc.title	Learning Mid-Level Auditory Codes from Natural Sound Statistics	en_US
dc.type	Technical Report	en_US
dc.type	Working Paper	en_US
dc.type	Other	en_US
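
The abstract describes a two-layer generative model: a first layer that encodes a spectrogram as a sparse convolutional code over learned spectrotemporal kernels, and a second layer that encodes the time-varying magnitudes of the first-layer coefficients. The memo itself is not reproduced in this record, so the following is only a minimal NumPy sketch of that layered structure; all shapes, variable names, the random stand-in dictionaries, and the least-squares second-layer fit are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumptions, not the authors' code) of the two-layer structure
# described in the abstract, using NumPy only.
import numpy as np

rng = np.random.default_rng(0)

# --- First layer: sparse convolutional code of a spectrogram ---
# The spectrogram S (frequency x time) is modeled as a sum over kernels of the
# temporal convolution of a spectrotemporal kernel D[k] with a sparse
# activation signal a[k]. In the paper the kernels are learned; here they are
# random stand-ins.
F, T = 32, 200          # frequency bins, time frames (hypothetical sizes)
K, L = 8, 10            # number of kernels, kernel length in time frames

D = rng.normal(size=(K, F, L))                                   # spectrotemporal kernels
a = rng.random(size=(K, T)) * (rng.random(size=(K, T)) < 0.05)   # sparse activations

def render_spectrogram(D, a):
    """Render a spectrogram as the sum of per-kernel temporal convolutions."""
    K, F, L = D.shape
    T = a.shape[1]
    S = np.zeros((F, T + L - 1))
    for k in range(K):
        for f in range(F):
            S[f] += np.convolve(a[k], D[k, f])
    return S[:, :T]

S = render_spectrogram(D, a)

# --- Second layer: encode time-varying magnitudes of first-layer coefficients ---
# The second layer describes smoothed log-magnitudes of the first-layer
# activations as combinations of second-layer features, so each feature can
# capture kernels that tend to be co-active (grouping) or anti-correlated
# (opponency). A plain least-squares fit stands in for the model's inference.
def smooth(x, w=5):
    win = np.ones(w) / w
    return np.apply_along_axis(lambda v: np.convolve(v, win, mode="same"), 1, x)

M = np.log1p(smooth(np.abs(a)))          # K x T time-varying magnitudes

J = 4                                    # number of second-layer features (hypothetical)
B = rng.normal(size=(K, J))              # second-layer features over first-layer kernels
coeffs, *_ = np.linalg.lstsq(B, M, rcond=None)   # J x T second-layer coefficients

print("spectrogram:", S.shape)                     # (32, 200)
print("first-layer magnitudes:", M.shape)          # (8, 200)
print("second-layer coefficients:", coeffs.shape)  # (4, 200)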

