Learning with a Wasserstein loss
Author(s): Araya-Polo, Mauricio; Frogner, Charles Albert; Zhang, Chiyuan; Mobahi, Hossein; Poggio, Tomaso A
Abstract: Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning based on the Wasserstein distance, which provides a natural notion of dissimilarity between probability measures. Although optimizing with respect to the exact Wasserstein distance is costly, recent work has described a regularized approximation that can be computed efficiently. We describe an efficient learning algorithm based on this regularization, as well as a novel extension of the Wasserstein distance from probability measures to unnormalized measures, and we give a statistical learning bound for the loss. The Wasserstein loss encourages smoothness of the predictions with respect to a chosen metric on the output space. We demonstrate this property on a real-data tag prediction problem using the Yahoo Flickr Creative Commons dataset, outperforming a baseline that does not use the metric.
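The "regularized approximation" the abstract refers to is the entropy-regularized optimal transport problem, which can be solved with Sinkhorn matrix-scaling iterations. The following is a minimal sketch of that computation, not the authors' implementation: the function name, parameter choices, and iteration count are illustrative assumptions.

    import numpy as np

    def sinkhorn_wasserstein(r, c, M, lam=10.0, n_iter=100):
        # Entropy-regularized Wasserstein distance between two histograms.
        # r, c : nonnegative vectors summing to 1 (e.g., predicted and target
        #        label distributions); M : ground-metric cost matrix, M[i, j]
        #        = distance between labels i and j; lam : regularization
        #        strength (larger -> closer to the exact distance).
        K = np.exp(-lam * M)              # elementwise kernel from the ground metric
        u = np.ones_like(r)
        for _ in range(n_iter):           # Sinkhorn fixed-point iterations
            v = c / (K.T @ u)
            u = r / (K @ v)
        T = u[:, None] * K * v[None, :]   # approximate optimal transport plan
        return np.sum(T * M)              # transport cost under the ground metric

    # Example with a 3-label chain metric: mass moved farther costs more,
    # which is the "smoothness with respect to a chosen metric" the loss rewards.
    M = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)
    r = np.array([0.8, 0.1, 0.1])         # prediction
    c = np.array([0.1, 0.1, 0.8])         # target
    print(sinkhorn_wasserstein(r, c, M))

Because every step is differentiable, a loss of this form can be backpropagated through, which is what makes it usable for training multi-label predictors.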
Department: McGovern Institute for Brain Research at MIT. Center for Brains, Minds, and Machines; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015)
Frogner, Charlie et al. "Learning with a Wasserstein loss." Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), December 7-12, 2015, Montreal, Canada. MIT Press, December 2015. © 2015 MIT Press, Cambridge, MA, USA.
Author's final manuscript