dc.contributor.author | Mao, Junhua | |
dc.contributor.author | Xu, Wei | |
dc.contributor.author | Yang, Yi | |
dc.contributor.author | Wang, Jiang | |
dc.contributor.author | Huang, Zhiheng | |
dc.contributor.author | Yuille, Alan L. | |
dc.date.accessioned | 2015-12-11T22:15:05Z | |
dc.date.available | 2015-12-11T22:15:05Z | |
dc.date.issued | 2015-05-07 | |
dc.identifier.uri | http://hdl.handle.net/1721.1/100198 | |
dc.description.abstract | In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, the m-RNN model can be applied to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. | en_US |
dc.description.sponsorship | This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216. | en_US |
dc.language.iso | en_US | en_US |
dc.publisher | Center for Brains, Minds and Machines (CBMM), arXiv | en_US |
dc.relation.ispartofseries | CBMM Memo Series;033 | |
dc.rights | Attribution-NonCommercial 3.0 United States | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc/3.0/us/ | * |
dc.subject | multimodal Recurrent Neural Network (m-RNN) | en_US |
dc.subject | Artificial Intelligence | en_US |
dc.subject | Computer Language | en_US |
dc.title | Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) | en_US |
dc.type | Technical Report | en_US |
dc.type | Working Paper | en_US |
dc.type | Other | en_US |
dc.identifier.citation | arXiv:1412.6632 | en_US |