DSpace@MIT


CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

Author(s)
Lykov, Artem; Litvinov, Mikhail; Konenkov, Mikhail; Prochii, Rinat; Burtsev, Nikita; Abdulkarim, Ali Alridha; Bazhenov, Artem; Berman, Vladimir; Tsetserukou, Dzmitry; ...
Download: 3610978.3641080.pdf (3.355 MB)
Publisher Policy

Terms of use
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
This paper introduces CognitiveDog, a pioneering development of a quadruped robot with a Large Multimodal Model (LMM) that is capable not only of communicating with humans verbally but also of physically interacting with the environment through object manipulation. The system was realized on the Unitree Go1 robot dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, its dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset
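The abstract points to the fine-tuning dataset on the Hugging Face Hub. A minimal sketch of downloading and inspecting it with the standard Hugging Face datasets library is given below; it assumes the dataset loads with its default configuration and makes no assumptions about its split or column names, which are simply printed at runtime.

# Sketch: load and inspect the CognitiveDog fine-tuning dataset.
# Requires: pip install datasets
from datasets import load_dataset

# Repository id taken from the link in the abstract.
dataset = load_dataset("ArtemLykov/CognitiveDog_dataset")

# Print the available splits, their column names, and one example record.
print(dataset)
for split_name, split in dataset.items():
    print(split_name, "columns:", split.column_names)
    print("first record:", split[0])
    break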
Date issued
2024-03-11
URI
https://hdl.handle.net/1721.1/159351
Publisher
ACM | Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction
Citation
Lykov, Artem, Litvinov, Mikhail, Konenkov, Mikhail, Prochii, Rinat, Burtsev, Nikita et al. 2024. "CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot."
Version: Final published version
ISBN
979-8-4007-0323-2

Collections
  • MIT Open Access Articles
