
dc.contributor.author	Lykov, Artem
dc.contributor.author	Litvinov, Mikhail
dc.contributor.author	Konenkov, Mikhail
dc.contributor.author	Prochii, Rinat
dc.contributor.author	Burtsev, Nikita
dc.contributor.author	Abdulkarim, Ali Alridha
dc.contributor.author	Bazhenov, Artem
dc.contributor.author	Berman, Vladimir
dc.contributor.author	Tsetserukou, Dzmitry
dc.date.accessioned	2025-06-06T16:11:34Z
dc.date.available	2025-06-06T16:11:34Z
dc.date.issued	2024-03-11
dc.identifier.isbn	979-8-4007-0323-2
dc.identifier.uri	https://hdl.handle.net/1721.1/159351
dc.description.abstract	This paper introduces CognitiveDog, a pioneering development of a quadruped robot with a Large Multimodal Model (LMM) that is capable of not only communicating with humans verbally but also physically interacting with the environment through object manipulation. The system was realized on the Unitree Go1 robot-dog equipped with a custom gripper and demonstrated autonomous decision-making capabilities, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to comprehend and execute them based on natural language input and environmental cues. The paper delves into the intricacies of this system, the dataset characteristics, and the software architecture. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution. Experimental results highlight the robot's advanced task comprehension and adaptability, underscoring its potential in real-world applications. The dataset used to fine-tune the robot-dog behavior generation model is provided at the following link: huggingface.co/datasets/ArtemLykov/CognitiveDog_dataset	en_US
dc.publisher	ACM | Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction	en_US
dc.relation.isversionof	https://doi.org/10.1145/3610978.3641080	en_US
dc.rights	Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot	en_US
dc.type	Article	en_US
dc.identifier.citation	Lykov, Artem, Litvinov, Mikhail, Konenkov, Mikhail, Prochii, Rinat, Burtsev, Nikita et al. 2024. "CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot."
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-06-01T07:46:16Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-06-01T07:46:17Z
mit.license	PUBLISHER_POLICY
mit.metadata.status	Authority Work and Publication Information Needed	en_US
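
The abstract above points to the dataset used to fine-tune the robot-dog behavior generation model, hosted on Hugging Face. Below is a minimal sketch of how one might load and inspect it with the standard Hugging Face datasets library; only the dataset path comes from this record, while the available splits and column names are assumptions not documented here.

    # Minimal sketch (assumption): load the CognitiveDog fine-tuning dataset
    # via the Hugging Face `datasets` library. The dataset path is taken from
    # the record above; splits and columns are not documented in this record.
    from datasets import load_dataset

    ds = load_dataset("ArtemLykov/CognitiveDog_dataset")
    print(ds)                      # list available splits
    first_split = next(iter(ds))   # e.g. "train", if present
    print(ds[first_split][0])      # inspect the fields of one example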

