Last-Meter Delivery: Solving the Unattended Delivery Challenge from Streets to Doorsteps

Xiao, Wen-Xin



Advisor

Larson, Kent

Terms of use

In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/


Abstract

The rise of e-commerce has led to a surge in package deliveries and a proliferation of unattended delivery methods, which must address the "last-meter" problem: the challenge of delivering packages from the roadside or sidewalk to the customer's front door. This thesis proposes a methodology that uses Large Language Models (LLMs) and Vision Language Models (VLMs) to enable delivery robots to identify the final delivery target and navigate the complex terrain from the curb to the front door. The proposed solution aims to enhance the autonomy and safety of last-mile delivery systems while improving the customer experience. The thesis presents a comprehensive overview of the last-meter delivery concept, which aims to bridge the gap between the roadside or sidewalk and the customer's front door. It begins by introducing the significance of last-meter delivery in the growing e-commerce industry and the challenges posed by unattended deliveries. It then reviews the existing literature on autonomous and unmanned delivery systems, multimodal delivery approaches, and the application of LLMs and VLMs in robotics, identifying the advances and the gaps in the field that the proposed methodology aims to address. The thesis primarily focuses on leveraging LLMs, the Segment Anything Model (SAM), and the open-source Florence-2 vision foundation model to map customers' delivery instructions to the final delivery target in a last-meter scene. It outlines the methodology for data preparation and for object detection and labeling, as well as the integration of LLMs to interpret customer instructions and determine the coordinates of the delivery target. It also describes the experimental design and methods used to validate the effectiveness of the proposed system.
This includes the use of a last-meter dataset and the evaluation of last-meter scene identification and target coordinate identification. The thesis concludes by summarizing the key findings and contributions, discussing the broader implications of the proposed methodology, and suggesting directions for future work, such as enhancing system robustness and scalability.

KEYWORDS: Last-Mile Delivery, Last-Meter Delivery, Large Language Models (LLM), Vision Language Models (VLM), Robotics, Segment Anything Model (SAM), Open-Vocabulary Object Detection (OVD).
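The abstract describes the instruction-to-target pipeline only at a high level. The following Python sketch illustrates, under stated assumptions, how its last steps could fit together; it is not the thesis's implementation and does not call an LLM, Florence-2, or SAM. It assumes detections arrive as label/score/binary-mask records (the `Detection` schema is hypothetical), replaces the LLM step with a fixed phrase lookup (`PHRASE_TO_LABEL` and `instruction_to_label` are hypothetical stand-ins), takes the mask centroid as the target coordinate, and scores target coordinate identification as the fraction of scenes whose predicted point lands inside a ground-truth target mask. The centroid rule and the hit-rate metric are both illustrative assumptions, not results or methods reported in the thesis.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Binary mask: rows of 0/1 values, indexed mask[y][x] (hypothetical format).
Mask = List[List[int]]
Point = Tuple[float, float]


@dataclass
class Detection:
    """One open-vocabulary detection paired with a segmentation mask (hypothetical schema)."""
    label: str
    score: float
    mask: Mask


# Hypothetical stand-in for the LLM step: a fixed phrase-to-label table.
# A real system would ask an LLM to extract the target from the instruction.
PHRASE_TO_LABEL = {
    "front door": "front door",
    "side door": "side door",
    "porch": "porch",
    "mailbox": "mailbox",
}


def instruction_to_label(instruction: str) -> Optional[str]:
    """Return the label of the first known phrase found in the instruction."""
    text = instruction.lower()
    for phrase, label in PHRASE_TO_LABEL.items():
        if phrase in text:
            return label
    return None


def select_target(detections: Sequence[Detection], label: Optional[str]) -> Optional[Detection]:
    """Pick the highest-scoring detection whose label matches the requested target."""
    candidates = [d for d in detections if d.label == label]
    return max(candidates, key=lambda d: d.score) if candidates else None


def mask_centroid(mask: Mask) -> Optional[Point]:
    """Mean (x, y) pixel position of the mask's nonzero pixels; None if the mask is empty."""
    xs = [x for row in mask for x, v in enumerate(row) if v]
    ys = [y for y, row in enumerate(mask) for v in row if v]
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


def point_in_mask(point: Point, mask: Mask) -> bool:
    """True if the point, rounded to the nearest pixel, lies on a nonzero mask pixel."""
    x, y = round(point[0]), round(point[1])
    return 0 <= y < len(mask) and 0 <= x < len(mask[y]) and bool(mask[y][x])


def hit_rate(points: Sequence[Optional[Point]], gt_masks: Sequence[Mask]) -> float:
    """Fraction of scenes whose predicted point falls inside the ground-truth target mask."""
    if not gt_masks:
        return 0.0
    hits = sum(1 for p, gt in zip(points, gt_masks) if p is not None and point_in_mask(p, gt))
    return hits / len(gt_masks)


if __name__ == "__main__":
    side_door = [[0, 1, 1, 0],
                 [0, 1, 1, 0],
                 [0, 1, 1, 0]]
    mailbox = [[1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]]
    detections = [Detection("side door", 0.82, side_door), Detection("mailbox", 0.91, mailbox)]

    label = instruction_to_label("Please leave it at the side door, thanks!")
    target = select_target(detections, label)
    point = mask_centroid(target.mask) if target else None
    print(label, point, hit_rate([point], [side_door]))  # side door (1.5, 1.0) 1.0
```

Using the mask centroid keeps the sketch simple, but for a non-convex mask (for example, an L-shaped porch) the centroid can fall outside the object itself; choosing the mask pixel nearest the centroid would be a safer rule in that case.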

Date issued

2024-09

URI

https://hdl.handle.net/1721.1/157723

Department

Program in Media Arts and Sciences (Massachusetts Institute of Technology)

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses