Characterizing and Improving Resilience of Accelerators to Memory Errors in Autonomous Robots
Author(s)
Shah, Deval; Xue, Zi Yu; Pattabiraman, Karthik; Aamodt, Tor
Publisher Policy
Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
Abstract
Motion planning is a computationally intensive and well-studied problem in autonomous robots. However, motion planning hardware accelerators (MPAs) must be soft-error resilient for deployment in safety-critical applications, and blanket application of traditional mitigation techniques is ill-suited due to cost, power, and performance overheads. We propose Collision Exposure Factor (CEF), a novel metric to assess the failure vulnerability of circuits processing spatial relationships, including motion planning. CEF is based on the insight that the safety violation probability increases with the surface area of the physical space exposed by a bit-flip. We evaluate CEF on four MPAs. We demonstrate empirically that CEF is correlated with safety violation probability, and that CEF-aware selective error mitigation provides 12.3×, 9.6×, and 4.2× lower dangerous Failures-In-Time rate on average for the same amount of protected memory compared to uniform, bit-position, and access-frequency-aware selection of critical data, respectively. Furthermore, we show how to employ CEF to enable fault characterization using 23,000× fewer fault injection (FI) experiments than exhaustive FI, and evaluate our FI approach on different robots and MPAs. We demonstrate that CEF-aware FI can provide insights on vulnerable bits in an MPA while taking the same amount of time as uniform statistical FI. Finally, we use CEF to formulate guidelines for designing soft-error resilient MPAs.
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
ACM Transactions on Cyber-Physical Systems
Publisher
ACM
Citation
Shah, Deval, Xue, Zi Yu, Pattabiraman, Karthik and Aamodt, Tor. "Characterizing and Improving Resilience of Accelerators to Memory Errors in Autonomous Robots." ACM Transactions on Cyber-Physical Systems.
Version: Final published version
ISSN
2378-962X