dc.contributor.author: Zhang, Qiao
dc.contributor.author: Alomairy, Rabab
dc.contributor.author: Wang, Dali
dc.contributor.author: Gu, Zhuowei
dc.contributor.author: Cao, Qinglei
dc.date.accessioned: 2025-12-22T20:48:50Z
dc.date.available: 2025-12-22T20:48:50Z
dc.date.issued: 2025-12-20
dc.identifier.uri: https://hdl.handle.net/1721.1/164427
dc.description.abstract: General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computations, achieving improved performance and energy efficiency. This research presents an adaptive mixed-precision GEMM framework that enables support for various precision formats at fine-grained tile and block levels, offering a reliable foundation for trustworthy mixed-precision computations. Furthermore, we leverage the PaRSEC runtime system to effectively balance workloads across diverse architectures. The performance exhibits strong scalability across both homogeneous platforms (Intel CPU-based systems and the ARM CPU-based Fugaku supercomputer) and heterogeneous systems (Nvidia V100, A100, and H100 GPU-based platforms, as well as the AMD GPU-based Frontier supercomputer). This work aims to improve computational efficiency and accuracy by bridging algorithmic innovations with hardware capabilities, fostering transformative advancements across a wide range of applications. [en_US]
dc.publisher: Springer Nature Singapore [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s42979-025-04575-0 [en_US]
dc.rights: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. [en_US]
dc.source: Springer Nature Singapore [en_US]
dc.title: High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Zhang, Q., Alomairy, R., Wang, D. et al. High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures. SN COMPUT. SCI. 7, 24 (2026). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory [en_US]
dc.relation.journal: SN Computer Science [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-12-22T17:03:54Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
dspace.embargo.terms: Y
dspace.date.submission: 2025-12-22T17:03:54Z
mit.journal.volume: 7 [en_US]
mit.license: PUBLISHER_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

