| dc.contributor.author | Zhang, Qiao | |
| dc.contributor.author | Alomairy, Rabab | |
| dc.contributor.author | Wang, Dali | |
| dc.contributor.author | Gu, Zhuowei | |
| dc.contributor.author | Cao, Qinglei | |
| dc.date.accessioned | 2025-12-22T20:48:50Z | |
| dc.date.available | 2025-12-22T20:48:50Z | |
| dc.date.issued | 2025-12-20 | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/164427 | |
| dc.description.abstract | General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms to leverage mixed-precision computations, achieving improved performance and energy efficiency. This research presents an adaptive mixed-precision GEMM framework that supports various precision formats at fine-grained tile and block levels, offering a reliable foundation for trustworthy mixed-precision computations. Furthermore, we leverage the PaRSEC runtime system to effectively balance workloads across diverse architectures. The framework exhibits strong scalability across both homogeneous platforms (Intel CPU-based systems and the ARM CPU-based Fugaku supercomputer) and heterogeneous systems (NVIDIA V100, A100, and H100 GPU-based platforms, as well as the AMD GPU-based Frontier supercomputer). This work aims to improve computational efficiency and accuracy by bridging algorithmic innovations with hardware capabilities, fostering transformative advancements across a wide range of applications. | en_US |
| dc.publisher | Springer Nature Singapore | en_US |
| dc.relation.isversionof | https://doi.org/10.1007/s42979-025-04575-0 | en_US |
| dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
| dc.source | Springer Nature Singapore | en_US |
| dc.title | High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures | en_US |
| dc.type | Article | en_US |
| dc.identifier.citation | Zhang, Q., Alomairy, R., Wang, D. et al. High-Performance Mixed-Precision Matrix Multiplication via Tile-Centric Design on Modern Architectures. SN COMPUT. SCI. 7, 24 (2026). | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US |
| dc.relation.journal | SN Computer Science | en_US |
| dc.eprint.version | Author's final manuscript | en_US |
| dc.type.uri | http://purl.org/eprint/type/JournalArticle | en_US |
| eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US |
| dc.date.updated | 2025-12-22T17:03:54Z | |
| dc.language.rfc3066 | en | |
| dc.rights.holder | The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. | |
| dspace.embargo.terms | Y | |
| dspace.date.submission | 2025-12-22T17:03:54Z | |
| mit.journal.volume | 7 | en_US |
| mit.license | PUBLISHER_POLICY | |
| mit.metadata.status | Authority Work and Publication Information Needed | en_US |