dc.contributor.advisor | Daniel Sanchez. | en_US
dc.contributor.author | Zhang, Guowei, Ph. D. Massachusetts Institute of Technology. | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. | en_US
dc.date.accessioned | 2021-05-24T20:23:31Z
dc.date.available | 2021-05-24T20:23:31Z
dc.date.copyright | 2021 | en_US
dc.date.issued | 2021 | en_US
dc.identifier.uri | https://hdl.handle.net/1721.1/130774
dc.description | Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, February, 2021 | en_US
dc.description | Cataloged from the official PDF of thesis. | en_US
dc.description | Includes bibliographical references (pages 109-128). | en_US
dc.description.abstract | Computer systems are increasingly bottlenecked by data movement, and rely on sophisticated memory hierarchies to address this issue. However, conventional memory systems suffer from poor performance on many irregular access patterns. This is because memory systems use an inexpressive interface that does not convey sufficient program semantics: they organize data in fixed-sized chunks and access data with only reads and writes. As a result, memory systems incur significant performance loss on several common patterns. In this thesis, we identify three such patterns: accesses to small data fragments suffer poor locality; concurrent updates introduce excessive traffic and serialization; and dependent reads incur long latencies that are on the critical path. To tackle these issues, this thesis proposes techniques that extend the semantics of the memory system. We apply this insight to address each of the three issues and propose solutions with different degrees of generality. | en_US
dc.description.abstract | COUP and COMMTM provide general architectural support by exploiting commutative updates to reduce communication and synchronization. COUP supports strict single-instruction commutativity by extending the cache coherence protocol, while COMMTM supports multi-instruction and semantic commutativity by leveraging hardware transactional memory. Whereas COUP and COMMTM are general, HTA and GAMMA target a specific data structure and a specific application, respectively. HTA addresses the inefficiencies of small fragments in the context of hash tables. It exploits the associativity in hash tables and leverages caches to reduce runtime overheads and to improve spatial locality. GAMMA is a sparse matrix-matrix multiplication accelerator. Its novel storage idiom, FIBERCACHE, combines caching and decoupled execution to ensure low latency for dependent reads with irregular reuse. This enables GAMMA to adopt an efficient dataflow, Gustavson's algorithm, to minimize off-chip traffic. | en_US
dc.description.abstract | In return, these techniques improve the performance and reduce the data movement of challenging applications significantly. | en_US
dc.description.statementofresponsibility | by Guowei Zhang. | en_US
dc.format.extent | 128 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | MIT theses may be protected by copyright. Please reuse MIT thesis content according to the MIT Libraries Permissions Policy, which is available through the URL provided. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Electrical Engineering and Computer Science. | en_US
dc.title | Extending memory system semantics to accelerate irregular applications | en_US
dc.type | Thesis | en_US
dc.description.degree | Ph. D. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US
dc.identifier.oclc | 1252062370 | en_US
dc.description.collection | Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science | en_US
dspace.imported | 2021-05-24T20:23:31Z | en_US
mit.thesis.degree | Doctoral | en_US
mit.thesis.department | EECS | en_US