On-node performance optimization of a Monte-Carlo transport code for leadership architectures
Author(s): Salcedo-Pérez, José Luis.
Massachusetts Institute of Technology. Department of Nuclear Science and Engineering.
Advisor: Benoit Forget and Kord Smith.
The tally system in Monte Carlo neutron transport codes accounts for a significant fraction of the total execution time. This project studied the tally performance of a Monte Carlo neutron transport code, OpenMC, and implemented several optimizations to address the major bottlenecks. First, a comprehensive profiling analysis was carried out on modern Intel micro-architectures (Intel Xeon Phi and Intel Xeon Platinum 8180) to determine which hardware and settings configurations were optimal. The analysis also identified the specific modules and subroutines responsible for the performance drop. The first round of optimizations targeted the bottlenecks that the profiling analysis revealed. Both the nuclide and the reaction index searches were found to be inefficient. As a result, the two searches were improved with the implementation of direct address tables, which have an O(1) cost per lookup and a small memory footprint. Moreover, a linear array cache was introduced to store the following cross sections: (n, 2n), (n, 3n), (n, 4n), (n, p), (n, α), and (n, γ). These cross sections, together with (n, fission), are indispensable for solving the transition matrix of the Bateman equations during transmutation analysis. Pre-computing and storing them before tally time therefore eliminated redundant computations when a high-energy particle travels through multiple fuel regions without colliding. Overall, these optimizations resulted in speedups of 2.31x and 2.15x for the Xeon Platinum and Xeon Phi, respectively. Further, this project also presents an alternative method to compute reaction rate tallies. In general, tallying all seven of the aforementioned rates through a Monte Carlo simulation can be quite expensive for realistic light water reactors. Another approach is to collapse a very fine-group flux with pregenerated multigroup cross sections (constructed on the same energy grid).
While this approach does provide a 3x speedup in the performance of the OpenMC active cycles, it also introduces a considerable memory penalty. The issue is that thousands of groups are needed to accurately resolve the (n, γ) rates, most notably that of ²³⁸U. This study explores a hybrid approach in which (n, γ) and (n, fission) are handled with a standard reaction rate tally while the remaining reaction rates are computed through the flux tally route. This option provides more flexibility in reducing the total number of groups because the reactions other than (n, fission) and (n, γ) usually have smoother shapes. Performance was tested on five benchmarks with depleted fuel and increasing geometrical complexity. Results showed that the hybrid tally method provided speedups ranging from 1.30x to 1.75x in the active cycles across all benchmarks. Multiple error analyses were also carried out on the proposed hybrid method; the results show that even when going as low as 300 groups, the eigenvalue is still within 100 pcm of that of a traditional simulation.
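The direct address table mentioned in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis or from OpenMC. The function names, the idea of indexing by a global nuclide index, and the sample values are all assumptions made for illustration. The table trades one integer per nuclide in the problem for a constant-time lookup, in place of scanning the tally's nuclide list.

```python
def build_direct_address_table(tallied_indices, n_total):
    """Map each global nuclide index to its slot in the tally, or -1 if absent.

    Hypothetical helper: `tallied_indices` lists the global indices of the
    nuclides a tally scores; `n_total` is the number of nuclides in the problem.
    """
    table = [-1] * n_total
    for slot, global_index in enumerate(tallied_indices):
        table[global_index] = slot
    return table


def linear_search(tallied_indices, global_index):
    """Baseline O(n) search that the table lookup is meant to replace."""
    for slot, candidate in enumerate(tallied_indices):
        if candidate == global_index:
            return slot
    return -1


# Assumed toy problem: 10 nuclides in total, 3 of them scored by the tally.
N_TOTAL_NUCLIDES = 10
tallied = [7, 2, 5]
table = build_direct_address_table(tallied, N_TOTAL_NUCLIDES)

# A lookup is a single array access, independent of the tally size.
slot_for_nuclide_5 = table[5]
```

Both functions return the same slot for every nuclide; only the cost per lookup differs.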
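The hybrid tally idea can likewise be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the thesis implementation. It assumes a collapsed reaction rate is the group-wise sum of flux times cross section, and that rates for (n, γ) and (n, fission) arrive from a standard reaction rate tally. The function names, dictionary keys, and numerical values are hypothetical.

```python
def collapse_reaction_rate(group_flux, group_xs):
    """Return sum over groups of flux * cross section on a shared energy grid."""
    if len(group_flux) != len(group_xs):
        raise ValueError("flux and cross section must use the same group structure")
    return sum(phi * sigma for phi, sigma in zip(group_flux, group_xs))


def hybrid_reaction_rates(group_flux, multigroup_xs, direct_tallies):
    """Combine directly tallied rates with rates collapsed from the flux tally.

    Reactions present in `direct_tallies` keep their tallied value; every other
    reaction in `multigroup_xs` is computed through the flux-collapse route.
    """
    rates = dict(direct_tallies)
    for reaction, xs in multigroup_xs.items():
        if reaction not in rates:
            rates[reaction] = collapse_reaction_rate(group_flux, xs)
    return rates


# Assumed three-group toy data (real cases use hundreds of groups or more).
flux = [1.0, 2.0, 4.0]
xs_library = {
    "(n,2n)": [0.0, 0.0, 0.5],
    "(n,p)": [0.0, 0.25, 0.5],
}
direct = {"(n,gamma)": 3.0, "(n,fission)": 1.5}

rates = hybrid_reaction_rates(flux, xs_library, direct)
```

In this arrangement only the smoother threshold reactions depend on the group structure, which is why the abstract notes that the group count can be reduced.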
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Thesis: S.M., Massachusetts Institute of Technology, Department of Nuclear Science and Engineering, 2019. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 83-85).
Department: Massachusetts Institute of Technology. Department of Nuclear Science and Engineering
Massachusetts Institute of Technology
Nuclear Science and Engineering.