Machine Learning Dynamic Correlation in Chemical Kinetics

The kinetics of surface reactions are often described using a lattice model. Since it is expensive to propagate the configuration probabilities of the entire lattice, it is practical to consider the occupation probabilities of a typical site or a cluster of sites instead. This amounts to a moment closure approximation of the chemical master equation (CME). Unfortunately, simple closures, such as the mean-field (MF) and the pair approximation (PA), exhibit weaknesses in systems with significant long-range correlation. In this paper, we show that machine learning (ML) can be used to construct accurate moment closures in chemical kinetics, using the lattice Lotka-Volterra model (LLVM) as a model system. We trained feed-forward neural networks (FFNNs) to estimate the three-site occupation probabilities, using kinetic Monte Carlo (KMC) results at select values of rate constants and initial conditions. Given the same level of input as PA, the machine learning moment closure (MLMC) gave drastic improvements in the simulated dynamics and descriptions of the dynamical regimes throughout the parameter space. In this way, MLMC is a promising tool to interpolate KMC simulations or construct pre-trained closures that would enable researchers to extract useful insight at a fraction of the computational cost.


Introduction
Machine learning (ML) is an important tool in computational chemistry. On one hand, it has been used to accelerate the discovery of drugs and materials by deducing the electronic properties of molecules, 1-5 reactivities of organic compounds, 6-8 and secondary structures of proteins 9-11 from just the topologies of the molecules. On the other hand, it has been used to improve simulations by replacing approximate ingredients of the calculations, such as density functionals, 12,13 electron densities, 14,15 and force fields, 16-19 with ML models. Meanwhile, there have been few applications of ML in chemical kinetics. We believe that ML might provide a way to improve the solution of the chemical master equation (CME).
The term CME is used in two distinct senses. First, there are homogeneous systems, where the state of the system is defined by the numbers of molecules, 20-24 and the positions of the molecules are not explicit variables. Second, there are heterogeneous systems, where the state of the system is defined by the configuration of a lattice. 25-27 Hence, not only the numbers of molecules, but also their positions are explicit variables. In both cases, CME is a system of ordinary differential equations (ODEs) that propagates the probabilities of all possible states of the system, and the computational costs are exponential. The difference is that the homogeneous case scales as the number of molecules raised to the number of species, whereas the heterogeneous case scales as the number of species raised to the number of molecules. In this paper, we focus on the heterogeneous case. Nonetheless, due to the mathematical commonalities, results that hold in heterogeneous systems might have counterparts in homogeneous systems, and vice versa.
Since the sheer size of the state space often makes it impractical to solve the full CME, moment closure approximations have been considered an affordable approach to extract qualitative insight. The kinetic equations are written in terms of n-species subsystems (homogeneous case) or n-site clusters (heterogeneous case), and the higher-order moments, which describe the interactions of the n-th order moments with the rest of the system, are approximated using a moment closure. In homogeneous systems, the most popular closures are stochastic closures, such as the normal, 28,29 Poisson, 30 and log-normal closures. 31 Recently, Smadbeck and Kaznessis proposed an alternative scheme that computes the higher-order moments and their probability distribution by maximizing information entropy. 23 In heterogeneous systems, the most popular closures are the mean-field (MF) and the pair approximation (PA). 32-35 There have been attempts to go to higher-order moments, such as the triple approximation, 36 the approximate master equations, 37,38 and the cluster mean-field approximation. 26 In principle, moment closure approximations become more accurate as higher-order moments are used as the basis. However, an increase in the order is accompanied by a steep rise in the computational costs. In analogy to the empirical construction of density functionals, we believe that ML could provide a breakthrough in overcoming the complexity-accuracy trade-off of moment closure approximations, provided that it can be formulated in a way that is intuitive and accessible to the chemical community.
In this paper, we show that a simple ML architecture can be used to construct an accurate moment closure for chemical kinetics. Our choice of feed-forward neural networks (FFNNs) has both theoretical and practical relevance. On the theoretical side, FFNNs are the simplest neural networks. They are oblivious of time and have no memory of previous inputs and outputs. Indeed, they are functions that approximate the instantaneous values of the higher-order moments using the instantaneous values of the lower-order moments.
On the practical side, FFNNs are fast to train and evaluate. They might scale better to larger numbers of species and higher orders of moments. Moreover, FFNNs are already available in popular software libraries, such as TensorFlow 39 and scikit-learn. 40 Hence, it is an architecture with which many chemists are already familiar.
The remainder of this paper is organized as follows. We begin by introducing the lattice Lotka-Volterra model (LLVM) as the model system, and we explain the origin, strengths, and weaknesses of MF and PA. Then, we train FFNNs to estimate the three-site occupation probabilities, using the results of kinetic Monte Carlo (KMC) simulations at select values of rate constants and initial conditions. At the same level of input as PA, the machine learning moment closure (MLMC) can reduce the absolute and relative errors in the three-site probabilities by an order of magnitude. Furthermore, MLMC gives drastic improvements in the simulated dynamics and improved descriptions of the dynamical regimes throughout the parameter space of the model system.

Model System
The Lotka-Volterra model is a classic model system in which the activities of competing components lead to the emergence of oscillations. Originally, it was devised to describe autocatalytic chemical reactions, 41 but its application has been extended to biological systems, 42-44 where it has another intuitive interpretation. Our implementation of LLVM is given by

A + O → 2A,
B + A → 2B,
B → O,

where one often interprets O as the vacancy, A as the prey, and B as the predator. To our knowledge, there is no chemical reaction that follows this mechanism per se. However, it can be regarded as a coarse-grained approximation. In the SI, we discuss how the NO + CO reaction on the Pt(100)-(1 × 1) surface 45 might be coarse-grained onto LLVM.
Variations of LLVM have been a subject of interest in the physics community. 46-48 They are known to display a number of features that are insensitive to the implementation. 49 In particular, the collective activities of the prey and the predator give rise to spatiotemporal patterns, 50 along with density oscillations 51 that average out in the thermodynamic limit. 47,48 Some of the spatiotemporal patterns in our implementation are shown in Figure 1. It is interesting to see traveling wave patterns emerge. First, A grows into islands (Figures 1a and 1b). As the islands of A expand, B begins to grow (Figure 1c) and then proceeds to overrun the islands (Figures 1d and 1e). The cycle resets as B gives way to O (Figure 1f).
We emphasize that the individual molecules are immobile in our implementation, but their formation and consumption as a collective give rise to the apparent pursuit-evasion behavior.
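The pursuit-evasion behavior described above can be reproduced with a short stochastic simulation. The paper uses rejection-free KMC on a 500 × 500 lattice; the following is only a simplified, rejection-based sketch, and the rule set and rate-constant names (k1, k2, k3) are assumptions based on the standard lattice Lotka-Volterra steps (A + O → 2A, B + A → 2B, B → O), not the paper's exact setup.

```python
import numpy as np

# Simplified stochastic simulation of a lattice Lotka-Volterra model.
# Assumed rules (hypothetical rate constants k1, k2, k3):
#   A + O -> 2A  (prey reproduces into a vacant neighbor, rate k1)
#   B + A -> 2B  (predation, rate k2)
#   B -> O       (predator death, rate k3)
# Individual molecules never move; only reactions change the lattice.

O, A, B = 0, 1, 2

def simulate(L=64, k1=1.0, k2=1.0, k3=0.5, steps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    lattice = rng.choice([O, A, B], size=(L, L), p=[0.6, 0.2, 0.2])
    kmax = max(k1, k2, k3)
    for _ in range(steps):
        i, j = rng.integers(L, size=2)          # pick a random site
        s = lattice[i, j]
        if s == B and rng.random() < k3 / kmax:
            lattice[i, j] = O                   # predator death
            continue
        # pick a random nearest neighbor (periodic boundaries)
        di, dj = [(1, 0), (-1, 0), (0, 1), (0, -1)][rng.integers(4)]
        ni, nj = (i + di) % L, (j + dj) % L
        n = lattice[ni, nj]
        if s == A and n == O and rng.random() < k1 / kmax:
            lattice[ni, nj] = A                 # reproduction
        elif s == B and n == A and rng.random() < k2 / kmax:
            lattice[ni, nj] = B                 # predation
    return lattice
```

Plotting successive snapshots of `lattice` reproduces the island-growth and overrun cycle qualitatively, even though this sketch does not match the paper's rejection-free algorithm or timing.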
Self-organization of reactants is not uncommon in surface reactions.

Chemical Master Equation and Moment Closure Approximation
Consider a chemical reaction on a lattice. The molecules can adsorb on a vacant site, desorb, diffuse to a neighboring site, or react with another molecule. Let p_Ψ denote the probability of finding the lattice in configuration Ψ. The chemical master equation (CME) is given by

dp_Ψ/dt = Σ_Φ ( k_{Φ→Ψ} p_Φ − k_{Ψ→Φ} p_Ψ ),

where k_{Ψ→Φ} is a sum of the elementary rate constants, if any, that would take a lattice in configuration Ψ to configuration Φ. Assuming Markovian processes and an a priori knowledge of the rate constants, CME gives an exact treatment of both static disorder (site-to-site variations that are reflected in the rate constants, k_{Ψ→Φ}) 57,58 and dynamic correlation (segregation and self-organization of reactants that manifest in the explicit lattice configurations, Ψ). 52,53 Unfortunately, the dimensionality of CME scales as S^L, where S is the number of species and L is the number of sites on the lattice, making CME intractable in many systems of practical relevance.
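For a lattice small enough to enumerate, the CME can be propagated exactly, which is a useful yardstick for approximate closures. The toy sketch below builds the rate matrix for a 1D periodic lattice of 4 sites and 3 species (3^4 = 81 states); the reaction rules are the standard lattice Lotka-Volterra steps (A + O → 2A, B + A → 2B, B → O), which is an assumption about the implementation, and the rate constants are illustrative.

```python
import numpy as np
from itertools import product

# Exact propagation of the CME,
#   dp_Psi/dt = sum_Phi ( k_{Phi->Psi} p_Phi - k_{Psi->Phi} p_Psi ),
# for a toy 1D periodic lattice of 4 sites and 3 species (81 states).

O, A, B = 0, 1, 2
L, S = 4, 3
states = list(product(range(S), repeat=L))
index = {s: n for n, s in enumerate(states)}

def generator(k1=1.0, k2=1.0, k3=0.5):
    """Rate matrix K with K[n, m] = k_{m->n}; every column sums to zero."""
    K = np.zeros((len(states), len(states)))
    for s in states:
        m = index[s]
        for i in range(L):
            j = (i + 1) % L
            # bimolecular steps on the ordered nearest-neighbor pair (i, j)
            for old, new, k in [((A, O), (A, A), k1), ((O, A), (A, A), k1),
                                ((B, A), (B, B), k2), ((A, B), (B, B), k2)]:
                if (s[i], s[j]) == old:
                    t = list(s)
                    t[i], t[j] = new
                    K[index[tuple(t)], m] += k
                    K[m, m] -= k
            if s[i] == B:  # unimolecular predator death, once per site
                t = list(s)
                t[i] = O
                K[index[tuple(t)], m] += k3
                K[m, m] -= k3
    return K

def propagate(p0, K, dt=1e-3, steps=1000):
    # forward-Euler integration; probability is conserved because the
    # columns of K sum to zero
    p = p0.copy()
    for _ in range(steps):
        p = p + dt * (K @ p)
    return p
```

Even at 4 sites the state space has 81 configurations; the S^L scaling is what makes this route hopeless for a 500 × 500 lattice and motivates the moment hierarchy below.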
In principle, kinetic Monte Carlo (KMC) 20,25,59 can recover both the static disorder and the dynamic correlation. A stochastic simulation of the lattice amounts to sampling a trajectory through the configuration space. By averaging over multiple simulations, one can approach the full CME results. However, the computational cost of the simulations can be formidable, especially if rapid equilibrium or diffusion is involved. 25,59,60 Since the desired outcome in chemical kinetics is often an ensemble average, such as the surface coverage or the reaction rate, we are motivated to rewrite the kinetic equations in terms of the occupation probabilities of n-site clusters (n-site probabilities),

[X] = ⟨δ_{ψ_i X}⟩, [XY] = ⟨δ_{ψ_i X} δ_{ψ_j Y}⟩, [XYZ] = ⟨δ_{ψ_i X} δ_{ψ_j Y} δ_{ψ_k Z}⟩,

where i, j, and k are a string of typical sites on the lattice; and δ_{ψ_i X} = 1 if the occupant of site i is ψ_i = X and 0 otherwise. These n-site probabilities are special cases of moments.
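In practice, the n-site probabilities are estimated from KMC snapshots by averaging the indicator functions over sites. A minimal sketch for the one- and two-site probabilities on a periodic square lattice follows; the integer species encoding and the helper names are ours, not the paper's.

```python
import numpy as np

# Estimating n-site probabilities (moments) from a lattice snapshot.

def one_site_probs(lattice, n_species=3):
    # [X] = < delta_{psi_i, X} >, averaged over all sites i
    return np.bincount(lattice.ravel(), minlength=n_species) / lattice.size

def two_site_probs(lattice, n_species=3):
    # [XY] = < delta_{psi_i, X} delta_{psi_j, Y} >, averaged over all
    # nearest-neighbor pairs (i, j) with periodic boundaries
    p = np.zeros((n_species, n_species))
    for axis in (0, 1):                       # horizontal and vertical pairs
        neigh = np.roll(lattice, -1, axis=axis)
        for x in range(n_species):
            for y in range(n_species):
                p[x, y] += np.mean((lattice == x) & (neigh == y))
    p /= 2.0                                  # average the two directions
    return (p + p.T) / 2.0                    # [XY] = [YX] for unordered pairs
```

Both estimators return distributions that sum to one, and averaging them over many KMC snapshots gives the ensemble moments that enter the kinetic equations.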
The kinetic equations of one-site clusters are given by

d[X]/dt = N Σ_{Y,U,V} ( k_{UV→XY} [UV] − k_{XY→UV} [XY] ),

where k_{XY→UV} is the rate constant of the elementary step,

X + Y → U + V,

if it exists; and N is the number of nearest neighbors. Observe that the equations are not closed. Unless the elementary steps consist of unimolecular reactions only, the equations of n-site clusters are going to depend on information about (n+1)-site clusters. In order to create a closed system of equations, we need a prescription to approximate the higher-order moments using just the information about the lower-order moments; hence, a moment closure approximation. The simplest and the most popular closure is the mean-field approximation (MF), which neglects any correlation that might exist between the sites: [XY] ≈ [X][Y].
The next simplest closure, which incorporates some site-to-site correlation, is the pair approximation (PA). Consider the kinetic equations of two-site clusters. With the two-site probabilities, [XY], as the variables, the kinetic equations now depend on the three-site probabilities, [XYZ], which must be approximated in terms of the two-site probabilities. PA estimates the three-site probabilities using the definition of conditional probability,

[XYZ] ≈ [XY][YZ]/[Y].

Due to its simple rationale, it has been invented many times by independent workers in chemistry, 32 population biology, 33,34 and epidemiology. 35 In principle, the closed system of equations would become more accurate as higher-order moments are used as the basis of the moment closure approximation. Formally, one can interpret MF and PA as special cases of product approximations. 61

Pair Approximation and Machine Learning Moment Closure

Because PA only has information about the short-range (two-site) correlation, it is not able to anticipate the formation of islands or traveling waves that span a large number of sites. That is, not without information about the mechanism and the nature of the correlation built into the approximation. The goal of MLMC is a smarter closure that still takes low-order moments as the input, but uses mechanism-specific information to give a more accurate output. But what is the lowest order at which the moments can be expected to be predictive of the correlation? For the case of LLVM, Figure 3 gives some insight. Since the system is undergoing non-equilibrium dynamics, it is not possible in principle to write the n-site probabilities as exact functions of the k-site probabilities (k < n). Indeed, Figure 3a shows that it is not plausible to write the two-site probabilities as functions of the one-site probabilities (see also Figures S1-S3). Therefore, we conjecture that the three-site probabilities can be written to a good approximation as functions of the two-site probabilities. A viable ML model of the three-site probabilities could be constructed using no more than the two-site probabilities as the input.
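For reference, the two classical closures fit in a few lines; the function names and the eps floor are our additions.

```python
import numpy as np

# MF and PA estimates of a three-site probability from lower-order moments.
# p1[x] holds the one-site probabilities [X]; p2[x, y] the two-site [XY].

def mean_field(p1, x, y, z):
    # MF: [XYZ] ~ [X][Y][Z], neglecting all site-to-site correlation
    return p1[x] * p1[y] * p1[z]

def pair_approximation(p1, p2, x, y, z, eps=1e-12):
    # PA: [XYZ] ~ [XY][YZ]/[Y], i.e., P(Z | Y, X) ~ P(Z | Y)
    return p2[x, y] * p2[y, z] / max(p1[y], eps)
```

On an uncorrelated lattice, where [XY] = [X][Y], the two closures coincide; PA improves on MF exactly to the extent that the pair probabilities carry correlation.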

Results and Discussion
As detailed in Methods, we trained FFNNs to predict the three-site probabilities in terms of the two-site probabilities. In one triangular region of the parameter space, however, MLMC produced fictitious oscillations (Figure S6f). Because the fictitious oscillations are slow and sustained, the triangular region is almost invisible in the frequency plot (Figure S7) and highlighted in the damping ratio plot (Figure S8). We expect that more training data in that part of the parameter space would have mitigated the erroneous dynamics.
We emphasize that MLMC has not been trained to predict the dynamics. The FFNNs were optimized to predict the instantaneous three-site probabilities using the instantaneous two-site probabilities. On one hand, it is not surprising that accurate estimates of the three-site probabilities could improve the dynamics. On the other hand, it is not obvious that using the best estimates of the three-site probabilities at each instant would yield the best estimate of the overall dynamics. MLMC has conceptual similarities to proposals of ML differential equations, where an ML model is trained on time-dependent data to extract the right-hand side of the underlying equation. 67-69 The kinetic equations in terms of moments can be regarded as a special case, where the linear terms are known and the nonlinear terms have well-known properties. In the future, it would be interesting to investigate training the FFNNs to output values that yield the best estimate of the dynamics, rather than the best estimate of instantaneous values.
A weakness of MLMC is that it needs KMC data to be trained in the first place. However, we have shown that MLMC can be applied to combinations of rate constants and initial conditions to which it has not been exposed. The internal transferability suggests two practical applications of MLMC.
In the short term, MLMC could be used to interpolate KMC simulations. Often, one needs to run numerous simulations in a small region of the parameter space in order to obtain a phase diagram, locate a critical point, or fit rate constants. Considerable savings in computational costs can result if a select subset of the simulations were performed using KMC and the gaps were filled using MLMC. Indeed, the 500 × 500 lattice simulations to create Figure

Conclusion
We have explored the application of ML to derive moment closures for chemical kinetics. As demonstrated by the case of LLVM, PA exhibits weaknesses in systems with strong long-range correlation. In order to capture the long-range correlation at the same level of input as PA, we trained FFNNs to predict the three-site probabilities using the two-site probabilities.
MLMC reduced the absolute and relative errors in the three-site probabilities by an order of magnitude. Furthermore, MLMC gave drastic improvements in the simulated dynamics.
The amplitude and period in the oscillatory regime could be predicted to good accuracies, and the dynamical transitions to the non-oscillatory regime could be located to a reasonable precision. Based on these outcomes, we propose that MLMC could be used to interpolate KMC simulations or construct pre-trained closures to avoid KMC in certain systems.
In the future, we want to demonstrate MLMC on realistic models of specific chemical systems.

Methods
The stochastic simulations were performed using rejection-free KMC. 25 We chose seven combinations of rate constants that gave distinct dynamics (Table S1). We found it useful to apply a log transform, y′ = log y, to both the two-site probabilities, [XY], and the three-site probabilities, [XYZ], so that the ML models minimize the relative error, as opposed to the absolute error. Constraining the relative error ensures that the reaction rates, and hence the dynamics, remain accurate when one or more of the probabilities are small. In applying the log transform, we replaced the zero-valued two-site probabilities with small values δ = 10^−p /(2 · 500^2), where p = 1, 2, 3, and the zero-valued three-site probabilities with PA estimates. Even though the replacement somewhat reduces the accuracy of the ML models in themselves, we found that it improves the robustness of the chemical kinetic simulations as one or more of the probabilities approach zero. The treatment of the zeros increased the number of data points to 1,232,155. Finally, we standardized the input and output data to zero mean (µ = 0) and unit standard deviation (σ = 1).
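The preprocessing steps above can be sketched as follows. The array names and shapes are assumptions (six two-site inputs per sample, one three-site target), with `pa_estimate` holding the PA values used to replace zero-valued three-site probabilities, as described in the text.

```python
import numpy as np

# Sketch of the preprocessing: zero replacement, log transform, and
# standardization of the two-site inputs and three-site targets.

def preprocess(two_site, three_site, pa_estimate, p=2, n_sites=500**2):
    delta = 10.0**(-p) / (2 * n_sites)     # floor for zero two-site probs
    x = np.where(two_site > 0, two_site, delta)
    y = np.where(three_site > 0, three_site, pa_estimate)
    # log transform so the model minimizes relative rather than absolute error
    x, y = np.log(x), np.log(y)
    # standardize to zero mean and unit standard deviation
    x_mu, x_sd = x.mean(axis=0), x.std(axis=0)
    y_mu, y_sd = y.mean(), y.std()
    return (x - x_mu) / x_sd, (y - y_mu) / y_sd, (x_mu, x_sd, y_mu, y_sd)
```

The returned statistics are kept so that model outputs can be mapped back from standardized log space to probabilities during a kinetic simulation.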
The ML models were constructed and trained using TensorFlow 1.13. 39 We trained a separate FFNN for each of the six three-site probabilities that appear in the kinetic equations.
The FFNNs shared a simple architecture: 6 → 100 → 100 → 75 → 50 → 25 → 12 → 1 units per layer, fully connected, with a rectified linear unit (ReLU) activation on every layer except the linear input and output. As a means of regularization, the data were shuffled and split into six sets, one of which was set aside as the test set, and the other five were utilized in a 5-fold cross-validation (CV). At each cycle of the CV, an FFNN was trained by stochastic gradient descent (SGD) on the mean squared error (in the log space) via the Adam optimizer. 70 The SGD used a batch size of 10,000 and stopped at 150 iterations. At the end of the cycles, the FFNN that yielded the smallest loss on the test set was chosen as the final model. Besides the CV, no further regularization methods were employed.
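For concreteness, the forward pass of this architecture can be written directly in numpy. The initialization scheme and the random weights below are illustrative stand-ins, not the trained models; the actual networks are trained with Adam in TensorFlow.

```python
import numpy as np

# Forward pass of the stated architecture: 6 -> 100 -> 100 -> 75 -> 50 ->
# 25 -> 12 -> 1 fully connected units, ReLU on hidden layers, linear output.

LAYERS = [6, 100, 100, 75, 50, 25, 12, 1]

def init_params(rng):
    # He initialization for ReLU layers (our choice); biases start at zero
    return [(rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)),
             np.zeros(n_out))
            for n_in, n_out in zip(LAYERS[:-1], LAYERS[1:])]

def forward(params, x):
    # x: (batch, 6) standardized log two-site probabilities
    h = x
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < len(params) - 1:       # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h                          # (batch, 1) standardized log [XYZ]
```

One such network is trained per three-site probability, so evaluating the closure at each time step amounts to six cheap forward passes.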

Acknowledgement
This work has been supported by the U.S. Department of Energy (DE-FG02-07ER46474).

Supporting Information Available
Coarse-graining of NO + CO / Pt (100)