Article A Dual-Branch Coupled Fourier Neural Operator for High-Resolution Multi-Phase Flow Modeling in Porous Media Hassan Al Hashim 1,* , Odai Elyas 2 and John Williams 2 1 Department of Computational Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA 2 Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA * Correspondence: hwhashim@mit.edu Abstract This paper investigates a physics-informed surrogate modeling framework for multi-phase flow in porous media based on the Fourier Neural Operator. Traditional numerical sim- ulators, though accurate, suffer from severe computational bottlenecks due to fine-grid discretizations and the iterative solution of highly nonlinear partial differential equations. By parameterizing the kernel integral directly in Fourier space, the operator provides a discretization-invariant mapping between function spaces, enabling efficient spectral con- volutions. We introduce a Dual-Branch Adaptive Fourier Neural Operator with a shared Fourier encoder and two decoders: a saturation branch that uses an inverse Fourier trans- form followed by a multilayer perceptron and a pressure branch that uses a convolutional decoder. Temporal information is injected via Time2Vec embeddings and a causal temporal transformer, conditioning each forward pass on step index and time step to maintain consistent dynamics across horizons. Physics-informed losses couple data fidelity with residuals from mass conservation and Darcy pressure, enforcing the governing constraints in Fourier space; truncated spectral kernels promote generalization across meshes without retraining. On SPE10-style heterogeneities, the model shifts the infinity-norm error mass into the 10−2 to 10−1 band during early transients and sustains lower errors during pseudo- steady state. In zero-shot three-dimensional coarse-to-fine upscaling from 30× 110× 5 to Academic Editor: Vittorio Di Federico 60× 220× 5, it attains R2 = 0.90, RMSE = 4.4× 10−2, and MAE = 3.2× 10−2, with more Received: 21 October 2025 than 90% of voxels below five percent absolute error across five unseen layers, while the Revised: 16 November 2025 end-to-end pipeline runs about three times faster than a full-order fine-grid solve and Accepted: 19 November 2025 preserves water-flood fronts and channel connectivity. Benchmarking against established Published: 23 November 2025 baselines indicates a scalable, high-fidelity alternative for high-resolution multi-phase flow Citation: Al Hashim, H.; Elyas, O.; simulation in porous media. Williams, J. A Dual-Branch Coupled Fourier Neural Operator for Keywords: Fourier Neural Operator; physics-informed machine learning; surrogate High-Resolution Multi-Phase Flow modeling; groundwater flow; heterogeneous porous media; Darcy flow; aquifer flow Modeling in Porous Media. Water 2025, and transport simulation; aquifer; hydraulic conductivity 17, 3351. https://doi.org/10.3390/ w17233351 Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. 1. Introduction This article is an open access article distributed under the terms and Multi-phase flow in porous media underpins critical applications in subsurface engi- conditions of the Creative Commons neering—from groundwater remediation to carbon sequestration. The governing equations, Attribution (CC BY) license nonlinear partial differential equations (PDEs) that capture the intricate interactions among (https://creativecommons.org/ fluid phases and the heterogeneous porous structure, are notoriously difficult to solve licenses/by/4.0/). Water 2025, 17, 3351 https://doi.org/10.3390/w17233351 Water 2025, 17, 3351 2 of 20 numerically. Traditional solvers, whether implemented in commercial software or open- source tools, require extremely fine discretizations to capture the essential dynamics, which in turn leads to prohibitive computational costs and long turnaround times, especially when conducting multi-parameter studies or producing high-resolution predictions. While deep learning has revolutionized fields such as imaging, speech recognition, and natural language processing [1], the application of machine learning techniques that explicitly incorporate physical laws is relatively new. Early contributions such as the physics-informed neural networks (PINNs) introduced [2] and later refined in works like those of [3,4] have demonstrated that embedding governing equations into the training process can guide the solution toward physically consistent behavior. Yet, these methods have often yielded modest gains when compared with the dramatic successes observed in other areas of deep learning. A particularly transformative approach comes from the emerging field of operator learning, which seeks to approximate mappings between infinite-dimensional function spaces. In this context, the Fourier Neural Operator (FNO) introduced [5] and extended [6] represents a paradigm shift. By reformulating the kernel integral in Fourier space and truncating the high-frequency components, FNO achieves spectral convolution that is both computationally efficient and inherently suited to enforcing physical constraints. Crucially, its invariance with respect to discretization allows the learned operator to generalize seamlessly across arbitrary grids without requiring re-training—a property of immense value for high-resolution simulations. In porous media flow, where multi-scale heterogeneities and strong nonlinearities prevail, integrating physics-informed loss terms into the FNO framework (as demonstrated in physics-informed neural operators [7,8]) has proven essential. This integration ensures that the surrogate model rigorously adheres to the underlying PDEs, thereby reducing the computational effort needed to achieve accurate predictions. For example, implementations on platforms such as NVIDIA Modulus have shown that FNO-based approaches can reduce simulation times by up to three orders of magnitude while preserving accuracy across different discretizations [9]. More recently, several studies have adapted FNO- style architectures specifically to multiphase flow in porous media and related subsurface applications, including enhanced multiphase FNO variants, residual-corrected neural operators for nonlinear PDE inverse problems, and operator-based surrogates for field- scale reservoir simulations [10–13]. However, existing applications of Fourier Neural Operators to porous-media flow still face important limitations. Most studies focus on single-phase or weakly coupled settings and treat only one state variable (typically pressure or saturation), thereby neglecting the strong two-way coupling that characterizes multiphase displacement. Purely data-driven FNOs also tend to oversmooth sharp saturation fronts and underestimate channelized flow in highly heterogeneous formations such as SPE10, particularly during early transients. In many implementations, time is handled as a simple scalar feature or as an additional spatial coordinate, which can degrade accuracy over long horizons and lead to drift between transient and pseudo-steady regimes. Moreover, existing FNO surrogates rarely enforce PDE residuals during training, so saturation and pressure predictions may violate mass balance and Darcy’s law, reducing robustness when extrapolating across injection scenarios. This work presents an advanced framework for applying FNO to the physics-informed simulation of multi-phase flow in porous media. Building on a dual-branch formulation (DB-AFNO), the physics-based loss functions are integrated with different temporal encod- ings (e.g., Time2Vec, TFT/TST) to honor the governing PDE constraints in both saturation and pressure fields. Our comprehensive computational analysis demonstrates that DB- AFNO outperforms conventional, purely data-driven FNO approaches across diverse tracer Water 2025, 17, 3351 3 of 20 transport configurations, reducing errors during early transient phases and maintaining ac- curacy in later pseudo-steady states. In particular, DB-AFNO with Time2Vec and TFT/TST concentrates the majority of saturation predictions in the lower error ranges, preserving critical flow structures that conventional FNO often oversmooth. Blind testing on top and bottom layers of the 10th SPE Comparative Solution Project [14] highlights the framework’s potential for scalable, high-resolution simulations without incurring the computational costs of traditional full-order solvers. The uniqueness of this approach lies in explicitly coupling the two PDE variables using a multitask, time-aware architecture. While conventional FNOs typically address a single field, our model retains the core spectral convolution operations and extends them via key modifications. First, a shared spectral encoder predicts both pressure and saturation, extracting spatial features that naturally encode interdependencies. Second, two distinct time encoders—one for saturation and one for pressure—transform the scalar time t into high-dimensional representations that capture differing temporal sensitivities while preserving mutual spatial information. Finally, after decoding via separate branches, the outputs are fused through an additional convolutional layer that enforces consistency between the predictions through cross-information exchange. 2. Methods This section presents a technical description of the problem using the dynamical state- space framework. The focus of this formulation is on the incompressible and immiscible displacement of two phases within a porous media. More recently, the same principles have been extended to carbon sequestration efforts, specifically in the displacement of brine by CO2 [15]. In this work, we adopt a Darcy-scale, incompressible and immiscible two-phase formulation on a fixed Cartesian grid. The two phases are denoted wetting and non-wetting, with saturations Sw and Snw and corresponding pressures Pw and Pnw. The physical domain Ω is discretized into N control volumes on a structured grid, and the primary unknowns collected in the state vector are the cell-wise saturation and pressure fields. The fluid system is isothermal, with constant viscosities and densities, and rock properties (porosity and absolute permeability) are time-invariant and defined per grid cell. Flow is governed by Darcy’s law under a single-continuum representation, without explicit geomechanical deformation, fracture propagation, or matrix–fracture transfer terms, and without capillary- pressure hysteresis. Gravity is included through a constant acceleration vector, while molecular diffusion and dispersion are neglected so that transport is driven by advection and sources/sinks only. Under these assumptions, the saturation constraint reduces to Sw + Snw = 1, and a monotone capillary-pressure law Pc(Sw) together with Corey-type relative permeability functions krψ(Sψ) closes the system. The governing equations fully describe the displacement process considered here. Gas-phase transport, solution-gas liberation, and evaporation/condensation processes are neglected; consequently, no gas saturation or gas pressure variables are introduced into the state vector, and all volumetric source terms represent injection and production of the two liquid phases only. A concise summary of the formulation is provided in Tables 1–3. Table 1. State-Space Variables for Multiphase Flow in Porous Media. Component Description/Equation State vector x x = [ Sw1, Snw1, . . . , SwN , SnwN ] Parameter vector p p = [ ϕ, K, µw, µnw, ρw, ρnw, Pcap, M ] Excitation vector u(t) u(t) = [ qw1(t), qnw1(t), . . . , qwN(t), qnwN(t) ] Output vector y y = [ Sw1, Snw1, P1, . . . , SwN , SnwN , PN ] Water 2025, 17, 3351 4 of 20 Table 2. Governing Equations for Multiphase Flow in Porous Media. Component Description/Equation Mass conservation ∂(ϕSψ) +∇· (vψ) = qψ, ψ ∈ {w, nw} ∂t Kk ( ) Darcy’s law rψvψ = − ∇Pψ − ρψg , ψ ∈ {w, nw} µψ Saturation constraint Sw + Snw = 1 Capillary pressure Pcψ = Pψ − Pw, ψ ∈ {nw} ∗ 2 ∗ 2 Relative permeabilities (s ) (1− s ) ∗ s− swcλw(s) = , λnw(s) = , s = µw µnw 1− snwr − swc Effective mobility krψ = f (Sψ), ψ ∈ {w, nw} Table 3. Discrete Saturation Dynamics and Coupling Terms. Component Description/Equation ( ) ∂S Saturation Dynamics ψ K krψ ( ) qψ = −∇· ∇Pψ − ρψg + ∂t µψ ϕ ∂Sψ ( ) qψ or ϕ +∇· fψ(s) v = ∂t ρψ ( )−1 Transmissibility, T ∆xT 2|A | i ∆xj ij = ij + λi,ij λj,ij Pressure Equation Solve T P = q (Two-Point Flux Approx.) ∂S Coupling ψ = diag( fw1, . . . , fwN) A + [ fi1, . . . , ft iN ], ∂ where A captures velocity interactions and source scaling by local PV. Notes: In the above table, Swi and Snwi denote the saturations/concentration of the wetting and non-wetting phases in the i-th grid block, respectively; ϕ is the porosity; K represents permeability; µw and µnw are the viscosities of the wetting and non-wetting phases; ρw and ρnw are their densities; Pcap is the capillary pressure; and M includes any additional parameters. The excitation vector u(t) consists of injection/production rates qwi(t) and qnwi(t). The output vector y pairs saturation and pressure measurements. Mass conservation (qψ) and Darcy’s law (with velocity vψ) govern flow, while the saturation constraint and capillary pressure ensure physical consistency. Relative permeabilities λw(s) and λnw(s) (with normalized saturation s ∗) and effective mobility krψ = f (Sψ) model fluid transport properties. The transmissibility matrix T (computed via TPFA) couples the pressure field P to the source q, and the pressure-to-saturation coupling (via matrix A) reflects the influence of the velocity field and local pore volume (PV) on saturation evolution. The above formulation was coded in Python using Pytorch and Cuda to allow for interoperable transfer between the simulator and the neural operator architecture. In the following sections, we will present the numerical methods used for the simulation part along with the neural operator architectures. 2.1. Model Description In this study, we used the SPE10 Model 2 dataset [14], an open source benchmark known to evaluate up-scaling methods for aquifer flow and tracer transport. This is used as an analogue to an underground water aquifer for tracer transport. This data set is composed of a Brent sequence mapped on a Cartesian grid with dimensions of 60× 220× 85, resulting in 1,122,000 cells. The model encompasses two distinct formations: the shallow-marine Tarbert formation occupying the upper 35 layers, characterized by relatively smooth perme- ability, and the fluvial Upper-Ness formation constituting the lower 50 layers, noted for its significant permeability variations spanning 8 to 12 orders of magnitude. These formations Water 2025, 17, 3351 5 of 20 present markedly different permeability structures, as highlighted in Figure 1, where the model is inverted to emphasize the heterogeneity within the Upper-Ness formation. This complex permeability and porosity distribution poses a rigorous challenge, making it an ideal test case for operator learning. Figure 1. Permeability distributions in the SPE10 Model 2. (A) The left panel shows log(Kh) for the static model. (B) The right panel shows log(Kv) for the static model. The color scale indicates the permeability values, with red representing high permeability and blue representing low permeability. 2.2. Numerical Methods We have implemented a framework that incorporates an adaptive trapezoidal integra- tion scheme. The Generalized Conjugate Residual (GCR) method serves as a robust solver for linear systems within each Newton iteration, providing efficient handling of sparse matrix structures and enhancing the convergence characteristics of the overall method. The Newton’s algorithm, integrated with the GCR solver, is designed to leverage its fast convergence properties, significantly reducing the computational overhead associated with direct methods like LU decomposition, while maintaining the capacity for handling nonlinear systems effectively. This integration of the adaptive trapezoidal method with iterative linear solvers is crucial in our simulations, striking an optimal balance between computational efficiency and the accuracy of transient dynamic responses in complex multiphase flows. There are two encountered main challenges in this work: (1) building a stable model to act as a base for our neural operator implementation and (2) ensuring that the model is computationally efficient. To circumvent these challenges, we have implemented several advanced numerical techniques and optimization methods that enhance both the stability and efficiency of the model. These techniques include the adaptive dynamic stepping scheme, the use of GCR with a banded LU preconditioner, and the employment of automatic differentiation for efficient Jacobian computation. 2.2.1. Preconditioned Iterative Solvers Iterative methods were deemed to be more suitable for our approach due to the ability to predetermine the accuracy needed for phase saturations. However, iterative methods, specifically GCR, may not be efficient for sparse matrices out-of-the-box. Therefore, using a preconditioner is important to achieve fast convergence while having control over the desired accuracy. Water 2025, 17, 3351 6 of 20 Banded LU/ILU preconditioners are particularly suitable for systems where the ma- trix A exhibits a banded structure [16]. This approach is advantageous for large sparse systems with a banded pattern, as it significantly reduces the fill-in compared to a full LU decomposition, thereby preserving the computational efficiency. (U−1L−1κ A) < κ(A), for banded A. (1) Alternatively, when the Jacobian matrix A is symmetric and positive-definite, a Cholesky decomposition can be used for preconditioning [17]. The preconditioned system is solved via L−T L−1 Ax = L−T L−1b, (2) Hence, multi-grid preconditioners are also common for fluid flow problems, however, Python does not support GPU direct implement similar to PyAMG on the CPU therefore it was not included in our benchmark. The choice between banded LU and Cholesky preconditioning depends on the structure and properties of the Jacobian matrix, Cholesky preferred for symmetric positive-definite systems because of its computational efficiency. In our model, we found that ILU is more stable for our Neumann boundary condition pressure solver. 2.2.2. Adaptive Dynamic Stepping: In the adaptive time-stepping scheme implemented within the simulation framework, the trapezoidal method is utilized to preserve numerical stability without the need for the implementation of CFL, particularly crucial in the context of stiff systems encountered in multiphase flow simulations. The integration process begins with a prediction step that takes advantage of the explicit Forward Euler method to approximate saturations at the subsequent timestep, denoted as Spredictor. This preliminary prediction is calculated according to the following equation: ( ) dS Spredictor = Scurrent + ∆t · dt current where Scurrent represents the current saturation state, ∆t signifies the current timestep,( ) and dSdt is the rate of change in saturation evaluated in the current state. Thiscurrent predictor step serves as an initial approximation for the subsequent implicit corrector step, which refines this estimate to enhance stability and accuracy. The corrector phase is carried out using an implicit trapezoidal integration, where the corrected saturation Snext is determined by resolving the nonlinear equation: (( ) ( ) ) ∆t dS dS Snext − Scurrent − + = 0 2 dt current dt next ( ) Here, dSdt denotes the rate of saturation change evaluated in the next predictednext state. This equation is solved iteratively using the Newton–Raphson method. To dynamically adjust the timestep ∆t, an error estimation mechanism is incorporated that compares the norm of the discrepancy between the predictor and corrector solutions against a predefined tolerance. This adaptive time-step control is vital for managing computational efficiency and accuracy throughout varying dynamics of the modeled system. If the error estimate exceeds the tolerance threshold, ∆t is reduced to enhance resolution; conversely, if the error is significantly below the tolerance, ∆t is increased to expedite computations without sacrificing significant accuracy.In the context of dynamically regulating the time step ∆t within numerical simulations, a sophisticated approach is required to ensure stability and precision under varying conditions. The reduction factor Water 2025, 17, 3351 7 of 20 for adjusting ∆t can be mathematically defined using an exponential function, providing a smooth and continuous regulation mechanism. The reduction factor R for the time step ∆t is given by the following equation: R(∆t) = α + (β− α) exp(−γ∆t) where α is the minimum reduction factor, β is the maximum reduction factor and γ is a scaling parameter controlling the steepness of the exponential decay. In this formulation, as ∆t increases, the reduction factor R approaches the lower bound α, whereas as ∆t decreases, R approaches the upper bound β. To dynamically adjust the time step ∆t, the new time step ∆tnew is defined as ( ) ∆tnew = max ∆t× R(∆t), ∆tmin , where ∆tmin is the minimum allowable step size, preventing excessively small steps that would be computationally inefficient. The stability constraints for ∆t are derived from the spectral properties of the system’s Jacobian J, using the inverse of its maximum eigenvalue, 1 , and a more conservative bound, 0.1 . These limits ensure that timestep updates λmax(J) λmax(J) remain within a range that preserves the numerical stability of the integration method for multiphase flow. All pseudocode listings for the above algorithms, Algorithms A1–A4, are provided in Appendix A. 2.3. Neural Operator Architecture We use a DB–AFNO neural operator that fuses spectral, spatial, and physics-based information to predict mean saturation Ŝ ∈ RNx×Ny and variance N ×Nσ2 ∈ R x y . Static geological properties—porosity ϕ(x), permeability k(x)—initial states Pinitial, Sinitial, and phase viscosities µw, µnw are combined with temporal encoding via Time2Vec and a causal transformer that conditions each step on the time index and ∆t. Fourier layers capture the coupled space–time response, and training minimizes a composite loss that blends data misfit with PDE residuals enforcing mass conservation and Darcy flow. In this way, the network advances the state in time and returns high-resolution, physically consistent pressure P and saturation S fields at arbitrary query times, as summarized in Figure 2. Figure 2. physics-informed Fourier neural-operator framework integrating static geological inputs, dynamic well controls, and temporal encoders with Fourier operator blocks to predict subsurface pressure and saturation fields. Solid boxes denote trainable neural-network modules (encoders, Fourier blocks, and decoder), while dashed boxes represent physical input and output fields. Water 2025, 17, 3351 8 of 20 2.3.1. Data Preparation The data preparation pipeline for the physics-informed Fourier Neural Operator framework integrates simulation outputs—including grid parameters, K fields, well con- figurations, initial saturation distributions, and fluid properties—stored in serialized files. The raw data is reformatted for a consistent spatial representation; for example, the K tensor is converted to an effective scalar field via magnitude computation. Temporal dynamics are captured using a multi-frequency positional encoding with sine and cosine functions over logarithmically spaced scales, which is then spatially broadcast and concatenated with static input channels to form a comprehensive feature tensor. Global statistical measures (mean and standard deviation) are computed over the entire dataset and applied to both input and target fields to standardize the data, reducing numerical variability and aiding convergence during training. Early time steps are replicated to better represent rapid transients. Over 1000 simulation runs at half fidelity were performed, with the data partitioned 80% for training and 20% for validation, plus additional cases reserved for blind testing. These runs span a wide range of heterogeneous SPE10 Model 2 realizations, as well as idealized cases with spatially uniform permeability and porosity that are included explicitly in the training set. Across all runs, we vary initial water saturations, viscosity ratios, well locations, and in- jection schedules, resulting in substantial variability in flood-front shapes, breakthrough times, and recovery factors. A statistical overview of the key petrophysical variables is provided in Figure 3, and descriptive statistics are listed in Table 4, demonstrating that the training set covers a broad portion of the oil–water displacement behavior within this benchmark setting. This process is implemented using a modular Python framework built on PyTorch (version 2.9.1, CUDA 12.6), supporting efficient file handling, dynamic sample generation, and mini-batch processing for large-scale data, with the simulations executed on hardware platforms equipped with RTX Titan GPUs. Figure 3. Statistical analysis of key training parameters. Top row (left to right): permeability distribution, log(Kx), in log(mD); porosity, ϕ (fraction); and initial water saturation, Sw (−). Middle row: saturation evolution over normalized time, t/ttotal (blue curves: individual realizations; thick red curve: ensemble mean S̄w); normalized well locations with X, Y ∈ [0, 1] (orange markers: wells; blue frame: domain); and wetting-phase viscosity, µ (cP) (bars: histogram; dashed green line: smoothed distribution). Bottom row: non-wetting-phase viscosity, µ (cP) (bars: histogram; dashed orange line: smoothed distribution). Water 2025, 17, 3351 9 of 20 Table 4. Statistical summary of petrophysical properties and fluid parameters. For each parameter, N is the number of samples, µ is the mean, σ is the standard deviation, and Min/Max denote the range of values. Parameter Unit N µ σ Min/Max Kx mD 2.89× 106 3.86× 102 1.16× 103 5.08× 10 −3/3.98× 104 ϕ - 2.89× 106 1.92× 10−1 8.60× 10−2 1.00× 10−1/5.00× 10−1 Sw,init - 2.89× 10 6 1.97× 10−1 5.81× 10−2 1.00× 10−1/3.00× 10−1 µ cP 8.75× 102 3.48× 10−1 8.61× 10−2 2.00× 10−1w /5.00× 10 −1 µ 2nw cP 8.75× 10 2.97 5.62× 10−1 2.00/4.00 2.3.2. Pure Data-Driven Approach The model processes inputs X via: ( ) Z0 = InputProj X⊕F −1(W⊙F (G)) , where F and F−1 denote Fourier and inverse Fourier transforms, W are learnable spectral weights, and G represents learned Fourier features. Subsequent layers combine spectral, spatial, and physics-guided paths: ( ) 2 ∑ (l)Zl+1 = GELU wk Pk(Zl) + Zl , l ∈ {1, ..., L}, k=1 where P1, and P2 correspond to spectral convolution and spatial depthwise convolu- tion, respectively. 2.3.3. Dual Output Approach In this physics-informed multi-task framework, the model jointly predicts transient pressure (P) and saturation (S) fields governed by two-phase flow dynamics. ( ) ( ) K, wells(q), Si, Pi, µw, µnw, ϕ, t 7−→ Ŝ(x, t), P̂(x, t) . The composite loss function is designed to enforce both data fidelity and the underly- ing physical constraints through a dual-task formulation. It is defined as ( ) L = ∥Ŝ− S 2true∥ + ∥P̂− P ∥ 2 + R2 2true α S +RP , ︸ ︷︷ ︸ ︸ ︷︷ ︸ Data terms PDE residuals where α is a weighting hyperparameter. The PDE residuals are expressed as ( ) ∂Ŝ RS = +∇ · f (Ŝ) v̂ , ∂t ( ) RP = ∇ · λt(Ŝ)K∇P̂ − q. λt(Ŝ)K v̂ = − ∇P̂, ϕ In this formulation, pressure gradients drive the propagation of the saturation front via the nonlinear interaction expressed by the product f (S)λt(S), while porosity ϕ modu- lates the effective velocity in the aquifer. This dual-output approach, therefore, not only minimizes discrepancies between the predicted and true fields but also rigorously enforces the physical laws governing two-phase flow in porous media. Water 2025, 17, 3351 10 of 20 For this coupling, handling the temporal part becomes a necessity. The temporal part of the PDE was tackled by incorporating a temporal representation into the input features with the goal of capturing both short-term variations and long-term trends. In our work, we consider three complementary approaches: Temporal Sinusoidal Encoding Temporal sinusoidal encoding relies on fixed periodic functions to map a scalar time t into a high-dimensional vector. At time step t, the encoding is defined as [ ( ) ( ) ( ) ( )] t t t t Pt = sin , cos , . . . , sin , cos , λ1 λ1 λd/2 λd/2 where λ = 104i/di and d is the embedding dimension. This encoding is then concate- nated with the static input features xt (e.g., permeability, well locations) and the previous saturation St−1 to form zt = [St−1, xt, Pt]. This approach, introduced in the Transformer architecture [18], remains a popular and effective means to provide temporal context. Time2Vec Time2Vec is a learnable temporal encoding method that transforms the scalar time t into a vector composed of one linear component and several periodic components. The en- coding is given by T(t) = [a0 t + b0, sin(a1 t + b1), . . . , sin(ad−1 t + bd−1)], where ai and bi are learnable parameters. By learning these parameters from data, Time2Vec is capable of capturing complex, nonlinear temporal patterns. This method has demon- strated advantages over fixed encodings in various time-series applications [19]. Transformer-Based Temporal Encoding Transformer-based temporal encoding leverages self-attention mechanisms to gen- erate context-aware representations of time. Dedicated time-series transformer architec- tures—such as the Temporal Fusion Transformer (TFT) [20] and the Time-series-specific Transformer family [21–23]—often incorporate gating mechanisms and variable selec- tion networks to enhance stability and interpretability. In this approach, the scalar-time input is first projected into a high-dimensional space and then processed as a sequence via a Transformer encoder. The resulting outputs are aggregated (e.g., by mean pool- ing) to yield a dynamic embedding that effectively captures both local and global temporal dependencies. The model predicts both saturation (Ŝ) and pressure (P̂) fields via separate decoding branches. Recognizing that the pressure signal is typically lower in magnitude, we apply a learnable amplification factor γP (initialized at 2.0) to the pressure output: P̂amp = γP P̂. These outputs are then fused using a cross-gated mechanism that implements bidirec- tional modulation: the pressure branch provides context for refining saturation, and the saturation branch provides context for refining pressure. Concretely, we first construct gating maps by passing each branch through a shallow convolutional block followed by a Water 2025, 17, 3351 11 of 20 sigmoid nonlinearity, so that the gates lie in [0, 1] and can act as spatially and temporally varying masks. The fused outputs are computed as ( ) Ŝfused = Ŝ + α σ C1(P̂amp) ⊙ P̂amp, ( ) P̂fused = P̂amp + β σ C2(Ŝ) ⊙ Ŝ, where α and β are learnable scalar (or channel-wise) parameters that control the strength of cross-branch contributions, σ(·) denotes the sigmoid function, and ⊙ is element-wise multiplication. In this formulation, σ(C1(P̂amp)) acts as a data-dependent gate that se- lectively injects pressure information into the saturation field, while σ(C2(Ŝ)) plays the analogous role in the pressure branch. This cross-gated fusion allows the network to dy- namically downweight noisy or less informative regions and to emphasize locations where pressure and saturation must remain tightly coupled (e.g., near displacement fronts and high-permeability channels). As a result, the model can better balance the relative contribu- tions of pressure and saturation, leading to final predictions that are more consistent with the underlying two-phase flow physics and less prone to spurious decoupling between the two fields. final predictions adhere to the underlying physics of two-phase flow. Figure 4 shows a dual-branch, time-aware FNO architecture that splits shared spectral features into separate saturation and pressure paths. Figure 4. Schematic of the proposed multi-task, time-aware FNO architecture. 2.4. Training and Optimization All models were implemented in PyTorch and trained on a single NVIDIA GPU. Inputs were standardized per channel using statistics computed once on the training set; saturation and pressure targets were standardized separately. Losses were computed in standardized space, and predictions were de-standardized for reporting. The network uses per-channel normalization in the spectral trunk and BatchNorm in the convolutional de- coder; dropout is applied only in the pressure decoder (p = 0.2). A learned 64-dimensional temporal/positional embedding of the scalar time/step is concatenated to the static inputs and broadcast over the spatial grid. No geometric augmentation was used; to emphasize early transients we oversampled early steps (t ≤ 10) by a factor of 2. Water 2025, 17, 3351 12 of 20 Optimization used AdamW (initial learning rate 5× 10−3, weight decay 10−2). A cosine-annealing schedule with warm restarts (CosineAnnealingWarmRestarts; T0 = 100, Tmult = 2, ηmin = 10 −8) was stepped per optimizer update and not reset across curriculum phases. Mini-batch size was 64, with global-norm gradient clipping at 2. The data term was Smooth-L1 (Huber) with β = 0.01, applied to saturation and pressure with equal weights. A physics-residual term (conservation and Darcy consistency) was weighted by λphys(e) = exp(−e/25), where e is the global epoch index; a total-variation penalty on the pressure head was available but set to zero in the main runs. Training followed a staged curriculum over the simulation horizon. In the first phase, the network was trained only on the earliest portion of each trajectory (up to time index 25), and in subsequent phases the maximum time index was progressively increased to 100, 500, and finally the full trajectory (∞), using 50, 50, 50, and 500 epochs per phase, respectively. This schedule forces the model to first capture early-time transients and sharp displacement fronts before fitting the later-time, quasi-steady-state behavior over the entire horizon. A concise summary of the training and configuration settings is provided in Table 5. Table 5. Training and configuration settings. Block Item Setting Data Normalization z-score per channel; separate S, P Model Temporal/Coord emb. 64-D time/step; grid (x, y, (z)) FNO Width/Layers W = 64, L = 8 FNO Activation/Residual GELU; skips FNO Paths Spectral conv + depthwise conv FNO Fourier modes (curric.) (2, 2, 2)→ (6, 6, 3)→ (18, 18, 8) Decoders S/P S: iFFT→MLP; P: Conv, dropout p = 0.2 Fusion Gating/Amp. Cross-gated conv + σ; γP = 2.0 Optim. Batch/Optimizer 64; AdamW (lr = 5× 10−3, wd 10−2) Optim. LR schedule Cosine restarts (T0 = 100, Tmult = 2, ηmin = 10 −8) Optim. Grad clip Global norm = 2 Losses Data/Physics Huber (β = 0.01) on S, P; λphys(e) = exp(−e/25) Protocol Sampling/Split Early t ≤ 10× 2; 80/20 train/val Protocol Curriculum/CKPT tmax ∈ {25, 100, 500, ∞}; epochs {50, 50, 50, 500}; best val 3. Results 3.1. Numerical Validation For numerical validation, we examine both single-phase and multi-phase flow sce- narios. In the single-phase case, the formation is modeled as a homogeneous 64× 64× 1 grid with unit porosity and viscosities, and irreducible saturations (sw, snw) set to zero, allowing full pore space utilization. Injection occurs at one corner and production at the opposite. Unlike Aarnes et al. [24], who normalize the injection rate to one pore volume of water, our non-normalized simulation reproduces the expected physics, as shown in Figure 5 (A vs. B). For the multi-phase case, we use SPE10 Model 2 with an incompress- ible two-phase system (µw = 0.3 cP, µnw = 3.0 cP, swc = snwr = 0.2). Figure 5 presents breakthrough saturation distributions for the top (A1 vs. B1) and bottom layers (A2 vs. B2). Our model matches the reference, with minor discrepancies attributable to numerical smoothing differences. Water 2025, 17, 3351 13 of 20 Figure 5. Single- and multiphase saturation fields used for model validation. The saturation colormap spans from 0 (blue) to 1 (red). Panels (A) and (B) compare single-phase simulations in a homogeneous domain for our model and the reference solution from [24], respectively. Panels (A1,B1) show the top layer of SPE10 Model 2 at breakthrough for our model and the reference, while panels (A2,B2) show the corresponding bottom-layer saturation fields, demonstrating that our model reproduces the main displacement fronts and flow patterns. 3.2. 2D FNO Results We compare (i) a purely data-driven FNO baseline and (ii) a DB-AFNO with different temporal encodings (Positional Encoding, Time2Vec, TFT/TST). Numerical simulations serve as ground truth. We evaluate absolute error fields and spatio-temporal accuracy over time, including blind tests on the top and bottom layers of SPE10 (Figure 6). Figure 6. Distribution of ℓ∞ errors, ∥Ŝ− S∥∞, across normalized time t ∗ for early (t∗ < 0.25) and late (t∗ ≥ 0.25) phases. As shown in Figure 6, DB-AFNO with Time2Vec and TFT/TST yields lower early-time errors (t∗ < 0.25), with more samples in the 10−2–10−1 range. For later times (t∗ ≥ 0.25), Time2Vec and TFT/TST remain close; Positional Encoding improves over the purely data- driven FNO but trails the former two. In Figure 7, each column corresponds to a different injection time, and each row compares the ground-truth simulator, the conventional FNO baseline, and DB-AFNO. The baseline FNO visibly diffuses the displacement front, particu- larly along high-permeability channels, and in several time slices the advancing water front Water 2025, 17, 3351 14 of 20 becomes disconnected or breaks into isolated patches. In contrast, DB-AFNO preserves the continuity of the main displacement front and better reproduces the fingering patterns and channelized advance observed in the simulator. This is most evident near breakthrough, where DB-AFNO aligns both the front position and the saturation contrast across the front, whereas the baseline front arrives too early and exhibits a smeared transition zone. These qualitative differences are consistent with the error distributions in Figure 6, where DB-AFNO has a higher proportion of predictions in the 10−2–10−1 error band at both early and late times. Figure 7. Saturation fields at multiple injection times. (Top): ground truth. (Middle): conventional FNO. (Bottom): DB-AFNO. Red dashed boxes and circles in the middle row highlight oversmoothed regions and spurious saturation artifacts produced by the conventional FNO, which are not present in the DB-AFNO predictions. Per-panel error metrics corroborate DB-AFNO accuracy. 3.3. Zero-Shot Super Resolution DB-AFNO is trained once—as an upscaler—on a single coarse/fine pair from the upper five layers of SPE10: input lattice 30× 110× 5, target lattice 60× 220× 5. After convergence, weights are frozen. At deployment, the coarse simulator supplies (Scoarse, Pcoarse), which the operator upsamples ∣ ( ) G : (Scoarse, Pcoarse ∣ θ , K, ϕ, µw, µnw, q, t) 7−→ Ŝ, P̂ ∣ . Ω60×220×5 The FNO backbone includes a Cartesian coordinate-grid embedding (normalized (x, y, z) appended per voxel) for absolute spatial context on any mesh. Training uses an incremental multi-resolution curriculum: start with 2× 2× 2 retained Fourier modes on a 2× subsampled grid, then raise spectral/spatial resolutions [(2, 2, 2)→(6, 6, 3)→(18, 18, 8)] once validation- loss gradients plateau. In this zero-shot setup the full model (including gating and temporal channels) converges in 100 epochs, with ≈1.5 GPU-min per case on a single RTX Titan. With >90% of voxels below a 5% absolute-error threshold. Figure 8 shows 3-D render- ings of Ŝ and |S− Ŝ|; fronts, channel connectivity, and high-saturation ridges are preserved at 8× volumetric up-resolution. Figure 9 provides layer-wise scatter plots at the final time step; similar behavior holds at other times. Results remain sensitive to the coarse grid: using 15× 55× 5 did not reproduce the same accuracy. Water 2025, 17, 3351 15 of 20 Figure 8. Zero-shot super-resolution on SPE10: saturation Ŝ and absolute error |S− Ŝ| at four times. Total coarse→fine pipeline: 4.5 min. Figure 9. Layer-wise comparison for L = 1:5 at 60 × 220 resolution. Scatter plots report R2, RMSE, and MAE; red dashed line denotes perfect agreement. Water 2025, 17, 3351 16 of 20 4. Discussion Across the 2-D benchmarks, DB-AFNO consistently outperforms a purely data-driven FNO. Error distributions in ℓ∞ norm show that Time2Vec- and TFT/TST-encoded variants concentrate early-time errors (t∗ < 0.25) in the 10−2–10−1 band, whereas Positional Encod- ing and the conventional FNO exhibit heavier tails (Figure 6). Near pseudo–steady state (t∗ ≥ 0.25), Time2Vec and TFT/TST remain closely matched and continue to dominate Positional Encoding. Visual comparisons corroborate these statistics: the baseline FNO pro- duces over-smoothed fields with disconnected saturation artifacts, while DB-AFNO retains sharp, physically plausible fronts and spatial connectivity (Figure 7). These gains arise from (i) the explicit coupling of pressure and saturation through dual branches, (ii) temporal conditioning that disambiguates early transients, and (iii) physics-informed losses that penalize mass-balance and Darcy residuals. For zero-shot super-resolution, a train-low, infer-high regimen enables coarse-to-fine mapping without retraining. A single coarse/fine pair 30× 110× 5→ 60× 220× 5 is suffi- cient to learn mesh-agnostic upscaling, aided by a Cartesian coordinate-grid embedding for absolute spatial context and an incremental multi-resolution curriculum that increases retained Fourier modes and spatial resolution only after validation plateaus. This curricu- lum halves wall-clock training and reduces GPU memory by ∼40% relative to single-shot high-resolution training, while the deployed pipeline (coarse TPFA + one forward pass) achieves ∼3× speedup over full-order fine-grid simulation (Table 6). On five unseen layers, we obtain R2 = 0.90, RMSE = 4.4× 10−2, and MAE = 3.2× 10−2, with >90% of voxels below 5% absolute error. Three-dimensional renderings confirm preservation of water- flood fronts, channel continuity, and high-saturation ridges at 8× volumetric up-resolution (Figure 8), and layer-wise scatter plots indicate tight agreement at the final time step with similar behavior at other times (Figure 9). Table 6. Runtime benchmark for zero-shot upscaling vs. full-order TPFA on five unseen layers. Step Wall-Time Comment Full-order TPFA on 60 × 220 × 5 ≈15 min Ground truth Coarse TPFA on 30 × 110 × 5 3 min Generates inputs DB-AFNO zero-shot upscale 1.5 min Single forward pass Total 4.5 min ∼3 × faster Note: All runs on a single RTX Titan; identical simulator settings except grid resolution. A key technical limitation of the zero-shot path is its dependence on the fidelity of the coarse solution: the mapping presumes that the coarse grid resolves the dominant transport features. When the input was degraded to 15× 55× 5, the same accuracy was not recovered, indicating a lower bound on coarse resolution for reliable upscale inference. Minor discrepancies in the validation figures are consistent with numerical smoothing choices rather than architectural deficiencies. Overall, the results indicate that (i) enforcing coupled physics with dual-branch decoding and temporal encodings improves early-time accuracy and suppresses over-smoothing; (ii) the operator generalizes to unseen layers within SPE10 when conditioned on physically meaningful inputs; and (iii) zero-shot super-resolution provides accurate fine-grid fields at a fraction of the computational cost, provided the coarse simulation captures the essential flow physics. The DB-AFNO surrogates presented here are trained entirely on incompressible, im- miscible two-phase displacements, and thus are most reliable for formations and operating conditions with similar heterogeneity statistics, viscosity ratios, and flow regimes; all results in this study pertain to such systems with no explicit gas phase or compositional effects. When the underlying physics changes significantly—e.g., strong capillary hysteresis, gas Water 2025, 17, 3351 17 of 20 evolution or gas caps, or fracture-dominated flow—retraining or transfer learning on sim- ulations from the new regime would be required. While the dual-branch architecture is, in principle, extensible to three-phase or compositional flow (e.g., via an additional output branch and corresponding physics residuals), this lies beyond the scope of the present work and is left for future investigation. 5. Conclusions 1. The proposed DB-AFNO framework achieves a significant reduction in computa- tional cost compared to traditional full-order solvers while preserving high-resolution prediction accuracy, as demonstrated on the SPE10 benchmark. 2. The dual-branch architecture effectively couples pressure and saturation fields, ensur- ing that the predicted outputs remain physically consistent and adhere closely to the governing PDE constraints. 3. The incorporation of advanced temporal encoding methods (e.g., Time2Vec, TFT/TST) enhances the model’s ability to capture dynamic transient behaviors, resulting in lower validation errors during both early transient and pseudo-steady states. 4. By integrating physics-informed loss functions, the model not only minimizes data- driven prediction errors but also enforces adherence to the underlying flow physics, thereby improving overall reliability. 5. The overall architecture demonstrates scalability to higher-resolution simulations, given that it is trained on similar lower-fidelity cases. 6. Despite its high-fidelity performance within the training distribution, DB-AFNO- based approaches may fall out-of-distribution when the underlying physics deviates from those encountered during training. 7. The scalability of DB-AFNO was found to be similar to that of FNO-based models. It is limited in scenarios involving sudden or drastic changes in governing physical processes, indicating the need for adaptive re-training or architectural modifications when the physics evolve beyond the original training regime. 8. Zero-shot super-resolution cuts the end-to-end turnaround for a 60× 220× 5 grid from ∼15 min (full TPFA) to 4.5 min by upscaling a coarse 30× 110× 5 run with a single forward pass, achieving R2 = 0.90 and RMSE = 4.4 × 10−2 without additional fine-grid training. Author Contributions: Conceptualization, H.A.H. and J.W.; methodology, H.A.H.; software, H.A.H. and O.E.; validation, H.A.H., O.E. and J.W.; formal analysis, H.A.H.; investigation, H.A.H. and O.E.; resources, H.A.H. and O.E.; data curation, O.E.; writing—original draft preparation, H.A.H.; writing—review and editing, H.A.H., O.E. and J.W.; visualization, H.A.H. and O.E.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript. Funding: The authors did not receive any specific grant from funding agencies in the public, com- mercial, or not-for-profit sectors for the research, authorship, or publication of this work. Data Availability Statement: The data used in this study are openly available in the SPE Comparative Solution Project (Dataset 2, Model 2) at https://www.spe.org/web/csp/datasets/set02.htm (accessed on 14 November 2025). Conflicts of Interest: The authors declare no conflicts of interest. Water 2025, 17, 3351 18 of 20 Abbreviations 2-D Two-dimensional 3-D Three-dimensional AdamW Adam optimizer with decoupled weight decay AFNO Adaptive Fourier Neural Operator DB-AFNO Dual-Branch Adaptive Fourier Neural Operator FNO Fourier Neural Operator PINNs Physics-Informed Neural Networks Time2Vec Learnable temporal encoding method TFT Temporal Fusion Transformer TST Time-series Transformer (family) GCR Generalized Conjugate Residual BiCGStab BiConjugate Gradient Stabilized JF-BiCGStab Jacobian-Free BiCGStab LU Lower–Upper factorization ILU Incomplete LU TPFA Two-Point Flux Approximation CFL Courant–Friedrichs–Lewy condition TV Total Variation PDE Partial Differential Equation PV Pore Volume SPE10 10th SPE Comparative Solution Project RMSE Root-Mean-Square Error MAE Mean Absolute Error R2 Coefficient of determination GPU Graphics Processing Unit CPU Central Processing Unit CUDA Compute Unified Device Architecture PyTorch Python-based deep learning framework GELU Gaussian Error Linear Unit cP Centipoise mD Millidarcy RL Reinforcement Learning Appendix A. Algorithms and Pseudocode Algorithm A1 Generalized Conjugate Residual (GCR) [16] Require: A ∈ Rn×n, b, initial x0, tolerance ε 1: r0 ← b− Ax0, k← 0 2: while ∥rk∥ > ε do 3: p̃← rk 4: for i = 0 to k− 1 do (Ap̃, Api) 5: βik ← − , p̃← p̃ + β p (Api, Ap ik i i) 6: pk ← p̃/∥Ap̃∥ (rk, Apk) 7: αk ← (Apk, Apk) 8: xk+1 ← xk + αk pk, rk+1 ← rk − αk Apk 9: k← k + 1 10: return xk Water 2025, 17, 3351 19 of 20 Algorithm A2 Newton–Krylov with GCR linear step [25] Require: Nonlinear residual F(x), Jacobian JF(x), initial x0, tolerance tol f , max iters kmax 1: k← 0, x ← x0 2: repeat 3: f ← F(x) 4: J ← JF(x) 5: Solve J ∆x = − f with GCR⇒ ∆x 6: x ← x + ∆x, k← k + 1 7: until ∥ f ∥∞ ≤ tol f or k ≥ kmax 8: return x Algorithm A3 Adaptive trapezoidal time integration [26] Require: ẋ = f (x, p, u), x(ts) = x0, ts < t f , initial ∆t, tolerances 1: t← ts, x ← x0 2: while t < t f do 3: Predict xpred ← x + ∆t f (x, p, u) ( ) 4: Correct solve for xnew s.t. xnew − x− ∆t2 f (x) + f (x new) = 0 (Newton) 5: Estimate error e← ∥xnew − xpred∥ 6: if e small then 7: t← t + ∆t, x ← xnew 8: Update ∆t using controller (e.g., PI or exponential rule); enforce bounds 9: return trajectory {x(t)} Algorithm A4 Jacobian-free BiCGStab [25,27] Require: Residual F(x), current x, right-hand side b, finite-diff. step ε, tolerance tol ( ) 1: function MATVEC(v) return F(x + εv)− F(x) /ε 2: δ← 0, r ← b− MATVEC(δ), r̂ ← r 3: ρ← 1, α← 1, ω ← 1, v← 0, p← 0 4: for k = 0 to kmax − 1 do 5: ρ ⊤new ← r̂ r 6: if |ρnew| < 10 −14 then break 7: if k = 0 then 8: p← r 9: else 10: β← (ρnew/ρ)(α/ω); p← r + β(p−ωv) 11: v← MATVEC(p); α← ρnew/(r̂ ⊤v) 12: s← r− αv 13: if ∥s∥ < tol then 14: δ← δ + αp; return δ 15: t← MATVEC(s); ω ← (t⊤s)/(t⊤t) 16: δ← δ + αp + ωs; r ← s−ωt 17: if ∥r∥ < tol then return δ 18: ρ← ρnew 19: return δ References 1. Sejnowski, T.J. The unreasonable effectiveness of deep learning in artificial intelligence. Proc. Natl. Acad. Sci. USA 2020, 117, 30033–30038. [CrossRef] 2. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [CrossRef] 3. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [CrossRef] Water 2025, 17, 3351 20 of 20 4. Lu, L.; Meng, X.; Yang, L.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. SIAM Rev. 2021, 63, 208–228. [CrossRef] 5. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier neural operator for parametric partial differential equations. arXiv 2020, arXiv:2010.08895. [CrossRef] 6. Kovachki, N.; Li, Z.; Liu, B.; Azizzadenesheli, K.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Neural operator: Learning maps between function spaces. J. Mach. Learn. Res. 2021, 24, 4061–4157. 7. Li, Z.; Zheng, H.; Kovachki, N.; Jin, D.; Chen, H.; Liu, B.; Azizzadenesheli, K.; Anandkumar, A. Physics-informed neural operator for learning partial differential equations. arXiv 2021, arXiv:2111.03794. [CrossRef] 8. Patel, R.G.; Trask, N.A.; Wood, M.A.; Cyr, E.C. A physics-informed operator regression framework for extracting data-driven continuum models. Comput. Methods Appl. Mech. Eng. 2021, 373, 113500. [CrossRef] 9. NVIDIA Modulus Team. Darcy Flow with Fourier Neural Operator. 2024. Available online: https://docs.nvidia.com/ physicsnemo/25.08/physicsnemo-sym/user_guide/neural_operators/darcy_fno.html (accessed on 18 July 2024). 10. Wen, G.; Li, Z.; Azizzadenesheli, K.; Anandkumar, A.; Benson, S.M. U-FNO—An enhanced Fourier neural operator-based deep-learning model for multiphase flow. Adv. Water Resour. 2022, 163, 104180. [CrossRef] 11. Cao, L.; O’Leary-Roseberry, T.; Jha, P.K.; Oden, J.T.; Ghattas, O. Residual-based error correction for neural operator accelerated infinite-dimensional Bayesian inverse problems. J. Comput. Phys. 2023, 486, 112104. [CrossRef] 12. Ma, X.; Zhong, R.; Zhan, J.; Zhou, D. Enhancing subsurface multiphase flow simulation with Fourier neural operator. Heliyon 2024, 10, e38103. [CrossRef] [PubMed] 13. Jain, N.; Roy, S.; Kodamana, H.; Nair, P. Scaling the predictions of multiphase flow through porous media using operator learning. Chem. Eng. J. 2025, 503, 157671. [CrossRef] 14. Society of Petroleum Engineers. 10th SPE Comparative Solution Project, Model 2. 2024. Available online: https://www.spe.org/ web/csp/datasets/set02.htm (accessed on 18 July 2024). 15. Buckley, S.E.; Leverett, M.C. Mechanism of fluid displacement in sands. Trans. Aime 1941, 146, 107–116. [CrossRef] 16. Demmel, J.W. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997. [CrossRef] 17. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins Studies in the Mathematical Sciences; Johns Hopkins University Press: Baltimore, MD, USA, 2013. 18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [CrossRef] 19. Kazemi, S.M. Time2Vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321. [CrossRef] 20. Lim, B.; Zohdy, M.; Phan, H.; Lee, S.J. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 2019, 36, 628–645. [CrossRef] 21. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Liu, R.; Shen, J.; Pan, G. Informer: Beyond efficient transformer for long sequence time-series forecasting. arXiv 2021, arXiv:2012.07436. [CrossRef] 22. Xu, H.; Wu, W.; Jiang, P.; Ding, Y.; Lu, Z.; Li, X. Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy. In Proceedings of the Tenth International Conference on Learning Representations (ICLR 2022), Virtual, 25–29 April 2022. [CrossRef] 23. Nie, Y.; Chi, X.; Xue, Y.; Liu, Y.; Zhang, T.; Pei, Y.; Wang, C.; Li, Y. PatchTST: Training back-bone is all you need for time-series forecasting. arXiv 2022, arXiv.2211:14730. [CrossRef] 24. Aarnes, J.E.; Gimse, T.; Lie, K.A. An Introduction to the Numerics of Flow in Porous Media Using MATLAB; Springer: Berlin, Germany, 2007. [CrossRef] 25. Knoll, D.A.; Keyes, D.E. Jacobian-free Newton–Krylov methods: A survey of approaches and applications. J. Comput. Phys. 2004, 193, 357–397. [CrossRef] 26. Hairer, E.; Wanner, G. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems; Springer Series in Computational Mathematics; Springer: Berlin, Germany, 1996. [CrossRef] 27. Kelley, C.T. Solving Nonlinear Equations with Newton’s Method; SIAM: Philadelphia, PA, USA, 2003. [CrossRef] Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.