Article
A Dual-Branch Coupled Fourier Neural Operator for
High-Resolution Multi-Phase Flow Modeling in Porous Media
Hassan Al Hashim 1,* , Odai Elyas 2 and John Williams 2
1 Department of Computational Science and Engineering, Massachusetts Institute of Technology,
Cambridge, MA 02139, USA
2 Department of Civil and Environmental Engineering, Massachusetts Institute of Technology,
Cambridge, MA 02139, USA
* Correspondence: hwhashim@mit.edu
Abstract
This paper investigates a physics-informed surrogate modeling framework for multi-phase
flow in porous media based on the Fourier Neural Operator. Traditional numerical sim-
ulators, though accurate, suffer from severe computational bottlenecks due to fine-grid
discretizations and the iterative solution of highly nonlinear partial differential equations.
By parameterizing the kernel integral directly in Fourier space, the operator provides a
discretization-invariant mapping between function spaces, enabling efficient spectral con-
volutions. We introduce a Dual-Branch Adaptive Fourier Neural Operator with a shared
Fourier encoder and two decoders: a saturation branch that uses an inverse Fourier trans-
form followed by a multilayer perceptron and a pressure branch that uses a convolutional
decoder. Temporal information is injected via Time2Vec embeddings and a causal temporal
transformer, conditioning each forward pass on step index and time step to maintain
consistent dynamics across horizons. Physics-informed losses couple data fidelity with
residuals from mass conservation and Darcy pressure, enforcing the governing constraints
in Fourier space; truncated spectral kernels promote generalization across meshes without
retraining. On SPE10-style heterogeneities, the model shifts the infinity-norm error mass
into the 10−2 to 10−1 band during early transients and sustains lower errors during pseudo-
steady state. In zero-shot three-dimensional coarse-to-fine upscaling from 30× 110× 5 to
Academic Editor: Vittorio Di Federico 60× 220× 5, it attains R2 = 0.90, RMSE = 4.4× 10−2, and MAE = 3.2× 10−2, with more
Received: 21 October 2025 than 90% of voxels below five percent absolute error across five unseen layers, while the
Revised: 16 November 2025 end-to-end pipeline runs about three times faster than a full-order fine-grid solve and
Accepted: 19 November 2025 preserves water-flood fronts and channel connectivity. Benchmarking against established
Published: 23 November 2025 baselines indicates a scalable, high-fidelity alternative for high-resolution multi-phase flow
Citation: Al Hashim, H.; Elyas, O.; simulation in porous media.
Williams, J. A Dual-Branch Coupled
Fourier Neural Operator for Keywords: Fourier Neural Operator; physics-informed machine learning; surrogate
High-Resolution Multi-Phase Flow
modeling; groundwater flow; heterogeneous porous media; Darcy flow; aquifer flow
Modeling in Porous Media. Water 2025,
and transport simulation; aquifer; hydraulic conductivity
17, 3351. https://doi.org/10.3390/
w17233351
Copyright: © 2025 by the authors.
Licensee MDPI, Basel, Switzerland.
1. Introduction
This article is an open access article
distributed under the terms and Multi-phase flow in porous media underpins critical applications in subsurface engi-
conditions of the Creative Commons neering—from groundwater remediation to carbon sequestration. The governing equations,
Attribution (CC BY) license nonlinear partial differential equations (PDEs) that capture the intricate interactions among
(https://creativecommons.org/
fluid phases and the heterogeneous porous structure, are notoriously difficult to solve
licenses/by/4.0/).
Water 2025, 17, 3351 https://doi.org/10.3390/w17233351
Water 2025, 17, 3351 2 of 20
numerically. Traditional solvers, whether implemented in commercial software or open-
source tools, require extremely fine discretizations to capture the essential dynamics, which
in turn leads to prohibitive computational costs and long turnaround times, especially
when conducting multi-parameter studies or producing high-resolution predictions.
While deep learning has revolutionized fields such as imaging, speech recognition,
and natural language processing [1], the application of machine learning techniques that
explicitly incorporate physical laws is relatively new. Early contributions such as the
physics-informed neural networks (PINNs) introduced [2] and later refined in works like
those of [3,4] have demonstrated that embedding governing equations into the training
process can guide the solution toward physically consistent behavior. Yet, these methods
have often yielded modest gains when compared with the dramatic successes observed in
other areas of deep learning.
A particularly transformative approach comes from the emerging field of operator
learning, which seeks to approximate mappings between infinite-dimensional function
spaces. In this context, the Fourier Neural Operator (FNO) introduced [5] and extended [6]
represents a paradigm shift. By reformulating the kernel integral in Fourier space and
truncating the high-frequency components, FNO achieves spectral convolution that is both
computationally efficient and inherently suited to enforcing physical constraints. Crucially,
its invariance with respect to discretization allows the learned operator to generalize
seamlessly across arbitrary grids without requiring re-training—a property of immense
value for high-resolution simulations.
In porous media flow, where multi-scale heterogeneities and strong nonlinearities
prevail, integrating physics-informed loss terms into the FNO framework (as demonstrated
in physics-informed neural operators [7,8]) has proven essential. This integration ensures
that the surrogate model rigorously adheres to the underlying PDEs, thereby reducing the
computational effort needed to achieve accurate predictions. For example, implementations
on platforms such as NVIDIA Modulus have shown that FNO-based approaches can
reduce simulation times by up to three orders of magnitude while preserving accuracy
across different discretizations [9]. More recently, several studies have adapted FNO-
style architectures specifically to multiphase flow in porous media and related subsurface
applications, including enhanced multiphase FNO variants, residual-corrected neural
operators for nonlinear PDE inverse problems, and operator-based surrogates for field-
scale reservoir simulations [10–13].
However, existing applications of Fourier Neural Operators to porous-media flow still
face important limitations. Most studies focus on single-phase or weakly coupled settings
and treat only one state variable (typically pressure or saturation), thereby neglecting the
strong two-way coupling that characterizes multiphase displacement. Purely data-driven
FNOs also tend to oversmooth sharp saturation fronts and underestimate channelized flow
in highly heterogeneous formations such as SPE10, particularly during early transients.
In many implementations, time is handled as a simple scalar feature or as an additional
spatial coordinate, which can degrade accuracy over long horizons and lead to drift between
transient and pseudo-steady regimes. Moreover, existing FNO surrogates rarely enforce
PDE residuals during training, so saturation and pressure predictions may violate mass
balance and Darcy’s law, reducing robustness when extrapolating across injection scenarios.
This work presents an advanced framework for applying FNO to the physics-informed
simulation of multi-phase flow in porous media. Building on a dual-branch formulation
(DB-AFNO), the physics-based loss functions are integrated with different temporal encod-
ings (e.g., Time2Vec, TFT/TST) to honor the governing PDE constraints in both saturation
and pressure fields. Our comprehensive computational analysis demonstrates that DB-
AFNO outperforms conventional, purely data-driven FNO approaches across diverse tracer
Water 2025, 17, 3351 3 of 20
transport configurations, reducing errors during early transient phases and maintaining ac-
curacy in later pseudo-steady states. In particular, DB-AFNO with Time2Vec and TFT/TST
concentrates the majority of saturation predictions in the lower error ranges, preserving
critical flow structures that conventional FNO often oversmooth. Blind testing on top and
bottom layers of the 10th SPE Comparative Solution Project [14] highlights the framework’s
potential for scalable, high-resolution simulations without incurring the computational
costs of traditional full-order solvers.
The uniqueness of this approach lies in explicitly coupling the two PDE variables
using a multitask, time-aware architecture. While conventional FNOs typically address
a single field, our model retains the core spectral convolution operations and extends
them via key modifications. First, a shared spectral encoder predicts both pressure and
saturation, extracting spatial features that naturally encode interdependencies. Second,
two distinct time encoders—one for saturation and one for pressure—transform the scalar
time t into high-dimensional representations that capture differing temporal sensitivities
while preserving mutual spatial information. Finally, after decoding via separate branches,
the outputs are fused through an additional convolutional layer that enforces consistency
between the predictions through cross-information exchange.
2. Methods
This section presents a technical description of the problem using the dynamical state-
space framework. The focus of this formulation is on the incompressible and immiscible
displacement of two phases within a porous media. More recently, the same principles have
been extended to carbon sequestration efforts, specifically in the displacement of brine by
CO2 [15]. In this work, we adopt a Darcy-scale, incompressible and immiscible two-phase
formulation on a fixed Cartesian grid. The two phases are denoted wetting and non-wetting,
with saturations Sw and Snw and corresponding pressures Pw and Pnw. The physical domain
Ω is discretized into N control volumes on a structured grid, and the primary unknowns
collected in the state vector are the cell-wise saturation and pressure fields. The fluid
system is isothermal, with constant viscosities and densities, and rock properties (porosity
and absolute permeability) are time-invariant and defined per grid cell. Flow is governed
by Darcy’s law under a single-continuum representation, without explicit geomechanical
deformation, fracture propagation, or matrix–fracture transfer terms, and without capillary-
pressure hysteresis. Gravity is included through a constant acceleration vector, while
molecular diffusion and dispersion are neglected so that transport is driven by advection
and sources/sinks only. Under these assumptions, the saturation constraint reduces to
Sw + Snw = 1, and a monotone capillary-pressure law Pc(Sw) together with Corey-type
relative permeability functions krψ(Sψ) closes the system. The governing equations fully
describe the displacement process considered here. Gas-phase transport, solution-gas
liberation, and evaporation/condensation processes are neglected; consequently, no gas
saturation or gas pressure variables are introduced into the state vector, and all volumetric
source terms represent injection and production of the two liquid phases only. A concise
summary of the formulation is provided in Tables 1–3.
Table 1. State-Space Variables for Multiphase Flow in Porous Media.
Component Description/Equation
State vector x x = [ Sw1, Snw1, . . . , SwN , SnwN ]
Parameter vector p p = [ ϕ, K, µw, µnw, ρw, ρnw, Pcap, M ]
Excitation vector u(t) u(t) = [ qw1(t), qnw1(t), . . . , qwN(t), qnwN(t) ]
Output vector y y = [ Sw1, Snw1, P1, . . . , SwN , SnwN , PN ]
Water 2025, 17, 3351 4 of 20
Table 2. Governing Equations for Multiphase Flow in Porous Media.
Component Description/Equation
Mass conservation ∂(ϕSψ) +∇· (vψ) = qψ, ψ ∈ {w, nw}
∂t
Kk ( )
Darcy’s law rψvψ = − ∇Pψ − ρψg , ψ ∈ {w, nw}
µψ
Saturation constraint Sw + Snw = 1
Capillary pressure Pcψ = Pψ − Pw, ψ ∈ {nw}
∗ 2 ∗ 2
Relative permeabilities (s ) (1− s ) ∗ s− swcλw(s) = , λnw(s) = , s =
µw µnw 1− snwr − swc
Effective mobility krψ = f (Sψ), ψ ∈ {w, nw}
Table 3. Discrete Saturation Dynamics and Coupling Terms.
Component Description/Equation
( )
∂S
Saturation Dynamics ψ
K krψ ( ) qψ
= −∇· ∇Pψ − ρψg +
∂t µψ ϕ
∂Sψ ( ) qψ
or ϕ +∇· fψ(s) v =
∂t ρψ
( )−1
Transmissibility, T ∆xT 2|A | i
∆xj
ij = ij +
λi,ij λj,ij
Pressure Equation Solve T P = q (Two-Point Flux Approx.)
∂S
Coupling ψ = diag( fw1, . . . , fwN) A + [ fi1, . . . , ft iN
],
∂
where A captures velocity interactions and source scaling by
local PV.
Notes: In the above table, Swi and Snwi denote the saturations/concentration of the wetting and non-wetting
phases in the i-th grid block, respectively; ϕ is the porosity; K represents permeability; µw and µnw are the
viscosities of the wetting and non-wetting phases; ρw and ρnw are their densities; Pcap is the capillary pressure; and
M includes any additional parameters. The excitation vector u(t) consists of injection/production rates qwi(t) and
qnwi(t). The output vector y pairs saturation and pressure measurements. Mass conservation (qψ) and Darcy’s law
(with velocity vψ) govern flow, while the saturation constraint and capillary pressure ensure physical consistency.
Relative permeabilities λw(s) and λnw(s) (with normalized saturation s
∗) and effective mobility krψ = f (Sψ)
model fluid transport properties. The transmissibility matrix T (computed via TPFA) couples the pressure field P
to the source q, and the pressure-to-saturation coupling (via matrix A) reflects the influence of the velocity field
and local pore volume (PV) on saturation evolution.
The above formulation was coded in Python using Pytorch and Cuda to allow for
interoperable transfer between the simulator and the neural operator architecture. In the
following sections, we will present the numerical methods used for the simulation part
along with the neural operator architectures.
2.1. Model Description
In this study, we used the SPE10 Model 2 dataset [14], an open source benchmark
known to evaluate up-scaling methods for aquifer flow and tracer transport. This is used as
an analogue to an underground water aquifer for tracer transport. This data set is composed
of a Brent sequence mapped on a Cartesian grid with dimensions of 60× 220× 85, resulting
in 1,122,000 cells. The model encompasses two distinct formations: the shallow-marine
Tarbert formation occupying the upper 35 layers, characterized by relatively smooth perme-
ability, and the fluvial Upper-Ness formation constituting the lower 50 layers, noted for its
significant permeability variations spanning 8 to 12 orders of magnitude. These formations
Water 2025, 17, 3351 5 of 20
present markedly different permeability structures, as highlighted in Figure 1, where the
model is inverted to emphasize the heterogeneity within the Upper-Ness formation. This
complex permeability and porosity distribution poses a rigorous challenge, making it an
ideal test case for operator learning.
Figure 1. Permeability distributions in the SPE10 Model 2. (A) The left panel shows log(Kh) for the
static model. (B) The right panel shows log(Kv) for the static model. The color scale indicates the
permeability values, with red representing high permeability and blue representing low permeability.
2.2. Numerical Methods
We have implemented a framework that incorporates an adaptive trapezoidal integra-
tion scheme. The Generalized Conjugate Residual (GCR) method serves as a robust solver
for linear systems within each Newton iteration, providing efficient handling of sparse
matrix structures and enhancing the convergence characteristics of the overall method.
The Newton’s algorithm, integrated with the GCR solver, is designed to leverage its fast
convergence properties, significantly reducing the computational overhead associated
with direct methods like LU decomposition, while maintaining the capacity for handling
nonlinear systems effectively. This integration of the adaptive trapezoidal method with
iterative linear solvers is crucial in our simulations, striking an optimal balance between
computational efficiency and the accuracy of transient dynamic responses in complex
multiphase flows.
There are two encountered main challenges in this work: (1) building a stable model
to act as a base for our neural operator implementation and (2) ensuring that the model is
computationally efficient.
To circumvent these challenges, we have implemented several advanced numerical
techniques and optimization methods that enhance both the stability and efficiency of the
model. These techniques include the adaptive dynamic stepping scheme, the use of GCR
with a banded LU preconditioner, and the employment of automatic differentiation for
efficient Jacobian computation.
2.2.1. Preconditioned Iterative Solvers
Iterative methods were deemed to be more suitable for our approach due to the ability
to predetermine the accuracy needed for phase saturations. However, iterative methods,
specifically GCR, may not be efficient for sparse matrices out-of-the-box. Therefore, using
a preconditioner is important to achieve fast convergence while having control over the
desired accuracy.
Water 2025, 17, 3351 6 of 20
Banded LU/ILU preconditioners are particularly suitable for systems where the ma-
trix A exhibits a banded structure [16]. This approach is advantageous for large sparse
systems with a banded pattern, as it significantly reduces the fill-in compared to a full LU
decomposition, thereby preserving the computational efficiency.
(U−1L−1κ A) < κ(A), for banded A. (1)
Alternatively, when the Jacobian matrix A is symmetric and positive-definite,
a Cholesky decomposition can be used for preconditioning [17]. The preconditioned
system is solved via
L−T L−1 Ax = L−T L−1b, (2)
Hence, multi-grid preconditioners are also common for fluid flow problems, however,
Python does not support GPU direct implement similar to PyAMG on the CPU therefore
it was not included in our benchmark. The choice between banded LU and Cholesky
preconditioning depends on the structure and properties of the Jacobian matrix, Cholesky
preferred for symmetric positive-definite systems because of its computational efficiency.
In our model, we found that ILU is more stable for our Neumann boundary condition
pressure solver.
2.2.2. Adaptive Dynamic Stepping:
In the adaptive time-stepping scheme implemented within the simulation framework,
the trapezoidal method is utilized to preserve numerical stability without the need for the
implementation of CFL, particularly crucial in the context of stiff systems encountered
in multiphase flow simulations. The integration process begins with a prediction step
that takes advantage of the explicit Forward Euler method to approximate saturations at
the subsequent timestep, denoted as Spredictor. This preliminary prediction is calculated
according to the following equation:
( )
dS
Spredictor = Scurrent + ∆t · dt current
where Scurrent represents the current saturation state, ∆t signifies the current timestep,( )
and dSdt is the rate of change in saturation evaluated in the current state. Thiscurrent
predictor step serves as an initial approximation for the subsequent implicit corrector
step, which refines this estimate to enhance stability and accuracy. The corrector phase is
carried out using an implicit trapezoidal integration, where the corrected saturation Snext is
determined by resolving the nonlinear equation:
(( ) ( ) )
∆t dS dS
Snext − Scurrent − + = 0
2 dt current dt next
( )
Here, dSdt denotes the rate of saturation change evaluated in the next predictednext
state. This equation is solved iteratively using the Newton–Raphson method.
To dynamically adjust the timestep ∆t, an error estimation mechanism is incorporated
that compares the norm of the discrepancy between the predictor and corrector solutions
against a predefined tolerance. This adaptive time-step control is vital for managing
computational efficiency and accuracy throughout varying dynamics of the modeled
system. If the error estimate exceeds the tolerance threshold, ∆t is reduced to enhance
resolution; conversely, if the error is significantly below the tolerance, ∆t is increased to
expedite computations without sacrificing significant accuracy.In the context of dynamically
regulating the time step ∆t within numerical simulations, a sophisticated approach is
required to ensure stability and precision under varying conditions. The reduction factor
Water 2025, 17, 3351 7 of 20
for adjusting ∆t can be mathematically defined using an exponential function, providing a
smooth and continuous regulation mechanism. The reduction factor R for the time step ∆t
is given by the following equation:
R(∆t) = α + (β− α) exp(−γ∆t)
where α is the minimum reduction factor, β is the maximum reduction factor and γ is a
scaling parameter controlling the steepness of the exponential decay.
In this formulation, as ∆t increases, the reduction factor R approaches the lower bound
α, whereas as ∆t decreases, R approaches the upper bound β. To dynamically adjust the
time step ∆t, the new time step ∆tnew is defined as
( )
∆tnew = max ∆t× R(∆t), ∆tmin ,
where ∆tmin is the minimum allowable step size, preventing excessively small steps that
would be computationally inefficient. The stability constraints for ∆t are derived from the
spectral properties of the system’s Jacobian J, using the inverse of its maximum eigenvalue,
1 , and a more conservative bound, 0.1 . These limits ensure that timestep updates
λmax(J) λmax(J)
remain within a range that preserves the numerical stability of the integration method for
multiphase flow. All pseudocode listings for the above algorithms, Algorithms A1–A4, are
provided in Appendix A.
2.3. Neural Operator Architecture
We use a DB–AFNO neural operator that fuses spectral, spatial, and physics-based
information to predict mean saturation Ŝ ∈ RNx×Ny and variance N ×Nσ2 ∈ R x y . Static
geological properties—porosity ϕ(x), permeability k(x)—initial states Pinitial, Sinitial, and
phase viscosities µw, µnw are combined with temporal encoding via Time2Vec and a causal
transformer that conditions each step on the time index and ∆t. Fourier layers capture
the coupled space–time response, and training minimizes a composite loss that blends
data misfit with PDE residuals enforcing mass conservation and Darcy flow. In this way,
the network advances the state in time and returns high-resolution, physically consistent
pressure P and saturation S fields at arbitrary query times, as summarized in Figure 2.
Figure 2. physics-informed Fourier neural-operator framework integrating static geological inputs,
dynamic well controls, and temporal encoders with Fourier operator blocks to predict subsurface
pressure and saturation fields. Solid boxes denote trainable neural-network modules (encoders,
Fourier blocks, and decoder), while dashed boxes represent physical input and output fields.
Water 2025, 17, 3351 8 of 20
2.3.1. Data Preparation
The data preparation pipeline for the physics-informed Fourier Neural Operator
framework integrates simulation outputs—including grid parameters, K fields, well con-
figurations, initial saturation distributions, and fluid properties—stored in serialized files.
The raw data is reformatted for a consistent spatial representation; for example, the K tensor
is converted to an effective scalar field via magnitude computation. Temporal dynamics are
captured using a multi-frequency positional encoding with sine and cosine functions over
logarithmically spaced scales, which is then spatially broadcast and concatenated with static
input channels to form a comprehensive feature tensor. Global statistical measures (mean
and standard deviation) are computed over the entire dataset and applied to both input and
target fields to standardize the data, reducing numerical variability and aiding convergence
during training. Early time steps are replicated to better represent rapid transients. Over
1000 simulation runs at half fidelity were performed, with the data partitioned 80% for
training and 20% for validation, plus additional cases reserved for blind testing. These runs
span a wide range of heterogeneous SPE10 Model 2 realizations, as well as idealized cases
with spatially uniform permeability and porosity that are included explicitly in the training
set. Across all runs, we vary initial water saturations, viscosity ratios, well locations, and in-
jection schedules, resulting in substantial variability in flood-front shapes, breakthrough
times, and recovery factors. A statistical overview of the key petrophysical variables is
provided in Figure 3, and descriptive statistics are listed in Table 4, demonstrating that
the training set covers a broad portion of the oil–water displacement behavior within this
benchmark setting. This process is implemented using a modular Python framework built
on PyTorch (version 2.9.1, CUDA 12.6), supporting efficient file handling, dynamic sample
generation, and mini-batch processing for large-scale data, with the simulations executed
on hardware platforms equipped with RTX Titan GPUs.
Figure 3. Statistical analysis of key training parameters. Top row (left to right): permeability
distribution, log(Kx), in log(mD); porosity, ϕ (fraction); and initial water saturation, Sw (−). Middle
row: saturation evolution over normalized time, t/ttotal (blue curves: individual realizations; thick red
curve: ensemble mean S̄w); normalized well locations with X, Y ∈ [0, 1] (orange markers: wells; blue
frame: domain); and wetting-phase viscosity, µ (cP) (bars: histogram; dashed green line: smoothed
distribution). Bottom row: non-wetting-phase viscosity, µ (cP) (bars: histogram; dashed orange line:
smoothed distribution).
Water 2025, 17, 3351 9 of 20
Table 4. Statistical summary of petrophysical properties and fluid parameters. For each parameter,
N is the number of samples, µ is the mean, σ is the standard deviation, and Min/Max denote the
range of values.
Parameter Unit N µ σ Min/Max
Kx mD 2.89× 106 3.86× 102 1.16× 103 5.08× 10
−3/3.98× 104
ϕ - 2.89× 106 1.92× 10−1 8.60× 10−2 1.00× 10−1/5.00× 10−1
Sw,init - 2.89× 10
6 1.97× 10−1 5.81× 10−2 1.00× 10−1/3.00× 10−1
µ cP 8.75× 102 3.48× 10−1 8.61× 10−2 2.00× 10−1w /5.00× 10
−1
µ 2nw cP 8.75× 10 2.97 5.62× 10−1 2.00/4.00
2.3.2. Pure Data-Driven Approach
The model processes inputs X via:
( )
Z0 = InputProj X⊕F
−1(W⊙F (G)) ,
where F and F−1 denote Fourier and inverse Fourier transforms, W are learnable spectral
weights, and G represents learned Fourier features. Subsequent layers combine spectral,
spatial, and physics-guided paths:
( )
2
∑ (l)Zl+1 = GELU wk Pk(Zl) + Zl , l ∈ {1, ..., L},
k=1
where P1, and P2 correspond to spectral convolution and spatial depthwise convolu-
tion, respectively.
2.3.3. Dual Output Approach
In this physics-informed multi-task framework, the model jointly predicts transient
pressure (P) and saturation (S) fields governed by two-phase flow dynamics.
( ) ( )
K, wells(q), Si, Pi, µw, µnw, ϕ, t 7−→ Ŝ(x, t), P̂(x, t) .
The composite loss function is designed to enforce both data fidelity and the underly-
ing physical constraints through a dual-task formulation. It is defined as
( )
L = ∥Ŝ− S 2true∥ + ∥P̂− P ∥
2 + R2 2true α S +RP ,
︸ ︷︷ ︸
︸ ︷︷ ︸
Data terms
PDE residuals
where α is a weighting hyperparameter. The PDE residuals are expressed as
( )
∂Ŝ
RS = +∇ · f (Ŝ) v̂ ,
∂t
( )
RP = ∇ · λt(Ŝ)K∇P̂ − q.
λt(Ŝ)K
v̂ = − ∇P̂,
ϕ
In this formulation, pressure gradients drive the propagation of the saturation front
via the nonlinear interaction expressed by the product f (S)λt(S), while porosity ϕ modu-
lates the effective velocity in the aquifer. This dual-output approach, therefore, not only
minimizes discrepancies between the predicted and true fields but also rigorously enforces
the physical laws governing two-phase flow in porous media.
Water 2025, 17, 3351 10 of 20
For this coupling, handling the temporal part becomes a necessity. The temporal part
of the PDE was tackled by incorporating a temporal representation into the input features
with the goal of capturing both short-term variations and long-term trends. In our work,
we consider three complementary approaches:
Temporal Sinusoidal Encoding
Temporal sinusoidal encoding relies on fixed periodic functions to map a scalar time t
into a high-dimensional vector. At time step t, the encoding is defined as
[ ( ) ( ) ( ) ( )]
t t t t
Pt = sin , cos , . . . , sin , cos ,
λ1 λ1 λd/2 λd/2
where λ = 104i/di and d is the embedding dimension. This encoding is then concate-
nated with the static input features xt (e.g., permeability, well locations) and the previous
saturation St−1 to form
zt = [St−1, xt, Pt].
This approach, introduced in the Transformer architecture [18], remains a popular and
effective means to provide temporal context.
Time2Vec
Time2Vec is a learnable temporal encoding method that transforms the scalar time t
into a vector composed of one linear component and several periodic components. The en-
coding is given by
T(t) = [a0 t + b0, sin(a1 t + b1), . . . , sin(ad−1 t + bd−1)],
where ai and bi are learnable parameters. By learning these parameters from data, Time2Vec
is capable of capturing complex, nonlinear temporal patterns. This method has demon-
strated advantages over fixed encodings in various time-series applications [19].
Transformer-Based Temporal Encoding
Transformer-based temporal encoding leverages self-attention mechanisms to gen-
erate context-aware representations of time. Dedicated time-series transformer architec-
tures—such as the Temporal Fusion Transformer (TFT) [20] and the Time-series-specific
Transformer family [21–23]—often incorporate gating mechanisms and variable selec-
tion networks to enhance stability and interpretability. In this approach, the scalar-time
input is first projected into a high-dimensional space and then processed as a sequence
via a Transformer encoder. The resulting outputs are aggregated (e.g., by mean pool-
ing) to yield a dynamic embedding that effectively captures both local and global
temporal dependencies.
The model predicts both saturation (Ŝ) and pressure (P̂) fields via separate decoding
branches. Recognizing that the pressure signal is typically lower in magnitude, we apply a
learnable amplification factor γP (initialized at 2.0) to the pressure output:
P̂amp = γP P̂.
These outputs are then fused using a cross-gated mechanism that implements bidirec-
tional modulation: the pressure branch provides context for refining saturation, and the
saturation branch provides context for refining pressure. Concretely, we first construct
gating maps by passing each branch through a shallow convolutional block followed by a
Water 2025, 17, 3351 11 of 20
sigmoid nonlinearity, so that the gates lie in [0, 1] and can act as spatially and temporally
varying masks. The fused outputs are computed as
( )
Ŝfused = Ŝ + α σ C1(P̂amp) ⊙ P̂amp,
( )
P̂fused = P̂amp + β σ C2(Ŝ) ⊙ Ŝ,
where α and β are learnable scalar (or channel-wise) parameters that control the strength
of cross-branch contributions, σ(·) denotes the sigmoid function, and ⊙ is element-wise
multiplication. In this formulation, σ(C1(P̂amp)) acts as a data-dependent gate that se-
lectively injects pressure information into the saturation field, while σ(C2(Ŝ)) plays the
analogous role in the pressure branch. This cross-gated fusion allows the network to dy-
namically downweight noisy or less informative regions and to emphasize locations where
pressure and saturation must remain tightly coupled (e.g., near displacement fronts and
high-permeability channels). As a result, the model can better balance the relative contribu-
tions of pressure and saturation, leading to final predictions that are more consistent with
the underlying two-phase flow physics and less prone to spurious decoupling between the
two fields. final predictions adhere to the underlying physics of two-phase flow. Figure 4
shows a dual-branch, time-aware FNO architecture that splits shared spectral features into
separate saturation and pressure paths.
Figure 4. Schematic of the proposed multi-task, time-aware FNO architecture.
2.4. Training and Optimization
All models were implemented in PyTorch and trained on a single NVIDIA GPU.
Inputs were standardized per channel using statistics computed once on the training set;
saturation and pressure targets were standardized separately. Losses were computed in
standardized space, and predictions were de-standardized for reporting. The network uses
per-channel normalization in the spectral trunk and BatchNorm in the convolutional de-
coder; dropout is applied only in the pressure decoder (p = 0.2). A learned 64-dimensional
temporal/positional embedding of the scalar time/step is concatenated to the static inputs
and broadcast over the spatial grid. No geometric augmentation was used; to emphasize
early transients we oversampled early steps (t ≤ 10) by a factor of 2.
Water 2025, 17, 3351 12 of 20
Optimization used AdamW (initial learning rate 5× 10−3, weight decay 10−2). A
cosine-annealing schedule with warm restarts (CosineAnnealingWarmRestarts; T0 = 100,
Tmult = 2, ηmin = 10
−8) was stepped per optimizer update and not reset across curriculum
phases. Mini-batch size was 64, with global-norm gradient clipping at 2. The data term
was Smooth-L1 (Huber) with β = 0.01, applied to saturation and pressure with equal
weights. A physics-residual term (conservation and Darcy consistency) was weighted by
λphys(e) = exp(−e/25), where e is the global epoch index; a total-variation penalty on the
pressure head was available but set to zero in the main runs.
Training followed a staged curriculum over the simulation horizon. In the first
phase, the network was trained only on the earliest portion of each trajectory (up to
time index 25), and in subsequent phases the maximum time index was progressively
increased to 100, 500, and finally the full trajectory (∞), using 50, 50, 50, and 500 epochs
per phase, respectively. This schedule forces the model to first capture early-time
transients and sharp displacement fronts before fitting the later-time, quasi-steady-state
behavior over the entire horizon. A concise summary of the training and configuration
settings is provided in Table 5.
Table 5. Training and configuration settings.
Block Item Setting
Data Normalization z-score per channel; separate S, P
Model Temporal/Coord emb. 64-D time/step; grid (x, y, (z))
FNO Width/Layers W = 64, L = 8
FNO Activation/Residual GELU; skips
FNO Paths Spectral conv + depthwise conv
FNO Fourier modes (curric.) (2, 2, 2)→ (6, 6, 3)→ (18, 18, 8)
Decoders S/P S: iFFT→MLP; P: Conv, dropout p = 0.2
Fusion Gating/Amp. Cross-gated conv + σ; γP = 2.0
Optim. Batch/Optimizer 64; AdamW (lr = 5× 10−3, wd 10−2)
Optim. LR schedule Cosine restarts (T0 = 100, Tmult = 2, ηmin = 10
−8)
Optim. Grad clip Global norm = 2
Losses Data/Physics Huber (β = 0.01) on S, P; λphys(e) = exp(−e/25)
Protocol Sampling/Split Early t ≤ 10× 2; 80/20 train/val
Protocol Curriculum/CKPT tmax ∈ {25, 100, 500, ∞}; epochs {50, 50, 50, 500}; best val
3. Results
3.1. Numerical Validation
For numerical validation, we examine both single-phase and multi-phase flow sce-
narios. In the single-phase case, the formation is modeled as a homogeneous 64× 64× 1
grid with unit porosity and viscosities, and irreducible saturations (sw, snw) set to zero,
allowing full pore space utilization. Injection occurs at one corner and production at the
opposite. Unlike Aarnes et al. [24], who normalize the injection rate to one pore volume
of water, our non-normalized simulation reproduces the expected physics, as shown in
Figure 5 (A vs. B). For the multi-phase case, we use SPE10 Model 2 with an incompress-
ible two-phase system (µw = 0.3 cP, µnw = 3.0 cP, swc = snwr = 0.2). Figure 5 presents
breakthrough saturation distributions for the top (A1 vs. B1) and bottom layers (A2
vs. B2). Our model matches the reference, with minor discrepancies attributable to
numerical smoothing differences.
Water 2025, 17, 3351 13 of 20
Figure 5. Single- and multiphase saturation fields used for model validation. The saturation colormap
spans from 0 (blue) to 1 (red). Panels (A) and (B) compare single-phase simulations in a homogeneous
domain for our model and the reference solution from [24], respectively. Panels (A1,B1) show the top
layer of SPE10 Model 2 at breakthrough for our model and the reference, while panels (A2,B2) show
the corresponding bottom-layer saturation fields, demonstrating that our model reproduces the main
displacement fronts and flow patterns.
3.2. 2D FNO Results
We compare (i) a purely data-driven FNO baseline and (ii) a DB-AFNO with different
temporal encodings (Positional Encoding, Time2Vec, TFT/TST). Numerical simulations
serve as ground truth. We evaluate absolute error fields and spatio-temporal accuracy over
time, including blind tests on the top and bottom layers of SPE10 (Figure 6).
Figure 6. Distribution of ℓ∞ errors, ∥Ŝ− S∥∞, across normalized time t
∗ for early (t∗ < 0.25) and late
(t∗ ≥ 0.25) phases.
As shown in Figure 6, DB-AFNO with Time2Vec and TFT/TST yields lower early-time
errors (t∗ < 0.25), with more samples in the 10−2–10−1 range. For later times (t∗ ≥ 0.25),
Time2Vec and TFT/TST remain close; Positional Encoding improves over the purely data-
driven FNO but trails the former two. In Figure 7, each column corresponds to a different
injection time, and each row compares the ground-truth simulator, the conventional FNO
baseline, and DB-AFNO. The baseline FNO visibly diffuses the displacement front, particu-
larly along high-permeability channels, and in several time slices the advancing water front
Water 2025, 17, 3351 14 of 20
becomes disconnected or breaks into isolated patches. In contrast, DB-AFNO preserves the
continuity of the main displacement front and better reproduces the fingering patterns and
channelized advance observed in the simulator. This is most evident near breakthrough,
where DB-AFNO aligns both the front position and the saturation contrast across the
front, whereas the baseline front arrives too early and exhibits a smeared transition zone.
These qualitative differences are consistent with the error distributions in Figure 6, where
DB-AFNO has a higher proportion of predictions in the 10−2–10−1 error band at both early
and late times.
Figure 7. Saturation fields at multiple injection times. (Top): ground truth. (Middle): conventional
FNO. (Bottom): DB-AFNO. Red dashed boxes and circles in the middle row highlight oversmoothed
regions and spurious saturation artifacts produced by the conventional FNO, which are not present
in the DB-AFNO predictions. Per-panel error metrics corroborate DB-AFNO accuracy.
3.3. Zero-Shot Super Resolution
DB-AFNO is trained once—as an upscaler—on a single coarse/fine pair from the upper
five layers of SPE10: input lattice 30× 110× 5, target lattice 60× 220× 5. After convergence,
weights are frozen. At deployment, the coarse simulator supplies (Scoarse, Pcoarse), which
the operator upsamples
∣
( )
G : (Scoarse, Pcoarse
∣
θ , K, ϕ, µw, µnw, q, t) 7−→ Ŝ, P̂ ∣ .
Ω60×220×5
The FNO backbone includes a Cartesian coordinate-grid embedding (normalized (x, y, z)
appended per voxel) for absolute spatial context on any mesh. Training uses an incremental
multi-resolution curriculum: start with 2× 2× 2 retained Fourier modes on a 2× subsampled
grid, then raise spectral/spatial resolutions [(2, 2, 2)→(6, 6, 3)→(18, 18, 8)] once validation-
loss gradients plateau. In this zero-shot setup the full model (including gating and temporal
channels) converges in 100 epochs, with ≈1.5 GPU-min per case on a single RTX Titan.
With >90% of voxels below a 5% absolute-error threshold. Figure 8 shows 3-D render-
ings of Ŝ and |S− Ŝ|; fronts, channel connectivity, and high-saturation ridges are preserved
at 8× volumetric up-resolution. Figure 9 provides layer-wise scatter plots at the final time
step; similar behavior holds at other times. Results remain sensitive to the coarse grid:
using 15× 55× 5 did not reproduce the same accuracy.
Water 2025, 17, 3351 15 of 20
Figure 8. Zero-shot super-resolution on SPE10: saturation Ŝ and absolute error |S− Ŝ| at four times.
Total coarse→fine pipeline: 4.5 min.
Figure 9. Layer-wise comparison for L = 1:5 at 60 × 220 resolution. Scatter plots report R2, RMSE,
and MAE; red dashed line denotes perfect agreement.
Water 2025, 17, 3351 16 of 20
4. Discussion
Across the 2-D benchmarks, DB-AFNO consistently outperforms a purely data-driven
FNO. Error distributions in ℓ∞ norm show that Time2Vec- and TFT/TST-encoded variants
concentrate early-time errors (t∗ < 0.25) in the 10−2–10−1 band, whereas Positional Encod-
ing and the conventional FNO exhibit heavier tails (Figure 6). Near pseudo–steady state
(t∗ ≥ 0.25), Time2Vec and TFT/TST remain closely matched and continue to dominate
Positional Encoding. Visual comparisons corroborate these statistics: the baseline FNO pro-
duces over-smoothed fields with disconnected saturation artifacts, while DB-AFNO retains
sharp, physically plausible fronts and spatial connectivity (Figure 7). These gains arise from
(i) the explicit coupling of pressure and saturation through dual branches, (ii) temporal
conditioning that disambiguates early transients, and (iii) physics-informed losses that
penalize mass-balance and Darcy residuals.
For zero-shot super-resolution, a train-low, infer-high regimen enables coarse-to-fine
mapping without retraining. A single coarse/fine pair 30× 110× 5→ 60× 220× 5 is suffi-
cient to learn mesh-agnostic upscaling, aided by a Cartesian coordinate-grid embedding
for absolute spatial context and an incremental multi-resolution curriculum that increases
retained Fourier modes and spatial resolution only after validation plateaus. This curricu-
lum halves wall-clock training and reduces GPU memory by ∼40% relative to single-shot
high-resolution training, while the deployed pipeline (coarse TPFA + one forward pass)
achieves ∼3× speedup over full-order fine-grid simulation (Table 6). On five unseen layers,
we obtain R2 = 0.90, RMSE = 4.4× 10−2, and MAE = 3.2× 10−2, with >90% of voxels
below 5% absolute error. Three-dimensional renderings confirm preservation of water-
flood fronts, channel continuity, and high-saturation ridges at 8× volumetric up-resolution
(Figure 8), and layer-wise scatter plots indicate tight agreement at the final time step with
similar behavior at other times (Figure 9).
Table 6. Runtime benchmark for zero-shot upscaling vs. full-order TPFA on five unseen layers.
Step Wall-Time Comment
Full-order TPFA on 60 × 220 × 5 ≈15 min Ground truth
Coarse TPFA on 30 × 110 × 5 3 min Generates inputs
DB-AFNO zero-shot upscale 1.5 min Single forward pass
Total 4.5 min ∼3 × faster
Note: All runs on a single RTX Titan; identical simulator settings except grid resolution.
A key technical limitation of the zero-shot path is its dependence on the fidelity of
the coarse solution: the mapping presumes that the coarse grid resolves the dominant
transport features. When the input was degraded to 15× 55× 5, the same accuracy was
not recovered, indicating a lower bound on coarse resolution for reliable upscale inference.
Minor discrepancies in the validation figures are consistent with numerical smoothing
choices rather than architectural deficiencies.
Overall, the results indicate that (i) enforcing coupled physics with dual-branch decoding
and temporal encodings improves early-time accuracy and suppresses over-smoothing; (ii) the
operator generalizes to unseen layers within SPE10 when conditioned on physically meaningful
inputs; and (iii) zero-shot super-resolution provides accurate fine-grid fields at a fraction of the
computational cost, provided the coarse simulation captures the essential flow physics.
The DB-AFNO surrogates presented here are trained entirely on incompressible, im-
miscible two-phase displacements, and thus are most reliable for formations and operating
conditions with similar heterogeneity statistics, viscosity ratios, and flow regimes; all results
in this study pertain to such systems with no explicit gas phase or compositional effects.
When the underlying physics changes significantly—e.g., strong capillary hysteresis, gas
Water 2025, 17, 3351 17 of 20
evolution or gas caps, or fracture-dominated flow—retraining or transfer learning on sim-
ulations from the new regime would be required. While the dual-branch architecture is,
in principle, extensible to three-phase or compositional flow (e.g., via an additional output
branch and corresponding physics residuals), this lies beyond the scope of the present work
and is left for future investigation.
5. Conclusions
1. The proposed DB-AFNO framework achieves a significant reduction in computa-
tional cost compared to traditional full-order solvers while preserving high-resolution
prediction accuracy, as demonstrated on the SPE10 benchmark.
2. The dual-branch architecture effectively couples pressure and saturation fields, ensur-
ing that the predicted outputs remain physically consistent and adhere closely to the
governing PDE constraints.
3. The incorporation of advanced temporal encoding methods (e.g., Time2Vec, TFT/TST)
enhances the model’s ability to capture dynamic transient behaviors, resulting in
lower validation errors during both early transient and pseudo-steady states.
4. By integrating physics-informed loss functions, the model not only minimizes data-
driven prediction errors but also enforces adherence to the underlying flow physics,
thereby improving overall reliability.
5. The overall architecture demonstrates scalability to higher-resolution simulations,
given that it is trained on similar lower-fidelity cases.
6. Despite its high-fidelity performance within the training distribution, DB-AFNO-
based approaches may fall out-of-distribution when the underlying physics deviates
from those encountered during training.
7. The scalability of DB-AFNO was found to be similar to that of FNO-based models.
It is limited in scenarios involving sudden or drastic changes in governing physical
processes, indicating the need for adaptive re-training or architectural modifications
when the physics evolve beyond the original training regime.
8. Zero-shot super-resolution cuts the end-to-end turnaround for a 60× 220× 5 grid
from ∼15 min (full TPFA) to 4.5 min by upscaling a coarse 30× 110× 5 run with
a single forward pass, achieving R2 = 0.90 and RMSE = 4.4 × 10−2 without
additional fine-grid training.
Author Contributions: Conceptualization, H.A.H. and J.W.; methodology, H.A.H.; software, H.A.H.
and O.E.; validation, H.A.H., O.E. and J.W.; formal analysis, H.A.H.; investigation, H.A.H. and
O.E.; resources, H.A.H. and O.E.; data curation, O.E.; writing—original draft preparation, H.A.H.;
writing—review and editing, H.A.H., O.E. and J.W.; visualization, H.A.H. and O.E.; supervision,
J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the
published version of the manuscript.
Funding: The authors did not receive any specific grant from funding agencies in the public, com-
mercial, or not-for-profit sectors for the research, authorship, or publication of this work.
Data Availability Statement: The data used in this study are openly available in the SPE Comparative
Solution Project (Dataset 2, Model 2) at https://www.spe.org/web/csp/datasets/set02.htm (accessed
on 14 November 2025).
Conflicts of Interest: The authors declare no conflicts of interest.
Water 2025, 17, 3351 18 of 20
Abbreviations
2-D Two-dimensional
3-D Three-dimensional
AdamW Adam optimizer with decoupled weight decay
AFNO Adaptive Fourier Neural Operator
DB-AFNO Dual-Branch Adaptive Fourier Neural Operator
FNO Fourier Neural Operator
PINNs Physics-Informed Neural Networks
Time2Vec Learnable temporal encoding method
TFT Temporal Fusion Transformer
TST Time-series Transformer (family)
GCR Generalized Conjugate Residual
BiCGStab BiConjugate Gradient Stabilized
JF-BiCGStab Jacobian-Free BiCGStab
LU Lower–Upper factorization
ILU Incomplete LU
TPFA Two-Point Flux Approximation
CFL Courant–Friedrichs–Lewy condition
TV Total Variation
PDE Partial Differential Equation
PV Pore Volume
SPE10 10th SPE Comparative Solution Project
RMSE Root-Mean-Square Error
MAE Mean Absolute Error
R2 Coefficient of determination
GPU Graphics Processing Unit
CPU Central Processing Unit
CUDA Compute Unified Device Architecture
PyTorch Python-based deep learning framework
GELU Gaussian Error Linear Unit
cP Centipoise
mD Millidarcy
RL Reinforcement Learning
Appendix A. Algorithms and Pseudocode
Algorithm A1 Generalized Conjugate Residual (GCR) [16]
Require: A ∈ Rn×n, b, initial x0, tolerance ε
1: r0 ← b− Ax0, k← 0
2: while ∥rk∥ > ε do
3: p̃← rk
4: for i = 0 to k− 1 do
(Ap̃, Api)
5: βik ← − , p̃← p̃ + β p
(Api, Ap
ik i
i)
6: pk ← p̃/∥Ap̃∥
(rk, Apk)
7: αk ←
(Apk, Apk)
8: xk+1 ← xk + αk pk, rk+1 ← rk − αk Apk
9: k← k + 1
10: return xk
Water 2025, 17, 3351 19 of 20
Algorithm A2 Newton–Krylov with GCR linear step [25]
Require: Nonlinear residual F(x), Jacobian JF(x), initial x0, tolerance tol f , max iters kmax
1: k← 0, x ← x0
2: repeat
3: f ← F(x)
4: J ← JF(x)
5: Solve J ∆x = − f with GCR⇒ ∆x
6: x ← x + ∆x, k← k + 1
7: until ∥ f ∥∞ ≤ tol f or k ≥ kmax
8: return x
Algorithm A3 Adaptive trapezoidal time integration [26]
Require: ẋ = f (x, p, u), x(ts) = x0, ts < t f , initial ∆t, tolerances
1: t← ts, x ← x0
2: while t < t f do
3: Predict xpred ← x + ∆t f (x, p, u)
( )
4: Correct solve for xnew s.t. xnew − x− ∆t2 f (x) + f (x
new) = 0 (Newton)
5: Estimate error e← ∥xnew − xpred∥
6: if e small then
7: t← t + ∆t, x ← xnew
8: Update ∆t using controller (e.g., PI or exponential rule); enforce bounds
9: return trajectory {x(t)}
Algorithm A4 Jacobian-free BiCGStab [25,27]
Require: Residual F(x), current x, right-hand side b, finite-diff. step ε, tolerance tol
( )
1: function MATVEC(v) return F(x + εv)− F(x) /ε
2: δ← 0, r ← b− MATVEC(δ), r̂ ← r
3: ρ← 1, α← 1, ω ← 1, v← 0, p← 0
4: for k = 0 to kmax − 1 do
5: ρ ⊤new ← r̂ r
6: if |ρnew| < 10
−14 then break
7: if k = 0 then
8: p← r
9: else
10: β← (ρnew/ρ)(α/ω); p← r + β(p−ωv)
11: v← MATVEC(p); α← ρnew/(r̂
⊤v)
12: s← r− αv
13: if ∥s∥ < tol then
14: δ← δ + αp; return δ
15: t← MATVEC(s); ω ← (t⊤s)/(t⊤t)
16: δ← δ + αp + ωs; r ← s−ωt
17: if ∥r∥ < tol then return δ
18: ρ← ρnew
19: return δ
References
1. Sejnowski, T.J. The unreasonable effectiveness of deep learning in artificial intelligence. Proc. Natl. Acad. Sci. USA 2020, 117,
30033–30038. [CrossRef]
2. Raissi, M.; Perdikaris, P.; Karniadakis, G.E. Physics-informed neural networks: A deep learning framework for solving forward
and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 2019, 378, 686–707. [CrossRef]
3. Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys.
2021, 3, 422–440. [CrossRef]
Water 2025, 17, 3351 20 of 20
4. Lu, L.; Meng, X.; Yang, L.; Karniadakis, G.E. DeepXDE: A deep learning library for solving differential equations. SIAM Rev.
2021, 63, 208–228. [CrossRef]
5. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier neural operator for
parametric partial differential equations. arXiv 2020, arXiv:2010.08895. [CrossRef]
6. Kovachki, N.; Li, Z.; Liu, B.; Azizzadenesheli, K.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Neural operator: Learning maps
between function spaces. J. Mach. Learn. Res. 2021, 24, 4061–4157.
7. Li, Z.; Zheng, H.; Kovachki, N.; Jin, D.; Chen, H.; Liu, B.; Azizzadenesheli, K.; Anandkumar, A. Physics-informed neural operator
for learning partial differential equations. arXiv 2021, arXiv:2111.03794. [CrossRef]
8. Patel, R.G.; Trask, N.A.; Wood, M.A.; Cyr, E.C. A physics-informed operator regression framework for extracting data-driven
continuum models. Comput. Methods Appl. Mech. Eng. 2021, 373, 113500. [CrossRef]
9. NVIDIA Modulus Team. Darcy Flow with Fourier Neural Operator. 2024. Available online: https://docs.nvidia.com/
physicsnemo/25.08/physicsnemo-sym/user_guide/neural_operators/darcy_fno.html (accessed on 18 July 2024).
10. Wen, G.; Li, Z.; Azizzadenesheli, K.; Anandkumar, A.; Benson, S.M. U-FNO—An enhanced Fourier neural operator-based
deep-learning model for multiphase flow. Adv. Water Resour. 2022, 163, 104180. [CrossRef]
11. Cao, L.; O’Leary-Roseberry, T.; Jha, P.K.; Oden, J.T.; Ghattas, O. Residual-based error correction for neural operator accelerated
infinite-dimensional Bayesian inverse problems. J. Comput. Phys. 2023, 486, 112104. [CrossRef]
12. Ma, X.; Zhong, R.; Zhan, J.; Zhou, D. Enhancing subsurface multiphase flow simulation with Fourier neural operator. Heliyon
2024, 10, e38103. [CrossRef] [PubMed]
13. Jain, N.; Roy, S.; Kodamana, H.; Nair, P. Scaling the predictions of multiphase flow through porous media using operator learning.
Chem. Eng. J. 2025, 503, 157671. [CrossRef]
14. Society of Petroleum Engineers. 10th SPE Comparative Solution Project, Model 2. 2024. Available online: https://www.spe.org/
web/csp/datasets/set02.htm (accessed on 18 July 2024).
15. Buckley, S.E.; Leverett, M.C. Mechanism of fluid displacement in sands. Trans. Aime 1941, 146, 107–116. [CrossRef]
16. Demmel, J.W. Applied Numerical Linear Algebra; SIAM: Philadelphia, PA, USA, 1997. [CrossRef]
17. Golub, G.H.; Van Loan, C.F. Matrix Computations; Johns Hopkins Studies in the Mathematical Sciences; Johns Hopkins University
Press: Baltimore, MD, USA, 2013.
18. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In
Proceedings of the Advances in Neural Information Processing Systems 30 (NeurIPS), Long Beach, CA, USA, 4–9 December 2017.
[CrossRef]
19. Kazemi, S.M. Time2Vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321. [CrossRef]
20. Lim, B.; Zohdy, M.; Phan, H.; Lee, S.J. Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J.
Forecast. 2019, 36, 628–645. [CrossRef]
21. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Liu, R.; Shen, J.; Pan, G. Informer: Beyond efficient transformer for long sequence
time-series forecasting. arXiv 2021, arXiv:2012.07436. [CrossRef]
22. Xu, H.; Wu, W.; Jiang, P.; Ding, Y.; Lu, Z.; Li, X. Anomaly Transformer: Time Series Anomaly Detection with Association
Discrepancy. In Proceedings of the Tenth International Conference on Learning Representations (ICLR 2022), Virtual, 25–29 April
2022. [CrossRef]
23. Nie, Y.; Chi, X.; Xue, Y.; Liu, Y.; Zhang, T.; Pei, Y.; Wang, C.; Li, Y. PatchTST: Training back-bone is all you need for time-series
forecasting. arXiv 2022, arXiv.2211:14730. [CrossRef]
24. Aarnes, J.E.; Gimse, T.; Lie, K.A. An Introduction to the Numerics of Flow in Porous Media Using MATLAB; Springer: Berlin, Germany,
2007. [CrossRef]
25. Knoll, D.A.; Keyes, D.E. Jacobian-free Newton–Krylov methods: A survey of approaches and applications. J. Comput. Phys. 2004,
193, 357–397. [CrossRef]
26. Hairer, E.; Wanner, G. Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems; Springer Series in
Computational Mathematics; Springer: Berlin, Germany, 1996. [CrossRef]
27. Kelley, C.T. Solving Nonlinear Equations with Newton’s Method; SIAM: Philadelphia, PA, USA, 2003. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.