Systematically differentiating parametric discontinuities

Emerging research in computer graphics, inverse problems, and machine learning requires us to differentiate and optimize parametric discontinuities. These discontinuities appear in object boundaries, occlusion, contact, and sudden changes over time. In many domains, such as rendering and physics simulation, we differentiate the parameters of models that are expressed as integrals over discontinuous functions. Ignoring the discontinuities during differentiation often has a significant impact on the optimization process. Previous approaches either apply specialized hand-derived solutions, smooth out the discontinuities, or rely on incorrect automatic differentiation. We propose a systematic approach to differentiating integrals with discontinuous integrands, by developing a new differentiable programming language. We introduce integration as a language primitive and account for the Dirac delta contributions from differentiating parametric discontinuities in the integrand. We formally define the language semantics and prove correctness and closure under differentiation, allowing the generation of gradients and higher-order derivatives. We also build a system, Teg, implementing these semantics. Our approach is widely applicable to a variety of tasks, including image stylization, fitting shader parameters, trajectory optimization, and optimizing physical designs.

We propose a language for the automatic differentiation of integrals with discontinuities. Existing auto-diff frameworks require integrals to be discretized into summations prior to differentiation, and therefore lose the derivative contribution from discontinuities. Our method produces a statistically consistent derivative program by introducing integration as a language primitive, which allows us to differentiate discontinuities in continuous space, before discretizing them into summations over discrete samples.

INTRODUCTION
Automatic differentiation, now indispensable for optimization and machine learning, usually treats discontinuities (e.g., if-else branches) by ignoring them. Doing so is usually correct almost everywhere (in the technical sense). However, the situation is different when computing the derivative of an integral, when discontinuities of the integrand cannot be ignored. We propose a differentiable programming language that can soundly compute derivatives of integrals with discontinuities, producing correct results almost everywhere.
Parametric discontinuities in graphics. Emerging research that combines techniques from computer graphics, inverse problems, and machine learning often requires optimizing discontinuities. We often want to compute derivatives of integrals of discontinuous functions, of the form d/dt ∫ f(x, t) dx, where f is a program that can have parametric discontinuities. Parametric discontinuities are branching expressions containing free parameters (e.g., in foreach x: ((x < t) ? 1 : 0)). The derivative contribution of a parametric discontinuity is a Dirac delta function and usually integrates to a non-zero value. In graphics, these parametric discontinuities arise in rigid-body simulations that exhibit collision and contact phenomena; in rendered shapes that form silhouettes and shadows; and in geometry with corners and creases. Both integration and differentiation play critical roles in simulating, rendering, and optimizing these models.
For instance, the variational principle of least action defines physical trajectories as minima of an action integral, which may be applied to synthesizing animations and optimizing robot controllers [Stengel 1994]; rendering is defined as a multidimensional integral, and hence inverse rendering involves differentiating that integral [Li et al. 2018a]; and shape optimization uses derivatives of integrated geometric quantities [Hafner et al. 2019]. Combining deep learning models with these problems is only possible if we can compute derivatives through the discontinuities.
Prior work has recognized that ignoring the discontinuities during differentiation can have a significant impact on the optimization. Oftentimes, domain-specific solutions are employed to manually derive the correct derivatives (e.g., [Dyer and McReynolds 1968; Li et al. 2018a]). When adapting to new problems, new derivations are required. Our goal is instead to systematize these domain-specific methods, and to develop the foundation of a programming language.
Using our approach and language, we study applications spanning several fields in graphics. We write a 2D differentiable renderer for triangulating images, fit a procedural shader's parameters to a target image, solve frictional contact problems without smoothing, and optimize physical designs with discontinuous properties.
Minimal example. Ignoring discontinuities during differentiation is problematic. Consider the example from Fig. 1, I(t) = ∫₀¹ [x < t] dx, where [x < t] = 1 if x < t and 0 otherwise. This integral might represent the expected frequency of a t-biased coin flip coming up heads; or the fraction of a pixel covered by a triangle with position t; or the fraction of a time interval during which a motor is engaged, when it is turned off at time t. If we wish to optimize this parameter, then we need to take the derivative with respect to t. It is commonplace to first discretize integrals and only after that to differentiate them using automatic differentiation. However, existing auto-diff systems define d/dt [x < t] = 0. This is unfortunately incorrect in the presence of integration: the expected frequency of a t-biased coin coming up heads does change with the bias t. This is because the derivative of the step function [x < t] is a Dirac delta δ(x − t), which integrates to a non-zero value when both sides of the branch are visited in the integration domain. In general, the discretization of the differential of an integral is different from the differential of the discretization of an integral (Fig. 1). Ignoring the Dirac delta often leads to suboptimal results: in this example, the optimization will never move from the initial guess.
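The failure mode can be reproduced numerically in a few lines. The sketch below is our own illustration (not Teg code), assuming the midpoint discretization I(t) ≈ (1/n) Σᵢ [xᵢ < t]:

```python
# Discretize-then-differentiate vs. differentiate-then-discretize
# for I(t) = ∫₀¹ [x < t] dx = t, following Fig. 1.
def I_discretized(t, n=1000):
    # midpoint discretization of the indicator integral
    return sum(1.0 if (i + 0.5) / n < t else 0.0 for i in range(n)) / n

def dI_naive(t):
    # per-sample derivative d/dt [x_i < t] is 0 almost everywhere,
    # so discretize-then-differentiate reports a zero gradient
    return 0.0

def dI_delta(t):
    # differentiate-then-discretize: d/dt [x < t] = δ(x − t), and
    # ∫₀¹ δ(x − t) dx = 1 exactly when 0 < t < 1
    return 1.0 if 0.0 < t < 1.0 else 0.0

assert abs(I_discretized(0.3) - 0.3) < 1e-2  # I(t) = t
assert dI_naive(0.3) == 0.0                  # optimization never moves
assert dI_delta(0.3) == 1.0                  # matches the closed form d/dt t = 1
```

The naive derivative is identically zero, so gradient descent on t stalls at any initial guess, while the delta-aware derivative recovers the closed-form value.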
A systematic solution. We explore a systematic method for optimizing parametric discontinuities, by specifying the semantics of a new differentiable programming language that is provably correct and can generate higher-order derivatives. Our key idea is to differentiate, then discretize, inside our language. By making integration a language primitive and accounting for the Dirac deltas introduced by differentiating parametric discontinuities, our language computes the correct solution. Analytically eliminating Dirac deltas via an enclosing integral (the second step above) is difficult to guarantee systematically for arbitrary expressions inside Dirac deltas. We guarantee the correct treatment of Dirac deltas under suitable conditions: we require that the expression constituting each parametric discontinuity be represented using only differentiable, invertible functions (i.e., diffeomorphisms). We allow such functions to be explicitly defined by the programmer, or automatically inferred for certain common expression classes such as affine expressions.
In this paper, we prioritize the development of a clean core calculus that captures the interactions among derivatives, integrals, and discontinuities. Our language only supports static loop variables (it is not Turing-complete). The symbolic transformations used to eliminate the Dirac deltas are global and may lead to an asymptotic increase in expression size, scaling with the number of discontinuities. We provide a proof of correctness for the language and show that it is closed under the derivative operation, allowing for immediate application of the technique to higher-order derivatives. Finally, we build a system, Teg, that implements the semantics. In summary, our contributions are:
• We propose a systematic approach to differentiating integrals with discontinuous integrands, by including integration as a language primitive. We define the formal semantics, and prove correctness and closure under differentiation.
• We implement these semantics in a differentiable programming language, Teg, which supports both forward- and reverse-mode automatic differentiation.
• We show novel applications spanning fields in computer graphics. They include a method for fitting parametric discontinuities in inverse shader design, and a frictional contact solver that does not rely on smoothing the energy.
(Fig. 1 caption: (a) Traditional automatic differentiation approaches first discretize then differentiate because they do not have an integral primitive. This leads to incorrect results because it ignores the contribution of the Dirac deltas introduced by differentiating the discontinuities. (b) In contrast, we include an integral primitive in our language and account for the contribution of the Dirac deltas: Teg first differentiates then discretizes.)
To motivate our approach, we describe a few problems and how others have solved them in the past. We introduce only the necessary background in the next section, and detail additional related work in the section that follows (Sec. 3).

MOTIVATION AND BACKGROUND
We motivate the value of optimizing parametric discontinuities with examples. We describe them in our language, Teg, and show how the compiler produces correct code. Formal semantics and the proof of correctness are detailed in Sec. 4.
Notation and frontend. Teg is designed to mirror mathematical equations as closely as possible. Throughout the paper, when we show code in our language, we use syntax highlighting and a different font to distinguish it from the mathematical formulae. For example, the mathematical formula for the integral ∫₀¹ [x < 5] x² dx has a direct analogue in our language. We write this code in Teg's Python frontend library as Teg(0, 1, IfElse(x < 5, 1, 0) * x * x, x).

1D Example
We now explore more deeply the 1D integral parameterized by t from the introduction and Fig. 1: I(t) = ∫₀¹ [x < t] dx. Many graphics problems are more complex manifestations of this form, such as anti-aliasing [Mitchell and Netravali 1988], global illumination [Kajiya 1986], integration of certain ordinary differential equations, or simulating physics using the variational principle [Hamilton 1834]. The indicator function appears at, for example, object boundaries, occlusion, or sudden changes of force over time.
Our goal is to automatically compute the derivative dI/dt. We want to use the derivative for optimization, machine learning, or describing physical quantities. Later we show more realistic applications, including differentiable rendering and physics simulation. In this 1D case, we have a closed-form solution: dI/dt = [0 < t < 1].
Discretize-then-differentiate produces incorrect derivatives. Solutions to real-world problems often lack closed-form solutions, and computing derivatives by hand takes significant time. Therefore, we want to rely on automatic differentiation [Griewank and Walther 2008]. A typical compiler does not have integrals as a primitive. Therefore, to use automatic differentiation, we must first manually discretize the integral into a program, for example I(t) ≈ (1/n) Σᵢ₌₀ⁿ [i/n < t] = Ĩ(t). However, differentiating this program with respect to t gives the incorrect derivative of 0, contradicting our closed-form solution [0 < t < 1]. This is because the only dependency of the program on t is through the indicator [i/n < t]. The derivative of the indicator gives rise to a Dirac delta δ(i/n − t), which describes an infinitesimally thin spike at i/n = t but has value zero everywhere else. A delta can only be realized into a number through integration. Since there are no integral primitives in existing automatic differentiation tools, all of them ignore the delta. Fig. 2 shows a visual explanation.
We are not the first to observe this issue, and other researchers have proposed alternative solutions. In particular, our approach is closely related to recent advances in differentiable rendering [Li et al. 2018a; Ramamoorthi et al. 2007]. Dyer and McReynolds [1968] also derived such derivatives in the optimal control context. These works recognized that we could explicitly account for the Dirac deltas by evaluating the integrals over them. We systematize these approaches and incorporate them into a programming language.
Our approach: Differentiate-then-Discretize. The key idea is to first differentiate, then discretize, all inside the language. Similar to previous differentiable rendering work [Li et al. 2018a; Ramamoorthi et al. 2007], we base our compiler passes on the following properties of indicator functions, Dirac deltas, and integrals: differentiating an indicator yields a delta, d/dt [c(x, t) > 0] = δ(c(x, t)) ∂c/∂t; a delta δ(c(x, t)) can be rewritten as δ(u) under the variable substitution, or reparametrization, u = c(x, t) (with the corresponding Jacobian factor); and a delta annihilates with an enclosing integral, ∫ δ(u) g(u) du = g(0). Given an indicator [c(x, t) > 0] under differentiation, we first turn it into a delta δ(c(x, t)). We then apply reparametrizations to transform the delta into the form δ(u). Finally, we eliminate the delta using the last rule above. Therefore, for a Teg program d/dt ∫₀¹ [x < t] dx, the compiler differentiates by applying the following transformation: the first step normalizes the condition into the form t − x > 0; the second step moves the derivative operator inside the integral; we then transform the derivative of the indicator function into a Dirac delta δ(t − x); next, we apply the reparametrization u = t − x; finally, we eliminate the delta, leading to the same expression as our closed-form solution [0 < t < 1]. Sometimes the final expression can be another integral; Teg evaluates delta-free integral expressions by discretizing them.
Not all deltas δ(c(x, t)) can be easily eliminated, since they must first be reparametrized to the form δ(u) so that they may annihilate with an integral. To reparametrize (the second rule above), we need to find the inverse x = c⁻¹(u, t) in order to substitute the variables outside the delta. Automatically deriving such an inversion is hard. Our compiler automatically handles an important case: when c is affine. In 1D, it takes the form t₀x + t₁, where t₀ and t₁ can depend on parameters other than x. For other conditions, we provide a library of diffeomorphisms for users to apply, e.g., a polar-to-Cartesian transformation. Crucially, we allow users to define custom diffeomorphisms (bijectivity can be checked with numerical tests). Custom mappings make our language flexible, while also providing a correctness guarantee by decoupling the derivative transformation from the reparametrization.
(Fig. 3 caption: Our language can be used in many applications where one needs to differentiate integrals with discontinuous integrands. In differentiable rendering (a), each pixel is a 2D anti-aliasing integral, and geometry boundaries introduce discontinuities. The figure shows an example where we fit a triangle mesh with constant color within triangles to a target image, using the gradient of the loss with respect to the triangle parameters. In physics simulation (b), the variational least-action principle predicts motion by minimizing an action integral of the Lagrangian energy over time. The Lagrangian energy can contain discontinuities due to contact, sudden changes of force over time, or other physical properties. We can formulate a trajectory optimization problem that finds the path parameters of a bouncing ball by minimizing the action integral. Unlike standard approaches that integrate over ordinary differential equations, we do not need to choose a step size, and the contact point between the ball and the floor can be anywhere. Sec. 6 shows more examples.)
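The elimination step relies on the classical sifting property of the delta under an affine reparametrization: ∫ δ(a·x + b) f(x) dx = f(−b/a) / |a| for a ≠ 0. As an illustration of that identity (our own sketch, not Teg code), we can check it numerically by approximating the delta with a narrow Gaussian:

```python
import math

# Numeric sanity check of the sifting rule
# ∫ δ(a·x + b) f(x) dx = f(−b/a) / |a|, a ≠ 0,
# with the Dirac delta approximated by a narrow Gaussian.
def gaussian_delta(u, eps=1e-2):
    return math.exp(-0.5 * (u / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))

def integrate(fn, lo, hi, n=100000):
    # midpoint-rule quadrature
    h = (hi - lo) / n
    return sum(fn(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, -1.0
f = lambda x: x * x + 3.0
lhs = integrate(lambda x: gaussian_delta(a * x + b) * f(x), -5.0, 5.0)
rhs = f(-b / a) / abs(a)  # = f(0.5) / 2 = 1.625
assert abs(lhs - rhs) < 1e-3
```

The 1/|a| factor is exactly the Jacobian of the substitution u = a·x + b, which is why the compiler must track Jacobians when it reparametrizes deltas.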
In Sec. 4, we formally describe our language, compiler transformation passes, and correctness theorem, dealing with multiple conditions and integrals. We also ensure that our language is closed under differentiation; that is, the differentiated code is still a program expressible in our language. The closure property is crucial for higher-order derivatives. Before that, let us introduce two graphics applications that involve the differentiation of integrals.

Case Study I: 2D Differentiable Rendering
Rendering computes an image from a given set of geometric shapes and their shading color. Differentiable rendering [Li et al. 2018a;Loper and Black 2014], on the other hand, computes the gradient of pixel color with respect to scene parameters for inverse rendering or machine learning. Previous approaches either apply approximation [Loper and Black 2014], manually derive the gradient [Li et al. 2018a], or use finite differences [Lawonn and Günther 2019]. Here we show how Teg automatically differentiates a 2D rasterization program, outlined in Figure 3a.
Here we study a simplified rendering model, where the geometry is represented as 2D triangles and the color inside each triangle is constant. We want to fit the geometry and the color to a target image to produce a stylized image [Lawonn and Günther 2019]. For a pixel at (x, y) and for a triangle with color c and vertices v₁, v₂, v₃, the pixel color is an integral of c times an inside function, where, for convenience, we define the function inside as the product of three edge indicator functions. The value at each pixel is a sum of the contributions of all overlapping triangles. We want to compute the derivative of the pixel color integral (Program (3)) with respect to the vertices v. Therefore, Teg needs to handle the deltas that arise from differentiating the inside function. Using the same techniques as in the previous subsection, Teg automatically generates the derivative, which consists of three 1D integrals over the three edges of the parameterized triangle. The main difference is that we now transform multiple integrals. Consider the following double integral with an affine condition, I₂ = ∫∫ [a·x + b·y + c > 0] dx dy, and suppose we want to compute the derivative ∂I₂/∂c. After transforming the indicator into a Dirac delta during differentiation, our compiler automatically detects the affine condition pattern a * x + b * y + c and applies a bijective, rotational reparametrization; the compiler generates similar reparametrizations for higher-dimensional spaces. The resulting derivative integral is a double integral over new variables x′ and y′, where B^l_x, B^u_x, B^l_y, and B^u_y are new integral bounds derived from the reparametrization, and a² + b² is the Jacobian determinant of the transformation. From here, the compiler reparametrizes δ(x′ + c) into δ(x″) by applying x″ = x′ + c, and eliminates the delta and the corresponding integral over x″, leaving a 1D integral. In the triangle case, the same pattern applies, with a, b, c being parameters derived from the triangle vertices.
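A consequence of this delta calculus (a standard corollary, stated here as our own illustration rather than an equation from the text) is that ∂I₂/∂c over the unit square equals L / √(a² + b²), where L is the length of the line a·x + b·y + c = 0 clipped to the square. A finite-difference check, with hypothetical helper names:

```python
import math

# Finite-difference check of dI2/dc for
# I2(c) = ∫₀¹∫₀¹ [a·x + b·y + c > 0] dx dy.
def area(a, b, c, n=2000):
    # grid in x, exact inner integral over y (assumes b > 0)
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        y_lo = min(1.0, max(0.0, -(a * x + c) / b))  # condition holds for y > y_lo
        total += (1.0 - y_lo) * h
    return total

a, b, c = 1.0, 1.0, -1.0  # zero set is x + y = 1, clipped length L = sqrt(2)
eps = 1e-4
fd = (area(a, b, c + eps) - area(a, b, c - eps)) / (2.0 * eps)
predicted = math.sqrt(2.0) / math.sqrt(a * a + b * b)  # L / sqrt(a² + b²) = 1
assert abs(fd - predicted) < 1e-3
```

The finite difference is well-defined here only because the inner integral is evaluated in closed form; differentiating a fully discretized (sampled) indicator would again yield zero almost everywhere, which is precisely the problem Teg avoids.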
The derivative generated by Teg can then be composed with derivatives generated by other automatic differentiation systems, such as PyTorch [Paszke et al. 2019] or TensorFlow [Abadi et al. 2015]. Given a loss function L(I) over the pixel colors I, we can use the chain rule ∂L/∂θ = (∂L/∂I)(∂I/∂θ): Teg computes ∂I/∂θ, while other automatic differentiation systems can be used to compute ∂L/∂I.
Existing differentiable rendering methods. Recent work has developed different techniques to differentiate the rendering computation. The derivation above is similar to Li et al. [2018a], and is closely related to the Reynolds transport theorem [Li 2019; Li et al. 2020b; Zhang et al. 2019, 2020]. An alternative approach is to apply reparametrizations to remove the discontinuities [Loubet et al. 2019], which turns out to be equivalent to applying the divergence theorem to the derivative line integrals [Bangaru et al. 2020]. Other methods approximate the Dirac deltas by using image filters [Loper and Black 2014], or smooth over the discontinuities [Liu et al. 2019].
Teg mechanizes the Dirac delta computation as part of automatic differentiation, which allows us to generalize the approach. For example, in Sec. 6 we show results on optimizing a color model with linear or quadratic interpolation, and even on optimizing a texture generated by Perlin noise [Perlin 1985]. We can also apply it to problems outside of differentiable rendering, as we show next. However, our compiler currently does not support other strategies for dealing with discontinuities, such as smoothing or transforming boundary integrals into area integrals using the divergence theorem, nor does it currently scale to millions of objects (Sec. 7).

Case Study II: Physics-based Animation and Control
Composing derivatives and integrals to form an optimization problem is a common pattern in physics. For didactic purposes, we discuss one of the simplest possible problems: finding the trajectory of a bouncing ball (Fig. 3). We use a piecewise-linear model of the trajectory, with parameters b. The variational least-action principle [Hamilton 1834] predicts motion by finding the stationary points of the action integral S(b) = ∫ L(q, q_t, t) dt over time, where q is the 2D position of the ball (parameterized by b), q_t is its time derivative, and t is time. L is the Lagrangian of the physical system; in this case, it is composed of the potential energy −V (whose spatial derivative −V_q is the force) and the kinetic energy ½ m q_t². Intuitively, we find the path parameters b that make the action stationary.
In the case of the bouncing ball, the energy of the system is the gravitational potential when the ball's height q_y > 0, and a contact force pushes the ball back when q_y ≤ 0. For our piecewise-linear model of the trajectory q(t; b), we consider a sequence of control points at varying points in time, rather than at regular temporal intervals. Doing so allows our trajectory model to represent a sudden change in velocity at an arbitrary point and time, which is necessary to exactly capture our idealized inelastic collision.
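The shape of such a discontinuous Lagrangian can be sketched in a few lines. The constants and the quadratic contact-penalty potential below are our own illustrative assumptions, not the paper's exact model:

```python
# Sketch of a 1D bouncing-ball action integral with a potential that
# switches at ground height q = 0 (illustrative constants and penalty).
m, g, k = 1.0, 9.8, 1e4

def V(q):
    # gravitational potential above ground; stiff penalty potential in contact
    return m * g * q if q > 0 else 0.5 * k * q * q

def action(q, qdot, t0=0.0, t1=1.0, n=10000):
    # midpoint discretization of S = ∫ (½ m q̇² − V(q)) dt; in Teg's pipeline,
    # this discretization happens only AFTER differentiation
    h = (t1 - t0) / n
    return sum((0.5 * m * qdot(t0 + (i + 0.5) * h) ** 2
                - V(q(t0 + (i + 0.5) * h))) * h for i in range(n))

# a ball held at height 1 with zero velocity: S = −∫₀¹ m·g dt = −9.8
s = action(lambda t: 1.0, lambda t: 0.0)
assert abs(s + 9.8) < 1e-9
```

Differentiating the action with respect to the control points b moves the derivative through the q > 0 branch, which is exactly where the Dirac delta machinery of the previous sections is needed.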
This direct collocation trajectory optimization approach [Hargraves and Paris 1987] is different from standard approaches that differentiate forward simulators, usually called shooting methods. The standard shooting method requires pre-determining time step sizes for integration, leading to issues when the contact point can be anywhere. Trajectory optimization is used for designing animation, and in robotics for planning [Betts 2009; Popović et al. 2000]. In computer graphics, trajectory optimization with contact is usually handled by using a smooth contact model in the first place [Hunt and Crossley 1975], by incorporating inequality constraints and then smoothing the discontinuities [Mordatch et al. 2012; Todorov 2011], or by using sophisticated nonlinear programming solvers [Posa et al. 2014].
In Teg, we can define and differentiate physics problems with discontinuous energy, enabling optimization using gradients or higher-order derivatives. Notice that derivatives appear both inside the integral (q_t) and outside (though this can be problematic theoretically; see Sec. 7.2). This physics formulation extends beyond contact modeling: we can model forces that change suddenly over time, and we can model an elastic potential energy of the form ½kq², where the coefficient k depends on the length of the spring. In Sec. 6, we present additional applications in more detail.

RELATED WORK
We now provide more detailed references for related applications, systems, and languages.
Rendering and differentiable rendering. Rendering, physics-based or not, involves solving anti-aliasing integrals [Mitchell and Netravali 1988]. Physics-based rendering further formulates light transport as an integral equation [Kajiya 1986; Veach 1998]. Analytical derivatives have been derived for diffuse interreflection [Arvo 1994], shadows [Ramamoorthi et al. 2007], and spherical harmonics lighting [Wu et al. 2020]. Recently, there has been strong interest in the graphics, vision, and machine learning communities in building fully differentiable renderers. Some methods ignore geometry derivatives [Gkioulekas et al. 2013; Nimier-David et al. 2019], some approximate the Dirac delta contributions [de La Gorce et al. 2011; Kato et al. 2018; Loper and Black 2014], some smooth out the discontinuities [Liu et al. 2019], and some apply smooth postprocessing using the geometry buffer [Laine et al. 2020]. Meanwhile, other methods derive the correct derivatives using Dirac deltas [Li et al. 2018a], reparametrization [Loubet et al. 2019], or the Reynolds transport theorem [Bangaru et al. 2020; Li 2019; Li et al. 2020b; Roger et al. 2005; Zhang et al. 2019, 2020]. Differentiable renderers have also been used for inverse shader design [Guo et al. 2020; Shi et al. 2020]. Our language lays the foundation for a programmable differentiable rendering system. We have not yet included rendering-specific abstractions and data structures, which are necessary for applying Teg to large-scale rendering problems.
Variational physics, control, and shape modelling. Calculus of variations, the minimization of integrals over cost functionals, can often predict motion. A prominent example is Hamilton's least action principle [Hamilton 1834]. In graphics, robotics, and optimal control, this formulation has often been used for interpolating animation or planning [Barr et al. 1992; Betts 1998, 2009; Cohen 1992; Popović and Witkin 1999]. The same principle can be used in 2D or higher-dimensional spaces to find the surface or volume with least cost [Moreton and Séquin 1992; Welch and Witkin 1992], or to find contours in images. In trajectory optimization, fitting the trajectory generated by ordinary differential equations is often called the (multiple) shooting method [Geilinger et al. 2020; McNamara et al. 2004; Popović et al. 2000; Twigg and James 2008]. On the other hand, methods that fit the trajectory as splines are called direct collocation methods [Hargraves and Paris 1987]. A recent line of work aims to combine neural network controllers with simulators [de Avila Belbute-Peres et al. 2018; Holl et al. 2020; Hu et al. 2020; Um et al. 2020], or to directly model system dynamics using neural networks [Chen et al. 2018].
A common strategy to handle object contact is to introduce constraints and solve nonlinear programming problems [Betts 2009; Li et al. 2020a; Mordatch et al. 2012, 2013; Posa et al. 2014]. It is also possible to approximate contact using impulse-based physics [Hu et al. 2020; Mirtich 1996]. We explore an alternative formulation to this problem, by allowing certain discontinuous cost functions to be used in optimization. The mathematics of such formulations has been studied [Dyer and McReynolds 1968]. We further show the connection to differentiable rendering.
Domain-specific languages for computer graphics. Several graphics systems handle integration and differentiation: CONDOR [Kass 1992] proposed an interactive programming interface for solving constrained dynamics with automatic differentiation and interval arithmetic. Aether [Anderson et al. 2017] models Monte Carlo integration, and automatically computes the Jacobian of integral reparametrization. Recent graphics systems often include differentiation operations [Devito et al. 2017;Hu et al. 2020;Jakob 2019;Li et al. 2018b;Nimier-David et al. 2019]. In contrast, we study the interaction between integration, differentiation, and discontinuities.
Automatic differentiation and probabilistic programming. It is possible to automate derivative calculation by applying the chain rule to program expressions [Griewank and Walther 2008; Wengert 1964]. Reverse-mode automatic differentiation [Linnainmaa 1970] derives gradients that can be evaluated at the same time complexity as the original program, much faster than finite differences. There is revitalized interest in efficient auto-diff systems in deep learning [Abadi et al. 2015; Bradbury et al. 2018; Paszke et al. 2019; Yu et al. 2014] and probabilistic programming [Bingham et al. 2019; Stan Development Team 2015]. Auto-diff has also gained significant attention in programming languages, where researchers specify system semantics and prove the correctness of their systems [Elliott 2018; Lee et al. 2020; Mazza and Pagani 2021; Pearlmutter and Siskind 2008; Sherman et al. 2021]. We study the particular interaction among derivatives, integrals, and discontinuities. Inala et al. [2018] relax boolean expressions into real numbers through smoothing, and solve numerical problems by combining gradient descent and satisfiability solvers. We instead exploit the structure of integration.
Integration and differentiation are indispensable to Bayesian inference and probabilistic programming [Gehr et al. 2020; Kucukelbir et al. 2015; Lew et al. 2019]. Many probabilistic programming languages handle integrals, but most do not explicitly handle the differentiation of discontinuities. LFPPL [Zhou et al. 2019] studies discontinuities in the context of Hamiltonian Monte Carlo methods. Closest to our work is Lee et al.'s stochastic variational inference work [2018]. They observed that, when applying the reparametrization trick to non-differentiable models, existing approaches ignore the deltas, and they designed a language for stochastic variational inference. Their language focuses on infinite-domain integrals with affine discontinuities. We show a more general language with a correctness proof, closure, and bounded domains, and we generalize the class of discontinuities through diffeomorphisms. Importantly, we show the wide applicability of this class of approaches to many graphics problems.

SEMANTICS AND CORRECTNESS PROOF OF OUR LANGUAGE
We have motivated and discussed our system Teg. In this section, we formally define a simplified core language L, and prove the correctness of the compiler passes and closure under differentiation. Our formal semantics focus on forward-mode automatic differentiation. The derivation of reverse-mode is similar to that of forward-mode, but it requires adding let bindings (intermediate variables) to the core language, which we found distracting for the minimal language used in our proof. Our practical implementation supports both forward and reverse mode. We focus on the scalar case in this work; arrays and tensors are unrolled into scalars in our language.

Preliminaries
A context is a partial function from variable names to values, other variable names, or expressions. The empty context [] is undefined for all names. We will use contexts passed into mathematical functions as a way of specifying argument bindings without positions (e.g., the value of x is 3, of y is 42, etc.). We will also use contexts by applying them directly to expressions in order to perform variable substitution (e.g., [x ↦ y](2 * x + z) = 2 * y + z).
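The substitution behavior of a context can be made concrete with a toy expression representation (this mini-AST is our own illustration, not Teg's internal representation):

```python
# A context applied to an expression performs variable substitution,
# e.g. [x ↦ y](2 * x + z) = 2 * y + z. Expressions are nested tuples
# (op, lhs, rhs); variables are strings; numbers are literals.
def substitute(expr, ctx):
    if isinstance(expr, str):            # variable name: look it up, else keep
        return ctx.get(expr, expr)
    if isinstance(expr, tuple):          # (op, lhs, rhs): recurse on children
        op, lhs, rhs = expr
        return (op, substitute(lhs, ctx), substitute(rhs, ctx))
    return expr                          # numeric literal: unchanged

e = ('+', ('*', 2, 'x'), 'z')            # 2 * x + z
assert substitute(e, {'x': 'y'}) == ('+', ('*', 2, 'y'), 'z')   # 2 * y + z
assert substitute(e, {}) == e            # the empty context is the identity
```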
We will use the notation ì x as shorthand for a finite list of variables x 1 , x 2 , . . .. The expression y ∈ ì x means that y is a symbol occurring in that shorthand list.

Syntax and Denotational Semantics of L
We first present a minimal language L that demonstrates many of the key features of Teg; we discuss later how to extend this minimal language into our full language Teg. Below is the syntax, written in Backus-Naur form. In the indicator function [φ(ì x) > 0], the vector ì x lists the free variables, and the φ are invertible functions drawn from a class of functions that we discuss later (§4.5). For now, simply note that φ 1 is the component of φ that is branched on. We will generally suppress the specification of the free variables ì x. The bounds expressions a and b are expressions in L that do not include integrals or indicator functions, and must be independent of the variables of integration. In contrast, the arguments ì x to invertible functions inside indicator functions may depend on variables of integration.
The term f(e) corresponds to a built-in function, and f̂ is the mathematical function denoted. We require that f̂ is smooth and that the derivatives are syntactically defined. For example, cosine may be defined with the denotation [[cos(e)]] = cos([[e]]), and the derivatives are corecursively defined using sin.

Syntactic Sugar
We briefly address convenient extensions to the minimal language L. Via the built-in mechanism f(·), we add support for division and other common mathematical functions. More general comparisons φ(x⃗) > ψ(x⃗) can be reduced to the canonical formulation [[φ(x⃗) − ψ(x⃗) > 0]]. Conjunctions of conditions can be encoded as products of indicator functions, and disjunctions can be represented using the inclusion-exclusion principle.
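The boolean encodings above can be sketched concretely (our own illustration; `step` stands in for the indicator [[· > 0]]):

```python
# Encoding boolean structure with indicator functions, as described above.

def step(x):
    """Stand-in for the indicator [[x > 0]]."""
    return 1.0 if x > 0 else 0.0

def conj(a, b):
    """[[a > 0]] AND [[b > 0]] as a product of indicators."""
    return step(a) * step(b)

def disj(a, b):
    """[[a > 0]] OR [[b > 0]] via inclusion-exclusion: A + B - A*B."""
    return step(a) + step(b) - step(a) * step(b)

for a in (-1.0, 1.0):
    for b in (-1.0, 1.0):
        assert conj(a, b) == float(a > 0 and b > 0)
        assert disj(a, b) == float(a > 0 or b > 0)
```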
As presented, L does not have a way of binding intermediate variables. Our prototype includes let-bindings, although certain parts of the derivation (Sec. 4.4) may require inlining these bindings, which can lead to poor compile time in some cases.

Derivative Application
We define the source-to-source derivative application in terms of the derivative transformation D[[e]], where the context specifies a mapping from variables to be differentiated to their corresponding differential variable names (e.g., [x ↦ dx]).
Taking the derivative of indicator functions produces Dirac deltas, a construct not accounted for in our language L. Therefore, we extend the language to L′ as follows. Besides including the Dirac delta δ(φ(x⃗)), this extension L′ ensures that the resulting expressions are linear in the delta terms, and will also be linear in any differential terms.

Derivative application [[e]] follows two steps:
• A lifting derivative, which takes in a program in L and outputs a program in L ′ that may include Dirac deltas. • Delta elimination takes in a program in L ′ and outputs a program in L that is delta-free. It achieves this by having Dirac deltas either annihilate with the appropriate integrals or be set to zero if their contribution is of measure zero.
We now detail these steps in the derivative application.

Lowering: Delta Elimination. To achieve closure back to our original language L, we need to eliminate Dirac deltas using a series of rewrites.

Pass 1: Delta Normalization. These rewrites distribute multiplication and integration to move Dirac deltas up through the expression until they sit directly inside of integrals, at which point they can be fruitfully manipulated by the subsequent passes. An implementation can be more judicious in the use of the second rule above to reduce code duplication.
Applying these rewrites results in a normal form, where Σᵢ₌₁ᵏ eᵢ is syntactic sugar for the expression e₁ + . . . + e_k, and m and M are constants, because each term [[φ > 0]] contributes at most one discontinuity and there are finitely many such terms in an expression in L. Since the derivative is a linear operator, there is at most one delta in each product.
Pass 2: Delta Reparameterization. To eliminate the deltas, we need to first simplify them by jointly reparameterizing the enclosing integrals (Fig. 4). Without loss of generality, let x⃗ be all of the variables of integration x₁, . . . , xₙ, and let z⃗ be the other variables that parameterize φ. In the delta reparameterization rewrite, the new terms on the right (the substitution φ⁻¹ within e, the Jacobian term |J|, the new bounds of integration (a⃗′, b⃗′), and the mask M) are defined as follows.
As we discuss later (Sec. 4.5), φ specifies a map φ(z⃗) : x⃗ → y⃗ whose first output component y₁ is the value being passed to the Dirac delta. As a diffeomorphism, φ includes the inverse φ⁻¹(z⃗) : y⃗ → x⃗. From this inverse map we can define the substitution as mapping [xᵢ ↦ φᵢ⁻¹(z⃗, y⃗)] (for each i), where these φ⁻¹ components are reducible to expressions in our base language L. |J|(z⃗, y⃗) is the determinant of the Jacobian of φ⁻¹(z⃗) evaluated at the base point y⃗. This Jacobian determinant is reducible to expressions in our base language, just as φ⁻¹ is.
Finally, the bounds (a⃗′, b⃗′) are derived to enclose the image of the original domain of integration. The mask is derived simultaneously to ensure that only points mapped from within the original domain of integration contribute to the new reparameterized integral. Automatic means of deriving these bounds are discussed later (Sec. 4.5).
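The circle example from Fig. 4 can be sanity-checked numerically (a toy check of our own in plain Python): the area A(t) = ∫∫ [[t − x² − y² > 0]] dx dy of a disc of radius √t is π·t, so the delta contribution recovered by the polar reparameterization must equal dA/dt = π.

```python
import math

# Toy check of the Fig. 4 example. A(t) = ∫∫ [[t - x² - y² > 0]] dx dy is the
# area of a disc of radius √t, i.e. A(t) = π·t, so the delta contribution
# recovered by the polar reparameterization must satisfy dA/dt = π.

def disc_area(t, n=2000, lo=-2.0, hi=2.0):
    h = (hi - lo) / n
    area = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        if t - x * x > 0:
            # integrate each y-slice of the disc exactly (length 2·√(t - x²))
            # so that the finite difference below is stable
            area += 2.0 * math.sqrt(t - x * x) * h
    return area

t, eps = 1.0, 1e-4
fd = (disc_area(t + eps) - disc_area(t - eps)) / (2 * eps)
assert abs(fd - math.pi) < 0.2   # derivative equals the delta contribution, π
```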
Pass 3: Delta Annihilation. We now focus on simplifying a single expression in Eq. 7. Our aim is to match the variable of integration with the variable in the delta or to move the delta outside of the integral. The following rewrite rules realize this aim: the first rule annihilates the delta with the integral and adds the delta contribution if the discontinuity is inside the integration domain; the second commutes a delta and an integral if the delta expression is independent of the variable of integration. Each delta either annihilates with an integral or is removed in the following pass.
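The effect of delta annihilation can be seen numerically on the simplest example (a toy check of our own, not Teg code): differentiating ∫₀¹ [[t − x > 0]] dx in t yields δ(t − x), which annihilates with the integral to contribute 1 whenever t lies in (0, 1); a discretize-then-differentiate estimator instead sees derivative 0 at every fixed sample.

```python
# Delta annihilation checked numerically on d/dt ∫_0^1 [[t - x > 0]] dx.
# The delta contribution is 1 for t in (0, 1); per-sample differentiation
# of the step function gives 0 almost everywhere.

def integral(t, n=100000):
    """Midpoint quadrature of ∫_0^1 [[t - x > 0]] dx (exactly min(max(t, 0), 1))."""
    h = 1.0 / n
    return sum(h for i in range(n) if t - (i + 0.5) * h > 0)

t, eps = 0.3, 1e-3
fd = (integral(t + eps) - integral(t - eps)) / (2 * eps)
assert abs(fd - 1.0) < 1e-3   # matches the delta contribution, 1
```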
Pass 4: Measure zero destruction. All remaining deltas follow the structure of the second term in Eq. 7. We set all of the remaining deltas to zero because they do not depend on an integration variable, and thus are only non-zero over a measure-zero support.

Fig. 4. Delta reparametrization (source space (Euclidean) → Map 1 (polar) → Map 2 (translate), with conservative bounds transfer). This flow diagram explains a crucial step in our compiler pass for handling a Dirac delta δ(φ(x⃗)). Our key idea is to reduce this expression to a normal form (e.g., δ(y₁)) through a series of reparameterizations by composing diffeomorphisms. This can be thought of as rewriting the expression in a new coordinate system where the discontinuity is exactly coincident with an axis of integration. Here we show an example of reparametrizing a condition representing the area of a circle, x² + y² − t. We can transform the coordinates from Cartesian to polar, making it an integral over the angle and radius. Finally we can translate r′ = r − √t to obtain the simplified δ(r′), which can be automatically eliminated by Teg.

Reparameterization
We relied on certain properties of the functions φ(x⃗) that occur inside of step functions and Dirac deltas. These functions control the shape of discontinuities via diffeomorphisms (i.e., changes of coordinates) on the domain of integration. We rely on the following conventions while defining a diffeomorphism φ(z⃗, x⃗). We assume that φ depends on variables of integration x⃗, as well as other variables z⃗; we write [a⃗, b⃗] to define a multi-dimensional interval (axis-aligned box) with the given expressions as lower and upper bounds; and we write [[φ⃗ > 0]] to indicate a product of multiple component step functions.
The following five components are required for such a diffeomorphism to be well defined.
(1) Mapping: a smooth function φ(z⃗) : x⃗ → y⃗ specified as a collection of expressions eᵢ in L not using integration or indicator functions. In particular, φ₁(z⃗, x⃗) controls where a step function steps or a Dirac delta has a spike.
(2) Inverse: a smooth function φ⁻¹(z⃗) : y⃗ → x⃗ that is the inverse of φ, specified as expressions in L, similarly to φ.
(3) Jacobian: a smooth function |J|(z⃗, y⃗) representing the determinant of the Jacobian of φ⁻¹, specified as expressions in L, similarly to φ.
(4) Bounds transfer: a function B(z⃗) : (a⃗, b⃗) → (a⃗′, b⃗′) that specifies safe bounds for the reparameterized integral, i.e., bounds such that for all x⃗ ∈ [a⃗, b⃗], φ(z⃗, x⃗) ∈ [a⃗′, b⃗′].
(5) Bounds mask: a mask M that zeroes out any point of the reparameterized domain whose preimage under φ lies outside the original bounds [a⃗, b⃗].
While programmers may exploit the ability to specify all five of these components of a diffeomorphism, given (1) and (2) the system can automatically construct (3) the Jacobian by applying automatic differentiation, (4) the bounds transfer by applying interval arithmetic to φ, and (5) the bounds mask from φ⁻¹ and the original bounds. The automatic derivation of M is well-defined. First, φ⁻¹ is a diffeomorphism since φ is a diffeomorphism. Second, because bounds expressions a⃗ and b⃗ do not depend on the variables of integration (y⃗), offsetting by them is equivalent to offsetting by a constant value (with respect to variation of y⃗). Thus, the expression ψ = φ⁻¹ − a⃗ is a diffeomorphism with ψ(y⃗) = φ⁻¹(y⃗) − a⃗ and ψ⁻¹(x⃗) = φ(x⃗ + a⃗). In general, the problem of computing the inverse of functions is uncomputable, and the extension of a function f : R → R to f : Rⁿ → Rⁿ is underspecified. However, in some cases, we can automate even this part of the work. For instance, the case of affine expressions can be treated symbolically and robustly. Appendix A specifies the affine diffeomorphism.
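A sketch of component (4), bounds transfer by interval arithmetic, for the affine case (our own minimal illustration, not Teg's implementation; for an affine map the interval bounds are in fact exact, not merely conservative):

```python
# Bounds transfer via interval arithmetic for an affine map y = A x + c.

def affine_bounds(A, c, lo, hi):
    """Conservative image bounds of the box [lo, hi] under y = A x + c."""
    n = len(c)
    lo2, hi2 = [], []
    for i in range(n):
        L = H = c[i]
        for j in range(n):
            a = A[i][j]
            # an interval scaled by a negative coefficient flips its endpoints
            L += a * (lo[j] if a >= 0 else hi[j])
            H += a * (hi[j] if a >= 0 else lo[j])
        lo2.append(L)
        hi2.append(H)
    return lo2, hi2

# y1 = x1 - x2, y2 = x1 + x2 over the box [0,1] x [0,1]
lo2, hi2 = affine_bounds([[1, -1], [1, 1]], [0, 0], [0, 0], [1, 1])
assert lo2 == [-1, 0] and hi2 == [1, 2]
```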

Degeneracies
It is well known [Schwartz 1954] that products of Dirac deltas, indicator functions, and similar distributions are not always well defined or well behaved. It is possible to create such ill-defined expressions in our language. Here we characterize this set of ill-defined expressions.
One such problematic expression is the derivative of the integral of the square of a parameterized indicator function, d/dt ∫₀¹ [[t − x > 0]]² dx. Since [[t − x > 0]]² = [[t − x > 0]], the result of the derivative (after delta elimination) should be 1, as in the first example of this paper.

However, the derivative (pass 1) gives ∫₀¹ 2 [[t − x > 0]] δ(t − x) dx, which following delta annihilation (and some added simplification) becomes 2 [[t − t > 0]], which is 0 assuming t > t is false, or 2 if t > t is taken to be true. In general, it is not possible for a compiler to determine whether or not such degeneracies exist, since function equivalence is undecidable. This situation is not that different from the presence of singularities when using a conventional automatic differentiation system: the automatic derivative of numerically stable code is not necessarily numerically stable. However, we can make certain guarantees about the correctness of our automatic differentiation when we can assume that discontinuities occur in general position with respect to each other. A proof sketch is included in Appendix B.
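The degeneracy can be checked numerically (a toy check of our own in plain Python): the true derivative is 1, while the ill-defined symbolic rewrite yields either 0 or 2.

```python
# Since [[t - x > 0]]² = [[t - x > 0]], the true derivative of
# ∫_0^1 [[t - x > 0]]² dx is 1 for t in (0, 1); the naive symbolic product
# rule instead yields 2·[[t - t > 0]], i.e. 0 or 2 depending on convention.

def integral_sq(t, n=100000):
    h = 1.0 / n
    return sum(h * (1.0 if t - (i + 0.5) * h > 0 else 0.0) ** 2 for i in range(n))

t, eps = 0.5, 1e-3
fd = (integral_sq(t + eps) - integral_sq(t - eps)) / (2 * eps)
naive_candidates = (0.0, 2.0)   # outcomes of the ill-defined symbolic rewrite
assert abs(fd - 1.0) < 1e-3
assert all(abs(fd - c) > 0.5 for c in naive_candidates)
```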

IMPLEMENTATION
In this section, we briefly describe the implementation and relevant additions to the core semantics presented in Sec. 4.

Language features. For clarity of exposition, our semantics are limited to a minimal set of key primitives. Our implementation, on the other hand, supports a wider range of common features.
We support let bindings to allow for function abstraction and to avoid unnecessary code replication. Our implementation also allows for the creation and projection of tuples, which can be used for static, fixed-size arrays. Since we do not currently implement integer types, these arrays cannot be dynamically indexed. Results are returned as Python lists, allowing outputs to be manipulated directly in Python.
Additionally, we implement reverse-mode differentiation using a standard source-to-source approach. All compiler passes such as those that manipulate Dirac deltas are shared with the forward derivatives as described in detail in the semantics (Sec. 4.4).
Although the implementation lacks looping facilities, it is easy to meta-program Teg in Python to, for example, compute the integral for all of the pixels in an image.
Execution targets. We embed Teg in Python. The Python code can then be lowered to an intermediate representation (IR), where integrals (Teg) are discretized to for loops with quadrature. The IR is further converted to a C header file that can be inserted into larger projects, or imported as a Python module. Teg expressions can be evaluated in either NumPy or compiled to CUDA device code.
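As a minimal model of this lowering (our own illustration; Teg's actual IR and quadrature scheme may differ), an integral becomes a midpoint-rule loop:

```python
def lower_integral(integrand, a, b, n=1000):
    """Model of discretizing ∫_a^b integrand(x) dx into a midpoint-rule loop,
    as the IR lowering described above does (illustrative sketch only)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        total += integrand(a + (i + 0.5) * h) * h
    return total

# ∫_0^1 x² dx = 1/3
approx = lower_integral(lambda x: x * x, 0.0, 1.0)
assert abs(approx - 1.0 / 3.0) < 1e-5
```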
Numerical validation. We implement a test suite of 95 integral expressions to test both execution targets against finite differences. See the supplementary code for the test cases.

APPLICATIONS
We apply Teg to applications spanning several domains in computer graphics and showcase the benefits of automatically accounting for parametric discontinuities during optimization.

Image triangulation
Given an image, we want to generate a stylized version composed of a triangulated pattern [Lawonn and Günther 2019; Yun 2013]. We can formulate the problem similarly to the 2D differentiable rendering example in Sec. 2.2. Eq. 3 from Sec. 2.2 lays out the rendering model for the constant-color image triangulation problem. Here, we further extend the example to more elaborate shading models.

Fig. 6. Image triangulation. Given a target image (first column), we optimize the position and color of triangles to fit a stylized version of the image (second column) with different color interpolation schemes (Fig. 5): constant, linear, and quadratic. We initialize the triangles as a grid mesh and set all colors to black. We compare to an optimization that ignores the Dirac delta terms of the derivatives (third column); this is equivalent to using a traditional automatic differentiation system to produce the derivatives. In the constant color case, ignoring the delta term makes the position derivative always zero, so no triangles are moved. Even in the smooth case, ignoring the delta terms leads to artifacts, such as oversmoothing in the linear case and deviation from sharp corners in the quadratic case.

Fig. 7. We compare the loss convergence in the image triangulation task in Fig. 6 between our approach (orange) and an approach that ignores the Dirac delta terms in the derivatives (blue). (a) shows the linear color interpolation case in the second row of Fig. 6, while (b) shows the quadratic color interpolation case in the third row of Fig. 6. Since ignoring the delta derivative leads to a biased estimator, it converges to a worse solution.

Fig. 8. Differentiating thresholded Perlin noise textures. Instead of reparameterizing the higher-order polynomials formed by the Perlin noise, we solve a different problem by evaluating the Perlin noise at a discrete grid (a) and reconstructing it using bilinear interpolation (b). We then apply thresholding on the bilinear interpolation (c). The bilinear interpolation still leads to deltas with non-affine arguments. To eliminate the deltas, we apply a diffeomorphism between Cartesian coordinates and hyperbolic coordinates (d). This transforms the deltas into simpler affine conditions (e).

Fig. 5 demonstrates the three different shading methods with increasing parameter complexity: (1) Constant. A triangle is assigned a single color independent of the position (x, y) (Eq. 3).
(2) Linear. We define three colors Cᵢ, one for each vertex of the triangle. We then use barycentric interpolation to compute the value at a given continuous (x, y) point. In Teg, this can be expressed as lin_color = w₁C₁ + w₂C₂ + w₃C₃.
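The barycentric interpolation in the linear model can be sketched in plain Python (standard formula; the vertex positions and colors below are made-up examples, not the paper's data):

```python
# Barycentric color interpolation for the linear shading model.

def barycentric(p, a, b, c):
    """Barycentric weights (w1, w2, w3) of point p in triangle (a, b, c)."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    w2 = ((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1])) / det
    w3 = ((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1])) / det
    return 1.0 - w2 - w3, w2, w3

def lin_color(p, verts, colors):
    """lin_color = w1*C1 + w2*C2 + w3*C3, per RGB channel."""
    w = barycentric(p, *verts)
    return tuple(sum(wi * ci[k] for wi, ci in zip(w, colors)) for k in range(3))

verts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# The centroid mixes the three vertex colors equally.
r, g, b = lin_color((1 / 3, 1 / 3), verts, colors)
assert all(abs(ch - 1 / 3) < 1e-9 for ch in (r, g, b))
```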
Results. Fig. 6 shows some optimization results with an L2 loss using Adam [Kingma and Ba 2015], using derivatives automatically generated by Teg. We compare against the gradients obtained when ignoring discontinuities, i.e., the incorrect discretize-then-differentiate approach from Sec. 2.1. We run 150 iterations for both approaches over all images. Fig. 7 shows the convergence plots for both approaches. We compiled to CUDA kernels and ran on an RTX 2080 Ti. For the constant fragment case, each full iteration took 0.11s to complete. The more complex linear and quadratic fragment programs took 1.01s and 2.05s per iteration, respectively. All reported timings are for a 1200 × 800 image with approximately 2000 triangles. The constant and quadratic fragment programs had peak memory usage of 177 and 180.8 megabytes, respectively.

Fig. 9. Guided Perlin textures. We optimize for the parameters of two shaders based on Perlin noise. The shader in (a), a colorized Perlin shader, uses the Perlin noise value to decide between using a flat gray color and a low-resolution color map. We keep the decision threshold value fixed and optimize for the color map as well as the noise vectors of the Perlin grid. Ignoring the delta contribution leaves the noise pattern unchanged, while our approach produces a noise pattern that adheres to the logo structure. The shader in (b), a two-tone Perlin shader, uses two colors instead of a color map. We optimize for the grid vectors, the two colors, and the decision threshold value. Without the delta contribution, only the threshold value and colors have non-zero derivatives, which is insufficient to create a structural pattern.

Guided Perlin textures
We fit the parameters of a discontinuous procedural shader to a target image. A common pattern in procedural shaders is to threshold Perlin noise [Perlin 1985]: the noise function is compared to a threshold to decide which color to use, producing textures with segmented regions. The thresholding produces complex discontinuities that require special treatment.
The noise function produced from Perlin noise is at least a 4th-degree polynomial in two variables. Eliminating the resulting delta expression thus requires finding a diffeomorphism from this space to a space where the delta expression is affine. Solving the polynomial system in a numerically robust way is challenging. Fortunately, there is an easier formulation that avoids this problem (Fig. 8).
We evaluate the Perlin noise function at discrete positions on a grid. We compute the value at continuous points through bilinear interpolation of the four closest grid points: noise(x, y) = (1 − u)(1 − v) N_{i,j} + u(1 − v) N_{i+1,j} + (1 − u)v N_{i,j+1} + uv N_{i+1,j+1}, with i = ⌊x⌋, j = ⌊y⌋, u = x − i, v = y − j, where the N_{i,j} are values computed by the noise function. Importantly, the continuous noise function is piecewise bilinear, which is neither linear nor affine.
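The bilinear reconstruction can be sketched directly (standard formula; the 2×2 grid values below are arbitrary illustration, not Perlin noise output). Note that the u·v cross term is exactly what makes the reconstruction bilinear rather than affine:

```python
# Bilinear reconstruction of gridded noise samples N[i][j].

def bilerp(N, x, y):
    i, j = int(x), int(y)
    u, v = x - i, y - j
    return ((1 - u) * (1 - v) * N[i][j] + u * (1 - v) * N[i + 1][j]
            + (1 - u) * v * N[i][j + 1] + u * v * N[i + 1][j + 1])

N = [[0.0, 1.0], [2.0, 3.0]]
assert bilerp(N, 0.0, 0.0) == 0.0               # reproduces grid values
assert abs(bilerp(N, 0.5, 0.5) - 1.5) < 1e-12   # average of the 4 corners
```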
To use thresholded noise to produce a two-color shader, we arrive at an expression that selects color C⁺ if the noise is above the threshold T, and C⁻ otherwise. The condition [[noise − T > 0]] is of the general form αxy + βx + γy + c, which traces an off-axis rectangular hyperbola in x and y. This is not the common affine pattern, so we reparametrize the delta expression through two diffeomorphisms: (i) scale and translate to move the rectangular hyperbola to the center, (x, y) ↦ (x′, y′) (Fig. 8(d)); (ii) convert from Cartesian to hyperbolic coordinates, (x′, y′) ↦ (u, v) (Fig. 8(e)). Note that there are two possible values for u. This is because the delta expression in this new space takes the form δ(u² − c), which is equivalent to (1 / (2√c)) (δ(u − √c) + δ(u + √c)). Therefore, there are two hyperbolic spaces, which correspond to each of the two delta expressions in the new space, as shown by Fig. 8(c).
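The delta-splitting identity used above can be sanity-checked numerically by approximating the delta with a narrow Gaussian (our own check, not part of Teg):

```python
import math

# Numeric check of:  ∫ f(u) δ(u² - c) du = (f(√c) + f(-√c)) / (2√c).

def gauss(t, s=1e-3):
    """Narrow Gaussian approximating δ(t)."""
    return math.exp(-0.5 * (t / s) ** 2) / (s * math.sqrt(2 * math.pi))

def lhs(f, c, n=200000, lo=-3.0, hi=3.0):
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        u = lo + (i + 0.5) * h
        total += f(u) * gauss(u * u - c) * h
    return total

f = lambda u: u * u + 1.0
c = 2.0
rhs = (f(math.sqrt(c)) + f(-math.sqrt(c))) / (2.0 * math.sqrt(c))
assert abs(lhs(f, c) - rhs) < 1e-2
```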
Results. Fig. 9 shows the optimization results with an L2 loss using Adam [Kingma and Ba 2015], using the shader discussed here with a spatially varying color map C⁺(x, y) instead of the uniform color C⁺. We compare against the gradients obtained when ignoring the derivative contribution of discontinuities. The noise structure cannot be optimized when the delta contributions are ignored. We run the optimization for 300 iterations for both approaches. On an RTX 2080 Ti, each iteration took 0.033s and used 135 megabytes of memory for 400 × 400 noise and guide images and a 40 × 40 Perlin vector grid.

Trajectory optimization with contact
Consider a minigolf task ( Fig. 10 and 11), where we want to hit a hole-in-one from a starting position, while hitting a pre-specified sequence of walls along the way. We want to find a path q(t) for the ball satisfying its equations of motion and additional boundary conditions, including: q(t 0 ) is at its start location and q(t 1 ) is at the hole, for start and end times t 0 and t 1 . However, this problem is complicated because of discontinuities in the velocity trajectories at points of contact, and contact times that are not known a priori.
Using the methodology introduced in Sec. 2.3, we can optimize the trajectory in Teg without resorting to complicated machinery such as function smoothing or non-linear programming. We use a Lagrangian energy with a frictional term [Bateman 1931], where m is the mass of the ball, q_t is the time derivative of the position q, and k is the frictional coefficient. The position, q(t), is parametrized as a linear spline. The potential energy, V, contains the gravitational force m · g · q.y. Following Sec. 2.3, we can incorporate elastic contact via a high potential within the barriers, m · C · H_c(q), where H_c is a function that is positive only inside the barrier, such that the contact force is only applied when the object interpenetrates a wall (C is set to a large number). We can also have walls that move with time; the windmill in Fig. 10 is one such example, where the blade of the windmill blocks the golf ball only at certain timeframes. Our solution paths thus correspond to finding stationary points of the action S = ∫ L. We initialize the optimizer with a non-physical path that contacts the input wall sequence, so that the system finds a physical path with the desired contact behavior.

Fig. 10. Minigolf. We use our language to search for hole-in-one trajectories in a minigolf game. Given an initial guess of the trajectory (the orange path), we optimize for path parameters that minimize the action integral over a Lagrangian with friction and contact forces. The walls and the windmill blades are potential contact points where the velocity of the ball is discontinuous. We solve the path using a second-order trust-region-based Newton method. The two paths are found using slightly varying setups: one with k = 0.4 hitting no walls and one with k = 0.2 hitting the bottom-left wall.

Fig. 11. Double minigolf ((a) ours; (b) without delta). We solve a similar problem to Fig. 10 with two golf balls on a flat surface that can collide with each other. We compare to a solution that ignores the discontinuity derivatives. Our solution obeys the law of reflection; ignoring the delta contribution from the discontinuities leads to non-physical behaviors that violate Newton's first law.
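To make the action setup concrete, here is a 1D toy (our own simplification, not the paper's minigolf scene): a linear-spline path q(t), gravity, and a stiff contact potential m·C·H_c(q) that is active only inside a wall at q < 0.

```python
# Discretized action S = Σ (T - V)·dt for a 1D path with a contact barrier.

def action(qs, dt, m=1.0, g=9.8, C=1e4):
    S = 0.0
    for k in range(len(qs) - 1):
        q = 0.5 * (qs[k] + qs[k + 1])    # midpoint of the spline segment
        v = (qs[k + 1] - qs[k]) / dt     # constant segment velocity
        H_c = max(0.0, -q)               # positive only when inside the wall
        V = m * g * q + m * C * H_c      # gravity plus contact potential
        S += (0.5 * m * v * v - V) * dt  # Lagrangian L = T - V
    return S

free = action([1.0, 0.5, 0.2], dt=0.1)   # path stays above the wall
hit = action([1.0, 0.0, -0.4], dt=0.1)   # path penetrates the wall at q < 0
assert hit < free   # penetration activates the large contact potential
```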
Due to the size of the scenes and to avoid numerical instability, we parameterize q(t) via a set of generalized coordinates that implicitly constrain the path to contact our given input walls. Note that this choice of coordinates only simplifies the optimization; our problem is still the same -we want to solve for where and when the contact points are, wherein velocities are discontinuous.
We set up two scenes for experimentation: in one scene we have a hill and a windmill, and in the other scene we make two golf balls collide with each other but ask both of them to reach the goal. For the windmill scene we set the friction coefficient to k = 0.2 and 0.4 for two different runs, and for the double minigolf scene we set k = 0.2. Fig. 10 and 11 show the results. We use a second-order trust-region-based Newton method implemented in scipy for optimization. We perform a coarse-to-fine optimization strategy by starting with a low-resolution trajectory, then gradually refining it to higher resolutions. The final resolution is around 10 control vertices between each collision. The method converges quickly with a tolerance of 10⁻⁸ to 10⁻¹². In our setting, the collision events need to be known a priori, but the collision position can be anywhere.

Fig. 12. Optimizing a discontinuous bungee.

Fig. 13. Bungee results ((a) stress-strain curves; (b) loss convergence). We show the stress-strain curves and loss for optimizations using finite differences, ignoring discontinuities, and using Teg. The discontinuous stress-strain curve leads to Dirac deltas during differentiation. This makes it difficult to tune finite-difference step sizes, and ignoring the deltas leads to suboptimal results. Our final loss is 18 times lower than when ignoring deltas.
We also compare to an optimization that ignores the derivatives coming from the discontinuities in Fig. 11. It clearly converges to a non-physical result, showing the importance of incorporating the Dirac delta terms. The optimization uses a single thread, takes (on average) 1.17s per iteration, and uses 779.7 megabytes of memory on an Intel Core i9-9900K.

Optimizing a discontinuous bungee
Now we discuss the optimization of a physical design in the presence of discontinuities. Suppose we want to design a spring system for a bungee jump (Fig. 12). The motion of a spring is traditionally modeled using Hooke's law, m·a = m·g − k·x, where m is mass, g is gravity, and k is the spring constant, which models the stiffness of the spring. The term k·x assumes that the material does not deform or lock and is not part of a composite. For example, metal may bend, fracture, or exhibit structural transitions under extreme heat or cold [Hibbeler 2000; Lin et al. 2017; Tabin et al. 2016]. Composite materials such as rebar concrete (steel-reinforced concrete) exhibit non-linear stress-strain curves corresponding to the transitions between materials [Hibbeler 2000]. Woven materials such as yarn may lock, producing discontinuities in strain. In these cases, the term k·x is replaced by a stress function s(x) that is discontinuous, making the system analytically intractable. A natural way to estimate the solution is by using numerical methods.
We study an idealized series of bungee cords that deform. After deformation, a string prevents further extension of the spring. We jointly minimize the time and acceleration of a person connected to this bungee-string system. We optimize the spring constants k₁, k₂ and the lengths of the strings l₁, l₂ (Fig. 12). We add the hard constraint that the person does not hit the ground to prevent death.
Derivation. With two bungee-string links, the stress function s(x) is piecewise in the total displacement, where β is the plastic deformation factor of the bungees, and x₁ and x₂ are the individual displacements of the two springs that sum to the total displacement x: x = x₁ + x₂. Also k₁x₁ = k₂x₂, since the springs are massless and the force at the bottom of the first spring must balance the force at the top of the second spring. The corresponding second-order autonomous differential equation, m·ẍ = m·g − s(x), cannot be solved by analytical means due to the discontinuities. If we let v = dx/dt, then since ẍ = dv/dt = v·dv/dx, we have: v dv = (g − s(x)/m) dx. Dividing by m, integrating both sides of the equation assuming the system is initially at rest, and solving for v gives v(x) = √(2 ∫₀ˣ (g − s(ξ)/m) dξ), which we evaluate down to x = l, where l is the lowest acceptable height in the trajectory (right above the ground). Assuming the system is initially at position 0, we solve for time by substituting v = dx/dt and integrating after isolating dt on one side of the equation. Thus, we find the time it takes for the person to fall is T = ∫₀ᵘ dx / v(x), where u is the initial height above the ground. We bound the total displacement by constraining the velocity to be 0 at l, which is just above the ground. The final constrained optimization problem minimizes T plus the squared acceleration, subject to v(l) = 0, where a(x) is the acceleration at position x. In words, the goal is to minimize the time to fall plus the squared acceleration given that the final velocity just above the ground is 0. We use the trust-region constrained algorithm with a BFGS-approximated Hessian implemented in scipy to optimize k₁, k₂, l₁, and l₂.

Experiment. We now detail the experimental results for optimizing the parameters of the bungee-string system. The system starts 5 meters above the ground and we initialize it with arbitrarily chosen parameters. These parameters do not satisfy the constraint that v(l) = 0. We set the deformation factor β = 0.2 and let the bungee bottom out just above the ground at 10⁻⁵.
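The work-energy step in the derivation can be sanity-checked numerically with a toy discontinuous stress function (our own example values, not the paper's bungee parameters): a direct simulation of m·ẍ = m·g − s(x) from rest should reproduce v(x) = √(2 ∫₀ˣ (g − s(ξ)/m) dξ).

```python
import math

# Work-energy check with a stress s(x) that jumps when a "string" engages.

g, m = 9.8, 1.0

def s(x):
    # discontinuous stress: the string engages at x = 1 (made-up numbers)
    return 2.0 * x if x < 1.0 else 2.0 * x + 5.0

def v_formula(x, n=20000):
    """v(x) = sqrt(2 * ∫_0^x (g - s(ξ)/m) dξ), via midpoint quadrature."""
    h = x / n
    work = sum((g - s((i + 0.5) * h) / m) * h for i in range(n))
    return math.sqrt(2.0 * work)

def v_sim(x_q, dt=1e-5):
    """Simulate m x'' = m g - s(x) from rest with semi-implicit Euler."""
    x, v = 0.0, 0.0
    while x < x_q:
        v += (g - s(x) / m) * dt
        x += v * dt
    return v

assert abs(v_formula(1.5) - v_sim(1.5)) < 1e-2
```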
We compare to derivatives computed by finite difference or automatic differentiation that ignore the Dirac deltas. Fig. 13a depicts the different stress curves for each of these parameter assignments. Fig. 13b shows the loss during optimization. Before iteration 50, the ignore delta solution is infeasible, whereas ours is feasible by iteration 15. Table 1 includes the final parameter assignments and their corresponding loss. The computation took 0.048s per iteration and used 145 megabytes of memory on an i9-9900K.

LIMITATIONS AND FUTURE WORK
We have shown that our approach to handling parametric discontinuities is applicable to problems in graphics and physical simulation. We now detail the limitations on the expressivity and implementation of our approach and how they may be addressed in the future.

Non-Smooth Builtins and Changes of Coordinates
In practice, we use many functions that are not defined everywhere (division, trigonometric functions) and violate our theoretical assumptions of smoothness; such functions with singularities and asymptotes may also be numerically unstable near such non-smooth regions. This is often acceptable for applications, but often leads to difficult kinds of numerical debugging. Such challenges are familiar for graphics programmers, but the reparameterization machinery we present here can make such problems even harder to debug. It would be helpful to figure out better software engineering tools for analyzing automatically differentiated code, especially in the presence of additional code transformations.

First-Class Derivatives: Inside of Integrals
Our guarantee of correctness does not consider integrals of derivatives. In particular, applying our differentiation to some expression (without the context of integration) will eliminate Dirac deltas arising from discontinuities even if we later place the resulting differentiated expression inside of an integral. However, keeping Dirac deltas around instead would allow users to form expressions that are non-linear in those delta functions (e.g., products of deltas).
Still, there are problems that our system does not support, yet where a particular treatment of the Dirac deltas in question leads to a desirable result. For example, the action integral ∫ L(q, q̇, t) dt, where q̇ is the time derivative of position, contains the kinetic energy ½ m q̇². This is not a problem for any physical trajectory, since all such trajectories are C0 continuous (otherwise objects would instantaneously teleport). However, in the case of our examples, q̇ is not C0 continuous, and so q̇² is not properly in general position.
Nonetheless, our method appears to work suitably for the physics problems we investigated. Further investigation into the underlying mathematics of distributions, and the corresponding automatic differentiation compilers, is warranted to help ensure reliably correct behavior for products of distributions.

Tensor Manipulation
The proposed semantics and implementation lack facilities for tensor manipulations such as indexing and block computations. Instead, data is implicitly unrolled and processed as scalar values. Implementing these additional constructs is important for applying our language to problems in domains including deep learning. Furthermore, there are important considerations with respect to the interactions between tensor operations and other language features, specifically derivatives of integrals with discontinuities.

Performance
For an expression of size n with k deltas, our automatic differentiation can end up duplicating most of the expression as many as k times. Furthermore, we can only perform the code transformations (e.g., normalization and reparameterization) that constitute our differentiation as whole-program transformations. For instance, deep-learning frameworks such as TensorFlow or PyTorch expose composable layers using the chain rule as their modularity principle. It is unclear if and how the derivatives of integrals we consider in this paper can be made fully modular and composable in a similar way. However, the methods considered in this paper can be used to write differentiable layers so long as the integrals in question are wholly contained within a single layer.

Approximations other than Integral Discretization
Many other approximate operations other than integral discretization are not commutative with differentiation. For example, approximating a function using a piecewise constant function makes the derivative of the approximation ill-behaved. Extending our idea to general function approximation is an interesting direction for future work.

CONCLUSIONS
We explore a systematic way to solve graphics and physical simulation problems that involve differentiation, integration, and parametric discontinuities. We formalize the semantics of a new programming language and implement these semantics in Teg. In the same way that automatic differentiation frameworks, such as TensorFlow and PyTorch, made implementation of machine learning algorithms accessible, we believe our differentiable programming language makes a significant first step towards making the implementation of differentiable graphics systems accessible.
Finally, the remaining deltas are no longer contained within integrals (since they have either been factored out during delta annihilation, or were already outside thanks to normalization). The maps inside these deltas are zero on at most a measure-zero subset of the domain of the free variables. This is fine, since we only claim correctness almost everywhere, as most automatic differentiation systems do in the setting without integrals [Mazza and Pagani 2021]. □