Ten-month-old infants infer the value of goals from the costs of actions

Ranking valuations on the basis of observed choices

Obliged to make a choice between two goals, we evaluate the benefits of achieving the goals compared with the costs of the actions required before deciding what to do. This seems perfectly straightforward, and it is unsurprising to learn that we can also apply this reasoning to others; that is, someone that we see choosing a goal that requires a more costly action must value that goal more highly. What is remarkable, as Liu et al. report, is that preverbal children can reason in this same fashion.

Science, this issue p. 1038

Infants can assess how worthwhile or valuable an object or goal may be from others’ behaviors in achieving or acquiring it.

Infants understand that people pursue goals, but how do they learn which goals people prefer? We tested whether infants solve this problem by inverting a mental model of action planning, trading off the costs of acting against the rewards actions bring. After seeing an agent attain two goals equally often at varying costs, infants expected the agent to prefer the goal it attained through costlier actions. These expectations held across three experiments that conveyed cost through different physical path features (height, width, and incline angle), suggesting that an abstract variable (such as “force,” “work,” or “effort”) supported infants’ inferences. We modeled infants’ expectations as Bayesian inferences over utility-theoretic calculations, providing a bridge to recent quantitative accounts of action understanding in older children and adults.

When we observe people's actions, we see more than bodies moving in space. A hand reaching for an apple is not just one object decreasing its distance from another; it can indicate hunger (in the person who is reaching), helpfulness (if the person is reaching on behalf of someone else), or compromise (if the person reaching would prefer a banana, but not enough to go buy one). This fast and automatic ability to interpret the behavior of others as intentional, goal-directed, and constrained by the physical environment is often termed "intuitive psychology" (1–4). We used behavioral experiments and computational models to probe the developmental origins and nature of this ability.
Over the past two decades, research has revealed that the building blocks of our intuitive psychology are present as early as the first year of life. Despite infants' limited experience, their interpretations of other people's actions are guided by assumptions about agents' physical properties (5), intentions and goals (6), mental states (7–10), causal powers (11), and dispositions to act efficiently (7,12,13). This wealth of findings does not reveal, however, whether infants' capacities depend on a host of distinct local abilities (14–16) or on a single coherent system supporting inference, prediction, and learning (3, 17–19).
We tackled this question in a case study, based on a computationally precise proposal for a coherent, abstract, and productive system for action understanding (Fig. 1). Previous studies suggest that infants are sensitive to the costs of agents' actions (3,7,12,13) and can infer agents' preferences (6,9). Decision theorists for hundreds of years have recognized these as the two central factors guiding the decisions of rational agents (20–22). We asked whether infants can integrate these dimensions to infer agents' goals: Do infants use the cost that an agent expends to attain a goal state in order to infer the value of that goal state for the agent?
Such an inference has been proposed to rest on three nested assumptions that together constitute a "naïve utility calculus" (23), analogous to classical economic thinking. First, agents act to maximize their utility U, under constraints (2,4,24,25). Second, this utility separates into rewards and costs, two distinct components that can be individual targets of inference (26). That is, if R(S) is the reward of a goal state S, and C(A) is the cost of an action A, then an agent acts to maximize the following:

$$U(A, S) = R(S) - C(A) \tag{1}$$

Third, the cost of an action is not arbitrary but depends on properties of both the agent and the situation: properties that jointly determine how much effort the agent might need to exert to carry out that action. These assumptions can be formalized as generative models that successfully predict the quantitative and qualitative behavior of adults and older children (4,23,26). In these models, observers who reason that other agents are maximizing their expected utility according to Eq. 1 can use what they know about rewards and costs to predict the agents' future actions. Inverting this process, observers can use the agents' overt actions to infer their hidden rewards and costs, according to

$$P(R, C \mid A) \propto P(A \mid C, R)\, P(R, C) \tag{2}$$

where P(R, C | A) is the posterior distribution over the rewards and costs of an agent. By Bayes' theorem, this distribution is proportional to the product of two terms: P(A | C, R), the likelihood of the agent choosing action A given rewards R and costs C, given by a rational planning procedure (4, 23); and P(R, C), a prior distribution over costs and rewards.

Fig. 1. A schematic of our computational model. (A) The forward direction defines the agent as a rational planner that calculates the utilities of different actions from their respective costs and rewards and then selects an action stochastically in proportion to its utility. In this case, the overall utility for approaching the triangle is higher than for approaching the square, so the central agent (circle) will likely choose triangle over square. (B) An observer (i) assuming this model and priors over the costs of different actions can (ii) observe a series of actions and then (iii) infer a posterior distribution over the hidden values of an agent's costs and rewards given its actions. (iv) These posteriors can then be used to predict the actions of the agent in a new situation, in which the same goal states can be reached with different actions.
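The forward and inverse directions described here can be illustrated with a small sketch. It is only a toy under stated assumptions: the softmax choice rule, the `temperature` parameter, the uniform prior, the discrete `reward_grid`, and the treatment of costs as known are all illustrative choices that do not come from the text, and the function names are hypothetical. The authors' actual model is described in their supplementary materials.

```python
import math
from itertools import product


def utility(reward, cost):
    """Utility of reaching a goal: reward of the goal minus cost of the action."""
    return reward - cost


def choice_probs(utilities, temperature=1.0):
    """Forward model: softmax over utilities (an assumed stochastic choice rule)."""
    top = max(utilities)  # subtract the max for numerical stability
    weights = [math.exp((u - top) / temperature) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_over_rewards(observed_choices, costs, reward_grid, temperature=1.0):
    """Inverse model: posterior over reward assignments given observed choices.

    Simplifications (assumptions): costs are known, the prior over rewards is
    uniform on reward_grid, and choices are independent given the rewards.
    """
    posterior = {}
    for rewards in product(reward_grid, repeat=len(costs)):
        utils = [utility(r, c) for r, c in zip(rewards, costs)]
        probs = choice_probs(utils, temperature)
        likelihood = 1.0
        for choice in observed_choices:
            likelihood *= probs[choice]
        posterior[rewards] = likelihood  # uniform prior: posterior is proportional to likelihood
    normalizer = sum(posterior.values())
    return {rewards: mass / normalizer for rewards, mass in posterior.items()}


# Forward direction: equal costs, goal 0 more rewarding than goal 1.
forward = choice_probs([utility(5.0, 1.0), utility(3.0, 1.0)])

# Inverse direction: goal 1 costs more, yet the agent picks it three times.
costs = [1.0, 3.0]                       # hypothetical cost units
reward_grid = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical candidate rewards
posterior = posterior_over_rewards([1, 1, 1], costs, reward_grid)
p_goal1_more_rewarding = sum(
    mass for rewards, mass in posterior.items() if rewards[1] > rewards[0]
)
```

In this toy setting, repeatedly choosing the costlier goal shifts most of the posterior mass toward reward assignments in which that goal is the more rewarding one, which is the qualitative inference the text attributes to observers.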
Do infants apply the logic of cost-reward reasoning? Past research suggests that infants are sensitive to the relative value of different goal objects for an agent who chooses to approach one object over another (6,27) as well as the relative efficiency of the actions taken by an agent who approaches a goal object (12,13,28). Past studies do not reveal, however, whether infants have a unified intuitive psychology in the form of a generative model, or separate representations for variables such as cost and reward that become unified only later in development, as children gain experience exerting themselves to achieve goals or communicating with others about their desires and actions. It is also an open question whether infants consider cost and reward in terms of abstract variables (such as work, effort, desire, or value) or whether their understanding is restricted to perceptual features of actions, such as the distance or duration an agent travels or the number of times it selects a particular goal. In physical action contexts, effort often covaries with perceptible properties such as the length or duration of a path traveled, but effort depends ultimately on the amount of force that the agent must exert over time and distance (the amount of work the agent must do). Likewise, value often covaries with the number of times an agent selects a goal but ultimately depends on how strongly the agent desires a goal relative to the cost of achieving it or its value relative to other options.
We designed and conducted three experiments to test whether infants learn about the reward agents place on goals from cost, working backward from the assumption that agents maximize utility and inferring relative rewards from observed actions under varying costs. We then used the data from these experiments, together with the findings from past experiments (6,7,13), to test a variety of computational models of infants' performance, including models with integrated versus isolated, and abstract versus cue-based, representations of costs and rewards (model description is provided in the supplementary materials). Our empirical and computational findings support the view that a productive system grounded in cost-reward trade-offs guides action understanding toward the end of the first year of life.
We tested n = 80 10-month-old infants in three experiments with prespecified designs, procedures, sample sizes, and analysis plans (29). In all experiments, infants first saw an agent move to and refuse to move to each of two target goals under conditions of varying cost. Then, infants watched test events in which the agent chose either the higher- or the lower-value target when both were present at equal cost. If infants infer the reward of the targets to the agent from the effort undertaken to reach them, then they should be more surprised when the agent chooses the lower-value target, looking longer at the test trials displaying that action (30).
In experiment 1 (n = 24 infants), we leveraged events widely used in studies of early action understanding, in which animated characters jump efficiently over barriers of variable heights to arrive at goal objects (3,7,13,31) and indicate their preferences by selecting one goal over another (6,9). During familiarization, infants watched six trials that consisted of four different events involving a central agent and one of two target individuals on a level surface (Fig. 2A and movie S1). In each event, the target jumped and made a noise, and the agent responded by turning to face and beginning to approach the target, whereupon a barrier fell onto the stage directly in the agent's path. On two of these events (one for each target), the agent looked to the top of the barrier, made a positive "Mmmm!" sound, backed up, and then jumped over the barrier, landing next to the target. On the other two events, the agent looked to the top of the barrier, made a neutral "Hmmm…" sound, backed away, and returned to its initial position. The critical distinction between these events concerned the height of the barrier and therefore the length, height, and speed of the jump that the agent undertook so as to clear it (all jumps were equated for duration). For one target, the agent jumped over a low barrier and declined to jump a medium barrier; for the other target, the agent jumped the medium barrier and declined a tall barrier. After this familiarization, the agent appeared between the two equidistant targets on a level surface. Infants viewed two pairs of looped test events (Fig. 2D and movie S4), order counterbalanced, in which the agent looked at each of the targets and then repeatedly approached either the higher- or the lower-value target. Our prespecified dependent measure was average log-transformed looking time (32) across test trials. In experiment 1, we predicted differential looking at the test events but did not prespecify the direction of this difference.

Fig. 2. (A to C) During familiarization, the central agent (circle) accepted a low cost and refused a medium cost for the lower-value target (square) and accepted a medium cost and refused a high cost for the higher-value target (triangle). Other than the sizes of the barriers, ramps, and trenches, and the consequent trajectories of motion, the pairs of events displaying approach or refusal of approach to the two targets were identical. (D and E) At test, the agent looked at each of the two targets and chose either the lower- or higher-value target. White circles indicate start and end points of action, and white lines indicate trajectories.
Infants looked longer at test trials in which the agent chose the target for whom it had jumped a lower barrier (mean = 28.41 s, SD = 14.85), relative to the target for whom it had jumped a higher barrier (mean = 21.79 s, SD = 12.29) [95% confidence interval (CI) (0.062, 0.591), b coefficient (B) = 0.327, SE = 0.130, standardized b coefficient (b) = 0.502, t(24) = 2.523, P = 0.019, two-tailed, mixed effects model with random intercept for participant] (Fig. 3) (30). These findings suggest that infants inferred the rewards that the central agent placed over the targets from the cost the agent was willing to expend to reach these targets, and they therefore expected the agent to choose that target at test. Nevertheless, experiment 1 does not show whether infants used the physical effort undertaken by the agent, or variables that merely correlate with effort (such as distance or speed), in their predictions.
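As a rough illustration of the dependent measure, the sketch below log-transforms looking times and computes a paired t statistic on the within-infant difference. This is a simpler stand-in, not the mixed effects model with a random intercept for participant that is reported here, and the function name and the example values are hypothetical placeholders rather than data from the study.

```python
import math
from statistics import mean, stdev


def paired_log_looking_test(lower_value_secs, higher_value_secs):
    """Paired comparison of log-transformed looking times (in seconds).

    Each position in the two lists belongs to the same infant. Returns the
    mean log difference (lower-value minus higher-value test events), its
    standard error, and the paired t statistic.
    """
    diffs = [
        math.log(lower) - math.log(higher)
        for lower, higher in zip(lower_value_secs, higher_value_secs)
    ]
    mean_diff = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_error, mean_diff / std_error


# Made-up placeholder values for four hypothetical infants (not study data).
lower_value = [30.0, 25.0, 40.0, 18.0]
higher_value = [20.0, 22.0, 35.0, 19.0]
mean_diff, std_error, t_stat = paired_log_looking_test(lower_value, higher_value)
```

A positive mean difference corresponds to longer looking when the agent chooses the lower-value target, the direction reported in the text.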
To control for distance and speed of travel, experiment 2 (n = 24 infants) used ramps of three different incline angles to convey cost (Fig. 2B and movie S2). On each familiarization trial, a target appeared on the top of one ramp, and the agent looked up the ramp and either climbed to the target or returned to its starting position. The agent climbed the shallow ramp and declined to climb the medium ramp for one target and climbed the medium ramp and declined the steep ramp for the other target. The methods were otherwise the same as in experiment 1. Consistent with our prespecified directional prediction, infants again looked longer at the test events in which the agent approached the lower-value target (mean = 30.94 s, SD = 13.31) than test events in which the agent approached the higher-value target (mean = 27.05 s, SD = 17.55) [95% CI (0.028, 0.472), B = 0.250, SE = 0.109, b = 0.408, t(24) = 2.294, P = 0.015, one-tailed, mixed effects model with random intercept for participant] (Fig. 3) (30). This finding further suggests that infants understand agents' actions in accord with abstract, general, and interconnected concepts of cost and reward, but narrower explanations remain. In experiments 1 and 2, the agent was confronted with an obstacle to its forward motion (a barrier or ramp), and the size of the obstacle covaried with the cost of the agent's action, requiring the agent to move further upward to attain the higher-value target. Because infants become sensitive to the effects of gravity on objects on inclined planes well before 10 months of age (33), they may learn that agents will move to greater heights or overcome higher obstacles for more rewarding targets, without invoking a more abstract representation of physical effort. Experiment 3 was undertaken to explore these interpretations.
In experiment 3 (n = 32 infants), the agent was separated from each of the two targets during familiarization not by an obstacle but by a horizontal gap in the supporting surface (Fig. 2C and movie S3). Infants first saw a ball roll off the edge of a narrow, medium, and wide gap and shatter (movie S6). During familiarization, these three trenches, requiring jumps of variable lengths and speeds but of equal durations and heights, were interposed between the agent and target; the agent moved to the edge of a trench, looked at the far side, and then jumped over a narrow trench for one target (and refused the medium trench) and a medium trench for the other target (and refused the widest trench). The methods were otherwise unchanged (Fig. 2E and movie S5). The methods and analyses for experiment 3 were preregistered at https://osf.io/k7yjt (29) and tested the same directional prediction as that in experiment 2. Infants again looked longer at the lower-value choice (mean = 23.05 s, SD = 13.58) relative to the higher-value choice (mean = 17.47 s, SD = 10.69) [95% CI (0.020, 0.501), B = 0.260, SE = 0.119, b = 0.403, t(32) = 2.185, P = 0.018, one-tailed, mixed effects model with random intercept for participant] (Fig. 3) (30).
Regardless of whether an agent cleared higher barriers (experiment 1), climbed steeper ramps (experiment 2), or jumped wider gaps (experiment 3) for one target over the other, infants expected the agent to choose that target at test. This pattern held in an analysis pooled across experiments [mixed effects model with random intercepts for participant and experiment], supporting our general hypothesis that infants infer the values of agents' goals from the costs of their actions. Although past research had shown that infants represent the goal of an agent's action from observations of an agent's choices between two objects (6) and expect agents to give different emotional responses when agents complete versus fail to complete their goals (31), the present experiments provide evidence that infants develop ordinal representations of reward even when the number of choices and expressed emotions are equated across the actions and only the costs of the actions vary. Moreover, they show that infants do not simply attribute higher reward to goals that agents pursue for a longer duration or attain with greater frequency because these variables were equated as well. The findings also provide evidence for longstanding suggestions that infants represent physical cost as a continuous variable that agents seek to minimize (3,13): Infants make appropriate cost assessments even when the specific physical features that distinguished lower- from higher-cost actions (including the relative length, curvature, duration, or speed of motion trajectories) systematically varied.
Together, experiments 1 through 3 suggest that infants represent cost and reward as interconnected, abstract variables that they apply to a wide range of events. The discovery that infants infer the rewards of goals from the costs of achieving them provides empirical support for the thesis that an abstract and productive system guides infants' analysis of agents and their actions (3,17,19). Specifically, we suggest that the cognitive machinery supporting infants' intuitive psychology includes a mental model both of how agents plan actions in the forward direction, in accord with maximizing their utilities (Eq. 1 and Fig. 1A) (23), and a procedure for inverting this model, in accord with the computational framework of inverse planning (Eq. 2 and Fig. 1B) (4). Applying this general framework to our specific experiments, we posit that infants have developed a model of action planning before the experiment: They assume that agents value some goal objects more than others and that agents engage in costlier actions to achieve goals with higher reward. When infants see the agent take costlier actions to arrive at one target than to arrive at another, infants invert this model to infer the relative reward of the two targets to that agent. Then, when infants see the agent flanked by the two targets in a situation in which costs are equal, they apply their knowledge of the targets' relative value to the agent to run their planning model for that agent forward, predicting the target that it will approach. We have implemented this hypothesis in a computational model that accounts not only for the findings of the present experiments but also for a range of past studies of early action understanding (6,7,13). Furthermore, we compared this model with an array of simpler models that focus only on relative costs or rewards in isolation, or on particular cues to effort or value. 
We found that only the full model with abstract variables for costs and rewards can account for all of the findings (fig. S3 and supplementary materials).
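The invert-then-predict logic of the experiments can be sketched as follows. This is an illustrative toy, not the authors' implementation: the logistic accept/refuse rule, the numeric cost levels, the uniform prior, and the reward grid are all assumptions introduced here, and the function names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, used here as an assumed stochastic choice rule."""
    return 1.0 / (1.0 + math.exp(-x))


def reward_posterior(events, reward_grid):
    """Posterior over one target's reward, with a uniform prior on reward_grid.

    events is a list of (cost, accepted) pairs seen during familiarization.
    Assumption: the agent accepts a cost with probability sigmoid(reward - cost).
    """
    weights = []
    for reward in reward_grid:
        likelihood = 1.0
        for cost, accepted in events:
            p_accept = sigmoid(reward - cost)
            likelihood *= p_accept if accepted else 1.0 - p_accept
        weights.append(likelihood)
    total = sum(weights)
    return [w / total for w in weights]


def prob_choose_second(post_first, post_second, reward_grid):
    """Predicted probability of approaching the second target at equal cost.

    Equal costs cancel in the utility difference, so only rewards matter.
    """
    prob = 0.0
    for r1, w1 in zip(reward_grid, post_first):
        for r2, w2 in zip(reward_grid, post_second):
            prob += w1 * w2 * sigmoid(r2 - r1)
    return prob


LOW, MEDIUM, HIGH = 1.0, 2.0, 3.0            # hypothetical cost units
reward_grid = [0.5 * i for i in range(11)]   # hypothetical rewards from 0.0 to 5.0

# Familiarization: accept low and refuse medium for the square;
# accept medium and refuse high for the triangle.
square_post = reward_posterior([(LOW, True), (MEDIUM, False)], reward_grid)
triangle_post = reward_posterior([(MEDIUM, True), (HIGH, False)], reward_grid)

# Test: both targets available at equal cost.
p_triangle = prob_choose_second(square_post, triangle_post, reward_grid)
```

Under these assumptions the predicted probability of approaching the triangle exceeds one half, so a choice of the square is the less expected outcome, matching the direction of the looking-time results.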
The present studies raise key questions for future research. First, the cognitive architecture underlying infants' assessment of cost remains to be explored. Our experiments suggest that infants are responding to an abstract notion of cost, rather than specific physical path features such as vertical motion (controlled for in experiment 3), horizontal motion (controlled for in experiment 1), or raw path length (controlled for in experiment 2). We do not know, however, whether infants represent the abstract costs of actions by drawing on a concept of experienced effort or exertion within the domain of naïve psychology, or by leveraging an intuitive concept of force or work done (the integral of force applied over a path) from the domain of naïve physics (34,35), or perhaps both. Also, our experiments investigated only one class of goal states and target-directed actions, leaving open the breadth and generality of infants' intuitive psychology. In particular, cost can be defined in terms of work or effort to produce physical forces, but there are other kinds of costs: Agents could consider variables such as the mental effort of planning (36,37) and the risks of choosing certain actions, neither of which involves applications of force. It is an open question whether these other variables trade off against reward in infants' intuitive psychology the way that physical work or effort does. Last, our studies do not speak to the origins of these abilities. Although 10-month-old infants cannot perform the actions from our experiments or communicate with others about them, their productive system for reasoning about costs and rewards may arise through their experiences observing the actions of other agents or performing actions within their repertoire, such as lifting their arms or balancing their bodies against the force of gravity. Alternatively, this system of intuitive psychology may guide infants' action understanding from the beginning. 
Testing these possibilities would address fundamental questions concerning the nature, origins, and interrelations between our intuitive psychology and intuitive physics.
However these questions are answered, the present study suggests that our propensity to understand the minds and actions of others in terms of abstract, general, and interrelated concepts begins early. Before human infants learn to walk, leap, and climb, they leverage mental models of agents and actions: forward models of how agents plan, and inverse models for working backward from agents' actions to the causes inside their minds.