Already in Chapter 2, we saw the idea that threat is another cause of suffering, possibly very different from frustration. Threat was also briefly considered in the preceding chapter, as being related to survival. However, threat is actually a much more general concept. In this chapter, a general definition of threat is developed in our computational framework. This requires looking deeper into the application of probability theory in AI, which draws largely on the vast literature on decision-making in economics.
In our definition, the perception of a threat is fundamentally an inference that something quite bad might happen in the future, with some probability. Crucially, threat detection means computing beyond the expected rewards that are the basis of the conventional theory of reinforcement learning. Our definition of threat is based on looking at the whole probability distribution of future rewards, including various aspects of uncertainty of future reward.
Although threat thus provides an alternative framework to frustration, we will see that there are many links between the two concepts. In particular, in our definition, a threat is always based on an inference about the possibility of frustration occurring in the future. To put it very simply, a threat is always a threat of frustration. Thus, frustration is primary in the sense that without frustration, there could be no threat.
In the simplest models of AI, the world is seen as a deterministic system. The robot decides to turn left, and so it turns left. It decides to go forward, and it will go forward. As long as the robot understands the basic regularities of the world—for example, that it cannot go through walls—the world is entirely predictable. It may not be entirely controllable, though, because of walls and other nuisances, but there is no uncertainty about what will happen when the robot takes a certain action.
Deterministic modelling was another problem with Good Old-Fashioned AI. In reality, the world is quite unpredictable and not deterministic. An obstacle, such as a human pedestrian, can appear where there was supposed to be none, and the robot cannot go forward. It can start raining and the robot can get stuck in a mud pool. Many unexpected things can happen to human agents as well, often due to other human agents’ unpredictable actions.
In reinforcement learning, such unpredictability was, of course, the basis of frustration: The agent expects a certain amount of reward but does not get it. In Chapter 5, such a prediction was formalized using the definition of mathematical expectation: if the probability of obtaining a reward is 50% and the reward is 10 pieces of chocolate, the expected reward is 5 pieces of chocolate. Frustration meant that the agent computed the expected reward, but the reward was uncertain, and the prediction turned out to be wrong.
In the basic theory of reinforcement learning, only this expectation is used in the prediction, and the fact that there is uncertainty is basically forgotten. However, a really intelligent agent will not be satisfied with just computing the expected reward, which is a single number. It acknowledges that the world is unpredictable, and it will try to understand just how unpredictable any given reward is. It is one thing to predict that you will get 5 pieces of chocolate for sure, and another to predict that you have a 50-50 chance of getting zero pieces or 10 pieces. If you try to describe a lottery, it is rather uninformative to say that each ticket will win 50 cents on average. Such an average does not have a lot of meaning, and you really want to know what kind of prizes you can win and with which probabilities.
Thus, a sophisticated agent will try to compute the probabilities of all the different amounts of reward that it might get after a certain action. In mathematical terms, it will predict the whole probability distribution of reward. Computing the whole distribution gives the agent much more information to be used in the decision-making: It will be able to make different choices in cases where the expected reward is the same for different actions, but the distributions are otherwise different.
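To make the contrast concrete, here is a minimal sketch in Python; the representation of a gamble as probability-reward pairs, and the function name, are my own illustrative choices, not any standard formalism. It shows that the expectation collapses a gamble into a single number, whereas the full distribution keeps different gambles distinguishable.

# A gamble represented as a list of (probability, reward) pairs.
sure_thing = [(1.0, 5)]               # 5 pieces of chocolate for sure
fifty_fifty = [(0.5, 10), (0.5, 0)]   # 10 pieces or nothing, equal probability

def expected_reward(gamble):
    # The single number used by the basic theory of reinforcement learning.
    return sum(p * r for p, r in gamble)

print(expected_reward(sure_thing))    # 5.0
print(expected_reward(fifty_fifty))   # 5.0: the expectations are identical,
# yet the full distributions (the lists themselves) remain clearly different.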
As a fundamental example of how an agent might use the whole distribution of rewards, consider again the case where a reward of 10 chocolate pieces is obtained with 50% probability, so that the expected reward is 5 pieces of chocolate. Such a situation is called a gamble in economic theory, and a lot can be learned about human behavior by looking at what kind of gambles human agents prefer.
So, let us contrast the gamble just defined with a deterministic “gamble” where the agent actually gets 5 pieces of chocolate for sure, without any uncertainty. The basic theory, which uses only expectations, says that the two chocolate gambles are equally good, since the expectations are equal. A simple AI agent might use that theory, and if it is given the choice between these two gambles—the 50-50 gamble or the sure-thing gamble—it will not care which one it chooses because it thinks the gambles are equally good. However, this is not at all the case with most humans.
One of the most robust findings in studies of economic decision-making is that humans do not like uncertainty. Most human agents would choose the certain 5 chocolate pieces instead of the 50-50 gamble with 10 pieces. People are even willing to pay to reduce uncertainty: a typical person in an economic experiment might prefer getting only 4 pieces for sure instead of the 50-50 gamble with, possibly, 10 pieces. A gamble with 4 pieces for sure has an expectation which is one piece lower than the 50-50 gamble with 10 pieces (4 pieces vs. 5 pieces); this means the person would be “paying” one chocolate piece to reduce uncertainty. Such a tendency to avoid uncertainty is called risk aversion; it can be evolutionarily advantageous and is observed even in animals.1 In addition to affecting rational economic calculations, uncertainty also feels unpleasant.2 Psychological experiments show that uncertainty can even make physical pain feel worse.3
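In economics, this kind of risk aversion is standardly modelled by expected utility with a concave utility function. The following sketch uses a square-root utility purely as an illustrative assumption; any concave function would make the same qualitative point.

import math

def expected_utility(gamble, utility=math.sqrt):
    # Risk aversion via a concave utility function (sqrt chosen only for illustration).
    return sum(p * utility(r) for p, r in gamble)

fifty_fifty = [(0.5, 10), (0.5, 0)]
four_for_sure = [(1.0, 4)]

print(expected_utility(fifty_fifty))    # about 1.58
print(expected_utility(four_for_sure))  # 2.0: the sure 4 pieces are preferred,
# even though their expected reward (4) is lower than the gamble's (5).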
Of course, risk aversion should not be so dominant that it ruins your chances of getting any reward. Suppose you’re offered a free lottery ticket with which you might win, say, a big chocolate cake. Common sense says that you should take it—disregarding any health issues with eating a whole cake—since you can only win, and there is no cost. However, if you’re really incredibly risk-averse, you should refuse it because the ticket introduces uncertainty. Perhaps you have to wait for a week to know the results, and you would suffer from uncertainty for several days. Few people would be that risk-averse, though. Nevertheless, this example may not be as unrealistic as it seems. Suppose the prize is not a chocolate cake but something you really want, while the chances of winning the lottery are extremely low. It is possible that you would suffer quite a lot from the uncertainty while waiting for the result, perhaps in the form of physiological stress symptoms; elevated blood pressure might even kill you. Therefore, for some people, it might be better not to accept the lottery ticket. They might regret it afterwards, but that is another story.
The theory of risk aversion is the basis of our definition of threat below. Threat is thus mathematically clearly different from frustration, even if the two concepts are in practice closely related, as we will discuss later on multiple occasions. But first, let us consider the connection between threat and fear.
A threat typically leads to fear, which is central to understanding human suffering. Fear has an obvious connection to self-needs, in particular survival. In fact, it may seem a bit too abstract to talk about suffering as coming from a survival instinct, as I did in Chapter 6: such suffering is usually mediated by a feeling of fear. Fear is actually a multifaceted phenomenon, and we will consider various aspects of fear in later chapters (especially Chapter 10).
Suppose you suddenly find yourself in the presence of a tiger in a jungle. You are likely to suffer at this very moment, but why exactly? It is not that you missed something you wanted to have or some reward you anticipated, so this is not a case of typical frustration. (Nor is it obviously a case of aversion-based frustration, where you didn’t expect something unpleasant to happen but it did, because the tiger hasn’t yet attacked you.) What happens is rather that you are, right now, predicting something terrible to happen in the future, and with a non-negligible probability. Aristotle proposed that “Fear may be defined as a pain or disturbance due to a mental picture of some destructive or painful evil in the future”.4 Here, the “mental picture”, or prediction, of something bad happening is what I consider a threat, which thus causes fear.
I would further argue that a meaningful definition of threat requires uncertainty: It must be possible to avoid the bad thing that is included in the threat. If the bad thing in the future is completely certain to happen, it is something different from a threat, and the ensuing feeling is something different, often described as resignation. Cassell said that “to suffer, there must be a source of thoughts about possible futures”, where I would emphasize the fact that “futures” is in the plural: the future is not certain and fixed, but different outcomes are possible, and the agent can exercise at least some amount of control on the outcomes.5
Combining the mathematical theory of risk aversion with Aristotle’s and Cassell’s philosophy, we can now approach the modelling of threat. We might initially think about threat as a prediction that there is a sufficient probability of a very small future reward—here, “very small” would typically mean a negative reward of large absolute value. In this way, the concept of threat can be directly linked to the pursuit of any kind of rewards, not only internal ones such as physical safety considered earlier. You might be threatened by a large monetary loss, for example.
Consider again the gamble seen above, where there is a 50% probability of the agent getting 10 pieces of chocolate and a 50% probability of not getting any. Now, let us create another gamble to illustrate a probability distribution that is relevant to threat in the particular sense we are interested in. In this new gamble, the agent has a 50% chance of getting 11 pieces of chocolate, a 49% chance of getting nothing, and a 1% chance of being charged a penalty of 50 chocolate pieces (in this world, chocolate seems to act as a common currency). Here, we see that there is a great threat to the agent of losing chocolate in the form of the penalty. On the other hand, I changed the main reward from 10 to 11 pieces so that the expected reward is exactly the same as in the earlier 50-50 gamble (the expectation can be calculated as 0.50 × 11 + 0.49 × 0 + 0.01 × (-50) = 5). So, the two gambles are only distinguished by the general distribution of reward, while the expected reward is the same.
Now, it is intuitively compelling that in the gamble with the penalty, the agent should behave in a slightly different way since there is the threat, or the risk, of the penalty being charged. It should be “afraid” of the penalty of 50 pieces happening, and try to find a course of action that avoids the penalty, presumably by trying to avoid this gamble in the first place. While this may be intuitively clear, I emphasize that it is only the case if the agent has been programmed to be risk-averse in this particular way, i.e., “threat-averse”. A very simple agent would behave in the same way in these two chocolate scenarios (as well as the sure-thing scenario considered earlier), since it would not understand anything about risks or threats. Even a more sophisticated agent that understands something about uncertainty might not see any difference between the two gambles since both have a lot of uncertainty. But a human-like agent that has been programmed to avoid large losses, that is, large negative rewards, would avoid the latter gamble that includes such a strong threat.6
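One possible way to program such a threat-averse agent, sketched here purely as an illustration (the penalty rule, the threshold, and the aversion weight are my own arbitrary choices), is to subtract from the expected reward a penalty on the probability of a large loss:

fifty_fifty = [(0.50, 10), (0.50, 0)]
with_penalty = [(0.50, 11), (0.49, 0), (0.01, -50)]

def expected_reward(gamble):
    return sum(p * r for p, r in gamble)

def threat_averse_value(gamble, loss_threshold=-20, aversion=100):
    # Expected reward minus a penalty on the probability of a large loss.
    p_big_loss = sum(p for p, r in gamble if r <= loss_threshold)
    return expected_reward(gamble) - aversion * p_big_loss

print(expected_reward(fifty_fifty), expected_reward(with_penalty))  # 5.0 5.0
print(threat_averse_value(fifty_fifty))   # 5.0
print(threat_averse_value(with_penalty))  # 4.0: this agent now prefers the
# gamble without the possible penalty, even though the expectations are equal.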
Such threats are widely discussed in the economic literature. Consider investing in a company. One company is quite stable: you can be sure that the return on investment is 5%. Another promises 10%, but you know that it also has a 5% probability of going bankrupt so that you lose all your money. Again, the expected return on your investment is the same (up to rounding errors), but there is a much larger risk of loss in the latter case. Most humans prefer the first, stable company since they want to avoid the “threat” of bankruptcy.7
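The same arithmetic can be written out explicitly for this investment example (the numbers are those just given; losing all your money is coded as a return of -100%):

stable = [(1.00, 0.05)]                  # 5% return for sure
risky  = [(0.95, 0.10), (0.05, -1.00)]   # 10% return, or lose everything

print(sum(p * r for p, r in stable))  # 0.05
print(sum(p * r for p, r in risky))   # 0.045: almost the same expectation,
# but only the risky company carries the threat of total loss.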
To arrive at the final definition of threat, we still need to define what level of possible reward is so small (or so negative) that it actually can be called a threat. In other words, what is a suitable baseline? We can actually borrow the baseline from the definition of the reward prediction error and reward loss, thus comparing the different possible rewards with their expectation. In that case, threat would be the same as a very large reward loss happening with sufficient probability. The crucial difference is that a reward loss (or RPE) is typically computed only after the action, or after the fact, so to speak. However, as a very intelligent agent will try to predict any relevant quantities, it would also try to predict the reward loss before it actually acts or the reward loss happens.8
Furthermore, as always in reinforcement learning, the agent should take into account all the future rewards, and look at the distribution of total future reward, not just the reward in the next time step. In earlier chapters, we considered the expectation of total future reward, which is given by the state-value function, but now, we thus need to model the whole probability distribution of total future reward. That is, the single number given by the state-value is replaced by the probabilities of all possible outcomes of total future reward, starting from the current state. In the rest of this chapter, we thus assume the agent is sophisticated enough to actually compute the whole probability distribution of total future reward, or at least something more than just its expectation. In the simple chocolate gambles, the agent should understand that getting a certain amount of chocolate has a certain probability, and not getting any has another probability. In a more realistic scenario where the agent chooses actions at many time points (think about navigation by a robot), it will consider the long-term consequences of its actions by trying to learn the distribution of future rewards for each state, thus going beyond simple state-values. Modelling the whole distribution of total future reward in addition to its expectation is, in fact, a rather recent development in reinforcement learning theory.9 Obviously, this is computationally very challenging and needs a lot of data where all those different outcomes are realized.
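To give a flavour of what modelling the whole distribution of total future reward could mean in practice, here is a minimal Monte Carlo sketch. The environment interface (a hypothetical step function returning the next state, the reward, and a termination flag) and all names are my own assumptions; actual distributional reinforcement learning algorithms are considerably more refined than this.

from collections import defaultdict

def sample_return(state, step, gamma=0.95, max_steps=200):
    # Discounted total reward of one simulated episode starting from 'state'.
    # 'step' is a hypothetical function: state -> (next_state, reward, done).
    total, discount = 0.0, 1.0
    for _ in range(max_steps):
        state, reward, done = step(state)
        total += discount * reward
        discount *= gamma
        if done:
            break
    return total

def return_distribution(states, step, n_samples=1000):
    # Collect, for each state, a sample of total future rewards.
    # The state-value is just the mean of this sample; threat detection
    # will instead look at its lower tail.
    samples = defaultdict(list)
    for s in states:
        for _ in range(n_samples):
            samples[s].append(sample_return(s, step))
    return samples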
Putting this all together, we arrive at a definition of threat as a prediction of a sufficiently probable and large reward loss, where the reward loss is computed over the total future reward.10 This definition is very general: it means that threats can come from many different sources. In the preceding chapter, we already briefly mentioned the concept of threat in terms of death and tissue damage, but those are now seen as simply special cases of this general concept of threat, seamlessly integrated into the general reinforcement learning framework. Still, it is true that the biggest threats may be related to survival and self-image, as will be discussed below.11
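Written as a sketch under the same assumptions as above, the definition might look as follows; the two thresholds are free parameters that the designer (or evolution, or learning) would have to set.

def is_threat(return_samples, loss_threshold, prob_threshold):
    # Threat: a sufficiently large reward loss (total future reward falling far
    # below its own expectation) is predicted with sufficient probability.
    expected = sum(return_samples) / len(return_samples)
    p_large_loss = sum(1 for g in return_samples
                       if expected - g >= loss_threshold) / len(return_samples)
    return p_large_loss >= prob_threshold

# For example: is_threat(samples[current_state], loss_threshold=50, prob_threshold=0.01)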
Threat as defined above is in many ways different from frustration. To summarize, threat is about a prediction of something bad that might happen, while frustration is about realizing that something did go wrong; a threat is mainly used for choosing immediate actions, as will be considered in more detail in Chapter 10, while frustration is a signal for learning. One might further say that a threat is about the future, while frustration is about the immediate past, but this might be oversimplifying since frustration can sometimes refer to mere changes in expectations of future rewards.12
Threat produces a subjective feeling, typically in terms of fear, which is also very different from frustration. This is logical since the computations underlying threat are different from those underlying frustration, and especially the way threat influences behavior and learning must be very different. Thus, fear has to produce a different kind of signal, even if both frustration and threat signals lead to suffering.
Still, frustration and threat often come together. Let’s go back to the case where a tiger appears in front of you. It might eat you and produce a great loss of future rewards, but this is not certain since you might still be able to escape; in this sense, there is a threat but no frustration yet. But there is frustration in the sense that you certainly would have preferred that the tiger does not appear, that is, you wanted to live a peaceful life where tigers are remote, and that desire is now frustrated. In this example, the planning system can amplify the frustration, because planning may be launched with the goal state being any state where the threat is not present: you are frantically thinking about what to do to be safe. Planning is attempted, but it fails: no plan is found that would get rid of the threat, or if such a plan is found, its execution fails. Thus, arguably there is frustration even in the sense of plans failing.13
Another interesting interplay of fear and frustration can be seen in the fear of frustration that arises at the moment of making decisions. A person can be afraid of choosing the wrong flavor for his ice cream and spend an embarrassingly long time in the decision-making process. His brain may correctly predict that a frustration will happen in the future if it turns out that he does not like the flavor that much after all. Such a fear might be present surprisingly often when humans make decisions.14
Another intriguing connection is that the very reason why humans are risk-averse can be understood based on frustration of internal rewards, as introduced in Chapter 6. If the agent has a lot of uncertainty about the state of the world, it will find it more difficult to reach its goals or obtain rewards. Thus, uncertainty in itself is something that should be avoided. We saw above that this is exactly what humans do; it is the very essence of risk aversion. We can interpret this phenomenon from the viewpoint of internal rewards. Since uncertainty is bad for future reward, it would clearly make a lot of sense to program an internal reward system that gives a negative reward when the agent is in a state of a lot of uncertainty. Therefore, it may not be surprising that uncertainty creates suffering in itself as well, which is the basis of risk aversion.
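One simple way of programming such an internal reward, given purely as an illustration, is to make it negative and proportional to the spread of the predicted distribution of total future reward (the weight is an arbitrary choice):

def uncertainty_penalty(return_samples, weight=0.1):
    # Hypothetical internal reward: negative, and proportional to the standard
    # deviation of the predicted total future reward.
    n = len(return_samples)
    mean = sum(return_samples) / n
    std = (sum((g - mean) ** 2 for g in return_samples) / n) ** 0.5
    return -weight * std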
In fact, such a logic of internal rewards goes far beyond risk aversion. We can consider unpredictability and uncontrollability in the same framework as uncertainty. All these properties are bad for future rewards, and they increase frustration. This fundamental idea will be considered in detail in later chapters: if the world is, say, uncontrollable, frustration is difficult to avoid. Thus, it could very well be that uncertainty, unpredictability, or uncontrollability are suffering in themselves because they lead to frustration of specific internal rewards. If, say, controllability is lower than some expected standard, a frustration signal could be launched. That would be useful for learning because it signals that the agent has failed in learning about the environment; it should not have gotten itself into a situation where controllability is that low. This is equivalent to a self-evaluation system which considers that the agent should not be in situations that are uncertain, difficult to predict, or difficult to control. This is how uncontrollability, as well as uncertainty and unpredictability, can directly lead to suffering. Nevertheless, this tends to happen in states where a threat is observed, according to our definition, since a threat is nothing else than a form of uncertainty.15
This gives an alternative viewpoint of threats, completely reducing them to frustration of internal reward systems. When uncertainty and uncontrollability reach high levels, such an internal reward system gives negative rewards, which produces frustration. However, this account clearly explains only part of what a threat is about. While it cannot be denied that uncertainty and uncontrollability do lead to frustration, the suffering due to a threat simply does not feel the same as frustration: it is more like fear, anxiety, or stress. Thus, such a reduction of threat to frustration is not quite satisfactory, and justifies the separate definition given earlier in this chapter.16
A simple AI agent might only generate the suffering signal when something bad happens, such as when it fails in its tasks—this is the basic case of frustration. Suppose a thermostat connected to a heating system tries to keep the room at a constant temperature. (This is actually a task that the nervous systems of many animals face as well.) It continually monitors the room temperature and adjusts its actions accordingly. Its function is based on a simple error signal created when the room gets too hot or too cold. When the temperature is suitable, there would be no error signals whatsoever, and certainly no suffering.
Now, suppose you make the thermostat very intelligent, so that it is able to predict the future, compute threats, evaluate itself, perhaps even think about its own survival. Then, it might not only suffer when the room temperature is wrong but also when it anticipates that that might happen. Your hyperintelligent thermostat might be reading the weather forecast on the internet. Suppose the forecast says that tomorrow night will be exceptionally cold, beyond the capacities of the heating system. Then, the thermostat anticipates that tomorrow night it will not be able to keep the temperature high enough. Thus, the thermostat suffers due to such a threat—at least in the computational sense.
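For concreteness, here is a toy sketch of the two thermostats. The forecast handling, the heater model, and all the numbers are invented for the illustration and do not describe any real device.

def comfort_error(room_temp, target=21.0, tolerance=0.5):
    # The simple thermostat: an error signal only when the room is already
    # too hot or too cold.
    return max(0.0, abs(room_temp - target) - tolerance)

def anticipated_threat(forecast_temps, heater_boost, target=21.0,
                       loss_threshold=3.0, prob_threshold=0.1):
    # The "hyperintelligent" thermostat: from possible outside temperatures for
    # tomorrow night, predict how far the room could fall below the target even
    # with the heater at full power, and signal a threat if a large shortfall
    # is sufficiently probable.
    shortfalls = [max(0.0, target - (t + heater_boost)) for t in forecast_temps]
    p_large = sum(1 for s in shortfalls if s >= loss_threshold) / len(shortfalls)
    return p_large >= prob_threshold

# Example: anticipated_threat([-25, -20, -15], heater_boost=30) returns True,
# since every forecast scenario leaves the room more than 3 degrees too cold.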
The extraordinary thing here is that the hyperintelligent thermostat suffers even long before anything bad happens, before, say, actual frustration is produced, merely by virtue of its newly formed anticipation of a possible negative reward. This is perceived as a threat, and produces fear. Becoming more intelligent means the agent can perform computations related to threat, suffer based on those computations, and thus suffer much more than it did earlier. Furthermore, if the thermostat realizes it is unable to properly control the temperature in the future, the uncontrollability may trigger a negative internal reward, and a reward loss. If this happens often, the self-evaluation system might conclude that it is not performing its central task well enough, thus leading to frustration due to the self-evaluation. It is possible that if the thermostat fails to keep the temperature constant, it will be thrown into the garbage bin, and a hyperintelligent thermostat might even worry about its own survival.
“One who fears suffering is already suffering from what he fears”, according to Michel de Montaigne.17 Humans suffer enormously because they are too intelligent in this sense, and prone to thinking too much about the future—a theme I will return to in Chapter 11, where I talk about simulation of the future. Yet, if we humans are so incredibly intelligent, why can we not just decide not to fear anything? Why can we not take Montaigne’s point seriously? He suggested—actually talking about his chronic pain due to kidney stones—that there is no point in imagining or anticipating future pain since that simply induces more suffering. This is a complex question where part of the answer is the dual-process nature of human cognition, which will be treated in the following chapter.