Chapter 9
Thoughts wandering by default

The moment you lie down on a sofa to relax, your head starts developing different fantasies and daydreams, perhaps wondering why you did such a stupid thing yesterday, or planning what you want to eat tonight. Even when you try to meditate and not think about anything (which is a typical instruction for beginning meditators), you will almost inevitably find yourself thinking about something else after a while. There is a good reason why the human mind is often compared to a monkey in meditation traditions. It jumps here and there, making all kinds of noises, and never seems to rest. Likewise, based on his own method of introspection, David Hume concluded: “One thought chaces another, and draws after it a third, by which it is expelled in its turn.”1

Thoughts that come to your mind when you are trying to concentrate on something else are called “wandering thoughts”. They have some similarities with emotional interrupts: they stop ongoing mental activity and capture attention. Thus, they reduce the control you have over your mind and, eventually, increase suffering. However, the computational underpinnings are quite different in the two cases. In this chapter, I discuss how wandering thoughts are related to the need to repeat experiences for the purposes of iterative learning algorithms, as well as planning the future through search in a tree. Thus, there is an evolutionary reason why we have wandering thoughts: they are not just pointless activity triggered by mistake, as it were.

Wandering thoughts and the default-mode network

Wandering thoughts tend to appear whenever a person tries to focus on a single task or object for a long time. Everybody has encountered a situation, perhaps at school or at work, where she tries to concentrate on something but soon finds herself thinking about what she should say in a job interview tomorrow, or what she did on a previous vacation. Typical tasks where such sustained attention is necessary, but difficult to achieve, are driving a car on a highway, trying to read a book for an exam, or monitoring a screen as in air traffic control or surveillance. Importantly for the theme of this book, sustained attention is essential in most meditation practices. If you are lying on a sofa and have nothing else to do, meandering thoughts about various things are fine and sometimes even enjoyable. However, when you are actually trying to concentrate on a task, unwanted wandering thoughts reduce your performance on the task at hand.2

Psychological experiments confirm the ubiquity of wandering thoughts. Various experiments can be devised where the participant’s task is to monitor a stream of information and report when a rare prespecified event occurs. In a typical experiment, you would be shown random digits (0 to 9), and you have to press a button when you see a target digit, say, 3. The experiment is deliberately designed to be boring so that the participant’s mind will certainly start wandering at times. The basic idea of monitoring for a rare event is reminiscent of some of the typical real-life tasks listed above (e.g. driving a car on a quiet highway), where nothing much happens most of the time, and sustained attention is difficult. The experimenters would then use a method called experience sampling, which means they ask, at random intervals, whether the participant was focused on the task or had wandering thoughts. It is typically found that the participant’s performance on the task fluctuates between better and worse; this fluctuation largely reflects whether they had wandering thoughts at that particular time point or not.3

Such experiments can be conducted even when the participants are living their ordinary everyday lives. The participants would have a device, such as a mobile phone, which asks at random intervals whether they were focused on whatever task they were performing (such as working, studying, cleaning, or driving) or whether they had wandering thoughts (such as daydreams or fantasies). It is typically found that during everyday life, the mind wanders quite a lot: one third, or perhaps even one half, of the time.4

Much of brain activity is spontaneous

At the same time, modern neuroimaging confirms the prominence of various kinds of spontaneous brain activity, i.e. activity that “just happens” without any external stimulation or task being performed. In fact, an amazing finding in recent neuroscience is that if you measure human brain activity when the participants of the experiment are simply told to sit or lie still and think about nothing in particular, their brains are far from quiet. Technically, neuroscientists talk about a “resting state” to characterize such a state of not doing anything in particular, since the participant may think she is having a rest—but the brain is definitely not.5

A particular network in the brain is actually even more active during rest than during active tasks. It is called the default-mode network because it seems to be activated “by default”, i.e. when there is no particular reason for anything else to be activated.6 It is also deactivated once the person is stimulated by, for example, sights or sounds from the external world, so that the brain actively starts processing incoming information.

The discovery of the default-mode network around the year 2000 was something of a revolution in human neuroscience. It was completely at odds with the classical way neuroscience experiments were done: the experimenter would instruct an experimental subject to observe some stimuli (e.g. a sequence of digits as we saw above) and possibly perform a task at the same time (e.g. press a button when a target digit appears). Here, in contrast, you don’t tell the participants to do anything and don’t give them any sensory stimulation, such as showing pictures. Then, it is the default-mode network that becomes activated. In fact, since it is silenced (i.e. deactivated) by sensory stimulation and tasks, the experimenters must avoid giving any stimuli or tasks in order to observe it.7

It is widely assumed that the default-mode network supports wandering thoughts.8 That would explain why it is particularly activated when the subjects do not receive any stimulation and have no particular task: then, the mind will easily start wandering. It is likely that the default-mode network has other functions as well, although we don’t know very well what they might be.9

Wandering thoughts as replay and planning

The existence of wandering thoughts may feel completely normal to us, but actually, it is rather surprising that the whole phenomenon exists. Why should it be difficult to concentrate on one thing for a long time? Why can I not simply decide to focus on reading a textbook for an exam, say for two hours, without any interruption by unrelated thinking?10

One intuitively appealing explanation would be that your active neurons—in the exam-reading example, those needed for reading—get “tired”, i.e. somehow run out of energy. Then, other neurons that are still full of energy can somehow steal the attention. While there may be some truth in such an explanation, it is not very plausible, because sometimes you can concentrate on a task without any problem, especially on a task which is really engaging, such as reading a book you really like (not for an exam), or playing a video game. Furthermore, should such fatigue of neurons not rather lead to no thoughts at all? It is more plausible that wandering thoughts are actually doing some useful computation—and that they are something you would like to program into an AI.

So, let’s think about what kind of computational problems could be solved by wandering thoughts. One problem we saw in Chapter 4 is that learning is based on iterative algorithms and typically needs many repetitions of the inputs and the desired outputs. Even the very same inputs and outputs may need to be presented many times to the learning algorithm. This is why one thing that modern AI systems have in common is that most of their computing capacity is used for learning. At the same time, planning takes a lot of time as well, as we saw in Chapter 3.

So, as much of the computing capacity as possible should be directed to these learning and planning activities. In particular, when the agent does not receive any special stimulation from outside, has nothing important to do, and detects no urgent threats, its computing capacity is free to be used for any internal processing based on previously acquired data—intrinsic activity, in the terminology of neuroscience. In fact, in order not to waste that computational capacity, computations related to learning and planning should be launched. That will enable the agent to act more intelligently when the time to act comes. This is also presumably why evolution has programmed wandering thoughts in us.11

Next, I consider in detail two different ways in which wandering thoughts can help in computation. In the first, the system is planning future actions by internally simulating the world, and trying out different, new actions to see which works best.12 The second one is called experience replay because the system internally repeats memories of past behaviours and events exactly as they were perceived, in order to enable an iterative algorithm to efficiently learn from them.13 In fact, a lot of what people simply call “thinking” falls into these two categories: You plan what to do in the future, and recall what happened to you in the past.14

Planning the future

It is perhaps obvious why thinking about future actions is useful, as far as it is a case of planning. You can go through different kinds of plans and simulate, using your model of the world, what the results of your actions will be, and finally, choose the best one. In the case where you think about your job interview to take place tomorrow, you polish your answers beforehand by simulating what kind of impressions different options will make, eventually memorizing the best ones. Often, such thinking and planning is actually completely voluntary. If you really want to spend some time and energy to elaborate the best course of action, this is quite normal planning activity. When we talk about wandering thoughts, we mean a case where you consciously try to do something other than planning, but unrelated thinking nevertheless appears. It is the unwanted, intrusive quality of wandering thoughts that distinguishes them from ordinary thinking.15

You might actually want to relax and read a novel, but thoughts simulating the job interview just pop into your mind. This is understandable since, as I just argued, it is especially during moments when you or the AI have nothing pressing to do that it would be a good idea (from the viewpoint of the designer of the system) to use the computing capacity for such planning. As we saw in Chapter 3, the number of possible planning paths grows exponentially as a function of time, so there is a real need to use a lot of computation for planning.

The planning during wandering thoughts is a bit special in that it sometimes has no particular goal. It may be just looking at possible future paths in a big search tree to see what could be done to obtain rewards: a kind of ongoing, free-style planning. Such a search could actually be done by Monte Carlo Tree Search algorithms (discussed in Chapter 7): they search for plans randomly, while focusing more on branches that seem more rewarding. It’s a bit like thinking about what to do during the weekend when you’re supposed to be concentrating on your work. Sometimes wandering thoughts do focus on planning for a specific goal: one theory proposes that wandering thoughts focus on goals that have been selected but not yet reached.16 Typically, when you have been thinking about a difficult problem for a while without finding a solution, it will be difficult to relax and think about anything else, since that problem will constantly intrude on your mind.17
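The reward-biased random search just described can be caricatured in a few lines of code. The sketch below is not full Monte Carlo Tree Search (which would search a whole tree of action sequences); it is a one-step version of the same idea, and the weekend options, reward values, and exploration rate are invented purely for illustration:

```python
import random

# One-step caricature of reward-biased random search: repeatedly simulate
# options, focusing more on those that have looked rewarding so far.
# Options and reward values are illustrative, not from any real system.
random.seed(1)
rewards = {"cinema": 2.0, "hike": 5.0, "stay_home": 1.0}
totals = {a: 0.0 for a in rewards}   # summed simulated outcomes
visits = {a: 1.0 for a in rewards}   # visit counts (start at 1 to avoid 0/0)

def simulate(action):
    """Internally simulate the option; here, just a noisy reward."""
    return rewards[action] + random.gauss(0.0, 0.5)

for _ in range(500):
    if random.random() < 0.3:   # occasionally wander to a random branch
        action = random.choice(list(rewards))
    else:                       # otherwise focus on the best-looking branch
        action = max(rewards, key=lambda a: totals[a] / visits[a])
    totals[action] += simulate(action)
    visits[action] += 1

best = max(rewards, key=lambda a: totals[a] / visits[a])
# After enough simulated rollouts, "hike" looks best.
```

The random exploration is what gives the search its meandering, “free-style” character, while the bias towards rewarding branches keeps it useful.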

Experience replay for learning value functions

In contrast to planning, it may be more difficult to understand why any system would like to simply repeat past experiences. You already saw what happened yesterday, so why repeat it in your mind, and why so many times? The reason is in the structure of the algorithms used in learning.

As we saw in Chapter 4, modern AI systems are based on learning from data by using iterative algorithms. We saw the general idea of stochastic gradient methods: the data points (e.g. images) are presented to the system one by one, and a huge number of repetitions is needed. Most reinforcement learning algorithms are not, strictly speaking, stochastic gradient descent methods, but they are closely related and share those properties. They proceed by observing the state of the world both before and after each action, as well as any reward or punishment received. There are thus four pieces of information in what we might call a single “data point”: the state before the action, the state after the action, the action taken, and the reinforcement. Based on these, the system updates the state-value function.
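The update based on one such four-part data point can be sketched as a temporal-difference rule, a standard member of this algorithm family. The state names, learning rate, and discount factor below are my illustrative choices:

```python
# Sketch of a temporal-difference update of a state-value function.
# One "data point" is (state, next_state, reward); the learning rate
# alpha and discount factor gamma are illustrative choices.

def td_update(V, state, next_state, reward, alpha=0.1, gamma=0.9):
    """Nudge the value of `state` towards reward + discounted next value."""
    target = reward + gamma * V.get(next_state, 0.0)
    error = target - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * error
    return V

V = {}
# A single experienced transition: acting in state "A" led to "B" with reward 1.
td_update(V, "A", "B", 1.0)
print(V["A"])  # only a small step towards the target: 0.1
```

Note how tiny each update is: after one presentation of the data point, the value of state “A” has moved only a fraction of the way towards its target, which is why many repetitions are needed.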

What is crucial here is that, again, learning proceeds by making tiny modifications to the parameters of the system, in this case those computing the state-value function. Successful learning usually requires a huge number of iterations, or presentations of such actions and their consequences to the learning system. If you have access to really large amounts of data, you may present each data point just once, and learning will be successful since the algorithm will have enough iterations anyway. However, the amount of data is typically limited. In the case of reinforcement learning, what is particularly slow is that the agent may need to act in a real environment and observe the consequences of its actions to gather data. One action by a robot can take a second or so, which is extremely slow compared to the processing speed of most computers and the potential speed of learning. Likewise, humans do not collect new experiences of, say, job interviews very often.18

This is where experience replay is useful. It means that in reinforcement learning, the system is not just using the data related to the most recent action and then throwing it away; instead, it stores the data, and re-uses past actions and the states associated with them many times. That is, it “replays” or recalls past actions and events and uses them in the iterative learning algorithm as if they happened now. This improves the performance of the learning algorithm by enabling it to make many more iterations with each data point, and thus many more iterations for the same limited amount of data. This is how more information is extracted from the data. Another great utility is that usually the agent can retrieve past events from memory much faster than actually acting in the world, and thus replay enables learning much faster. There are other reasons as well, as we will see below.
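A minimal sketch of such a replay buffer follows; storing a transition and re-presenting it many times to the same iterative update. The transition format, learning rate, and discount factor are my illustrative choices (real systems cap the buffer size, sample in batches, and so on):

```python
import random

# Minimal experience replay: store past transitions and re-present them
# many times to an iterative value-learning update.
buffer = []

def store(transition):
    buffer.append(transition)  # (state, next_state, reward)

def replay(V, n_replays, alpha=0.1, gamma=0.9):
    """Recall random past events and learn from them as if they happened now."""
    for _ in range(n_replays):
        s, s_next, r = random.choice(buffer)
        target = r + gamma * V.get(s_next, 0.0)
        V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V

store(("A", "B", 1.0))        # one real experience, gathered once
V = replay({}, n_replays=50)  # ...but learned from fifty times
# After many replays of the same single transition, V["A"] approaches 1.0.
```

A single real experience, replayed fifty times, drives the value estimate almost all the way to its target, which is exactly the extra mileage from limited data that the text describes.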

Obviously, there is a trade-off here: if you spend all your time replaying old events from memory, you will not get any new data about the reinforcement resulting from actions. So, you cannot use all your time for replay alone. A smart strategy is to engage in replay when the environment does not afford many meaningful actions — in plain English, when nothing interesting is happening and the agent is “bored”, which points directly at wandering thoughts.

It is also possible to do something between pure replay and planning. You can replay past events while trying out different actions in a simulation. This means the system starts by recalling something that happened earlier, but then it simulates what would have happened if it had acted differently. Certainly, we all have experienced such wandering thoughts: “If, yesterday, in that situation, I had done X instead of Y...” This is even better than just replaying actual past events since the system is then creating new data using past events together with its model of the world.19
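This mixing of memory and simulation resembles what is called the Dyna architecture in reinforcement learning: start from a remembered state, but try a different action using a learned model of the world. In the toy sketch below, the world model, the rooms, and the rewards are all invented for illustration:

```python
# Sketch of mixing replay with simulation: recall a remembered situation,
# then simulate what would have happened with a different action.
# The model, states, and rewards here are invented for illustration.

model = {  # learned world model: (state, action) -> (next_state, reward)
    ("corridor", "left"): ("kitchen", 0.0),
    ("corridor", "right"): ("office", 1.0),
}

memory = [("corridor", "left")]  # what actually happened yesterday

def counterfactual(state, tried_action):
    """Simulate the action that was NOT taken in the remembered state."""
    other = "right" if tried_action == "left" else "left"
    return model[(state, other)]

s, a = memory[0]
outcome = counterfactual(s, a)
print(outcome)  # ('office', 1.0): new data created without acting in the world
```

The simulated outcome can then be fed into the same learning update as real data, which is what makes this “even better than just replaying actual past events”.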

Experience replay focuses on reinforcing events

Any replay method must choose which events, or short “episodes” of events, it will replay. A system that has gathered a lot of data on past actions cannot replay its entire history. Likewise for planning: if the system starts planning in its idle time, it needs to choose the starting state for its plan—the kind of situation your fantasy starts in—and perhaps a goal as well.

A dominant idea in AI is that replay should prioritize events where any kind of reinforcement signal was obtained, whether positive or negative, and this seems to be the case in the brain as well. Experienced episodes containing such events are the most important in computing the state-value function. This may help explain why we have so many wandering thoughts about negative events. When you do something embarrassing, the episode may replay in your mind many, many times. This should be useful so that you learn to associate the negative reinforcement (social embarrassment) with the actions you took in that particular situation, thus improving your estimate of the state-value function—and your future behaviour. A similar mechanism might be at play when choosing the starting states for planning, although current theory does not explicitly explain that.

It has been found that replay of past events can be particularly useful if the experience is replayed backwards, starting from reinforcing events. Suppose a robot gets a particularly nice reward (say, a lot of energy in its batteries) whenever it finds itself in room #42 of a building where it cleans the floors. Based on this experience alone, it will immediately assign a large state-value to room #42. But in order to find room #42 in the future, it has to code its location with respect to the other rooms in the state-value function. This is easy to do if it replays its path to room #42 in reverse order. Suppose just before arriving in room #42 it was in room #13, and before that, room #21. It replays the sequence in reverse: #42, #13, #21. Now, it will assign large but slightly decreasing state-values to each of these rooms, so that the state-value is decreasing the further the replay goes—the decreases are justified by the theory of discounting. The end result is that while #42 has the largest state-value, #13 has a rather large one as well, and #21 is not far behind. Now, if the agent ever finds itself again next to room #21, it knows that to find a state with a large state-value, it should enter room #21, and there it will understand the best choice for the next state is #13, and eventually #42. (It may sound like all this could be learned by a single replay, but in reality it must happen by smaller increments to properly combine information from many different paths and data points.) Combining such backward replay with the above-mentioned prioritization of reinforcing events leads to a method called “prioritized sweeping”.20
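The room example can be turned into a toy backward-replay sketch. The learning rate and discount factor below are my illustrative choices, and for simplicity a single backward sweep is shown (as noted above, a real system would use smaller increments over many sweeps and many paths):

```python
# Toy backward replay from a rewarding event, following the room example:
# the path was #21 -> #13 -> #42, and the reward was found in room #42.
# Learning rate (alpha) and discount factor (gamma) are illustrative.

def backward_replay(V, path, reward, alpha=0.5, gamma=0.9):
    """Replay the path in reverse, propagating value back from the reward."""
    last = path[-1]  # first, update the rewarding final room itself
    V[last] = V.get(last, 0.0) + alpha * (reward - V.get(last, 0.0))
    # Then sweep backwards: each room inherits a discounted share of the
    # value of the room that followed it on the path.
    for i in range(len(path) - 2, -1, -1):
        s, s_next = path[i], path[i + 1]
        target = gamma * V.get(s_next, 0.0)
        V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
    return V

V = backward_replay({}, ["#21", "#13", "#42"], reward=10.0)
# State-values decrease with distance from the reward:
# V["#42"] > V["#13"] > V["#21"], as discounting requires.
```

Because the sweep runs backwards, the value assigned to room #13 can already use the freshly updated value of #42, and #21 can use #13; replayed forwards, the same single sweep would propagate the reward only one step.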

If wandering thoughts use such a prioritizing form of replay, we see that they are closely related to the theory of emotions as interrupts discussed in Chapter 8. Both mechanisms direct the agent’s processing (one might say attention) towards dangerous or rewarding events. Emotional interrupts are more primitive, typically focused on easily identifiable and evolutionarily important threats which are present in the current state. In contrast, wandering thoughts are about learning when no threat is currently observed, potentially leading to quite sophisticated behaviours.21

Replay exists in rats, humans, and machines

Replay has long been observed in neuroscience experiments. Typical experiments measure brain activity in rats which are running in a maze, seeking food or drink. A brain area called the hippocampus is specialized in storing episodes and events—such as the sequence of running forward, turning left or right, and finding cheese. It is thought that the hippocampus replays such episodes, simultaneously signalling them to other brain areas, which then use such replayed input for learning. Replay was initially observed during sleep, but it can also be seen in awake rats.22 Recent experiments also show that something similar to prioritized sweeping, where the events are replayed backwards, seems to be happening in rodents.23

Research has also found brain activations that look like planning: a rat can initiate sequences of events which it has not yet experienced, but which it might perform in the future. For example, the rat can in some sense “imagine” a possible trajectory in a maze, which it may or may not follow later.24 So, the mammalian brain seems to use strategies which are very similar to what you would expect from the design considerations of AI. This is not surprising since the brain and AI are trying to solve the same computational problems; but it is also the case because the AI designs have been influenced by our knowledge of what happens in the brain.

It may in fact be that such processing in rats is not very different from wandering thoughts considered in human psychology. Something at least resembling replay by prioritized sweeping can also be observed in the human brain, although the limitations in measurement technology make it difficult to draw exact parallels.25 While replay is usually connected with the hippocampus, and planning with the default-mode network, the hippocampus is actually part of the default-mode network according to some definitions.26 (Rats do have a default-mode network just like humans.27) The connection between wandering thoughts and the hippocampus is also seen in the fact that people with damage in the hippocampus have difficulties in imagining new experiences.28

Some scientists are reluctant to make such parallels between hippocampal replay and wandering thoughts, since they would seem to imply that rats “think” or “imagine” like humans, at least in the sense that rats engage in planning by imagining different sequences of actions and choose the best one.29 Likewise, we immediately run into the question of whether such replay in an AI means that we would have to admit that an AI can “think”. “Thinking” is not a well-defined concept in either neuroscience or AI, which makes this question difficult to answer.30

Creative thinking

The discussion so far considers wandering thoughts as rather mechanistic solutions to some well-defined computational problems. This does not do justice to the variety of wandering thoughts in humans. Spontaneous thinking can be tremendously creative; in fact, it is one of the critical aspects of human creativity.

Now, what is creativity? As a first approach, we might actually think of planning as a creative activity. You have the current state, a goal, and you have to somehow create a path between the two. In fact, many different kinds of problem-solving could be seen as special cases of such planning: even proving a mathematical theorem can be formalized as planning a “route” from the premises to the conclusion of the theorem. However, some would argue that this is just running an algorithm, so it cannot be called creative. I wonder why running an algorithm could not be called creative. What else does an intelligent system do anyway? On a sufficiently high level of abstraction, is not all our thinking a product of various kinds of algorithms? I shall not attempt to answer the question of what creativity is here; I will just note that creativity is not easy to define, and in that sense it is very similar to the concept of intelligence.

I think a randomized algorithm, such as Monte Carlo Tree Search, could be quite convincing as an example of creativity. Such algorithms contain certain randomness in their computation which makes the algorithm try out completely new paths or ideas. They are not just deterministically finding a single solution to a given problem, but rather creatively imagining, as it were, a number of possible things to do, or steps towards a solution to the problem. As we saw above, such randomized algorithms have been very successful in game-playing AI, and they also offer a plausible model for some of the wandering thoughts. From this viewpoint, it is natural that the computations performed by wandering thoughts can also result in creative problem-solving.31

In fact, there are also some wandering thoughts that cannot be plausibly considered as replay or planning. Perhaps, while lying idly on your sofa, you have a series of seemingly unrelated mental images, or a superhero fantasy that could never actually happen in reality. One function of such wandering thoughts may be to create completely new ideas and associations, even new goals. In AI approaches to creativity, a “generate and test” approach is widespread: it means that new items are more or less randomly generated by one part of the system, and then another part of the system tests whether they make any sense. Unrealistic, weird, and unstructured wandering thoughts could be the result of such random generation; hopefully, our more rational part then tests them and decides which ones make any sense and should be taken seriously.32

Wandering thoughts multiply suffering

So far, we have seen that while mind wandering may be detrimental for whatever you’re trying to do at the present moment, it helps in planning and learning, perhaps even allowing some creativity. From a purely information-processing viewpoint, it is probably a useful thing since similar ideas are currently used in AI systems, and after all, evolution would not have “programmed” us to have a wandering mind if it were not useful to us from the evolutionary viewpoint.

Yet, evolution does not try to make us happy. A problem with replaying past memories and planning the future in human brains is that we are, on some level, unable to understand they are not real. If you remember an embarrassing episode from the past, you actually feel embarrassed. If you think about something scary that might happen to you tomorrow, you actually start feeling scared. That is, wandering thoughts increase human suffering by making us suffer from simulated or replayed events, in addition to the real ones.

Any suffering produced by real-life events may, in fact, be repeated many times by the replay of those events. Likewise, if something unpleasant is expected to happen, the unpleasantness, the threat, is felt many times in planning how to avoid that thing—which may actually turn out not to happen at all. Planning future events even includes frustration when things in the fantasy don’t go as you would like them to, and you can be frustrated many times by the planning of a single event. Due to this multiplication of suffering by wandering thoughts, it could be argued that the vast majority of our suffering actually comes from remembering or anticipating unpleasant events. The anticipation is closely related to what we said about fear in the preceding chapters, but the aspect of replaying unpleasant memories is new.

Importantly from the viewpoint of suffering, you have little control over such wandering thoughts. You may think that you must have decided to recall an unpleasant conversation, but in fact, the recollection and replay just started without you deciding anything, and even if you try to think about something else, you may find yourself unable to do so. This is another clear connection to the emotional interrupts: both wandering thoughts and emotional interrupts are largely beyond conscious control. You cannot switch off those systems. In fact, it is even worse: both systems actually take control of the whole agent.

Some research actually claims that the wandering mind is generally unhappy. That sounds plausible if wandering thoughts multiply suffering, as I just argued. However, it might be a bit of an overgeneralization.33 Whether wandering thoughts make you unhappy probably depends on their contents. It might seem obvious that having wandering thoughts with negative feelings, such as worrying, makes you unhappier, while positive content has the opposite effect. In what is a rather extreme case, a study found that women having wandering thoughts about their significant others actually felt happier.34 Close to the negative extreme, we find rumination, which is thinking about negative events that typically happened in the past and are related to one’s personal concerns.35 It is particularly frequent in depression and, unsurprisingly, leads to low mood. For individuals with depressive tendencies, most wandering thoughts may consist of depressive rumination, which may eventually lead to relapse and full-blown depressive episodes; even for normal individuals, wandering thoughts provide an opportunity for rumination to arise, and thus may lead, on average, to negative mood.36 In spite of some reservations, therefore, I think an important point is made in claiming that a wandering mind is an unhappy mind; we will come back to this point when talking about meditation in Chapter 15.

Why do wandering thoughts trigger feelings?

Replaying negative experiences, or planning the future, might not have anything to do with suffering if they did not somehow feel unpleasant, i.e. if they did not activate the negative valence signalling. A person may have recurring wandering thoughts about going to the dentist and vaguely feel the pain that the dentist’s tools will cause in her mouth. Isn’t it odd that she feels the pain although she is not at the dentist at all? While you probably have to go to the dentist one day, people also worry about the possibility of various disasters that are not at all likely to happen to them. Let me repeat Montaigne’s comment: “One who fears suffering is already suffering from what he fears”.

Unlike perceptions, thoughts rarely correspond to something that is actually happening here and now. Almost by definition, our thinking is usually about past events which are no longer there, or future events which have not yet happened, and may not happen at all. Why do we then feel upset about them, or, from a more computational viewpoint, why do they activate negative valence signals? Indeed—this is a deep question that we encounter several times in this book—why do we feel the emotions associated with memories and imagination?

From the viewpoint of computational design, it is clear that the system that computes state-values and predicts rewards has to be active in wandering thoughts, at least to some extent, so that the brain can take its evaluations into account when planning and learning. What does not seem necessary is that we actually, on a visceral level, feel pleasant or unpleasant about the events produced by planned actions. Why do our bodies react to our fantasies as if they were true? I suggest this is a kind of a computational shortcut. If you want to make learning from the simulation as simple as possible, it makes sense to use the same mechanisms and networks as in the case of real data. This is possible if the AI or the brain doing the reinforcement learning is fed the same kind of signals and into the same networks regardless of whether the action is real or simulated.

Ultimately, combined with the hypothesis that the error signals are best broadcast to the whole brain using the pain system (Chapter 2), such computational simplification seems to have led to a situation where in the brain, it is not possible to give an “unpleasant” signal to the planning system without activating the main system that signals suffering to the whole system. In other words, perhaps humans feel suffering during negative wandering thoughts simply because it makes the design of the learning system easier.37

Here we see a particularly striking conflict between evolutionary goals and happiness. Suffering from the simulation of negative events may be a computational shortcut, which is not really that necessary. It is just that the brain was “designed” by evolutionary forces which do not care if the system design makes you suffer many times more; they just found this design to be functional for their own evolutionary purposes.