Chapter 8
Fast and slow intelligence and their problems

In this chapter, we delve deeper into the distinction between two different modes of information processing in the brain, which coincide with those in modern AI. They were already discussed in Chapter 4: neural networks and Good Old-Fashioned AI. The idea of two complementary systems or processes is, in fact, ubiquitous in modern neuroscience and psychology, where it is called the “dual-process” or “dual-systems” theory. It is assumed that the two systems in the brain work relatively independently of each other while complementing each other’s computations. The two systems, or modes of operation, roughly correspond to unconscious processing in the brain’s neural networks, and conscious language-based thinking. Each of the two systems has its own advantages and disadvantages, which is the main theme of this chapter and, in fact, a theme to which we will return many times in this book. Neural networks are based on learning, which means they need a lot of data and often result in inflexible functioning. On the other hand, the computations needed in GOFAI may be overwhelming, as in planning. On the positive side, we will see how the advantages of the two systems can be combined in the action selection of a real AI system. Using categories is crucial for GOFAI, and we conclude by discussing the deep question of the advantages and disadvantages of such categorical processing and thinking.

8.1 Fast and automated vs. slow and deliberative

Let us start with the viewpoint on the two systems given by cognitive psychology and neuroscience.1 According to such “dual-process” (or “dual-systems”) theories, one of the two systems in the brain is similar to the neural networks in AI: It performs its computation very fast, and in an automated manner. It is fast thanks to its computation being massively parallel, i.e., happening in many tiny “processors” at the same time. It is automated in the sense that the computations are performed without any conscious decision to do so, and without any feeling of effort. If visual input comes to your eyes, it will be processed without your deciding to do so, and usually you recognize a cat or a dog in your visual field right away, that is, in something like one-tenth of a second.2 Most of the processing in this system is also unconscious. You don’t even understand how the computations are made; the result of, say, visual recognition just somehow appears in your mind, which is why this system is also called “implicit”.

The processing in the conscious, GOFAI-like system is very different. To begin with, it is much slower. Consider planning how to get home from a restaurant where you are for the first time: you can easily spend several seconds, even minutes, solving this planning task. The main reason is that the computations are not parallelized: They work in a serial way, one command after another, so the speed is limited by the speed of a single processing unit. In humans, another reason why symbolic processing is slow is, presumably, that it is evolutionarily a very new system, and thus not very well optimized. Other typical features of such processing are that you need to concentrate on solving the problem, the processing takes some mental effort, and it can make you tired. Such processing is also usually conscious, which means that you can explain how you arrived at your conclusion; hence the system is also called “explicit”.3

Note that in an ordinary computer, the situation above is in some ways reversed, as already explained in Chapter 4. A computer can do logical operations much faster than neural network computations, since logical operations are in line with its internal architecture. In fact, a computer can only do neural network computations based on a rather cumbersome conversion of such analog operations into logical ones. Analogously, the brain can only perform logical operations after converting them into neural network computations, which is equally cumbersome.

To see the division into two systems particularly clearly, we can consider situations where the two systems try to accomplish the same task, say, classification of visual input. We can have a neural network that proposes a solution, as well as a logic-based system that proposes its own. Sometimes, the systems may agree; at other times, they disagree.

Suppose a cat enters your visual field. When the conditions for object recognition are good, your visual neural network would recognize it as a cat. In other words, the network would output the classification “cat” with high certainty. However, when it is dark, and you only get a faint glimpse of the cat that runs behind some bushes, your neural network might not be able to resolve the categorization. It might say it is probably either a cat or a dog, but it cannot say which. At this point, the more conscious, logic-based system might take over. You recall that your neighbour has a cat; you don’t know anybody who owns a dog nearby; you think this is just the right moment in the evening for a cat to hunt for mice. Thus, you logically conclude it was probably a cat. In this case, the task of recognizing an object used the two different systems, working together. The logic-based one took quite some time and effort to use, while the neural network gave its output immediately and without any effort. Here, the systems were not completely independent, since the logic-based system did need input from the neural network to have some options to work on.

The two systems can also disagree, as often happens in the case of fear. Talking about fear and related emotional reactions, people often call them “hard-wired”. This expression is not too far from reality. What happens is that the brain uses special shortcut connections to relay information from the eye to a region called the amygdala, an emotional center in the brain. This shortcut bypasses those areas where visual information is usually processed.4 If such a connection learns to elicit fear (due to a previous unpleasant encounter with some animals, for example), it will be very difficult to get rid of it. Any amount of reasoning is futile, presumably since the visual signal triggering fear is processed by completely different brain areas than logical, conceptual reasoning. Often, the logic-based system loses here, and the neural-network-based fear prevails. This division into two processes also explains why it is difficult for us to change unconscious associations, such as fear: the conscious, symbolic processing has limited power over the neural networks.

Interestingly, people tend to think that the main information processing in our brain happens by the conscious, symbolic system, including our internal speech and conceptual thinking. But what if that is simply the tip of the iceberg, as early psychoanalysts5 claimed more than a hundred years ago? The idea that most information processing is conscious and conceptual may very well be an illusion. We may have such an impression because conceptual processing requires more effort, or because it is more accessible to us by virtue of being conscious. However, if you quantify the amount of computational resources which are used for conceptual, logical thinking, and compare them with those used for, say, vision, it is surely vision that will be the winner.6

Similar to the dual-process theories in cognitive psychology and neuroscience just described, the division between GOFAI and neural networks has been prominent in the history of AI research, which has largely oscillated between the two paradigms. Currently, neural networks are very popular, while GOFAI is not used very widely. However, this may very well change, and perhaps in the future, AI will combine logic-based and neural models in a balanced way. Since GOFAI is used by humans, it is very likely to have some distinct advantage over neural networks, at least for some tasks.7

Note that in AI we find another important distinction that is not very prominent in the neuroscientific literature: learning vs. no learning. Neural networks in AI are fundamentally based on learning, and using them without learning is not feasible. In contrast, in its original form, Good Old-Fashioned AI promises to deliver intelligence without any learning, at the cost of much more computation and more effort spent on programming. This distinction is also relevant to the brain, as we will see next.8

8.2 Neural network learning is slow, data-hungry, and inflexible

To understand the relative advantages of the two systems, let us first consider the limitations in neural networks, and especially the learning that they depend on. First of all, neural network learning is data-hungry: it needs large amounts of data. This is because the learning is by its very nature statistical; that is, it learns based on statistical regularities, such as correlations. Computing any statistical regularities necessarily needs a lot of data; you cannot compute statistics by just observing, say, two or three numbers.
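
To make this concrete, here is a small illustration of my own (the numbers are arbitrary): estimating even a simple correlation from a handful of observations gives wildly varying results, while ten thousand observations give a stable estimate close to the true value.

```python
import numpy as np

# Illustrative sketch: statistical regularities cannot be estimated reliably
# from a handful of observations. All numbers here are arbitrary.
rng = np.random.default_rng(1)

def sample_correlation(n):
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)      # true correlation is about 0.45
    return np.corrcoef(x, y)[0, 1]

print([round(sample_correlation(3), 2) for _ in range(5)])      # wildly varying
print([round(sample_correlation(10_000), 2) for _ in range(5)]) # close to 0.45
```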

Second, neural network learning is slow. Often, it is based on gradient optimization, which is iterative, and needs a lot of such iterations. The same applies to Hebbian learning, where changing neural connections takes many repetitions of the input-output pairs—this is natural since Hebbian learning can be seen as a special case of stochastic gradient descent. In fact, to input a really large number of data points into a learning system almost necessarily requires a lot of computation, since each data point takes some small amount of time to process.
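
The following sketch, which is purely illustrative and uses arbitrary parameters, shows this iterative character: a single Hebbian-style connection weight is strengthened by a tiny amount at each presentation of the input-output pair, so a strong association only emerges after very many repetitions.

```python
import numpy as np

# Illustrative sketch of iterative, Hebbian-style learning of one connection
# weight; the learning rate and number of repetitions are arbitrary choices.
rng = np.random.default_rng(0)
learning_rate = 0.01
w = 0.0                             # strength of the connection from x to y

for _ in range(1000):               # the same input-output pair, many times
    x = rng.normal(1.0, 0.1)        # presynaptic activity
    y = rng.normal(1.0, 0.1)        # postsynaptic activity
    w += learning_rate * x * y      # tiny increment: "fire together, wire together"

print(round(w, 2))                  # a strong association only after many repetitions
```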

This statistical and iterative nature of neural network learning has wide-ranging implications for AI. To begin with, these properties help us to further understand why it is so difficult, in us humans, to change any kind of deeply ingrained associations. Mental associations are presumably in a rather tight correspondence with neural connections: If you associate X with Y, it is because there are physical neural connections between the neurons representing X and Y. Now, even if any statistical connection ceases to exist in the real world, perhaps because you move to live in a new environment, it will take a long time before the Hebbian mechanisms learn to remove the association between X and Y, or to associate X with something else.9

In fact, these learning rules, whether basic Hebbian learning or some other stochastic gradient methods, may seem rather inadequate as an explanation for human learning: We humans can learn from single examples and do not always need a lot of data. You only need to hear somebody say once “Helsinki is the capital of Finland”, and you have learned it, at least for a while. Surely, you don’t need to hear it one thousand times, although that may help. This does not invalidate the neural network models, however, since the brain has multiple memory systems, and Hebbian learning is only one way we learn things and remember them—we will get back to this point in Chapter 11.10

The iterative nature of neural learning, together with the two-process theory, also helps to explain in more detail why it is so difficult to deliberately change unconscious associations. Suppose you consciously decide to learn an unconscious association between X and Y (where X might be “exercise” and Y might be “good”). How can you transfer such information from the conscious, explicit system to the neural networks? Perhaps the best you can do is to recall X and Y simultaneously to your mind—but that has to be done many times! In fact, you are creating a kind of new data and feeding it into the unconscious association learning in your brain. You are almost cheating your brain by pretending that you perceive the association “X and Y” many times. We will see many variations on this technique when we consider methods for reducing suffering in Chapter 17.

Another limitation is that what a neural network learns is strictly tied to the specific input and output it has been trained on. While this may seem like an obvious and innocuous property, it is actually another major limitation of modern AI. Suppose that a neural network in a robot is trained to recognize animals of different species: It can tell if a picture depicts a cat or a dog, or any other species in the training set. Next, suppose somebody just replaces the camera in the robot with a new one, with higher resolution. What happens is that the robot’s previously trained neural network does not work anymore. It will have no idea how to interpret the high-resolution images since they do not match the templates it learned for the original data. A similar problem is that the learning is dependent on the context: An AI might be trained on images where cats tend to be indoors and dogs outdoors, and it will then erroneously classify any animal pictured indoors as a cat. The AI sees a strong correlation between the surroundings and the animal species, and it will not understand that the actual task is about recognizing the animals and not recognizing the surroundings. That is why a neural network will typically only work in the environment or context it is trained in.11

In light of these limitations, AI based on neural networks is thus rather different from what intelligence usually is supposed to be like in humans.12 In general, when humans learn to perform a task, they are often somehow able to abstract general knowledge out of the learning material, and they are able to transfer such knowledge from one task to another. It has even been argued that the hallmark of real intelligence is that it is able to function in many different kinds of environments and accomplish a variety of tasks without having to learn everything from scratch. If all a robot can do is to mow the lawn, we would think it is just accomplishing a mechanical task and is not “really” intelligent.13

8.3 Using planning and habits together

Combining the two systems, neural networks and GOFAI, should take us closer to human-like intelligence. Let us next look at how the two systems might interact in AI. Regarding action selection, we have actually seen how two different approaches can solve the same problem in AI: reinforcement learning and planning. Planning is in fact one of the core ideas of the GOFAI theory. Planning is undeniably a highly sophisticated and demanding computational activity, and probably impossible for simple animals—some would even claim it is only present in humans, although that is a hotly debated question.14 In any case, it seems to correspond closely to the view humans have about their own intelligence, and it was therefore a target of early AI research. However, in the 1980s, there was growing recognition that building agents, perhaps robots, whose actions show human-level intelligence is extremely difficult, and it may be better to set the ambitions lower. Perhaps building a robot which has the level of intelligence of some simple animal would be a more realistic goal. Moreover, as in other fields of AI, learning gained prominence. That is why habit-like reinforcement learning started to be seen as an interesting alternative to planning.15

8.3.1 Habits die hard—and are hard to learn

However, habit-based behavior has its problems, partly similar to those considered above for neural network learning. Learning the value function, that is, learning habits, obeys the same laws as other kinds of machine learning. It needs a lot of data: the agent needs to go and act in the world many, many times. This is a major bottleneck in teaching AI and robots to behave intelligently, since it may take a lot of time and energy to make, say, a cleaning robot try to clean the room thousands of times. Basic reinforcement algorithms are also similar to neural network algorithms in that they work by adjusting parameters in the system little by little, based on something like the stochastic gradient methods.
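
As a concrete illustration of such incremental habit learning, here is a minimal sketch of tabular Q-learning in a made-up five-state corridor, where the only reward is at the far end; the states, rewards, and parameter values are all hypothetical choices of mine.

```python
import numpy as np

# Illustrative sketch: tabular Q-learning in a made-up five-state corridor.
# The agent starts in state 0 and is rewarded only on reaching state 4.
# Actions: 0 = move left, 1 = move right. All parameters are hypothetical.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))       # the action-value ("habit") table
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

for episode in range(500):                # many, many trials are needed
    s = 0
    while s != 4:
        if rng.random() < epsilon or Q[s, 0] == Q[s, 1]:
            a = int(rng.integers(2))      # explore, or break ties randomly
        else:
            a = int(Q[s].argmax())        # exploit the habit learned so far
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # a small, incremental update of the action-value, trial after trial
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))   # "move right" ends up with the higher value in every state
```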

Another limitation which is crucial here is that the result of the learning, the state- or action-value function, is very context-specific—that is one form of inflexibility discussed above. If the robot has learned the value function for cleaning a room, it may not work when it has to clean a garden. Even different rooms to clean may require slightly different value functions! The world could also change. Suppose the fridge from which the robot fetches the orange juice for its master is next to a red table. Then, the robot will associate the red table with high value, since upon seeing the table, it knows it is close to being able to get the juice. However, if somebody moves the table to a different room, the robot will start acting in a seemingly very stupid way: It will go to the room which now has the red table when it is supposed to get the orange juice—in fact, it might simply approach any new red object introduced to its environment in the hope that this is how it finds the fridge. It will need to re-learn its action-values all over again.

Here, we see another aspect of the slowness of learning habits: Once a habit is learned, it is difficult to get rid of it. In humans, the system learning and computing the reinforcement value function is outside of any conscious control: We cannot tell it to associate a smaller or larger value to some event. This is why we often do things we would prefer not to do, out of habit. In order to learn that a habit is pointless in the sense that it does not give any reward anymore (as happened with the robot above), a new learning process has to happen, and this is just as slow as the initial learning of the habit. That is why habits die hard.16

8.3.2 Combining habits and planning

These problems motivate a recent trend in AI: combining planning and habit-like behavior. The habit-based framework using reinforcement learning will lead to fast but inflexible action selection, and is ideally complemented by a planning mechanism which searches an action tree a few steps ahead—as many as computationally possible. Depending on the circumstances, the action recommended by either of the two systems can then be implemented.17

Let us go back to the robot which is trying to get the orange juice from the fridge. One possible way of implementing a combination of planning and habit-like behavior is to have a habit-based system help the planning system in the tree search. Using reinforcement learning, you could train a habit-based system so that when the robot is in front of the fridge whose door is closed, the system suggests the action “open the door”. When the door of the fridge is open with orange juice inside, the habit-based system suggests “grab the orange juice”. While these outputs could be directly used for selecting actions, the point here is that we can use them as mere suggestions to a planning system. Such suggestions would greatly facilitate planning: The search can concentrate on those paths which start with the action suggested by the habit-based system, focusing the search and reducing its complexity. However, the planning system would still be able to correct any errors in the habit-like system, and could override it if the habit turns out to be completely inadequate.
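
A rough sketch of this idea is given below; it is entirely illustrative, and the states, actions, and suggestion table are hypothetical. The planner simply tries the habit system's suggestion first at every node of its search, while remaining free to try the other actions if the suggestion leads nowhere.

```python
# Illustrative sketch: a tiny depth-limited planner that tries the action
# suggested by a habit-like system first. The fridge domain, its states and
# actions, and the suggestion table are all hypothetical.
ACTIONS = {
    "at_closed_fridge": ["open_door", "walk_away"],
    "at_open_fridge": ["grab_juice", "close_door", "walk_away"],
}
TRANSITIONS = {
    ("at_closed_fridge", "open_door"): "at_open_fridge",
    ("at_open_fridge", "grab_juice"): "has_juice",
    # all other actions are assumed to leave the state unchanged
}
SUGGESTIONS = {"at_closed_fridge": "open_door", "at_open_fridge": "grab_juice"}

def plan(state, goal, depth=3):
    """Search the action tree, trying the habit system's suggestion first."""
    if state == goal:
        return []
    if depth == 0:
        return None
    actions = ACTIONS.get(state, [])
    suggested = SUGGESTIONS.get(state)
    ordered = ([suggested] if suggested in actions else []) + \
              [a for a in actions if a != suggested]
    for a in ordered:      # the suggestion focuses the search but can be overridden
        next_state = TRANSITIONS.get((state, a), state)
        if next_state == state:
            continue       # skip actions that do not change anything
        rest = plan(next_state, goal, depth - 1)
        if rest is not None:
            return [a] + rest
    return None

print(plan("at_closed_fridge", "has_juice"))   # ['open_door', 'grab_juice']
```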

One very successful real-world application using such a dual-process approach is AlphaGo, a system playing the board game of Go better than any human player.18 The tree to be searched in planning consists of moves by the AI and its opponent. This is a classical planning problem in a GOFAI sense. The world has a finite number of well-defined states, and also, the actions and their effects on the world are clearly defined, based on the rules of the game. What is a bit different is that there is an opponent whose actions are unpredictable; however, that is not a big problem because the agent can assume that the opponent chooses its actions using the same planning engine the agent uses itself.

The search tree in Go is huge since the number of possible moves at any given point of the game is quite large, even larger than in chess. In fact, the number of possible board positions (positions of all the stones on the board) is larger than the number of atoms in the universe—highlighting the fundamental problem in GOFAI-style planning. Since it is computationally impossible to exhaustively search the whole tree, AlphaGo randomly tries out as many paths as it has time for. This leads to a “randomized” tree search method called Monte Carlo Tree Search. Algorithms having some randomness deliberately programmed in them are often called Monte Carlo methods after the name of a famous casino. However, a purely random search would obviously be quite slow and unreliable.19
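
The Monte Carlo idea itself can be conveyed in a few lines; the following is not AlphaGo's actual algorithm but a toy illustration of evaluating a position by averaging the outcomes of many random playouts of a made-up game.

```python
import random

# Illustrative sketch of the Monte Carlo idea (not AlphaGo's algorithm):
# evaluate how good a position is by playing many random games from it
# and averaging the outcomes. The "game" is a made-up coin-flip walk.

def random_playout(position):
    """Play ten random moves and return 1 for a win, 0 for a loss."""
    score = position
    for _ in range(10):
        score += random.choice([-1, +1])
    return 1 if score > 0 else 0

def evaluate(position, n_playouts=10_000):
    return sum(random_playout(position) for _ in range(n_playouts)) / n_playouts

print(evaluate(+2), evaluate(-2))   # the better starting position wins more often
```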

The crucial ingredient in AlphaGo is another system which learns habit-like behaviors. This system is used inside the planning system, a bit like in the juice robot just described. While the system is rather complex, let us just consider the initial stage of the learning, in which AlphaGo looks at a large database of games played by human experts. Using that data, it trains a neural network to predict what human experts would do in a given board position—the board positions correspond to the states here. The neural network is very similar to those used in computer vision, and gets as input a visual view of the Go board. This part of the action selection system could be interpreted as learning a “habit”, i.e., an instinctual way of playing the game without any planning.20 The action proposed by the habit system can be used as such, but even more intelligent performance is obtained by using it as a heuristic for the tree search: the tree search is focused on paths related to that proposed action. This heuristic is then refined by further learning stages. In particular, the system also learns to approximate the state-values by another neural network.21

Such suggestions based on neural networks are fast, and intuitively similar to what humans would do. Often, a single glimpse at the scene in front of your eyes will tell a lot about where reward can be obtained, and suggests what you should do. Even when humans are engaged in planning, such input coming from neural networks often guides the planning. If you go to get something from the fridge, don’t you have almost automated reactions to seeing the fridge door closed, and seeing your favorite food or drink inside the fridge? These are presumably given by a simple neural network. Yet, there is a deliberative, thinking aspect in your behavior, and you can change it if you realize, for example, that the juice has gone bad—which the simple neural network did not know.

What is typical in humans is that action selection can also switch from one system to another as a function of practice. Learning a new skill, such as driving a car, is a good example—skills are similar to habits from the computational viewpoint. First, you really have to concentrate and consciously think about different action possibilities. With increasing practice and learning, you need to think less and less, since something like a value function is being formed in the brain. In the end, your actions become highly automated, and you don’t really need to think about what you are doing anymore. The habit-based system takes over and drives the car effortlessly.22

8.4 Advantages of categories and symbols

While neural networks and GOFAI work nicely together in this example of Go playing, it is often not easy to demonstrate any clear utility of symbolic AI approaches. This may of course change any time, since AI is a field of rapid development. It is quite likely that GOFAI is necessary for particularly advanced intelligence—something much more advanced than what we have at this moment. Yet, the tendency has recently been almost the opposite: tasks which were previously thought to be particularly suitable for symbolic AI have been more successfully solved by neural approaches. For example, large language models used in systems like ChatGPT effectively transform language, i.e. text data, into a sequence of high-dimensional continuous-valued vectors before inputting them into a huge neural network.23

Perhaps symbolic AI works with board games only because such games are in a sense discrete-valued: the stones on the Go board can only be in a limited number of positions, so the game is inherently suitable for GOFAI. So, we have to think hard about what might be the general advantages of logic-based intelligence compared to neural networks. In the following, I explore some possibilities.

8.4.1 GOFAI is more flexible and facilitates generalization

Suppose that there is a neural network that recognizes objects in the world and outputs the category of each object. Then, what would be the utility of operating on those categories as discrete entities, using symbolic-logical processing, instead of having just a huge neural network that does all the processing needed?

We have already seen, more than once, one great promise of GOFAI in the case of planning: flexibility. Given any current state and any goal state, a planning system can, if the computational resources are sufficient, find a plan to get there. If anything changes in the environment—say, it is no longer possible to transition between two states due to some kind of blockage—the planning system takes that into account without any problems. This is in contrast to reinforcement learning which will not know what to do if the environment changes; it may have to spend a lot of time re-learning its value functions.
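
The following toy sketch, with an entirely made-up state graph, shows this flexibility: a generic planner (here a simple breadth-first search) immediately finds an alternative route when one transition is blocked, with no re-learning required.

```python
from collections import deque

# Illustrative sketch: a generic planner (breadth-first search over a made-up
# state graph) adapts immediately when a transition becomes blocked.
def plan(graph, start, goal):
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(plan(graph, "A", "D"))   # ['A', 'B', 'D']

graph["B"] = []                # the path through B is suddenly blocked
print(plan(graph, "A", "D"))   # ['A', 'C', 'D'], found with no re-learning
```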

Furthermore, GOFAI is easily capable of representing various kinds of data structures and relationships in the same way as a computer database. For example, it can easily represent the fact that both cats and dogs are animals, i.e. the hierarchical structure of the categories. It can also represent the relationship that the character string “Scooby” is the name of a particular dog. This adds to the flexibility of GOFAI by allowing more abstract kinds of processing, which are easily performed by humans.
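
As a simple illustration (the representation chosen here is just one of many possibilities), such facts fit naturally into a small symbolic knowledge base that can be queried by following the hierarchy.

```python
# Illustrative sketch of a tiny symbolic knowledge base; the representation
# (plain dictionaries) is just one of many possible choices.
is_a = {"cat": "animal", "dog": "animal", "scooby": "dog"}
name_of = {"Scooby": "scooby"}            # the string "Scooby" names one dog

def is_kind_of(x, kind):
    """Follow the is-a hierarchy upwards."""
    while x in is_a:
        if is_a[x] == kind:
            return True
        x = is_a[x]
    return False

print(is_kind_of(name_of["Scooby"], "animal"))  # True: a dog, hence an animal
```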

Another widespread idea is that categories are useful for generalizing knowledge across the members of a category, which in turn underlies various forms of abstract thinking. Even though cats are not all the same, it is useful to learn some of their general properties. They like milk, they purr; they don’t like to chew bones like dogs do, and they are not dangerous like bears. Having categories enables the system to learn to associate various properties to the whole category: Observing a few cats drink milk, the system learns to associate milk-drinking to the whole category of cats, instead of just some individual cats. Importantly, associating properties to categories means the system was able to generalize: after seeing some of the cats drink milk, it inferred that all cats drink milk. Such generalization is clearly an important part of intelligence. If the system needed to learn such a property separately for each cat, it would be in great trouble when it sees a new cat and needs to feed it—it would have no idea what to do. But, learning that the whole category of cats is associated with milk-drinking, it knows, immediately and without any further data, what to give to this new cat.
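
In the same toy style, attaching the property to the category rather than to individual animals is what makes the generalization immediate; the following sketch is again purely illustrative.

```python
# Illustrative sketch: a property attached to a whole category generalizes
# at once to every member, including individuals never observed before.
category_of = {"whiskers": "cat", "rex": "dog"}
properties = {"cat": {"drinks_milk"}, "dog": {"chews_bones"}}

def what_to_feed(individual):
    category = category_of.get(individual)
    return "milk" if "drinks_milk" in properties.get(category, set()) else "unknown"

category_of["new_cat"] = "cat"   # a cat we have never seen eat anything
print(what_to_feed("new_cat"))   # 'milk': the category-level knowledge generalizes
```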

8.4.2 Categories enable communication

Nevertheless, I think the feature which makes GOFAI fundamentally different from neural networks is that the use of symbols is similar to using some kind of a primitive language. In fact, you can hardly have GOFAI without some kind of a language—perhaps akin to a programming language—in which the symbols and logical rules are expressed.

It is equally clear that with humans, language is primarily used for communication between individuals. As each category typically corresponds to a word, humans can communicate associations, or properties of categories, to each other. I can tell my friend that cats drink milk, so she does not need to learn what to feed to cats by trial and error. I have condensed my extensive data on cats’ eating habits into a short verbal message that I transmit to her.

So, it is plausible that the main reason humans are capable of symbolic thinking is that it enables them to communicate with each other. After such a communication system was developed during evolution, humans then started using the same system for various kinds of intelligent processing even when alone. Perhaps we started by telling others, for example, where to find prey. This led to the development of symbols and logical operations, which were found useful for abstract thinking: Perhaps you could try to figure out yourself where you should hunt tomorrow. Eventually, such capabilities ended up producing things such as quantum physics—and the very theory of GOFAI.24

A reflection of the utility of categories in communication may be seen in a recent research line in AI which tries to develop systems whose functioning is easy for humans to interpret.25 If you use a neural network to recognize a pattern, the output may be clear and comprehensible, but the computations—why the network gave that particular output—are extremely difficult for humans to understand. This is fine in many cases, but sometimes it is necessary to explain the decision to humans. For example, if an AI rejects your loan application, the bank using the AI may be legally obliged to explain the grounds for that decision.26 Researchers developing such interpretable AI often end up doing something similar to GOFAI boosted by learning, since it gives rules which can be expressed in more or less ordinary language, and thus they can be explained. In fact, in Chapter 4 we saw examples of GOFAI systems whose functioning is easy to understand and to explain.27

8.5 Categorization is fuzzy, uncertain, and arbitrary

Now, let us consider the flipside: problems that arise when using categories. We have already seen some problems in logical-symbolic processing, the most typical being the exponential explosion of computation in planning. Here, we focus on the consequences of using categories, and look at the question from a more philosophical angle. Indeed, it has been widely recognized by philosophers over the centuries that dividing the world into “crisp” categories can only be an approximation of the overwhelming complexity of the world. I focus on some issues which will in later chapters be seen to be relevant for suffering.28

8.5.1 Categories are fuzzy

Philosophers have long pointed out that there may not be any clearly defined categories in the world. Granted, the difference between cats and dogs may be rather clear, but what about the category of, say, a “game”? Wittgenstein gave this as an example of a category which has no clear boundaries. Different games have just some vague similarity, which he called “family resemblance”.

This idea has been very influential in AI under the heading of fuzziness. A category is called fuzzy if its boundaries are not clear or well-defined. Consider for example the word “big”. How does one define the category of big things? For simplicity, let us just consider the context of cities. If we say “London is big”, that is clearly true: London definitely belongs to the category of big things, in particular big cities. But if we say “Brussels is big”, is that true or false? How does one define what is big and what is not? In the case of cities, we could define a threshold for the population, but how would we decide what it should be? An AI might learn to categorize cities into big and small ones based on some classification task—in Chapter 4, we discussed how this might happen in categorizing body temperature into “high fever” or not. However, that categorization would depend on the task, and there would always be a gray zone where the division is rather arbitrary.

The consensus in AI research is that many categories are quite fuzzy and have no clear boundaries; there are only different degrees of membership to a category. There is no way of defining a word like “big” (or, say, “nice”, “tall”, “funny”) in a purely binary (true/false) fashion. There will always be objects that quite clearly belong to the category and objects which clearly do not belong to the category, but for a lot of objects the situation is not clear. In the theory of fuzzy logic, such fuzziness is modelled by giving each object a number between 0 and 1 to express the degrees of membership to each category.29
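
In practice, such a degree of membership is just a function from objects to numbers between 0 and 1. The following sketch uses an arbitrary, hypothetical choice of where the gray zone for “big city” begins and ends.

```python
# Illustrative sketch of a fuzzy membership function for "big city"; the
# location and width of the gray zone are arbitrary, hypothetical choices.
def membership_big_city(population):
    """Degree of membership in the category 'big', between 0 and 1."""
    if population <= 500_000:
        return 0.0
    if population >= 5_000_000:
        return 1.0
    return (population - 500_000) / 4_500_000   # linear ramp in the gray zone

print(membership_big_city(9_000_000))   # London: 1.0, clearly big
print(membership_big_city(1_191_604))   # Brussels: about 0.15, only somewhat big
```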

8.5.2 Categorization is uncertain

In addition, categorization is always more or less uncertain. Any information gleaned from incoming sensory input is uncertain, for reasons we will consider in more detail in Chapter 12. Partly, this is because the neural network gets limited information, and partly because its information-processing capabilities are limited. If you have a photograph of a cat taken in the dark and from a bad angle, the neural network or indeed any human observer may not be sure about what it is. They might say it is a cat with 60% probability, but it could be something else as well. In other words, any categorization by an AI is very often a matter of probabilities.

It is important to understand that fuzziness and uncertainty are two very different things. Uncertainty is a question of probabilities, and probabilities are about lack of information. If I say that a coin flip is heads with 50% probability and tails with 50% probability, there is no fuzziness about which one it is. After flipping the coin I can say if it is heads or tails, and no reasonable observer would disagree with me (except in some very, very rare cases). In other words, uncertainty is a question of not knowing what will happen or has happened, i.e., a lack of information about the world. In contrast, fuzziness has nothing to do with lack of information; it is about the lack of clear definition. We cannot say if the statement “Brussels is big” is true even if we have every possible piece of information about Brussels, including its exact population count. According to the information I find on Wikipedia, its population is 1,191,604, but knowing that will not help me with the problem if I don’t know how many inhabitants are required for a city to be in the “big” category.
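
The contrast can be made concrete in a few lines of code; the membership function below is the same hypothetical ramp as in the previous sketch.

```python
import random

# Illustrative sketch contrasting uncertainty with fuzziness.

# Uncertainty: before the flip we lack information; once the coin is observed,
# the answer is crisp and no probability remains.
coin = random.choice(["heads", "tails"])
print(coin)

# Fuzziness: even with Brussels' exact population (full information), "big"
# has no crisp answer, only a degree of membership on a hypothetical ramp.
def membership_big_city(population):
    return min(max((population - 500_000) / 4_500_000, 0.0), 1.0)

print(membership_big_city(1_191_604))   # about 0.15, however much we know
```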

Humans are not good at processing uncertainty. Various experiments show that humans tend to use excessively categorical thinking, where the uncertainty about the category membership is neglected. That is, when you see something which looks to you most probably like a cat, your cognitive system tends to ignore any other possibilities, and think it is a cat for sure.30

An old Buddhist parable about these dangers in categorization is seeing a rope in the dark and thinking it is a snake. You miscategorize the rope, and your brain activates not only the category of a snake, but all the associations related to that category (“animal”, “dangerous”). You get scared, with all the included physiological changes, such as an increased heart rate. If you had properly taken the uncertainty of such categorization into account, your reaction might have been more moderate.

8.5.3 Categorization is arbitrary

In some cases, the categories are not just fuzzy or uncertain: their very existence can be questionable. Consider concepts such as “freedom” or “good”. Even forgetting about any difficulties in programming an AI to understand them, is it even clear what these words mean? Certainly, they mean different things to different people: people from different cultural backgrounds may easily misunderstand each other simply because they use such concepts with slightly different meanings. A great amount of time can be spent in attempting to just describe the meanings of certain words and categories. In fact, we spend more than one chapter on analyzing the category called “self” in this book.

Even in rather straightforward biomedical applications of machine learning, we often use categories that are not well-defined. For example, in a medical diagnosis context, it is not clear if what we usually call schizophrenia is a single disease. Perhaps there are a number of different diseases which all lead to the single diagnosis of schizophrenia.31 Developing effective medications may only be possible once we understand all the subtypes, while thinking of all the subtypes as a single disease (a single category) may mislead any treatment attempts.

Moreover, a categorization that works for one purpose might not be suitable for another. We might divide people into different nationalities, which is very useful from the viewpoint of knowing what languages they are likely to understand. However, we can too easily use the same categories to predict all kinds of personality traits of those individuals, and that prediction may go quite wrong. Thus, the categories and their utility depend on the context. Moreover, since different people use different categories in different ways, they are subjective.32

Such arbitrariness of categories has been well appreciated in some philosophical schools. In the Yogācāra school of Buddhism, it is claimed that “while such objects [as chairs and trees] are admissible as conventions, in more precise terms there are no chairs, or trees. These are merely words and concepts by which we gather and interpret discrete sensations that arise moment by moment in a causal flux.”33 What arises in such a moment-by-moment flux is, in our terminology, activities in neural networks. Categories are created afterwards, by further information-processing.

8.5.4 Overgeneralization

It may be easy to understand that miscategorization leads to problems, as in mistaking a rope for a snake. However, the biggest computational problem caused by all properties just discussed—fuzziness, uncertainty, arbitrariness—may be overgeneralization. Overgeneralization can be difficult to spot, even after the fact, which makes it particularly treacherous.

Overgeneralization means that you consider all instances of a category to have certain properties, even if those properties hold only for some of them. Since categories are fuzzy, anything which is not really firmly inside the category may actually be quite different from its prototype. Related to this, you may not acknowledge the uncertainty of categorization and the ensuing generalization. Even more rarely do people acknowledge that the very categories are arbitrary.

Overgeneralization effects are well documented, for example, in perception of human faces, where gender and race can bias any conclusions you make about the individual involved.34 As an extreme case of overgeneralization, if you have been bitten by a dog, you may develop a fear towards all dogs, which would be called a phobia. Such fear is overgeneralizing in the sense that it is very unlikely that the other dogs would bite you. If you didn’t use any categories, you would only be afraid of that one dog that bit you. This is a very concrete example of how thinking in terms of categories leads to suffering, as will be discussed in more detail in later chapters.

There are actually good computational reasons why overgeneralization occurs. Learning to generalize based on a limited number of categories means that knowledge gleaned from all the instances of each category can be pooled together. If you actually had enough data from all the dogs in the world, as well as unlimited computational capacities, you would be able to learn that some of them are safe while a few are not. However, data and computation are always limited, so some shortcuts may be necessary—even if they increase your suffering. This is another theme that we will return to over and over again in this book.