21 For the general theory of approximating value functions with neural networks or simpler methods, see Sutton and Barto (2018, Chapter 9). In Chapter 11 we will also see how the system can improve by playing against itself. A completely different purpose for combining learning and planning is to learn to plan better in a given environment where the rewards change (Tamar et al., 2016; Pascanu et al., 2017).