To this advantage we can add a more technical one: stochastic methods include an implicit regularization and are thus less likely to overfit the data (Bottou, 2003; Hardt et al., 2016). Overfitting is an important problem in practical AI learning, but I don't discuss it at any length in this book. Basically, it means that when the amount of data at your disposal is limited (as it almost always is), the learning may go wrong in a particular way: it may seem to work well for the data you have, achieving a good "fit", yet the predictions your neural network gives are actually useless, because the learning "overfit" your data and does not work (or give a good fit) on any new data to which you would like to apply the system in the future.