An abstract way of justifying curiosity is that basic iterative learning mechanisms such as gradient descent often get stuck in what are called “local minima” of an objective (error) function. A local minimum is a point in the parameter space that has a better value of the objective function than any nearby point, even though some point far away in the parameter space has an even better value. A special class of optimization methods called “global optimization” tries to improve iterative algorithms so that they find the global minimum, that is, the very best value of the parameters, or at least a better value than simple gradient descent would reach. Bayesian optimization is one class of such methods (Gutmann et al., 2016; Brochu et al., 2010).
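The phenomenon is easy to reproduce in a few lines of code. The following is a minimal sketch, not taken from the text, using a hypothetical one-dimensional objective with two basins: plain gradient descent started in the shallower basin settles in the local minimum, while a crude “global” strategy of restarting from several random points (a simple stand-in for more sophisticated methods such as Bayesian optimization) finds the deeper, global minimum.

import numpy as np

# Hypothetical objective with a local minimum near x ~ +1.9
# and a deeper, global minimum near x ~ -2.0.
def f(x):
    return 0.5 * x**4 - 4 * x**2 + x

def grad_f(x):
    return 2 * x**3 - 8 * x + 1

def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent from a single starting point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

# Started in the right-hand basin, gradient descent stops at the
# local minimum, even though a better point exists far away.
x_local = gradient_descent(x0=3.0)

# Crude global strategy: restart from random points, keep the best result.
starts = np.random.uniform(-4, 4, size=20)
x_best = min((gradient_descent(x0) for x0 in starts), key=f)

print(f"single start: x = {x_local:.3f}, f(x) = {f(x_local):.3f}")
print(f"multi-start:  x = {x_best:.3f}, f(x) = {f(x_best):.3f}")

Random restarts are only the simplest illustration of the idea; methods such as Bayesian optimization choose where to evaluate the objective next in a more informed way, trading off exploring unknown regions against refining promising ones.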