19 This can be done using a special form of RPE called the temporal difference (TD) error (see footnote 20 below), and in particular using the squared TD error summed over all states. See Sutton and Barto (2018, p. 268), who call it the Bellman error, and related developments by Bhatnagar et al. (2009).
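As a rough illustration of the quantity described in this footnote, the following minimal sketch computes a one-step TD error for each state and sums its square over all states. The function names, the three-state chain, the discount factor of 0.9, and the unweighted sum are assumptions made for illustration only; they are not taken from the cited works. Transitions here are deterministic, so the TD error coincides with its expectation.

```python
def td_errors(values, rewards, next_states, gamma=0.9):
    """One-step TD error per state: delta(s) = r(s) + gamma * V(s') - V(s).

    Assumes (hypothetically) that each state s has a single deterministic
    successor next_states[s] and a single reward rewards[s].
    """
    return [r + gamma * values[s_next] - v
            for v, r, s_next in zip(values, rewards, next_states)]


def summed_squared_td_error(values, rewards, next_states, gamma=0.9):
    """Squared TD error summed (unweighted) over all states."""
    return sum(d ** 2 for d in td_errors(values, rewards, next_states, gamma))


# Hypothetical chain: 0 -> 1 -> 2 -> 2 (absorbing); reward 1 on leaving state 1.
rewards = [0.0, 1.0, 0.0]
next_states = [1, 2, 2]

exact_values = [0.9, 1.0, 0.0]   # satisfies V(s) = r(s) + 0.9 * V(s') in every state
rough_values = [0.0, 0.0, 0.0]   # an uninformed initial estimate

print(summed_squared_td_error(exact_values, rewards, next_states))
print(summed_squared_td_error(rough_values, rewards, next_states))
```

In this toy case the summed squared error vanishes exactly when the value estimates are consistent with the rewards and successor values in every state, and it is positive for the uninformed estimate.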