Chapter 1 Numerical mathematics
For reasons of computational efficiency, essentially all numerical and statistical computations are performed on computers with finite precision arithmetic. The finite precision needs to be taken into account when designing algorithms because the numbers represented by a computer do not have all the same properties that real numbers have.
1.1 Floating point numbers
Most numerical and statistical algorithms and computations use floating point numbers. These are numbers of the form \[\begin{equation} x_{FP} = \pm c \cdot 2^q, \tag{1.1} \end{equation}\] where both the coefficient \(c\) and exponent \(q\) are integers chosen from a finite range of values. Floating point numbers are useful because they can easily represent values that are very small in absolute value (close to zero) and ones that are very large by using negative and positive exponents \(q\), respectively. The range of values is still finite and as a consequence there are real numbers \(x \neq 0\) for which the closest floating point number \(x_{FP}=0\), leading to what is known as underflow in computations. Conversely, computations with a result that is larger than the largest floating point number are said to overflow.
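To make the representation (1.1) concrete, the coefficient and exponent of a double precision value can be recovered with Python's math.frexp. The sketch below is illustrative only; the helper fp_decompose is not a standard function:

```python
import math

def fp_decompose(x):
    # Recover integers c and q with x = c * 2**q for a double precision x
    m, e = math.frexp(x)   # x = m * 2**e with 0.5 <= |m| < 1
    c = int(m * 2**53)     # scale the 53-bit significand to an integer
    q = e - 53
    return c, q

# 0.1 has no exact binary representation: it is stored as the
# nearest representable value of the form c * 2**q
c, q = fp_decompose(0.1)
print(c, q)
assert c * 2.0**q == 0.1
```

Printing c and q for 0.1 shows that the stored value is a large integer scaled by a power of two, not exactly 1/10.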
The floating point numbers used by all modern computers are defined by the IEEE Standard for Floating-Point Arithmetic (IEEE 754). The most commonly used formats are 64-bit double precision and 32-bit single precision numbers, although others, such as 16-bit half precision, are also occasionally used.
Table 1.1: Key parameters of common IEEE 754 floating point formats
Precision | Bits | \(c\) bits | \(q\) bits | Max value |
---|---|---|---|---|
Single | 32 | 24 | 8 | \(\approx 3.4 \cdot 10^{38}\) |
Double | 64 | 53 | 11 | \(\approx 1.8 \cdot 10^{308}\) |
Half | 16 | 11 | 5 | \(\approx 6.6 \cdot 10^4\) |
Python/NumPy allows exploring floating point type properties using np.finfo():
## Machine parameters for float32
## ---------------------------------------------------------------
## precision = 6 resolution = 1.0000000e-06
## machep = -23 eps = 1.1920929e-07
## negep = -24 epsneg = 5.9604645e-08
## minexp = -126 tiny = 1.1754944e-38
## maxexp = 128 max = 3.4028235e+38
## nexp = 8 min = -max
## ---------------------------------------------------------------
## Machine parameters for float64
## ---------------------------------------------------------------
## precision = 15 resolution = 1.0000000000000001e-15
## machep = -52 eps = 2.2204460492503131e-16
## negep = -53 epsneg = 1.1102230246251565e-16
## minexp = -1022 tiny = 2.2250738585072014e-308
## maxexp = 1024 max = 1.7976931348623157e+308
## nexp = 11 min = -max
## ---------------------------------------------------------------
The value eps returned above is especially interesting because it is the smallest number of the given type for which \(1 + eps \neq 1\).
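This property is easy to verify directly; a short sketch using np.finfo:

```python
import numpy as np

eps = np.finfo(np.float64).eps  # about 2.22e-16 for double precision

# Adding eps to 1 changes the result, but adding eps/2 is rounded away:
print(1.0 + eps != 1.0)       # True
print(1.0 + eps / 2 == 1.0)   # True
```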
1.1.1 Special values
In addition to numbers in the prescribed range, floating point numbers can represent the following special values:
- \(\pm\)Inf (\(\pm \infty\))
- NaN (not-a-number)
Infinity values appear when a computation overflows, e.g. np.exp(1000):
## exp(1000) = inf
##
## <string>:1: RuntimeWarning: overflow encountered in exp
Infinities obey the usual rules of arithmetic and can be compared as usual.
NaN is used to represent undefined values such as \(0/0\), \(0 \cdot \infty\), \(\infty - \infty\).
All arithmetic operations involving a NaN result in a NaN, and all comparisons with NaN (except \(\neq\)) evaluate to false. To test whether a number is NaN, you therefore need to use np.isnan().
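These rules can be checked interactively; a minimal sketch:

```python
import numpy as np

# Infinities follow the usual arithmetic and ordering rules:
print(np.inf + 1 == np.inf)   # True
print(-np.inf < 0 < np.inf)   # True

# Ill-defined combinations yield NaN:
print(np.inf - np.inf)        # nan

# NaN compares unequal to everything, including itself:
print(np.nan == np.nan)       # False
print(np.nan != np.nan)       # True
print(np.isnan(np.nan))       # True
```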
1.2 Some important properties of floating point numbers
Some mathematical properties of floating point numbers that differ from real numbers:
- Uneven distribution of numbers, more dense near zero
- Finite range of representable values
  - Example: the cumulative distribution function of the standard normal gives \(\Phi(10) \approx 1 - 7.619853 \cdot 10^{-24} =_F 1\)
  - Using \(\log p\) instead of \(p\) can help with probabilities
- Rounding errors break many standard rules of arithmetic
  - Equality comparison is non-trivial because of rounding: never use \(x == y\)
  - Avoid computing \(x - y\) when \(x \approx y\) (catastrophic cancellation), e.g. for \(x, y \gg 0\) or \(x, y \ll 0\)
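A couple of these failures are easy to reproduce; the sketch below demonstrates non-associativity, absorption of small terms, and tolerance-based comparison:

```python
import numpy as np

# Associativity fails under rounding:
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)            # False: 0.6000000000000001 vs 0.6
print(np.isclose(a, b))  # True: compare with a tolerance instead of ==

# Absorption: a small term can vanish entirely next to a large one
print((1e16 + 1.0) - 1e16)   # 0.0, not 1.0
```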
1.3 Computing with probabilities
As probabilities are non-negative but can be very close to 0, it is often numerically advantageous to store their logarithms \(\log p\) instead of the raw values \(p\).
1.3.1 Logsumexp
(Re-)normalisation of probabilities defined on the log scale requires converting back to raw probabilities with exp(), which can cause an overflow or an underflow. This can be avoided using the so-called logsumexp trick.
Let us consider a vector \(\boldsymbol{\pi} = (\pi_i) = (\log p_i)\), where \(p_i > 0\), \(i= 1,\dots,n\). In order to compute \[\log S = \log \sum_{i = 1}^n p_i,\] we need to evaluate \[\log S = \log \sum_{i=1}^n \exp(\pi_i).\] However, the computation of \(\exp(\pi_i)\) can overflow even if the final \(\log S\) is not that large. The overflow can be avoided by finding \[M = \max_{i \in \{1, \dots, n\}} \pi_i\] and computing \[\begin{equation} \log S = M + \log \sum_{i=1}^n \exp(\pi_i - M), \tag{1.2} \end{equation}\] which is guaranteed to be well-behaved because \(\max_{i \in \{1, \dots, n\}} (\pi_i - M) = 0\). The same trick avoids potential underflows when the \(\pi_i\) are small, because at least one of the \(\pi_i - M\) is guaranteed not to be small.
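Eq. (1.2) translates directly into a few lines of NumPy. The sketch below is illustrative; scipy.special.logsumexp offers a production implementation:

```python
import numpy as np

def logsumexp(pi):
    # Stable log(sum(exp(pi))) via the shift of Eq. (1.2)
    pi = np.asarray(pi, dtype=float)
    M = np.max(pi)
    return M + np.log(np.sum(np.exp(pi - M)))

# Naive np.log(np.sum(np.exp(pi))) would underflow to log(0) = -inf here:
pi = np.array([-1000.0, -1000.0])
print(logsumexp(pi))   # -1000 + log(2), about -999.307
```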
1.3.2 Special functions
Mathematical libraries in various programming environments contain a number of special functions that are specifically designed for computing challenging expressions in a numerically stable manner by avoiding overflows and underflows. Some examples of such functions are listed in Table 1.2 (here scs refers to the scipy.special module).
Table 1.2: Examples of special functions for numerically stable computations

Name | Purpose |
---|---|
np.log1p | Computes \(\log(1 + x)\), useful for \(x \approx 0\) |
np.expm1 | Computes \(\exp(x) - 1\), useful for \(x \approx 0\) |
scs.gammaln | Computes \(\log \Gamma(x)\) |
scs.erf | Computes \(\operatorname{erf}(x) = 2/\sqrt{\pi} \int_{0}^x e^{-t^2} \,\mathrm{d} t\) (the cumulative distribution function of \(\mathcal{N}(0, 1)\) is \(\Phi(x) = 1/2[1 + \operatorname{erf}(x/\sqrt{2})]\)) |
scs.erfc | Computes \(1 - \operatorname{erf}(x)\), useful for \(x \gg 0\) |
scs.erfcx | Computes \(\exp(x^2) (1-\operatorname{erf}(x))\), useful for \(x \gg 0\) |
Examples:
## 1e-99
## 0.0
## 1e-20
## 0.0
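The outputs above match the behaviour of np.expm1 and np.log1p near zero; the calls below are consistent with them, but are an assumed reconstruction since the original code was not preserved:

```python
import numpy as np

# exp(x) - 1 computed naively loses everything for tiny x; expm1 does not:
print(np.expm1(1e-99))       # 1e-99
print(np.exp(1e-99) - 1.0)   # 0.0

# The same for log(1 + x) vs log1p:
print(np.log1p(1e-20))       # 1e-20
print(np.log(1.0 + 1e-20))   # 0.0
```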
1.4 Reporting extreme values computed using logarithms
Most people are used to working with base-10 numbers. When using logarithms to represent extreme values, it is sometimes non-trivial to print them as \(x \cdot 10^y\).
Assuming \(z = \log_{10}(p)\), we have for all \(y\), \[ p = 10^z = 10^{z - y} \cdot 10^y. \] By choosing \(y = \lfloor z \rfloor\), we have \[ 1 \le 10^{z - y} < 10. \] Numbers computed in different logarithmic bases can be converted as usual using \[ \log_{10}(p) = \frac{\log_b(p)}{\log_b(10)}. \]
import numpy as np

def pretty_print_log10(logp):
    # Print base-10 logarithm value as x*10^y
    # For all y, we have
    #   p = 10^logp = 10^y * 10^(logp - y)
    # By choosing y to be the largest integer not greater than logp,
    # we have 1 <= x < 10
    y = np.floor(logp)
    x = 10**(logp - y)
    print(x, 'e', int(y), sep='')  # int(), as np.int has been removed from NumPy
print("Computed using floating point values and logarithms")
## Computed using floating point values and logarithms
## 1.2676506002282257e30
## Computed using Python's arbitrary precision integer arithmetic
## 1267650600228229401496703205376
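The comparison above can be reproduced with a short driver. The exact call was not preserved in the text, but the exact integer output identifies the value as \(2^{100}\), so the following is a plausible reconstruction:

```python
import numpy as np

def pretty_print_log10(logp):
    # Print a base-10 logarithm value as x*10^y with 1 <= x < 10
    y = np.floor(logp)
    x = 10**(logp - y)
    print(x, 'e', int(y), sep='')

print("Computed using floating point values and logarithms")
pretty_print_log10(100 * np.log10(2.0))   # log10(2**100) via logarithms

print("Computed using Python's arbitrary precision integer arithmetic")
print(2**100)
```

The small discrepancy between the two printed values is exactly the accumulated rounding error of the logarithmic route.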
1.5 Best practices and hints
- Be aware of your data types and their ranges, especially with GPUs and other special hardware
- Compute and store values in a way that maintains precision (\(p\) vs. \(1 - p\), \(\log p\), renormalising things as necessary)
- Order of evaluation does matter!
- You should not do equality comparisons for floating point numbers because of possible rounding errors
  - Python has np.isclose() to test approximate equality
  - Equality tests for special values should be done using the special functions np.isfinite(), np.isinf(), np.isnan()
- NaN values can be annoying, because any operation with NaN yields a NaN and thus a single NaN value often propagates to all values in a large computation.
  - It can be useful to use np.isnan() to check the output of a suspicious function
1.6 Literature
Gentle (2009) contains an introduction to basic challenges raised by floating point representation.
1.7 Exercises
- Write a program to increment x = 0.0 by 0.1 100 times. Compute x - 10. How do you interpret the result?
- Verify (prove) that the logsumexp trick of Eq. (1.2) gives the correct result for \(\log S\).
- Find examples of floating point computations that break standard rules of arithmetic.
- For \(x \sim \mathcal{N}(0, 1)\), compute \(\log p(28 < x < 30)\).
- Check if the following properties of real numbers hold for floating point numbers and provide examples of how the invalid properties fail.
  - \(\forall x < y, \exists z: x < z < y\)
  - \(x, y \in \mathbb{R} \Rightarrow x + y \in \mathbb{R}\)
  - \(x, y \in \mathbb{R} \Rightarrow x y \in \mathbb{R}\)
  - There exists a unique \(a = 0\) such that \(x + a = a + x = x\)
  - \(x, y, z \in \mathbb{R} \Rightarrow (x + y) + z = x + (y + z)\)
  - \(x, y, z \in \mathbb{R} \Rightarrow (x y) z = x (y z)\)
  - \(x, y, z \in \mathbb{R} \Rightarrow x (y + z) = xy + xz\)
References
Gentle, James E. 2009. Computational Statistics. 1st ed. New York, NY: Springer.