This note documents how the sample() function has changed since R 3.6.0, and how to reproduce its previous behaviour.
The issue that used to affect the pseudo-random number generator (PRNG) at the core of the function is documented in a note by Kellie Ottoboni and Philip B. Stark, “Random problems with R,” which was extensively discussed on the R-devel mailing list.
The note explains that (part of) the PRNG used by R 3.5.1 does not correct for the uneven spacing of binary floating-point numbers. The resulting quantization error biases the selection probabilities, severely enough for the PRNG not to qualify as sufficiently pseudo-random.
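To make the bias tangible, here is a minimal sketch, assuming R 3.6.0 or later, where the old sampler is still available as sample.kind = "Rounding". The population size m, the seed and the helper share_even() are illustrative choices of mine, not values taken from the note cited above. With that m, the old sampler should return even and odd numbers in visibly unequal proportions, whereas the new one should stay close to one half.

```r
# Illustrative population size (an assumption): 2/5 of 2^32
m <- (2 / 5) * 2^32

# Share of even numbers among one million draws, for a given sampler
share_even <- function(kind) {
  # "Rounding" triggers a warning about non-uniformity, silenced here
  suppressWarnings(set.seed(123, sample.kind = kind))
  x <- sample(m, 1e6, replace = TRUE)
  mean(x %% 2 == 0)
}

old_share <- share_even("Rounding")   # pre-3.6.0 sampler
new_share <- share_even("Rejection")  # sampler used since R 3.6.0

# Leave the session on the current default sampler
RNGkind(sample.kind = "Rejection")

print(c(rounding = old_share, rejection = new_share))
```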
The issue was patched in R 3.6.0, which also introduced a way to reproduce the former behaviour of the PRNG. The method is well documented on R blogs like Revolution Analytics or J. Kenneth Tay's Statistical Odds & Ends, and consists in adjusting the RNGkind option (its sample.kind argument, set to "Rounding") before calling the sample() function.
This is of course only useful if one depends on a particular random number generator and seed number, as set through set.seed, to reproduce the behaviour and results of R code written and executed before R 3.6.0.
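For future reference, the method can be sketched as follows, assuming R 3.6.0 or later (the sample.kind argument does not exist in earlier versions). The seed and the vector being sampled are arbitrary placeholders.

```r
# Default behaviour since R 3.6.0: the "Rejection" sampler
set.seed(42)
new_draw <- sample(1:100, 5)

# Pre-3.6.0 behaviour: switch to the "Rounding" sampler first.
# R warns that this sampler is non-uniform, hence suppressWarnings().
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(42)
old_draw <- sample(1:100, 5)

# Shortcut: set.seed() accepts the same sample.kind argument
suppressWarnings(set.seed(42, sample.kind = "Rounding"))
old_draw_again <- sample(1:100, 5)

# Switch back to the current default once done
RNGkind(sample.kind = "Rejection")
```

Both calls change the sampler for the rest of the session, so switching back to "Rejection" afterwards avoids carrying the biased sampler into unrelated code.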
This note is obviously of the ‘note to self’ kind. The previous one in that category (successfully) aimed at reminding me to use the RDS format.
To change the local state of the PRNG without affecting its global state, see this note by Evgeni Chasnovski.
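As a rough base R sketch of that idea, and not necessarily the approach taken in the linked note, one can save the global .Random.seed, set a seed, evaluate some code, and restore the saved state on exit. The helper name with_local_seed() is hypothetical.

```r
# Hypothetical helper: evaluate expr under a given seed, then restore
# the global PRNG state to what it was before the call
with_local_seed <- function(seed, expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  if (has_seed) {
    old_seed <- get(".Random.seed", envir = globalenv())
  }
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else {
      rm(".Random.seed", envir = globalenv())
    }
  })
  set.seed(seed)
  force(expr)  # expr is evaluated lazily, i.e. only after set.seed()
}

set.seed(1)
before <- .Random.seed
x <- with_local_seed(2019, sample(10))
y <- with_local_seed(2019, sample(10))
unchanged <- identical(before, .Random.seed)
```

Here x and y are identical draws, and the global state is the same before and after the calls. The withr package offers a with_seed() function along similar lines.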
Thanks to R Weekly for mentioning this note.
- First published on August 7th, 2019