Pseudo-random number generators: best practices
1 Implicitly rely on the default random number generator for ease and convenience
In workflows where replicability is not critical or immediately relevant (exploratory, pilot, scaffolding runs, etc.) you can rely on the default random generator being used in the background without explicitly managing it.
using Random
rand(Int, 10)
rand(['A', 'C', 'G', 'T'])
2 Explicitly set the default random number generator seed for replicability
In most cases, however, replicability is important.
As discussed in the primer on pseudo-random number generators, the random number “seed” value passed to the algorithm determines the value of every number generated from the algorithm: in a sense, it indexes or selects a specific sequence of pre-determined random outcomes to be used.
Different seed values produce different sequences of random numbers, while the same seed value reliably produces the same sequence across independent runs of the program.
This reproducibility is what gives PRNGs their great utility in many research software applications. It is also useful in programming and software development for debugging: you can only be sure of having fixed an error if you can reliably reproduce it.
You can keep using the default random number generator implicitly but gain replicability by explicitly setting its seed with the Random.seed! function.
using Random
# Explicitly seed the default random number generator with 42
Random.seed!(42)
# Implicit usage of the global default random number generator, seeded above
rand(Int, 10)
rand(['A', 'C', 'G', 'T'])
As before, when rand is called without a random number generator object as its first argument, the default random number generator is used as the source of randomness. Here, however, we have set its seed beforehand.
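The effect of seeding can be checked directly. The following is a minimal sketch, assuming only the standard Random library; the variable names are illustrative and not part of any API.

```julia
using Random

# Seed the default generator, then draw a few values
Random.seed!(42)
first_run = rand(Int, 5)

# Re-seeding with the same value restarts the same sequence
Random.seed!(42)
second_run = rand(Int, 5)

# A different seed selects a different sequence
Random.seed!(43)
third_run = rand(Int, 5)

@show first_run == second_run   # same seed: the draws match
@show first_run == third_run    # different seed: a match would be an astronomically unlikely coincidence
```

Note that the Julia documentation does not guarantee the same stream of numbers across Julia versions, so recording the Julia version alongside the seed is a sensible precaution.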
3 Explicitly reference the default random number generator for maintainability and extensibility
While implicit usage of the default random number generator is convenient, it is good practice to always be explicit about the random number generator being used, even if it is the default.
using Random
rng = Random.default_rng()
Random.seed!(rng, 42)
rand(rng, Int, 10)
rand(rng, ['A', 'C', 'G', 'T'])
Beyond clarity, it also makes it easy to use other random number generators if the need arises.
Furthermore, when writing our own methods for rand, the general principle is that any function doing random sampling should ideally accept an rng::AbstractRNG argument with a default of Random.default_rng(), following the convention used throughout Base and most Julia packages.
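One way this convention might look in practice is sketched below. The function name random_dna is hypothetical and introduced only for illustration. The generator is taken as the first positional argument, mirroring rand(rng, ...), and a second method falls back to the default generator.

```julia
using Random

# Hypothetical helper: draw a random DNA sequence of length `n` as a String.
# The generator comes first, mirroring the argument order of `rand(rng, ...)`.
function random_dna(rng::AbstractRNG, n::Integer)
    return join(rand(rng, ['A', 'C', 'G', 'T'], n))
end

# Convenience method: fall back to the default generator when none is given
random_dna(n::Integer) = random_dna(Random.default_rng(), n)

# Two independently constructed generators with the same seed give the same result
seq_a = random_dna(Xoshiro(1), 20)
seq_b = random_dna(Xoshiro(1), 20)
@show seq_a == seq_b

# Callers who do not care about the generator can omit it
seq_c = random_dna(20)
```

Because the generator is an ordinary argument, swapping in a different one (for example MersenneTwister) requires no change to the function body.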
4 Ensure randomness across replicates by using the time and process ID as seeds
While replicability is important, in almost every case other than reproducing a specific run for debugging or study, we want different independent runs of a program to use different random sequences.
A common convention for obtaining a new, explicitly specified random seed on every independent run, without changing the code or forcing the user to type one in, is to use the system timer value in nanoseconds, time_ns(), sometimes with the process ID, getpid(), added to avoid correlations between runs launched in bulk in parallel.
using Random
rng = Random.default_rng()
seed = time_ns() + getpid()
@info "Setting random seed: $(seed)"
Random.seed!(rng, seed)
5 Explicitly create and manage your own random number generators for robust reproducibility and replicability
In full, production-grade software, you should design all your programs to (1) optionally take a user-specified random number seed and, if none is given, generate one; (2) report and log the random number seed used; and (3) instantiate your own random number generator object and use it in all computation, passing it to the functions or objects that need it.
using Random
# Instantiate a PRNG using the "Xoshiro" algorithm with the seed value of 42
rng = Random.Xoshiro(42)
# Alternatively, instantiate a PRNG using the "Mersenne Twister" algorithm with the seed value of 1999
rng = Random.MersenneTwister(1999)
# Use `rand` with the `rng` to sample 10 `Int` (integer) values
rand(rng, Int, 10)
# Use `rand` with the `rng` to sample a random value from the list
rand(rng, ['A', 'C', 'G', 'T'])
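The three recommendations above could be combined along the following lines. This is a minimal sketch rather than a definitive implementation: resolve_seed, simulate, and run_analysis are hypothetical names introduced only for illustration, and Xoshiro is just one possible choice of generator.

```julia
using Random

# Hypothetical helper: use the caller's seed if given,
# otherwise derive one from the timer and the process ID
function resolve_seed(user_seed::Union{Nothing,Integer}=nothing)
    return user_seed === nothing ? time_ns() + getpid() : user_seed
end

# Hypothetical computation step: receives its generator explicitly
simulate(rng::AbstractRNG, n::Integer) = rand(rng, ['A', 'C', 'G', 'T'], n)

# Hypothetical entry point covering all three recommendations
function run_analysis(; seed::Union{Nothing,Integer}=nothing, n::Integer=10)
    actual_seed = resolve_seed(seed)                 # (1) optional user seed
    @info "Using random seed: $(actual_seed)"        # (2) report the seed used
    rng = Xoshiro(actual_seed)                       # (3) private generator object
    return simulate(rng, n)                          #     passed to whatever needs it
end

# A fresh seed on each call unless one is supplied
fresh = run_analysis()

# Supplying the logged seed reproduces a run
replay_a = run_analysis(seed=42)
replay_b = run_analysis(seed=42)
@show replay_a == replay_b
```

Keeping the generator local to run_analysis means that other code drawing from the default generator does not disturb the sequence used by the analysis.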