Pseudo-random number generators: best practices

Author

Jeet Sukumaran

1 Implicitly rely on the default random number generator for ease and convenience

In workflows where replicability is not critical or immediately relevant (exploratory, pilot, or scaffolding runs, etc.), you can rely on the default random number generator working in the background without explicitly managing it.

using Random

rand(Int, 10)
rand(['A', 'C', 'G', 'T'])

2 Explicitly set the default random number generator seed for replicability

In most cases, however, replicability is important.

As discussed in the primer on pseudo-random number generators, the random number “seed” value passed to the algorithm determines the value of every number generated from the algorithm: in a sense, it indexes or selects a specific sequence of pre-determined random outcomes to be used.

Different seed values produce different sequences of random numbers. Conversely, the same seed value repeatedly and reliably produces the same sequence across independent runs of the program.

This second property is what gives PRNGs their great utility in many research software applications. It is also useful in programming and software development for debugging purposes: you can only be sure of having fixed an error if you can reliably reproduce it.

You can use the default random number generator implicitly but gain replicability by explicitly setting its seed using the Random.seed! function.

using Random

# Explicitly seed the default random number generator with 42
Random.seed!(42)

# Implicit usage of the global default random number generator, seeded above
rand(Int, 10)
rand(['A', 'C', 'G', 'T'])

As before, when rand is called without a random number generator object as its first argument, the default random number generator is used as the source of randomness. Here, however, we have set its seed beforehand.
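As a quick check of this behavior, the following minimal sketch reseeds the default generator and compares the resulting draws. The variable names `first_draw`, `second_draw`, and `third_draw` are illustrative only.

```julia
using Random

# Seed the default generator, then draw five integers
Random.seed!(42)
first_draw = rand(Int, 5)

# Reseeding with the same value restarts the same sequence
Random.seed!(42)
second_draw = rand(Int, 5)

# Reseeding with a different value selects a different sequence
Random.seed!(43)
third_draw = rand(Int, 5)

println(first_draw == second_draw)  # true: same seed, same sequence
println(first_draw == third_draw)   # expected to be false: different seed
```

Note that the exact values drawn for a given seed may differ between Julia versions, so replicability in this sense assumes the same Julia version is used.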

3 Explicitly reference the default random number generator for maintainability and extensibility

While implicit usage of the default random number generator is convenient, it is good practice to always be explicit about which random number generator is being used, even if it is the default.

using Random

rng = Random.default_rng()
Random.seed!(rng, 42)
rand(rng, Int, 10)
rand(rng, ['A', 'C', 'G', 'T'])

Beyond clarity, it also makes it easy to use other random number generators if the need arises.

Furthermore, when writing our own sampling functions or methods for rand, the general principle is that any function doing random sampling should accept an rng::AbstractRNG argument that defaults to Random.default_rng(), following the convention used throughout Base and most Julia packages.
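A minimal sketch of this convention follows. The function `sample_bases` and its arguments are hypothetical names introduced for illustration; only `rand`, `AbstractRNG`, `Random.default_rng()`, and `Xoshiro` come from Julia itself.

```julia
using Random

# Hypothetical helper: draws `n` nucleotide characters.
# The generator is the first positional argument, as in `rand` itself.
function sample_bases(rng::AbstractRNG, n::Integer)
    return rand(rng, ['A', 'C', 'G', 'T'], n)
end

# Convenience method: fall back to the default generator when none is given
sample_bases(n::Integer) = sample_bases(Random.default_rng(), n)

# Callers can rely on the default generator ...
sample_bases(5)

# ... or pass their own seeded generator for a replicable result
a = sample_bases(Xoshiro(42), 5)
b = sample_bases(Xoshiro(42), 5)
println(a == b)  # true: identically seeded generators give the same draws
```

Writing the default as a separate one-line method keeps the generator in the first position, which Julia's optional-argument syntax would not otherwise allow.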

4 Ensure randomness across replicates by using time and process IDs as seeds

Replicability is important, but in almost every case other than reproducing a particular run for debugging or study, we want different independent runs of a program to use different random sequences.

A common convention for obtaining a new, explicitly specified random seed on every independent run, without changing the code or forcing the user to type one in, is to use the system clock in nanoseconds, time_ns(). The process ID, getpid(), is sometimes added to avoid correlations when many runs are launched in parallel.

using Random

rng = Random.default_rng()
seed = time_ns() + getpid()
@info "Setting random seed: $(seed)"
Random.seed!(rng, seed)

5 Explicitly create and manage your own random number generators for robust reproducibility and replicability

In full, production-grade software, you should design your programs to (1) optionally take a user-specified random number seed and, if none is given, generate one; (2) report and log the random number seed used; and (3) instantiate and use your own random number generator object in all computation, passing it to the functions or objects that need it.

using Random

# Instantiate a PRNG using the "Xoshiro" algorithm with the seed value of 42
rng = Random.Xoshiro(42)
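# Alternatively, instantiate a PRNG using the "Mersenne Twister" algorithm with the seed value of 1999
# (this reassignment replaces the Xoshiro generator created above)
rng = Random.MersenneTwister(1999)

# Use `rand` with the `rng` to sample a vector of 10 `Int` (integer) values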
rand(rng, Int, 10)
# Use `rand` with the `rng` to sample a random value from the list
rand(rng, ['A', 'C', 'G', 'T'])
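
The three design points above might be combined as in the following sketch. The names `make_rng`, `simulate`, `n_trials`, and `p` are hypothetical and not part of any library; the choice of `Xoshiro` and the time-plus-process-ID fallback seed are assumptions carried over from the earlier sections.

```julia
using Random

# Hypothetical helper: build a generator from an optional user-supplied seed.
function make_rng(seed::Union{Integer, Nothing} = nothing)
    # (1) Fall back to a time- and process-based seed when none is given
    actual_seed = seed === nothing ? time_ns() + getpid() : seed
    # (2) Report the seed so the run can be repeated later
    @info "Using random seed: $(actual_seed)"
    # (3) Create a dedicated generator rather than relying on the global one
    return Xoshiro(actual_seed), actual_seed
end

# Hypothetical computation that receives the generator explicitly:
# counts successes in `n_trials` Bernoulli trials with probability `p`
function simulate(rng::AbstractRNG, n_trials::Integer, p::Real)
    return count(_ -> rand(rng) < p, 1:n_trials)
end

rng, seed = make_rng(42)
result = simulate(rng, 100, 0.5)

# Rebuilding the generator from the logged seed repeats the result
rng_again, _ = make_rng(seed)
println(simulate(rng_again, 100, 0.5) == result)  # true
```

Calling `make_rng()` with no argument takes the fallback branch, so each run logs a fresh seed that can later be passed back in to replay that run.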
  • © Jeet Sukumaran

Please share or adapt under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).