Basics of working with randomness and probabilities

Author

Jeet Sukumaran

1 Random number generation under uniform probability distributions with `rand`

Proficiencies

Sampling a single random value of a specified numeric type (e.g., Int, Float64) using rand.
Sampling multiple random values of the specified type and length using rand.
Sampling uniformly from $[0, 1)$ using rand()
How to sample a vector of independent uniform random Float64 values in $[0, 1)$ by passing a length to rand.
How to sample a matrix of independent uniform random Float64 values in $[0, 1)$ by passing row and column dimensions to rand.
How to sample uniformly at random from a defined integer range using rand with a range argument (e.g., rand(-500:500, n)).
How to sample uniformly at random from an explicit collection of values (e.g., an array of floats or characters) using rand.

1.1 The `rand` function

Functions vs methods in Julia

In Julia, a function is an operation that accepts inputs and returns outputs. A function of a particular name (e.g. range, rand, my_custom_function) may have a number of different methods, or implementations, that are distinguished by the pattern of arguments they take (called the signature of the method).

See: Julia: Functions, methods, and signatures

The rand function is the function we will most often use to generate random variates of different shapes, sizes, and characteristics. It has a number of different methods, that is, different approaches for generating random values of various data types, structures, and ranges, depending on the arguments passed to it.
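As a minimal sketch of the function/method distinction, the hypothetical function describe below (it is not part of Julia or of these notes) is given two methods that are selected by the type of the argument. The same mechanism is what lets rand behave differently for rand(Int) and rand(Float64).

```julia
# `describe` is a hypothetical function used only for illustration.
# Each definition below adds one method to the same function.
describe(x::Int) = "an integer"
describe(x::Float64) = "a floating-point number"

describe(3)      # dispatches to the Int method
describe(3.0)    # dispatches to the Float64 method

# The built-in rand function likewise has many methods;
# the type argument selects which kind of value is generated.
rand(Int) isa Int
rand(Float64) isa Float64
```

Calling methods(describe) or methods(rand) lists the methods currently defined for a function.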

1.2 Uniform probability distributions

The core Julia library provides rand methods that generate values by sampling them from ranges and collections under a uniform probability distribution. Packages such as Distributions.jl, covered separately, provide the ability to sample random values from a broader range of distributions.

Probability distributions

1.3 Methods of the `rand` function for generating samples of uniformly distributed values

1.3.1 Generating a value of a particular type sampled with uniform probability from its entire range

Sampling a random integer between the smallest and largest representable Int values, with uniform probability:

rand(Int)

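Sampling a random real (floating-point) value with uniform probability. Note that for floating-point types such as Float64, rand does not sample across the full representable range of the type; it samples from the half-open interval $[0, 1)$: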

rand(Float64)

0.6919429112861065

Sampling a random character value from across its possible values with uniform random probability:

rand(Char)

'\Uc433e': Unicode U+C433E (category Cn: Other, not assigned)
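The following sketch checks the ranges involved, as a sanity test rather than a proof. The variable names and the use of UInt8 are illustrative choices, not part of the original notes.

```julia
# Integer types: samples span the whole representable range of the type.
i = rand(Int)
typemin(Int) <= i <= typemax(Int)   # always true

# Floating-point types: samples lie in the half-open interval [0, 1),
# not across the whole representable range of Float64.
xs = rand(Float64, 10_000)
all(x -> 0.0 <= x < 1.0, xs)        # true

# A small integer type makes the "whole range" behavior easy to see.
bytes = rand(UInt8, 10_000)
extrema(bytes)                      # very likely (0x00, 0xff)
```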

1.3.2 Generating collections of values sampled independently with uniform probability from their entire ranges

A vector of 10 i.i.d. (independent and identically distributed) uniformly distributed integers:

rand(Int, 10)

10-element Vector{Int64}:
  2466386225610535120
 -8793489409307919264
  4882515724999170443
 -6784324233252291528
  -780864292374854054
  4202870063366858479
 -1059321598618763210
  3516560114394028368
    24067213464731782
   849760427981797079

A vector of 10 i.i.d. (independent and identically distributed) uniformly distributed 64-bit floating-point values in $[0, 1)$:

rand(Float64, 10)

10-element Vector{Float64}:
 0.4035560602394148
 0.36036670468985665
 0.6642908613841859
 0.4510301344026574
 0.06182975795347512
 0.503575918475182
 0.4877831837365839
 0.029682007600483562
 0.05208337113672257
 0.8332325177147346

1.3.3 Generating a matrix of random values sampled independently with uniform probability from their entire ranges

Multidimensional collections of values can be generated with the syntax rand(Type, n, m), where n is the number of rows and m the number of columns.

Sampling a $2 \times 3$ matrix of random Int values:

rand(Int, 2, 3)

2×3 Matrix{Int64}:
 5423624950225502462   4645961223961001037    95236877055533749
 5447466455513695867  -7323070809988580107  8498790813425923297

Sampling a $2 \times 3$ matrix of random Float64 values:

rand(Float64, 2, 3)

2×3 Matrix{Float64}:
 0.27974   0.708834  0.916697
 0.900955  0.80065   0.238447
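A short sketch confirming how the dimension arguments map onto the shape of the result. The names A and B are illustrative; the three-dimensional case simply passes one more dimension argument.

```julia
A = rand(Int, 2, 3)       # 2 rows, 3 columns
size(A)                   # (2, 3)
eltype(A)                 # Int64 on a 64-bit system

# More dimension arguments give higher-dimensional arrays.
B = rand(Float64, 2, 3, 4)
size(B)                   # (2, 3, 4)
ndims(B)                  # 3
```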

1.3.4 Sampling with uniform probability over given ranges

We can use methods of the rand function that take range objects to constrain the values to particular ranges. Instead of a type (Int, Float64, and so on), we can pass the rand function a range object to sample with uniform probability from the intervals or collections of values represented by the range object.

Sampling 10 integer values from the closed interval [-5, 5] with uniform random probability:

rand(-5:5, 10)

10-element Vector{Int64}:
 -2
  4
 -1
 -2
 -2
 -5
  1
 -1
  4
  2

Sampling 10 real values from the closed interval [-5.0, 5.0] (binned into 0.1 units) with uniform random probability:

rand(-5.0:0.1:5.0, 10)

10-element Vector{Float64}:
  3.5
  3.1
  4.6
  0.6
 -4.1
 -2.0
 -1.6
  4.5
  0.9
  0.5
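Sampling from a stepped range only ever returns the grid points of that range (here, multiples of 0.1). One common alternative, sketched below and not used in the original notes, is to rescale the output of rand so that the values vary continuously over the interval. The names a, b, n, binned, and continuous are illustrative.

```julia
a, b = -5.0, 5.0
n = 10

# Grid-based sampling: only the values -5.0, -4.9, ..., 5.0 can occur.
binned = rand(a:0.1:b, n)

# Rescaling: rand(n) is uniform on [0, 1), so a .+ (b - a) .* rand(n)
# is uniform on [a, b), up to floating-point resolution.
continuous = a .+ (b - a) .* rand(n)
```

The grid-based form includes the upper endpoint 5.0 as a possible value, while the rescaled form is half-open at the top.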

1.3.5 Sampling with uniform probability (with replacement) over given collections of values

rand([0.1, 0.2, 0.3, 0.4], 2)

2-element Vector{Float64}:
 0.3
 0.4

rand(['A', 'C', 'G', 'T'], 3)

3-element Vector{Char}:
 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
 'T': ASCII/Unicode U+0054 (category Lu: Letter, uppercase)
 'G': ASCII/Unicode U+0047 (category Lu: Letter, uppercase)
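To see what "with replacement" means in practice, the sketch below draws more samples than there are distinct values in the collection, so repeats are unavoidable. The names bases and draws are illustrative.

```julia
bases = ['A', 'C', 'G', 'T']

# Draw more samples than there are distinct values.
draws = rand(bases, 10)

# Because sampling is with replacement, 10 draws can contain
# at most 4 distinct values, so some values must repeat.
length(unique(draws)) <= length(bases)   # true

# Every draw is one of the original values.
all(in(bases), draws)                    # true
```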

rand(["Frodo", "Gandalf", "Eowyn"], 4)

4-element Vector{String}:
 "Frodo"
 "Gandalf"
 "Frodo"
 "Frodo"

1.3.6 Sampling from ranges vs. sampling from collections

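Note that the following two calls are equivalent: both sample from the same set of values, $\{1, 2, 3, 4\}$, with the same uniform probability. In the first, we pass a range object (1:4 is short-hand for range(1, 4)); in the second, we pass an explicit collection of values.

## Sample 10 values from the range `1:4`
rand(1:4, 10)
## Sample 10 values from the collection of values `[1, 2, 3, 4]`
rand([1, 2, 3, 4], 10)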
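A small sketch of the relationship between the two forms. The variable names are illustrative.

```julia
r = 1:4                 # a range object (UnitRange)
v = [1, 2, 3, 4]        # an explicit collection (Vector)

collect(r) == v         # true: both describe the same values

from_range = rand(r, 10)
from_vector = rand(v, 10)

# Both results are vectors of Int drawn from {1, 2, 3, 4};
# the individual values differ from call to call.
all(in(1:4), from_range) && all(in(1:4), from_vector)   # true
```

A range stores only its endpoints (and step) rather than every value, so it is usually the more economical choice when the values form a regular sequence, such as -500:500.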

1.3.7 Sampling random continuous values from $[0, 1)$ with uniform probability

Sample a single random Float64 uniformly from $[0, 1)$.

rand()

Sample a vector of 5 independent uniform random values in $[0, 1)$.

rand(5)

Sample a $2 \times 3$ matrix of independent uniform random values in $[0, 1)$.

julia> rand(2, 3)
2×3 Matrix{Float64}:
 0.731902  0.184771  0.992814
 0.442018  0.661203  0.107552
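When no type is given, rand defaults to Float64, so rand(5) is shorthand for rand(Float64, 5). The sketch below illustrates this by resetting the random number generator to the same state before each call. Seeding with Random.seed! comes from the Random standard library and is not covered in this section; the seed value 1234 is arbitrary.

```julia
using Random   # standard library; provides Random.seed!

Random.seed!(1234)      # arbitrary seed, chosen only for illustration
a = rand(5)

Random.seed!(1234)      # reset the generator to the same state
b = rand(Float64, 5)

a == b                        # true: rand(5) is shorthand for rand(Float64, 5)
all(x -> 0.0 <= x < 1.0, a)   # true: every value lies in [0, 1)
```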

1.4 Notation for the uniform distribution

A uniform distribution between values $a$ and $b$, with support on the continuous values in the real interval $[a, b]$, is denoted

\[ \mathcal{U}(a, b). \]

A uniform distribution between values $a$ and $b$, with support on the finite set of integers in the interval $[a, b]$, is denoted

\[ \mathcal{U}(a, \dots, b). \]
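For reference, these two notations correspond to the standard probability density and probability mass functions below. The symbols $X$, $x$, and $k$ are introduced here for illustration and do not appear elsewhere in these notes.

\[ X \sim \mathcal{U}(a, b): \quad f(x) = \frac{1}{b - a} \ \text{for } a \le x \le b, \ \text{and } 0 \text{ otherwise}. \]

\[ X \sim \mathcal{U}(a, \dots, b): \quad P(X = k) = \frac{1}{b - a + 1} \ \text{for } k \in \{a, a + 1, \dots, b\}. \]

For example, a fair six-sided die is $\mathcal{U}(1, \dots, 6)$, so each face has probability $\frac{1}{6 - 1 + 1} = \frac{1}{6}$.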

1.5 Histogram visualization

The hist function, provided by the Makie plotting ecosystem (loaded here through CairoMakie), provides a variety of methods to visualize data as histograms.

using CairoMakie
data = rand(Float64, 1000)
hist(data)
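To make concrete what a histogram displays, the sketch below tallies the same kind of data into 10 equal-width bins by hand, using only base Julia. The names n_bins and counts are illustrative; hist performs its own binning internally and may choose the bin edges differently.

```julia
data = rand(Float64, 1000)
n_bins = 10
counts = zeros(Int, n_bins)

for x in data
    # x lies in [0, 1), so floor(x * n_bins) is in 0:(n_bins - 1).
    # The min(...) is a defensive clamp against floating-point rounding.
    bin = min(floor(Int, x * n_bins) + 1, n_bins)
    counts[bin] += 1
end

counts        # roughly 100 per bin for uniformly distributed data
sum(counts)   # 1000: every sample falls into exactly one bin
```

A histogram of uniform samples looks flat because each bin receives approximately the same count.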

1.6 Exercises

1.6.1 Visualizing uniform distributions

Exercise 1

Generate $N = 1000$ samples from each of the following distributions, and plot their histograms.

$\mathcal{U}(-10, 10)$
$\mathcal{U}(-10, \dots, 10)$

Advanced

Write a function, plot_uniform, that takes three integer (Int) arguments, range_start, range_end, and n_samples, draws n_samples values uniformly from the integers between range_start and range_end, and plots a histogram of the results.
Using this, produce plots for $N \in \{1 \times 10^4, 1 \times 10^5, \dots\}$ to see how the histograms appear more and more uniform as the sample size increases.

1.6.2 Visualizing mathematical models by sampling points

Exercise 2

We previously used range to generate a regular grid of values for one set of coordinates:

α = 0.5
z_vals = range(0.0, 1/α; length = 400)

0.0:0.005012531328320802:2.0

The range function returns a finite sequence of values, with different methods giving sequences of different data types, bounds, and intervals, depending on the input arguments. Here, range(0.0, 1/α; length = 400) is a method of the range function that returns a sequence of 400 evenly spaced values between $0.0$ and $\frac{1}{\alpha}$.

length = 400 represents the number of samples/points in the function call range(…, length = 400). This forms a “grid” (a collection of regular, systematic samples) on the x-axis. The y-values come from passing each x-value to the function to get the result. Instead of a systematic sample, we can randomly sample from the same range.

Instead of a systematic grid with range, generate a collection of $x$-coordinates of the same size by sampling independently from a uniform distribution over the same range and type using rand. Calculate the corresponding $y$-coordinates in the same way, by using map and the function. Visualize the results using scatter.

Here we sample 10 points between 0.0 and 1/α both regularly (on a grid) and randomly:

julia> x_vals1 = collect(range(0.0, 1/α, length=10))
10-element Vector{Float64}:
 0.0
 0.2222222222222222
 0.4444444444444444
 0.6666666666666666
 0.8888888888888888
 1.1111111111111112
 1.3333333333333333
 1.5555555555555556
 1.7777777777777777
 2.0

julia> x_vals2 = rand(0.0:0.01:1/α, 10)
10-element Vector{Float64}:
 0.55
 1.72
 1.09
 0.38
 0.33
 0.38
 1.42
 1.24
 0.76
 0.98

These all form one set of coordinates. The other comes from applying the function to these:

y_vals1 = map(x -> ???, x_vals1)
y_vals2 = map(x -> ???, x_vals2)

Randomly sampling values from the model’s x value range, as opposed to using a regular grid, makes sense when the range is huge (e.g. $-\infty$ to $\infty$) and we cannot possibly cover it all with a grid.

Exercise 3

Simulate 1000 independent rolls of a single fair six-sided die by using rand to sample integer values from $\{1, 2, 3, 4, 5, 6\}$.
Plot the results.

2 Simulating a Bernoulli trial with `rand`

We can use rand(), which returns a uniformly distributed real value in $[0, 1)$, to determine whether an event of given probability occurs in a particular sample (outcome, realization, draw, run, etc.) of a random system.

For an event with some given probability, if the value returned by rand() is less than this probability, we consider the event to have occurred. Otherwise, if the result of rand() is greater than or equal to this value, the event has not occurred.

Consider 1000 samples from a uniform distribution on the subset of the real number line between $0$ and $1$.

If we take this uniform distribution and divide the interval at various values of $p$, we can see that a larger or smaller segment of the range is assigned to one or the other outcome, depending on the parameter.

Since rand() returns a sample anywhere in that range with equal probability, a larger $p$ means a value is more likely to fall in the part of the range that maps to the first outcome, and vice versa.

If the result of rand() is less than $p$, then we have simulated a sample in which the event has occurred; conversely, if it is greater than or equal to $p$, the event has not occurred.
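The thresholding rule can be checked empirically: over many samples, the fraction of rand() values that fall below $p$ should be close to $p$. The sketch below uses illustrative values for p and n, and the names occurred and frequency are not from the original notes.

```julia
p = 0.3            # illustrative event probability
n = 100_000        # number of simulated samples

# rand(n) .< p is a vector of Bools: true where the event occurred.
occurred = rand(n) .< p

# The fraction of trues should be close to p for large n.
frequency = count(occurred) / n
```

With these values, frequency typically differs from 0.3 only in the third decimal place.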

For example, consider the classical Bernoulli trial modeling an idealized fair coin toss, where there are two possible outcomes, “heads” or “tails”, with the probability of “heads” in any single realization being given by the model parameter $p=0.5$.

The short-hand Julia expression for this is:

rand() < 0.5 ? "heads" : "tails"

"heads"

Or, for a more readable, flexible, extensible, and maintainable approach, we define a named function that takes an argument specifying the probability parameter:

function bernoulli(p::Float64)::String
    if rand() < p
        "heads"
    else
        "tails"
    end
end
bernoulli(0.5)

"tails"
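One design consideration: because the argument is annotated as p::Float64, a call such as bernoulli(1) or bernoulli(1//2) has no matching method. The sketch below is a hypothetical variant (the name bernoulli_checked and the validation step are assumptions, not part of the original notes) that accepts any real-valued probability and rejects values outside $[0, 1]$.

```julia
# Hypothetical variant of the bernoulli function above, for illustration.
# Accepts any real-valued p (e.g. 0.5, 1//2, 1) and rejects invalid values.
function bernoulli_checked(p::Real)::String
    0 <= p <= 1 || throw(ArgumentError("p must be in [0, 1], got $p"))
    return rand() < p ? "heads" : "tails"
end

bernoulli_checked(0.5)     # "heads" or "tails", each with probability 0.5
bernoulli_checked(1 // 2)  # a Rational works too
bernoulli_checked(1)       # always "heads", since rand() is always < 1
bernoulli_checked(0)       # always "tails", since rand() is never < 0
```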