dendropy.model.coalescent
: The Coalescent¶
Functions, classes, and methods for working with Kingman’s n-coalescent framework.
- dendropy.model.coalescent.coalesce_nodes(nodes, pop_size=None, period=None, rng=None, use_expected_tmrca=False)[source]¶
Returns a list of nodes that have not yet coalesced once
period
is exhausted.This function will a draw a coalescence time,
t
, from an exponential distribution with a rate ofchoose(k, 2)
, wherek
is the number of nodes. Ifperiod
is given and if this time is less thanperiod
, or ifperiod
is not given, then two nodes are selected at random fromnodes
, and coalesced: a new node is created, and the two nodes are added as child_nodes to this node with an edge length such the the total length from tip to the ancestral node is equal to the depth of the deepest child +t
. The two nodes are removed from the list of nodes, and the new node is added to it.t
is then deducted fromperiod
, and the process repeats.The function ends and returns the list of nodes once
period
is exhausted or if any draw oft
exceedsperiod
, ifperiod
is given or when there is only one node left.As each coalescent event occurs, all nodes have their edges extended to the point of the coalescent event. In the case of constrained coalescence, all uncoalesced nodes have their edges extended to the end of the period (coalesced nodes have the edges fixed by the coalescent event in their ancestor). Thus multiple calls to this method with the same set of nodes will gradually ‘grow’ the edges, until all the the nodes coalesce. The edge lengths of the nodes passed to this method thus should not be modified or reset until the process is complete.
- Parameters:
nodes (iterable[
Node
]) – An interable ofNode
objects representing a sample of neutral genes (some, all, or none of these nodes may have descendent nodes).pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
period (numeric) – The time that the genes have to coalesce. If
pop_size
is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.rng (
Random
object) – The random number generator instance to use. If not specified, the default RNG will be used.use_expected_tmrca (bool) – If
True
, then instead of random times, the expected times will be used.
- Returns:
nodes (iterable[|Node|]) – A list of nodes once
period
is exhausted or if any draw oft
exceedsperiod
, ifperiod
is given or when there is only one node left.
- dendropy.model.coalescent.constrained_kingman_tree(pop_tree, gene_tree_list=None, rng=None, gene_node_label_fn=None, gene_sampling_strategy='random_uniform', num_genes=None, num_genes_attr='num_genes', pop_size_attr='pop_size', decorate_original_tree=False)[source]¶
Given a population tree,
pop_tree
this will return a pair of trees: a gene tree simulated on this population tree based on Kingman’s n-coalescent, and population tree with the additional attribute ‘gene_nodes’ on each node, which is a list of uncoalesced nodes from the gene tree associated with the given node from the population tree.pop_tree
: a Tree object.gene_sampling_strategy
: string“node_attribute”: Will expect each leaf of
pop_tree
to have an attribute,num_genes
, that specifies the number of genes to be sampled from that population.“fixed_per_population”: Will assign
num_genes
to each population.“random_uniform”: Will assign genes to leaves with uniform probability until
num_genes
genes have been assigned.
pop_size_attr
: stringThe attribute name of the edges of
pop_tree
that specify the population size. By default it ispop_size
. The should specify the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
If
pop_size
is 1 or 0 or None, then the edge lengths ofpop_tree
is taken to be in haploid population units; i.e. where 1 unit equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise the edge lengths ofpop_tree
is taken to be in generations.If
gene_tree_list
is given, then the gene tree is added to the tree block, and the tree block’s taxa block will be used to manage the gene tree’staxa
.gene_node_label_fn
is a function that takes two arguments (a string and an integer, respectively, where the string is the containing species taxon label and the integer is the gene index) and returns a label for the corresponding the gene node.if
decorate_original_tree
is True, then the list of uncoalesced nodes at each node of the population tree is added to the original (input) population tree instead of a copy.If
num_genes
is None, then it will be set to 1 under the “node_attribute” strategy (serving as a fallback default for nodes that do not spcifynum_genes_attr
) or the leaf count ofpop_tree
under therandom_uniform
strategy.Note that this function does very much the same thing as
contained_coalescent_tree()
, but provides a very different API.
- dendropy.model.coalescent.contained_coalescent_tree(containing_tree, gene_to_containing_taxon_map, edge_pop_size_attr='pop_size', default_pop_size=1, rng=None)[source]¶
Returns a gene tree simulated under the coalescent contained within a population or species tree.
containing_tree
The population or species tree. If
edge_pop_size_map
is not None, and population sizes given are non-trivial (i.e., >1), then edge lengths on this tree are in units of generations. Otherwise edge lengths are in population units; i.e. 2N generations for diploid populations of size N, or N generations for diploid populations of size N.gene_to_containing_taxon_map
A TaxonNamespaceMapping object mapping Taxon objects in the
containing_tree
TaxonNamespace to corresponding Taxon objects in the resulting gene tree.edge_pop_size_attr
Name of attribute of edges that specify population size. By default this is “pop_size”. If this attribute does not exist,
default_pop_size
will be used. The value for this attribute should be the haploid population size or the number of genes; i.e. 2N for a diploid population of N individuals, or N for a haploid population of N individuals. This value determines how branch length units are interpreted in the input tree,containing_tree
. If a biologically-meaningful value, then branch lengths on thecontaining_tree
are properly read as generations. If not (e.g. 1 or 0), then they are in population units, i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations. If this argument is None, then population sizes default todefault_pop_size
.default_pop_size
Population size to use if
edge_pop_size_attr
is None or if an edge does not have the attribute. Defaults to 1.
The returned gene tree will have the following extra attributes:
pop_node_genes
A dictionary with nodes of
containing_tree
as keys and a list of gene tree nodes that are uncoalesced as values.
Note that this function does very much the same thing as
constrained_kingman_tree()
, but provides a very different API.
- dendropy.model.coalescent.discrete_time_to_coalescence(n_genes, pop_size=None, n_to_coalesce=2, rng=None)[source]¶
A random draw from the “Kingman distribution” (discrete time version): Time to go from
n_genes
genes ton_genes
-1 genes in a discrete-time Wright-Fisher population ofpop_size
genes; i.e. waiting time untiln-genes
lineages coalesce in a population ofpop_size
genes.- Parameters:
n_genes (integer) – The number of genes in the sample. Must be greater than or equal to n_to_coalesce.
pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce.
rng (
Random
object) – The random number generator instance.
- Returns:
k (integer) – A randomly-generated waiting time (in discrete generations) for
n_to_coalesce
genes to coalesce out of a sample ofn_genes
in a population ofpop_size
genes.
- dendropy.model.coalescent.expected_tmrca(n_genes, pop_size=None, n_to_coalesce=2)[source]¶
Expected (mean) value for the Time to the Most Recent Common Ancestor of
n_to_coalesce
genes in a sample ofn_genes
drawn from a population ofpop_size
genes.- Parameters:
n_genes (integer) – The number of genes in the sample.
pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce.
rng (
Random
object) – The random number generator instance.
- Returns:
k (float) – The expected waiting time (in continuous time) for
n_to_coalesce
genes to coalesce out of a sample ofn_genes
in a population ofpop_size
genes.
- dendropy.model.coalescent.extract_coalescent_frames(tree, ultrametricity_precision=1e-05)[source]¶
Returns a list of tuples describing the coalescent frames on the tree. That is, each element in the list is tuple pair consisting of where: the first element of the pair is the number of separate lineages remaining on the tree at coalescence event, and the second element of the pair is the time between this coalescence event and the earlier (more recent) one.
- Parameters:
tree (
Tree
) – A tree instance.ultrametricity_precision (float) – When calculating the node ages, an error will be raised if the tree is not ultrametric. This error may be due to floating-point or numerical imprecision. You can set the precision of the ultrametricity validation by setting the
ultrametricity_precision
parameter. E.g., useultrametricity_precision=0.01
for a more relaxed precision, down to 2 decimal places. Useultrametricity_precision=False
to disable checking of ultrametricity.
- Returns:
x (dict) – Returns dictionary, with key = number of alleles, and values = waiting time for coalescent for the given tree
- dendropy.model.coalescent.log_probability_of_coalescent_frames(coalescent_frames, haploid_pop_size)[source]¶
Under the classical neutral coalescent \(\citep{Kingman1982, Kingman1982b}\), the waiting times between coalescent events in a sample of \(k\) alleles segregating in a population of (haploid) size \(N_e\) is distributed exponentially with a rate parameter of \(\frac{{k \choose 2}}{N_e}\):
\[\Pr(T) = \frac{{k \choose 2}}{N_e} e^{- \frac{{k \choose 2}}{N_e} T},\]where \(T\) is the length of (chronological) time in which there are \(k\) alleles in the sample (i.e., for \(k\) alleles to coalesce into \(k-1\) alleles).
- dendropy.model.coalescent.log_probability_of_coalescent_tree(tree, haploid_pop_size, ultrametricity_precision=1e-05)[source]¶
Wraps up extraction of coalescent frames and reporting of probability.
- dendropy.model.coalescent.mean_kingman_tree(taxon_namespace, pop_size=1, rng=None)[source]¶
Returns a tree with coalescent intervals given by the expected times under Kingman’s neutral coalescent.
- dendropy.model.coalescent.node_waiting_time_pairs(tree, ultrametricity_precision=1e-05)[source]¶
Returns a list of tuples of (nodes, coalescent interval time) on the tree. That is, each element in the list is tuple pair consisting of where: the first element of the pair is an internal node representing a coalescent event on the tree, and the second element of the pair is the time between this coalescence event and the earlier (more recent) one.
- Parameters:
tree (
Tree
) – A tree instance.ultrametricity_precision (float) – When calculating the node ages, an error will be raised if the tree is not ultrametric. This error may be due to floating-point or numerical imprecision. You can set the precision of the ultrametricity validation by setting the
ultrametricity_precision
parameter. E.g., useultrametricity_precision=0.01
for a more relaxed precision, down to 2 decimal places. Useultrametricity_precision=False
to disable checking of ultrametricity.
- Returns:
x (list of tuples (node, coalescent interval)) – Returns list of tuples of (node, coalescent interval [= time between last coalescent event and current node age])
- dendropy.model.coalescent.pure_kingman_tree(taxon_namespace, pop_size=1, rng=None)[source]¶
Generates a tree under the unconstrained Kingman’s coalescent process.
- Parameters:
taxon_namespace (
TaxonNamespace
instance) – A pre-populatedTaxonNamespace
where the containedTaxon
instances represent the genes or individuals sampled from the population.pop_size (numeric) – The size of the population from the which the coalescent process is sampled.
- Returns:
t (|Tree|) – A tree sampled from the Kingman’s neutral coalescent.
- dendropy.model.coalescent.pure_kingman_tree_shape(num_leaves, pop_size=1, rng=None)[source]¶
Like
dendropy.model.pure_kingman_tree
, but does not assign taxa to tips.- Parameters:
num_leaves (int) – Number of individuals/genes sampled.
pop_size (numeric) – The size of the population from the which the coalescent process is sampled.
- Returns:
t (|Tree|) – A tree sampled from the Kingman’s neutral coalescent.
- dendropy.model.coalescent.time_to_coalescence(n_genes, pop_size=None, n_to_coalesce=2, rng=None)[source]¶
A random draw from the “Kingman distribution” (discrete time version): Time to go from
n_genes
genes ton_genes
-1 genes in a continuous-time Wright-Fisher population ofpop_size
genes; i.e. waiting time untiln-genes
lineages coalesce in a population ofpop_size
genes.Given the number of gene lineages in a sample,
n_genes
, and a population size,pop_size
, this function returns a random number from an exponential distribution with rate \(\choose(``pop_size`\), 2)`.pop_size
is the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. Ifpop_size
is 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.The coalescence time, or the waiting time for the coalescence, of two gene lineages evolving in a population with haploid size \(N\) is an exponentially-distributed random variable with rate of \(N\) an expectation of \(\frac{1}{N}\)). The waiting time for coalescence of any two gene lineages in a sample of \(n\) gene lineages evolving in a population with haploid size \(N\) is an exponentially-distributed random variable with rate of \(\choose{N, 2} and an expectation of :math:\)frac{1}{choose{N, 2}}`.
- Parameters:
n_genes (integer) – The number of genes in the sample.
pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce.
rng (
Random
object) – The random number generator instance to use.
- Returns:
k (float) – A randomly-generated waiting time (in continuous time) for
n_to_coalesce
genes to coalesce out of a sample ofn_genes
in a population ofpop_size
genes.