dendropy.model.coalescent: The Coalescent¶
Functions, classes, and methods for working with Kingman’s n-coalescent framework.
- dendropy.model.coalescent.coalesce_nodes(nodes, pop_size=None, period=None, rng=None, use_expected_tmrca=False)[source]¶
- Returns a list of nodes that have not yet coalesced once - periodis exhausted.- This function will a draw a coalescence time, - t, from an exponential distribution with a rate of- choose(k, 2), where- kis the number of nodes. If- periodis given and if this time is less than- period, or if- periodis not given, then two nodes are selected at random from- nodes, and coalesced: a new node is created, and the two nodes are added as child_nodes to this node with an edge length such the the total length from tip to the ancestral node is equal to the depth of the deepest child +- t. The two nodes are removed from the list of nodes, and the new node is added to it.- tis then deducted from- period, and the process repeats.- The function ends and returns the list of nodes once - periodis exhausted or if any draw of- texceeds- period, if- periodis given or when there is only one node left.- As each coalescent event occurs, all nodes have their edges extended to the point of the coalescent event. In the case of constrained coalescence, all uncoalesced nodes have their edges extended to the end of the period (coalesced nodes have the edges fixed by the coalescent event in their ancestor). Thus multiple calls to this method with the same set of nodes will gradually ‘grow’ the edges, until all the the nodes coalesce. The edge lengths of the nodes passed to this method thus should not be modified or reset until the process is complete. - Parameters:
- nodes (iterable[ - Node]) – An interable of- Nodeobjects representing a sample of neutral genes (some, all, or none of these nodes may have descendent nodes).
- pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. 
- period (numeric) – The time that the genes have to coalesce. If - pop_sizeis 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.
- rng ( - Randomobject) – The random number generator instance to use. If not specified, the default RNG will be used.
- use_expected_tmrca (bool) – If - True, then instead of random times, the expected times will be used.
 
- Returns:
- nodes (iterable[|Node|]) – A list of nodes once - periodis exhausted or if any draw of- texceeds- period, if- periodis given or when there is only one node left.
 
- dendropy.model.coalescent.constrained_kingman_tree(pop_tree, gene_tree_list=None, rng=None, gene_node_label_fn=None, gene_sampling_strategy='random_uniform', num_genes=None, num_genes_attr='num_genes', pop_size_attr='pop_size', decorate_original_tree=False)[source]¶
- Given a population tree, - pop_treethis will return a pair of trees: a gene tree simulated on this population tree based on Kingman’s n-coalescent, and population tree with the additional attribute ‘gene_nodes’ on each node, which is a list of uncoalesced nodes from the gene tree associated with the given node from the population tree.- pop_tree: a Tree object.- gene_sampling_strategy: string
- “node_attribute”: Will expect each leaf of - pop_treeto have an attribute,- num_genes, that specifies the number of genes to be sampled from that population.
- “fixed_per_population”: Will assign - num_genesto each population.
- “random_uniform”: Will assign genes to leaves with uniform probability until - num_genesgenes have been assigned.
 
- pop_size_attr: string
- The attribute name of the edges of - pop_treethat specify the population size. By default it is- pop_size. The should specify the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
 - If - pop_sizeis 1 or 0 or None, then the edge lengths of- pop_treeis taken to be in haploid population units; i.e. where 1 unit equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise the edge lengths of- pop_treeis taken to be in generations.- If - gene_tree_listis given, then the gene tree is added to the tree block, and the tree block’s taxa block will be used to manage the gene tree’s- taxa.- gene_node_label_fnis a function that takes two arguments (a string and an integer, respectively, where the string is the containing species taxon label and the integer is the gene index) and returns a label for the corresponding the gene node.- if - decorate_original_treeis True, then the list of uncoalesced nodes at each node of the population tree is added to the original (input) population tree instead of a copy.- If - num_genesis None, then it will be set to 1 under the “node_attribute” strategy (serving as a fallback default for nodes that do not spcify- num_genes_attr) or the leaf count of- pop_treeunder the- random_uniformstrategy.- Note that this function does very much the same thing as - contained_coalescent_tree(), but provides a very different API.
- dendropy.model.coalescent.contained_coalescent_tree(containing_tree, gene_to_containing_taxon_map, edge_pop_size_attr='pop_size', default_pop_size=1, rng=None)[source]¶
- Returns a gene tree simulated under the coalescent contained within a population or species tree. - containing_tree
- The population or species tree. If - edge_pop_size_mapis not None, and population sizes given are non-trivial (i.e., >1), then edge lengths on this tree are in units of generations. Otherwise edge lengths are in population units; i.e. 2N generations for diploid populations of size N, or N generations for diploid populations of size N.
- gene_to_containing_taxon_map
- A TaxonNamespaceMapping object mapping Taxon objects in the - containing_treeTaxonNamespace to corresponding Taxon objects in the resulting gene tree.
- edge_pop_size_attr
- Name of attribute of edges that specify population size. By default this is “pop_size”. If this attribute does not exist, - default_pop_sizewill be used. The value for this attribute should be the haploid population size or the number of genes; i.e. 2N for a diploid population of N individuals, or N for a haploid population of N individuals. This value determines how branch length units are interpreted in the input tree,- containing_tree. If a biologically-meaningful value, then branch lengths on the- containing_treeare properly read as generations. If not (e.g. 1 or 0), then they are in population units, i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations. If this argument is None, then population sizes default to- default_pop_size.
- default_pop_size
- Population size to use if - edge_pop_size_attris None or if an edge does not have the attribute. Defaults to 1.
 - The returned gene tree will have the following extra attributes: - pop_node_genes
- A dictionary with nodes of - containing_treeas keys and a list of gene tree nodes that are uncoalesced as values.
 - Note that this function does very much the same thing as - constrained_kingman_tree(), but provides a very different API.
- dendropy.model.coalescent.discrete_time_to_coalescence(n_genes, pop_size=None, n_to_coalesce=2, rng=None)[source]¶
- A random draw from the “Kingman distribution” (discrete time version): Time to go from - n_genesgenes to- n_genes-1 genes in a discrete-time Wright-Fisher population of- pop_sizegenes; i.e. waiting time until- n-geneslineages coalesce in a population of- pop_sizegenes.- Parameters:
- n_genes (integer) – The number of genes in the sample. Must be greater than or equal to n_to_coalesce. 
- pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. 
- n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce. 
- rng ( - Randomobject) – The random number generator instance.
 
- Returns:
- k (integer) – A randomly-generated waiting time (in discrete generations) for - n_to_coalescegenes to coalesce out of a sample of- n_genesin a population of- pop_sizegenes.
 
- dendropy.model.coalescent.expected_tmrca(n_genes, pop_size=None, n_to_coalesce=2)[source]¶
- Expected (mean) value for the Time to the Most Recent Common Ancestor of - n_to_coalescegenes in a sample of- n_genesdrawn from a population of- pop_sizegenes.- Parameters:
- n_genes (integer) – The number of genes in the sample. 
- pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. 
- n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce. 
- rng ( - Randomobject) – The random number generator instance.
 
- Returns:
- k (float) – The expected waiting time (in continuous time) for - n_to_coalescegenes to coalesce out of a sample of- n_genesin a population of- pop_sizegenes.
 
- dendropy.model.coalescent.extract_coalescent_frames(tree, ultrametricity_precision=1e-05)[source]¶
- Returns a list of tuples describing the coalescent frames on the tree. That is, each element in the list is tuple pair consisting of where: the first element of the pair is the number of separate lineages remaining on the tree at coalescence event, and the second element of the pair is the time between this coalescence event and the earlier (more recent) one. - Parameters:
- tree ( - Tree) – A tree instance.
- ultrametricity_precision (float) – When calculating the node ages, an error will be raised if the tree is not ultrametric. This error may be due to floating-point or numerical imprecision. You can set the precision of the ultrametricity validation by setting the - ultrametricity_precisionparameter. E.g., use- ultrametricity_precision=0.01for a more relaxed precision, down to 2 decimal places. Use- ultrametricity_precision=Falseto disable checking of ultrametricity.
 
- Returns:
- x (dict) – Returns dictionary, with key = number of alleles, and values = waiting time for coalescent for the given tree 
 
- dendropy.model.coalescent.log_probability_of_coalescent_frames(coalescent_frames, haploid_pop_size)[source]¶
- Under the classical neutral coalescent \(\citep{Kingman1982, Kingman1982b}\), the waiting times between coalescent events in a sample of \(k\) alleles segregating in a population of (haploid) size \(N_e\) is distributed exponentially with a rate parameter of \(\frac{{k \choose 2}}{N_e}\): \[\Pr(T) = \frac{{k \choose 2}}{N_e} e^{- \frac{{k \choose 2}}{N_e} T},\]- where \(T\) is the length of (chronological) time in which there are \(k\) alleles in the sample (i.e., for \(k\) alleles to coalesce into \(k-1\) alleles). 
- dendropy.model.coalescent.log_probability_of_coalescent_tree(tree, haploid_pop_size, ultrametricity_precision=1e-05)[source]¶
- Wraps up extraction of coalescent frames and reporting of probability. 
- dendropy.model.coalescent.mean_kingman_tree(taxon_namespace, pop_size=1, rng=None)[source]¶
- Returns a tree with coalescent intervals given by the expected times under Kingman’s neutral coalescent. 
- dendropy.model.coalescent.node_waiting_time_pairs(tree, ultrametricity_precision=1e-05)[source]¶
- Returns a list of tuples of (nodes, coalescent interval time) on the tree. That is, each element in the list is tuple pair consisting of where: the first element of the pair is an internal node representing a coalescent event on the tree, and the second element of the pair is the time between this coalescence event and the earlier (more recent) one. - Parameters:
- tree ( - Tree) – A tree instance.
- ultrametricity_precision (float) – When calculating the node ages, an error will be raised if the tree is not ultrametric. This error may be due to floating-point or numerical imprecision. You can set the precision of the ultrametricity validation by setting the - ultrametricity_precisionparameter. E.g., use- ultrametricity_precision=0.01for a more relaxed precision, down to 2 decimal places. Use- ultrametricity_precision=Falseto disable checking of ultrametricity.
 
- Returns:
- x (list of tuples (node, coalescent interval)) – Returns list of tuples of (node, coalescent interval [= time between last coalescent event and current node age]) 
 
- dendropy.model.coalescent.pure_kingman_tree(taxon_namespace, pop_size=1, rng=None)[source]¶
- Generates a tree under the unconstrained Kingman’s coalescent process. - Parameters:
- taxon_namespace ( - TaxonNamespaceinstance) – A pre-populated- TaxonNamespacewhere the contained- Taxoninstances represent the genes or individuals sampled from the population.
- pop_size (numeric) – The size of the population from the which the coalescent process is sampled. 
 
- Returns:
- t (|Tree|) – A tree sampled from the Kingman’s neutral coalescent. 
 
- dendropy.model.coalescent.pure_kingman_tree_shape(num_leaves, pop_size=1, rng=None)[source]¶
- Like - dendropy.model.pure_kingman_tree, but does not assign taxa to tips.- Parameters:
- num_leaves (int) – Number of individuals/genes sampled. 
- pop_size (numeric) – The size of the population from the which the coalescent process is sampled. 
 
- Returns:
- t (|Tree|) – A tree sampled from the Kingman’s neutral coalescent. 
 
- dendropy.model.coalescent.time_to_coalescence(n_genes, pop_size=None, n_to_coalesce=2, rng=None)[source]¶
- A random draw from the “Kingman distribution” (discrete time version): Time to go from - n_genesgenes to- n_genes-1 genes in a continuous-time Wright-Fisher population of- pop_sizegenes; i.e. waiting time until- n-geneslineages coalesce in a population of- pop_sizegenes.- Given the number of gene lineages in a sample, - n_genes, and a population size,- pop_size, this function returns a random number from an exponential distribution with rate \(\choose(``pop_size`\), 2)`.- pop_sizeis the effective haploid population size; i.e., number of gene in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. If- pop_sizeis 1 or 0 or None, then time is in haploid population units; i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise time is in generations.- The coalescence time, or the waiting time for the coalescence, of two gene lineages evolving in a population with haploid size \(N\) is an exponentially-distributed random variable with rate of \(N\) an expectation of \(\frac{1}{N}\)). The waiting time for coalescence of any two gene lineages in a sample of \(n\) gene lineages evolving in a population with haploid size \(N\) is an exponentially-distributed random variable with rate of \(\choose{N, 2} and an expectation of :math:\)frac{1}{choose{N, 2}}`. - Parameters:
- n_genes (integer) – The number of genes in the sample. 
- pop_size (integer) – The effective haploid population size; i.e., number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals. 
- n_to_coalesce (integer) – The waiting time that will be returned will be the waiting time for this number of genes in the sample to coalesce. 
- rng ( - Randomobject) – The random number generator instance to use.
 
- Returns:
- k (float) – A randomly-generated waiting time (in continuous time) for - n_to_coalescegenes to coalesce out of a sample of- n_genesin a population of- pop_sizegenes.
 


