dendropy.simulate.treesim: Unified Namespace Aggregating Functions and Classes for Tree Simulations
This module provides a convenient interface that aggregates, wraps, and/or
implements functions and classes that simulate trees under various
models and processes. It only exposes these functions and classes under
the dendropy.simulate.treesim namespace. The actual functions and classes
are defined under the appropriate model namespace in the dendropy.model
sub-package.
- dendropy.simulate.treesim.birth_death_tree(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)
Returns a birth-death tree with birth rate specified by birth_rate and death rate specified by death_rate, with edge lengths in continuous (real) units.
Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:
If num_extant_tips is given as a keyword argument, the tree is grown until the number of EXTANT tips equals this number.
If num_extinct_tips is given as a keyword argument, the tree is grown until the number of EXTINCT tips equals this number.
If num_total_tips is given as a keyword argument, the tree is grown until the number of EXTANT plus EXTINCT tips equals this number.
If max_time is given as a keyword argument, the tree is grown for a maximum of max_time.
If gsa_ntax is given, then the tree will be simulated up to this number of EXTANT tips (or 0 tips), and a tree will then be randomly selected from the intervals that correspond to times at which the tree had exactly num_extant_tips leaves. This allows for simulations according to the “General Sampling Approach” of Hartmann et al. (2010). If this option is specified, then num_extant_tips MUST be specified, and num_extinct_tips and num_total_tips CANNOT be specified.
If more than one of the above is given, then tree growth will terminate when any one of the termination conditions is met.
- Parameters:
birth_rate (float) – The birth rate.
death_rate (float) – The death rate.
birth_rate_sd (float) – The standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.
death_rate_sd (float) – The standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.
- Keyword Arguments:
num_extant_tips (int) – If specified, branching process is terminated when number of EXTANT tips equals this number.
num_extinct_tips (int) – If specified, branching process is terminated when number of EXTINCT tips equals this number.
num_total_tips (int) – If specified, branching process is terminated when number of EXTINCT plus EXTANT tips equals this number.
max_time (float) – If specified, branching process is terminated when time reaches or exceeds this value.
gsa_ntax (int) – The General Sampling Approach threshold for number of taxa. See above for details.
tree (Tree instance) – If given, then this tree will be used; otherwise a new one will be created.
taxon_namespace (TaxonNamespace instance) – If given, then this will be assigned to the new tree, and, in addition, taxa assigned to tips will be sourced from or otherwise created with reference to this.
is_assign_extant_taxa (bool [default: True]) – If False, then taxa will not be assigned to extant tips. If True (default), then taxa will be assigned to extant tips. Taxa will be assigned from the specified taxon_namespace or tree.taxon_namespace. If the number of taxa required exceeds the number of taxa existing in the taxon namespace, new Taxon objects will be created as needed and added to the taxon namespace.
is_assign_extinct_taxa (bool [default: True]) – If False, then taxa will not be assigned to extinct tips. If True (default), then taxa will be assigned to extinct tips. Taxa will be assigned from the specified taxon_namespace or tree.taxon_namespace. If the number of taxa required exceeds the number of taxa existing in the taxon namespace, new Taxon objects will be created as needed and added to the taxon namespace. Note that this option only makes sense if extinct tips are retained (specified via the ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
is_add_extinct_attr (bool [default: True]) – If True (default), add a boolean attribute indicating whether or not a node is an extinct tip. False will skip this. The name of the attribute is set by the ‘extinct_attr_name’ argument, defaulting to ‘is_extinct’. Note that this option only makes sense if extinct tips are retained (specified via the ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
extinct_attr_name (str [default: 'is_extinct']) – Name of attribute to add to nodes indicating whether or not tip is extinct. Note that this option only makes sense if extinct tips are retained (specified via the ‘is_retain_extinct_tips’ option), and will otherwise be ignored.
is_retain_extinct_tips (bool [default: False]) – If True, extinct tips will be retained on the tree. Defaults to False: extinct lineages are removed from the tree.
repeat_until_success (bool [default: True]) – Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if this argument is given as True (the default), then a new branching process is initiated. If False, then a TreeSimTotalExtinctionException is raised.
rng (random.Random() or equivalent instance) – A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.
References
Hartmann, Wong, and Stadler “Sampling Trees from Evolutionary Models” Systematic Biology. 2010. 59(4). 465-476
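The interplay of the termination conditions described for birth_death_tree can be illustrated with a minimal, count-only sketch. This is not DendroPy's implementation: it tracks only the numbers of extant and extinct lineages rather than building a tree, it ignores rate evolution (birth_rate_sd, death_rate_sd) and gsa_ntax, and the function name simulate_lineage_counts is hypothetical.

```python
import random


def simulate_lineage_counts(birth_rate, death_rate, num_extant_tips=None,
                            max_time=None, rng=None):
    """Count-only birth-death sketch (hypothetical helper, not DendroPy code).

    Returns (extant, extinct, elapsed_time). Growth stops as soon as ANY
    supplied termination condition is met, or when every lineage is extinct.
    """
    if num_extant_tips is None and max_time is None:
        raise ValueError("at least one termination condition must be specified")
    rng = rng or random.Random()
    extant, extinct, elapsed = 1, 0, 0.0
    while extant > 0:
        if num_extant_tips is not None and extant >= num_extant_tips:
            break
        # Waiting time to the next event across all extant lineages.
        wait = rng.expovariate(extant * (birth_rate + death_rate))
        if max_time is not None and elapsed + wait >= max_time:
            elapsed = max_time
            break
        elapsed += wait
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / (birth_rate + death_rate):
            extant += 1
        else:
            extant -= 1
            extinct += 1
    return extant, extinct, elapsed


if __name__ == "__main__":
    print(simulate_lineage_counts(1.0, 0.3, num_extant_tips=20, max_time=10.0))
```

A result with zero extant lineages corresponds to the total-extinction case that repeat_until_success handles in the real function.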
- dendropy.simulate.treesim.constrained_kingman_tree(pop_tree, gene_tree_list=None, rng=None, gene_node_label_fn=None, gene_sampling_strategy='random_uniform', num_genes=None, num_genes_attr='num_genes', pop_size_attr='pop_size', decorate_original_tree=False)
Given a population tree, pop_tree, this will return a pair of trees: a gene tree simulated on this population tree based on Kingman’s n-coalescent, and the population tree with the additional attribute ‘gene_nodes’ on each node, which is a list of uncoalesced nodes from the gene tree associated with the given node from the population tree.
pop_tree: a Tree object.
gene_sampling_strategy: string
“node_attribute”: Will expect each leaf of pop_tree to have an attribute, num_genes, that specifies the number of genes to be sampled from that population.
“fixed_per_population”: Will assign num_genes to each population.
“random_uniform”: Will assign genes to leaves with uniform probability until num_genes genes have been assigned.
pop_size_attr: string
The attribute name of the edges of pop_tree that specify the population size. By default it is pop_size. This should specify the effective haploid population size; i.e., the number of genes in the population: 2 * N in a diploid population of N individuals, or N in a haploid population of N individuals.
If pop_size is 1 or 0 or None, then the edge lengths of pop_tree are taken to be in haploid population units; i.e. where 1 unit equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. Otherwise the edge lengths of pop_tree are taken to be in generations.
If gene_tree_list is given, then the gene tree is added to the tree block, and the tree block’s taxa block will be used to manage the gene tree’s taxa.
gene_node_label_fn is a function that takes two arguments (a string and an integer, respectively, where the string is the containing species taxon label and the integer is the gene index) and returns a label for the corresponding gene node.
If decorate_original_tree is True, then the list of uncoalesced nodes at each node of the population tree is added to the original (input) population tree instead of a copy.
If num_genes is None, then it will be set to 1 under the “node_attribute” strategy (serving as a fallback default for nodes that do not specify num_genes_attr) or to the leaf count of pop_tree under the random_uniform strategy.
Note that this function does very much the same thing as contained_coalescent_tree(), but provides a very different API.
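The three gene_sampling_strategy options and the shape of a gene_node_label_fn callback can be sketched with plain dictionaries. This is an illustration under stated assumptions, not DendroPy code: allocate_genes and default_gene_label are hypothetical names, populations are represented by labels rather than tree leaves, and raising an error when num_genes is missing under “fixed_per_population” is an assumption of this sketch.

```python
import random


def default_gene_label(species_label, gene_index):
    """Example callback with the (string, integer) -> label shape described above."""
    return "{}_{}".format(species_label, gene_index)


def allocate_genes(leaf_num_genes, strategy="random_uniform", num_genes=None, rng=None):
    """Return {population label: number of genes to sample} (hypothetical helper).

    leaf_num_genes maps each population label to its own per-leaf value,
    or to None when the leaf does not specify one.
    """
    rng = rng or random.Random()
    labels = list(leaf_num_genes)
    if strategy == "node_attribute":
        # num_genes acts only as a fallback, defaulting to 1.
        fallback = 1 if num_genes is None else num_genes
        return {
            label: fallback if leaf_num_genes[label] is None else leaf_num_genes[label]
            for label in labels
        }
    if strategy == "fixed_per_population":
        if num_genes is None:
            raise ValueError("num_genes is required for this strategy (sketch assumption)")
        return {label: num_genes for label in labels}
    if strategy == "random_uniform":
        # Defaults to one gene per leaf in total, placed uniformly at random.
        total = len(labels) if num_genes is None else num_genes
        counts = {label: 0 for label in labels}
        for _ in range(total):
            counts[rng.choice(labels)] += 1
        return counts
    raise ValueError("unknown strategy: {!r}".format(strategy))


if __name__ == "__main__":
    leaves = {"A": 3, "B": None, "C": 2}
    print(allocate_genes(leaves, "node_attribute"))
    print(allocate_genes(leaves, "fixed_per_population", num_genes=4))
    print(allocate_genes(leaves, "random_uniform", num_genes=10, rng=random.Random(1)))
    print(default_gene_label("A", 1))
```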
- dendropy.simulate.treesim.contained_coalescent_tree(containing_tree, gene_to_containing_taxon_map, edge_pop_size_attr='pop_size', default_pop_size=1, rng=None)
Returns a gene tree simulated under the coalescent contained within a population or species tree.
containing_tree
The population or species tree. If edge_pop_size_attr is not None, and the population sizes given are non-trivial (i.e., >1), then edge lengths on this tree are in units of generations. Otherwise edge lengths are in population units; i.e. 2N generations for diploid populations of size N, or N generations for haploid populations of size N.
gene_to_containing_taxon_map
A TaxonNamespaceMapping object mapping Taxon objects in the containing_tree TaxonNamespace to corresponding Taxon objects in the resulting gene tree.
edge_pop_size_attr
Name of the attribute of edges that specifies population size. By default this is “pop_size”. If this attribute does not exist, default_pop_size will be used. The value for this attribute should be the haploid population size or the number of genes; i.e. 2N for a diploid population of N individuals, or N for a haploid population of N individuals. This value determines how branch length units are interpreted in the input tree, containing_tree. If it is a biologically-meaningful value, then branch lengths on the containing_tree are properly read as generations. If not (e.g. 1 or 0), then they are in population units, i.e. where 1 unit of time equals 2N generations for a diploid population of size N, or N generations for a haploid population of size N. If this argument is None, then population sizes default to default_pop_size.
default_pop_size
Population size to use if edge_pop_size_attr is None or if an edge does not have the attribute. Defaults to 1.
The returned gene tree will have the following extra attributes:
pop_node_genes
A dictionary with nodes of containing_tree as keys and a list of gene tree nodes that are uncoalesced as values.
Note that this function does very much the same thing as constrained_kingman_tree(), but provides a very different API.
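How population size interacts with edge-length units, and where the uncoalesced lineages recorded in pop_node_genes come from, can be sketched for a single edge of the containing tree. This is a simplified illustration, not DendroPy's implementation: coalesce_within_edge is a hypothetical name, and it returns only the number of lineages that reach the top of the edge without having coalesced.

```python
import random


def coalesce_within_edge(num_lineages, edge_length, pop_size=1, rng=None):
    """Number of lineages still uncoalesced at the top of one containing edge.

    pop_size is the haploid population size (number of genes). With a pop_size
    of 1, 0 or None, edge_length is read in population units; otherwise it is
    read in generations. (Hypothetical helper, for illustration only.)
    """
    rng = rng or random.Random()
    pop_size = pop_size or 1
    remaining = edge_length
    k = num_lineages
    while k > 1:
        # k lineages coalesce pairwise at rate C(k, 2) / pop_size.
        rate = (k * (k - 1) / 2.0) / pop_size
        wait = rng.expovariate(rate)
        if wait > remaining:
            break  # the edge ends before the next coalescence
        remaining -= wait
        k -= 1
    return k


if __name__ == "__main__":
    rng = random.Random(42)
    # Same edge length, read as population units and then as generations.
    print(coalesce_within_edge(6, 2.0, pop_size=1, rng=rng))
    print(coalesce_within_edge(6, 2.0, pop_size=10000, rng=rng))
```

Lineages left over at the top of an edge are passed to the parent edge, where they may coalesce with lineages arriving from the sister population.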
- dendropy.simulate.treesim.discrete_birth_death_tree(birth_rate, death_rate, birth_rate_sd=0.0, death_rate_sd=0.0, **kwargs)
Returns a birth-death tree with birth rate specified by birth_rate and death rate specified by death_rate, with edge lengths in discrete (integer) units.
birth_rate_sd is the standard deviation of the normally-distributed mutation added to the birth rate as it is inherited by daughter nodes; if 0, birth rate does not evolve on the tree.
death_rate_sd is the standard deviation of the normally-distributed mutation added to the death rate as it is inherited by daughter nodes; if 0, death rate does not evolve on the tree.
Tree growth is controlled by one or more of the following arguments, of which at least one must be specified:
If ntax is given as a keyword argument, the tree is grown until the number of tips == ntax.
If taxon_namespace is given as a keyword argument, the tree is grown until the number of tips == len(taxon_namespace), and the taxa are assigned randomly to the tips.
If max_time is given as a keyword argument, the tree is grown for max_time number of generations.
If more than one of the above is given, then tree growth will terminate when any of the termination conditions (i.e., number of tips == ntax, or number of tips == len(taxon_namespace), or number of generations == max_time) is met.
Also accepts a Tree object (with valid branch lengths) as an argument passed using the keyword tree: if given, then this tree will be used; otherwise a new one will be created.
If assign_taxa is False, then taxa will not be assigned to the tips; otherwise (default), taxa will be assigned. If taxon_namespace is given (or tree.taxon_namespace, if tree is given), and the final number of tips on the tree after the termination condition is reached is less than the number of taxa in taxon_namespace (as will be the case, for example, when ntax < len(taxon_namespace)), then a random subset of taxa in taxon_namespace will be assigned to the tips of the tree. If the number of tips is more than the number of taxa in the taxon_namespace, new Taxon objects will be created and added to the taxon_namespace, unless the keyword argument create_required_taxa is given as False.
Under some conditions, it is possible for all lineages on a tree to go extinct. In this case, if the keyword argument repeat_until_success is True, then a new branching process is initiated. If False (default), then a TreeSimTotalExtinctionException is raised.
A Random() object or equivalent can be passed using the rng keyword; otherwise GLOBAL_RNG is used.
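The discrete-time variant can be pictured as a generation-by-generation loop in which each tip either splits, dies, or persists. The sketch below is an assumption about the general shape of such a process, not DendroPy's implementation: discrete_lineage_counts is a hypothetical name, only tip counts are tracked, and the rates are treated as per-generation probabilities with birth_rate + death_rate <= 1.

```python
import random


def discrete_lineage_counts(birth_rate, death_rate, ntax=None, max_time=None, rng=None):
    """Count-only discrete birth-death sketch; returns (tips, generations)."""
    if ntax is None and max_time is None:
        raise ValueError("at least one termination condition must be specified")
    rng = rng or random.Random()
    tips, generation = 1, 0
    if ntax is not None and tips >= ntax:
        return tips, generation
    while tips > 0:
        if max_time is not None and generation >= max_time:
            break
        generation += 1
        # Visit every lineage that was alive at the start of this generation.
        for _ in range(tips):
            u = rng.random()
            if u < birth_rate:
                tips += 1  # one tip is replaced by two daughters
            elif u < birth_rate + death_rate:
                tips -= 1  # the lineage goes extinct
            if ntax is not None and tips == ntax:
                return tips, generation
    return tips, generation


if __name__ == "__main__":
    print(discrete_lineage_counts(0.1, 0.02, ntax=50, max_time=200, rng=random.Random(5)))
```

A returned tip count of 0 corresponds to the total-extinction case governed by repeat_until_success.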
- dendropy.simulate.treesim.mean_kingman_tree(taxon_namespace, pop_size=1, rng=None)
Returns a tree with coalescent intervals given by the expected times under Kingman’s neutral coalescent.
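The expected coalescent intervals follow from the standard Kingman result that, with k lineages and a haploid population of pop_size genes, the mean waiting time to the next coalescence is pop_size / C(k, 2). The sketch below computes these means with a hypothetical helper, expected_coalescent_intervals; treating pop_size as the number of genes, so that pop_size=1 gives time in population units, is an assumption carried over from the other functions on this page.

```python
def expected_coalescent_intervals(num_genes, pop_size=1):
    """Mean waiting times while k = num_genes, ..., 2 lineages remain.

    Standard Kingman expectation: E[T_k] = pop_size / (k * (k - 1) / 2).
    (Hypothetical helper, for illustration only.)
    """
    intervals = []
    for k in range(num_genes, 1, -1):
        pairs = k * (k - 1) / 2.0
        intervals.append(pop_size / pairs)
    return intervals


if __name__ == "__main__":
    intervals = expected_coalescent_intervals(4)
    print(intervals)       # intervals for k = 4, 3, 2
    print(sum(intervals))  # expected time to the most recent common ancestor
```

The intervals grow as lineages are lost, so the final coalescence of the last two lineages accounts for the largest share of the expected tree height.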
- dendropy.simulate.treesim.pure_kingman_tree(taxon_namespace, pop_size=1, rng=None)
Generates a tree under the unconstrained Kingman’s coalescent process.
- Parameters:
taxon_namespace (TaxonNamespace instance) – A pre-populated TaxonNamespace where the contained Taxon instances represent the genes or individuals sampled from the population.
pop_size (numeric) – The size of the population from which the coalescent process is sampled.
- Returns:
t (Tree) – A tree sampled from Kingman’s neutral coalescent.
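For readers unfamiliar with the process, an unconstrained Kingman coalescent can be sketched in a few lines: draw an exponential waiting time for the current number of lineages, merge a uniformly chosen pair, and repeat until one lineage remains. The helper below, kingman_newick, is hypothetical and emits a Newick string from a non-empty list of plain labels rather than a Tree built from a TaxonNamespace; it illustrates the process and is not DendroPy's implementation.

```python
import random


def kingman_newick(labels, pop_size=1, rng=None):
    """Sample a coalescent genealogy for the given labels as a Newick string."""
    rng = rng or random.Random()
    # Each active lineage is (newick_fragment, height_of_its_root_node).
    lineages = [(label, 0.0) for label in labels]
    now = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time until any one of the C(k, 2) pairs coalesces.
        now += rng.expovariate((k * (k - 1) / 2.0) / pop_size)
        a, b = rng.sample(range(k), 2)
        (frag_a, height_a), (frag_b, height_b) = lineages[a], lineages[b]
        merged = "({}:{:.6f},{}:{:.6f})".format(
            frag_a, now - height_a, frag_b, now - height_b
        )
        lineages = [lin for i, lin in enumerate(lineages) if i not in (a, b)]
        lineages.append((merged, now))
    return lineages[0][0] + ";"


if __name__ == "__main__":
    print(kingman_newick(["g1", "g2", "g3", "g4"], rng=random.Random(11)))
```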
- dendropy.simulate.treesim.rand_trees(rng, model_fn, model_kwargs, n_replicates)
The model parameters may be specified as:
A single dict or map, in which case it will be repeated through the replicates.
A function, in which case it will be called for each replicate with two positional arguments: the 0-based replicate index and the random number generator object to use; the function should return a dict or mapping of keyword-value pairs for the model simulation call.
An iterable of dicts or maps, for each of which n_replicates simulations will be generated, in order.
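The three accepted forms of model_kwargs can be read as a dispatch that yields one keyword dictionary per simulation call. The generator below sketches that reading; iter_model_kwargs is a hypothetical name, and the code reflects only the behaviour described above, not DendroPy's source.

```python
import random
from collections.abc import Mapping


def iter_model_kwargs(model_kwargs, n_replicates, rng):
    """Yield one keyword dict per simulation call (hypothetical helper)."""
    if isinstance(model_kwargs, Mapping):
        # A single mapping is reused for every replicate.
        for _ in range(n_replicates):
            yield dict(model_kwargs)
    elif callable(model_kwargs):
        # A function receives the 0-based replicate index and the RNG.
        for index in range(n_replicates):
            yield dict(model_kwargs(index, rng))
    else:
        # An iterable of mappings: n_replicates calls for each one, in order.
        for kwargs in model_kwargs:
            for _ in range(n_replicates):
                yield dict(kwargs)


if __name__ == "__main__":
    rng = random.Random(0)
    settings = [{"birth_rate": 1.0}, {"birth_rate": 2.0}]
    for kwargs in iter_model_kwargs(settings, 2, rng):
        print(kwargs)
```

Each yielded dictionary would then be passed as keyword arguments to the model function for one simulated tree.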