dendropy.model.discrete: Discrete Character Evolution

Models and modeling of discrete character evolution.

class dendropy.model.discrete.DiscreteCharacterEvolutionModel(state_alphabet, stationary_freqs=None, rng=None)[source]

Base class for discrete character substitution models.

__init__ initializes the state_alphabet to define the character type on which this model acts. The objects random number generator will be rng or ‘GLOBAL_RNG’

pmatrix(tlen, rate=1.0)[source]

Returns a matrix of nucleotide substitution probabilities.

simulate_descendant_states(ancestral_states, edge_length, mutation_rate=1.0, rng=None)[source]

Returns descendent sequence given ancestral sequence.

class dendropy.model.discrete.DiscreteCharacterEvolver(seq_model=None, mutation_rate=None, seq_attr='sequences', seq_model_attr='seq_model', edge_length_attr='length', edge_rate_attr='mutation_rate', seq_label_attr='taxon')[source]

Evolves sequences on a tree.

__init__ sets up meta-data dealing with object nomenclature and semantics.

evolve_states(tree, seq_len, root_states=None, simulate_root_states=True, in_place=True, rng=None)[source]

Appends a new sequence of length seq_len to a list at each node in tree. The attribute name of this list in each node is given by seq_attr. If seq_model is None, tree.seq_model or seq_model at each node must be specified. If in_place is False, the tree is copied first, otherwise original tree is modified. If root_states is given, this will be used as the sequence for the root. If not, and if simulate_root_states is True, then the sequence for the root will be drawn from the stationary distribution of the character model.

extend_char_matrix_with_characters_on_tree(char_matrix, tree, include=None, exclude=None)[source]

Creates a character matrix with new sequences (or extends sequences of an existing character matrix if provided via char_matrix), where the the sequence for each taxon corresponds to the concatenation of all sequences in the list of sequences associated with tip that references the given taxon. Specific sequences to be included/excluded can be fine-tuned using the include and exclude args, where include=None means to include all by default, and exclude=None means to exclude all by default.

class dendropy.model.discrete.Hky85(kappa=1.0, base_freqs=None, state_alphabet=None, rng=None)[source]

Hasegawa et al. 1985 model. Implementation following Swofford et al., 1996.

__init__: if no arguments given, defaults to JC69.

corrected_substitution_rate(rate)[source]

Returns the factor that we have to multiply to the branch length to make branch lengths proportional to # of substitutions per site.

pij(state_i, state_j, tlen, rate=1.0)[source]

Returns probability, p_ij, of going from state i to state j over time tlen at given rate. (tlen * rate = nu, expected number of substitutions)

pmatrix(tlen, rate=1.0)[source]

Returns a matrix of nucleotide substitution probabilities. Based on analytical solution by Swofford et al., 1996. (tlen * rate = nu, expected number of substitutions)

pvector(state, tlen, rate=1.0)[source]

Returns a vector of transition probabilities for a given state over time tlen at rate rate for state. (tlen * rate = nu, expected number of substitutions)

qmatrix(rate=1.0)[source]

Returns the instantaneous rate of change matrix.

class dendropy.model.discrete.Jc69(state_alphabet=None, rng=None)[source]

Jukes-Cantor 1969 model. Specializes HKY85 such that kappa = 1.0, and base frequencies = [0.25, 0.25, 0.25, 0.25].

__init__: uses Hky85.__init__

class dendropy.model.discrete.NucleotideCharacterEvolutionModel(base_freqs=None, state_alphabet=None, rng=None)[source]

General nucleotide substitution model.

__init__ calls SeqModel.__init__ and sets the base_freqs field

is_purine(state_index)[source]

Returns True if state_index represents a purine (A or G) row or column index: 0, 2

is_purine_transition(state1_idx, state2_idx)[source]

Returns True if the change from state1 to state2, as represented by the row or column indices, is a purine transitional change.

is_pyrimidine(state_index)[source]

Returns True if state_index represents a pyrimidine (C or T) row or column index: 1, 3

is_pyrimidine_transition(state1_idx, state2_idx)[source]

Returns True if the change from state1 to state2, as represented by the row or column indices, is a pyrimidine transitional change.

is_transition(state1_idx, state2_idx)[source]

Returns True if the change from state1 to state2, as represented by the row or column indices, is a transitional change.

is_transversion(state1_idx, state2_idx)[source]

Returns True if the change from state1 to state2, as represented by the row or column indices, is a transversional change.

stationary_sample(seq_len, rng=None)[source]

Returns a NucleotideSequence() object with length length representing a sample of characters drawn from this model’s stationary distribution.

dendropy.model.discrete.hky85_chars(seq_len, tree_model, mutation_rate=1.0, kappa=1.0, base_freqs=[0.25, 0.25, 0.25, 0.25], root_states=None, char_matrix=None, retain_sequences_on_tree=False, rng=None)[source]

Convenience class to wrap generation of characters (as a CharacterBlock object) based on the HKY model.

Parameters:
  • seq_len (int) – Length of sequence (number of characters).

  • tree_model (Tree) – Tree on which to simulate.

  • mutation_rate (float) – Mutation modifier rate (should be 1.0 if branch lengths on tree reflect true expected number of changes).

  • root_states`` (list) – Vector of root states (length must equal seq_len).

  • char_matrix (DnaCharacterMatrix) – If given, new sequences for taxa on tree_model leaf_nodes will be appended to existing sequences of corresponding taxa in char_matrix; if not, a new DnaCharacterMatrix object will be created.

  • retain_sequences_on_tree (bool) – If False, sequence annotations will be cleared from tree after simulation. Set to True if you want to, e.g., evolve and accumulate different sequences on tree, or retain information for other purposes.

  • rng (random number generator) – If not given, ‘GLOBAL_RNG’ will be used.

Returns:

  • d (|DnaCharacterMatrix|) – The simulated alignment.

  • Since characters will be appended to existing sequences, you can simulate a

  • sequences under a mixed model by calling this method multiple times with

  • different character model parameter values and/or different mutation

  • rates, passing in the same char_matrix object each time.

dendropy.model.discrete.simulate_discrete_char_dataset(seq_len, tree_model, seq_model, mutation_rate=1.0, root_states=None, dataset=None, rng=None)[source]

Wrapper to conveniently generate a DataSet simulated under the given tree and character model.

Parameters:
  • seq_len (int) – Length of sequence (number of characters).

  • tree_model (Tree) – Tree on which to simulate.

  • seq_model (dendropy.model.discrete.NucleotideCharacterEvolutionModel) – The character substitution model under which to to evolve the characters.

  • mutation_rate (float) – Mutation modifier rate (should be 1.0 if branch lengths on tree reflect true expected number of changes).

  • root_states`` (list) – Vector of root states (length must equal seq_len).

  • dataset (DataSet) – If given, the new dendropy.CharacterMatrix object will be added to this (along with a new taxon_namespace if required). Otherwise, a new dendropy.DataSet object will be created.

  • rng (random number generator) – If not given, ‘GLOBAL_RNG’ will be used.

Returns:

d (|DataSet|)

dendropy.model.discrete.simulate_discrete_chars(seq_len, tree_model, seq_model, mutation_rate=1.0, root_states=None, char_matrix=None, retain_sequences_on_tree=False, rng=None)[source]

Wrapper to conveniently generate a characters simulated under the given tree and character model.

Since characters will be appended to existing sequences, you can simulate a sequences under a mixed model by calling this method multiple times with different character models and/or different mutation rates, passing in the same char_matrix object each time.

Parameters:
  • seq_len (int) – Length of sequence (number of characters).

  • tree_model (Tree) – Tree on which to simulate.

  • seq_model (dendropy.model.discrete.NucleotideCharacterEvolutionModel) – The character substitution model under which to to evolve the characters.

  • mutation_rate (float) – Mutation modifier rate (should be 1.0 if branch lengths on tree reflect true expected number of changes).

  • root_states`` (list) – Vector of root states (length must equal seq_len).

  • char_matrix (DnaCharacterMatrix) – If given, new sequences for taxa on tree_model leaf_nodes will be appended to existing sequences of corresponding taxa in char_matrix; if not, a new DnaCharacterMatrix object will be created.

  • retain_sequences_on_tree (bool) – If False, sequence annotations will be cleared from tree after simulation. Set to True if you want to, e.g., evolve and accumulate different sequences on tree, or retain information for other purposes.

  • rng (random number generator) – If not given, ‘GLOBAL_RNG’ will be used.

Returns:

d (a dendropy.datamodel.CharacterMatrix object.)