dendropy.model.parsimony: The Parsimony Model
Models, modeling and model-fitting of parsimony.
- dendropy.model.parsimony.fitch_down_pass(postorder_node_iter, state_sets_attr_name='state_sets', taxon_state_sets_map=None, weights=None, score_by_character_list=None, **kwargs)
Returns the parsimony score given a list of nodes in postorder and associated states, using Fitch’s (1971) unordered parsimony algorithm.
- Parameters:
postorder_node_iter (iterable of/over Node objects) – An iterable of Node objects in order of post-order traversal of the tree.
state_sets_attr_name (str) – Name of attribute on Node objects in which state set lists will be stored/accessed. If None, then state sets will not be stored on the tree.
taxon_state_sets_map (dict[taxon] = state sets) – A dictionary that takes a taxon object as a key and returns a state set list as a value. This will be used to populate the state set of a node that has not yet had its state sets scored and recorded (typically, leaves of a tree that has not yet been processed).
weights (iterable) – A list of weights for each pattern.
score_by_character_list (None or list) – If not None, should be a reference to a list object. This list will be populated by the scores on a character-by-character basis.
- Returns:
s (int) – Parsimony score of tree.
Notes
Currently this requires a bifurcating tree (even at the root).
Examples
Assume that we have a tree (tree) and an associated data set (data):

    import dendropy
    from dendropy.model.parsimony import fitch_down_pass

    taxa = dendropy.TaxonNamespace()
    data = dendropy.StandardCharacterMatrix.get_from_path(
        "apternodus.chars.nexus",
        "nexus",
        taxon_namespace=taxa)
    tree = dendropy.Tree.get_from_path(
        "apternodus.tre",
        "nexus",
        taxon_namespace=taxa)
    taxon_state_sets_map = data.taxon_state_sets_map(gaps_as_missing=True)

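The following will return the parsimony score of the tree with respect to the data in data:

    # The first argument is named ``postorder_node_iter`` in the signature,
    # and the map built above is ``taxon_state_sets_map``.
    score = fitch_down_pass(
        postorder_node_iter=tree.postorder_node_iter(),
        taxon_state_sets_map=taxon_state_sets_map)
    print(score)
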
In the above, every Node object of tree will have an attribute added, “state_sets”, that stores the list of state sets from the analysis:

    for nd in tree:
        print(nd.state_sets)

If you want to store the list of state sets in a different attribute, e.g., “analysis1_states”:

    score = fitch_down_pass(
        postorder_node_iter=tree.postorder_node_iter(),
        state_sets_attr_name="analysis1_states",
        taxon_state_sets_map=taxon_state_sets_map)
    print(score)
    for nd in tree:
        print(nd.analysis1_states)

Or not to store these at all:

    score = fitch_down_pass(
        postorder_node_iter=tree.postorder_node_iter(),
        state_sets_attr_name=None,
        taxon_state_sets_map=taxon_state_sets_map)
    print(score)

Scoring custom data can be done by something like the following:

    taxa = dendropy.TaxonNamespace()
    taxon_state_sets_map = {}
    t1 = taxa.require_taxon("A")
    t2 = taxa.require_taxon("B")
    t3 = taxa.require_taxon("C")
    t4 = taxa.require_taxon("D")
    t5 = taxa.require_taxon("E")
    # One state set per character, in character order, for each taxon.
    taxon_state_sets_map[t1] = [ set([0,1]), set([0,1]), set([0]), set([0]) ]
    taxon_state_sets_map[t2] = [ set([1]), set([1]), set([1]), set([0]) ]
    taxon_state_sets_map[t3] = [ set([0]), set([1]), set([1]), set([0]) ]
    taxon_state_sets_map[t4] = [ set([0]), set([1]), set([0,1]), set([1]) ]
    taxon_state_sets_map[t5] = [ set([1]), set([0]), set([1]), set([1]) ]
    tree = dendropy.Tree.get_from_string(
        "(A,(B,(C,(D,E))));",
        "newick",
        taxon_namespace=taxa)
    score = fitch_down_pass(tree.postorder_node_iter(),
        taxon_state_sets_map=taxon_state_sets_map)
    print(score)

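The custom-data example above is small enough to check by hand. The following is a minimal pure-Python sketch of a Fitch down pass over the same five taxa and four characters, written without DendroPy. The nested-tuple tree encoding and the names fitch_down, leaf_state_sets and per_char_scores are illustrative assumptions, not part of the library, and the sketch assumes unit weights and a strictly bifurcating tree.

```python
# Illustrative sketch only: a stand-alone Fitch down pass, not DendroPy code.

def fitch_down(node, leaf_state_sets, per_char_scores):
    """Return the preliminary state sets for `node`.

    `node` is either a leaf label (str) or a 2-tuple of child nodes.
    `per_char_scores` is updated in place with one step count per character.
    """
    if isinstance(node, str):
        return leaf_state_sets[node]
    left, right = node  # assumes exactly two children per internal node
    left_sets = fitch_down(left, leaf_state_sets, per_char_scores)
    right_sets = fitch_down(right, leaf_state_sets, per_char_scores)
    result = []
    for i, (ls, rs) in enumerate(zip(left_sets, right_sets)):
        common = ls & rs
        if common:
            # Children agree on at least one state: no change needed.
            result.append(common)
        else:
            # Children disagree: take the union and count one step.
            result.append(ls | rs)
            per_char_scores[i] += 1
    return result


# Same data as the custom-data example: one state set per character.
leaf_state_sets = {
    "A": [{0, 1}, {0, 1}, {0}, {0}],
    "B": [{1}, {1}, {1}, {0}],
    "C": [{0}, {1}, {1}, {0}],
    "D": [{0}, {1}, {0, 1}, {1}],
    "E": [{1}, {0}, {1}, {1}],
}
tree = ("A", ("B", ("C", ("D", "E"))))  # (A,(B,(C,(D,E))));

per_char_scores = [0, 0, 0, 0]
root_sets = fitch_down(tree, leaf_state_sets, per_char_scores)
score = sum(per_char_scores)

print(per_char_scores)  # [2, 1, 1, 1]
print(score)            # 5
```

Under these assumptions the sketch counts 2, 1, 1 and 1 steps for the four characters, for a total of 5.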
- dendropy.model.parsimony.fitch_up_pass(preorder_node_iter, state_sets_attr_name='state_sets', taxon_state_sets_map=None, **kwargs)
Finalizes the state set lists associated with each node using the “final phase” of Fitch’s (1971) unordered parsimony algorithm.
- Parameters:
preorder_node_iter (iterable of/over Node objects) – An iterable of Node objects in order of pre-order traversal of the tree.
state_sets_attr_name (str) – Name of attribute on Node objects in which state set lists will be stored/accessed. If None, then state sets will not be stored on the tree.
taxon_state_sets_map (dict[taxon] = state sets) – A dictionary that takes a taxon object as a key and returns a state set list as a value. This will be used to populate the state set of a node that has not yet had its state sets scored and recorded (typically, leaves of a tree that has not yet been processed).
Notes
Currently this requires a bifurcating tree (even at the root).
Examples

    import dendropy
    from dendropy.model.parsimony import fitch_down_pass, fitch_up_pass

    taxa = dendropy.TaxonNamespace()
    data = dendropy.StandardCharacterMatrix.get_from_path(
        "apternodus.chars.nexus",
        "nexus",
        taxon_namespace=taxa)
    tree = dendropy.Tree.get_from_path(
        "apternodus.tre",
        "nexus",
        taxon_namespace=taxa)
    taxon_state_sets_map = data.taxon_state_sets_map(gaps_as_missing=True)
    # The down pass must run first: it scores the tree and stores the
    # preliminary state sets that the up pass then finalizes.
    score = fitch_down_pass(tree.postorder_node_iter(),
        taxon_state_sets_map=taxon_state_sets_map)
    print(score)
    fitch_up_pass(tree.preorder_node_iter())
    for nd in tree:
        print(nd.state_sets)

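To make the "final phase" concrete, here is a minimal pure-Python sketch of a down pass followed by an up pass on a five-taxon tree with two characters. It follows the textbook form of Fitch's final-phase rules and is not DendroPy's implementation. The Node class and the functions down_pass and up_pass are hypothetical names, and the sketch assumes a strictly bifurcating tree in which the root and the leaves keep their down-pass sets.

```python
# Illustrative sketch only: textbook Fitch down pass + final phase.

class Node:
    """Hypothetical minimal tree node (not dendropy.Node)."""
    def __init__(self, label=None, children=()):
        self.label = label
        self.children = list(children)
        self.state_sets = None


def down_pass(node, leaf_state_sets):
    """Post-order: assign preliminary state sets; return the step count."""
    if not node.children:
        node.state_sets = [set(s) for s in leaf_state_sets[node.label]]
        return 0
    left, right = node.children
    steps = down_pass(left, leaf_state_sets) + down_pass(right, leaf_state_sets)
    node.state_sets = []
    for ls, rs in zip(left.state_sets, right.state_sets):
        common = ls & rs
        if common:
            node.state_sets.append(common)
        else:
            node.state_sets.append(ls | rs)
            steps += 1
    return steps


def up_pass(node, parent=None):
    """Pre-order: finalize internal, non-root nodes from the parent's final sets."""
    if node.children and parent is not None:
        left, right = node.children
        final = []
        for ps, cs, ls, rs in zip(parent.state_sets, node.state_sets,
                                  left.state_sets, right.state_sets):
            if ps <= cs:
                # Every parent state is possible here: adopt the parent's set.
                final.append(set(ps))
            elif not (ls & rs):
                # Preliminary set was a union: add the parent's states.
                final.append(cs | ps)
            else:
                # Preliminary set was an intersection: add parent states
                # that occur in at least one child.
                final.append(cs | (ps & (ls | rs)))
        node.state_sets = final
    # Children are visited after this node is final but before they change.
    for child in node.children:
        up_pass(child, node)


leaf_state_sets = {
    "A": [{0, 1}, {0}],
    "B": [{1}, {0}],
    "C": [{0}, {0}],
    "D": [{0}, {1}],
    "E": [{1}, {1}],
}
a, b, c, d, e = (Node(label) for label in "ABCDE")
de = Node(children=[d, e])
cde = Node(children=[c, de])
bcde = Node(children=[b, cde])
root = Node(children=[a, bcde])  # (A,(B,(C,(D,E))));

steps = down_pass(root, leaf_state_sets)
prelim_cde = [set(s) for s in cde.state_sets]
up_pass(root)

print(steps)           # 3
print(prelim_cde)      # [{0}, {0, 1}]
print(cde.state_sets)  # [{0, 1}, {0}]
```

In this sketch the node above C, D and E changes in both characters: the first set widens from {0} to {0, 1}, and the second narrows from {0, 1} to {0}. This is why state sets read after the down pass alone should be treated as preliminary.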
- dendropy.model.parsimony.parsimony_score(tree, chars, gaps_as_missing=True, weights=None, score_by_character_list=None)
Calculates the score of a tree (tree) given some character data (chars) under the parsimony model, using the Fitch algorithm.
- Parameters:
tree (a Tree instance) – A Tree to be scored. Must reference the same TaxonNamespace as chars.
chars (a CharacterMatrix instance) – A CharacterMatrix-derived object with data to be scored. Must have the same TaxonNamespace as tree.
gaps_as_missing (bool) – If True [default], then gaps will be treated as missing data. If False, then gaps will be treated as a new/additional state.
weights (iterable) – A list of weights for each pattern/column in the matrix.
score_by_character_list (None or list) – If not None, should be a reference to a list object. This list will be populated by the scores on a character-by-character basis.
- Returns:
pscore (int) – The parsimony score of the tree given the data.
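The relationship between weights, the per-character list and the returned score can be sketched with plain arithmetic. The snippet below assumes that each step counted for character i contributes weights[i] to the total, so that the total is the sum of the per-character entries. The step counts and weights are made-up illustrative values, not output from DendroPy.

```python
# Illustrative sketch only: how per-character scores might combine with weights.

# Hypothetical unweighted step counts for four characters.
per_character_steps = [2, 1, 1, 1]

# Hypothetical weights, one per character/column.
weights = [1, 3, 1, 2]

# Assumed behaviour: each character's contribution is steps * weight.
score_by_character = [w * s for w, s in zip(weights, per_character_steps)]
total_score = sum(score_by_character)

print(score_by_character)  # [2, 3, 1, 2]
print(total_score)         # 8
```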
Examples

    import dendropy
    from dendropy.calculate import treescore

    # establish common taxon namespace
    taxon_namespace = dendropy.TaxonNamespace()

    # Read data; if data is, e.g., "standard", use StandardCharacterMatrix.
    # If unsure of data type, can do:
    #     dataset = dendropy.DataSet.get(
    #             path="path/to/file.nex",
    #             schema="nexus",
    #             taxon_namespace=taxon_namespace,)
    #     chars = dataset.char_matrices[0]
    chars = dendropy.DnaCharacterMatrix.get(
        path="pythonidae.chars.nexus",
        schema="nexus",
        taxon_namespace=taxon_namespace)
    tree = dendropy.Tree.get(
        path="pythonidae.mle.newick",
        schema="newick",
        taxon_namespace=taxon_namespace)

    # We store the site-specific scores here.
    # This is optional; if we do not want to
    # use the per-site scores, just pass in None
    # for the ``score_by_character_list`` argument
    # or do not specify this argument at all.
    score_by_character_list = []
    score = treescore.parsimony_score(
        tree,
        chars,
        gaps_as_missing=False,
        score_by_character_list=score_by_character_list)

    # Print the results: the score
    print("Score: {}".format(score))

    # Print the results: the per-site scores
    for idx, x in enumerate(score_by_character_list):
        print("{}: {}".format(idx+1, x))

Notes
If the same data is going to be used to score multiple trees, or the same tree multiple times, it is probably better to generate the taxon_state_sets_map once and call fitch_down_pass directly, as this function generates a new map on every call.