Trees¶
The Tree
Class¶
Trees in DendroPy are represented by objects of the class Tree
.
Trees consist of a collection of nodes, represented by objects of the class Node
, connected to each other in parent-child or ancestor-descendent relationships by objects of the class Edge
.
The first or initial node of a Tree
is the head of the data structure, and is represented by the seed_node
attribute of a Tree
object.
If the tree is rooted, then this is the root node.
If the tree is unrooted, however, then this is an artificial node that only serves as the initial node when iterating over a tree in preorder or the final node when iterating over the tree in postorder, but does not have any informational significance by itself: it is an algorithmic artifact.
The seed_node
attribute of a Tree
object, like every other node on the tree, is an object of the Node
class.
Every Node
object maintains a list of its immediate child Node
objects as well as a reference to its parent Node
object.
You can iterate over the child of a particular Node
object using the child_node_iter
method, get a shallow-copy list of child Node
objects using the child_nodes
method, or access the parent Node
object directly through the parent_node
attribute of the Node
.
By definition, the seed_node
has no parent node, leaf nodes have no child nodes, and term:internal nodes
have both parent nodes and child nodes.
Every Node
object has an attribute, edge
, which is an Edge
object representing the edge that is incident to or subtends the node represented by that Node
object.
Each Edge
, in turn, has an attribute, head_node
, which is the Node
object representing the node that the edge subtends.
The Tree
, Node
, and Edge
classes all have “annotations
” as an attribute, which is a AnnotationSet
object, i.e. a collection of Annotation
instances tracking metadata.
More information on working with metadata can be found in the “Working with Metadata Annotations” section.
Reading and Writing Tree
Instances¶
The Tree
class supports the “get
” factory class method for simultaneously instantiating and populating a Tree
instance, taking a data source as the first argument and a schema specification string (”nexus
”, “newick
”, “nexml
”, “fasta
”, or “phylip
”, etc.) as the second:
import dendropy
tree = dendropy.Tree.get(
path='pythonidae.mcmc.nex',
schema='nexus')
A Tree
object can be written to an external resource using the “write
” method:
import dendropy
tree = dendropy.Tree.get(
path="trees1.nex",
schema="nexus",
tree_offset=2,
)
tree.write(
path="trees1.newick",
schema="newick",
)
It can also be represented as a string using the “as_string
” method:
import dendropy
tree = dendropy.Tree.get(
path="trees1.nex",
schema="nexus",
)
print(tree.as_string(schema="newick",)
More information on reading operations is available in the Reading and Writing Phylogenetic Data section.
Cloning/Copying a Tree
¶
You can make a “taxon namespace-scoped” copy of a Tree
instance, i.e., where all Node
and associated Edge
instances of a Tree
are cloned, but references to Taxon
objects are preserved, you can call dendropy.datamodel.treemodel.Tree.clone
with a “depth
” argument value of 1 or by copy construction:
import dendropy
# original list
s1 = "(A,(B,C));"
tree1 = dendropy.Tree.get(
data=s1,
schema="newick")
# taxon namespace-scoped deep copy by calling Tree.clone(1)
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
tree2 = tree1.clone(depth=1)
# taxon namespace-scoped deep copy by copy-construction
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
tree3 = dendropy.Tree(tree1)
# *different* tree instances, with different nodes and edges
for nd1, nd2, nd3 in zip(tree1, tree2, tree3):
assert nd1 is not nd2
assert nd1 is not nd3
assert nd2 is not nd3
# Note: TaxonNamespace is still shared
# I.e. Everything cloned, but with Taxon and TaxonNamespace references shared
assert tree2.taxon_namespace is tree1.taxon_namespace
assert tree3.taxon_namespace is tree1.taxon_namespace
For a true and complete deep-copy, where even the Taxon
and TaxonNamespace
references are copied, call copy.deepcopy
:
import copy
import dendropy
# original list
s1 = "(A,(B,C));"
tree1 = dendropy.Tree.get(
data=s1,
schema="newick")
# Full deep copy by calling copy.deepcopy()
# I.e. Everything cloned including Taxon and TaxonNamespace instances
tree2 = copy.deepcopy(tree1)
# *different* tree instances
for nd1, nd2 in zip(tree1, tree2):
assert nd1 is not nd2
# Note: TaxonNamespace is also different
assert tree2.taxon_namespace is not tree1.taxon_namespace
for tx1 in tree1.taxon_namespace:
assert tx1 not in tree2.taxon_namespace
for tx2 in tree2.taxon_namespace:
assert tx2 not in tree1.taxon_namespace
Alternatively, many times you want a “light” or “thin” copy, where just the tree structure (node and edge relationships) and basic information are retained (e.g., edge lengths, taxon associations, and node and edge labels), but not, e.g. the rich annotations. Then the extract_tree
method is what you are looking for:
import dendropy
from dendropy.calculate import treecompare
tree0 = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
for idx, nd in enumerate(tree0):
nd.label = "hello, world{}".format(idx)
nd.edge.label = "world, hello{}".format(idx)
nd.annotations["color"] = "blue"
nd.edge.annotations["taste"] = "sweet"
tree1 = tree0.extract_tree()
assert tree0.taxon_namespace is tree1.taxon_namespace
assert treecompare.weighted_robinson_foulds_distance(
tree0, tree1) == 0.0
for nd in tree1:
original_node = nd.extraction_source
print("{} on extracted tree corresponds to {} on original tree".format(
nd, original_node))
## basic attributes copied
assert nd.label == original_node.label
assert nd.edge.label == original_node.edge.label
assert nd.edge.length == original_node.edge.length
## but not annotations
assert len(nd.annotations) == 0 and len(original_node.annotations) > 0
assert len(nd.edge.annotations) == 0 and len(original_node.edge.annotations) > 0
Tree Traversal¶
Iterating Over Nodes¶
The following example shows how you might evolve a continuous character on a tree by recursively visting each node, and setting the value of the character to one drawn from a normal distribution centered on the value of the character of the node’s ancestor and standard deviation given by the length of the edge subtending the node:
1#! /usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import random
5import dendropy
6
7def process_node(node, start=1.0):
8 if node.parent_node is None:
9 node.value = start
10 else:
11 node.value = random.gauss(node.parent_node.value, node.edge.length)
12 for child in node.child_nodes():
13 process_node(child)
14 if node.taxon is not None:
15 print("%s : %s" % (node.taxon, node.value))
16
17mle = dendropy.Tree.get(
18 path='pythonidae.mle.nex',
19 schema='nexus')
20process_node(mle.seed_node)
21
While the previous example works, it is probably clearer and more efficient to use one of the pre-defined node iterator methods:
preorder_node_iter
Iterates over nodes in a
Tree
object in a depth-first search pattern, i.e., “visiting” a node before visiting the children of the node. This is the same traversal order as the previous example. This traversal order is useful if you require ancestral nodes to be processed before descendent nodes, as, for example, when evolving sequences over a tree.postorder_node_iter
Iterates over nodes in a
Tree
object in a postorder search pattern, i.e., visiting the children of the node before visiting the node itself. This traversal order is useful if you require descendent nodes to be processed before ancestor nodes, as, for example, when calculating ages of nodes.level_order_node_iter
Iterates over nodes in a
Tree
object in a breadth-first search pattern, i.e., every node at a particular level is visited before proceeding to the next level.leaf_node_iter
Iterates over the leaf or tip nodes of a
Tree
object.
The previous example would thus be better implemented as follows:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import random
import dendropy
def evolve_char(tree, start=1.0):
for node in tree.preorder_node_iter():
if node.parent_node is None:
node.value = 1.0
else:
node.value = random.gauss(node.parent_node.value, node.edge.length)
return tree
mle = dendropy.Tree.get(
path='pythonidae.mle.nex',
schema='nexus')
evolve_char(mle)
for node in mle.leaf_iter():
print("%s : %s" % (node.taxon, node.value))
The nodes returned by each of these iterators can be filtered if a filter function is passed as a second argument to the iterator.
This filter function should take a Node
object as an argument, and return True
if the node is to be returned or False
if it is not. For example, the following iterates over all nodes that have more than two children:
1#! /usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import warnings
5import dendropy
6
7warnings.warn(
8 "This example is known to be broken! "
9 "It will be fixed or removed in the future. "
10 "See https://github.com/jeetsukumaran/DendroPy/issues/160 for details. "
11 "Patch contributions are welcome.",
12)
13
14mle = dendropy.Tree.get(
15 path='pythonidae.mle.nex',
16 schema='nexus')
17multifurcating = lambda x: True if len(x.child_nodes()) > 2 else False
18for nd in mle.postorder_node_iter(multifurcating):
19 print(nd.description(0))
Iterating Over Edges¶
The Edge
objects associated with each Node
can be accessed through the edge
attribute of the Node
object.
So it is possible to iterate over every edge on a tree by iterating over the nodes and referencing the edge
attribute of the node when processing the node.
But it is clearer and probably more convenient to use one of the Edge
iterators:
preorder_edge_iter
Iterates over edges in a
Tree
object in a depth-first search pattern, i.e., “visiting” an edge before visiting the edges descending from that edge. This is the same traversal order as the previous example. This traversal order is useful if you require ancestral edges to be processed before descendent edges, as, for example, when calculating the sum of edge lengths from the root.postorder_edge_iter
Iterates over edges in a
Tree
object in a postorder search pattern, i.e., visiting the descendents of the edge before visiting the edge itself. This traversal order is useful if you require descendent edges to be processed before ancestral edges, as, for example, when calculating the sum of edge lengths from the tiplevel_order_edge_iter
Iterates over edges in a
Tree
object in a breadth-first search pattern, i.e., every edge at a particular level is visited before proceeding to the next level.
The following example sets the edge lengths of a tree to the proportions of the total tree length that they represent:
1#! /usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import dendropy
5
6mle = dendropy.Tree.get(path='pythonidae.mle.nex', schema='nexus')
7mle_len = mle.length()
8for edge in mle.postorder_edge_iter():
9 if edge.length is None:
10 edge.length = 0
11 else:
12 edge.length = float(edge.length)/mle_len
13print(mle.as_string(schema="newick"))
While this one removes the edge lengths entirely:
1#! /usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import dendropy
5
6mle = dendropy.Tree.get(
7 path='pythonidae.mle.nex',
8 schema='nexus')
9mle_len = mle.length()
10for edge in mle.postorder_edge_iter():
11 edge.length = None
12print(mle.as_string(schema="newick"))
Like the node iterators, the edge iterators also optionally take a filter function as a second argument, except here the filter function should take an Edge
object as an argument.
The following example shows how you might iterate over all edges with lengths less than some value:
1#! /usr/bin/env python
2# -*- coding: utf-8 -*-
3
4import warnings
5import dendropy
6
7warnings.warn(
8 "This example is known to be broken! "
9 "It will be fixed or removed in the future. "
10 "See https://github.com/jeetsukumaran/DendroPy/issues/160 for details. "
11 "Patch contributions are welcome.",
12)
13
14mle = dendropy.Tree.get(
15 path='pythonidae.mle.nex',
16 schema='nexus')
17short = lambda edge: True if edge.length < 0.01 else False
18for edge in mle.postorder_edge_iter(short):
19 print(edge.length)
Finding Nodes on Trees¶
Nodes with Particular Taxa¶
To retrieve a node associated with a particular taxon, we can use the find_taxon_node
method, which takes a filter function as an argument.
The filter function should take a Taxon
object as an argument and return True
if the taxon is to be returned.
For example:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
filter = lambda taxon: True if taxon.label=='Antaresia maculosa' else False
node = tree.find_node_with_taxon(filter)
print(node.description())
Because we might find it easier to refer to Taxon
objects by their labels, a convenience method that wraps the retrieval of nodes associated with Taxon
objects of particular label is provided:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
node = tree.find_node_with_taxon_label('Antaresia maculosa')
print(node.description())
Most Recent Common Ancestors¶
The MRCA (most recent common ancestor) of taxa or nodes can be retrieved by the instance method mrca
.
This method takes a list of Taxon
objects given by the taxa
keyword argument, or a list of taxon labels given by the taxon_labels
keyword argument, and returns a Node
object that corresponds to the MRCA of the specified taxa.
For example:
import dendropy
tree = dendropy.Tree.get(path="pythonidae.mle.nex", schema="nexus")
taxon_labels=['Python sebae',
'Python regius',
'Python curtus',
'Python molurus']
mrca = tree.mrca(taxon_labels=taxon_labels)
print(mrca.description())
Note that this method is inefficient when you need to resolve MRCA’s for multiple sets or pairs of taxa.
In this context, the PhylogeneticDistanceMatrix
offers a more efficient approach, and should be preferred for applications such as calculating the patristic distances between all pairs of taxa. An instance of this class will be returned when you call phylogenetic_distance_matrix
:
import dendropy
tree = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
pdm = tree.phylogenetic_distance_matrix()
for idx1, taxon1 in enumerate(tree.taxon_namespace):
for taxon2 in tree.taxon_namespace:
mrca = pdm.mrca(taxon1, taxon2)
weighted_patristic_distance = pdm.patristic_distance(taxon1, taxon2)
unweighted_patristic_distance = pdm.path_edge_count(taxon1, taxon2)
print("'{}' vs '{}': {} (distance (weighted-edges, unweighted-edges) = {}, {})".format(
taxon1.label,
taxon2.label,
mrca.bipartition.split_as_bitstring(),
weighted_patristic_distance,
unweighted_patristic_distance))
Note that the PhylogeneticDistanceMatrix
object does not automatically update if the original Tree
changes: it is essentially a snapshot of Tree
at the point in which it is instantiated.
If the original Tree
changes, you should create a new instance of the corresponding PhylogeneticDistanceMatrix
object.
Viewing and Displaying Trees¶
Sometimes it is useful to get a visual representation of a Tree
.
For quick inspection, the print_plot
will write an ASCII text plot to the standard output stream:
>>> t = dendropy.Tree.get_from_string("(A,(B,(C,D)));", "newick")
>>> t.print_plot()
/----------------------------------------------- A
+
| /------------------------------ B
\----------------+
| /------------------- C
\----------+
\------------------- D
If you need to store this representation as a string instead, you can use as_ascii_plot
:
>>> s = t.as_ascii_plot()
>>> print(s)
/----------------------------------------------- A
+
| /------------------------------ B
\----------------+
| /------------------- C
\----------+
\------------------- D
You can also, as mentioned above, using the as_string
method to represent a Tree
as string in any format:
t = dendropy.Tree.get_from_string("(A,(B,(C,D)));", "newick")
print(t.as_string(schema="nexus"))
print(t.as_string(schema="newick"))
Building a Tree Programmatically¶
For example:
import dendropy
taxon_namespace = dendropy.TaxonNamespace(["A", "B", "C", "D",])
tree = dendropy.Tree(taxon_namespace=taxon_namespace)
# Create and add a new child node to the seed node,
# assigning it an edge length:
#
# (seed)
# /
# /
# ch1
#
ch1 = tree.seed_node.new_child()
ch1.edge.length = 1
# Can also assign edge length on construction:
#
# (seed)
# / \
# / \
# ch1 ch2
#
ch2 = tree.seed_node.new_child(edge_length=1)
# Can also add an existing node as child
#
# (seed)
# / \
# / \
# ch1 ch2
# / \ / \
# ch3 ch4 ch5 ch6
ch3 = dendropy.Node(edge_length=1)
ch4 = dendropy.Node(edge_length=2)
ch1.add_child(ch3)
ch1.add_child(ch4)
ch5 = dendropy.Node(edge_length=1)
ch6 = dendropy.Node(edge_length=2)
# Note: this clears/deletes existing child nodes before adding the new ones;
ch2.set_child_nodes([ch5, ch6])
# Assign taxa
ch3.taxon = taxon_namespace.get_taxon("A")
ch4.taxon = taxon_namespace.get_taxon("B")
ch5.taxon = taxon_namespace.get_taxon("C")
ch6.taxon = taxon_namespace.get_taxon("D")
print(tree.as_string("newick"))
print(tree.as_ascii_plot())
produces the following:
((A:1,B:2):1,(C:1,D:2):1);
/---------------------------------- A
/----------------------------------+
| \---------------------------------- B
+
| /---------------------------------- C
\----------------------------------+
\---------------------------------- D