Tree Manipulation and Restructuring¶
The Tree
class provides both low-level and high-level methods for manipulating tree structure.
Low-level methods are associated with Node
objects, and allow to restructure the relationships between nodes at a fine level: add_child
, new_child
, remove_child
, etc.
In most cases, however, you will be using high-level methods to restructure Tree
objects.
In all cases, if any part of the Tree
object’s structural relations change, and you are interested in calculating any metrics or statistics on the tree or comparing the tree to another tree, you need to call update_bipartitions
on the object to update the internal splits hash representation.
This is not done for you automatically because there is a computational cost associated with the operation, and the splits hashes are not always needed. Furthermore, even when needed, if there are a number of structural changes to be made to a Tree
object before calculations/comparisions, it makes sense to postpone the splits rehashing until there all the tree manipulations are completed.
Most methods that affect the tree structure that require the splits hashes to updated take a update_bipartitions
argument. By specifying True
for this, the Tree
object will recalculate the splits hashes after the changes have been made.
Rooting, Derooting and Rerooting¶
The Rooting of Tree(s) Read from External Sources¶
Newick and NEXUS formats have a convention where the rooting of the tree is specified by a special comment token preceding the tree statement: “[&R]
” to indicate a rooted tree:
[&R] ((A,B),(C,D));
and : “[&U]
” to indicate an unrooted tree:
[&U] ((A,B),(C,D));
These rooting comment tokens are respected when tree data is read. If no such comment token is given, then the tree is assumed to be unrooted by default.
You can control the behavior of trees read by using the “rooting
keyword argument when using the “get
” or “read
” methods of the Tree
or TreeList
classes. This takes one of four string values which determines how the rooting states of the tree(s) will be set:
- “default-unrooted” [default]
All trees are interpreted as unrooted unless a “[&R]” comment token explicitly specifies them as rooted.
- “default-rooted”
All trees are interpreted as rooted unless a “[&U]” comment token explicitly specifies them as unrooted.
- “force-unrooted”
All trees are unconditionally interpreted as unrooted.
- “force-rooted”
All trees are unconditionally interpreted as rooted.
The behavior of this can be be summarized by the following:
Keyword Argument |
|
|
No Rooting Given in Tree Statement |
|
unrooted |
rooted |
None |
|
unrooted |
rooted |
unrooted |
|
unrooted |
rooted |
rooted |
|
unrooted |
unrooted |
unrooted |
|
rooted |
rooted |
rooted |
As an example:
import dendropy
# force tree to be read as rooted
mle_rooted = dendropy.Tree.get(
path='pythonidae.mle.nex',
schema='nexus',
rooting='force-rooted')
# force tree to be read as unrooted
mle_rooted = dendropy.Tree.get(
path='pythonidae.mle.nex',
schema='nexus',
rooting='force-unrooted')
Setting the Rooting State¶
All Tree
objects have a boolean property, is_rooted
that DendroPy uses to track whether or not the tree should be treated as rooted. The property is_unrooted
is also defined, and these two properties are synchronized. Thus setting is_rooted
to True
will result in is_unrooted
being set to False
and vice versa.
The state of a Tree
object’s rootedness flag does not modify any internal structural relationship between nodes. It simply determines how its splits hashes are calculated, which in turn affects a broad range of comparison and metric operations. Thus you need to update the splits hashes after modifying the is_rooted
property by calling the update_bipartitions
before carrying out any calculations on or with the Tree
object. Note that calling update_bipartitions
on an unrooted tree will force the basal split to be a trifurcation. So if the original tree was bifurcating, the end result will be a tree with a trifurcation at the root. This can be prevented by passing in the keyword argument suppress_unifurcations=False
to update_bipartitions
.
For example, the following:
import dendropy
tree_str = "[&R] (A, (B, (C, (D, E))));"
tree = dendropy.Tree.get_from_string(
tree_str,
"newick")
print("Original:")
print(tree.as_ascii_plot())
tree.is_rooted = False
print("After `is_rooted=False`:")
print(tree.as_ascii_plot())
tree.update_bipartitions()
print("After `update_bipartitions()`:")
print(tree.as_ascii_plot())
tree2 = dendropy.Tree.get_from_string(
tree_str,
"newick")
tree2.is_rooted = False
tree2.update_bipartitions(suppress_unifurcations=False)
print("After `update_bipartitions(suppress_unifurcations=False)`:")
print(tree2.as_ascii_plot())
will result in:
Original:
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After `is_rooted=False`:
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After `update_bipartitions()`:
/---------------------------------------------------- A
|
+---------------------------------------------------- B
|
| /----------------------------------- C
\----------------+
| /----------------- D
\-----------------+
\----------------- E
After `update_bipartitions(suppress_unifurcations=False)`:
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
Derooting¶
To deroot a rooted Tree
, you can also call the deroot
method, which collapses the root to a trifurcation if it is bifurcation and sets the is_rooted
to False
. The deroot
method has the same structural and semantic affect of is_rooted
to False
and then calling update_bipartitions
. You would use the former if you are not going to be doing any tree comparisons or calculating tree metrics, and thus do not want to calculate the splits hashes.
Rerooting¶
To reroot a Tree
along an existing edge, you can use the reroot_at_edge
method. This method takes an Edge
object as as its first argument. This rerooting is a structural change that will require the splits hashes to be updated before performing any tree comparisons or calculating tree metrics. If needed, you can do this yourself by calling update_bipartitions
later, or you can pass in True
as the second argument to the reroot_at_edge
method call, which instructs DendroPy to automatically update the splits for you.
As an example, the following reroots the tree along an internal edge (note that we do not recalculate the splits hashes, as we are not carrying out any calculations or comparisons with the Tree
):
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] (A, (B, (C, (D, E))));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
mrca = tree.mrca(taxon_labels=["D", "E"])
tree.reroot_at_edge(mrca.edge, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
and results in:
Before:
[&R] (A,(B,(C,(D,E))));
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After:
[&R] ((D,E),(C,(B,A)));
/----------------- D
/----------------------------------+
| \----------------- E
+
| /----------------------------------- C
\----------------+
| /----------------- B
\-----------------+
\----------------- A
Another example, this time rerooting along an edge subtending a tip instead of an internal edge:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] (A, (B, (C, (D, E))));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
node_D = tree.find_node_with_taxon_label("D")
tree.reroot_at_edge(node_D.edge, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
which results in:
Before:
[&R] (A,(B,(C,(D,E))));
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After:
[&R] (D,(E,(C,(B,A))));
/---------------------------------------------------- D
+
| /--------------------------------------- E
\------------+
| /-------------------------- C
\------------+
| /------------- B
\------------+
\------------- A
To reroot a Tree
at a node instead, you can use the reroot_at_node
method:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] (A, (B, (C, (D, E))));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
mrca = tree.mrca(taxon_labels=["D", "E"])
tree.reroot_at_node(mrca, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
which results in:
Before:
[&R] (A,(B,(C,(D,E))));
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After:
[&R] (D,E,(C,(B,A)));
/---------------------------------------------------- D
|
+---------------------------------------------------- E
|
| /----------------------------------- C
\----------------+
| /----------------- B
\-----------------+
\----------------- A
You can also reroot the tree such that a particular node is moved to the outgroup position using the to_outgroup_position
, which takes a Node
as the first argument. Again, you can update the splits hashes in situ by passing True
to the second argument, and again, here we do not because we are not carrying out any calculations. For example:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] (A, (B, (C, (D, E))));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
outgroup_node = tree.find_node_with_taxon_label("C")
tree.to_outgroup_position(outgroup_node, update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
which will result in:
Before:
[&R] (A,(B,(C,(D,E))));
/---------------------------------------------------- A
+
| /--------------------------------------- B
\------------+
| /-------------------------- C
\------------+
| /------------- D
\------------+
\------------- E
After:
[&R] (C,(D,E),(B,A));
/---------------------------------------------------- C
|
| /-------------------------- D
+-------------------------+
| \-------------------------- E
|
| /-------------------------- B
\-------------------------+
\-------------------------- A
If you have a tree with edge lengths specified, you can reroot it at the midpoint, using the reroot_at_midpoint
method:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] (A:0.55, (B:0.82, (C:0.74, (D:0.42, E:0.64):0.24):0.15):0.20):0.3;"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot(plot_metric='length'))
tree.reroot_at_midpoint(update_bipartitions=False)
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot(plot_metric='length'))
which results in:
Before:
[&R] (A:0.55,(B:0.82,(C:0.74,(D:0.42,E:0.64):0.24):0.15):0.2):0.3;
/------------------- A
+
| /---------------------------- B
\------+
| /-------------------------- C
\----+
| /-------------- D
\--------+
\---------------------- E
After:
[&R] ((C:0.74,(D:0.42,E:0.64):0.24):0.045,(B:0.82,A:0.75):0.105):0.3;
/------------------------------- C
/-+
| | /------------------ D
| \---------+
+ \---------------------------- E
|
| /------------------------------------ B
\---+
\-------------------------------- A
Pruning Subtrees and Tips¶
To remove a set of tips from a Tree
, you can use either the prune_taxa
or the prune_taxa_with_labels
methods. The first takes a container of TaxonNamespace
objects as an argument, while the second takes container of strings. In both cases, nodes associated with the specified taxa (as given by the TaxonNamespace
objects directly in the first case, or TaxonNamespace
objects with labels given in the list of string in the second case) will be removed from the tree. For example:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.prune_taxa_with_labels(["A", "C", "G"])
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
which results in:
Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));
/------------------------------------------- A
/---------+
| | /-------------------------------- B
| \----------+
| | /--------------------- C
| \----------+
+ | /---------- D
| \----------+
| \---------- E
|
| /--------------------- F
\-------------------------------+
| /---------- G
\----------+
\---------- H
After:
[&R] ((B,(D,E)),(F,H));
/----------------------------------- B
/-----------------+
| | /----------------- D
| \-----------------+
+ \----------------- E
|
| /----------------- F
\-----------------------------------+
\----------------- H
Alternatively, the tree can be pruned based on a set of taxa that you want to keep. This can be affected through the use of the counterpart “retain” methods, retain_taxa
and retain_taxa_with_labels
. For example:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.retain_taxa_with_labels(["A", "C", "G"])
print("After:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
which results in:
Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));
/------------------------------------------- A
/---------+
| | /-------------------------------- B
| \----------+
| | /--------------------- C
| \----------+
+ | /---------- D
| \----------+
| \---------- E
|
| /--------------------- F
\-------------------------------+
| /---------- G
\----------+
\---------- H
After:
[&R] ((A,C),G);
/-------------------------- A
/--------------------------+
+ \-------------------------- C
|
\----------------------------------------------------- G
Again, it should be noted that, as these operations modify the structure of the tree, you need to call update_bipartitions
to update the internal splits hashes, before carrying out any calculations, comparisons, or metrics.
Extracting Trees and Subtrees from an Existing Tree¶
If you just need a tree structure, i.e. the nodes and edges, and minimal other attributes, such as taxon associations, node and edge labels, and edge lengths, the following methods provide fast and powerful ways to give you copies of the current tree or parts of the current tree:
All these methods create a “thin” or “light” copy of the tree, optionally pruning out nodes based on criteria.
The basic extract_tree
method returns a full clone of the structure of the source tree:
import dendropy
from dendropy.calculate import treecompare
tree0 = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
for idx, nd in enumerate(tree0):
nd.label = "hello, world{}".format(idx)
nd.edge.label = "world, hello{}".format(idx)
nd.annotations["color"] = "blue"
nd.edge.annotations["taste"] = "sweet"
tree1 = tree0.extract_tree()
assert tree0.taxon_namespace is tree1.taxon_namespace
assert treecompare.weighted_robinson_foulds_distance(
tree0, tree1) == 0.0
for nd in tree1:
original_node = nd.extraction_source
print("{} on extracted tree corresponds to {} on original tree".format(
nd, original_node))
## basic attributes copied
assert nd.label == original_node.label
assert nd.edge.label == original_node.edge.label
assert nd.edge.length == original_node.edge.length
## but not annotations
assert len(nd.annotations) == 0 and len(original_node.annotations) > 0
assert len(nd.edge.annotations) == 0 and len(original_node.edge.annotations) > 0
As can be seen, the nodes on the extracted tree have an attribute set on them, “extraction_source
” that point to the corresponding node on the original tree.
Note how while labels, taxon associations and edge lengths are copied, the metadata annotations are not.
The node_filter_fn
argument provides a powerful and flexible way to selective pull out parts of the tree that might interest you:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree0 = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
node_filter_fn = lambda nd: nd.taxon is None or nd.taxon.label.startswith("Morelia")
tree1 = tree0.extract_tree(node_filter_fn=node_filter_fn)
print(tree1.as_string("newick"))
If you do not need the annotations, then this approach can be dramatically more performant than cloning and then pruning the tree. For example, the following:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
import timeit
tree0 = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
taxa_to_prune = set(taxon for taxon in tree0.taxon_namespace
if taxon.label.startswith("Morelia"))
def f1():
tree1 = dendropy.Tree(tree0)
tree1.prune_taxa(taxa_to_prune)
def f2():
tree1 = tree0.extract_tree(
node_filter_fn=lambda taxon: taxon not in taxa_to_prune)
t1 = timeit.Timer(f1)
print(min(t1.repeat(10,10))/10)
t2 = timeit.Timer(f2)
print(min(t2.repeat(10,10))/10)
results in:
0.00191879272461
0.000579190254211
Some convenience wrappers around the extract_tree
method allow you to more easily pull out or drop taxa of interest, either by passing in the Taxon
instances directly or their labels:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
from dendropy.calculate import treecompare
tree0 = dendropy.Tree.get(
path="pythonidae.mle.nex",
schema="nexus")
morelia_taxa = set(taxon for taxon in tree0.taxon_namespace
if taxon.label.startswith("Morelia"))
morelia_labels = set([t.label for t in morelia_taxa])
non_morelia_taxa = set(taxon for taxon in tree0.taxon_namespace
if not taxon.label.startswith("Morelia"))
non_morelia_labels = set([t.label for t in non_morelia_taxa])
tree1 = tree0.extract_tree_with_taxa(taxa=morelia_taxa)
tree2 = tree0.extract_tree_with_taxa_labels(labels=morelia_labels)
tree3 = tree0.extract_tree_without_taxa(taxa=non_morelia_taxa)
tree4 = tree0.extract_tree_without_taxa_labels(labels=non_morelia_labels)
print(tree1.as_string("newick"))
print(tree2.as_string("newick"))
print(tree3.as_string("newick"))
print(tree4.as_string("newick"))
assert treecompare.weighted_robinson_foulds_distance(tree1, tree2) == 0.0, \
treecompare.weighted_robinson_foulds_distance(tree1, tree2)
assert treecompare.weighted_robinson_foulds_distance(tree2, tree3) == 0.0, \
treecompare.weighted_robinson_foulds_distance(tree2, tree3)
assert treecompare.weighted_robinson_foulds_distance(tree3, tree4) == 0.0, \
treecompare.weighted_robinson_foulds_distance(tree3, tree4)
You can also use the extract_tree
method to created a “casted” copy, i.e. a (light) copy of the tree using your own custom classes instead of DendroPy’s native Tree
and Node
classes:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
# Our custom node class. We do not actually need to derive from dendropy.Node.
# Doing so, though, ensures that all the required methods exist instead of
# having to write them ourselves.
class PhyloNode(dendropy.Node):
pass
# Our custom tree class. Note that the definition of node factory
# (class-)method here ensures that nodes created are of the type we we want.
# Many DendroPy methods, include 'Tree.extract_tree()' check and preferentially
# use the return value of tree's class' ``node_factory()`` method when building
# trees. Note also that we do not actually need to derive from dendropy.Tree.
# Doing so, though, ensures that all the required methods exist instead of
# having to write them ourselves.
class PhyloTree(dendropy.Tree):
@classmethod
def node_factory(cls, *args, **kwargs):
return PhyloNode(*args, **kwargs)
# The original tree using dendropy.Tree for the tree and dendropy.Node for the
# nodes.
tree0 = dendropy.Tree.get(
data="(a,(b,(c,d)));",
schema="newick",
)
print(type(tree0)) # dendropy.Tree
for nd in tree0:
print(type(nd)) # dendropy.Node
# Default extraction: use dendropy.Tree for tree
# and dendropy.Node for node
tree1 = tree0.extract_tree()
print(type(tree1)) # dendropy.Tree
for nd in tree1:
print(type(nd)) # dendropy.Node
# Node factory defaults to ``node_factory`` method
# of instance returned by ``tree_factory`` if
# ``tree_factory is specified.
tree2 = tree0.extract_tree(tree_factory=PhyloTree)
print(type(tree2)) # PhyloTree
for nd in tree2:
print(type(nd)) # PhyloNode
# equivalent to above
tree3 = tree0.extract_tree(tree_factory=PhyloTree,
node_factory=PhyloNode)
print(type(tree3)) # PhyloTree
for nd in tree3:
print(type(nd)) # PhyloNode
# Use dendropy.Tree for tree but
# PhyloNode for node
tree4 = tree0.extract_tree(node_factory=PhyloNode)
print(type(tree4)) # dendropy.Tree
for nd in tree4:
print(type(nd)) # PhyloNode
Rotating¶
You can ladderize trees (sort the child nodes in order of the number of their children) by calling the ladderize
method. This method takes one argument, ascending
. If ascending=True
, which is the default, then the nodes are sorted in ascending order (i.e., nodes with fewer children sort before nodes with more children). If ascending=False
, then the nodes are sorted in descending order (i.e., nodes with more children sorting before nodes with fewer children). For example:
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import dendropy
tree_str = "[&R] ((A, (B, (C, (D, E)))),(F, (G, H)));"
tree = dendropy.Tree.get(
data=tree_str,
schema="newick")
print("Before:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.ladderize(ascending=True)
print("Ladderize, ascending=True:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
tree.ladderize(ascending=False)
print("Ladderize, ascending=False:")
print(tree.as_string(schema='newick'))
print(tree.as_ascii_plot())
results in:
Before:
[&R] ((A,(B,(C,(D,E)))),(F,(G,H)));
/------------------------------------------- A
/---------+
| | /-------------------------------- B
| \----------+
| | /--------------------- C
| \----------+
+ | /---------- D
| \----------+
| \---------- E
|
| /--------------------- F
\-------------------------------+
| /---------- G
\----------+
\---------- H
Ladderize, ascending=True:
[&R] ((F,(G,H)),(A,(B,(C,(D,E)))));
/--------------------- F
/-------------------------------+
| | /---------- G
| \----------+
+ \---------- H
|
| /------------------------------------------- A
\---------+
| /-------------------------------- B
\----------+
| /--------------------- C
\----------+
| /---------- D
\----------+
\---------- E
Ladderize, ascending=False:
[&R] (((((D,E),C),B),A),((G,H),F));
/---------- D
/----------+
/----------+ \---------- E
| |
/----------+ \--------------------- C
| |
/---------+ \-------------------------------- B
| |
| \------------------------------------------- A
+
| /---------- G
| /----------+
\-------------------------------+ \---------- H
|
\--------------------- F
Tree rotation operations do not actually change the tree structure, at least in so far as splits are concerned, so it is not neccessary to update the splits hashes.