dendropy.datamodel.treecollectionmodel: Collections of Trees

The TreeList Class

class dendropy.datamodel.treecollectionmodel.TreeList(*args, **kwargs)[source]

A collection of Tree objects, all referencing the same “universe” of operational taxonomic unit concepts through the same TaxonNamespace object reference.

Constructs a new TreeList object, populating it with any iterable container with Tree object members passed as an unnamed argument, or from a data source if stream and schema are passed.

If passed an iterable container, the objects in that container must be of type Tree (or derived). If the container is of type TreeList, then, because each Tree object must have the same TaxonNamespace reference as the containing TreeList, the trees in the container passed as an initialization argument will be deep-copied (except for associated TaxonNamespace and Taxon objects, which will be shallow-copied). If the container is any other type of iterable, then the Tree objects will be shallow-copied.

TreeList objects can thus be instantiated directly in the following ways:

#! /usr/bin/env python

from dendropy import TaxonNamespace, Tree, TreeList

# instantiate an empty tree list
tlst1 = TreeList()

# TreeList objects can be instantiated from an external data source
# using the 'get()' factory class method

tlst2 = TreeList.get(file=open('treefile.tre', 'r'), schema="newick")
tlst3 = TreeList.get(path='sometrees.nexus', schema="nexus")
tlst4 = TreeList.get(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

# can also call `read()` on a TreeList object; each read adds
# (appends) the tree(s) found to the TreeList
tlst5 = TreeList()
tlst5.read(file=open('boot1.tre', 'r'), schema="newick")
tlst5.read(path="boot3.tre", schema="newick")
tlst5.read(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

# populated from list of Tree objects
tlist6_1 = Tree.get(
        data="((A,B),(C,D))",
        schema="newick")
tlist6_2 = Tree.get(
        data="((A,C),(B,D))",
        schema="newick")
tlist6 = TreeList([tlist6_1, tlist6_2])

# passing keywords to underlying tree parser
tlst8 = TreeList.get(
                 data="((A,B),(C,D));((A,C),(B,D));",
                 schema="newick",
                 taxon_namespace=tlst3.taxon_namespace,
                 rooting="force-rooted",
                 extract_comment_metadata=True,
                 store_tree_weights=False,
                 preserve_underscores=True)

# Subsets of trees can be read. Note that in most cases, the entire
# data source is parsed, so this is not more efficient than reading
# all the trees and then manually-extracting them later; just more
# convenient

# skip the *first* 100 trees in the *first* (offset=0) collection of trees
trees = TreeList.get(
            path="mcmc.tre",
            schema="newick",
            collection_offset=0,
            tree_offset=100)

# get the *last* 10 trees in the *second* (offset=1) collection of trees
trees = TreeList.get(
            path="mcmc.tre",
            schema="newick",
            collection_offset=1,
            tree_offset=-10)

# get the last 10 trees in the second-to-last collection of trees
trees = TreeList.get(
            path="mcmc.tre",
            schema="newick",
            collection_offset=-2,
            tree_offset=-10)

# Slices give shallow-copy: trees are references
tlst4copy0a = tlst4[:]
assert tlst4copy0a[0] is tlst4[0]
tlst4copy0b = tlst4[:4]
assert tlst4copy0b[0] is tlst4[0]

# 'Taxon-namespace-scoped' copy:
# I.e., Deep-copied objects but taxa and taxon namespace
# are copied as references
tlst4copy1a = TreeList(tlst4)
tlst4copy1b = TreeList([Tree(t) for t in tlst4])
assert tlst4copy1a[0] is not tlst4[0] # True
assert tlst4copy1a.taxon_namespace is tlst4.taxon_namespace # True
assert tlst4copy1b[0] is not tlst4[0] # True
assert tlst4copy1b.taxon_namespace is tlst4.taxon_namespace # True
DEFAULT_TREE_TYPE

alias of Tree

__add__(other)[source]

Creates and returns new TreeList with clones of all trees in self as well as all Tree objects in other. If other is a TreeList, then the trees are cloned and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.

Parameters:

other (iterable of Tree objects) –

Returns:

tlist (TreeList object) – TreeList object containing clones of Tree objects in self and other.

__getitem__(index)[source]

If index is an integer, then Tree object at position index is returned. If index is a slice, then a TreeList is returned with references (i.e., not copies or clones, but the actual original instances themselves) to Tree objects in the positions given by the slice. The TaxonNamespace is the same as self.

Parameters:

index (integer or slice) – Index or slice.

Returns:

t (Tree object or TreeList object)

__iadd__(other)[source]

In-place addition of Tree objects in other to self.

If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.

Parameters:

other (iterable of Tree objects) –

Returns:

self (TreeList)

append(tree, taxon_import_strategy='migrate', **kwargs)[source]

Adds a Tree object, tree, to the collection.

The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:

  • ‘migrate’

    Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being reconstructed if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, where kwargs is as passed to this function.

  • ‘add’

    Taxon objects associated with tree that are not already in self.taxon_namespace will be added. Note that this might result in Taxon objects with duplicate labels as no attempt at mapping to existing Taxon objects based on label-matching is done.

Parameters:
  • tree (A Tree instance) – The Tree object to be added.

  • taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: ‘migrate’ or ‘add’. See above for details.

  • **kwargs (keyword arguments) – These arguments will be passed directly to ‘migrate_taxon_namespace()’ method call on tree.

See also

Tree.migrate_taxon_namespace
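
The difference between the two import strategies can be illustrated with a toy model. The sketch below does not use DendroPy: ToyTaxon, ToyNamespace, and import_taxon are hypothetical stand-ins, and the exact-label lookup is an assumed simplification of what migrate_taxon_namespace does.

```python
class ToyTaxon:
    """Hypothetical stand-in for a Taxon: just a label."""
    def __init__(self, label):
        self.label = label


class ToyNamespace:
    """Hypothetical stand-in for a TaxonNamespace: an ordered list of taxa."""
    def __init__(self, taxa=()):
        self.taxa = list(taxa)

    def find(self, label):
        for taxon in self.taxa:
            if taxon.label == label:
                return taxon
        return None


def import_taxon(namespace, taxon, strategy="migrate"):
    """Return the taxon object a tree node should reference after import."""
    if any(t is taxon for t in namespace.taxa):
        return taxon                      # already managed; nothing to do
    if strategy == "migrate":
        match = namespace.find(taxon.label)
        if match is None:                 # no label match: build a new taxon
            match = ToyTaxon(taxon.label)
            namespace.taxa.append(match)
        return match
    if strategy == "add":
        namespace.taxa.append(taxon)      # no label matching at all
        return taxon
    raise ValueError("unknown strategy: %r" % strategy)


ns = ToyNamespace([ToyTaxon("A")])
incoming = ToyTaxon("A")                  # same label, different object

migrated = import_taxon(ns, incoming, "migrate")
labels_after_migrate = [t.label for t in ns.taxa]

added = import_taxon(ns, incoming, "add")
labels_after_add = [t.label for t in ns.taxa]
```

In this toy model, 'migrate' reuses the existing taxon labeled "A", whereas 'add' leaves the namespace holding two distinct taxa with the same label.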

as_string(schema, **kwargs)

Composes and returns string representation of the data.

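Mandatory Schema-Specification Keyword Argument:

  • schema (str) – Identifier of the format in which the data is to be written: “newick”, “nexus”, or “nexml”.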

Optional Schema-Specific Keyword Arguments:

These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.

as_tree_array(**kwargs)[source]

Return TreeArray collecting information on splits in contained trees. Keyword arguments get passed directly to TreeArray constructor.

clone(depth=1)

Creates and returns a copy of self.

Parameters:

depth (integer) –

The depth of the copy:

  • 0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).

  • 1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.

  • 2: Exhaustive deep-copy: all objects are cloned.
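
The three depths can be mimicked with the standard copy module. This is a sketch under stated assumptions, not DendroPy's implementation: ToyTaxon, ToyTree, and toy_clone are hypothetical names, and a plain list stands in for the TreeList. Depth 1 is modeled by pre-seeding the deepcopy memo so that taxa are reused rather than copied.

```python
import copy


class ToyTaxon:
    """Hypothetical stand-in for a Taxon."""
    def __init__(self, label):
        self.label = label


class ToyTree:
    """Hypothetical stand-in for a Tree: it only records its taxa."""
    def __init__(self, taxa):
        self.taxa = list(taxa)


def toy_clone(trees, depth=1):
    if depth == 0:
        # Shallow: a new container holding the very same tree objects.
        return list(trees)
    if depth == 1:
        # Taxon-namespace-scoped: seed the memo so deepcopy returns the
        # original taxon objects instead of copying them.
        memo = {}
        for tree in trees:
            for taxon in tree.taxa:
                memo[id(taxon)] = taxon
        return copy.deepcopy(trees, memo)
    # Exhaustive: everything is copied, taxa included.
    return copy.deepcopy(trees)


a, b = ToyTaxon("A"), ToyTaxon("B")
trees = [ToyTree([a, b])]

c0 = toy_clone(trees, depth=0)
c1 = toy_clone(trees, depth=1)
c2 = toy_clone(trees, depth=2)
```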

consensus(min_freq=0.5, is_bipartitions_updated=False, summarize_splits=True, **kwargs)[source]

Returns a consensus tree of all trees in self, with the minimum frequency required for a bipartition to be added to the consensus tree given by min_freq.

copy_annotations_from(other, attribute_object_mapper=None)

Copies annotations from other, which must be of Annotable type.

Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo

Parameters:
  • other (Annotable) – Source of annotations to copy.

  • attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.

deep_copy_annotations_from(other, memo=None)

Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).

extend(other)[source]

In-place addition of Tree objects in other to self.

If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.

Parameters:

other (iterable of Tree objects) –

Returns:

self (TreeList)

frequency_of_bipartition(**kwargs)[source]

Given a bipartition specified as:

  • a Bipartition instance given the keyword ‘bipartition’

  • a split bitmask given the keyword ‘split_bitmask’

  • a list of Taxon objects given with the keyword taxa

  • a list of taxon labels given with the keyword labels

this function returns the proportion of trees in self in which the split is found.

If the tree(s) in the collection are unrooted, then the bipartition will be normalized for the comparison.
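
As a rough illustration of the proportion and of why normalization matters for unrooted trees, the sketch below models each tree as a collection of splits, each split being a set of taxon labels. It does not use DendroPy; the function names and the normalization convention (keep the side that does not contain the alphabetically first taxon) are assumptions made for the example.

```python
def normalize(split, all_taxa):
    """Pick one canonical side of a bipartition (arbitrary toy convention:
    the side that does not contain the alphabetically first taxon)."""
    split = frozenset(split)
    if min(all_taxa) in split:
        return frozenset(all_taxa) - split
    return split


def frequency_of_split(trees, labels, all_taxa, unrooted=True):
    """Proportion of trees (each a collection of splits) containing the split."""
    if not trees:
        return 0.0
    query = frozenset(labels)
    if unrooted:
        query = normalize(query, all_taxa)
    hits = 0
    for tree_splits in trees:
        splits = {frozenset(s) for s in tree_splits}
        if unrooted:
            splits = {normalize(s, all_taxa) for s in splits}
        if query in splits:
            hits += 1
    return hits / len(trees)


taxa = ["A", "B", "C", "D"]
# Three toy trees, each described by its single internal split.
trees = [
    [{"A", "B"}],      # ((A,B),(C,D))
    [{"C", "D"}],      # the same unrooted bipartition, seen from the other side
    [{"A", "C"}],      # ((A,C),(B,D))
]
freq_unrooted = frequency_of_split(trees, ["A", "B"], taxa)
freq_rooted = frequency_of_split(trees, ["A", "B"], taxa, unrooted=False)
```

Without normalization, the second toy tree would not count as containing the split {A, B}, even though {C, D} describes the same bipartition of an unrooted tree.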

frequency_of_split(**kwargs)[source]

DEPRECATED: use ‘frequency_of_bipartition()’ instead.

classmethod get(**kwargs)[source]

Instantiate and return a new TreeList object from a data source.

Mandatory Source-Specification Keyword Argument (Exactly One Required):

  • file (file) – File or file-like object of data opened for reading.

  • path (str) – Path to file of data.

  • url (str) – URL of data.

  • data (str) – Data given directly.

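Mandatory Schema-Specification Keyword Argument:

  • schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”.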

Optional General Keyword Arguments:

  • label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.

  • taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.

  • collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.

  • tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.

  • ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
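
The two offsets can be pictured as indexing into a list of collections and then slicing within the chosen one. The sketch below is a toy model only: select_trees is a hypothetical helper, strings stand in for parsed trees, and it assumes that negative offsets behave like Python's negative indices, as the "last 10 trees" example in the class description suggests.

```python
def select_trees(collections, collection_offset=0, tree_offset=0):
    """Pick one collection, then drop the trees before tree_offset."""
    return collections[collection_offset][tree_offset:]


# Two toy "tree blocks"; strings stand in for parsed trees.
source = [
    ["t%d" % i for i in range(5)],        # collection 0: t0..t4
    ["u%d" % i for i in range(3)],        # collection 1: u0..u2
]

# Skip a burn-in of two trees in the first collection.
after_burn_in = select_trees(source, collection_offset=0, tree_offset=2)

# Keep only the last two trees of the second collection.
last_two = select_trees(source, collection_offset=1, tree_offset=-2)
```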

Optional Schema-Specific Keyword Arguments:

These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.

Examples:

tlst1 = dendropy.TreeList.get(
        file=open('treefile.tre', 'r'),
        schema="newick")
tlst2 = dendropy.TreeList.get(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
tlst3 = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
tree4 = dendropy.TreeList.get(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
classmethod get_from_path(src, schema, **kwargs)

Factory method to return new object of this class from file specified by string src.

Parameters:
  • src (string) – Full file path to source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.

classmethod get_from_stream(src, schema, **kwargs)

Factory method to return new object of this class from file-like object src.

Parameters:
  • src (file or file-like) – Source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.

classmethod get_from_string(src, schema, **kwargs)

Factory method to return new object of this class from string src.

Parameters:
  • src (string) – Data as a string.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.

classmethod get_from_url(src, schema, strip_markup=False, **kwargs)

Factory method to return a new object of this class from URL given by src.

Parameters:
  • src (string) – URL of location providing source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.

insert(index, tree, taxon_import_strategy='migrate', **kwargs)[source]

Inserts a Tree object, tree, into the collection before index.

The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:

  • ‘migrate’

    Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being reconstructed if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, where kwargs is as passed to this function.

  • ‘add’

    Taxon objects associated with tree that are not already in self.taxon_namespace will be added. Note that this might result in Taxon objects with duplicate labels as no attempt at mapping to existing Taxon objects based on label-matching is done.

Parameters:
  • index (integer) – Position before which to insert tree.

  • tree (A Tree instance) – The Tree object to be added.

  • taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: ‘migrate’ or ‘add’. See above for details.

  • **kwargs (keyword arguments) – These arguments will be passed directly to ‘migrate_taxon_namespace()’ method call on tree.

See also

Tree.migrate_taxon_namespace

maximum_product_of_split_support_tree(include_external_splits=False, score_attr='log_product_of_split_support')[source]

Return the tree that maximizes the product of split supports, also known as the “Maximum Clade Credibility Tree” or MCCT.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

mcct_tree (Tree) – Tree that maximizes the product of split supports.
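
The scoring idea can be sketched without DendroPy. In the toy model below, each tree is a set of clades (frozensets of taxon labels), the support of a clade is the fraction of trees containing it, and trees are ranked by the sum of log supports, which is the log of the product. The function name and data are hypothetical, and external (leaf) splits are simply left out, in line with the include_external_splits=False default.

```python
import math
from collections import Counter


def max_product_of_split_support(trees):
    """Return (best_index, scores); each score is the sum of log split
    frequencies, i.e. the log of the product of split supports."""
    counts = Counter(split for tree in trees for split in set(tree))
    n = len(trees)
    scores = [
        sum(math.log(counts[split] / n) for split in set(tree))
        for tree in trees
    ]
    best_index = max(range(n), key=scores.__getitem__)
    return best_index, scores


ab, ac = frozenset("AB"), frozenset("AC")
abc, abd = frozenset("ABC"), frozenset("ABD")

# Four toy trees on taxa A..E, each listing its internal clades only.
toy_trees = [
    {ab, abc},
    {ab, abd},
    {ab, abc},
    {ac, abc},
]
best_index, scores = max_product_of_split_support(toy_trees)
```

Working in log space avoids numerical underflow when many supports are multiplied together, which is presumably why the default score_attr is named log_product_of_split_support.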

maximum_sum_of_split_support_tree(include_external_splits=False, score_attr='sum_of_split_support')[source]

Return the tree that maximizes the sum of split supports.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

mcct_tree (Tree) – Tree that maximizes the sum of split supports.

migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)

Move this object and all members to a new operational taxonomic unit concept namespace scope.

Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.

Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.

Parameters:
  • taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.

  • unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.

  • taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.

Examples

Use this method to move an object from one taxon namespace to another.

For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)

You can also use this method to get a copy of a structure and then move it to a new namespace:

t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
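
The effect of the is_case_sensitive setting on label unification, as described above for labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’, can be sketched with plain strings standing in for Taxon objects. This is a toy model, not DendroPy code: unify_by_label is a hypothetical helper, and keeping the first casing variant encountered is an assumption (the description only promises some existing variant).

```python
def unify_by_label(labels, is_case_sensitive=False):
    """Map each old label to one representative label."""
    representatives = {}
    result = []
    for label in labels:
        key = label if is_case_sensitive else label.lower()
        # The first label seen for a key becomes the representative.
        representatives.setdefault(key, label)
        result.append(representatives[key])
    return result


old_labels = ["Foo", "Foo", "FOO", "FoO"]

# Case-insensitive namespace: all four collapse onto one taxon.
merged = unify_by_label(old_labels)

# Case-sensitive namespace: only the two exact 'Foo' labels coincide.
distinct = unify_by_label(old_labels, is_case_sensitive=True)
```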

poll_taxa(taxa=None)[source]

Returns a set populated with all of the Taxon instances associated with self.

Parameters:

taxa (set()) – Set to populate. If not specified, a new one will be created.

Returns:

taxa (set[Taxon]) – Set of taxa associated with self.

purge_taxon_namespace()

Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.

read(**kwargs)[source]

Add Tree objects to existing TreeList from data source providing one or more collections of trees.

Mandatory Source-Specification Keyword Argument (Exactly One Required):

  • file (file) – File or file-like object of data opened for reading.

  • path (str) – Path to file of data.

  • url (str) – URL of data.

  • data (str) – Data given directly.

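Mandatory Schema-Specification Keyword Argument:

  • schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”.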

Optional General Keyword Arguments:

  • collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.

  • tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.

  • ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.

Optional Schema-Specific Keyword Arguments:

These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.

Examples:

tlist = dendropy.TreeList()
tlist.read(
        file=open('treefile.tre', 'r'),
        schema="newick",
        tree_offset=100)
tlist.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
tlist.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
tlist.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
read_from_path(src, schema, **kwargs)

Reads data from file specified by src.

Parameters:
  • src (string) – Full file path to source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:

read_from_stream(src, schema, **kwargs)

Reads from file (exactly equivalent to just read(); provided here as a separate method for completeness).

Parameters:
  • src (file or file-like) – Source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:

read_from_string(src, schema, **kwargs)

Reads a string.

Parameters:
  • src (string) – Data as a string.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:

read_from_url(src, schema, **kwargs)

Reads a URL source.

Parameters:
  • src (string) – URL of location providing source of data.

  • schema (string) – Specification of data format (e.g., “nexus”).

  • kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.

Returns:

n (tuple [integer]) – A value indicating size of data read, where “size” depends on the object:

reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)[source]

Repopulates the current taxon namespace with new taxon objects, preserving labels. Each distinct Taxon object associated with self or members of self that is not already in self.taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace.

Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.

Note

Existing Taxon objects in self.taxon_namespace are not removed. This method should thus be called only when self.taxon_namespace has been changed. In fact, typical usage would not involve calling this method directly, but rather through

Parameters:
  • unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.

  • taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace.

reindex_subcomponent_taxa()[source]

DEPRECATED: Use reconstruct_taxon_namespace instead. Derived classes should override this to ensure that their various components, attributes and members all refer to the same TaxonNamespace object as self.taxon_namespace, and that self.taxon_namespace has all the Taxon objects in the various members.

reindex_taxa(taxon_namespace=None, clear=False)

DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.

split_distribution(is_bipartitions_updated=False, default_edge_length_value=None, **kwargs)[source]

Return SplitDistribution collecting information on splits in contained trees. Keyword arguments get passed directly to SplitDistribution constructor.

taxon_namespace_scoped_copy(memo=None)[source]

Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.

classmethod tree_factory(*args, **kwargs)[source]

Creates and returns a Tree of a type that this list understands how to manage.

Deriving classes can override this to provide for custom Tree-type object lists. You can simply override the class-level variable DEFAULT_TREE_TYPE in your derived class if the constructor signature of the alternate tree type is the same as Tree. If you want to have a TreeList instance that generates custom trees (i.e., as opposed to a TreeList-ish class of instances), set the tree_type attribute of the TreeList instance.

Parameters:
  • *args (positional arguments) – Passed directly to constructor of Tree.

  • **kwargs (keyword arguments) – Passed directly to constructor of Tree.

Returns:

A Tree object.
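
The two customization routes described above (a class-level DEFAULT_TREE_TYPE override versus a per-instance tree_type attribute) can be sketched with stand-in classes. Everything named Toy... below is hypothetical and only mirrors the pattern; in particular, the factory is modeled as an instance method for simplicity, whereas tree_factory is documented as a classmethod.

```python
class ToyTree:
    """Hypothetical stand-in for Tree."""
    def __init__(self, label=None):
        self.label = label


class ToyAnnotatedTree(ToyTree):
    """A custom tree type with the same constructor signature."""


class ToyTreeList:
    DEFAULT_TREE_TYPE = ToyTree

    def __init__(self, tree_type=None):
        # Per-instance override; falls back to the class-level default.
        self.tree_type = tree_type or self.DEFAULT_TREE_TYPE

    def tree_factory(self, *args, **kwargs):
        return self.tree_type(*args, **kwargs)


class ToyAnnotatedTreeList(ToyTreeList):
    # Route 1: a derived list class that always builds the custom type.
    DEFAULT_TREE_TYPE = ToyAnnotatedTree


# Route 2: a plain list instance configured to build the custom type.
per_instance = ToyTreeList()
per_instance.tree_type = ToyAnnotatedTree

made_by_subclass = ToyAnnotatedTreeList().tree_factory(label="x")
made_by_instance = per_instance.tree_factory(label="y")
made_by_default = ToyTreeList().tree_factory(label="z")
```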

update_taxon_namespace()[source]

All Taxon objects associated with self or members of self that are not in self.taxon_namespace will be added. Note that, unlike reconstruct_taxon_namespace, no new Taxon objects will be created.

write(**kwargs)

Writes out self in schema format.

Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):

  • file (file) – File or file-like object opened for writing.

  • path (str) – Path to file to which to write.

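Mandatory Schema-Specification Keyword Argument:

  • schema (str) – Identifier of the format in which the data is to be written: “newick”, “nexus”, or “nexml”.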

Optional Schema-Specific Keyword Arguments:

These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.

Examples

# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
write_to_path(dest, schema, **kwargs)

Writes to file specified by dest.

write_to_stream(dest, schema, **kwargs)

Writes to file-like object dest.

TreeList.read(**kwargs)[source]
TreeList.put(**kwargs)

Write out collection of trees to file.

Mandatory Destination-Specification Keyword Arguments (one and exactly one of the following required):
  • file (file) – File or file-like object opened for writing.

  • path (str) – Path to file to which to write.

Mandatory Schema-Specification Keyword Argument:
  • schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”.

Optional Schema-Specific Keyword Arguments:

The TreeArray Class

class dendropy.datamodel.treecollectionmodel.TreeArray(taxon_namespace=None, is_rooted_trees=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=None, taxon_label_age_map=None)[source]

High-performance collection of tree structures.

Storage of minimal tree structural information as represented by topology and edge lengths, minimizing memory and processing time. This class stores trees as collections of splits and edge lengths. All other information, such as labels, metadata annotations, etc. will be discarded. A full Tree instance can be reconstructed as needed from the structural information stored by this class, at the cost of computation time.

Parameters:
  • taxon_namespace (TaxonNamespace) – The operational taxonomic unit concept namespace to manage taxon references.

  • is_rooted_trees (bool) – If not set, then it will be set based on the rooting state of the first tree added. If True, then trying to add an unrooted tree will result in an error. If False, then trying to add a rooted tree will result in an error.

  • ignore_edge_lengths (bool) – If True, then edge lengths of splits will not be stored. If False, then edge lengths will be stored.

  • ignore_node_ages (bool) – If True, then node ages of splits will not be stored. If False, then node ages will be stored.

  • use_tree_weights (bool) – If False, then tree weights will not be used to weight splits.

exception IncompatibleEdgeLengthsTreeArrayUpdate[source]
exception IncompatibleNodeAgesTreeArrayUpdate[source]
exception IncompatibleRootingTreeArrayUpdate[source]
exception IncompatibleTreeArrayUpdate[source]
exception IncompatibleTreeWeightsTreeArrayUpdate[source]
__add__(other)[source]

Creates and returns new TreeArray.

Parameters:

other (iterable of Tree objects) –

Returns:

tlist (TreeArray object) – TreeArray object containing clones of Tree objects in self and other.

__iadd__(tree_array)[source]

Accession of data from tree_array to self.

Parameters:

tree_array (TreeArray) – A TreeArray instance from which to add data.

__iter__()[source]

Yields pairs of (split, edge_length) from the store.

add_tree(tree, is_bipartitions_updated=False, index=None)[source]

Adds the structure represented by a Tree instance to the collection.

Parameters:
  • tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

  • index (integer) – Insert before index.

Returns:

  • index (int) – The index of the accession.

  • s (iterable of splits) – A list of split bitmasks from tree.

  • e – A list of edge length values from tree.

add_trees(trees, is_bipartitions_updated=False)[source]

Adds multiple structures represented by an iterator over or iterable of Tree instances to the collection.

Parameters:
  • trees (iterator over or iterable of Tree instances) – An iterator over or iterable of Tree instances. These must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

append(tree, is_bipartitions_updated=False)[source]

Adds a Tree instance to the end of the collection.

Parameters:
  • tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

bipartition_encoding_frequencies()[source]

Returns a dictionary with keys being bipartition encodings of trees (as frozenset collections of Bipartition objects) and values the frequency of occurrence of trees represented by that encoding in the collection.
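
As a rough illustration of the kind of dictionary described above, the sketch below counts identical encodings with collections.Counter. Integers stand in for Bipartition objects, and reporting proportions rather than raw counts is an assumption of this sketch.

```python
from collections import Counter

# Each tree is reduced to a frozenset of split bitmasks.
# Taxa A, B, C, D are mapped to bits 1, 2, 4, 8 (hypothetical encoding).
tree_encodings = [
    frozenset({0b0011, 0b1100}),  # ((A,B),(C,D))
    frozenset({0b0101, 0b1010}),  # ((A,C),(B,D))
    frozenset({0b1100, 0b0011}),  # ((A,B),(C,D)) with splits listed in another order
]

# frozensets are hashable and order-independent, so equal topologies
# collapse onto the same dictionary key.
counts = Counter(tree_encodings)
total = len(tree_encodings)
frequencies = {encoding: n / total for encoding, n in counts.items()}
```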

calculate_log_product_of_split_supports(include_external_splits=False)[source]

Calculates the log product of split support for each of the trees in the collection.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.
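
The shape of the return value, a list of per-tree scores plus the index of the best one, can be sketched in plain Python. The split supports and the tuples of bitmasks below are made-up values, and the function is a stand-in rather than the library's code.

```python
import math

# Hypothetical split supports (proportional frequencies) keyed by bitmask.
split_support = {0b0011: 0.9, 0b0101: 0.1, 0b0111: 0.6}

# Each "tree" is represented by the internal split bitmasks it contains.
trees = [
    (0b0011, 0b0111),
    (0b0101, 0b0111),
]


def log_product_of_split_supports(trees, split_support):
    # The log of a product is the sum of the logs, which avoids
    # underflow when many small supports are multiplied.
    scores = [
        sum(math.log(split_support[s]) for s in tree_splits)
        for tree_splits in trees
    ]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return scores, best_index


scores, best_index = log_product_of_split_supports(trees, split_support)
```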

calculate_sum_of_split_supports(include_external_splits=False)[source]

Calculates the sum of split support for all trees in the collection.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.

consensus_tree(min_freq=0.5, summarize_splits=True, **split_summarization_kwargs)[source]

Returns a consensus tree from splits in self.

Parameters:
  • min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.

  • is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.

  • **split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.

Returns:

t (consensus tree)
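
The role of min_freq can be illustrated by filtering a table of split frequencies. This sketch covers only the selection step, not tree construction; the frequencies are invented, and treating the threshold as inclusive (>=) is an assumption.

```python
def splits_for_consensus(split_frequencies, min_freq=0.5):
    """Return the split bitmasks whose frequency reaches min_freq."""
    return sorted(s for s, f in split_frequencies.items() if f >= min_freq)


# Hypothetical split frequencies keyed by bitmask.
split_frequencies = {0b0011: 0.95, 0b0101: 0.05, 0b0111: 0.50, 0b1100: 0.30}

majority = splits_for_consensus(split_frequencies)
strict = splits_for_consensus(split_frequencies, min_freq=1.0)
```

With the default threshold, the two best-supported splits are kept; with min_freq=1.0 none of these splits qualifies.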

extend(tree_array)[source]

Accession of data from tree_array to self.

Parameters:

tree_array (TreeArray) – A TreeArray instance from which to add data.

get_split_bitmask_and_edge_tuple(index)[source]

Returns a pair of tuples, ( (splits…), (lengths…) ), corresponding to the “tree” at index.

insert(index, tree, is_bipartitions_updated=False)[source]

Adds a Tree instance to the collection before position given by index.

Parameters:
  • index (integer) – Insert before index.

  • tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

Returns:

  • index (int) – The index of the accession.

  • s (iterable of splits) – A list of split bitmasks from tree.

  • e – A list of edge length values from tree.

maximum_product_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]

Return the tree that maximizes the product of split supports, also known as the “Maximum Clade Credibility Tree” or MCCT.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

mcct_tree (Tree) – Tree that maximizes the product of split supports.

maximum_sum_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]

Return the tree that maximizes the sum of split supports.

Parameters:

include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

mst_tree (Tree) – Tree that maximizes the sum of split supports.
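
The product and sum criteria do not always select the same tree. The following sketch, with made-up support values for two candidate trees, shows a case where they disagree.

```python
import math

# Hypothetical per-split supports for two candidate trees.
candidates = {
    "tree_a": [0.90, 0.10],
    "tree_b": [0.50, 0.45],
}

# Sum criterion: one very well supported split can dominate.
by_sum = max(candidates, key=lambda name: sum(candidates[name]))

# Product criterion: a single poorly supported split is penalized heavily.
by_product = max(candidates, key=lambda name: math.prod(candidates[name]))
```

Here tree_a has the larger sum (1.0 against 0.95) while tree_b has the larger product (0.225 against 0.09).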

read(**kwargs)[source]

Add Tree objects to existing TreeArray from data source providing one or more collections of trees.

Mandatory Source-Specification Keyword Argument (Exactly One Required):

  • file (file) – File or file-like object of data opened for reading.

  • path (str) – Path to file of data.

  • url (str) – URL of data.

  • data (str) – Data given directly.

Mandatory Schema-Specification Keyword Argument:

Optional General Keyword Arguments:

  • collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.

  • tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.

  • ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
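
The effect of collection_offset and tree_offset can be pictured as two levels of 0-based indexing. The sketch below uses nested lists of strings as a hypothetical stand-in for a parsed data source; select_trees is not a DendroPy function.

```python
from itertools import islice


def select_trees(collections, collection_offset=0, tree_offset=0):
    """Return the trees of one collection, skipping the first tree_offset."""
    collection = collections[collection_offset]
    return list(islice(collection, tree_offset, None))


# Hypothetical source with two tree blocks; strings stand in for trees.
source = [
    ["a0", "a1", "a2", "a3"],
    ["b0", "b1", "b2", "b3", "b4"],
]

# Defaults: first collection, nothing skipped.
default_selection = select_trees(source)

# Second collection, discarding its first two trees as burn-in.
after_burn_in = select_trees(source, collection_offset=1, tree_offset=2)
```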

Optional Schema-Specific Keyword Arguments:

These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.

Examples:

import dendropy

tree_array = dendropy.TreeArray()
# read from an open file, skipping the first 100 trees (e.g., as burn-in)
with open('treefile.tre', 'r') as src:
    tree_array.read(
            file=src,
            schema="newick",
            tree_offset=100)
# read from a path: third tree block, skipping its first 100 trees
tree_array.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
# read from a string
tree_array.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
# read from a URL
tree_array.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")

read_from_files(files, schema, **kwargs)[source]

Adds multiple structures from one or more external file sources to the collection.

Parameters:
  • files (iterable of strings and/or file objects) – A list or some other iterable of file paths or file-like objects (string elements will be assumed to be paths to files, while all other types of elements will be assumed to be file-like objects opened for reading).

  • schema (string) – The data format of the source. E.g., “nexus”, “newick”, “nexml”.

  • **kwargs (keyword arguments) – These will be passed directly to the underlying schema-specific reader implementation.

split_bitmask_set_frequencies()[source]

Returns a dictionary with keys being sets of split bitmasks and values being the frequency of occurrence of trees represented by those split bitmask sets in the collection.

topologies(sort_descending=None, frequency_attr_name='frequency', frequency_annotation_name='frequency')[source]

Returns a TreeList instance containing the reconstructed tree topologies (i.e. Tree instances with no edge weights) in the collection, with the frequency added as an attribute.

Parameters:
  • sort_descending (bool) – If True, then topologies will be sorted in descending frequency order (i.e., topologies with the highest frequencies will be listed first). If False, then they will be sorted in ascending frequency. If None (default), then they will not be sorted.

  • frequency_attr_name (str) – Name of attribute to add to each Tree representing the frequency of that topology in the collection. If None then the attribute will not be added.

  • frequency_annotation_name (str) – Name of annotation to add to the annotations of each Tree, representing the frequency of that topology in the collection. The value of this annotation will be dynamically-bound to the attribute specified by frequency_attr_name unless that is None. If frequency_annotation_name is None then the annotation will not be added.
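
The three settings of sort_descending can be sketched with a frequency table built from a list of topologies. Newick-like strings stand in for reconstructed trees here, and topology_frequencies is a hypothetical helper, not part of the library.

```python
from collections import Counter


def topology_frequencies(topologies, sort_descending=None):
    """Return (topology, frequency) pairs, optionally sorted by frequency."""
    counts = Counter(topologies)
    total = sum(counts.values())
    items = [(topo, n / total) for topo, n in counts.items()]
    if sort_descending is not None:
        # True: most frequent first; False: least frequent first.
        items.sort(key=lambda item: item[1], reverse=sort_descending)
    return items


# Hypothetical sample of four topologies.
sampled = [
    "((A,C),(B,D))",
    "((A,B),(C,D))",
    "((A,B),(C,D))",
    "((A,B),(C,D))",
]

unsorted_items = topology_frequencies(sampled)
descending = topology_frequencies(sampled, sort_descending=True)
ascending = topology_frequencies(sampled, sort_descending=False)
```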

The SplitDistribution Class

class dendropy.datamodel.treecollectionmodel.SplitDistribution(taxon_namespace=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=False, taxon_label_age_map=None)[source]

Collects information regarding splits over multiple trees.

__getitem__(split_bitmask)[source]

Returns frequency of split_bitmask.

calc_freqs()[source]

Forces recalculation of frequencies.

collapse_edges_with_less_than_minimum_support(tree, min_freq=0.5)[source]

Collapse edges on tree that have support less than indicated by min_freq.

consensus_tree(min_freq=0.5, is_rooted=None, summarize_splits=True, **split_summarization_kwargs)[source]

Returns a consensus tree from splits in self.

Parameters:
  • min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.

  • is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.

  • **split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.

Returns:

t (consensus tree)

count_splits_on_tree(tree, is_bipartitions_updated=False, default_edge_length_value=None)[source]

Counts splits in this tree and adds them to the totals. tree must be decorated with splits, and no attempt is made to normalize taxa.

Parameters:
  • tree (a Tree object.) – The tree on which to count the splits.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

Returns:

  • s (iterable of splits) – A list of split bitmasks from tree.

  • e – A list of edge length values from tree.

  • a – A list of node age values from tree.

log_product_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]

Calculates the (log) product of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current split distribution.

The tree that has the highest product of split support out of a sample of trees corresponds to the “maximum credibility tree” for that sample. This can also be referred to as the “maximum clade credibility tree”, though this latter term is sometimes used for the tree that has the highest sum of split support (see SplitDistribution.sum_of_split_support_on_tree).

Parameters:
  • tree (Tree) – The tree for which the score should be calculated.

  • is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.

  • include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

s (numeric) – The log product of the support of the splits of the tree.
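
The effect of include_external_splits can be illustrated with a small stand-in. The sketch assumes that a split is external when its bitmask has exactly one bit set, and the support values are invented; an external split only has support below 1 when some trees lack that taxon.

```python
import math


def is_external(bitmask):
    # Assumption for this sketch: an external split contains a single
    # taxon, i.e. exactly one bit of the bitmask is set.
    return bin(bitmask).count("1") == 1


def log_product_of_support(tree_splits, split_support,
                           include_external_splits=False):
    total = 0.0
    for s in tree_splits:
        if is_external(s) and not include_external_splits:
            continue
        total += math.log(split_support[s])
    return total


# Hypothetical supports: taxon D (bit 8) occurs in only half of the trees.
split_support = {0b0001: 1.0, 0b0010: 1.0, 0b0100: 1.0, 0b1000: 0.5,
                 0b0011: 0.8}
tree_splits = [0b0001, 0b0010, 0b0100, 0b1000, 0b0011]

internal_only = log_product_of_support(tree_splits, split_support)
with_external = log_product_of_support(tree_splits, split_support,
                                       include_external_splits=True)
```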

normalize_bitmask(bitmask)[source]

“Normalizes” split, by ensuring that the least-significant bit is always 1 (used on unrooted trees to establish split identity independent of rotation).

Parameters:

bitmask (integer) – Split bitmask hash to be normalized.

Returns:

h (integer) – Normalized split bitmask.
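
The normalization described above can be sketched as follows. The function name normalize_split and the fill_bitmask argument (a mask with one bit set per taxon) are assumptions of this sketch; the documented method takes only bitmask, and the library's own convention may differ.

```python
def normalize_split(bitmask, fill_bitmask):
    """Return the form of the split whose least-significant bit is 1."""
    if bitmask & 1:
        return bitmask
    # Otherwise use the complement, restricted to the taxa in the namespace.
    return ~bitmask & fill_bitmask


fill = 0b1111  # four taxa

# On an unrooted tree, 0b0011 and 0b1100 describe the same split.
a = normalize_split(0b0011, fill)
b = normalize_split(0b1100, fill)
c = normalize_split(0b0110, fill)
```

Both representations of the split map to the same value, so they can be used interchangeably as dictionary keys.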

split_support_iter(tree, is_bipartitions_updated=False, include_external_splits=False, traversal_strategy='preorder', node_support_attr_name=None, edge_support_attr_name=None)[source]

Returns iterator over support values for the splits of a given tree, where the support value is given by the proportional frequency of the split in the current split distribution.

Parameters:
  • tree (Tree) – The Tree which will be scored.

  • is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.

  • include_external_splits (bool) – If True, then non-internal split posteriors will be included. If False, then these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

  • traversal_strategy (str) – One of: “preorder” or “postorder”. Specifies order in which splits are visited.

Returns:

s (list of floats) – List of values for splits in the tree corresponding to the proportional frequency that the split is found in the current distribution.

splits_considered()[source]

Returns 4 values:

  • total number of splits counted

  • total weighted number of unique splits counted

  • total number of non-trivial splits counted

  • total weighted number of unique non-trivial splits counted

sum_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]

Calculates the sum of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current distribution.

Parameters:
  • tree (Tree) – The tree for which the score should be calculated.

  • is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.

  • include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.

Returns:

s (numeric) – The sum of the support of the splits of the tree.

summarize_splits_on_tree(tree, is_bipartitions_updated=False, **split_summarization_kwargs)[source]

Summarizes support of splits/edges/nodes on tree.

Parameters:

The SplitDistributionSummarizer Class

class dendropy.datamodel.treecollectionmodel.SplitDistributionSummarizer(**kwargs)[source]

See SplitDistributionSummarizer.configure for configuration options.

configure(**kwargs)[source]

Configure rendition/mark-up.

Parameters:
  • set_edge_lengths (string) –

    For each edge, set the length based on:

    • “support”: use the support value of the split corresponding to the edge

    • “mean-length”: mean of edge lengths for split

    • “median-length”: median of edge lengths for split

    • “mean-age”: such that split age is equal to mean of ages

    • “median-age”: such that split age is equal to median of ages

    • None: do not set edge lengths

  • add_support_as_node_attribute (bool) – Adds each node’s support value as an attribute of the node, “support”.

  • add_support_as_node_annotation (bool) – Adds support as a metadata annotation, “support”. If add_support_as_node_attribute is True, then the value will be dynamically-bound to the value of the node’s “support” attribute.

  • set_support_as_node_label (bool) – Sets the label attribute of each node to the support value.

  • add_node_age_summaries_as_node_attributes (bool) –

    Summarizes the distribution of the ages of each node in the following attributes:

    • age_mean

    • age_median

    • age_sd

    • age_hpd95

    • age_range

  • add_node_age_summaries_as_node_annotations (bool) –

    Summarizes the distribution of the ages of each node in the following metadata annotations:

    • age_mean

    • age_median

    • age_sd

    • age_hpd95

    • age_range

    If add_node_age_summaries_as_node_attributes is True, then the values will be dynamically-bound to the corresponding node attributes.

  • add_edge_length_summaries_as_edge_attributes (bool) –

    Summarizes the distribution of the lengths of each edge in the following attributes:

    • length_mean

    • length_median

    • length_sd

    • length_hpd95

    • length_range

  • add_edge_length_summaries_as_edge_annotations (bool) –

    Summarizes the distribution of the lengths of each edge in the following metadata annotations:

    • length_mean

    • length_median

    • length_sd

    • length_hpd95

    • length_range

    If add_edge_length_summaries_as_edge_attributes is True, then the values will be dynamically-bound to the corresponding edge attributes.

  • support_label_decimals (int) – Number of decimal places to express when rendering the support value as a string for the node label.

  • support_as_percentages (bool) – Whether or not to express the support value as percentages (default is probability or proportion).

  • minimum_edge_length (numeric) – All edge lengths calculated to have a value less than this will be set to this.

  • error_on_negative_edge_lengths (bool) – If True, an inferred edge length that is less than 0 will result in a ValueError.
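
The last four options can be sketched as two small helpers. These are hypothetical stand-ins, not the summarizer's code: the default values shown and the choice to check for negative lengths before applying the minimum are assumptions.

```python
def format_support(support, support_label_decimals=2,
                   support_as_percentages=False):
    """Render a support value as a node-label string."""
    value = support * 100 if support_as_percentages else support
    return "{:.{}f}".format(value, support_label_decimals)


def adjust_edge_length(length, minimum_edge_length=0.0,
                       error_on_negative_edge_lengths=False):
    """Raise on negative lengths if requested, else raise to the minimum."""
    if error_on_negative_edge_lengths and length < 0:
        raise ValueError("negative edge length: {}".format(length))
    return max(length, minimum_edge_length)


as_proportion = format_support(0.8731)
as_percentage = format_support(0.8731, support_label_decimals=1,
                               support_as_percentages=True)

clamped = adjust_edge_length(-0.5)
unchanged = adjust_edge_length(0.3)
```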