dendropy.datamodel.treecollectionmodel: Collections of Trees¶
The TreeList Class¶
- class dendropy.datamodel.treecollectionmodel.TreeList(*args, **kwargs)[source]¶
A collection of Tree objects, all referencing the same "universe" of operational taxonomic unit concepts through the same TaxonNamespace object reference.
Constructs a new TreeList object, populating it with any iterable container with Tree object members passed as an unnamed argument, or from a data source if stream and schema are passed.
If passed an iterable container, the objects in that container must be of type Tree (or derived). If the container is of type TreeList, then, because each Tree object must have the same TaxonNamespace reference as the containing TreeList, the trees in the container passed as an initialization argument will be deep-copied (except for associated TaxonNamespace and Taxon objects, which will be shallow-copied). If the container is any other type of iterable, then the Tree objects will be shallow-copied.
TreeList objects can thus be instantiated directly in the following ways:

#! /usr/bin/env python
from dendropy import TaxonNamespace, Tree, TreeList

# instantiate an empty TreeList
tlst1 = TreeList()

# TreeList objects can be instantiated from an external data source
# using the 'get()' factory class method
tlst2 = TreeList.get(file=open('treefile.tre', 'r'), schema="newick")
tlst3 = TreeList.get(path='sometrees.nexus', schema="nexus")
tlst4 = TreeList.get(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

# can also call `read()` on a TreeList object; each read adds
# (appends) the tree(s) found to the TreeList
tlst5 = TreeList()
tlst5.read(file=open('boot1.tre', 'r'), schema="newick")
tlst5.read(path="boot3.tre", schema="newick")
tlst5.read(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

# populated from a list of Tree objects
tlist6_1 = Tree.get(data="((A,B),(C,D));", schema="newick")
tlist6_2 = Tree.get(data="((A,C),(B,D));", schema="newick")
tlist6 = TreeList([tlist6_1, tlist6_2])

# passing keywords to underlying tree parser
tlst8 = TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick",
        taxon_namespace=tlst3.taxon_namespace,
        rooting="force-rooted",
        extract_comment_metadata=True,
        store_tree_weights=False,
        preserve_underscores=True)

# Subsets of trees can be read. Note that in most cases, the entire
# data source is parsed, so this is not more efficient than reading
# all the trees and then manually extracting them later; it is just
# more convenient.

# skip the *first* 100 trees in the *first* (offset=0) collection of trees
trees = TreeList.get(
        path="mcmc.tre", schema="newick",
        collection_offset=0, tree_offset=100)

# get the *last* 10 trees in the *second* (offset=1) collection of trees
trees = TreeList.get(
        path="mcmc.tre", schema="newick",
        collection_offset=1, tree_offset=-10)

# get the last 10 trees in the second-to-last collection of trees
trees = TreeList.get(
        path="mcmc.tre", schema="newick",
        collection_offset=-2, tree_offset=-10)

# Slices give shallow copies: trees are references
tlst4copy0a = tlst4[:]
assert tlst4copy0a[0] is tlst4[0]
tlst4copy0b = tlst4[:4]
assert tlst4copy0b[0] is tlst4[0]

# 'Taxon-namespace-scoped' copy:
# i.e., deep-copied objects, but taxa and taxon namespace
# are copied as references
tlst4copy1a = TreeList(tlst4)
tlst4copy1b = TreeList([Tree(t) for t in tlst4])
assert tlst4copy1a[0] is not tlst4[0]                          # True
assert tlst4copy1a.taxon_namespace is tlst4.taxon_namespace    # True
assert tlst4copy1b[0] is not tlst4[0]                          # True
assert tlst4copy1b.taxon_namespace is tlst4.taxon_namespace    # True
- __add__(other)[source]¶
Creates and returns a new TreeList with clones of all trees in self as well as all Tree objects in other. If other is a TreeList, then the trees are cloned and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- __getitem__(index)[source]¶
If index is an integer, then the Tree object at position index is returned. If index is a slice, then a TreeList is returned with references (i.e., not copies or clones, but the actual original instances themselves) to the Tree objects in the positions given by the slice. The TaxonNamespace is the same as that of self.
- Parameters:
index (integer or slice) – Index or slice.
- Returns:
t (Tree object or TreeList object)
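For example, a minimal sketch (the newick strings are illustrative only) contrasting integer indexing with slicing:

import dendropy
trees = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));((A,D),(B,C));",
        schema="newick")
first = trees[0]       # a single Tree instance
subset = trees[0:2]    # a new TreeList sharing the same TaxonNamespace
assert subset[0] is trees[0]                              # slice members are references
assert subset.taxon_namespace is trees.taxon_namespace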
- __iadd__(other)[source]¶
In-place addition of Tree objects in other to self.
If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- Parameters:
other (iterable of Tree objects) –
- Returns:
self (TreeList)
- append(tree, taxon_import_strategy='migrate', **kwargs)[source]¶
Adds a Tree object, tree, to the collection.
The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:
- 'migrate': Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being created if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, with kwargs as passed to this function.
- 'add': Taxon objects associated with tree that are not already in self.taxon_namespace will be added directly to self.taxon_namespace.
- Parameters:
taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: 'migrate' or 'add'. See above for details.
**kwargs (keyword arguments) – These arguments will be passed directly to the 'migrate_taxon_namespace()' method call on tree.
See also
Tree.migrate_taxon_namespace
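For example, the following sketch (tree strings are illustrative) appends a tree parsed under its own namespace and lets the default 'migrate' strategy remap its taxa by label:

import dendropy
tlst = dendropy.TreeList.get(data="((A,B),(C,D));", schema="newick")
t = dendropy.Tree.get(data="((A,C),(B,D));", schema="newick")   # has its own TaxonNamespace
tlst.append(t)   # taxa of t are remapped by label into tlst.taxon_namespace
assert t.taxon_namespace is tlst.taxon_namespace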
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
- as_tree_array(**kwargs)[source]¶
Return a TreeArray collecting information on splits in the contained trees. Keyword arguments get passed directly to the TreeArray constructor.
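For instance, a minimal sketch (tree strings illustrative) that converts a TreeList into a TreeArray for downstream summarization:

import dendropy
trees = dendropy.TreeList.get(
        data="((A:1,B:1):1,(C:1,D:1):1);((A:1,B:1):1,(C:1,D:1):1);((A:1,C:1):1,(B:1,D:1):1);",
        schema="newick")
tree_array = trees.as_tree_array()   # keyword arguments would be forwarded to TreeArray
mcct = tree_array.maximum_product_of_split_support_tree()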
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow copy. All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy. All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep copy. All objects are cloned.
- consensus(min_freq=0.5, is_bipartitions_updated=False, summarize_splits=True, **kwargs)[source]¶
Returns a consensus tree of all trees in self, with the minimum frequency of a bipartition to be added to the consensus tree given by min_freq.
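For example, a minimal sketch (tree strings illustrative; a real analysis would typically read a bootstrap or posterior sample from a file):

import dendropy
trees = dendropy.TreeList.get(
        data="((A:1,B:1):1,(C:1,D:1):1);((A:1,B:1):1,(C:1,D:1):1);((A:1,C:1):1,(B:1,D:1):1);",
        schema="newick")
con = trees.consensus(min_freq=0.5)   # keep splits found in at least half the trees
print(con.as_string(schema="newick"))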
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of "(obj, attr_name)", which instructs the Annotation object to return "getattr(obj, attr_name)" (via: "getattr(*self._value)") when returning the value of the annotation. "obj" is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is found in attribute_object_mapper, in which case it is mapped to the corresponding value in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id's to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- extend(other)[source]¶
In-place addition of Tree objects in other to self.
If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- Parameters:
other (iterable of Tree objects) –
- Returns:
self (TreeList)
- frequency_of_bipartition(**kwargs)[source]¶
Given a bipartition specified as:
- a Bipartition instance given with the keyword 'bipartition'
- a split bitmask given with the keyword 'split_bitmask'
- a list of Taxon objects given with the keyword 'taxa'
- a list of taxon labels given with the keyword 'labels'
this function returns the proportion of trees in self in which the split is found.
If the tree(s) in the collection are unrooted, then the bipartition will be normalized for the comparison.
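For example, using the 'labels' form of the specification (taxon labels and tree strings are illustrative):

import dendropy
trees = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,B),(D,C));((A,C),(B,D));",
        schema="newick")
freq = trees.frequency_of_bipartition(labels=["A", "B"])
print(freq)   # proportion of trees containing the A|B split; 2/3 here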
- classmethod get(**kwargs)[source]¶
Instantiate and return a new TreeList object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the "file", "path", "data", or "url" argument specified above: "newick", "nexus", or "nexml". See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g., a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Examples:
tlst1 = dendropy.TreeList.get(
        file=open('treefile.tre', 'r'),
        schema="newick")
tlst2 = dendropy.TreeList.get(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
tlst3 = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
tree4 = dendropy.TreeList.get(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- insert(index, tree, taxon_import_strategy='migrate', **kwargs)[source]¶
Inserts a Tree object, tree, into the collection before index.
The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:
- 'migrate': Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being created if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, with kwargs as passed to this function.
- 'add': Taxon objects associated with tree that are not already in self.taxon_namespace will be added directly to self.taxon_namespace.
- Parameters:
index (integer) – Position before which to insert tree.
taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: 'migrate' or 'add'. See above for details.
**kwargs (keyword arguments) – These arguments will be passed directly to the 'migrate_taxon_namespace()' method call on tree.
See also
Tree.migrate_taxon_namespace
- maximum_product_of_split_support_tree(include_external_splits=False, score_attr='log_product_of_split_support')[source]¶
Return the tree that maximizes the product of split supports, also known as the "Maximum Clade Credibility Tree" or MCCT.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the product of split supports.
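For example, a minimal sketch (tree strings illustrative; in practice the sample would usually be a post-burn-in posterior sample read from a file):

import dendropy
trees = dendropy.TreeList.get(
        data="((A:1,B:1):1,(C:1,D:1):1);((A:1,B:1):1,(C:1,D:1):1);((A:1,C:1):1,(B:1,D:1):1);",
        schema="newick")
mcct = trees.maximum_product_of_split_support_tree()
print(mcct.as_string(schema="newick"))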
- maximum_sum_of_split_support_tree(include_external_splits=False, score_attr='sum_of_split_support')[source]¶
Return the tree that maximizes the sum of split supports.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the sum of split supports.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels 'Foo', 'Foo', 'FOO', and 'FoO' in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label being some existing casing variant of 'foo'). If True: even if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
See also
reconstruct_taxon_namespace
- poll_taxa(taxa=None)[source]¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- read(**kwargs)[source]¶
Add Tree objects to an existing TreeList from a data source providing one or more collections of trees.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the "file", "path", "data", or "url" argument specified above: "newick", "nexus", or "nexml". See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Optional General Keyword Arguments:
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g., a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Examples:
tlist = dendropy.TreeList()
tlist.read(
        file=open('treefile.tre', 'r'),
        schema="newick",
        tree_offset=100)
tlist.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
tlist.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
tlist.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
- read_from_path(src, schema, **kwargs)¶
Reads data from file specified by filepath.
- Parameters:
filepath (file or file-like) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple [integer]) – A value indicating size of data read, where "size" depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_stream(src, schema, **kwargs)¶
Reads from a file (exactly equivalent to just read(), provided here as a separate method for completeness).
- Parameters:
fileobj (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple [integer]) – A value indicating size of data read, where "size" depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_string(src, schema, **kwargs)¶
Reads a string.
- Parameters:
src_str (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple [integer]) – A value indicating size of data read, where "size" depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_url(src, schema, **kwargs)¶
Reads a URL source.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple [integer]) – A value indicating size of data read, where "size" depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)[source]¶
Repopulates the current taxon namespace with new taxon objects, preserving labels. Each distinct Taxon object associated with self or members of self that is not already in self.taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels 'Foo', 'Foo', 'FOO', and 'FoO' in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label being some existing casing variant of 'foo'). If True: even if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
Note
Existing Taxon objects in self.taxon_namespace are not removed. This method should thus be called only when self.taxon_namespace has been changed. In fact, typical usage would not involve calling this method directly, but rather through migrate_taxon_namespace().
- Parameters:
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace.
- reindex_subcomponent_taxa()[source]¶
DEPRECATED: Use reconstruct_taxon_namespace instead. Derived classes should override this to ensure that their various components, attributes and members all refer to the same TaxonNamespace object as self.taxon_namespace, and that self.taxon_namespace has all the Taxon objects in the various members.
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- split_distribution(is_bipartitions_updated=False, default_edge_length_value=None, **kwargs)[source]¶
Return a SplitDistribution collecting information on splits in the contained trees. Keyword arguments get passed directly to the SplitDistribution constructor.
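For instance, a minimal sketch (tree strings illustrative) that builds a SplitDistribution and uses it to score each tree in the sample:

import dendropy
trees = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
sd = trees.split_distribution()
for tree in trees:
    # log product of split frequencies, per the SplitDistribution methods below
    print(sd.log_product_of_split_support_on_tree(tree))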
- taxon_namespace_scoped_copy(memo=None)[source]¶
Cloning level: 1. Taxon-namespace-scoped copy: all member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- classmethod tree_factory(*args, **kwargs)[source]¶
Creates and returns a Tree of a type that this list understands how to manage.
Deriving classes can override this to provide for custom Tree-type object lists. You can simply override the class-level variable DEFAULT_TREE_TYPE in your derived class if the constructor signature of the alternate tree type is the same as Tree. If you want to have a TreeList instance that generates custom trees (i.e., as opposed to a TreeList-ish class of instances), set the tree_type attribute of the TreeList instance.
- update_taxon_namespace()[source]¶
All Taxon objects associated with self or members of self that are not in self.taxon_namespace will be added. Note that, unlike reconstruct_taxon_namespace, no new Taxon objects will be created.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Examples
# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
- TreeList.put(**kwargs)¶
Write out collection of trees to file.
- Mandatory Destination-Specification Keyword Arguments (one and exactly one of the following required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
- Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
- Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument.
The TreeArray Class¶
- class dendropy.datamodel.treecollectionmodel.TreeArray(taxon_namespace=None, is_rooted_trees=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=None, taxon_label_age_map=None)[source]¶
High-performance collection of tree structures.
Storage of minimal tree structural information as represented by topology and edge lengths, minimizing memory and processing time. This class stores trees as collections of splits and edge lengths. All other information, such as labels, metadata annotations, etc., will be discarded. A full Tree instance can be reconstructed as needed from the structural information stored by this class, at the cost of computation time.
- Parameters:
taxon_namespace (TaxonNamespace) – The operational taxonomic unit concept namespace to manage taxon references.
is_rooted_trees (bool) – If not set, then it will be set based on the rooting state of the first tree added. If True, then trying to add an unrooted tree will result in an error. If False, then trying to add a rooted tree will result in an error.
ignore_edge_lengths (bool) – If True, then edge lengths of splits will not be stored. If False, then edge lengths will be stored.
ignore_node_ages (bool) – If True, then node ages of splits will not be stored. If False, then node ages will be stored.
use_tree_weights (bool) – If False, then tree weights will not be used to weight splits.
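As a usage sketch (tree strings and settings are illustrative), a TreeArray is typically populated directly from a data source or from existing trees and then queried for summaries:

import dendropy
taxa = dendropy.TaxonNamespace()
tree_array = dendropy.TreeArray(taxon_namespace=taxa)
tree_array.read(
        data="((A:1,B:1):1,(C:1,D:1):1);((A:1,B:1):1,(C:1,D:1):1);((A:1,C:1):1,(B:1,D:1):1);",
        schema="newick")
con = tree_array.consensus_tree(min_freq=0.5)
mcct = tree_array.maximum_product_of_split_support_tree()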
- add_tree(tree, is_bipartitions_updated=False, index=None)[source]¶
Adds the structure represented by a Tree instance to the collection.
- Parameters:
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
index (integer) – Insert before index.
- Returns:
index (int) – The index of the accession.
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
- add_trees(trees, is_bipartitions_updated=False)[source]¶
Adds multiple structures represented by an iterator over or iterable of Tree instances to the collection.
- Parameters:
trees (iterator over or iterable of Tree instances) – An iterator over or iterable of Tree instances. These must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the trees will have their splits encoded or updated. Otherwise, if True, then the trees are assumed to have their splits already encoded and updated.
- append(tree, is_bipartitions_updated=False)[source]¶
Adds a Tree instance to the collection.
- Parameters:
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
- bipartition_encoding_frequencies()[source]¶
Returns a dictionary with keys being bipartition encodings of trees (as frozenset collections of Bipartition objects) and values the frequency of occurrence of trees represented by that encoding in the collection.
- calculate_log_product_of_split_supports(include_external_splits=False)[source]¶
Calculates the log product of split support for each of the trees in the collection.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.
- calculate_sum_of_split_supports(include_external_splits=False)[source]¶
Calculates the sum of split support for all trees in the collection.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.
- consensus_tree(min_freq=0.5, summarize_splits=True, **split_summarization_kwargs)[source]¶
Returns a consensus tree from splits in self.
- Parameters:
min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.
is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
- Returns:
t (consensus tree)
- get_split_bitmask_and_edge_tuple(index)[source]¶
Returns a pair of tuples, ( (splits...), (lengths...) ), corresponding to the "tree" at index.
- insert(index, tree, is_bipartitions_updated=False)[source]¶
Adds a Tree instance to the collection before position given by index.
- Parameters:
index (integer) – Insert before index.
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
- Returns:
index (int) – The index of the accession.
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
- maximum_product_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]¶
Return the tree that maximizes the product of split supports, also known as the "Maximum Clade Credibility Tree" or MCCT.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the product of split supports.
- maximum_sum_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]¶
Return the tree that maximizes the sum of split supports.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mst_tree (Tree) – Tree that maximizes the sum of split supports.
- read(**kwargs)[source]¶
Add Tree objects to an existing TreeList from a data source providing one or more collections of trees.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the "file", "path", "data", or "url" argument specified above: "newick", "nexus", or "nexml". See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Optional General Keyword Arguments:
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g., a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Examples:
tree_array = dendropy.TreeArray()
tree_array.read(
        file=open('treefile.tre', 'r'),
        schema="newick",
        tree_offset=100)
tree_array.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
tree_array.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
tree_array.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
- read_from_files(files, schema, **kwargs)[source]¶
Adds multiple structures from one or more external file sources to the collection.
- Parameters:
files (iterable of strings and/or file objects) – A list or some other iterable of file paths or file-like objects (string elements will be assumed to be paths to files, while all other types of elements will be assumed to be file-like objects opened for reading).
schema (string) – The data format of the source. E.g., “nexus”, “newick”, “nexml”.
**kwargs (keyword arguments) – These will be passed directly to the underlying schema-specific reader implementation.
- split_bitmask_set_frequencies()[source]¶
Returns a dictionary with keys being sets of split bitmasks and values being the frequency of occurrence of trees represented by those split bitmask sets in the collection.
- topologies(sort_descending=None, frequency_attr_name='frequency', frequency_annotation_name='frequency')[source]¶
Returns a TreeList instance containing the reconstructed tree topologies (i.e., Tree instances with no edge weights) in the collection, with the frequency added as an attribute.
- Parameters:
sort_descending (bool) – If True, then topologies will be sorted in descending frequency order (i.e., topologies with the highest frequencies will be listed first). If False, then they will be sorted in ascending frequency. If None (default), then they will not be sorted.
frequency_attr_name (str) – Name of attribute to add to each Tree representing the frequency of that topology in the collection. If None then the attribute will not be added.
frequency_annotation_name (str) – Name of annotation to add to the annotations of each Tree, representing the frequency of that topology in the collection. The value of this annotation will be dynamically-bound to the attribute specified by frequency_attr_name unless that is None. If frequency_annotation_name is None then the annotation will not be added.
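For example (tree strings illustrative), to list the distinct topologies in a sample ordered by frequency:

import dendropy
tree_array = dendropy.TreeArray()
tree_array.read(
        data="((A,B),(C,D));((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
for topology in tree_array.topologies(sort_descending=True):
    # 'frequency' is the default attribute named by frequency_attr_name
    print(topology.frequency, topology.as_string(schema="newick"))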
The SplitDistribution Class¶
- class dendropy.datamodel.treecollectionmodel.SplitDistribution(taxon_namespace=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=False, taxon_label_age_map=None)[source]¶
Collects information regarding splits over multiple trees.
- collapse_edges_with_less_than_minimum_support(tree, min_freq=0.5)[source]¶
Collapse edges on tree that have support less than indicated by min_freq.
- consensus_tree(min_freq=0.5, is_rooted=None, summarize_splits=True, **split_summarization_kwargs)[source]¶
Returns a consensus tree from splits in self.
- Parameters:
min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.
is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
- Returns:
t (consensus tree)
- count_splits_on_tree(tree, is_bipartitions_updated=False, default_edge_length_value=None)[source]¶
Counts splits in this tree and adds them to the totals. tree must be decorated with splits, and no attempt is made to normalize taxa.
- Parameters:
- Returns:
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
a – A list of node age values from tree.
- log_product_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]¶
Calculates the (log) product of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current split distribution.
The tree that has the highest product of split support out of a sample of trees corresponds to the "maximum credibility tree" for that sample. This can also be referred to as the "maximum clade credibility tree", though this latter term is sometimes used for the tree that has the highest sum of split support (see SplitDistribution.sum_of_split_support_on_tree).
- Parameters:
tree (Tree) – The tree for which the score should be calculated.
is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (numeric) – The log product of the support of the splits of the tree.
- normalize_bitmask(bitmask)[source]¶
“Normalizes” split, by ensuring that the least-significant bit is always 1 (used on unrooted trees to establish split identity independent of rotation).
- Parameters:
bitmask (integer) – Split bitmask hash to be normalized.
- Returns:
h (integer) – Normalized split bitmask.
- split_support_iter(tree, is_bipartitions_updated=False, include_external_splits=False, traversal_strategy='preorder', node_support_attr_name=None, edge_support_attr_name=None)[source]¶
Returns iterator over support values for the splits of a given tree, where the support value is given by the proportional frequency of the split in the current split distribution.
- Parameters:
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
include_external_splits (bool) – If True, then non-internal split posteriors will be included. If False, then these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
traversal_strategy (str) – One of: "preorder" or "postorder". Specifies the order in which splits are visited.
- Returns:
s (list of floats) – List of values for splits in the tree corresponding to the proportional frequency that the split is found in the current distribution.
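For example, a minimal sketch (tree strings illustrative) that iterates over the support of each split of a target tree drawn from the same sample:

import dendropy
trees = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
sd = trees.split_distribution()
target = trees[0]
for support in sd.split_support_iter(target):
    print(support)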
- splits_considered()[source]¶
- Returns 4 values:
total number of splits counted
total weighted number of unique splits counted
total number of non-trivial splits counted
total weighted number of unique non-trivial splits counted
- sum_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]¶
Calculates the sum of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current distribution.
- Parameters:
tree (Tree) – The tree for which the score should be calculated.
is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (numeric) – The sum of the support of the splits of the tree.
- summarize_splits_on_tree(tree, is_bipartitions_updated=False, **split_summarization_kwargs)[source]¶
Summarizes support of splits/edges/nodes on a tree.
- Parameters:
tree (Tree instance) – Tree to be decorated with support values.
is_bipartitions_updated (bool) – If True, then bipartitions will not be recalculated.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
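As an illustration (tree strings illustrative; the keyword arguments shown map to SplitDistributionSummarizer.configure options documented below):

import dendropy
trees = dendropy.TreeList.get(
        data="((A:1,B:1):1,(C:1,D:1):1);((A:1,B:1):1,(C:1,D:1):1);((A:1,C:1):1,(B:1,D:1):1);",
        schema="newick")
sd = trees.split_distribution()
target = trees[0]
sd.summarize_splits_on_tree(
        target,
        set_support_as_node_label=True,
        support_as_percentages=True,
        set_edge_lengths=None)   # leave edge lengths untouched
print(target.as_string(schema="newick"))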
The SplitDistributionSummarizer Class¶
- class dendropy.datamodel.treecollectionmodel.SplitDistributionSummarizer(**kwargs)[source]¶
See SplitDistributionSummarizer.configure for configuration options.
- configure(**kwargs)[source]¶
Configure rendition/mark-up.
- Parameters:
set_edge_lengths (string) – For each edge, set the length based on:
"support": use the support value of the split corresponding to the edge
"mean-length": mean of edge lengths for the split
"median-length": median of edge lengths for the split
"mean-age": such that split age is equal to the mean of ages
"median-age": such that split age is equal to the median of ages
None: do not set edge lengths
add_support_as_node_attribute (bool) – Adds each node's support value as an attribute of the node, "support".
add_support_as_node_annotation (bool) – Adds support as a metadata annotation, "support". If add_support_as_node_attribute is True, then the value will be dynamically-bound to the value of the node's "support" attribute.
set_support_as_node_label (bool) – Sets the label attribute of each node to the support value.
add_node_age_summaries_as_node_attributes (bool) –
Summarizes the distribution of the ages of each node in the following attributes:
age_mean
age_median
age_sd
age_hpd95
age_range
add_node_age_summaries_as_node_annotations (bool) –
Summarizes the distribution of the ages of each node in the following metadata annotations:
age_mean
age_median
age_sd
age_hpd95
age_range
If add_node_age_summaries_as_node_attributes is True, then the values will be dynamically-bound to the corresponding node attributes.
add_edge_length_summaries_as_edge_attributes (bool) –
Summarizes the distribution of the lengths of each edge in the following attributes:
length_mean
length_median
length_sd
length_hpd95
length_range
add_edge_length_summaries_as_edge_annotations (bool) –
Summarizes the distribution of the lengths of each edge in the following metadata annotations:
length_mean
length_median
length_sd
length_hpd95
length_range
If add_edge_length_summaries_as_edge_attributes is True, then the values will be dynamically-bound to the corresponding edge attributes.
support_label_decimals (int) – Number of decimal places to express when rendering the support value as a string for the node label.
support_as_percentages (bool) – Whether or not to express the support value as percentages (default is probability or proportion).
minimum_edge_length (numeric) – All edge lengths calculated to have a value less than this will be set to this.
error_on_negative_edge_lengths (bool) – If True, an inferred edge length that is less than 0 will result in a ValueError.