dendropy.datamodel.treecollectionmodel: Collections of Trees
The TreeList Class
- class dendropy.datamodel.treecollectionmodel.TreeList(*args, **kwargs)
A collection of Tree objects, all referencing the same “universe” of operational taxonomic unit concepts through the same TaxonNamespace object reference.
Constructs a new TreeList object, populating it with any iterable container of Tree objects passed as an unnamed argument, or from a data source if stream and schema are passed.
If passed an iterable container, the objects in that container must be of type Tree (or derived). If the container is of type TreeList, then, because each Tree object must have the same TaxonNamespace reference as the containing TreeList, the trees in the container passed as an initialization argument will be deep-copied (except for associated TaxonNamespace and Taxon objects, which will be shallow-copied). If the container is any other type of iterable, then the Tree objects will be shallow-copied.
TreeList objects can thus be instantiated directly in the following ways:
    #! /usr/bin/env python

    from dendropy import TaxonNamespace, Tree, TreeList

    # instantiate an empty tree list
    tlst1 = TreeList()

    # TreeList objects can be instantiated from an external data source
    # using the 'get()' factory class method
    tlst2 = TreeList.get(file=open('treefile.tre', 'r'), schema="newick")
    tlst3 = TreeList.get(path='sometrees.nexus', schema="nexus")
    tlst4 = TreeList.get(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

    # can also call `read()` on a TreeList object; each read adds
    # (appends) the tree(s) found to the TreeList
    tlst5 = TreeList()
    tlst5.read(file=open('boot1.tre', 'r'), schema="newick")
    tlst5.read(path="boot3.tre", schema="newick")
    tlst5.read(data="((A,B),(C,D));((A,C),(B,D));", schema="newick")

    # populated from list of Tree objects
    tlist6_1 = Tree.get(data="((A,B),(C,D));", schema="newick")
    tlist6_2 = Tree.get(data="((A,C),(B,D));", schema="newick")
    tlist6 = TreeList([tlist6_1, tlist6_2])

    # passing keywords to underlying tree parser
    tlst8 = TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick",
        taxon_namespace=tlst3.taxon_namespace,
        rooting="force-rooted",
        extract_comment_metadata=True,
        store_tree_weights=False,
        preserve_underscores=True)

    # Subsets of trees can be read. Note that in most cases, the entire
    # data source is parsed, so this is not more efficient than reading
    # all the trees and then manually extracting them later; just more
    # convenient

    # skip the *first* 100 trees in the *first* (offset=0) collection of trees
    trees = TreeList.get(
        path="mcmc.tre",
        schema="newick",
        collection_offset=0,
        tree_offset=100)

    # get the *last* 10 trees in the *second* (offset=1) collection of trees
    trees = TreeList.get(
        path="mcmc.tre",
        schema="newick",
        collection_offset=1,
        tree_offset=-10)

    # get the last 10 trees in the second-to-last collection of trees
    trees = TreeList.get(
        path="mcmc.tre",
        schema="newick",
        collection_offset=-2,
        tree_offset=-10)

    # Slices give shallow-copy: trees are references
    tlst4copy0a = tlst4[:]
    assert tlst4copy0a[0] is tlst4[0]
    tlst4copy0b = tlst4[:4]
    assert tlst4copy0b[0] is tlst4[0]

    # 'Taxon-namespace-scoped' copy:
    # I.e., Deep-copied objects but taxa and taxon namespace
    # are copied as references
    tlst4copy1a = TreeList(tlst4)
    # the namespace is passed explicitly here so that the new list
    # shares it regardless of how a plain iterable is handled
    tlst4copy1b = TreeList(
        [Tree(t) for t in tlst4],
        taxon_namespace=tlst4.taxon_namespace)
    assert tlst4copy1a[0] is not tlst4[0] # True
    assert tlst4copy1a.taxon_namespace is tlst4.taxon_namespace # True
    assert tlst4copy1b[0] is not tlst4[0] # True
    assert tlst4copy1b.taxon_namespace is tlst4.taxon_namespace # True
- __add__(other)
Creates and returns a new TreeList with clones of all trees in self as well as all Tree objects in other. If other is a TreeList, then the trees are cloned and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- __getitem__(index)
If index is an integer, then the Tree object at position index is returned. If index is a slice, then a TreeList is returned with references (i.e., not copies or clones, but the actual original instances themselves) to Tree objects in the positions given by the slice. The TaxonNamespace is the same as that of self.
- Parameters:
index (integer or slice) – Index or slice.
- Returns:
t (Tree object or TreeList object)
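The indexing rule above can be pictured with a small toy container. This is only a sketch of the described behaviour, not DendroPy code: `MiniTreeList` and `Namespace` are hypothetical stand-ins, and plain `object()` instances play the role of trees.

```python
class Namespace:
    """Stand-in for a taxon namespace (hypothetical, for illustration only)."""


class MiniTreeList:
    """Toy list-like container that mimics the indexing rules described above."""

    def __init__(self, trees=(), namespace=None):
        self.namespace = namespace if namespace is not None else Namespace()
        self._trees = list(trees)

    def __len__(self):
        return len(self._trees)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new container of the same type that holds
            # references to the original items and shares the namespace.
            return type(self)(self._trees[index], namespace=self.namespace)
        # An integer yields the item itself.
        return self._trees[index]


trees = [object() for _ in range(5)]
full = MiniTreeList(trees)
part = full[1:4]
```

Here `part` is a new container, but `part[0]` is the very same object as `full[1]`, so changes made to a tree through one list are visible through the other.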
- __iadd__(other)
In-place addition of Tree objects in other to self.
If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- Parameters:
other (iterable of Tree objects)
- Returns:
self (TreeList)
- append(tree, taxon_import_strategy='migrate', **kwargs)
Adds a Tree object, tree, to the collection.
The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:
- ‘migrate’
Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being created if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, where kwargs is as passed to this function.
- Parameters:
taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: ‘migrate’ or ‘add’. See above for details.
**kwargs (keyword arguments) – These arguments will be passed directly to the ‘migrate_taxon_namespace()’ method call on tree.
See also
Tree.migrate_taxon_namespace
- as_string(schema, **kwargs)
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- as_tree_array(**kwargs)
Return a TreeArray collecting information on splits in contained trees. Keyword arguments get passed directly to the TreeArray constructor.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
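The three copy depths can be illustrated with the standard `copy` module. The sketch below is a toy model rather than DendroPy's implementation: `Taxon`, `Node`, and `MiniTree` are hypothetical classes, annotations are left out entirely, and the namespace-scoped copy is obtained by pre-seeding the `deepcopy` memo so that the namespace and its taxa map to themselves.

```python
import copy


class Taxon:
    def __init__(self, label):
        self.label = label


class Node:
    def __init__(self, taxon):
        self.taxon = taxon


class MiniTree:
    def __init__(self, namespace, nodes):
        self.namespace = namespace  # list of Taxon objects
        self.nodes = nodes          # list of Node objects

    def clone(self, depth=1):
        if depth == 0:
            # Shallow: new top-level object, members are shared references.
            return copy.copy(self)
        if depth == 1:
            # Namespace-scoped: entries already in the memo are returned
            # as-is by deepcopy, so the namespace and taxa are not copied.
            memo = {id(self.namespace): self.namespace}
            for taxon in self.namespace:
                memo[id(taxon)] = taxon
            return copy.deepcopy(self, memo)
        if depth == 2:
            # Exhaustive: everything is copied, including the taxa.
            return copy.deepcopy(self)
        raise ValueError("depth must be 0, 1, or 2")


ns = [Taxon("A"), Taxon("B")]
tree = MiniTree(ns, [Node(ns[0]), Node(ns[1])])
c0, c1, c2 = tree.clone(0), tree.clone(1), tree.clone(2)
```

With depth 1 the nodes are new objects that still point at the original taxa; with depth 2 the taxa themselves are duplicated, so the copy lives in an independent namespace.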
- consensus(min_freq=0.5, is_bipartitions_updated=False, summarize_splits=True, **kwargs)
Returns a consensus tree of all trees in self, with the minimum frequency of a bipartition to be added to the consensus tree given by min_freq.
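As a rough picture of what a frequency threshold means, the sketch below counts how often each clade occurs across a set of trees and keeps those that reach `min_freq`. It is a simplified stand-in, not the library's algorithm: trees are represented as plain sets of clades, `consensus_clades` is a hypothetical helper, the use of `>=` at the boundary is an assumption, and no tree is actually assembled from the retained clades.

```python
from collections import Counter


def consensus_clades(trees, min_freq=0.5):
    """Return the clades found in at least `min_freq` of the trees."""
    counts = Counter()
    for clades in trees:
        counts.update(set(clades))
    n = len(trees)
    return {clade for clade, k in counts.items() if k / n >= min_freq}


AB, CD, AC, BD = (frozenset(s) for s in ("AB", "CD", "AC", "BD"))
trees = [{AB, CD}, {AB, CD}, {AC, BD}]

majority = consensus_clades(trees, min_freq=0.5)
strict = consensus_clades(trees, min_freq=1.0)
```

In this data set the clades AB and CD appear in two of three trees and pass a 0.5 threshold, while no clade appears in every tree, so the strict set is empty.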
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
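The remapping of dynamic bound-attribute annotations can be sketched in a few lines. The classes below are hypothetical and much simpler than the library's `Annotation`; the sketch only mirrors the `(obj, attr_name)` tuple and the `getattr(*self._value)` lookup quoted above, with `copy_for` standing in for the copy step.

```python
class BoundAnnotation:
    """Toy annotation whose value is looked up on an owner object on demand."""

    def __init__(self, obj, attr_name):
        self._value = (obj, attr_name)

    @property
    def value(self):
        return getattr(*self._value)

    def copy_for(self, attribute_object_mapper):
        obj, attr_name = self._value
        # Re-point the annotation if the owner's id has a mapping;
        # otherwise keep the original owner.
        new_obj = attribute_object_mapper.get(id(obj), obj)
        return BoundAnnotation(new_obj, attr_name)


class Thing:
    def __init__(self, label):
        self.label = label


other = Thing("original")
new = Thing("copy")
ann = BoundAnnotation(other, "label")

remapped = ann.copy_for({id(other): new})  # mapping of the form id(other): self
unmapped = ann.copy_for({})                # empty dict: no reattribution
```

Because the value is resolved at access time, `remapped.value` follows later changes to `new.label`, while `unmapped.value` keeps tracking the original owner.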
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- extend(other)
In-place addition of Tree objects in other to self.
If other is a TreeList, then the trees are copied and migrated into self.taxon_namespace; otherwise, the original objects are migrated into self.taxon_namespace and added directly.
- Parameters:
other (iterable of Tree objects)
- Returns:
self (TreeList)
- frequency_of_bipartition(**kwargs)
Given a bipartition specified as:
- a Bipartition instance given with the keyword ‘bipartition’
- a split bitmask given with the keyword ‘split_bitmask’
- a list of Taxon objects given with the keyword taxa
- a list of taxon labels given with the keyword labels
this function returns the proportion of trees in self in which the split is found.
If the tree(s) in the collection are unrooted, then the bipartition will be normalized for the comparison.
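Why normalization matters for unrooted trees can be seen with plain integer bitmasks: a split and its complement describe the same bipartition, so both must be reduced to one canonical form before they are compared. The sketch below is an illustration under assumptions, not library code: `normalize` and `frequency_of_split` are hypothetical helpers, each tree is just a set of split bitmasks, and the convention that the lowest bit is always 0 is assumed for the example.

```python
def normalize(bitmask, all_taxa_mask):
    """Canonical form of an unrooted split (assumed: lowest bit is 0)."""
    if bitmask & 1:
        return ~bitmask & all_taxa_mask
    return bitmask & all_taxa_mask


def frequency_of_split(trees, split_bitmask, all_taxa_mask):
    """Proportion of trees that contain the split, in either orientation."""
    target = normalize(split_bitmask, all_taxa_mask)
    hits = sum(
        1
        for splits in trees
        if target in {normalize(s, all_taxa_mask) for s in splits}
    )
    return hits / len(trees)


# Taxa A, B, C, D, E occupy bits 0 to 4.
ALL = 0b11111
tree1 = {0b00011, 0b00111}  # AB|CDE and ABC|DE
tree2 = {0b00011, 0b01011}  # AB|CDE and ABD|CE
tree3 = {0b00101, 0b11000}  # AC|BDE and DE|ABC (same split as ABC|DE)
trees = [tree1, tree2, tree3]

freq_abc = frequency_of_split(trees, 0b00111, ALL)
freq_de = frequency_of_split(trees, 0b11000, ALL)
```

Both queries describe the ABC|DE bipartition and therefore return the same proportion, even though `tree1` and `tree3` store the split from opposite sides.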
- classmethod get(**kwargs)
Instantiate and return a new TreeList object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy

    tlst1 = dendropy.TreeList.get(
        file=open('treefile.tre', 'r'),
        schema="newick")
    tlst2 = dendropy.TreeList.get(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
    tlst3 = dendropy.TreeList.get(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
    tlst4 = dendropy.TreeList.get(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
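One way to reason about collection_offset and tree_offset is as two levels of ordinary list indexing: the first picks a collection, the second drops the leading trees within it. The sketch below is only a mental model, not how the parser is implemented; `select_trees` is a hypothetical helper, the tree names are made-up strings, and the negative offset follows the usage shown in the constructor examples for the class.

```python
def select_trees(collections, collection_offset=0, tree_offset=0):
    """Pick one collection, then keep the trees from tree_offset onward."""
    block = collections[collection_offset]
    return block[tree_offset:]


# Two collections of 150 placeholder trees each.
collections = [
    [f"run1_tree{i}" for i in range(150)],
    [f"run2_tree{i}" for i in range(150)],
]

# Skip a burn-in of 100 trees in the first collection.
after_burnin = select_trees(collections, collection_offset=0, tree_offset=100)

# Keep only the last 10 trees of the second collection.
last_ten = select_trees(collections, collection_offset=1, tree_offset=-10)
```

Under this model a burn-in of 100 leaves 50 of the 150 trees, starting at the tree with index 100.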
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- insert(index, tree, taxon_import_strategy='migrate', **kwargs)
Inserts a Tree object, tree, into the collection before index.
The TaxonNamespace reference of tree will be set to that of self. Any Taxon objects associated with nodes in tree that are not already in self.taxon_namespace will be handled according to taxon_import_strategy:
- ‘migrate’
Taxon objects associated with tree that are not already in self.taxon_namespace will be remapped based on their labels, with new Taxon objects being created if none with matching labels are found. Specifically, dendropy.datamodel.treemodel.Tree.migrate_taxon_namespace will be called on tree, where kwargs is as passed to this function.
- Parameters:
index (integer) – Position before which to insert tree.
taxon_import_strategy (string) – If tree is associated with a different TaxonNamespace, this argument determines how new Taxon objects in tree are handled: ‘migrate’ or ‘add’. See above for details.
**kwargs (keyword arguments) – These arguments will be passed directly to the ‘migrate_taxon_namespace()’ method call on tree.
See also
Tree.migrate_taxon_namespace
- maximum_product_of_split_support_tree(include_external_splits=False, score_attr='log_product_of_split_support')
Return the tree that maximizes the product of split supports, also known as the “Maximum Clade Credibility Tree” or MCCT.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the product of split supports.
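The scoring idea behind a maximum clade credibility tree can be sketched as follows: estimate the support of each split as its frequency across the trees, score every tree by the sum of the logs of its split supports (equivalently, the log of their product), and return the highest-scoring tree. This is an illustrative toy, not the library's implementation: trees are frozensets of clades, and `split_support`, `log_product_of_split_support`, and `maximum_product_tree` are hypothetical helper names.

```python
from collections import Counter
import math


def split_support(trees):
    """Frequency of each split across the collection."""
    counts = Counter(split for tree in trees for split in tree)
    return {split: k / len(trees) for split, k in counts.items()}


def log_product_of_split_support(tree, support):
    """Log of the product of supports of the splits in one tree."""
    return sum(math.log(support[split]) for split in tree)


def maximum_product_tree(trees):
    """Tree from the collection with the highest log product of supports."""
    support = split_support(trees)
    return max(trees, key=lambda tree: log_product_of_split_support(tree, support))


AB, ABC, ABD, AC = (frozenset(s) for s in ("AB", "ABC", "ABD", "AC"))
trees = [
    frozenset({AB, ABC}),
    frozenset({AB, ABD}),
    frozenset({AC, ABC}),
    frozenset({AB, ABC}),
]

best = maximum_product_tree(trees)
best_score = log_product_of_split_support(best, split_support(trees))
```

Working in log space avoids numerical underflow when many small supports are multiplied, which is presumably why the default score attribute in the signature above is named after the log product.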
- maximum_sum_of_split_support_tree(include_external_splits=False, score_attr='sum_of_split_support')
Return the tree that maximizes the sum of split supports.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the sum of split supports.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace

    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)

    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    import copy
    t3 = copy.deepcopy(t1)
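The label-matching rules described for this method can be modelled in a few lines. The sketch below is a simplified illustration and not DendroPy's implementation: `Taxon` is a bare stand-in class, the namespace is a plain list, and `migrate` is a hypothetical helper that returns the old-to-new mapping. How the library treats existing taxa when unify_taxa_by_label is False is not modelled; the toy simply creates a new object each time.

```python
class Taxon:
    def __init__(self, label):
        self.label = label


def migrate(taxa, new_namespace, is_case_sensitive=False, unify_taxa_by_label=True):
    """Map each old Taxon to a Taxon in new_namespace, creating as needed."""
    key = (lambda s: s) if is_case_sensitive else str.lower
    by_label = {key(t.label): t for t in new_namespace}
    mapping = {}
    for old in taxa:
        k = key(old.label)
        if unify_taxa_by_label and k in by_label:
            # Reuse the taxon already registered under a matching label.
            new = by_label[k]
        else:
            # No match (or unification disabled): create a new taxon.
            new = Taxon(old.label)
            new_namespace.append(new)
            by_label.setdefault(k, new)
        mapping[old] = new
    return mapping


old_taxa = [Taxon("Foo"), Taxon("Foo"), Taxon("FOO"), Taxon("FoO")]

insensitive = migrate(old_taxa, [], is_case_sensitive=False)
sensitive = migrate(old_taxa, [], is_case_sensitive=True)
not_unified = migrate(old_taxa, [], unify_taxa_by_label=False)


def n_distinct(mapping):
    return len({id(t) for t in mapping.values()})
```

With case-insensitive matching all four labels collapse onto one new taxon; with case-sensitive matching the two ‘Foo’ objects share one taxon while ‘FOO’ and ‘FoO’ each get their own; with unification disabled every old object gets a separate counterpart.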
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- read(**kwargs)
Add Tree objects to existing TreeList from data source providing one or more collections of trees.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy

    tlist = dendropy.TreeList()
    tlist.read(
        file=open('treefile.tre', 'r'),
        schema="newick",
        tree_offset=100)
    tlist.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)
    tlist.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")
    tlist.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
- read_from_path(src, schema, **kwargs)
Reads data from file specified by src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple(number of taxon namespaces, number of tree lists, number of matrices)
- read_from_stream(src, schema, **kwargs)
Reads from file (exactly equivalent to just read(), provided here as a separate method for completeness).
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple(number of taxon namespaces, number of tree lists, number of matrices)
- read_from_string(src, schema, **kwargs)
Reads a string.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple(number of taxon namespaces, number of tree lists, number of matrices)
- read_from_url(src, schema, **kwargs)
Reads a URL source.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple(number of taxon namespaces, number of tree lists, number of matrices)
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
Repopulates the current taxon namespace with new taxon objects, preserving labels. Each distinct Taxon object associated with self or members of self that is not already in self.taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
Note
Existing Taxon objects in self.taxon_namespace are not removed. This method should thus be called only when self.taxon_namespace has been changed. In fact, typical usage would not involve calling this method directly, but rather through
- Parameters:
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace.
- reindex_subcomponent_taxa()
DEPRECATED: Use reconstruct_taxon_namespace instead. Derived classes should override this to ensure that their various components, attributes and members all refer to the same TaxonNamespace object as self.taxon_namespace, and that self.taxon_namespace has all the Taxon objects in the various members.
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- split_distribution(is_bipartitions_updated=False, default_edge_length_value=None, **kwargs)
Return a SplitDistribution collecting information on splits in contained trees. Keyword arguments get passed directly to the SplitDistribution constructor.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- classmethod tree_factory(*args, **kwargs)
Creates and returns a Tree of a type that this list understands how to manage.
Deriving classes can override this to provide for custom Tree-type object lists. You can simply override the class-level variable DEFAULT_TREE_TYPE in your derived class if the constructor signature of the alternate tree type is the same as Tree. If you want to have a TreeList instance that generates custom trees (i.e., as opposed to a TreeList-ish class of instances), set the tree_type attribute of the TreeList instance.
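The two customization routes mentioned above (a class-level default versus a per-instance attribute) follow a common Python pattern, sketched here with stand-in classes. None of this is DendroPy code: `Tree`, `ColoredTree`, `MiniTreeList`, and the `new_tree` method are hypothetical names used only to show how a class-level `DEFAULT_TREE_TYPE` and an instance-level `tree_type` could interact.

```python
class Tree:
    def __init__(self, label=None):
        self.label = label


class ColoredTree(Tree):
    """Custom tree type with the same constructor signature as Tree."""
    color = "green"


class MiniTreeList:
    # Class-level default: subclasses can override this single attribute.
    DEFAULT_TREE_TYPE = Tree

    def __init__(self, tree_type=None):
        # Instance-level setting: falls back to the class-level default.
        self.tree_type = tree_type if tree_type is not None else self.DEFAULT_TREE_TYPE

    def new_tree(self, *args, **kwargs):
        return self.tree_type(*args, **kwargs)


class ColoredTreeList(MiniTreeList):
    # Route 1: every instance of this subclass produces ColoredTree objects.
    DEFAULT_TREE_TYPE = ColoredTree


# Route 2: a single plain list instance configured to produce custom trees.
plain = MiniTreeList()
plain.tree_type = ColoredTree

from_subclass = ColoredTreeList().new_tree(label="t1")
from_instance = plain.new_tree(label="t2")
from_default = MiniTreeList().new_tree(label="t3")
```

The subclass route changes behaviour for a whole family of lists, while the instance route leaves the class untouched and affects one list only.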
- update_taxon_namespace()
All Taxon objects associated with self or members of self that are not in self.taxon_namespace will be added. Note that, unlike reconstruct_taxon_namespace, no new Taxon objects will be created.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to file-like object dest.
- TreeList.put(**kwargs)
Write out collection of trees to file.
- Mandatory Destination-Specification Keyword Arguments (one and exactly one of the following required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
- Mandatory Schema-Specification Keyword Argument:
- Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument:
The TreeArray Class¶
- class dendropy.datamodel.treecollectionmodel.TreeArray(taxon_namespace=None, is_rooted_trees=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=None, taxon_label_age_map=None)[source]¶
High-performance collection of tree structures.
Storage of minimal tree structural information as represented by topology and edge lengths, minimizing memory and processing time. This class stores trees as collections of splits and edge lengths. All other information, such as labels, metadata annotations, etc. will be discarded. A full Tree instance can be reconstructed as needed from the structural information stored by this class, at the cost of computation time.
- Parameters:
taxon_namespace (TaxonNamespace) – The operational taxonomic unit concept namespace to manage taxon references.
is_rooted_trees (bool) – If not set, then it will be set based on the rooting state of the first tree added. If True, then trying to add an unrooted tree will result in an error. If False, then trying to add a rooted tree will result in an error.
ignore_edge_lengths (bool) – If True, then edge lengths of splits will not be stored. If False, then edge lengths will be stored.
ignore_node_ages (bool) – If True, then node ages of splits will not be stored. If False, then node ages will be stored.
use_tree_weights (bool) – If False, then tree weights will not be used to weight splits.
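To make the effect of use_tree_weights concrete, the following standalone sketch computes split frequencies from trees represented as (weight, split bitmasks) pairs. It does not call DendroPy; the helper name split_frequencies, the bit assignments, and the input data are all assumptions made for illustration, and the library's internal bookkeeping may differ.

```python
from collections import defaultdict


def split_frequencies(trees, use_tree_weights=True):
    """Return {split_bitmask: frequency} for (weight, splits) pairs.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    totals = defaultdict(float)
    normalization = 0.0
    for weight, splits in trees:
        # Ignore the stored weight when weighting is switched off.
        w = weight if use_tree_weights else 1.0
        normalization += w
        for split in set(splits):
            totals[split] += w
    return {split: total / normalization for split, total in totals.items()}


# Made-up input: taxa A, B, C, D map to bits 0..3, so 0b0011 groups A and B.
weighted_trees = [
    (2.0, [0b0011]),  # ((A,B),(C,D)) with weight 2
    (1.0, [0b0101]),  # ((A,C),(B,D)) with weight 1
    (1.0, [0b0011]),  # ((A,B),(C,D)) with weight 1
]

weighted = split_frequencies(weighted_trees, use_tree_weights=True)
unweighted = split_frequencies(weighted_trees, use_tree_weights=False)
print(weighted[0b0011], unweighted[0b0011])
```

With weights, the A+B split accounts for 3 of 4 units of weight; without them, it appears in 2 of 3 trees.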
- add_tree(tree, is_bipartitions_updated=False, index=None)[source]¶
Adds the structure represented by a Tree instance to the collection.
- Parameters:
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
index (integer) – Insert before index.
- Returns:
index (int) – The index of the accession.
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
- add_trees(trees, is_bipartitions_updated=False)[source]¶
Adds multiple structures represented by an iterator over or iterable of Tree instances to the collection.
- Parameters:
trees (iterator over or iterable of Tree instances) – An iterator over or iterable of Tree instances. These must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then each tree will have its splits encoded or updated. Otherwise, if True, then each tree is assumed to have its splits already encoded and updated.
- append(tree, is_bipartitions_updated=False)[source]¶
Adds a Tree instance to the end of the collection.
- Parameters:
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
- bipartition_encoding_frequencies()[source]¶
Returns a dictionary with keys being bipartition encodings of trees (as frozenset collections of Bipartition objects) and values the frequency of occurrence of trees represented by that encoding in the collection.
- calculate_log_product_of_split_supports(include_external_splits=False)[source]¶
Calculates the log product of split support for each of the trees in the collection.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.
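The shape of the return value described above (a list of per-tree scores plus the index of the best one) can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's implementation: the function name, the split bitmasks, and the support values are made up, and every support value is assumed to be strictly positive so that its logarithm is defined.

```python
import math


def log_product_of_split_supports(trees, split_support):
    """Score each tree by the sum of the logs of its split supports.

    Hypothetical helper for illustration; not part of DendroPy.
    Returns (scores, index_of_highest_score).
    """
    scores = [
        sum(math.log(split_support[split]) for split in splits)
        for splits in trees
    ]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return scores, best_index


# Made-up supports for internal splits over five taxa (bits 0..4).
support = {0b00011: 0.8, 0b00111: 0.6, 0b00101: 0.2, 0b01100: 0.4}

# Each tree is listed by its internal splits only.
tree_splits = [
    [0b00011, 0b00111],
    [0b00101, 0b00111],
    [0b00011, 0b01100],
]

scores, best_index = log_product_of_split_supports(tree_splits, support)
print(scores, best_index)
```

Summing logarithms instead of multiplying raw supports avoids numerical underflow when trees have many splits.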
- calculate_sum_of_split_supports(include_external_splits=False)[source]¶
Calculates the sum of split support for all trees in the collection.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (tuple(list[numeric], integer)) – Returns a tuple, with the first element being the list of scores and the second being the index of the highest score. The element order corresponds to the trees accessioned in the collection.
- consensus_tree(min_freq=0.5, summarize_splits=True, **split_summarization_kwargs)[source]¶
Returns a consensus tree from splits in self.
- Parameters:
min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.
is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
- Returns:
t (consensus tree)
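The role of min_freq can be pictured as a filter over split frequencies. The sketch below is a standalone illustration, not DendroPy code: the helper name and the frequencies are invented, and it assumes a split is kept when its frequency is at least min_freq (the exact comparison the library uses is not stated here).

```python
def splits_passing_threshold(split_frequencies, min_freq=0.5):
    """Return the split bitmasks whose frequency is at least min_freq.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    return sorted(
        split for split, freq in split_frequencies.items() if freq >= min_freq
    )


# Made-up frequencies for splits over five taxa (bits 0..4).
frequencies = {0b00011: 0.8, 0b00111: 0.6, 0b00101: 0.2, 0b01100: 0.4}

kept = splits_passing_threshold(frequencies, min_freq=0.5)
print([bin(split) for split in kept])
```

Splits that occur in more than half of the trees are always mutually compatible, which is why thresholds above 0.5 yield a well-defined majority-rule tree.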
- get_split_bitmask_and_edge_tuple(index)[source]¶
Returns a pair of tuples, ( (splits…), (lengths…) ), corresponding to the “tree” at index.
- insert(index, tree, is_bipartitions_updated=False)[source]¶
Adds a Tree instance to the collection before position given by index.
- Parameters:
index (integer) – Insert before index.
tree (Tree) – A Tree instance. This must have the same rooting state as all the other trees accessioned into this collection as well as that of self.is_rooted_trees.
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
- Returns:
index (int) – The index of the accession.
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
- maximum_product_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]¶
Return the tree that maximizes the product of split supports, also known as the “Maximum Clade Credibility Tree” or MCCT.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mcct_tree (Tree) – Tree that maximizes the product of split supports.
- maximum_sum_of_split_support_tree(include_external_splits=False, summarize_splits=True, **split_summarization_kwargs)[source]¶
Return the tree that maximizes the sum of split supports.
- Parameters:
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
mst_tree (Tree) – Tree that maximizes the sum of split supports.
- read(**kwargs)[source]¶
Add Tree objects to the existing TreeArray from a data source providing one or more collections of trees.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
collection_offset (int) – 0-based index of tree block or collection in source to be parsed. If not specified then the first collection (offset = 0) is assumed.
tree_offset (int) – 0-based index of first tree within the collection specified by collection_offset to be parsed (i.e., skipping the first tree_offset trees). If not specified, then the first tree (offset = 0) is assumed (i.e., no trees within the specified collection will be skipped). Use this to specify, e.g. a burn-in.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
import dendropy

tree_array = dendropy.TreeArray()

# From an open file, skipping the first 100 trees as burn-in.
# (Plain 'r' mode: the 'U' flag is no longer accepted by recent Python versions.)
tree_array.read(
        file=open('treefile.tre', 'r'),
        schema="newick",
        tree_offset=100)

# From a path, reading the third tree collection in the file.
tree_array.read(
        path='sometrees.nexus',
        schema="nexus",
        collection_offset=2,
        tree_offset=100)

# From a string.
tree_array.read(
        data="((A,B),(C,D));((A,C),(B,D));",
        schema="newick")

# From a URL.
tree_array.read(
        url="http://api.opentreeoflife.org/v2/study/pg_1144/tree/tree2324.nex",
        schema="nexus")
- read_from_files(files, schema, **kwargs)[source]¶
Adds multiple structures from one or more external file sources to the collection.
- Parameters:
files (iterable of strings and/or file objects) – A list or some other iterable of file paths or file-like objects (string elements will be assumed to be paths to files, while all other types of elements will be assumed to be file-like objects opened for reading).
schema (string) – The data format of the source. E.g., “nexus”, “newick”, “nexml”.
**kwargs (keyword arguments) – These will be passed directly to the underlying schema-specific reader implementation.
- split_bitmask_set_frequencies()[source]¶
Returns a dictionary with keys being sets of split bitmasks and values being the frequency of occurrence of trees represented by those split bitmask sets in the collection.
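The structure of such a dictionary can be sketched without DendroPy by treating each tree as a frozenset of split bitmasks and counting identical sets. The helper name and the input data below are assumptions made for illustration, and the sketch ignores tree weights; it reports proportions rather than raw counts.

```python
from collections import Counter


def encoding_frequencies(tree_encodings):
    """Map each distinct set of split bitmasks to its proportion of the trees.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    counts = Counter(tree_encodings)
    total = len(tree_encodings)
    return {encoding: count / total for encoding, count in counts.items()}


# Made-up input: taxa A, B, C, D map to bits 0..3.
tree_encodings = [
    frozenset({0b0011, 0b1100}),  # ((A,B),(C,D))
    frozenset({0b0101, 0b1010}),  # ((A,C),(B,D))
    frozenset({0b0011, 0b1100}),  # ((A,B),(C,D)) again
]

freqs = encoding_frequencies(tree_encodings)
for encoding, freq in freqs.items():
    print(sorted(encoding), round(freq, 3))
```

Using frozenset keys makes two trees with the same splits compare equal regardless of the order in which the splits were recorded.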
- topologies(sort_descending=None, frequency_attr_name='frequency', frequency_annotation_name='frequency')[source]¶
Returns a TreeList instance containing the reconstructed tree topologies (i.e. Tree instances with no edge weights) in the collection, with the frequency added as an attribute.
- Parameters:
sort_descending (bool) – If True, then topologies will be sorted in descending frequency order (i.e., topologies with the highest frequencies will be listed first). If False, then they will be sorted in ascending frequency. If None (default), then they will not be sorted.
frequency_attr_name (str) – Name of attribute to add to each Tree representing the frequency of that topology in the collection. If None then the attribute will not be added.
frequency_annotation_name (str) – Name of annotation to add to the annotations of each Tree, representing the frequency of that topology in the collection. The value of this annotation will be dynamically-bound to the attribute specified by frequency_attr_name unless that is None. If frequency_annotation_name is None then the annotation will not be added.
The SplitDistribution Class¶
- class dendropy.datamodel.treecollectionmodel.SplitDistribution(taxon_namespace=None, ignore_edge_lengths=False, ignore_node_ages=True, use_tree_weights=True, ultrametricity_precision=1e-05, is_force_max_age=False, taxon_label_age_map=None)[source]¶
Collects information regarding splits over multiple trees.
- collapse_edges_with_less_than_minimum_support(tree, min_freq=0.5)[source]¶
Collapse edges on tree that have support less than indicated by min_freq.
- consensus_tree(min_freq=0.5, is_rooted=None, summarize_splits=True, **split_summarization_kwargs)[source]¶
Returns a consensus tree from splits in self.
- Parameters:
min_freq (real) – The minimum frequency of a split in this distribution for it to be added to the tree.
is_rooted (bool) – Should tree be rooted or not? If all trees counted for splits are explicitly rooted or unrooted, then this will default to True or False, respectively. Otherwise it defaults to None.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
- Returns:
t (consensus tree)
- count_splits_on_tree(tree, is_bipartitions_updated=False, default_edge_length_value=None)[source]¶
Counts splits in this tree and adds them to the totals.
tree must be decorated with splits, and no attempt is made to normalize taxa.
- Parameters:
- Returns:
s (iterable of splits) – A list of split bitmasks from tree.
e – A list of edge length values from tree.
a – A list of node age values from tree.
- log_product_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]¶
Calculates the (log) product of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current split distribution.
The tree that has the highest product of split support out of a sample of trees corresponds to the “maximum credibility tree” for that sample. This can also be referred to as the “maximum clade credibility tree”, though this latter term is sometimes used for the tree that has the highest sum of split support (see SplitDistribution.sum_of_split_support_on_tree).
- Parameters:
tree (Tree) – The tree for which the score should be calculated.
is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (numeric) – The log product of the support of the splits of the tree.
- normalize_bitmask(bitmask)[source]¶
“Normalizes” split, by ensuring that the least-significant bit is always 1 (used on unrooted trees to establish split identity independent of rotation).
- Parameters:
bitmask (integer) – Split bitmask hash to be normalized.
- Returns:
h (integer) – Normalized split bitmask.
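The idea behind normalization is that a split of an unrooted tree and its complement describe the same bipartition, so one of the two is picked as canonical. A minimal standalone sketch follows, assuming a second argument (here called all_taxa_bitmask, a name chosen for this example) that has one bit set per taxon; the method documented above takes only the bitmask, so this signature is an assumption and not the library's.

```python
def normalize_bitmask(bitmask, all_taxa_bitmask):
    """Return the form of the split whose least-significant bit is 1.

    Hypothetical standalone version for illustration; not DendroPy's code.
    """
    if bitmask & 1:
        # Already in canonical form; just drop any bits outside the taxon set.
        return bitmask & all_taxa_bitmask
    # Otherwise flip to the complementary side of the bipartition.
    return ~bitmask & all_taxa_bitmask


ALL_TAXA = 0b1111  # four taxa, bits 0..3

left = normalize_bitmask(0b0011, ALL_TAXA)
right = normalize_bitmask(0b1100, ALL_TAXA)
print(bin(left), bin(right))
```

Both sides of the same bipartition map to the same integer, so they can be used interchangeably as dictionary keys.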
- split_support_iter(tree, is_bipartitions_updated=False, include_external_splits=False, traversal_strategy='preorder', node_support_attr_name=None, edge_support_attr_name=None)[source]¶
Returns iterator over support values for the splits of a given tree, where the support value is given by the proportional frequency of the split in the current split distribution.
- Parameters:
is_bipartitions_updated (bool) – If False [default], then the tree will have its splits encoded or updated. Otherwise, if True, then the tree is assumed to have its splits already encoded and updated.
include_external_splits (bool) – If True, then non-internal split posteriors will be included. If False, then these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
traversal_strategy (str) – One of: “preorder” or “postorder”. Specifies the order in which splits are visited.
- Returns:
s (list of floats) – List of values for splits in the tree corresponding to the proportional frequency that the split is found in the current distribution.
- splits_considered()[source]¶
- Returns 4 values:
total number of splits counted
total weighted number of unique splits counted
total number of non-trivial splits counted
total weighted number of unique non-trivial splits counted
- sum_of_split_support_on_tree(tree, is_bipartitions_updated=False, include_external_splits=False)[source]¶
Calculates the sum of the support of the splits of the tree, where the support is given by the proportional frequency of the split in the current distribution.
- Parameters:
tree (Tree) – The tree for which the score should be calculated.
is_bipartitions_updated (bool) – If True, then the splits are assumed to have already been encoded and will not be updated on the trees.
include_external_splits (bool) – If True, then non-internal split posteriors will be included in the score. Defaults to False: these are skipped. This should only make a difference when dealing with splits collected from trees of different leaf sets.
- Returns:
s (numeric) – The sum of the support of the splits of the tree.
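The effect of include_external_splits on the score can be sketched in plain Python. This is an illustration, not the library's code: the helper names and the frequencies are made up, an external (leaf) split is assumed to be a bitmask with exactly one bit set, and a split missing from the distribution is assumed to contribute zero support.

```python
def is_external(bitmask):
    """True if exactly one bit is set, i.e. the split isolates a single leaf."""
    return bitmask > 0 and (bitmask & (bitmask - 1)) == 0


def sum_of_split_support(tree_splits, split_frequencies,
                         include_external_splits=False):
    """Sum the frequencies of a tree's splits.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    total = 0.0
    for split in tree_splits:
        if not include_external_splits and is_external(split):
            continue  # skip leaf splits unless explicitly requested
        total += split_frequencies.get(split, 0.0)
    return total


# Made-up distribution: two leaf splits and two internal splits.
frequencies = {0b00001: 1.0, 0b00010: 1.0, 0b00011: 0.8, 0b00111: 0.6}
splits_on_tree = [0b00001, 0b00010, 0b00011, 0b00111]

internal_only = sum_of_split_support(splits_on_tree, frequencies)
with_external = sum_of_split_support(
    splits_on_tree, frequencies, include_external_splits=True)
print(internal_only, with_external)
```

When all trees share the same leaf set, every leaf split has the same support in every tree, so including them shifts all scores by the same amount and does not change the ranking.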
- summarize_splits_on_tree(tree, is_bipartitions_updated=False, **split_summarization_kwargs)[source]¶
Summarizes support of splits/edges/nodes on tree.
- Parameters:
tree (Tree instance) – Tree to be decorated with support values.
is_bipartitions_updated (bool) – If True, then bipartitions will not be recalculated.
**split_summarization_kwargs (keyword arguments) – These will be passed directly to the underlying SplitDistributionSummarizer object. See SplitDistributionSummarizer.configure for options.
The SplitDistributionSummarizer Class¶
- class dendropy.datamodel.treecollectionmodel.SplitDistributionSummarizer(**kwargs)[source]¶
See SplitDistributionSummarizer.configure for configuration options.
- configure(**kwargs)[source]¶
Configure rendition/mark-up.
- Parameters:
set_edge_lengths (string) – For each edge, set the length based on:
“support”: use the support value of the split corresponding to the edge
“mean-length”: mean of edge lengths for split
“median-length”: median of edge lengths for split
“mean-age”: such that split age is equal to mean of ages
“median-age”: such that split age is equal to median of ages
None: do not set edge lengths
add_support_as_node_attribute (bool) – Adds each node’s support value as an attribute of the node, “support”.
add_support_as_node_annotation (bool) – Adds support as a metadata annotation, “support”. If add_support_as_node_attribute is True, then the value will be dynamically-bound to the value of the node’s “support” attribute.
set_support_as_node_label (bool) – Sets the label attribute of each node to the support value.
add_node_age_summaries_as_node_attributes (bool) – Summarizes the distribution of the ages of each node in the following attributes: age_mean, age_median, age_sd, age_hpd95, age_range.
add_node_age_summaries_as_node_annotations (bool) – Summarizes the distribution of the ages of each node in the following metadata annotations: age_mean, age_median, age_sd, age_hpd95, age_range. If add_node_age_summaries_as_node_attributes is True, then the values will be dynamically-bound to the corresponding node attributes.
add_edge_length_summaries_as_edge_attributes (bool) – Summarizes the distribution of the lengths of each edge in the following attributes: length_mean, length_median, length_sd, length_hpd95, length_range.
add_edge_length_summaries_as_edge_annotations (bool) – Summarizes the distribution of the lengths of each edge in the following metadata annotations: length_mean, length_median, length_sd, length_hpd95, length_range. If add_edge_length_summaries_as_edge_attributes is True, then the values will be dynamically-bound to the corresponding edge attributes.
support_label_decimals (int) – Number of decimal places to express when rendering the support value as a string for the node label.
support_as_percentages (bool) – Whether or not to express the support value as percentages (default is probability or proportion).
minimum_edge_length (numeric) – All edge lengths calculated to have a value less than this will be set to this.
error_on_negative_edge_lengths (bool) – If True, an inferred edge length that is less than 0 will result in a ValueError.
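How the label and edge-length options might interact with individual values is sketched below in plain Python. Neither function exists in DendroPy: the names, the default values, and the order in which the negative-length check and the minimum-length clamp are applied are assumptions made for this example.

```python
def render_support_label(support, support_label_decimals=2,
                         support_as_percentages=False):
    """Format a support value as a node label string (hypothetical helper)."""
    value = support * 100 if support_as_percentages else support
    return f"{value:.{support_label_decimals}f}"


def adjust_edge_length(length, minimum_edge_length=None,
                       error_on_negative_edge_lengths=False):
    """Apply the negative-length check, then the minimum clamp (assumed order)."""
    if error_on_negative_edge_lengths and length < 0:
        raise ValueError(f"Negative edge length: {length}")
    if minimum_edge_length is not None and length < minimum_edge_length:
        return minimum_edge_length
    return length


print(render_support_label(0.5))                     # proportion, 2 decimals
print(render_support_label(0.875, 1, True))          # percentage, 1 decimal
print(adjust_edge_length(-0.25, minimum_edge_length=0.0))
print(adjust_edge_length(0.3, minimum_edge_length=0.0))
```

Setting a minimum edge length of zero is one way to absorb small negative lengths that can arise when lengths are derived from summarized node ages, while the error option surfaces them instead.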


