dendropy.datamodel.taxonmodel: Taxonomic Namespace Reference and Management

The TaxonNamespace Class

class dendropy.datamodel.taxonmodel.TaxonNamespace(*args, **kwargs)[source]

A collection of Taxon objects representing a self-contained and complete domain of distinct operational taxonomic unit definitions. Provides the common semantic context in which operational taxonomic units referenced by various phylogenetic data objects (e.g., trees or alignments) can be related.

Parameters:
  • *args (positional arguments, optional) – Accepts a single iterable as an optional positional argument. If a TaxonNamespace object is passed as the positional argument, then its member Taxon objects will be added to this one by reference, i.e., a shallow copy is made (see Notes). If any other iterable is passed as the positional argument, then each string in the iterable will result in a new Taxon object being constructed and added to the namespace with the string as its label (name), while each Taxon object in the iterable will be added to the namespace directly.

  • **kwargs (keyword arguments) –

    label : string

    The label or name for this namespace.

    is_mutable : boolean, optional (default = True)

    If True (default), then Taxon objects can be added to this namespace. If False, then adding Taxon objects will result in an error.

    is_case_sensitive : boolean, optional (default = False)

    Whether or not taxon names are considered case sensitive or insensitive.

Notes

An empty TaxonNamespace can be created (with an optional label) and Taxon objects added later:

>>> import dendropy
>>> from dendropy import Taxon
>>> tns = dendropy.TaxonNamespace(label="taxa")
>>> t1 = Taxon("a")
>>> tns.add_taxon(t1)
>>> t2 = Taxon("b")
>>> tns.add_taxon(t2)
>>> t3 = tns.new_taxon("c")
>>> tns
<TaxonNamespace 0x106509090 'taxa': [<Taxon 0x10661f050 'a'>, <Taxon 0x10651c590 'b'>, <Taxon 0x106642a90 'c'>]>

Alternatively, an iterable can be passed in as an initializer, and all Taxon objects will be added directly while, for each string, a new Taxon object will be created and added. So, the below are all equivalent to the above:

>>> tns = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa")
>>> taxa = [Taxon(n) for n in ["a", "b", "c"]]
>>> tns = dendropy.TaxonNamespace(taxa, label="taxa")
>>> t1 = Taxon("a")
>>> t2 = Taxon("b")
>>> taxa = [t1, t2, "c"]
>>> tns = dendropy.TaxonNamespace(taxa, label="taxa")

If a TaxonNamespace object is passed as the initializer argument, a shallow copy of the object is constructed:

>>> tns1 = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa1")
>>> tns1
<TaxonNamespace 0x1097275d0 'taxa1': [<Taxon 0x109727610 'a'>, <Taxon 0x109727e10 'b'>, <Taxon 0x109727e90 'c'>]>
>>> tns2 = dendropy.TaxonNamespace(tns1, label="2")
>>> tns2
<TaxonNamespace 0x109727d50 'taxa1': [<Taxon 0x109727610 'a'>, <Taxon 0x109727e10 'b'>, <Taxon 0x109727e90 'c'>]>

Thus, while “tns1” and “tns2” are independent collections, and addition/deletion of Taxon instances to one will not affect the other, changing the label of a Taxon instance that is an element in one will of course affect the same instance if it is in the other:

>>> print(tns1[0].label)
a
>>> print(tns2[0].label)
a
>>> tns1[0].label = "Z"
>>> print(tns1[0].label)
Z
>>> print(tns2[0].label)
Z

In contrast to the actual data (i.e., the Taxon objects), all metadata associated with “tns2” (i.e., the AnnotationSet object in the TaxonNamespace.annotations attribute) will be a full, independent deep-copy.

If what is needed is a true deep-copy of the data of a particular TaxonNamespace object, including copies of the member Taxon instances, then this can be achieved using copy.deepcopy.

>>> import copy
>>> tns1 = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa1")
>>> tns2 = copy.deepcopy(tns1)

__contains__(taxon)[source]

Returns True if Taxon object taxon is in self.

__getitem__(key)[source]

Returns Taxon object with index or slice given by key.

__len__()[source]

Returns number of Taxon objects in this TaxonNamespace.

accession_index(taxon)[source]

Returns the accession index of taxon. Note that this may not be the same as the list index of the taxon if taxa have been deleted from the namespace.

Parameters:

taxon (Taxon) – Taxon object for which to return the accession index.

Returns:

h (integer) – The accession index.
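
One way to picture why the accession index can differ from the list index is a counter that is never reused after a deletion. The following is a plain-Python stand-in written for illustration only, not DendroPy's implementation; `ToyNamespace` and its attribute names are hypothetical.

```python
class ToyNamespace:
    """Hypothetical stand-in: members get a counter value that is never reused."""

    def __init__(self):
        self.items = []          # current members, in order
        self.accession = {}      # member -> accession index
        self.next_accession = 0  # keeps growing, even after removals

    def add(self, item):
        if item in self.accession:
            return               # already registered: nothing happens
        self.items.append(item)
        self.accession[item] = self.next_accession
        self.next_accession += 1

    def remove(self, item):
        self.items.remove(item)
        del self.accession[item]


ns = ToyNamespace()
for name in ["a", "b", "c"]:
    ns.add(name)
ns.remove("a")

list_index_of_c = ns.items.index("c")     # position in the current list
accession_index_of_c = ns.accession["c"]  # counter value when "c" was added
print(list_index_of_c, accession_index_of_c)
```

After "a" is removed, "c" moves up in the list while its accession value stays where it was, so the two numbers no longer agree.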

add_taxa(taxa)[source]

Adds multiple Taxon objects to self.

Each Taxon object in taxa that is not already in the collection of Taxon objects in this namespace is added to it. If any of the Taxon objects are already in the collection, then nothing happens. If the namespace is immutable, then TypeError is raised when trying to add Taxon objects.

Parameters:

taxa (collections.Iterable [Taxon]) – A list of Taxon objects to be accessioned or registered in this collection.

Raises:

TypeError – If this namespace is immutable (i.e. TaxonNamespace.is_mutable is False).

add_taxon(taxon)[source]

Adds a new Taxon object to self.

If taxon is not already in the collection of Taxon objects in this namespace, and this namespace is mutable, it is added to the collection. If it is already in the collection, then nothing happens. If it is not already in the collection, but the namespace is not mutable, then TypeError is raised.

Parameters:

taxon (Taxon) – The Taxon object to be accessioned or registered in this collection.

Raises:

TypeError – If this namespace is immutable (i.e. TaxonNamespace.is_mutable is False).

all_taxa_bitmask()[source]

Returns mask of all taxa.

Returns:

h (integer) – Bitmask spanning all Taxon objects in self.

append(taxon)[source]

LEGACY. Use ‘add_taxon()’ instead.

bitmask_as_newick_string(bitmask, preserve_spaces=False, quote_underscores=True)[source]

Represents a split as a newick string.

Parameters:
  • bitmask (integer) – Split hash bitmask value.

  • preserve_spaces (boolean, optional) – If False (default), then spaces in taxon labels will be replaced by underscores. If True, then taxon labels with spaces will be wrapped in quotes.

  • quote_underscores (boolean, optional) – If True (default), then taxon labels with underscores will be wrapped in quotes. If False, then the labels will not be wrapped in quotes.

Returns:

s (string) – NEWICK representation of split specified by bitmask.

bitmask_taxa_list(bitmask, index=0)[source]

Returns list of Taxon objects represented by split bitmask.

Parameters:
  • bitmask (integer) – Split hash bitmask value.

  • index (integer, optional) – Start from this Taxon object instead of the first Taxon object in the collection.

Returns:

taxa (list [Taxon]) – List of Taxon objects specified or spanned by bitmask.
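
The bitmask methods are easier to follow with a small model of how a set of taxa can be packed into one integer. The sketch below uses plain Python and assumes, for illustration, that the taxon at position i owns bit `1 << i`; the helper names `mask_for` and `labels_in` are hypothetical and are not part of DendroPy.

```python
labels = ["a", "b", "c", "d"]

# Assumption for this sketch: the taxon at position i owns bit (1 << i).
bit_of = {label: 1 << i for i, label in enumerate(labels)}


def mask_for(selected):
    """Combine the single-taxon bits of `selected` into one integer mask."""
    mask = 0
    for label in selected:
        mask |= bit_of[label]
    return mask


def labels_in(mask):
    """Decode a mask back into the labels whose bits are set."""
    return [label for label in labels if mask & bit_of[label]]


all_mask = mask_for(labels)        # every bit set
split_mask = mask_for(["a", "c"])  # bits 0 and 2 set

print(bin(all_mask), bin(split_mask), labels_in(split_mask))
```

In this model a mask spanning all taxa plays the role described for all_taxa_bitmask(), and decoding a mask corresponds to what bitmask_taxa_list() is documented to return.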

clear()[source]

Removes all Taxon objects from this namespace.

description(depth=1, indent=0, itemize='', output=None, **kwargs)[source]

Returns description of object, up to level depth.

discard_taxon_label(label, is_case_sensitive=None, first_match_only=False)[source]

Removes all Taxon objects with label matching label from the collection in this namespace.

Parameters:
  • label (string or string-like) – The value of the Taxon object label to remove.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

  • first_match_only (bool) – If False, then the entire namespace will be searched and all Taxon objects with matching labels will be removed. If True, then only the first Taxon object with a matching label will be removed (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.

See also

TaxonNamespace.remove_taxon_label

Similar, but raises an error if no matching Taxon objects are found.

findall(label, is_case_sensitive=None)[source]

Return list of Taxon object(s) with label matching label.

Parameters:
  • label (string or string-like) – The value which the label attribute of the Taxon object(s) to be returned must match.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

Returns:

taxa (list [Taxon]) – A list containing zero or more Taxon objects with labels matching label.
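
The is_case_sensitive argument shared by the lookup methods follows a "None means defer to the namespace" pattern. A minimal sketch of that pattern in plain Python is given below; `find_all` and `labels_match` are hypothetical names, and comparing lowercased strings is an assumption of this sketch rather than a statement about how DendroPy normalizes case.

```python
def labels_match(a, b, is_case_sensitive):
    """Compare two labels, optionally ignoring case."""
    if is_case_sensitive:
        return a == b
    return a.lower() == b.lower()


def find_all(labels, target, namespace_is_case_sensitive=False,
             is_case_sensitive=None):
    """Return every label matching `target`.

    `is_case_sensitive=None` defers to the namespace-level setting;
    True or False overrides it for this call only.
    """
    if is_case_sensitive is None:
        is_case_sensitive = namespace_is_case_sensitive
    return [x for x in labels if labels_match(x, target, is_case_sensitive)]


labels = ["Foo", "foo", "bar"]
deferred = find_all(labels, "FOO")                       # uses namespace default
forced = find_all(labels, "foo", is_case_sensitive=True)  # per-call override
print(deferred, forced)
```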

get_taxa(labels, is_case_sensitive=None, first_match_only=False)[source]

Retrieves list of Taxon objects with given labels.

Parameters:
  • labels (collections.Iterable [string]) – Any Taxon object in this namespace collection that has a label attribute that matches any value in labels will be included in the list returned.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

  • first_match_only (bool) – If False, then for each label in labels, the entire namespace will be searched and all Taxon objects with matching labels will be added to the list. If True, then for each label in labels, only the first Taxon object with a matching label will be added to the list (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.

Returns:

taxa (list [Taxon]) – A list containing zero or more Taxon objects with labels matching any value in labels.

get_taxa_bitmask(**kwargs)[source]

LEGACY. Use ‘taxa_bitmask’ instead.

get_taxon(label, is_case_sensitive=None)[source]

Retrieves a Taxon object with the given label.

If multiple Taxon objects exist with labels that match label, then only the first one is returned. If no Taxon object is found in this namespace with the specified criteria, None is returned.

Parameters:
  • label (string or string-like) – The value which the label attribute of the Taxon object to be returned must match.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

Returns:

taxon (Taxon object or None) – The first Taxon object in this namespace collection with a label matching label, or None if no such Taxon object exists.

has_taxa_labels(labels, is_case_sensitive=None)[source]

Checks for presence of Taxon objects with the given labels.

Parameters:
  • labels (collections.Iterable [string]) – The values of the Taxon object labels to match.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

Returns:

b (boolean) – Returns True if, for every element in the iterable labels, there is at least one Taxon object that has a label attribute that matches this. False otherwise.

has_taxon_label(label, is_case_sensitive=None)[source]

Checks for presence of a Taxon object with the given label.

Parameters:
  • label (string or string-like) – The value of the Taxon object label to match.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

Returns:

b (boolean) – True if there is at least one Taxon object in this namespace with a label matching the value of label. Otherwise, False.

label_taxon_map(is_case_sensitive=None)[source]

Returns dictionary with taxon labels as keys and corresponding Taxon objects as values.

If the TaxonNamespace is currently case-insensitive, then the dictionary returned will have case-insensitive keys; otherwise the dictionary will be case-sensitive. You can override this by explicitly specifying is_case_sensitive as False or True.

No attempt is made to handle collisions.

Returns:

d (dictionary-like) – Dictionary with Taxon.label values of Taxon objects in self as keys and corresponding Taxon objects as values.
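
To see what an unhandled collision means in practice, consider building a label-keyed dictionary from labels that differ only in case. The sketch below uses an ordinary Python dict with integers standing in for Taxon objects. It is not DendroPy's dictionary type, and which entry survives a collision in DendroPy is not stated here; in a plain dict the last one written wins.

```python
# (label, stand-in for a Taxon object)
pairs = [("Foo", 1), ("foo", 2), ("bar", 3)]

# Case-sensitive keys: "Foo" and "foo" are different keys.
case_sensitive_map = {label: obj for label, obj in pairs}

# Case-insensitive keys (modelled here by lowercasing): "Foo" and "foo"
# collide, and the later entry silently replaces the earlier one.
case_insensitive_map = {label.lower(): obj for label, obj in pairs}

print(len(case_sensitive_map), len(case_insensitive_map))
```

If labels may collide, findall() or get_taxa() return every match instead of a single value per key.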

labels()[source]

Returns list of labels of all Taxon objects in self.

Returns:

labels (list [string]) – List of Taxon.label values of Taxon objects in self.

new_taxa(labels)[source]

Creates and adds a new Taxon with corresponding label for each label in labels. Returns list of Taxon objects created.

Parameters:

labels (collections.Iterable [string]) – The values of the label attributes of the new Taxon objects to be created, added to this namespace collection, and returned.

Returns:

taxa (collections.Iterable [Taxon]) – A list of Taxon objects created and added.

Raises:

TypeError – If this namespace is immutable (i.e. TaxonNamespace.is_mutable is False).

new_taxon(label)[source]

Creates, adds, and returns a new Taxon object with corresponding label.

Parameters:

label (string or string-like) – The name or label of the new operational taxonomic unit concept.

Returns:

taxon (Taxon) – The new Taxon object.

remove_taxon(taxon)[source]

Removes specified Taxon object from the collection in this namespace.

Parameters:

taxon (a Taxon object) – The Taxon object to be removed.

Raises:

ValueError – If taxon is not in the collection of this namespace.

remove_taxon_label(label, is_case_sensitive=None, first_match_only=False)[source]

Removes all Taxon objects with label matching label from the collection in this namespace.

Parameters:
  • label (string or string-like) – The value of the Taxon object label to remove.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

  • first_match_only (bool) – If False, then the entire namespace will be searched and all Taxon objects with matching labels will be removed. If True, then only the first Taxon object with a matching label will be removed (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.

Raises:

LookupError – If no Taxon objects are found with matching label(s).

See also

TaxonNamespace.discard_taxon_label

Similar, but does not raise an error if no matching Taxon objects are found.
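
The difference between the "discard" and "remove" variants, and the effect of first_match_only, can be sketched on a plain list of labels. This is an illustrative model only; `discard_label` and `remove_label` are hypothetical helpers that return new lists, whereas the DendroPy methods modify the namespace itself.

```python
def discard_label(labels, target, first_match_only=False):
    """Return (kept, removed_count); never raises when nothing matches."""
    kept, removed = [], 0
    for label in labels:
        is_match = label == target
        # With first_match_only, stop removing after the first hit.
        if is_match and not (first_match_only and removed > 0):
            removed += 1
        else:
            kept.append(label)
    return kept, removed


def remove_label(labels, target, first_match_only=False):
    """Like discard_label, but a missing label is an error."""
    kept, removed = discard_label(labels, target, first_match_only)
    if removed == 0:
        raise LookupError(target)
    return kept


labels = ["a", "b", "a"]
all_removed = remove_label(labels, "a")                           # both "a" entries go
first_removed = remove_label(labels, "a", first_match_only=True)  # only the first goes
unchanged, count = discard_label(labels, "zzz")                   # no match, no error
print(all_removed, first_removed, unchanged, count)
```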

require_taxon(label, is_case_sensitive=None)[source]

Retrieves a Taxon object with the given label, creating it if necessary.

Retrieves a Taxon object with the label, label. If multiple Taxon objects exist with labels that match label, then only the first one is returned. If no such Taxon object exists in the current namespace and the TaxonNamespace is NOT mutable, an exception is raised. If no such Taxon object exists in the current namespace and TaxonNamespace is mutable, then a new Taxon is created, added, and returned.

Parameters:
  • label (string or string-like) – The value which the label attribute of the Taxon object to be returned must match.

  • is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be over-ridden by specifying is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).

Returns:

taxon (Taxon object or None) – A Taxon object in this namespace collection with a label matching label.

Raises:

TypeError – If no Taxon object is currently in the collection with a label matching the input label and the is_mutable attribute of self is False.
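
The behaviour described for require_taxon is a "get or create" pattern guarded by mutability. A minimal plain-Python sketch of that pattern follows, with a list of labels standing in for the namespace; the function name `require` and its return value (an index) are assumptions made for the example, not DendroPy's interface.

```python
def require(labels, target, is_mutable=True):
    """Return the index of `target` in `labels`, appending it first if allowed."""
    if target in labels:
        return labels.index(target)  # first match wins
    if not is_mutable:
        # Mirrors the documented behaviour: missing label + immutable = TypeError.
        raise TypeError("namespace is immutable; cannot add %r" % (target,))
    labels.append(target)
    return len(labels) - 1


labels = ["a", "b"]
existing = require(labels, "b")  # found: nothing is created
created = require(labels, "c")   # missing: created, then returned
print(existing, created, labels)
```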

reverse()[source]

Reverses order of Taxon objects in collection.

sort(key=None, reverse=False)[source]

Sorts Taxon objects in collection. If key is not given, defaults to sorting by label (i.e., key = lambda x: x.label).

Parameters:
  • key (key function object, optional) – Function that takes a Taxon object as an argument and returns the value that determines its sort order. Defaults to sorting by label.

  • reverse (boolean, optional) – If True, sort will be in reverse order.
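
The key and reverse parameters follow the same conventions as Python's built-in sorting. As an illustration, the sketch below sorts simple objects that carry a label attribute; `Item` is a hypothetical stand-in for Taxon, and the built-in sorted() is used in place of the method documented here.

```python
from operator import attrgetter


class Item:
    """Hypothetical stand-in for an object with a `label` attribute."""

    def __init__(self, label):
        self.label = label


items = [Item("Pan"), Item("homo"), Item("Gorilla")]

# Equivalent of the documented default key: plain (case-sensitive) label order.
by_label = [x.label for x in sorted(items, key=attrgetter("label"))]

# A custom key: ignore case when ordering.
by_label_nocase = [x.label for x in sorted(items, key=lambda x: x.label.lower())]

# A custom key combined with reverse=True: longest label first.
by_length_desc = [x.label for x in
                  sorted(items, key=lambda x: len(x.label), reverse=True)]

print(by_label, by_label_nocase, by_length_desc)
```

Because a plain string comparison orders uppercase letters before lowercase ones, a case-insensitive key may be the better choice when labels mix cases.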

split_as_newick_string(split, preserve_spaces=False, quote_underscores=True)[source]

Represents a split as a newick string.

Parameters:
  • split (integer) – Split hash bitmask value.

  • preserve_spaces (boolean, optional) – If False (default), then spaces in taxon labels will be replaced by underscores. If True, then taxon labels with spaces will be wrapped in quotes.

  • quote_underscores (boolean, optional) – If True (default), then taxon labels with underscores will be wrapped in quotes. If False, then the labels will not be wrapped in quotes.

Returns:

s (string) – NEWICK representation of the split specified by split.

taxa_bipartition(**kwargs)[source]

Returns a bipartition that represents all taxa specified by keyword-specified list of taxon objects (taxa=) or labels (labels=).

Parameters:

**kwargs (keyword arguments) –

Requires one of:

taxa : collections.Iterable [Taxon]

Iterable of Taxon objects.

labels : collections.Iterable [string]

Iterable of Taxon label values.

Returns:

b (list [integer]) – List of split hash bitmask values for specified Taxon objects.

taxa_bitmask(**kwargs)[source]

Retrieves the list of split hash bitmask values representing all taxa specified by keyword-specified list of taxon objects (taxa=) or labels (labels=).

Parameters:

**kwargs (keyword arguments) –

Requires one of:

taxa : collections.Iterable [Taxon]

Iterable of Taxon objects.

labels : collections.Iterable [string]

Iterable of Taxon label values.

Returns:

b (list [integer]) – List of split hash bitmask values for specified Taxon objects.

taxon_bitmask(taxon)[source]

Returns bitmask value of split hash for split subtending node with taxon.

Parameters:

taxon (Taxon) – Taxon object for which to calculate split hash bitmask.

Returns:

h (integer) – Split hash bitmask value for node associated with Taxon object taxon.

taxon_namespace_scoped_copy(memo=None)[source]

Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.

The Taxon Class

class dendropy.datamodel.taxonmodel.Taxon(label=None)[source]

A taxon associated with a sequence or a node on a tree.

Parameters:

label (string or Taxon object) – Label or name of this operational taxonomic unit concept. If a string, then the label attribute of self is set to this value. If a Taxon object, then the label attribute of self is set to the same value as the label attribute of the other Taxon object, and all annotations/metadata are copied.

__str__()[source]

String representation of self = taxon name.

description(depth=1, indent=0, itemize='', output=None, **kwargs)[source]

Returns description of object, up to level depth.

taxon_namespace_scoped_copy(memo=None)[source]

Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.

The TaxonNamespaceAssociated Class

class dendropy.datamodel.taxonmodel.TaxonNamespaceAssociated(taxon_namespace=None)[source]

Provides infrastructure for the maintenance of references to taxa.

migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)[source]

Move this object and all members to a new operational taxonomic unit concept namespace scope.

The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.

Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If it is False and unify_taxa_by_label is True, then the correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If it is True and unify_taxa_by_label is True, then Taxon objects with labels identical except in case will be considered distinct.

Parameters:
  • taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.

  • unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.

  • taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
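
The interaction between unify_taxa_by_label and case sensitivity can be modelled on plain label strings. The following sketch is a simplified illustration under stated assumptions: `build_mapping` is a hypothetical helper, old taxa are represented only by their labels, lowercasing stands in for case-insensitive matching, and the first casing encountered is kept for the unified label.

```python
def build_mapping(old_labels, unify_by_label=True, is_case_sensitive=False):
    """Return (new_labels, mapping), where mapping[i] is the position in
    new_labels that old object i is redirected to."""
    new_labels = []
    seen = {}     # normalized label -> index in new_labels
    mapping = []
    for label in old_labels:
        key = label if is_case_sensitive else label.lower()
        if unify_by_label and key in seen:
            mapping.append(seen[key])  # reuse the taxon already created
            continue
        new_labels.append(label)       # create a new taxon with this label
        seen.setdefault(key, len(new_labels) - 1)
        mapping.append(len(new_labels) - 1)
    return new_labels, mapping


old = ["Foo", "Foo", "FOO", "FoO"]
unified_nocase = build_mapping(old, unify_by_label=True, is_case_sensitive=False)
unified_case = build_mapping(old, unify_by_label=True, is_case_sensitive=True)
not_unified = build_mapping(old, unify_by_label=False)
print(unified_nocase, unified_case, not_unified)
```

With case-insensitive unification all four old references collapse onto one new entry; with case-sensitive unification only the two identical ‘Foo’ labels are merged; without unification every old object keeps its own counterpart.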

Examples

Use this method to move an object from one taxon namespace to another.

For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)

You can also use this method to get a copy of a structure and then move it to a new namespace:

t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)

poll_taxa(taxa=None)[source]

Returns a set populated with all of the Taxon instances associated with self.

Parameters:

taxa (set()) – Set to populate. If not specified, a new one will be created.

Returns:

taxa (set[Taxon]) – Set of taxa associated with self.

purge_taxon_namespace()[source]

Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.

reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)[source]

Repopulates the current taxon namespace with new taxon objects, preserving labels. Each distinct Taxon object associated with self or members of self that is not already in self.taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace.

Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If it is False and unify_taxa_by_label is True, then the correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If it is True and unify_taxa_by_label is True, then Taxon objects with labels identical except in case will be considered distinct.

Note

Existing Taxon objects in self.taxon_namespace are not removed. This method should thus be called only when self.taxon_namespace has been changed. In fact, typical usage would not involve calling this method directly, but rather through migrate_taxon_namespace(), which calls it after replacing self.taxon_namespace.

Parameters:
  • unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.

  • taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace.

reindex_subcomponent_taxa()[source]

DEPRECATED: Use reconstruct_taxon_namespace instead. Derived classes should override this to ensure that their various components, attributes and members all refer to the same TaxonNamespace object as self.taxon_namespace, and that self.taxon_namespace has all the Taxon objects in the various members.

reindex_taxa(taxon_namespace=None, clear=False)[source]

DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.

update_taxon_namespace()[source]

All Taxon objects associated with self or members of self that are not in self.taxon_namespace will be added. Note that, unlike reconstruct_taxon_namespace, no new Taxon objects will be created.