dendropy.datamodel.taxonmodel: Taxonomic Namespace Reference and Management
The TaxonNamespace Class
- class dendropy.datamodel.taxonmodel.TaxonNamespace(*args, **kwargs)
A collection of Taxon objects representing a self-contained and complete domain of distinct operational taxonomic unit definitions. Provides the common semantic context in which operational taxonomic units referenced by various phylogenetic data objects (e.g., trees or alignments) can be related.
- Parameters:
*args (positional arguments, optional) – Accepts a single iterable as an optional positional argument. If a TaxonNamespace object is passed as the positional argument, then its member Taxon objects are added to this one; the Taxon instances themselves are shared rather than copied (see Notes). If any other iterable is passed as the positional argument, then each string in the iterable will result in a new Taxon object being constructed and added to the namespace with the string as its label (name), while each Taxon object in the iterable will be added to the namespace directly.
**kwargs (keyword arguments) –
- label (string): The label or name for this namespace.
- is_mutable (boolean, optional; default = True): If True (default), then Taxon objects can be added to this namespace. If False, then adding Taxon objects will result in an error.
- is_case_sensitive (boolean, optional; default = False): Whether or not taxon names are considered case sensitive or insensitive.
Notes
An empty TaxonNamespace can be created (with an optional label) and Taxon objects added later:
>>> import dendropy
>>> from dendropy import Taxon
>>> tns = dendropy.TaxonNamespace(label="taxa")
>>> t1 = Taxon("a")
>>> tns.add_taxon(t1)
>>> t2 = Taxon("b")
>>> tns.add_taxon(t2)
>>> t3 = tns.new_taxon("c")
>>> tns
<TaxonNamespace 0x106509090 'taxa': [<Taxon 0x10661f050 'a'>, <Taxon 0x10651c590 'b'>, <Taxon 0x106642a90 'c'>]>
Alternatively, an iterable can be passed in as an initializer, and all Taxon objects will be added directly while, for each string, a new Taxon object will be created and added. So, the below are all equivalent to the above:
>>> tns = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa")

>>> taxa = [Taxon(n) for n in ["a", "b", "c"]]
>>> tns = dendropy.TaxonNamespace(taxa, label="taxa")

>>> t1 = Taxon("a")
>>> t2 = Taxon("b")
>>> taxa = [t1, t2, "c"]
>>> tns = dendropy.TaxonNamespace(taxa, label="taxa")
If a TaxonNamespace object is passed as the initializer argument, a shallow copy of the object is constructed:
>>> tns1 = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa1")
>>> tns1
<TaxonNamespace 0x1097275d0 'taxa1': [<Taxon 0x109727610 'a'>, <Taxon 0x109727e10 'b'>, <Taxon 0x109727e90 'c'>]>
>>> tns2 = dendropy.TaxonNamespace(tns1, label="2")
>>> tns2
<TaxonNamespace 0x109727d50 'taxa1': [<Taxon 0x109727610 'a'>, <Taxon 0x109727e10 'b'>, <Taxon 0x109727e90 'c'>]>
Thus, while “tns1” and “tns2” are independent collections, and addition/deletion of Taxon instances to one will not affect the other, changing the label of a Taxon instance that is an element of one will of course affect the same instance if it is in the other:
>>> print(tns1[0].label)
a
>>> print(tns2[0].label)
a
>>> tns1[0].label = "Z"
>>> print(tns1[0].label)
Z
>>> print(tns2[0].label)
Z
In contrast to actual data (i.e., the Taxon objects), all metadata associated with “tns2” (i.e., the AnnotationSet object, in the TaxonNamespace.annotations attribute), will be a full, independent deep-copy.
If what is needed is a true deep-copy of the data of a particular TaxonNamespace object, including copies of the member Taxon instances, then this can be achieved using copy.deepcopy.
>>> import copy
>>> tns1 = dendropy.TaxonNamespace(["a", "b", "c"], label="taxa1")
>>> tns2 = copy.deepcopy(tns1)
- __len__()
Returns the number of Taxon objects in this TaxonNamespace.
- accession_index(taxon)
Returns the accession index of taxon. Note that this may not be the same as the list index of the taxon if taxa have been deleted from the namespace.
- add_taxa(taxa)
Adds multiple Taxon objects to self.
Each Taxon object in taxa that is not already in the collection of Taxon objects in this namespace is added to it. Taxon objects that are already in the collection are skipped. If the namespace is immutable, then TypeError is raised when trying to add Taxon objects.
- add_taxon(taxon)
Adds a new Taxon object to self.
If taxon is not already in the collection of Taxon objects in this namespace, and this namespace is mutable, it is added to the collection. If it is already in the collection, then nothing happens. If it is not already in the collection, but the namespace is not mutable, then TypeError is raised.
- all_taxa_bitmask()
Returns mask of all taxa.
- Returns:
h (integer) – Bitmask spanning all Taxon objects in self.
- bitmask_as_newick_string(bitmask, preserve_spaces=False, quote_underscores=True)
Represents a split as a newick string.
- Parameters:
bitmask (integer) – Split hash bitmask value.
preserve_spaces (boolean, optional) – If False (default), then spaces in taxon labels will be replaced by underscores. If True, then taxon labels with spaces will be wrapped in quotes.
quote_underscores (boolean, optional) – If True (default), then taxon labels with underscores will be wrapped in quotes. If False, then the labels will not be wrapped in quotes.
- Returns:
s (string) – NEWICK representation of split specified by bitmask.
- bitmask_taxa_list(bitmask, index=0)
Returns list of Taxon objects represented by split bitmask.
- description(depth=1, indent=0, itemize='', output=None, **kwargs)
Returns description of object, up to level depth.
- discard_taxon_label(label, is_case_sensitive=None, first_match_only=False)
Removes all Taxon objects with label matching label from the collection in this namespace.
- Parameters:
label (string or string-like) – The value of the Taxon object label to remove.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
first_match_only (bool) – If False, then the entire namespace will be searched and all Taxon objects with matching labels will be removed. If True, then only the first Taxon object with a matching label will be removed (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.
See also
TaxonNamespace.remove_taxon_label
Similar, but raises an error if no matching Taxon objects are found.
- findall(label, is_case_sensitive=None)
Return list of Taxon object(s) with label matching label.
- Parameters:
label (string or string-like) – The value which the label attribute of the Taxon object(s) to be returned must match.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
- Returns:
taxa (list[Taxon]) – A list containing zero or more Taxon objects with labels matching label.
- get_taxa(labels, is_case_sensitive=None, first_match_only=False)
Retrieves list of Taxon objects with given labels.
- Parameters:
labels (collections.Iterable[string]) – Any Taxon object in this namespace collection that has a label attribute that matches any value in labels will be included in the list returned.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
first_match_only (bool) – If False, then for each label in labels, the entire namespace will be searched and all Taxon objects with matching labels will be added to the list. If True, then, for each label in labels, only the first Taxon object with a matching label will be added to the list (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.
- Returns:
taxa (list[Taxon]) – A list containing zero or more Taxon objects with labels matching any value in labels.
- get_taxon(label, is_case_sensitive=None)
Retrieves a Taxon object with the given label.
If multiple Taxon objects exist with labels that match label, then only the first one is returned. If no Taxon object is found in this namespace with the specified criteria, None is returned.
- Parameters:
label (string or string-like) – The value which the label attribute of the Taxon object to be returned must match.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
- Returns:
taxon (Taxon object or None) – The first Taxon object in this namespace collection with a label matching label, or None if no such Taxon object exists.
- has_taxa_labels(labels, is_case_sensitive=None)
Checks for presence of Taxon objects with the given labels.
- Parameters:
labels (collections.Iterable[string]) – The values of the Taxon object labels to match.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
- Returns:
b (boolean) – True if, for every element in the iterable labels, there is at least one Taxon object whose label attribute matches it; False otherwise.
- has_taxon_label(label, is_case_sensitive=None)
Checks for presence of a Taxon object with the given label.
- Parameters:
label (string or string-like) – The value of the Taxon object label to match.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
- Returns:
b (boolean) – True if there is at least one Taxon object in this namespace with a label matching the value of label. Otherwise, False.
- label_taxon_map(is_case_sensitive=None)
Returns dictionary with taxon labels as keys and corresponding Taxon objects as values.
If the TaxonNamespace is currently case-insensitive, then the dictionary returned will have case-insensitive keys; otherwise the dictionary will be case-sensitive. You can override this by explicitly setting is_case_sensitive to False or True.
No attempt is made to handle collisions.
- labels()
Returns list of labels of all Taxon objects in self.
- Returns:
labels (list[string]) – List of Taxon.label values of Taxon objects in self.
- new_taxa(labels)
Creates and adds a new Taxon with corresponding label for each label in labels. Returns list of Taxon objects created.
- Parameters:
labels (collections.Iterable[string]) – The values of the label attributes of the new Taxon objects to be created, added to this namespace collection, and returned.
- Returns:
taxa (collections.Iterable[Taxon]) – A list of Taxon objects created and added.
- Raises:
TypeError – If this namespace is immutable (i.e. TaxonNamespace.is_mutable is False).
- new_taxon(label)
Creates, adds, and returns a new Taxon object with corresponding label.
- Parameters:
label (string or string-like) – The name or label of the new operational taxonomic unit concept.
- Returns:
taxon (Taxon) – The new Taxon object.
- remove_taxon(taxon)
Removes specified Taxon object from the collection in this namespace.
- Parameters:
taxon (Taxon) – The Taxon object to remove.
- Raises:
ValueError – If taxon is not in the collection of this namespace.
- remove_taxon_label(label, is_case_sensitive=None, first_match_only=False)
Removes all Taxon objects with label matching label from the collection in this namespace.
- Parameters:
label (string or string-like) – The value of the Taxon object label to remove.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
first_match_only (bool) – If False, then the entire namespace will be searched and all Taxon objects with matching labels will be removed. If True, then only the first Taxon object with a matching label will be removed (i.e., the entire namespace is not searched). Setting this argument to True will be more efficient and should be preferred if there are no redundant or duplicate labels.
- Raises:
LookupError – If no Taxon objects are found with matching label(s).
See also
TaxonNamespace.discard_taxon_label
Similar, but does not raise an error if no matching Taxon objects are found.
- require_taxon(label, is_case_sensitive=None)
Retrieves a Taxon object with the given label, creating it if necessary.
If multiple Taxon objects exist with labels that match label, then only the first one is returned. If no such Taxon object exists in the current namespace and the TaxonNamespace is NOT mutable, an exception is raised. If no such Taxon object exists in the current namespace and the TaxonNamespace is mutable, then a new Taxon is created, added, and returned.
- Parameters:
label (string or string-like) – The value which the label attribute of the Taxon object to be returned must match.
is_case_sensitive (None or bool) – By default, label lookup will use the is_case_sensitive attribute of self to decide whether or not to respect case when trying to match labels to operational taxonomic unit names represented by Taxon instances. This can be overridden by setting is_case_sensitive to True (forcing case-sensitivity) or False (forcing case-insensitivity).
- Returns:
taxon (Taxon object) – A Taxon object in this namespace collection with a label matching label.
- Raises:
TypeError – If no Taxon object is currently in the collection with a label matching the input label and the is_mutable attribute of self is False.
- sort(key=None, reverse=False)
Sorts Taxon objects in collection. If key is not given, defaults to sorting by label (i.e., key = lambda x: x.label).
- Parameters:
key (key function object, optional) – Function that takes a Taxon object as an argument and returns the value that determines its sort order. Defaults to sorting by label.
reverse (boolean, optional) – If True, sort will be in reverse order.
- split_as_newick_string(split, preserve_spaces=False, quote_underscores=True)
Represents a split as a newick string.
- Parameters:
split (integer) – Split hash bitmask value.
preserve_spaces (boolean, optional) – If False (default), then spaces in taxon labels will be replaced by underscores. If True, then taxon labels with spaces will be wrapped in quotes.
quote_underscores (boolean, optional) – If True (default), then taxon labels with underscores will be wrapped in quotes. If False, then the labels will not be wrapped in quotes.
- Returns:
s (string) – NEWICK representation of split specified by split.
- taxa_bipartition(**kwargs)
Returns a bipartition that represents all taxa specified by keyword-specified list of taxon objects (taxa=) or labels (labels=).
- Parameters:
**kwargs (keyword arguments) –
Requires one of:
- Returns:
b (list[integer]) – List of split hash bitmask values for specified Taxon objects.
- taxa_bitmask(**kwargs)
Retrieves the list of split hash bitmask values representing all taxa specified by keyword-specified list of taxon objects (taxa=) or labels (labels=).
- Parameters:
**kwargs (keyword arguments) –
Requires one of:
- Returns:
b (list[integer]) – List of split hash bitmask values for specified Taxon objects.
- taxon_bitmask(taxon)
Returns bitmask value of split hash for split subtending node with taxon.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
The Taxon Class
- class dendropy.datamodel.taxonmodel.Taxon(label=None)
A taxon associated with a sequence or a node on a tree.
- Parameters:
label (string or Taxon object) – Label or name of this operational taxonomic unit concept. If a string, then the label attribute of self is set to this value. If a Taxon object, then the label attribute of self is set to the same value as the label attribute of the other Taxon object and all annotations/metadata are copied.
- description(depth=1, indent=0, itemize='', output=None, **kwargs)
Returns description of object, up to level depth.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
The TaxonNamespaceAssociated Class
- class dendropy.datamodel.taxonmodel.TaxonNamespaceAssociated(taxon_namespace=None)
Provides infrastructure for the maintenance of references to taxa.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If it is False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (whose label is one of the existing casing variants of ‘foo’). If it is True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
# Here t1 is an existing Tree, and some_other_data is any
# existing object that has a taxon_namespace attribute
from dendropy import Tree, TaxonNamespace
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace
# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)
# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
import copy
t3 = copy.deepcopy(t1)
See also
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
Repopulates the current taxon namespace with new taxon objects, preserving labels. Each distinct Taxon object associated with self or members of self that is not already in self.taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If it is False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (whose label is one of the existing casing variants of ‘foo’). If it is True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
Note
Existing Taxon objects in self.taxon_namespace are not removed. This method should thus be called only when self.taxon_namespace has been changed. In fact, typical usage would not involve calling this method directly, but rather through
- Parameters:
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace.
- reindex_subcomponent_taxa()
DEPRECATED: Use reconstruct_taxon_namespace instead. Derived classes should override this to ensure that their various components, attributes and members all refer to the same TaxonNamespace object as self.taxon_namespace, and that self.taxon_namespace has all the Taxon objects in the various members.
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.