dendropy.datamodel.charmatrixmodel: Character Sequences and Matrices
Character Sequences
- class dendropy.datamodel.charmatrixmodel.CharacterDataSequence(character_values=None, character_types=None, character_annotations=None)
A sequence of character values for a particular taxon or entry in a data matrix.
Objects of this class can be (almost) treated as simple lists, where the elements are the values of characters (typically, real values in the case of continuous data, and special instances of StateIdentity objects in the case of discrete data).
Character type data (represented by CharacterType instances) and metadata annotations (represented by AnnotationSet instances), if any, are maintained in parallel lists that need to be accessed separately using the index of the value to which the data correspond. So, for example, the AnnotationSet object containing the metadata annotations for the first value in a sequence, s[0], is available through s.annotations_at(0). The character type information for that first element is available through s.character_type_at(0) and can be set through s.set_character_type_at(0, c).
In most cases where metadata annotations and character type information are not needed, treating objects of this class as a simple list provides all the functionality needed. Where metadata annotations or character type information are required, all the standard list mutation methods (e.g., CharacterDataSequence.insert, CharacterDataSequence.append, CharacterDataSequence.extend) also take optional character_type and character_annotations arguments in addition to the primary character_value argument. This allows the value, character type, and annotation set to be set simultaneously. Iteration over character values is available through the standard list iteration interface. The method CharacterDataSequence.cell_iter() iterates over <character-value, character-type, character-annotation-set> triplets.
- Parameters:
character_values (iterable of values) – The values for this sequence.
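The parallel-list design described above can be illustrated with a small stdlib-only toy model. This is not DendroPy's implementation. The class name ParallelSequence and its private lists are hypothetical, and only the method names echo the ones documented below.

```python
from typing import Any, Iterator, List, Tuple


class ParallelSequence(list):
    """Toy model (hypothetical): a list of values plus two parallel
    lists holding per-element character types and annotations."""

    def __init__(self, values=()):
        super().__init__(values)
        # One slot per value; None means "no type / no annotations".
        self._types: List[Any] = [None] * len(self)
        self._annotations: List[Any] = [None] * len(self)

    def append(self, value, character_type=None, character_annotations=None):
        # Keep all three lists the same length.
        super().append(value)
        self._types.append(character_type)
        self._annotations.append(character_annotations)

    def character_type_at(self, idx: int):
        return self._types[idx]

    def set_character_type_at(self, idx: int, character_type) -> None:
        self._types[idx] = character_type

    def annotations_at(self, idx: int):
        return self._annotations[idx]

    def cell_iter(self) -> Iterator[Tuple[Any, Any, Any]]:
        # Yield <value, type, annotations> triplets.
        return zip(self, self._types, self._annotations)


s = ParallelSequence(["A", "C"])
s.append("G", character_type="dna")
s.set_character_type_at(0, "dna")
print(list(s.cell_iter()))
# [('A', 'dna', None), ('C', None, None), ('G', 'dna', None)]
```

Only append is overridden in this sketch. A complete version would need the same bookkeeping for insert, extend, deletion, and slicing to keep the three lists aligned.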
- annotations_at(idx)
Return metadata annotations of character at idx.
- Parameters:
idx (integer) – Index of element annotations to return.
- Returns:
c (AnnotationSet) – AnnotationSet representing metadata annotations of character at index idx.
- append(character_value, character_type=None, character_annotations=None)
Adds a value to self.
- Parameters:
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- cell_iter()
Iterate over triplets of character values and associated CharacterType and AnnotationSet instances.
- character_type_at(idx)
Return type of character at idx.
- Parameters:
idx (integer) – Index of element character type to return.
- Returns:
c (CharacterType) – CharacterType associated with character index idx.
- extend(character_values, character_types=None, character_annotations=None)
Extends self with values.
- Parameters:
character_values (iterable of objects) – Values to be stored.
character_types (iterable of CharacterType objects) – Descriptions of character values.
character_annotations (iterable of AnnotationSet objects) – Metadata annotations associated with characters.
- has_annotations_at(idx)
Return True if character at idx has metadata annotations.
- Parameters:
idx (integer) – Index of element annotations to check.
- Returns:
b (bool) – True if character at idx has metadata annotations, False otherwise.
- insert(idx, character_value, character_type=None, character_annotations=None)
Insert value and associated character type and metadata annotations for element at idx.
- Parameters:
idx (integer) – Index of element to set.
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- set_annotations_at(idx, annotations)
Set metadata annotations of character at idx.
- Parameters:
idx (integer) – Index of element annotations to set.
- set_at(idx, character_value, character_type=None, character_annotations=None)
Set value and associated character type and metadata annotations for element at idx.
- Parameters:
idx (integer) – Index of element to set.
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- set_character_type_at(idx, character_type)
Set type of character at idx.
- Parameters:
idx (integer) – Index of element character type to set.
- symbols_as_list()
Returns a list of the string representations of the values of this vector.
- Returns:
v (list) – List of string representations of the values making up this vector.
- symbols_as_string(sep='')
Returns values of this vector as a single string, with individual value elements separated by sep.
- Returns:
s (string) – String representation of values making up this vector.
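The relationship between the two methods above can be sketched as follows. This assumes, as the descriptions suggest, that symbols_as_string(sep) is equivalent to joining symbols_as_list() with sep. The State class is a hypothetical stand-in for a discrete state whose str() is its symbol. The functions are standalone illustrations, not DendroPy's code.

```python
class State:
    """Hypothetical stand-in for a state object whose str() is its symbol."""

    def __init__(self, symbol: str):
        self.symbol = symbol

    def __str__(self) -> str:
        return self.symbol


def symbols_as_list(values):
    # String representation of each value, in order.
    return [str(v) for v in values]


def symbols_as_string(values, sep=""):
    # The same symbols, joined into one string with `sep` between them.
    return sep.join(symbols_as_list(values))


seq = [State("A"), State("C"), State("-"), State("T")]
print(symbols_as_string(seq))                    # AC-T
print(symbols_as_string(seq, sep=" "))           # A C - T
print(symbols_as_string([0.5, 1.25], sep=","))   # 0.5,1.25 (continuous values)
```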
Character Types
- class dendropy.datamodel.charmatrixmodel.CharacterType(label=None, state_alphabet=None)
A character format or type of a particular column: i.e., maps a particular set of character state definitions to a column in a character matrix.
- property state_alphabet
The StateAlphabet representing the state alphabet for this column: i.e., the collection of symbols and the state identities to which they map.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
Character Subsets
- class dendropy.datamodel.charmatrixmodel.CharacterSubset(label=None, character_indices=None)
Tracks definition of a subset of characters.
- Parameters:
label (str) – Name of this subset.
character_indices (iterable of int) – Iterable of 0-based (integer) indices of column positions that constitute this subset.
Character Matrices
The CharacterMatrix Class
- class dendropy.datamodel.charmatrixmodel.CharacterMatrix(*args, **kwargs)
A data structure that manages association of operational taxonomic unit concepts to sequences of character state identities or values.
This is a base class that provides general functionality; derived classes specialize for particular data types. You will not be using the class directly, but rather one of the derived classes below, specialized for data types such as DNA, RNA, continuous, etc.
This class and derived classes behave like a dictionary where the keys are Taxon objects and the values are CharacterDataSequence objects. Access to sequences by taxon label and by index is also provided. Numerous methods are provided to manipulate and iterate over sequences. Character partitions can be managed through CharacterSubset objects, while detailed metadata on character types is available through CharacterType objects.
Objects can be instantiated by reading data from external sources through the usual get_from_stream(), get_from_path(), or get_from_string() functions. In addition, a single matrix object can be instantiated from multiple matrices (concatenate()) or data sources (concatenate_from_paths()).
A range of methods also exists for importing data from another matrix object. These vary depending on how "new" and "existing" sequences are treated. A "new" sequence is a sequence in the other matrix associated with a Taxon object for which there is no sequence defined in the current matrix. An "existing" sequence is a sequence in the other matrix associated with a Taxon object for which there is a sequence defined in the current matrix.
| | New Sequences: IGNORED | New Sequences: ADDED |
|---|---|---|
| Existing Sequences: IGNORED | [NO-OP] | add_sequences() |
| Existing Sequences: OVERWRITTEN | replace_sequences() | update_sequences() |
| Existing Sequences: EXTENDED | extend_sequences() | extend_matrix() |
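The table can be read as five merge policies. The sketch below models a matrix as a plain dict mapping taxon labels to lists of symbols. This is an assumption for illustration only: real matrices are keyed on Taxon objects and both matrices must share a TaxonNamespace. The function names mirror the CharacterMatrix methods in the table, but the bodies are hypothetical.

```python
from typing import Dict, List

Matrix = Dict[str, List[str]]  # toy model: taxon label -> sequence


def add_sequences(target: Matrix, other: Matrix) -> None:
    # New: ADDED; existing: IGNORED. Copy so matrices do not share lists.
    for taxon, seq in other.items():
        if taxon not in target:
            target[taxon] = list(seq)


def replace_sequences(target: Matrix, other: Matrix) -> None:
    # New: IGNORED; existing: OVERWRITTEN.
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon] = list(seq)


def update_sequences(target: Matrix, other: Matrix) -> None:
    # New: ADDED; existing: OVERWRITTEN.
    for taxon, seq in other.items():
        target[taxon] = list(seq)


def extend_sequences(target: Matrix, other: Matrix) -> None:
    # New: IGNORED; existing: EXTENDED.
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon].extend(seq)


def extend_matrix(target: Matrix, other: Matrix) -> None:
    # New: ADDED; existing: EXTENDED.
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon].extend(seq)
        else:
            target[taxon] = list(seq)


m = {"t1": list("AC"), "t2": list("GG")}
extend_matrix(m, {"t2": list("TT"), "t3": list("CA")})
print(m)  # {'t1': ['A', 'C'], 't2': ['G', 'G', 'T', 'T'], 't3': ['C', 'A']}
```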
If character subsets have been defined, these subsets can be exported to independent matrices.
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
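The key handling shared by __getitem__, __setitem__, and __delitem__ can be modeled with a small stdlib-only toy. ToyMatrix, its _resolve helper, and the minimal Taxon class are hypothetical, and an ordered list stands in for the TaxonNamespace. Raising KeyError for an unknown label or Taxon is this sketch's choice, not documented DendroPy behavior.

```python
from typing import Dict, List, Union


class Taxon:
    """Hypothetical minimal taxon: just a label."""

    def __init__(self, label: str):
        self.label = label


class ToyMatrix:
    """Toy model of CharacterMatrix key handling (not DendroPy's code)."""

    def __init__(self, taxa: List[Taxon]):
        # An ordered list stands in for the TaxonNamespace.
        self.taxon_namespace = list(taxa)
        self._seqs: Dict[Taxon, list] = {}

    def _resolve(self, key: Union[int, str, Taxon]) -> Taxon:
        if isinstance(key, int):
            return self.taxon_namespace[key]  # index into the namespace
        if isinstance(key, str):
            for t in self.taxon_namespace:  # first taxon with this label
                if t.label == key:
                    return t
            raise KeyError(key)
        if key not in self.taxon_namespace:  # Taxon must already be defined
            raise KeyError(key)
        return key

    def __getitem__(self, key):
        # A missing sequence is created on first access.
        return self._seqs.setdefault(self._resolve(key), [])

    def __setitem__(self, key, values):
        self._seqs[self._resolve(key)] = list(values)

    def __delitem__(self, key):
        del self._seqs[self._resolve(key)]


a, b = Taxon("a"), Taxon("b")
m = ToyMatrix([a, b])
m["a"] = "ACGT"
print(m[0] is m[a])  # True: index 0, label "a", and the Taxon itself all resolve to a
print(m["b"])        # []: created on demand
```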
- __setitem__(key, values)
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
- character_sequence_type
alias of CharacterDataSequence
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep-copy: All objects are cloned.
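The three depths map loosely onto Python's copy module. One way to get a depth-1-like copy is to pre-seed the deepcopy memo with the objects that should stay shared. deepcopy returns any object whose id is already in the memo instead of cloning it. ToyMatrix and namespace_scoped_copy are hypothetical, and the sketch ignores the annotation handling described for depth 0.

```python
import copy


class Taxon:
    def __init__(self, label):
        self.label = label


class ToyMatrix:
    """Hypothetical container: a namespace of taxa and sequences keyed by taxon."""

    def __init__(self, taxa):
        self.taxon_namespace = list(taxa)
        self.sequences = {t: [] for t in self.taxon_namespace}


def namespace_scoped_copy(matrix):
    # Seed the memo with every taxon and with the namespace list itself,
    # so deepcopy hands back those originals instead of cloning them.
    memo = {id(t): t for t in matrix.taxon_namespace}
    memo[id(matrix.taxon_namespace)] = matrix.taxon_namespace
    return copy.deepcopy(matrix, memo)


t = Taxon("a")
m = ToyMatrix([t])
m.sequences[t].append("A")

shallow = copy.copy(m)             # depth-0 analogue: shares everything
scoped = namespace_scoped_copy(m)  # depth-1 analogue
deep = copy.deepcopy(m)            # depth-2 analogue

print(shallow.sequences is m.sequences)       # True
print(scoped.taxon_namespace[0] is t)         # True
print(scoped.sequences[t] is m.sequences[t])  # False
print(deep.taxon_namespace[0] is t)           # False
```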
- coerce_values(values)
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
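The contract above can be illustrated with two hypothetical overrides: one for a fixed-alphabet (DNA-like) matrix and one for a matrix that needs no conversion. The StateIdentity class here is a stand-in, not DendroPy's. DNA_ALPHABET, coerce_dna_values, and coerce_continuous_values are illustrative names. Raising ValueError for an unknown symbol is this sketch's choice; the TypeError is kept for values of a type that cannot be converted at all.

```python
class StateIdentity:
    """Hypothetical stand-in for a discrete state object."""

    def __init__(self, symbol):
        self.symbol = symbol

    def __repr__(self):
        return f"StateIdentity({self.symbol!r})"


# Hypothetical fixed alphabet: one shared state object per symbol.
DNA_ALPHABET = {s: StateIdentity(s) for s in "ACGT-?"}


def coerce_dna_values(values):
    """Dereference symbol strings to shared state objects."""
    result = []
    for v in values:
        if isinstance(v, StateIdentity):
            result.append(v)  # already a state: pass through
        elif isinstance(v, str):
            try:
                result.append(DNA_ALPHABET[v.upper()])
            except KeyError:
                raise ValueError(f"unknown DNA symbol {v!r}") from None
        else:
            raise TypeError(f"cannot coerce {v!r} to a DNA state")
    return result


def coerce_continuous_values(values):
    # No conversion needed: return values as-is.
    return values


states = coerce_dna_values("acg-")  # a string iterates symbol by symbol
print([s.symbol for s in states])              # ['A', 'C', 'G', '-']
print(states[0] is coerce_dna_values("A")[0])  # True: same shared object
```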
- classmethod concatenate(char_matrices)
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, 'char_matrices'. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
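The way component parts become character subsets can be illustrated with a stdlib-only toy. Each input maps taxon labels to aligned sequence strings, and each part's columns are recorded as 0-based index lists. The function and the "locus1", "locus2", ... labels are hypothetical. The sketch checks only that every input has the same taxa and that each input is rectangular. It does not reproduce DendroPy's own validation.

```python
from typing import Dict, List, Tuple


def concatenate(
    matrices: List[Dict[str, str]]
) -> Tuple[Dict[str, str], Dict[str, List[int]]]:
    """Toy concatenation: each matrix maps taxon label -> aligned sequence."""
    if not matrices:
        raise ValueError("need at least one matrix")
    taxa = set(matrices[0])
    combined = {t: "" for t in matrices[0]}
    subsets: Dict[str, List[int]] = {}
    offset = 0
    for i, m in enumerate(matrices):
        if set(m) != taxa:
            raise ValueError(f"matrix {i} does not have the same taxa")
        lengths = {len(s) for s in m.values()}
        if len(lengths) != 1:
            raise ValueError(f"matrix {i} is not rectangular")
        (n,) = lengths
        for t, s in m.items():
            combined[t] += s
        # Record the 0-based columns this component occupies.
        subsets[f"locus{i + 1}"] = list(range(offset, offset + n))
        offset += n
    return combined, subsets


combined, subsets = concatenate([
    {"t1": "AC", "t2": "AG"},
    {"t1": "TTT", "t2": "TTA"},
])
print(combined)  # {'t1': 'ACTTT', 't2': 'AGTTA'}
print(subsets)   # {'locus1': [0, 1], 'locus2': [2, 3, 4]}
```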
- classmethod concatenate_from_paths(paths, schema, **kwargs)
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of "(obj, attr_name)". This instructs the Annotation object to return "getattr(obj, attr_name)" (via "getattr(*self._value)") when returning the value of the Annotation. "obj" is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is left unchanged, unless the id of the object reference is found in attribute_object_mapper, in which case it is replaced by the mapped object.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. Suppose a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x). If id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, then an empty dictionary should be passed instead.
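The remapping rule for dynamic bound-attribute annotations can be illustrated with a stdlib-only toy. It mimics only the "(obj, attr_name)" convention for _value described above. The Annotation, Node, and copy_annotations names are hypothetical and are not DendroPy's classes or internals.

```python
import copy


class Annotation:
    """Hypothetical minimal annotation. If is_bound is True, _value is an
    (obj, attr_name) tuple and the value is read via getattr."""

    def __init__(self, name, value, is_bound=False):
        self.name = name
        self._value = value
        self.is_bound = is_bound

    @property
    def value(self):
        return getattr(*self._value) if self.is_bound else self._value


def copy_annotations(source_annotations, attribute_object_mapper):
    copies = []
    for a in source_annotations:
        if a.is_bound:
            obj, attr_name = a._value
            # Re-point the owner if its id is in the mapper; otherwise keep it.
            obj = attribute_object_mapper.get(id(obj), obj)
            copies.append(Annotation(a.name, (obj, attr_name), is_bound=True))
        else:
            copies.append(Annotation(a.name, copy.deepcopy(a._value)))
    return copies


class Node:
    def __init__(self, label):
        self.label = label
        self.annotations = [Annotation("label", (self, "label"), is_bound=True)]


src, dst = Node("old"), Node("new")
# The default behavior described above corresponds to {id(other): self}.
dst.annotations = copy_annotations(src.annotations, {id(src): dst})
print(dst.annotations[0].value)                        # new
# With an empty mapper, the copy still reads from the source object.
print(copy_annotations(src.annotations, {})[0].value)  # old
```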
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns description of object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
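Column export by 0-based index can be illustrated with a plain dict from taxon label to sequence. The helper below is hypothetical. It keeps the columns whose index is in indices, in their original column order, so it also works when the indices come from an unordered collection such as a subset's index set. Whether DendroPy itself preserves original order is an assumption here.

```python
from typing import Dict, Iterable, List, Sequence


def export_character_indices(
    matrix: Dict[str, Sequence[str]], indices: Iterable[int]
) -> Dict[str, List[str]]:
    """Toy export: keep only 0-based columns listed in `indices`."""
    keep = set(indices)
    return {
        taxon: [c for i, c in enumerate(seq) if i in keep]
        for taxon, seq in matrix.items()  # same taxon keys as the source
    }


aln = {"t1": "ATGCCA", "t2": "ATGCCG"}
third_positions = range(2, 6, 3)  # 0-based indices 2 and 5
print(export_character_indices(aln, third_positions))
# {'t1': ['G', 'A'], 't2': ['G', 'G']}
```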
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
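The padding rule can be illustrated with a stdlib-only toy where a matrix is a dict of lists. The fill function below is hypothetical, not DendroPy's method. Leaving sequences that are already at least size long untouched, rather than truncating them, is an assumption of this sketch.

```python
def fill(sequences, value, size=None, append=True):
    """Pad every list in `sequences` in place with `value` up to `size`."""
    if size is None:
        size = max((len(s) for s in sequences.values()), default=0)
    for seq in sequences.values():
        missing = size - len(seq)
        if missing <= 0:
            continue  # already long enough; this sketch never truncates
        if append:
            seq.extend([value] * missing)
        else:
            seq[:0] = [value] * missing  # insert at the front
    return sequences


m = {"t1": list("ACGT"), "t2": list("AC"), "t3": []}
fill(m, "-")
print({k: "".join(v) for k, v in m.items()})
# {'t1': 'ACGT', 't2': 'AC--', 't3': '----'}
fill(m, "?", size=6, append=False)
print({k: "".join(v) for k, v in m.items()})
# {'t1': '??ACGT', 't2': '??AC--', 't3': '??----'}
```

Under the same toy model, pack() corresponds to first adding an empty list for every taxon that lacks one (fill_taxa) and then applying this padding.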
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in current taxon namespace that does not have a sequence.
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
```python
from dendropy import DnaCharacterMatrix

d = {
    "s1": "TCCAA",
    "s2": "TGCAA",
    "s3": "TG-AA",
}
dna = DnaCharacterMatrix.from_dict(d)
```
Three Taxon objects will be created, corresponding to the labels 's1', 's2', 's3'. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol ("A", "C", etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
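The key-dereferencing rules for from_dict can be illustrated with a stdlib-only toy. A list of minimal Taxon objects stands in for the TaxonNamespace. The helper resolve_labels is hypothetical. It also skips the coerce_values() step and stores plain lists.

```python
class Taxon:
    def __init__(self, label):
        self.label = label


def resolve_labels(source_dict, namespace, case_sensitive_taxon_labels=False):
    """Toy version of from_dict's key handling (hypothetical helper).

    `namespace` is a list of Taxon objects; new Taxon objects are
    appended to it when no existing label matches.
    """
    def same(a, b):
        return a == b if case_sensitive_taxon_labels else a.lower() == b.lower()

    result = {}
    for key, seq in source_dict.items():
        if isinstance(key, Taxon):
            taxon = key  # used directly; added if not yet in the namespace
            if taxon not in namespace:
                namespace.append(taxon)
        else:
            # First existing taxon whose label matches; otherwise create one.
            taxon = next((t for t in namespace if same(t.label, key)), None)
            if taxon is None:
                taxon = Taxon(key)
                namespace.append(taxon)
        result[taxon] = list(seq)
    return result


ns = [Taxon("S1")]
m = resolve_labels({"s1": "TCCAA", "s2": "TGCAA"}, ns)
print([t.label for t in ns])             # ['S1', 's2']
print(["".join(v) for v in m.values()])  # ['TCCAA', 'TGCAA']
```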
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the "file", "path", "data", or "url" argument specified above: e.g., "fasta", "nexus", "nexml", or "phylip". See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the "schema" argument. See "DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats" for more details.
Examples:
```python
import dendropy

dna1 = dendropy.DnaCharacterMatrix.get(
    file=open("pythonidae.fasta"),
    schema="fasta")
dna2 = dendropy.DnaCharacterMatrix.get(
    url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
    schema="nexus")
aa1 = dendropy.ProteinCharacterMatrix.get(
    file=open("pythonidae.dat"),
    schema="phylip")
std1 = dendropy.StandardCharacterMatrix.get(
    path="python_morph.nex",
    schema="nexus")
std2 = dendropy.StandardCharacterMatrix.get(
    data=">t1\n01011\n\n>t2\n11100",
    schema="fasta")
```
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- keep_sequences(taxa)
Discards all sequences not associated with any of the Taxon instances in taxa.
- property max_sequence_size
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called. Each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If this is False and unify_taxa_by_label is True, then the correspondence between Taxon objects in the old and new namespaces will be established by case-insensitive matching of labels. E.g., if there are four Taxon objects with labels 'Foo', 'Foo', 'FOO', and 'FoO' in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of 'foo'). If is_case_sensitive is True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
```python
from dendropy import Tree

# t1 is an existing Tree; some_other_data is any object with a taxon_namespace.
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace
# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)
# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
```
You can also use this method to get a copy of a structure and then move it to a new namespace:
```python
import copy
from dendropy import Tree, TaxonNamespace

t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
```
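The unification rule can be illustrated with a stdlib-only toy that builds the old-to-new Taxon mapping, similar in spirit to taxon_mapping_memo. The migrate function and the minimal Taxon class are hypothetical. Keeping the first casing seen as the new label is this sketch's choice; the text above only promises "one of the existing casing variants".

```python
class Taxon:
    def __init__(self, label):
        self.label = label


def migrate(old_taxa, is_case_sensitive=False, unify_taxa_by_label=True):
    """Return (new_namespace, mapping old Taxon -> new Taxon). Toy model."""
    new_namespace, mapping, by_label = [], {}, {}
    for old in old_taxa:
        key = old.label if is_case_sensitive else old.label.lower()
        if unify_taxa_by_label and key in by_label:
            mapping[old] = by_label[key]  # reuse the unified taxon
            continue
        new = Taxon(old.label)  # keeps the first casing seen
        new_namespace.append(new)
        mapping[old] = new
        by_label[key] = new
    return new_namespace, mapping


old = [Taxon("Foo"), Taxon("Foo"), Taxon("FOO"), Taxon("FoO")]
ns, mapping = migrate(old)
print([t.label for t in ns])  # ['Foo']
ns, mapping = migrate(old, is_case_sensitive=True)
print([t.label for t in ns])  # ['Foo', 'FOO', 'FoO']
ns, mapping = migrate(old, unify_taxa_by_label=False)
print(len(ns))                # 4
```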
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
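- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Parameters:
taxon (Taxon) – Taxon instance with which the new sequence will be associated.
values (iterable) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.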
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remove_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
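The difference between remove_sequences, discard_sequences, and keep_sequences parallels set.remove versus set.discard. Here is a stdlib-only toy over a dict from taxon label to sequence; the three functions are hypothetical stand-ins for the methods. In this sketch, deletions made before a KeyError are not rolled back. Whether DendroPy behaves the same way is not stated above.

```python
def remove_sequences(matrix, taxa):
    # Strict, like set.remove: a taxon with no sequence is an error.
    for t in taxa:
        del matrix[t]  # raises KeyError if t has no sequence


def discard_sequences(matrix, taxa):
    # Lenient, like set.discard: missing taxa are silently skipped.
    for t in taxa:
        matrix.pop(t, None)


def keep_sequences(matrix, taxa):
    # Drop every sequence whose taxon is not in `taxa`.
    keep = set(taxa)
    for t in [t for t in matrix if t not in keep]:
        del matrix[t]


m = {"a": "AC", "b": "GT", "c": "CC"}
discard_sequences(m, ["a", "zzz"])  # "zzz" is ignored
keep_sequences(m, ["b"])
print(m)  # {'b': 'GT'}
try:
    remove_sequences(m, ["zzz"])
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'zzz'
```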
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level 1. Taxon-namespace-scoped copy: all member objects are full, independent instances, except for TaxonNamespace and Taxon objects, which are preserved as references.
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
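The three merge methods add_sequences, replace_sequences and update_sequences differ only in which taxa they touch. In the sketch below, dicts keyed by taxon label stand in for matrices, and the helper functions are hypothetical models named after the methods, not the DendroPy code. The same-type and same-namespace requirements are not modelled.

```python
import copy


def add_sequences(target, other):
    # Taxa only in `other` are added as shallow copies.
    for taxon, seq in other.items():
        if taxon not in target:
            target[taxon] = copy.copy(seq)


def replace_sequences(target, other):
    # Taxa present in both get a shallow copy of other's sequence.
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon] = copy.copy(seq)


def update_sequences(target, other):
    # update = replace shared taxa, then add taxa missing from target.
    replace_sequences(target, other)
    add_sequences(target, other)


a = {"t1": ["A", "C"], "t2": ["G", "G"]}
b = {"t2": ["T", "T"], "t3": ["C", "C"]}
update_sequences(a, b)
print(a)  # {'t1': ['A', 'C'], 't2': ['T', 'T'], 't3': ['C', 'C']}
```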
- update_taxon_namespace()
Adds all Taxon objects in self that are not in self.taxon_namespace to self.taxon_namespace.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted. Supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples

    # d is an existing data object, e.g., a DnaCharacterMatrix.
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.
ContinuousCharacterMatrix: Continuous Data
- class dendropy.datamodel.charmatrixmodel.ContinuousCharacterMatrix(*args, **kwargs)
Specializes CharacterMatrix for continuous data.
Sequences are stored using ContinuousCharacterDataSequence, with the values of elements assumed to be float.
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, it is taken to be the index of a Taxon object in self.taxon_namespace. If a string, it is taken to be the label of a Taxon object in self.taxon_namespace. Otherwise, it is assumed to be a Taxon instance. In all cases, the Taxon object must already be defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, it is taken to be the index of a Taxon object in self.taxon_namespace. If a string, it is taken to be the label of a Taxon object in self.taxon_namespace. Otherwise, it is assumed to be a Taxon instance. In all cases, the Taxon object must already be defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – The sequence associated with the Taxon instance referenced by key.
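The three accepted key forms (integer index, label string, or Taxon) can be modelled as follows. This is a minimal sketch with hypothetical stand-ins: a list plays the TaxonNamespace, and Taxon is a toy class. The sketch assumes exact, first-match label lookup, and it raises KeyError for keys outside the namespace. The description above says only that the Taxon must already be defined, so the KeyError is this sketch's choice.

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def resolve_key(namespace, key):
    """Map an integer, label, or Taxon key to a Taxon in `namespace` (a list)."""
    if isinstance(key, int):
        return namespace[key]            # index into the namespace
    if isinstance(key, str):
        for taxon in namespace:          # assumed: exact, first label match
            if taxon.label == key:
                return taxon
        raise KeyError(key)
    if any(taxon is key for taxon in namespace):
        return key                       # a Taxon already in the namespace
    raise KeyError(key)


def get_sequence(matrix, namespace, key):
    """Sketch of __getitem__: an empty sequence is created on first access."""
    return matrix.setdefault(resolve_key(namespace, key), [])
```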
- __iter__()
Returns an iterator over the character map’s ordered keys.
- __len__()
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)
Assigns the sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, it is taken to be the index of a Taxon object in self.taxon_namespace. If a string, it is taken to be the label of a Taxon object in self.taxon_namespace. Otherwise, it is assumed to be a Taxon instance. In all cases, the Taxon object must already be defined in the current taxon namespace.
- add_character_subset(char_subset)
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)
Composes and returns a string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted. Supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type
alias of ContinuousCharacterDataSequence
- clear()
Removes all sequences from matrix.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects. These are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full, independent instances, except for TaxonNamespace and Taxon instances, which are references.
2: exhaustive deep copy: All objects are cloned.
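The depth=1 level (copy everything except the namespace and its Taxon objects) can be obtained in plain Python by pre-seeding the memo dictionary of copy.deepcopy. This sketch illustrates the idea and does not necessarily match how DendroPy implements clone(). ToyMatrix and Taxon are hypothetical stand-ins.

```python
import copy


class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


class ToyMatrix:
    """Hypothetical stand-in: a namespace plus a Taxon -> list-of-values map."""

    def __init__(self, taxa):
        self.taxon_namespace = list(taxa)
        self.sequences = {taxon: [] for taxon in taxa}


def taxon_namespace_scoped_copy(matrix):
    # deepcopy returns memo[id(obj)] for any object already in the memo, so
    # seeding it makes the namespace and every Taxon come back unchanged
    # (shared references); all other objects are copied independently.
    memo = {id(matrix.taxon_namespace): matrix.taxon_namespace}
    for taxon in matrix.taxon_namespace:
        memo[id(taxon)] = taxon
    return copy.deepcopy(matrix, memo)
```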
- coerce_values(values)
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. It should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list) – The converted values.
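For a fixed-alphabet matrix, the dereferencing step described above amounts to a lookup from symbol to a shared state object. In this minimal sketch, DNA_STATES and its tuple "states" are hypothetical stand-ins for a real state alphabet and StateIdentity objects. Upper-casing the symbols is also an assumption.

```python
# Hypothetical fixed alphabet: each symbol maps to one shared state object
# (a tuple here, standing in for a StateIdentity).
DNA_STATES = {symbol: ("state", symbol) for symbol in "ACGT-?"}


def coerce_values(values):
    """Sketch of coerce_values() for a fixed-alphabet matrix."""
    coerced = []
    for value in values:
        if isinstance(value, str):
            # Dereference the symbol; an unknown symbol raises KeyError.
            coerced.append(DNA_STATES[value.upper()])
        else:
            coerced.append(value)  # assumed to be a state object already
    return coerced
```

Because every symbol maps to a single shared object, two sequences containing "A" hold the very same state object, not equal copies.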
- classmethod concatenate(char_matrices)
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, char_matrices. All the CharacterMatrix objects in the list must be of the same type and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
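A sketch of the concatenation idea on dicts mapping taxon label to list of values. The per-part column ranges stand in for the character subsets that DendroPy records. The function is a hypothetical model, not the library code. Requiring equal-length sequences within each input is this sketch's own assumption (each input treated as an alignment).

```python
def concatenate(matrices):
    """Sketch of concatenate() on dicts mapping taxon label -> list of values."""
    taxa = set(matrices[0])
    combined = {label: [] for label in matrices[0]}
    subsets = []  # (start, stop) columns contributed by each input matrix
    offset = 0
    for matrix in matrices:
        if set(matrix) != taxa:
            raise ValueError("all taxa must be present in all matrices")
        widths = {len(seq) for seq in matrix.values()}
        if len(widths) != 1:
            # Assumption of this sketch: each input is a proper alignment.
            raise ValueError("sequences within a matrix must have equal length")
        width = widths.pop()
        for label, seq in matrix.items():
            combined[label].extend(seq)
        subsets.append((offset, offset + width))
        offset += width
    return combined, subsets
```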
- classmethod concatenate_from_paths(paths, schema, **kwargs)
Reads a character matrix from each file path given in paths, assuming data format/schema schema, and passes any keyword arguments down to the underlying specialized reader. Merges the character matrices and returns the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)
Reads a character matrix from each file object given in streams, assuming data format/schema schema, and passes any keyword arguments down to the underlying specialized reader. Merges the character matrices and returns the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound, via mappings found in attribute_object_mapper.

In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”. This instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of that object is found in attribute_object_mapper. In that case the reference is remapped as described under that parameter below.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. Suppose a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x). If id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the mapping id(other): self is automatically inserted. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, then an empty dictionary should be passed instead.
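The remapping rule for bound-attribute annotations can be modelled with a toy annotation class. Annotation, copy_annotations and the new_owner argument here are hypothetical stand-ins, not DendroPy's classes or signature. Only the (obj, attr_name) convention and the id(other): self default come from the description above.

```python
import copy


class Annotation:
    """Hypothetical stand-in: a static value, or a bound (obj, attr_name) pair."""

    def __init__(self, name, value, is_attribute=False):
        self.name = name
        self._value = value
        self.is_attribute = is_attribute

    @property
    def value(self):
        if self.is_attribute:
            return getattr(*self._value)  # read obj.attr_name at access time
        return self._value


def copy_annotations(annotations, other, new_owner, attribute_object_mapper=None):
    """Copy `annotations` (owned by `other`) for `new_owner`."""
    if attribute_object_mapper is None:
        attribute_object_mapper = {id(other): new_owner}  # documented default
    copies = []
    for a in annotations:
        if a.is_attribute:
            obj, attr_name = a._value
            # Re-point the bound object only if its id is in the mapper.
            value = (attribute_object_mapper.get(id(obj), obj), attr_name)
        else:
            value = copy.deepcopy(a._value)  # static values become independent
        copies.append(Annotation(a.name, value, a.is_attribute))
    return copies
```

Passing an empty dict as attribute_object_mapper keeps the copied annotation bound to the original object, matching the last sentence of the parameter description.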
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns a description of the object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with the Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset character_subset. Note that this new matrix will still reference the same taxon set.
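Column extraction by 0-based index can be sketched on a dict from taxon to list. The function below is a hypothetical model, not the DendroPy method. Keeping the columns in the order given is an assumption, since the description does not say whether indices are sorted.

```python
def export_character_indices(matrix, indices):
    """Sketch: keep only the given 0-based columns of every sequence."""
    indices = list(indices)
    # The result reuses the same taxon keys, mirroring the note that the
    # new matrix still references the same taxon set.
    return {taxon: [seq[i] for i in indices] for taxon, seq in matrix.items()}


m = {"t1": list("ACGTA"), "t2": list("TTGCA")}
print(export_character_indices(m, [0, 2, 4]))
```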
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with the corresponding Taxon objects in other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix that is not in self will be added to self.
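extend_matrix and extend_sequences differ only in what happens to taxa missing from self. In this hypothetical dict-based model, extend_matrix is written as extend_sequences with is_add_new_sequences=True. That is consistent with both descriptions but is an assumption about how the flag behaves, since the flag is not documented here.

```python
import copy


def extend_sequences(target, other, is_add_new_sequences=False):
    """Sketch: append other's characters to taxa shared with target."""
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon].extend(seq)         # shared taxon: append characters
        elif is_add_new_sequences:
            target[taxon] = copy.copy(seq)    # new taxon: add a copy
        # otherwise the sequence is ignored


def extend_matrix(target, other):
    """Sketch: extend shared taxa and add taxa missing from target."""
    extend_sequences(target, other, is_add_new_sequences=True)
```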
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with the corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size. If size is not specified, sequences are padded to the length of the longest sequence.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:

    from dendropy import DnaCharacterMatrix

    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)

Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using the keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
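The key-matching rule (first existing Taxon with the same label, else a new Taxon added to the namespace, with case_sensitive_taxon_labels controlling case) can be sketched as below. Taxon and taxon_for_label are hypothetical stand-ins, and a plain list plays the taxon namespace.

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def taxon_for_label(namespace, label, case_sensitive=False):
    """Reuse the first Taxon whose label matches, else create and add one."""
    for taxon in namespace:
        if case_sensitive:
            same = taxon.label == label
        else:
            same = taxon.label.lower() == label.lower()
        if same:
            return taxon
    taxon = Taxon(label)
    namespace.append(taxon)  # new taxa are added to the namespace
    return taxon
```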
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object. If not given, the one specified in the data source will be assigned, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of the character block or matrix in the source to be parsed. If not specified, then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed. Supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:

    import dendropy

    dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return a new object of this class from the file specified by the string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return a new object of this class from the file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return a new object of this class from the string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()
Returns character map key, value pairs in key-order.
- property max_sequence_size
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
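sequence_size (and vector_size) look only at the first sequence, whereas max_sequence_size scans all of them, so the two can differ when sequences have unequal lengths. A dict-based sketch with hypothetical function names follows. Returning 0 for an empty matrix is an assumption of the sketch.

```python
def sequence_size(matrix):
    """Sketch of sequence_size / vector_size: length of the first sequence."""
    first = next(iter(matrix.values()), None)
    return len(first) if first is not None else 0


def max_sequence_size(matrix):
    """Sketch of max_sequence_size: length of the longest sequence."""
    return max((len(seq) for seq in matrix.values()), default=0)
```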
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Moves this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called. Each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object, created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If this is False and unify_taxa_by_label is True, then correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of ‘foo’). If is_case_sensitive is True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

    import copy
    from dendropy import Tree, TaxonNamespace

    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace

    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)

    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)

You can also use this method to get a copy of a structure and then move it to a new namespace:

    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
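The interplay of unify_taxa_by_label and case sensitivity in the ‘Foo’/‘FOO’/‘FoO’ example above can be modelled with a small mapping routine. Taxon and migrate are hypothetical stand-ins, lists play the namespaces, and the case_sensitive argument mimics the is_case_sensitive setting. taxon_mapping_memo is not modelled.

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def migrate(taxa, unify_taxa_by_label=True, case_sensitive=False):
    """Map each old Taxon to a Taxon in a fresh (list) namespace."""
    new_namespace, mapping, by_label = [], {}, {}
    for old in taxa:
        if old in mapping:
            continue
        key = old.label if case_sensitive else old.label.lower()
        if unify_taxa_by_label and key in by_label:
            new = by_label[key]            # reuse the Taxon for this label
        else:
            new = Taxon(old.label)         # same label, new object
            new_namespace.append(new)
            by_label.setdefault(key, new)
        mapping[old] = new
    return new_namespace, mapping
```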
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
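Label-unique character subsets can be modelled with a dict keyed by label. CharacterSubset here is a hypothetical stand-in, and raising ValueError is this sketch's choice, since the description says only that an error is raised.

```python
class CharacterSubset:
    """Hypothetical stand-in: a labelled set of 0-based column indices."""

    def __init__(self, label, character_indices):
        self.label = label
        self.character_indices = set(character_indices)


def add_character_subset(subsets, char_subset):
    """Sketch of add_character_subset(): labels must be unique."""
    if char_subset.label in subsets:
        raise ValueError("character subset %r already exists" % char_subset.label)
    subsets[char_subset.label] = char_subset


def new_character_subset(subsets, label, character_indices):
    """Sketch of new_character_subset(): build, register, and return a subset."""
    char_subset = CharacterSubset(label, character_indices)
    add_character_subset(subsets, char_subset)
    return char_subset
```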
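- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with the Taxon taxon, and populates it with the elements of the values argument (if given).
- Parameters:
taxon (Taxon) – The Taxon with which the new sequence will be associated.
values (iterable, optional) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with the Taxon taxon.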
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size. If size is not specified, sequences are padded to the length of the longest sequence. This is a combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Removes all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
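purge_taxon_namespace and update_taxon_namespace are complementary: one drops namespace taxa that the matrix does not use, the other adds taxa that the matrix uses but the namespace lacks. The sketch models this with a list of labels as the namespace and a dict as the matrix. Both functions are hypothetical models, and only taxa that key a sequence count as "associated" here.

```python
def purge_taxon_namespace(namespace, matrix):
    """Sketch: drop namespace taxa that have no sequence in the matrix."""
    used = set(matrix)
    namespace[:] = [taxon for taxon in namespace if taxon in used]


def update_taxon_namespace(namespace, matrix):
    """Sketch: add taxa referenced by the matrix but missing from the namespace."""
    for taxon in matrix:
        if taxon not in namespace:
            namespace.append(taxon)
```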
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remove_sequences(taxa)
Removes sequences associated with the Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
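The only behavioural difference between remove_sequences and discard_sequences is how a taxon without a sequence is treated. That difference mirrors Python's del versus dict.pop with a default, as in this hypothetical dict-based model.

```python
def remove_sequences(matrix, taxa):
    """Sketch: raise KeyError if any taxon has no sequence."""
    for taxon in taxa:
        del matrix[taxon]


def discard_sequences(matrix, taxa):
    """Sketch: silently skip taxa without a sequence."""
    for taxon in taxa:
        matrix.pop(taxon, None)
```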
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level 1. Taxon-namespace-scoped copy: all member objects are full, independent instances, except for TaxonNamespace and Taxon objects, which are preserved as references.
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
- update_taxon_namespace()
Adds all Taxon objects in self that are not in self.taxon_namespace to self.taxon_namespace.
- values()
Iterates over the values (i.e., sequences) in this matrix.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted. Supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples

    # d is an existing data object, e.g., a ContinuousCharacterMatrix.
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.
DnaCharacterMatrix: DNA Data
- class dendropy.datamodel.charmatrixmodel.DnaCharacterMatrix(*args, **kwargs)
Specializes CharacterMatrix for DNA data.
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, it is taken to be the index of a Taxon object in self.taxon_namespace. If a string, it is taken to be the label of a Taxon object in self.taxon_namespace. Otherwise, it is assumed to be a Taxon instance. In all cases, the Taxon object must already be defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, it is taken to be the index of a Taxon object in self.taxon_namespace. If a string, it is taken to be the label of a Taxon object in self.taxon_namespace. Otherwise, it is assumed to be a Taxon instance. In all cases, the Taxon object must already be defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – The sequence associated with the Taxon instance referenced by key.
- __iter__()
Returns an iterator over the character map’s ordered keys.
- __len__()
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)¶
Assigns the sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
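The key-resolution rule shared by __delitem__(), __getitem__() and __setitem__() (integer index, string label, or Taxon instance) can be illustrated with a small plain-Python model. This is a sketch, not DendroPy's implementation. The Taxon class and the resolve_key() and get_sequence() helpers are hypothetical, lists stand in for CharacterDataSequence objects, and raising KeyError for an unknown label is an assumption of the sketch.

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon: only carries a label."""

    def __init__(self, label):
        self.label = label


def resolve_key(key, taxa):
    """Map an index, a label, or a Taxon to a Taxon in the ordered list `taxa`."""
    if isinstance(key, int):
        return taxa[key]  # integer: position in the taxon namespace
    if isinstance(key, str):
        for taxon in taxa:  # string: first taxon with a matching label
            if taxon.label == key:
                return taxon
        raise KeyError(key)  # assumption of this sketch
    if any(key is taxon for taxon in taxa):  # otherwise: the Taxon itself
        return key
    raise KeyError(key)


def get_sequence(matrix, key, taxa):
    """Model of __getitem__: an empty sequence is created on first access."""
    taxon = resolve_key(key, taxa)
    return matrix.setdefault(taxon, [])
```

All three key forms reach the same row. For example, after `get_sequence(matrix, "t2", taxa).extend("ACGT")`, both `get_sequence(matrix, 1, taxa)` and `get_sequence(matrix, taxa[1], taxa)` return that same list.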
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow copy.
All other sequences will be ignored.
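The following plain-Python model gives a rough illustration of add_sequences(). Dictionaries stand in for matrices and string labels stand in for Taxon objects. The function is hypothetical and only mirrors the rules in the notes above: taxa missing from self are added as shallow copies, and everything else is ignored.

```python
import copy


def add_sequences(self_map, other_map):
    """Copy in sequences only for taxa that self_map does not have yet."""
    for taxon, seq in other_map.items():
        if taxon not in self_map:
            # Shallow copy: a new list object holding the same state objects.
            self_map[taxon] = copy.copy(seq)
        # Taxa already present in self_map are left untouched.
```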
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of DnaCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep copy: All objects are cloned.
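The three depth levels can be sketched with the standard copy module on a toy matrix: a dict from a hypothetical Taxon stand-in to a list of states. The model ignores annotations. It only shows which objects are shared and which are duplicated at each depth.

```python
import copy


class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def clone(matrix, depth=1):
    """Toy model of the clone() depth levels (annotations not modeled)."""
    if depth == 0:
        return dict(matrix)  # new container, same sequence objects
    if depth == 1:
        # Sequences duplicated; Taxon objects shared by reference.
        return {taxon: list(seq) for taxon, seq in matrix.items()}
    return copy.deepcopy(matrix)  # everything duplicated, taxa included
```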
- coerce_values(values)¶
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. It should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list) – List of converted values.
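The sketch below shows the kind of conversion an override of coerce_values() performs for a fixed-alphabet matrix. StateIdentity here is a hypothetical stand-in class and DNA_STATES a hypothetical symbol table. Raising ValueError for an unknown symbol is a choice of this sketch. Values that are neither states nor strings raise TypeError, in line with the description above.

```python
class StateIdentity:
    """Hypothetical stand-in for a DendroPy state object."""

    def __init__(self, symbol):
        self.symbol = symbol


# One shared state object per symbol, as in a fixed state alphabet.
DNA_STATES = {symbol: StateIdentity(symbol) for symbol in "ACGT-?N"}


def coerce_values(values, alphabet=DNA_STATES):
    """Dereference string symbols to the alphabet's state objects."""
    coerced = []
    for value in values:
        if isinstance(value, StateIdentity):
            coerced.append(value)  # already the right type: keep as-is
        elif isinstance(value, str):
            if value not in alphabet:
                raise ValueError(f"unknown symbol: {value!r}")
            coerced.append(alphabet[value])
        else:
            raise TypeError(f"cannot convert {value!r} to a state")
    return coerced
```

Because every symbol maps to one shared object, two sequences containing "A" end up referencing the same state instance, which is what makes identity-based comparison of states possible.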
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, char_matrices. All the CharacterMatrix objects in the list must be of the same type and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
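The plain-Python sketch below shows what concatenation produces: combined rows, plus one 0-based column-index range per input that plays the role of the character subsets. The function is hypothetical and matrices are dicts keyed by taxon label. It reads the length requirement as each input alignment being rectangular (all of its sequences equally long). Subset labels are not modeled.

```python
def concatenate(matrices):
    """Concatenate dict-based alignments column-wise, recording subsets."""
    taxa = set(matrices[0])
    for m in matrices[1:]:
        if set(m) != taxa:
            raise ValueError("all taxa must be present in all alignments")
    combined = {taxon: [] for taxon in matrices[0]}
    subsets = []
    offset = 0
    for m in matrices:
        lengths = {len(seq) for seq in m.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must be equally long")
        (length,) = lengths
        for taxon, seq in m.items():
            combined[taxon].extend(seq)
        # Columns contributed by this alignment, as 0-based indices.
        subsets.append(list(range(offset, offset + length)))
        offset += length
    return combined, subsets
```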
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, then an empty dictionary should be passed instead.
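The remapping performed through attribute_object_mapper can be modeled in a few lines. Annotation and copy_annotations() below are hypothetical stand-ins, not DendroPy classes. In the model, a bound annotation stores (owner, attr_name) in _value and resolves its value with getattr(*_value), and only the owners of bound annotations are remapped.

```python
import copy


class Annotation:
    """Hypothetical stand-in; bound annotations store (owner, attr_name)."""

    def __init__(self, name, value, is_attribute=False):
        self.name = name
        self._value = value
        self.is_attribute = is_attribute

    @property
    def value(self):
        if self.is_attribute:
            return getattr(*self._value)  # getattr(owner, attr_name)
        return self._value


def copy_annotations(annotations, other, new_owner, attribute_object_mapper=None):
    """Copy annotations, re-pointing bound owners found in the mapper."""
    if attribute_object_mapper is None:
        # Default: references to `other` are remapped to the new owner.
        attribute_object_mapper = {id(other): new_owner}
    copies = []
    for annotation in annotations:
        duplicate = copy.copy(annotation)
        if annotation.is_attribute:
            owner, attr_name = annotation._value
            owner = attribute_object_mapper.get(id(owner), owner)
            duplicate._value = (owner, attr_name)
        else:
            duplicate._value = copy.deepcopy(annotation._value)
        copies.append(duplicate)
    return copies
```

Passing `attribute_object_mapper={}` leaves bound annotations pointing at the original owner. This matches the "empty dictionary" case described above.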
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
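Column extraction as described for export_character_indices() amounts to picking the same 0-based positions from every row. Below is a minimal dict-based sketch using a hypothetical function. Labels stand in for Taxon objects, which the real method keeps referencing rather than copying.

```python
def export_character_indices(matrix, indices):
    """Return a new dict keeping only the given 0-based columns, in order."""
    indices = list(indices)  # allow any iterable, e.g. range() or a subset
    return {taxon: [seq[i] for i in indices] for taxon, seq in matrix.items()}
```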
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
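extend_sequences() and extend_matrix() differ only in how they treat taxa that are missing from self. The sketch below models both with one hypothetical function over dicts. Treating extend_matrix() as the is_add_new_sequences=True case is an assumption based on the two descriptions above, because the flag itself is not documented here.

```python
def extend_sequences(self_map, other_map, is_add_new_sequences=False):
    """Append other's characters to matching rows; optionally add new rows."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon].extend(seq)  # shared taxon: append characters
        elif is_add_new_sequences:
            self_map[taxon] = list(seq)  # new taxon: add (extend_matrix case)
        # Otherwise the sequence is ignored (plain extend_sequences case).
```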
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
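The padding rule can be sketched on dict-of-lists rows. The fill() function below is a hypothetical model, not DendroPy code. For comparison, pack() first creates empty rows for taxa without a sequence (fill_taxa()) and then pads in the same way.

```python
def fill(matrix, value, size=None, append=True):
    """Pad every row with `value` up to `size` (default: longest row)."""
    if size is None:
        size = max((len(seq) for seq in matrix.values()), default=0)
    for seq in matrix.values():
        padding = [value] * (size - len(seq))  # empty if row already long enough
        if append:
            seq.extend(padding)  # pad at the end
        else:
            seq[:0] = padding  # pad at the front
```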
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, with $m = \lceil N/2 \rceil$, where $f_i$ is the number of sites at which the minor allele is present in $i$ of the sampled chromosomes.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (the default), the vector length will be $\lceil N/2 \rceil$, where $N$ is the number of taxa. If True, the vector length will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
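The textbook computation behind a folded spectrum can be sketched for aligned, biallelic sites: count the minor-allele frequency in each column and tally the results. This illustrates the definition only. DendroPy's own vector-length convention and its treatment of gaps or ambiguous states may differ, and folded_sfs() is a hypothetical helper.

```python
from collections import Counter


def folded_sfs(sequences):
    """Folded SFS for equal-length sequences with at most two states per site."""
    n = len(sequences)
    sfs = [0] * (n // 2 + 1)  # minor-allele counts run from 0 to floor(n/2)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 2:
            raise ValueError("this sketch only handles biallelic sites")
        minor = min(counts.values()) if len(counts) == 2 else 0
        sfs[minor] += 1
    return sfs
```

For example, `folded_sfs(["AAAC", "AATC", "ATTC", "TTTC"])` returns `[1, 2, 1]`. The fourth column is monomorphic, the first and third columns have a minor allele in one sequence, and the second column is split 2:2.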
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
```python
from dendropy import DnaCharacterMatrix

d = {
    "s1" : "TCCAA",
    "s2" : "TGCAA",
    "s3" : "TG-AA",
}
dna = DnaCharacterMatrix.from_dict(d)
```
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using the keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
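The label-dereferencing rules above, including case_sensitive_taxon_labels, can be modeled as follows. Taxon and from_dict_model() are hypothetical stand-ins. A plain list plays the role of the taxon namespace, lists replace CharacterDataSequence objects, and no coercion of values is modeled.

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def from_dict_model(source_dict, taxa, case_sensitive_taxon_labels=False):
    """Resolve keys to Taxon objects in `taxa`, creating new ones as needed."""

    def same(a, b):
        return a == b if case_sensitive_taxon_labels else a.lower() == b.lower()

    matrix = {}
    for key, values in source_dict.items():
        if isinstance(key, Taxon):
            taxon = key
            if all(taxon is not t for t in taxa):
                taxa.append(taxon)  # Taxon keys are added if not yet present
        else:
            # First existing taxon with a matching label, else a new one.
            taxon = next((t for t in taxa if same(t.label, key)), None)
            if taxon is None:
                taxon = Taxon(key)
                taxa.append(taxon)
        matrix[taxon] = list(values)
    return matrix
```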
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
```python
import dendropy

dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
```
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then correspondence between Taxon objects in the old and new namespaces will be established by case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of ‘foo’). If True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
```python
from dendropy import Tree

# Assumes existing objects: a tree t1 and another data object some_other_data.
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
```
You can also use this method to get a copy of a structure and then move it to a new namespace:
```python
import copy

from dendropy import Tree, TaxonNamespace

t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
```
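The small hypothetical helper below shows how unify_taxa_by_label and case sensitivity together decide how many new Taxon objects are created. It returns the old-to-new mapping, which is the role taxon_mapping_memo plays. It sketches the rules described above and is not DendroPy's reconstruct_taxon_namespace().

```python
class Taxon:
    """Hypothetical stand-in for dendropy.Taxon."""

    def __init__(self, label):
        self.label = label


def migrate(old_taxa, unify_taxa_by_label=True, case_sensitive=False):
    """Return {old Taxon: new Taxon} for a fresh, independent namespace."""
    mapping, by_label = {}, {}
    for old in old_taxa:
        key = old.label if case_sensitive else old.label.lower()
        if unify_taxa_by_label and key in by_label:
            mapping[old] = by_label[key]  # reuse the taxon for this label
        else:
            new = Taxon(old.label)  # same label, new object
            by_label.setdefault(key, new)
            mapping[old] = new
    return mapping
```

With the four labels ‘Foo’, ‘Foo’, ‘FOO’ and ‘FoO’, the default settings yield one new taxon. Case-sensitive matching yields three, and `unify_taxa_by_label=False` yields four.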
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
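- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Parameters:
taxon (Taxon) – The Taxon instance with which the new sequence will be associated.
values (iterable or None) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.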
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
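The difference between discard_sequences() and remove_sequences() mirrors set.discard() versus set.remove() in Python. Here is a dict-based sketch with hypothetical helper functions:

```python
def discard_sequences(matrix, taxa):
    """Drop sequences for `taxa`, silently skipping taxa with none."""
    for taxon in taxa:
        matrix.pop(taxon, None)


def remove_sequences(matrix, taxa):
    """Drop sequences for `taxa`; KeyError if any taxon has no sequence."""
    for taxon in taxa:
        del matrix[taxon]
```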
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
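Because sequence_size inspects only the first sequence, it can understate the width of a ragged matrix, whereas max_sequence_size scans all rows. The following plain-Python illustration applies the two definitions to a hypothetical dict-based matrix:

```python
matrix = {"t1": list("ACGT"), "t2": list("ACGTAC")}

# sequence_size: length of the first sequence only (dicts keep insertion order).
sequence_size = len(next(iter(matrix.values())))

# max_sequence_size: length of the longest sequence.
max_sequence_size = max(len(seq) for seq in matrix.values())
```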
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
```
T1 AGN
T2 C-T
T3 GC?
```
Return with gaps_as_missing==True:
```
{
    <T1> : [ set([0]), set([2]),       set([0,1,2,3]) ],
    <T2> : [ set([1]), set([0,1,2,3]), set([3])       ],
    <T3> : [ set([2]), set([1]),       set([0,1,2,3]) ],
}
```
Return with gaps_as_missing==False:
```
{
    <T1> : [ set([0]), set([2]), set([0,1,2,3])   ],
    <T2> : [ set([1]), set([4]), set([3])         ],
    <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
}
```
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., the set of all bases.
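The DNA example above can be reproduced with a small model that hard-codes the state indexing it uses (A=0, C=1, G=2, T=3, gap=4). The functions are hypothetical sketches. They cover only A, C, G, T, N, '-' and '?', not the other IUPAC ambiguity codes, and they do not model the gap_state or no_data_state arguments.

```python
BASES = "ACGT"
GAP_INDEX = len(BASES)  # 4: extra state used when gaps are fundamental


def state_set(symbol, gaps_as_missing=True):
    """Set of fundamental state indices for one DNA symbol."""
    all_bases = set(range(len(BASES)))  # fresh set on every call
    if symbol in BASES:
        return {BASES.index(symbol)}
    if symbol == "N":
        return all_bases  # never includes the gap state
    if symbol == "-":
        return all_bases if gaps_as_missing else {GAP_INDEX}
    if symbol == "?":
        return all_bases if gaps_as_missing else all_bases | {GAP_INDEX}
    raise ValueError(f"symbol not covered by this sketch: {symbol!r}")


def taxon_state_sets_map(matrix, char_indices=None, gaps_as_missing=True):
    """Map each taxon to a list of state sets, one per selected column."""
    result = {}
    for taxon, seq in matrix.items():
        indices = range(len(seq)) if char_indices is None else char_indices
        result[taxon] = [state_set(seq[i], gaps_as_missing) for i in indices]
    return result
```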
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
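replace_sequences() and update_sequences() differ only in whether taxa absent from self are added. Both are modeled below over dicts, with labels standing in for Taxon objects. The functions are hypothetical, and using a shallow copy for added sequences is an assumption, since the notes above state it only for replaced ones.

```python
import copy


def replace_sequences(self_map, other_map):
    """Overwrite rows for shared taxa only; ignore taxa new to self_map."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon] = copy.copy(seq)  # shallow copy of other's row


def update_sequences(self_map, other_map):
    """Overwrite rows for shared taxa and add rows for new taxa."""
    for taxon, seq in other_map.items():
        self_map[taxon] = copy.copy(seq)
```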
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
```python
# Assumes d is an existing character matrix object.
# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
```
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
RnaCharacterMatrix: RNA Data¶
- class dendropy.datamodel.charmatrixmodel.RnaCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for RNA data.
- __delitem__(key)¶
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)¶
Assigns the sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already be defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of RnaCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of
self
.- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for :attr:
annotation_set
of top-level object and memberAnnotation
objects: these are full, independent instances (though any complex objects in thevalue
field ofAnnotation
objects are also just references).1: taxon-namespace-scoped copy: All member objects are full independent instances, except for
TaxonNamespace
andTaxon
instances: these are references.2: Exhaustive deep-copy: all objects are cloned.
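One way to picture the three depth levels is with Python's copy module. The sketch below is an assumption about how such levels could be implemented, not DendroPy's code; the Taxon and Matrix classes and the clone() function are hypothetical. Level 1 is obtained by pre-seeding the deepcopy memo dictionary so that the namespace and taxa map to themselves and are therefore shared rather than cloned.

```python
import copy


class Taxon:
    """Hypothetical stand-in for a taxon."""
    def __init__(self, label):
        self.label = label


class Matrix:
    """Hypothetical stand-in: a namespace (list of taxa) plus taxon -> sequence."""
    def __init__(self, taxa, seqs):
        self.taxa = taxa
        self.seqs = seqs


def clone(matrix, depth=1):
    if depth == 0:
        return copy.copy(matrix)                  # new object, shared members
    if depth == 1:
        # Pre-seed the memo so the namespace and every taxon "copy" to themselves.
        memo = {id(matrix.taxa): matrix.taxa}
        memo.update({id(t): t for t in matrix.taxa})
        return copy.deepcopy(matrix, memo)        # sequences cloned, taxa shared
    return copy.deepcopy(matrix)                  # everything cloned


t = Taxon("a")
m = Matrix([t], {t: [1, 2]})
```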
- coerce_values(values)
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. Derived classes should override it to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
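A minimal sketch of the kind of conversion a fixed-alphabet override might perform, assuming a hypothetical DNA_STATES table that maps symbols to integer state indices in place of real StateIdentity objects. Unrecognized symbols raise ValueError here; that choice is an assumption of the sketch.

```python
DNA_STATES = {"A": 0, "C": 1, "G": 2, "T": 3}   # hypothetical symbol -> state table


def coerce_values(values):
    """Convert symbols (e.g. the characters of a string) to DNA state indices."""
    coerced = []
    for value in values:
        symbol = str(value).upper()
        if symbol not in DNA_STATES:
            raise ValueError(f"unrecognized DNA symbol: {value!r}")
        coerced.append(DNA_STATES[symbol])
    return coerced


states = coerce_values("acgT")
```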
- classmethod concatenate(char_matrices)
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type and share the same TaxonNamespace reference. All taxa must be present in all alignments, and the sequences within each alignment must all be of the same length. Component parts will be recorded as character subsets.
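The bookkeeping behind concatenation can be sketched with dictionaries of lists. This hypothetical concatenate() helper (not DendroPy's classmethod) checks the taxon and length constraints described above and records each input block as a range of 0-based column indices, standing in for a character subset.

```python
def concatenate(matrices):
    """Join dicts of taxon -> sequence column-wise; return (combined, subsets)."""
    taxa = set(matrices[0])
    combined = {taxon: [] for taxon in matrices[0]}
    subsets, offset = [], 0
    for i, m in enumerate(matrices):
        if set(m) != taxa:
            raise ValueError(f"matrix {i} does not contain exactly the same taxa")
        lengths = {len(seq) for seq in m.values()}
        if len(lengths) > 1:
            raise ValueError(f"sequences in matrix {i} differ in length")
        n = lengths.pop() if lengths else 0
        for taxon, seq in m.items():
            combined[taxon].extend(seq)
        subsets.append(range(offset, offset + n))   # columns contributed by matrix i
        offset += n
    return combined, subsets


a = {"x": [1, 2], "y": [3, 4]}
b = {"x": [5], "y": [6]}
combined, subsets = concatenate([a, b])
```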
- classmethod concatenate_from_paths(paths, schema, **kwargs)
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper, in which case the reference is remapped as described for that parameter.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no remapping at all is desired, then an empty dictionary should be passed instead.
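The remapping rule for dynamic bound-attribute annotations can be sketched as follows. Annotation, Item, and copy_bound_annotation() are hypothetical stand-ins written for illustration; only the (obj, attr_name) tuple and the getattr(*self._value) lookup come from the description above.

```python
class Annotation:
    """Hypothetical dynamic bound-attribute annotation: _value is (obj, attr_name)."""
    def __init__(self, obj, attr_name):
        self._value = (obj, attr_name)

    @property
    def value(self):
        return getattr(*self._value)     # same lookup the text describes


def copy_bound_annotation(annotation, attribute_object_mapper):
    """Copy an annotation, rebinding its owner if id(owner) is in the mapper."""
    owner, attr_name = annotation._value
    new_owner = attribute_object_mapper.get(id(owner), owner)
    return Annotation(new_owner, attr_name)


class Item:
    def __init__(self, label):
        self.label = label


original, duplicate = Item("original"), Item("duplicate")
ann = Annotation(original, "label")
remapped = copy_bound_annotation(ann, {id(original): duplicate})   # value is "duplicate"
kept = copy_bound_annotation(ann, {})                              # value is "original"
```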
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns a description of the object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
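Column extraction by 0-based index amounts to picking the same positions from every sequence. In this hypothetical sketch a plain dictionary stands in for the matrix, the keys are reused unchanged (mirroring the shared taxon set), and columns are returned in the order given, which is an assumption.

```python
def export_character_indices(seqs, indices):
    """Return a new dict with only the given 0-based columns of every sequence."""
    return {taxon: [seq[i] for i in indices] for taxon, seq in seqs.items()}


m = {"t1": list("ACGTA"), "t2": list("AC-TT")}
subset = export_character_indices(m, [0, 2, 4])
```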
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self. other_matrix must have the same TaxonNamespace as self. Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self. Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self. other_matrix must have the same TaxonNamespace as self. Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self. All other sequences will be ignored.
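The difference between extend_sequences() and extend_matrix() can be sketched with one hypothetical helper and a flag. The add_new parameter below is invented for illustration; it models extend_matrix() when True and the documented extend_sequences() behavior when False, and is not claimed to match is_add_new_sequences.

```python
def extend_sequences(self_seqs, other_seqs, add_new=False):
    """Append other's characters to shared taxa; optionally add taxa only in other."""
    for taxon, seq in other_seqs.items():
        if taxon in self_seqs:
            self_seqs[taxon].extend(seq)      # append characters to the existing sequence
        elif add_new:
            self_seqs[taxon] = list(seq)      # extend_matrix()-style: add the new taxon
        # otherwise the sequence is ignored, as extend_sequences() documents
    return self_seqs


mine = {"t1": ["A", "C"]}
theirs = {"t1": ["G"], "t2": ["T"]}
extend_sequences(mine, theirs)
```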
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
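A minimal sketch of the padding rule, using a dictionary of lists as a hypothetical stand-in for the matrix. Sequences already at least size long are left unchanged here, which is an assumption, since the description above does not say what happens to them.

```python
def fill(seqs, value, size=None, append=True):
    """Pad every sequence with value up to size (default: longest sequence)."""
    if size is None:
        size = max((len(s) for s in seqs.values()), default=0)
    for s in seqs.values():
        padding = [value] * max(0, size - len(s))   # nothing to add if already long enough
        if append:
            s.extend(padding)
        else:
            s[:0] = padding                          # insert at the front
    return seqs


m = {"a": [1], "b": [1, 2, 3]}
fill(m, 0)
```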
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, with $m = \lceil N/2 \rceil$, where $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil N/2 \rceil$, where $N$ is the number of taxa. If True, then the length of the vector will be the number of taxa + 1, with the first element being the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
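The relationship between the unfolded and folded spectra can be worked through on a toy sample. This sketch uses the textbook definition (minor-allele count min(i, N - i), giving indices 0 to floor(N/2)) on hypothetical 0/1 site data, with 1 marking a derived allele; the vector lengths DendroPy returns, as described above, may differ.

```python
def site_frequency_spectra(sites):
    """sites: one list of 0/1 alleles per site (1 = derived); returns (unfolded, folded)."""
    n = len(sites[0])                       # N, the number of sampled chromosomes
    unfolded = [0] * (n + 1)                # f_0 .. f_N
    for site in sites:
        unfolded[sum(site)] += 1            # i = number of derived alleles at this site
    folded = [0] * (n // 2 + 1)             # minor-allele counts 0 .. floor(N/2)
    for i, count in enumerate(unfolded):
        folded[min(i, n - i)] += count
    return unfolded, folded


sites = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
unfolded, folded = site_frequency_spectra(sites)
# unfolded == [1, 1, 1, 1, 0]; folded == [1, 2, 1]
```

Here the sites with 1 and 3 derived alleles both have a minor-allele count of 1, so they collapse into the same folded bin.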
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be either strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices; the coerce_values() method will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:

    from dendropy import DnaCharacterMatrix

    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)

Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be either strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of the format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of the character block or matrix in the source to be parsed. If not specified, then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed; supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:

    import dendropy

    dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return a new object of this class from the file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return a new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return a new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()
Returns character map key, value pairs in key order.
- property max_sequence_size
Maximum number of characters across all sequences in the matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in the matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)

You can also use this method to get a copy of a structure and then move it to a new namespace:

    import copy
    from dendropy import Tree, TaxonNamespace

    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)

See also
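The case-insensitive unification described above can be modeled as follows. Taxon and migrate_taxa() are hypothetical and only mimic the label-matching rule; in this sketch the new taxon keeps the first casing variant encountered, which is an assumption.

```python
class Taxon:
    """Hypothetical stand-in for a taxon; only carries a label."""
    def __init__(self, label):
        self.label = label


def migrate_taxa(old_taxa, case_sensitive=False, unify_taxa_by_label=True):
    """Create new taxa for old ones; return (new_namespace, old -> new mapping)."""
    new_namespace, mapping, by_label = [], {}, {}
    for old in old_taxa:
        key = old.label if case_sensitive else old.label.lower()
        if unify_taxa_by_label and key in by_label:
            mapping[old] = by_label[key]       # reuse the taxon created for this label
            continue
        new = Taxon(old.label)                 # keeps the first casing variant seen
        new_namespace.append(new)
        by_label[key] = new
        mapping[old] = new
    return new_namespace, mapping


old = [Taxon(s) for s in ("Foo", "Foo", "FOO", "FoO", "Bar")]
merged, mapping = migrate_taxa(old)
# len(merged) == 2: one taxon for all "foo" variants, one for "Bar"
```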
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
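Remapping by symbol boils down to a symbol lookup in the target alphabet. This sketch uses a hypothetical State dataclass in place of DendroPy's state alphabet elements; a missing symbol surfaces as KeyError, matching the behavior described for remap_to_state_alphabet_by_symbol().

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """Hypothetical state-alphabet element: a symbol plus the alphabet it belongs to."""
    symbol: str
    alphabet: str


def remap_by_symbol(sequence, target_states):
    """Replace each state with the target-alphabet state that has the same symbol."""
    lookup = {state.symbol: state for state in target_states}
    return [lookup[state.symbol] for state in sequence]   # KeyError if a symbol is missing


old = [State("0", "morph"), State("1", "morph")]
target = [State("0", "standard"), State("1", "standard"), State("?", "standard")]
remapped = remap_by_symbol(old, target)
```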
- remove_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
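The contrast between remove_sequences() and discard_sequences() mirrors the difference between del and dict.pop() with a default. The two helpers below are hypothetical dictionary-based sketches, not DendroPy's methods.

```python
def discard_sequences(seqs, taxa):
    """Remove sequences for the given taxa, silently skipping absent ones."""
    for taxon in taxa:
        seqs.pop(taxon, None)


def remove_sequences(seqs, taxa):
    """Remove sequences for the given taxa; KeyError if any taxon has no sequence."""
    for taxon in taxa:
        del seqs[taxon]


m = {"a": [1], "b": [2]}
discard_sequences(m, ["a", "z"])     # "z" has no sequence and is ignored
```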
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self. other_matrix must have the same TaxonNamespace as self. Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix. All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:

    T1 AGN
    T2 C-T
    T3 GC?

Return with gaps_as_missing==True:

    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }

Return with gaps_as_missing==False:

    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }

Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., that of all the bases.
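The DNA example above can be reproduced with a small lookup. This is a sketch under the assumptions visible in that example (A, C, G, T as states 0 to 3 and the gap as state 4); taxon_state_sets_map() here is a hypothetical helper over a dictionary of strings, not the DendroPy method.

```python
def taxon_state_sets_map(matrix, gaps_as_missing=True):
    """Map each label to a list of sets of state indices (A=0, C=1, G=2, T=3, gap=4)."""
    bases = {0, 1, 2, 3}
    result = {}
    for taxon, seq in matrix.items():
        row = []
        for symbol in seq.upper():
            if symbol in "ACGT":
                row.append({"ACGT".index(symbol)})
            elif symbol == "N":
                row.append(set(bases))                     # any base, never the gap
            elif symbol == "-":
                row.append(set(bases) if gaps_as_missing else {4})
            elif symbol == "?":
                row.append(set(bases) if gaps_as_missing else bases | {4})
            else:
                raise ValueError(f"unsupported symbol {symbol!r}")
        result[taxon] = row
    return result


example = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
as_missing = taxon_state_sets_map(example)
as_state = taxon_state_sets_map(example, gaps_as_missing=False)
```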
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self. other_matrix must have the same TaxonNamespace as self. Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self. Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow copy of the corresponding sequence from other_matrix.
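The difference between replace_sequences() and update_sequences() can be sketched with plain dictionaries. Both helpers below are hypothetical; copy.copy() stands in for the shallow copy described in the notes.

```python
import copy


def replace_sequences(self_seqs, other_seqs):
    """Replace sequences only for taxa present in both mappings."""
    for taxon in self_seqs.keys() & other_seqs.keys():
        self_seqs[taxon] = copy.copy(other_seqs[taxon])


def update_sequences(self_seqs, other_seqs):
    """Replace shared taxa and add taxa that appear only in other_seqs."""
    for taxon, seq in other_seqs.items():
        self_seqs[taxon] = copy.copy(seq)


a = {"t1": [1], "t2": [2]}
b = {"t2": [9], "t3": [7]}
replace_sequences(a, b)        # a == {"t1": [1], "t2": [9]}
```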
- update_taxon_namespace()
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()
Iterates values (i.e., sequences) in this matrix.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of the data format. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted; supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples

    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")

- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.
ProteinCharacterMatrix: Protein (Amino Acid) Data
- class dendropy.datamodel.charmatrixmodel.ProteinCharacterMatrix(*args, **kwargs)
Specializes CharacterMatrix for protein or amino acid data.
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __iter__()
Returns an iterator over the character map’s ordered keys.
- __len__()
Number of sequences in the matrix.
- Returns:
n (integer) – Number of sequences in the matrix.
- __setitem__(key, values)
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self. other_matrix must have the same TaxonNamespace as self. Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow copy. All other sequences will be ignored.
- as_string(schema, **kwargs)
Composes and returns a string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of the data format. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted; supported argument names and values depend on the schema specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type
alias of ProteinCharacterDataSequence
- clear()
Removes all sequences from the matrix.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep copy: all objects are cloned.
- coerce_values(values)
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. Derived classes should override it to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, all all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, then an empty dictionary should be passed instead.
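To make the remapping concrete, the sketch below models only the _value tuples of dynamic bound-attribute annotations, as (obj, attr_name) pairs resolved with getattr(*value). Owner and remap_bound_values are hypothetical names; this is not how DendroPy stores or copies Annotation objects internally.

```python
class Owner:
    """Minimal stand-in for an object that owns bound-attribute annotations."""

    def __init__(self, label):
        self.label = label


def remap_bound_values(bound_values, other, new_owner, attribute_object_mapper=None):
    """Copy (obj, attr_name) tuples, re-pointing owners whose id is in the mapper."""
    if attribute_object_mapper is None:
        # Default described above: references to `other` are remapped to the new owner.
        attribute_object_mapper = {id(other): new_owner}
    return [
        (attribute_object_mapper.get(id(obj), obj), attr_name)
        for obj, attr_name in bound_values
    ]


source, target, unrelated = Owner("source"), Owner("target"), Owner("unrelated")
bound = [(source, "label"), (unrelated, "label")]

# Each annotation value is resolved as getattr(*value).
print([getattr(*v) for v in remap_bound_values(bound, source, target)])
# ['target', 'unrelated']
print([getattr(*v) for v in remap_bound_values(bound, source, target, {})])
# ['source', 'unrelated']
```

Passing an empty dict disables the default remapping, which matches the advice above for cases where no reattribution is wanted.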
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns a description of the object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon namespace.
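A minimal sketch of the column selection, using a plain dict of lists in place of a CharacterMatrix. export_columns is a hypothetical helper, not part of DendroPy; reusing the dict keys loosely mirrors the exported matrix continuing to reference the same taxa.

```python
def export_columns(matrix, indices):
    """Return a new label -> sequence dict keeping only the given 0-based columns.

    The labels (keys) are reused as-is, mirroring how the exported matrix keeps
    referencing the same taxa.
    """
    return {label: [seq[i] for i in indices] for label, seq in matrix.items()}


matrix = {"t1": list("ACGT"), "t2": list("TGCA")}
print(export_columns(matrix, [0, 2]))  # {'t1': ['A', 'G'], 't2': ['T', 'C']}
```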
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset character_subset. Note that this new matrix will still reference the same taxon namespace.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
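The merge rule can be sketched with plain dicts standing in for matrices. extend_sequences_dict is a hypothetical helper; in particular, its handling of is_add_new_sequences=True (copying in sequences for taxa missing from self) is an assumption inferred from the parameter name, which is not documented above.

```python
def extend_sequences_dict(self_matrix, other_matrix, is_add_new_sequences=False):
    """Append other's characters to shared taxa; optionally add unshared taxa."""
    for label, seq in other_matrix.items():
        if label in self_matrix:
            self_matrix[label].extend(seq)      # shared taxon: append characters
        elif is_add_new_sequences:
            self_matrix[label] = list(seq)      # assumption: copy in the new sequence
        # otherwise the sequence is ignored
    return self_matrix


other = {"t1": list("G"), "t2": list("T")}
print(extend_sequences_dict({"t1": list("AC")}, other))
# {'t1': ['A', 'C', 'G']}
print(extend_sequences_dict({"t1": list("AC")}, other, is_add_new_sequences=True))
# {'t1': ['A', 'C', 'G'], 't2': ['T']}
```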
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
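The padding rule can be sketched on a plain dict of lists. fill_dict is a hypothetical helper, not DendroPy code; in this sketch a sequence already at or beyond size is left untouched, which is an assumption since the behaviour for that case is not stated above.

```python
def fill_dict(matrix, value, size=None, append=True):
    """Pad every sequence in a label -> list dict with `value` in place."""
    if size is None:
        size = max((len(seq) for seq in matrix.values()), default=0)
    for seq in matrix.values():
        padding = [value] * (size - len(seq))  # empty if seq is already long enough
        if append:
            seq.extend(padding)
        else:
            seq[:0] = padding                   # insert at the front
    return matrix


m = {"t1": list("AC"), "t2": list("ACGT")}
fill_dict(m, "-")                  # pad to the longest length (4)
fill_dict(m, "?", size=6, append=False)
print({k: "".join(v) for k, v in m.items()})  # {'t1': '??AC--', 't2': '??ACGT'}
```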
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
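The folding step can be sketched for biallelic sites coded '0' (ancestral) and '1' (derived). folded_sfs is a hypothetical helper, not DendroPy's implementation: it sizes the unpadded vector as $\lfloor N/2 \rfloor + 1$ because the minor-allele count at a site cannot exceed $\lfloor N/2 \rfloor$, and DendroPy's own handling of state coding, missing data, and vector length may differ.

```python
def folded_sfs(sequences, is_pad_vector_to_unfolded_length=False):
    """Folded SFS for biallelic sites coded '0' (ancestral) / '1' (derived).

    `sequences` holds one equal-length string per chromosome (taxon).
    """
    n = len(sequences)
    # The minor-allele count at a site can be at most n // 2.
    spectrum = [0] * (n // 2 + 1)
    for site in zip(*sequences):
        derived = site.count("1")
        spectrum[min(derived, n - derived)] += 1
    if is_pad_vector_to_unfolded_length:
        spectrum.extend([0] * (n + 1 - len(spectrum)))
    return spectrum


seqs = ["0110", "0011", "0011", "0000"]
print(folded_sfs(seqs))        # [1, 2, 1]
print(folded_sfs(seqs, True))  # [1, 2, 1, 0, 0]
```

Here the four sites carry 0, 1, 3 and 2 derived alleles, which fold to minor counts 0, 1, 1 and 2.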
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:

```python
from dendropy import DnaCharacterMatrix

d = {
    "s1" : "TCCAA",
    "s2" : "TGCAA",
    "s3" : "TG-AA",
}
dna = DnaCharacterMatrix.from_dict(d)
```

Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
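The key-dereferencing rules above can be sketched as follows, with a list of SimpleTaxon objects standing in for a TaxonNamespace. SimpleTaxon and resolve_key are hypothetical; DendroPy performs this lookup through its own TaxonNamespace machinery.

```python
class SimpleTaxon:
    """Minimal stand-in for a Taxon: just a label."""

    def __init__(self, label):
        self.label = label


def resolve_key(key, namespace, case_sensitive=False):
    """Map a from_dict-style key to a taxon in `namespace` (a list), adding if needed."""
    if isinstance(key, SimpleTaxon):
        if key not in namespace:
            namespace.append(key)
        return key
    for taxon in namespace:
        if case_sensitive:
            matched = taxon.label == key
        else:
            matched = taxon.label.lower() == key.lower()
        if matched:
            return taxon  # the first existing match wins
    taxon = SimpleTaxon(key)
    namespace.append(taxon)
    return taxon


ns = []
a = resolve_key("s1", ns)
b = resolve_key("S1", ns)                       # case ignored: same taxon
c = resolve_key("S1", ns, case_sensitive=True)  # case respected: new taxon
print(a is b, a is c, len(ns))  # True False 2
```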
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:

```python
import dendropy

dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
```
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return a new object of this class from the file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return a new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return a new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
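The effect of unify_taxa_by_label combined with case-insensitive matching, on the ‘Foo’/‘FOO’/‘FoO’ example above, can be sketched like this. SimpleTaxon and unify_by_label are hypothetical stand-ins; they ignore taxon_mapping_memo, and keeping the first casing variant seen is a choice made for this sketch rather than a documented DendroPy rule.

```python
class SimpleTaxon:
    """Minimal stand-in for a Taxon: just a label."""

    def __init__(self, label):
        self.label = label


def unify_by_label(old_taxa, case_sensitive=False):
    """Map each old taxon to a new one, sharing new taxa between matching labels."""
    new_by_key = {}
    mapping = {}
    for old in old_taxa:
        key = old.label if case_sensitive else old.label.lower()
        if key not in new_by_key:
            # The first casing variant encountered becomes the new label.
            new_by_key[key] = SimpleTaxon(old.label)
        mapping[old] = new_by_key[key]
    return mapping


old = [SimpleTaxon(label) for label in ["Foo", "Foo", "FOO", "FoO"]]
print(len({id(t) for t in unify_by_label(old).values()}))                       # 1
print(len({id(t) for t in unify_by_label(old, case_sensitive=True).values()}))  # 3
```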
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

```python
from dendropy import Tree

# Assumes t1 (a Tree) and some_other_data already exist.
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
```

You can also use this method to get a copy of a structure and then move it to a new namespace:

```python
import copy

from dendropy import TaxonNamespace, Tree

t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
```
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
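Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Parameters:
taxon (Taxon) – The Taxon instance with which the new sequence will be associated.
values (iterable, optional) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.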
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
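Remapping by symbol amounts to looking up each state's symbol in the old alphabet and fetching the state with that symbol from the new one. In the sketch below, remap_by_symbol is a hypothetical helper, alphabets are plain dicts of symbol to state object, and strings stand in for the states of the target alphabet; it does not touch DendroPy's StateAlphabet classes.

```python
def remap_by_symbol(sequence, old_alphabet, new_alphabet):
    """Replace each state with the same-symbol state from new_alphabet.

    Alphabets are dicts of symbol -> state object; KeyError is raised when
    new_alphabet has no state with a needed symbol.
    """
    symbol_of = {id(state): symbol for symbol, state in old_alphabet.items()}
    return [new_alphabet[symbol_of[id(state)]] for state in sequence]


old_alphabet = {"0": object(), "1": object()}
new_alphabet = {"0": "state-0", "1": "state-1", "?": "state-?"}
seq = [old_alphabet["1"], old_alphabet["0"], old_alphabet["1"]]
print(remap_by_symbol(seq, old_alphabet, new_alphabet))  # ['state-1', 'state-0', 'state-1']
```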
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
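The contrast between remove_sequences and discard_sequences mirrors Python's del statement versus dict.pop with a default. The dict-based functions below (remove_from_dict and discard_from_dict) are hypothetical stand-ins for illustration only.

```python
def remove_from_dict(matrix, taxa):
    """Strict removal: KeyError if any taxon has no sequence."""
    for taxon in taxa:
        del matrix[taxon]


def discard_from_dict(matrix, taxa):
    """Lenient removal: taxa without a sequence are skipped silently."""
    for taxon in taxa:
        matrix.pop(taxon, None)


m = {"t1": list("AC"), "t2": list("GT")}
discard_from_dict(m, ["t1", "t9"])   # "t9" is absent, but no error
print(sorted(m))                     # ['t2']
try:
    remove_from_dict(m, ["t9"])
except KeyError as err:
    print("KeyError:", err)          # KeyError: 't9'
```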
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of
CharacterDataSequence
objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full, independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:

```
T1 AGN
T2 C-T
T3 GC?
```

Return with gaps_as_missing==True:

```
{
    <T1> : [ set([0]), set([2]),       set([0,1,2,3]) ],
    <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
    <T3> : [ set([2]), set([1]),       set([0,1,2,3]) ],
}
```

Return with gaps_as_missing==False:

```
{
    <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
    <T2> : [ set([1]), set([4]), set([3]) ],
    <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
}
```

Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., that of all the bases.
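The DNA example above can be reproduced with a short sketch. dna_state_sets is hypothetical: it covers only A, C, G, T, N, ‘-’ and ‘?’, uses the same index convention as the example (A=0, C=1, G=2, T=3, gap=4), and ignores char_indices, gap_state and no_data_state.

```python
BASES = "ACGT"
GAP_STATE = 4  # index of the gap when it is treated as a fundamental state


def dna_state_sets(matrix, gaps_as_missing=True):
    """Map each label to a list of sets of fundamental state indices.

    Handles A, C, G, T, N, '-' and '?' only; other IUPAC codes are not covered.
    """
    all_bases = set(range(len(BASES)))
    result = {}
    for label, seq in matrix.items():
        sets = []
        for symbol in seq:
            if symbol in BASES:
                sets.append({BASES.index(symbol)})
            elif symbol == "N":
                sets.append(set(all_bases))
            elif symbol == "-":
                sets.append(set(all_bases) if gaps_as_missing else {GAP_STATE})
            elif symbol == "?":
                sets.append(set(all_bases) if gaps_as_missing else all_bases | {GAP_STATE})
            else:
                raise ValueError(f"unsupported symbol {symbol!r}")
        result[label] = sets
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
print(dna_state_sets(matrix)["T2"])                         # [{1}, {0, 1, 2, 3}, {3}]
print(dna_state_sets(matrix, gaps_as_missing=False)["T2"])  # [{1}, {4}, {3}]
```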
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
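replace_sequences, update_sequences and add_sequences differ only in what happens to shared and unshared taxa. The hypothetical merge_sequences helper below, operating on plain dicts rather than CharacterMatrix objects, expresses the three policies as two flags; list(seq) plays the role of the shallow copy.

```python
def merge_sequences(self_matrix, other_matrix, replace_shared, add_new):
    """Merge label -> list dicts according to two policy flags."""
    for label, seq in other_matrix.items():
        if label in self_matrix:
            if replace_shared:
                self_matrix[label] = list(seq)  # shallow copy of other's sequence
        elif add_new:
            self_matrix[label] = list(seq)
    return self_matrix


def fresh():
    return {"t1": list("AA"), "t2": list("CC")}


other = {"t1": list("GG"), "t3": list("TT")}
replaced = merge_sequences(fresh(), other, replace_shared=True, add_new=False)  # replace_sequences
updated = merge_sequences(fresh(), other, replace_shared=True, add_new=True)    # update_sequences
added = merge_sequences(fresh(), other, replace_shared=False, add_new=True)     # add_sequences
print(replaced["t1"], sorted(updated), added["t1"])  # ['G', 'G'] ['t1', 't2', 't3'] ['A', 'A']
```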
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples

```python
# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
```
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by
dest
.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object
dest
.
RestrictionSitesCharacterMatrix
: Restriction Sites Data¶
- class dendropy.datamodel.charmatrixmodel.RestrictionSitesCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for restriction site data.
- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
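The key resolution and auto-creation described above can be sketched with a list of SimpleTaxon objects standing in for the TaxonNamespace and a dict standing in for the matrix. SimpleTaxon, lookup_taxon and get_sequence are hypothetical names; the sketch assumes exact label matching, whereas DendroPy's own label matching rules may differ.

```python
class SimpleTaxon:
    """Minimal stand-in for a Taxon: just a label."""

    def __init__(self, label):
        self.label = label


def lookup_taxon(key, namespace):
    """Resolve an int index, a label string, or a taxon against `namespace`."""
    if isinstance(key, int):
        return namespace[key]
    if isinstance(key, str):
        for taxon in namespace:
            if taxon.label == key:  # exact match assumed for this sketch
                return taxon
        raise KeyError(key)
    if key not in namespace:
        raise KeyError(key)  # the taxon must already be in the namespace
    return key


def get_sequence(matrix, key, namespace):
    """Return the sequence for `key`, creating an empty one if none exists."""
    return matrix.setdefault(lookup_taxon(key, namespace), [])


ns = [SimpleTaxon("a"), SimpleTaxon("b")]
matrix = {}
get_sequence(matrix, "b", ns).append("A")   # by label
print(get_sequence(matrix, 1, ns))          # ['A'] (same taxon, by index)
print(get_sequence(matrix, ns[0], ns))      # [] (new empty sequence)
print(len(matrix))                          # 2
```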
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)¶
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
RestrictionSitesCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full, independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep-copy: All objects are cloned.
- coerce_values(values)¶
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. It should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper, in which case it is replaced by the mapped object.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, pass an empty dictionary instead.
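A toy model in plain Python shows the remapping. `Node`, `BoundAnnotation`, and `copy_bound_annotations` are hypothetical stand-ins, not DendroPy classes. Only two details come from the description above: the `(obj, attr_name)` tuple and the default `id(other): self` mapping. Passing an empty dict disables remapping.

```python
class Node:
    """Hypothetical annotated object with a `label` attribute."""

    def __init__(self, label):
        self.label = label


class BoundAnnotation:
    """Toy dynamic bound-attribute annotation: _value is (obj, attr_name)."""

    def __init__(self, obj, attr_name):
        self._value = (obj, attr_name)

    @property
    def value(self):
        # Resolved lazily, mirroring getattr(*self._value).
        return getattr(*self._value)


def copy_bound_annotations(annotations, other, target, attribute_object_mapper=None):
    """Copy annotations from `other` to `target`, remapping bound owners by id."""
    if attribute_object_mapper is None:
        # Default described above: references to `other` become `target`.
        attribute_object_mapper = {id(other): target}
    copies = []
    for annotation in annotations:
        obj, attr_name = annotation._value
        owner = attribute_object_mapper.get(id(obj), obj)  # unmapped owners are kept
        copies.append(BoundAnnotation(owner, attr_name))
    return copies


src, dst = Node("source"), Node("destination")
original = [BoundAnnotation(src, "label")]
print(copy_bound_annotations(original, src, dst)[0].value)      # destination
print(copy_bound_annotations(original, src, dst, {})[0].value)  # source
```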
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns a description of the object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset character_subset. Note that this new matrix will still reference the same taxon set.
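Exporting columns amounts to selecting the same positions from every row. This is a minimal sketch under our own assumptions. A dict of label-to-list rows stands in for the matrix. A plain list of indices stands in for a CharacterSubset, and DendroPy's subset object is not modeled. `export_columns` is a hypothetical name. The result keeps the same row keys, echoing the note that the exported matrix references the same taxon set.

```python
from typing import Dict, Iterable, List


def export_columns(matrix: Dict[str, List[str]], indices: Iterable[int]) -> Dict[str, List[str]]:
    """Hypothetical helper: keep only the 0-based columns listed in `indices`."""
    cols = list(indices)  # materialize once so every row uses the same columns
    return {taxon: [row[i] for i in cols] for taxon, row in matrix.items()}


matrix = {"t1": list("ACGTA"), "t2": list("AC-TT")}
print(export_columns(matrix, [0, 2, 4]))  # {'t1': ['A', 'G', 'A'], 't2': ['A', '-', 'T']}
```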
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
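extend_matrix() and extend_sequences() (with default arguments) differ only in how they treat taxa that appear solely in other_matrix. The sketch below models each matrix as a dict of label-to-list rows. The function names are hypothetical. The is_add_new_sequences argument is not modeled, because its behaviour is not described here.

```python
from typing import Dict, List

Rows = Dict[str, List[str]]


def extend_sequences_sketch(rows: Rows, other: Rows) -> Rows:
    """Append characters for shared taxa; ignore taxa only in `other`."""
    for taxon, seq in other.items():
        if taxon in rows:
            rows[taxon].extend(seq)
    return rows


def extend_matrix_sketch(rows: Rows, other: Rows) -> Rows:
    """Like extend_sequences_sketch, but also add taxa only in `other`."""
    for taxon, seq in other.items():
        if taxon in rows:
            rows[taxon].extend(seq)
        else:
            rows[taxon] = list(seq)  # copy so the two matrices do not share a list
    return rows


b = {"t1": ["G"], "t2": ["T"]}
print(extend_sequences_sketch({"t1": ["A", "C"]}, b))  # {'t1': ['A', 'C', 'G']}
print(extend_matrix_sketch({"t1": ["A", "C"]}, b))     # {'t1': ['A', 'C', 'G'], 't2': ['T']}
```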
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
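The padding rule can be sketched with plain lists. This is an illustration, not DendroPy's code. Rows are label-to-list dicts, and `fill_sketch` is a hypothetical name. Leaving rows that are already longer than size untouched is our assumption.

```python
from typing import Dict, Optional


def fill_sketch(rows: Dict[str, list], value, size: Optional[int] = None, append: bool = True):
    """Pad every row with `value` up to `size` (default: length of the longest row)."""
    if size is None:
        size = max((len(r) for r in rows.values()), default=0)
    for row in rows.values():
        padding = [value] * max(0, size - len(row))  # rows already >= size get nothing
        if append:
            row.extend(padding)   # pad at the end
        else:
            row[:0] = padding     # pad at the front
    return rows


rows = {"t1": ["A"], "t2": ["A", "C", "G"]}
print(fill_sketch(rows, "?"))                                 # {'t1': ['A', '?', '?'], 't2': ['A', 'C', 'G']}
print(fill_sketch({"t1": ["A"]}, "-", size=3, append=False))  # {'t1': ['-', '-', 'A']}
```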
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, where $m = \lceil N/2 \rceil$ and each value $f_i$ is the number of sites at which the minor allele is present $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil N/2 \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
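A small 0/1 matrix illustrates the two spectra. This sketch is our own, not DendroPy's implementation. Each row is one taxon (chromosome), 0 marks the ancestral allele and 1 the derived allele. `unfolded_sfs` and `folded_sfs` are hypothetical names. We size the folded vector as $\lfloor N/2 \rfloor + 1$ entries, indexed 0 up to the largest possible minor-allele count. That sizing is an assumption and may differ from the length DendroPy returns.

```python
from typing import List, Sequence


def unfolded_sfs(rows: Sequence[Sequence[int]]) -> List[int]:
    """f_i = number of sites with i derived (1) alleles; length N + 1."""
    n = len(rows)
    sfs = [0] * (n + 1)
    for site in zip(*rows):  # iterate column by column
        sfs[sum(site)] += 1
    return sfs


def folded_sfs(rows: Sequence[Sequence[int]], pad_to_unfolded_length: bool = False) -> List[int]:
    """Bin sites by minor-allele count min(i, N - i)."""
    n = len(rows)
    folded = [0] * (n // 2 + 1)  # largest possible minor-allele count is floor(N/2)
    for i, count in enumerate(unfolded_sfs(rows)):
        folded[min(i, n - i)] += count
    if pad_to_unfolded_length:
        folded += [0] * (n + 1 - len(folded))
    return folded


rows = [
    [0, 1, 1, 0],  # taxon 1
    [0, 1, 0, 0],  # taxon 2
    [0, 0, 1, 1],  # taxon 3
]
print(unfolded_sfs(rows))                             # [1, 1, 2, 0]
print(folded_sfs(rows))                               # [1, 3]
print(folded_sfs(rows, pad_to_unfolded_length=True))  # [1, 3, 0, 0]
```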
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices; the classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    from dendropy import DnaCharacterMatrix

    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using the keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
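The key-dereferencing rules can be sketched in plain Python. `ToyTaxon` and `resolve_key` are hypothetical stand-ins for DendroPy's Taxon and its internal lookup. A plain list plays the taxon namespace. The sketch covers only the matching logic, not sequence creation or coerce_values().

```python
from typing import List, Union


class ToyTaxon:
    """Hypothetical minimal stand-in for a Taxon: a label, compared by identity."""

    def __init__(self, label: str):
        self.label = label


def resolve_key(key: Union[str, ToyTaxon], namespace: List[ToyTaxon],
                case_sensitive: bool = False) -> ToyTaxon:
    """Dereference a from_dict()-style key to a taxon, adding it if needed."""
    if isinstance(key, ToyTaxon):
        if not any(t is key for t in namespace):
            namespace.append(key)  # taxon keys are used directly and added if absent
        return key
    for taxon in namespace:  # the first existing match wins
        if case_sensitive:
            same = taxon.label == key
        else:
            same = taxon.label.lower() == key.lower()
        if same:
            return taxon
    taxon = ToyTaxon(key)  # no match: create and register a new taxon
    namespace.append(taxon)
    return taxon


ns: List[ToyTaxon] = []
a = resolve_key("s1", ns)
b = resolve_key("S1", ns)                       # case ignored: same taxon as "s1"
c = resolve_key("S1", ns, case_sensitive=True)  # case respected: new taxon
print(a is b, a is c, len(ns))                  # True False 2
```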
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of the character block or matrix in the source to be parsed. If not specified, then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy

    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return a new object of this class from the file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return a new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return a new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()
Returns character map key, value pairs in key-order.
- property max_sequence_size
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of ‘foo’). If True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    from dendropy import TaxonNamespace, Tree

    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace

    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)

    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    import copy
    t3 = copy.deepcopy(t1)
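The label-unification rules can be modelled without DendroPy. In this sketch, `ToyTaxon` and `migrate_sketch` are hypothetical. A plain dict plays the role of taxon_mapping_memo. The is_case_sensitive flag stands in for self.taxon_namespace.is_case_sensitive. The sketch shows only how old Taxon objects map to new ones, not how the data structure is rewired.

```python
from typing import Dict, List, Optional


class ToyTaxon:
    """Hypothetical stand-in for a Taxon: just a label, compared by identity."""

    def __init__(self, label: str):
        self.label = label


def migrate_sketch(
    old_taxa: List[ToyTaxon],
    unify_taxa_by_label: bool = True,
    is_case_sensitive: bool = False,
    memo: Optional[Dict[ToyTaxon, ToyTaxon]] = None,
) -> Dict[ToyTaxon, ToyTaxon]:
    """Map each old taxon to a taxon in a fresh namespace."""
    mapping = dict(memo or {})  # explicit memo entries take precedence
    by_label: Dict[str, ToyTaxon] = {}
    for old in old_taxa:
        if old in mapping:
            continue
        key = old.label if is_case_sensitive else old.label.lower()
        if unify_taxa_by_label and key in by_label:
            mapping[old] = by_label[key]  # reuse the taxon created for this label
        else:
            mapping[old] = by_label[key] = ToyTaxon(old.label)
    return mapping


def distinct(mapping: Dict[ToyTaxon, ToyTaxon]) -> int:
    """Count distinct new taxa in a mapping."""
    return len({id(t) for t in mapping.values()})


olds = [ToyTaxon(x) for x in ("Foo", "Foo", "FOO", "FoO")]
print(distinct(migrate_sketch(olds)))                             # 1
print(distinct(migrate_sketch(olds, is_case_sensitive=True)))     # 3
print(distinct(migrate_sketch(olds, unify_taxa_by_label=False)))  # 4
```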
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
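The contrast between remove_sequences() and discard_sequences() mirrors Python's `del d[k]` versus `d.pop(k, None)`. The sketch below uses a plain dict and hypothetical function names. The text above does not say whether DendroPy removes earlier taxa before raising on a later missing one. This sketch does remove them.

```python
def remove_sequences_sketch(rows, taxa):
    """Strict: raise KeyError for a taxon without a sequence."""
    for taxon in taxa:
        del rows[taxon]


def discard_sequences_sketch(rows, taxa):
    """Lenient: silently skip taxa without a sequence."""
    for taxon in taxa:
        rows.pop(taxon, None)


rows = {"t1": "ACGT", "t2": "AC-T"}
discard_sequences_sketch(rows, ["t2", "t9"])  # "t9" is simply ignored
print(rows)                                   # {'t1': 'ACGT'}
try:
    remove_sequences_sketch(rows, ["t9"])
except KeyError as err:
    print("KeyError:", err)                   # KeyError: 't9'
```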
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
    T1 AGN
    T2 C-T
    T3 GC?
Return with gaps_as_missing==True:
    {
        <T1> : [ set([0]), set([2]),       set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3])       ],
        <T3> : [ set([2]), set([1]),       set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False:
    {
        <T1> : [ set([0]), set([2]),       set([0,1,2,3])   ],
        <T2> : [ set([1]), set([4]),       set([3])         ],
        <T3> : [ set([2]), set([1]),       set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to the set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., that of all the bases.
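The mapping in the example can be reproduced with a short sketch. It handles only the symbols that appear in the example (A, C, G, T, N, ‘-’, ‘?’), not IUPAC ambiguity codes. It uses plain strings as taxon keys, and `state_set` and `state_sets_map` are hypothetical names. The state numbering A=0, C=1, G=2, T=3, gap=4 is the one the example implies.

```python
from typing import Dict, Iterable, List, Optional, Set

BASES = "ACGT"  # fundamental states 0-3, in the order used in the example
GAP_STATE = 4   # extra fundamental state when gaps are not treated as missing


def state_set(symbol: str, gaps_as_missing: bool = True) -> Set[int]:
    """Map one DNA symbol to its set of fundamental state indices."""
    all_bases = set(range(len(BASES)))
    if symbol in BASES:
        return {BASES.index(symbol)}
    if symbol == "N":
        return all_bases  # any base, never the gap
    if symbol == "-":
        return all_bases if gaps_as_missing else {GAP_STATE}
    if symbol == "?":
        return all_bases if gaps_as_missing else all_bases | {GAP_STATE}
    raise ValueError(f"symbol not handled in this sketch: {symbol!r}")


def state_sets_map(matrix: Dict[str, str], char_indices: Optional[Iterable[int]] = None,
                   gaps_as_missing: bool = True) -> Dict[str, List[Set[int]]]:
    """Map each taxon label to a list of state sets, one per selected column."""
    chosen = None if char_indices is None else list(char_indices)  # consume once
    result = {}
    for taxon, seq in matrix.items():
        cols = range(len(seq)) if chosen is None else chosen
        result[taxon] = [state_set(seq[i], gaps_as_missing) for i in cols]
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
print(state_sets_map(matrix)["T2"])                         # [{1}, {0, 1, 2, 3}, {3}]
print(state_sets_map(matrix, gaps_as_missing=False)["T3"])  # [{2}, {1}, {0, 1, 2, 3, 4}]
```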
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
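replace_sequences() and update_sequences() differ only in how they treat taxa that exist solely in other_matrix. This is a minimal sketch with dicts of lists, and the function names are hypothetical. `list(seq)` plays the role of the shallow copy the notes mention: a new list that holds the same elements.

```python
from typing import Dict, List

Rows = Dict[str, List[str]]


def replace_sequences_sketch(rows: Rows, other: Rows) -> Rows:
    """Overwrite only taxa present in both matrices."""
    for taxon, seq in other.items():
        if taxon in rows:
            rows[taxon] = list(seq)  # shallow copy of the other row
    return rows


def update_sequences_sketch(rows: Rows, other: Rows) -> Rows:
    """Overwrite shared taxa and add taxa found only in `other`."""
    for taxon, seq in other.items():
        rows[taxon] = list(seq)
    return rows


base = {"t1": ["A", "C"], "t2": ["G", "G"]}
other = {"t1": ["T", "T"], "t3": ["C", "A"]}
print(replace_sequences_sketch({k: list(v) for k, v in base.items()}, other))
# {'t1': ['T', 'T'], 't2': ['G', 'G']}
print(update_sequences_sketch({k: list(v) for k, v in base.items()}, other))
# {'t1': ['T', 'T'], 't2': ['G', 'G'], 't3': ['C', 'A']}
```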
- update_taxon_namespace()
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()
Iterates over values (i.e., sequences) in this matrix.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.
InfiniteSitesCharacterMatrix: Infinite Sites Data
- class dendropy.datamodel.charmatrixmodel.InfiniteSitesCharacterMatrix(*args, **kwargs)
Specializes CharacterMatrix for infinite sites data.
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
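A toy class models the three accepted key forms and the create-on-access behaviour of __getitem__. `ToyTaxon` and `ToyMatrix` are hypothetical. The sketch assumes exact (case-sensitive) label matching. The text above does not specify what happens for keys that cannot be resolved, and raising KeyError for them is our assumption.

```python
from typing import Dict, List, Union


class ToyTaxon:
    """Hypothetical stand-in for Taxon."""

    def __init__(self, label: str):
        self.label = label


class ToyMatrix:
    """Hypothetical matrix showing the three accepted key forms."""

    def __init__(self, taxa: List[ToyTaxon]):
        self.taxon_namespace = list(taxa)
        self._rows: Dict[ToyTaxon, list] = {}

    def _resolve(self, key: Union[int, str, ToyTaxon]) -> ToyTaxon:
        if isinstance(key, int):
            return self.taxon_namespace[key]  # index into the namespace
        if isinstance(key, str):
            for taxon in self.taxon_namespace:  # label lookup (exact match assumed)
                if taxon.label == key:
                    return taxon
            raise KeyError(key)
        if not any(t is key for t in self.taxon_namespace):
            raise KeyError(key.label)  # the taxon must already be in the namespace
        return key

    def __getitem__(self, key):
        # A missing sequence is created on first access, as described above.
        return self._rows.setdefault(self._resolve(key), [])

    def __setitem__(self, key, values):
        self._rows[self._resolve(key)] = list(values)


t1, t2 = ToyTaxon("a"), ToyTaxon("b")
m = ToyMatrix([t1, t2])
m["a"] = "AC"
print(m[0], m[t2])  # ['A', 'C'] []
```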
- __iter__()
Returns an iterator over the character map’s ordered keys.
- __len__()
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)
Composes and returns a string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type
alias of InfiniteSitesCharacterDataSequence
- clear()
Removes all sequences from matrix.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow-copy: All member objects are references, except for annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- coerce_values(values)
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
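The sketch below illustrates the kind of override described above, for a fixed DNA-like alphabet. `ToyState`, `ALPHABET`, and `coerce_values_sketch` are hypothetical and do not reproduce DendroPy's StateIdentity or state alphabet classes. Raising ValueError for an unknown symbol and TypeError for a non-string value is our own choice.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a StateIdentity in a fixed alphabet."""
    symbol: str


ALPHABET = {s: ToyState(s) for s in "ACGT-?N"}


def coerce_values_sketch(values: Iterable) -> List[ToyState]:
    """Dereference symbol strings to alphabet states; pass states through."""
    result = []
    for v in values:
        if isinstance(v, ToyState):
            result.append(v)  # already the right type
        elif isinstance(v, str):
            try:
                result.append(ALPHABET[v.upper()])
            except KeyError:
                raise ValueError(f"unknown symbol {v!r}") from None
        else:
            raise TypeError(f"cannot coerce {type(v).__name__} to a state")
    return result


print([s.symbol for s in coerce_values_sketch("ac-")])  # ['A', 'C', '-']
```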
- classmethod concatenate(char_matrices)
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)
Read a character matrix from each file path given in paths, assuming the data format/schema given by schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)
Read a character matrix from each file object given in streams, assuming the data format/schema given by schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper, in which case it is replaced by the mapped object.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings at all are desired, pass an empty dictionary instead.
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns a description of the object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
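The merge rule can be sketched with dictionaries of taxon label to character list. This is an illustration, not DendroPy's code. The behaviour given to is_add_new_sequences (add taxa found only in other_matrix, as extend_matrix is described as doing) is an assumption inferred from the parameter name, since the text above does not document it.

```python
import copy


def extend_sequences(self_map, other_map, is_add_new_sequences=False):
    """Sketch: append other_map's characters to matching taxa in self_map."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            # Shared taxon: append the other matrix's characters.
            self_map[taxon].extend(seq)
        elif is_add_new_sequences:
            # Assumed behaviour: taxon only in other_map gets a copied sequence.
            self_map[taxon] = list(seq)
        # Otherwise the sequence is ignored.
    return self_map


base = {"t1": ["A", "C"], "t2": ["G"]}
other = {"t1": ["T"], "t3": ["A"]}

shared_only = extend_sequences(copy.deepcopy(base), other)
print(shared_only)  # {'t1': ['A', 'C', 'T'], 't2': ['G']}

with_new = extend_sequences(copy.deepcopy(base), other, is_add_new_sequences=True)
print(with_new)  # {'t1': ['A', 'C', 'T'], 't2': ['G'], 't3': ['A']}
```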
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
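The padding rule reads naturally as a short loop. The sketch below works on a plain dictionary of lists rather than a CharacterMatrix, and the function name fill is reused only for readability. Leaving sequences that are already longer than size untouched is an assumption, because the text above does not say what happens to them.

```python
def fill(sequences, value, size=None, append=True):
    """Pad every sequence in place with value up to size (or the longest length)."""
    if size is None:
        size = max((len(seq) for seq in sequences.values()), default=0)
    for seq in sequences.values():
        missing = size - len(seq)
        if missing <= 0:
            continue  # assumption: longer sequences are left untouched
        padding = [value] * missing
        if append:
            seq.extend(padding)
        else:
            seq[:0] = padding  # insert at the front
    return sequences


seqs = {"a": [0.5], "b": [1.0, 2.0, 3.0]}
fill(seqs, 0.0)
print(seqs["a"])  # [0.5, 0.0, 0.0]

seqs = {"a": [0.5], "b": [1.0, 2.0, 3.0]}
fill(seqs, 0.0, size=4, append=False)
print(seqs["a"], seqs["b"])  # [0.0, 0.0, 0.0, 0.5] [0.0, 1.0, 2.0, 3.0]
```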
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where $f_i$ is the number of sites at which $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, with $m = \lceil N/2 \rceil$, where $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), the vector length will be $\lceil N/2 \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
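As a worked illustration of folding, the sketch below computes a minor-allele spectrum from biallelic 0/1 sequences by taking $\min(k, N - k)$ for each site with $k$ copies of "1". It is not DendroPy's implementation. Treating "1" as the derived allele is an assumption, although folding makes the result independent of that choice. The unpadded vector here has $\lfloor N/2 \rfloor + 1$ entries (minor-allele counts 0 through $\lfloor N/2 \rfloor$), which may differ from the exact length convention stated above.

```python
def folded_sfs(sequences, pad_to_unfolded_length=False):
    """Folded (minor-allele) SFS for biallelic 0/1 sequences of equal length."""
    n = len(sequences)  # number of sampled chromosomes (N)
    counts = [0] * (n // 2 + 1)  # minor-allele counts run from 0 to floor(N/2)
    for site in zip(*sequences):  # iterate over columns
        derived = site.count("1")  # assumption: "1" marks the derived allele
        counts[min(derived, n - derived)] += 1
    if pad_to_unfolded_length:
        counts.extend([0] * (n + 1 - len(counts)))
    return counts


seqs = ["0001", "0011", "0111", "0000"]
print(folded_sfs(seqs))        # [1, 2, 1]
print(folded_sfs(seqs, True))  # [1, 2, 1, 0, 0]
```

The first entry counts the single monomorphic column; the column with three "1"s folds onto a minor-allele count of 1.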
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices; the classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
```python
from dendropy import DnaCharacterMatrix

d = {
    "s1" : "TCCAA",
    "s2" : "TGCAA",
    "s3" : "TG-AA",
}
dna = DnaCharacterMatrix.from_dict(d)
```
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using the keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
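The key-resolution rules above (first label match wins, otherwise create and add; Taxon keys used directly) can be modelled in a few lines. Taxon here is a hypothetical minimal stand-in and the namespace is a plain list; the function resolve_key is illustrative only and is not how DendroPy stores or searches taxa.

```python
class Taxon:
    """Hypothetical minimal stand-in for a DendroPy Taxon."""

    def __init__(self, label):
        self.label = label


def resolve_key(key, namespace, case_sensitive=False):
    """Map a from_dict-style key (label or Taxon) to a Taxon in namespace."""
    if isinstance(key, Taxon):
        # Taxon given directly: use it, adding it to the namespace if absent.
        if not any(t is key for t in namespace):
            namespace.append(key)
        return key
    for taxon in namespace:
        # First existing Taxon with a matching label wins.
        if case_sensitive:
            matched = taxon.label == key
        else:
            matched = taxon.label.lower() == key.lower()
        if matched:
            return taxon
    # No match: create a new Taxon and add it to the namespace.
    taxon = Taxon(key)
    namespace.append(taxon)
    return taxon


namespace = [Taxon("s1")]
print(resolve_key("S1", namespace) is namespace[0])  # True
resolve_key("S1", namespace, case_sensitive=True)
print([t.label for t in namespace])  # ['s1', 'S1']
```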
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of the format of the data given by the “file”, “path”, “data”, or “url” argument specified above: e.g., “fasta”, “nexus”, “nexml”, or “phylip”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of the character block or matrix in the source to be parsed. If not specified, then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
```python
import dendropy

dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
```
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return a new object of this class from the file specified by the string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return a new object of this class from the file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return a new object of this class from the string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()
Returns character map key, value pairs in key-order.
- property max_sequence_size
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the correspondence between Taxon objects in the old and new namespaces will be established by case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (labeled with one of the existing casing variants of ‘foo’). If True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
```python
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
```
You can also use this method to get a copy of a structure and then move it to a new namespace:
```python
t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
```
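The label-unification rule can be modelled with plain objects to show how the four ‘Foo’ variants collapse. This is a sketch under stated assumptions: Taxon is a hypothetical stand-in, the namespace is a list, and is_case_sensitive is passed as an argument here instead of being read from self.taxon_namespace as DendroPy does.

```python
class Taxon:
    """Hypothetical minimal stand-in for a DendroPy Taxon."""

    def __init__(self, label):
        self.label = label


def migrate(old_taxa, unify_taxa_by_label=True, is_case_sensitive=False):
    """Map each distinct old Taxon object to a Taxon in a fresh namespace."""
    new_namespace = []
    by_label = {}
    mapping = {}  # old Taxon object -> new Taxon object (keyed by identity)
    for taxon in old_taxa:
        if taxon in mapping:
            continue  # same object already migrated
        key = taxon.label if is_case_sensitive else taxon.label.lower()
        if unify_taxa_by_label and key in by_label:
            mapping[taxon] = by_label[key]
            continue
        new_taxon = Taxon(taxon.label)
        new_namespace.append(new_taxon)
        by_label.setdefault(key, new_taxon)
        mapping[taxon] = new_taxon
    return new_namespace, mapping


old = [Taxon("Foo"), Taxon("Foo"), Taxon("FOO"), Taxon("FoO")]
print(len(migrate(old)[0]))                             # 1
print(len(migrate(old, is_case_sensitive=True)[0]))     # 3
print(len(migrate(old, unify_taxa_by_label=False)[0]))  # 4
```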
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
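- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with the values in values.
- Parameters:
taxon (Taxon) – Taxon with which the new sequence will be associated.
values (iterable, optional) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.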
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)
Returns a set populated with all Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
```
T1 AGN
T2 C-T
T3 GC?
```
Return with gaps_as_missing==True:
```
{
    <T1> : [ set([0]), set([2]),       set([0,1,2,3]) ],
    <T2> : [ set([1]), set([0,1,2,3]), set([3])       ],
    <T3> : [ set([2]), set([1]),       set([0,1,2,3]) ],
}
```
Return with gaps_as_missing==False:
```
{
    <T1> : [ set([0]), set([2]), set([0,1,2,3])   ],
    <T2> : [ set([1]), set([4]), set([3])         ],
    <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
}
```
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases, not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., the set of all bases.
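The mapping in the example can be reproduced with a small lookup. This is a plain-Python sketch that assumes the state order A=0, C=1, G=2, T=3 with the gap as state 4, as in the example output; it covers only the symbols used there (a full IUPAC ambiguity table is larger) and ignores char_indices, gap_state, and no_data_state. state_sets is a hypothetical helper, not the library method.

```python
BASES = "ACGT"  # fundamental states 0-3
GAP_STATE = 4   # used only when gaps are a fundamental state
# Only the ambiguity code used in the example; a full IUPAC table is larger.
AMBIGUITY = {"N": "ACGT"}


def state_sets(sequence, gaps_as_missing=True):
    """Map each DNA symbol to a set of fundamental state indices."""
    all_bases = set(range(len(BASES)))
    result = []
    for symbol in sequence:
        if symbol in BASES:
            result.append({BASES.index(symbol)})
        elif symbol in AMBIGUITY:
            result.append({BASES.index(b) for b in AMBIGUITY[symbol]})
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_STATE})
        elif symbol == "?":
            result.append(set(all_bases) if gaps_as_missing else all_bases | {GAP_STATE})
        else:
            raise ValueError(f"symbol not covered by this sketch: {symbol!r}")
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
gaps_missing = {t: state_sets(s) for t, s in matrix.items()}
gaps_state = {t: state_sets(s, gaps_as_missing=False) for t, s in matrix.items()}
print(gaps_state["T2"])  # [{1}, {4}, {3}]
```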
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix, and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
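The three merge operations documented in this class (add_sequences, replace_sequences, and update_sequences) differ only in which taxa they touch. The sketch below contrasts them on plain dictionaries of taxon label to list, using list(seq) as the shallow copy; the functions share the method names for readability but are illustrations, not DendroPy code, and skip the type and TaxonNamespace checks listed in the notes.

```python
def add_sequences(self_map, other_map):
    """Add shallow copies of sequences for taxa in other_map but not in self_map."""
    for taxon, seq in other_map.items():
        if taxon not in self_map:
            self_map[taxon] = list(seq)


def replace_sequences(self_map, other_map):
    """Replace sequences for taxa present in both; ignore the rest."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon] = list(seq)


def update_sequences(self_map, other_map):
    """Replace shared sequences and add new ones (replace + add)."""
    for taxon, seq in other_map.items():
        self_map[taxon] = list(seq)


def fresh():
    return {"t1": ["A"], "t2": ["C"]}


other = {"t2": ["G"], "t3": ["T"]}
for func in (add_sequences, replace_sequences, update_sequences):
    result = fresh()
    func(result, other)
    print(func.__name__, result)
# add_sequences {'t1': ['A'], 't2': ['C'], 't3': ['T']}
# replace_sequences {'t1': ['A'], 't2': ['G']}
# update_sequences {'t1': ['A'], 't2': ['G'], 't3': ['T']}
```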
- update_taxon_namespace()
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()
Iterates values (i.e., sequences) in this matrix.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
```python
# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
```
- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.
StandardCharacterMatrix: “Standard” Data
- class dendropy.datamodel.charmatrixmodel.StandardCharacterMatrix(*args, **kwargs)
Specializes CharacterMatrix for “standard” data (i.e., generic discrete character data).
A default state alphabet consisting of the state symbols 0-9 will automatically be created unless default_state_alphabet=None is passed in. To specify a different default state alphabet:
```python
import dendropy

m1 = dendropy.StandardCharacterMatrix(
        default_state_alphabet=dendropy.new_standard_state_alphabet("abc"))
m2 = dendropy.StandardCharacterMatrix(
        default_state_alphabet=dendropy.new_standard_state_alphabet("ij"))
```
- __delitem__(key)
Removes the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)
Retrieves the sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
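The three accepted key forms, and the create-on-first-access behaviour, can be modelled with a toy container. Taxon and SketchMatrix are hypothetical classes for illustration only. Exact (case-sensitive) label matching and raising KeyError for an undefined taxon are assumptions; the text above only says the Taxon must already be defined.

```python
class Taxon:
    """Hypothetical minimal stand-in for a DendroPy Taxon."""

    def __init__(self, label):
        self.label = label


class SketchMatrix:
    """Hypothetical matrix keyed by Taxon, indexed like __getitem__ above."""

    def __init__(self, taxa):
        self.taxa = list(taxa)  # stands in for self.taxon_namespace
        self._sequences = {}

    def _resolve(self, key):
        if isinstance(key, int):
            return self.taxa[key]  # index into the namespace
        if isinstance(key, str):
            for taxon in self.taxa:
                if taxon.label == key:  # exact label match assumed
                    return taxon
            raise KeyError(key)
        if any(taxon is key for taxon in self.taxa):
            return key  # Taxon object given directly
        raise KeyError(key)

    def __getitem__(self, key):
        taxon = self._resolve(key)
        # An empty sequence is created on first access, as described above.
        return self._sequences.setdefault(taxon, [])


taxa = [Taxon("a"), Taxon("b")]
m = SketchMatrix(taxa)
m[0].append("A")
print(m["a"], m[taxa[0]] is m[0], m["b"])  # ['A'] True []
```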
- __iter__()
Returns an iterator over the character map’s ordered keys.
- __len__()
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)
Composes and returns a string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type
alias of StandardCharacterDataSequence
- clear()
Removes all sequences from matrix.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) – The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep-copy: all objects are cloned.
- coerce_values(values)
Converts elements of values to the type of the matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. It should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If no value-type conversion is done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value depends on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
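The override contract described above (pass-through in the base class, symbol dereferencing in a fixed-alphabet subclass, TypeError when conversion fails) can be sketched with hypothetical classes. StateIdentity, DNA_STATES, BaseMatrix, and DnaLikeMatrix are illustrative stand-ins, not DendroPy types. Reporting unknown symbols as TypeError is also an assumption, chosen to match the documented error type.

```python
class StateIdentity:
    """Hypothetical stand-in for a discrete state object."""

    def __init__(self, symbol):
        self.symbol = symbol


# Hypothetical fixed alphabet: one shared StateIdentity per symbol.
DNA_STATES = {symbol: StateIdentity(symbol) for symbol in "ACGT-?N"}


class BaseMatrix:
    def coerce_values(self, values):
        # No value-type conversion: return values as-is.
        return values


class DnaLikeMatrix(BaseMatrix):
    def coerce_values(self, values):
        # Dereference string symbols to the alphabet's StateIdentity objects.
        try:
            return [DNA_STATES[v.upper()] for v in values]
        except (KeyError, AttributeError) as exc:
            # Assumption: unknown or non-string symbols are reported as TypeError.
            raise TypeError(f"cannot coerce {values!r} to DNA states") from exc


states = DnaLikeMatrix().coerce_values("acg-")
print([s.symbol for s in states])  # ['A', 'C', 'G', '-']
print(BaseMatrix().coerce_values([0.1, 0.2]))  # [0.1, 0.2]
```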
- classmethod concatenate(char_matrices)
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, char_matrices. All the CharacterMatrix objects in the list must be of the same type and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
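The bookkeeping behind "component parts will be recorded as character subsets" amounts to tracking column offsets while sequences are joined. This is a minimal sketch over dictionaries of taxon label to sequence string, returning each part's columns as a list of 0-based indices. It reads "of the same length" as "each input is a proper alignment (all its sequences have equal length)", which is an interpretation; concatenate here is an illustrative function, not the DendroPy classmethod.

```python
def concatenate(matrices):
    """Concatenate taxon -> sequence dicts; return (combined, subsets)."""
    taxa = list(matrices[0])
    combined = {taxon: "" for taxon in taxa}
    subsets = []  # one list of 0-based column indices per input matrix
    start = 0
    for matrix in matrices:
        if set(matrix) != set(taxa):
            raise ValueError("all taxa must be present in all matrices")
        lengths = {len(seq) for seq in matrix.values()}
        if len(lengths) != 1:
            raise ValueError("each matrix must be aligned (equal-length sequences)")
        width = lengths.pop()
        for taxon in taxa:
            combined[taxon] += matrix[taxon]
        subsets.append(list(range(start, start + width)))
        start += width
    return combined, subsets


a = {"t1": "AC", "t2": "AG"}
b = {"t1": "TTT", "t2": "GGG"}
combined, subsets = concatenate([a, b])
print(combined)  # {'t1': 'ACTTT', 't2': 'AGGGG'}
print(subsets)   # [[0, 1], [2, 3, 4]]
```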
- classmethod concatenate_from_paths(paths, schema, **kwargs)
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotation object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of an Annotation is created, the object reference given in the first element of the _value tuple of a dynamic bound-attribute annotation is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings are desired, pass an empty dictionary instead.
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (e.g., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)
Returns a description of the object, up to level depth.
- discard_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa, if they exist.
- export_character_indices(indices)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)
Returns a new CharacterMatrix (of the same type) consisting only of the columns given by the CharacterSubset character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
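The padding rule of fill() can be sketched as follows. This is a minimal model, not DendroPy's code: a dict of label to list stands in for the matrix, fill() here is a hypothetical free function, and the choice to leave sequences that are already longer than size untouched is an assumption of the sketch.

```python
from typing import Dict, List, Optional, TypeVar

T = TypeVar("T")


def fill(matrix: Dict[str, List[T]], value: T,
         size: Optional[int] = None, append: bool = True) -> None:
    """Pad every sequence in place with `value` up to `size` (default: longest)."""
    if size is None:
        size = max((len(seq) for seq in matrix.values()), default=0)
    for seq in matrix.values():
        shortfall = size - len(seq)
        if shortfall <= 0:
            continue  # already long enough; this sketch never truncates
        padding = [value] * shortfall
        if append:
            seq.extend(padding)  # pad at the end
        else:
            seq[:0] = padding    # pad at the front


# Usage: pad to the longest row, then prepend gaps up to length 5.
m = {"t1": list("ACGT"), "t2": list("AC")}
fill(m, "?")                      # t2 becomes A, C, ?, ?
fill(m, "-", size=5, append=False)
```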
- fill_taxa()
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, with $m = \lceil \frac{N}{2} \rceil$, where $f_i$ is the number of sites at which the minor (less frequent) allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (the default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
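The folded spectrum can be sketched directly from its definition: for each site (column), count the occurrences of the less frequent allele and tally that count. This is a hedged illustration, not DendroPy's implementation. It assumes one aligned string per chromosome, skips sites with more than two states, and sizes the vector as N // 2 + 1 (minor-allele counts 0 through floor(N/2)); DendroPy's own vector sizing and its handling of missing data may differ.

```python
from collections import Counter
from typing import List, Sequence


def folded_sfs(chromosomes: Sequence[str],
               pad_to_unfolded_length: bool = False) -> List[int]:
    """Folded (minor-allele) site frequency spectrum of aligned sequences.

    Sketch assumptions: one aligned string per chromosome; sites with more
    than two distinct states are skipped; entry i counts sites whose minor
    allele occurs i times (entry 0 counts monomorphic sites).
    """
    n = len(chromosomes)
    sfs = [0] * (n + 1 if pad_to_unfolded_length else n // 2 + 1)
    for column in zip(*chromosomes):  # iterate over sites (columns)
        counts = Counter(column)
        if len(counts) == 1:
            sfs[0] += 1  # monomorphic site
        elif len(counts) == 2:
            sfs[min(counts.values())] += 1  # minor-allele count
    return sfs


# Usage: four chromosomes, four sites.
print(folded_sfs(["AACG", "AACG", "ATCC", "ATTC"]))  # [1, 1, 2]
```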
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)
Populates a character matrix from a dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be either strings representing labels of Taxon objects or Taxon objects themselves. If a key is specified as a string, it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, it is used directly; if it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). Values of type CharacterDataSequence are added as-is; otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices; the classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using the keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
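The key-resolution rule of from_dict() (first existing Taxon with a matching label, otherwise a new Taxon appended to the namespace) can be modeled in a few lines. This is a sketch under stated assumptions: the Taxon class below is a hypothetical minimal stand-in for dendropy.Taxon, a plain list stands in for the TaxonNamespace, resolve_keys() is not a DendroPy function, and populating the matrix or calling coerce_values() is not modeled.

```python
from typing import Dict, Iterable, List


class Taxon:
    """Hypothetical minimal stand-in for dendropy.Taxon (label only)."""

    def __init__(self, label: str) -> None:
        self.label = label


def resolve_keys(keys: Iterable[str], namespace: List[Taxon],
                 case_sensitive: bool = False) -> Dict[str, Taxon]:
    """Map string keys to Taxon objects following the rule described above.

    Each key is matched to the *first* taxon in `namespace` with the same
    label (ignoring case unless `case_sensitive`); if none matches, a new
    Taxon is created and appended to `namespace`.
    """
    def norm(s: str) -> str:
        return s if case_sensitive else s.lower()

    resolved: Dict[str, Taxon] = {}
    for key in keys:
        match = next((t for t in namespace if norm(t.label) == norm(key)), None)
        if match is None:
            match = Taxon(key)
            namespace.append(match)
        resolved[key] = match
    return resolved
```

For example, with an existing namespace holding "S1", the key "s1" resolves to that taxon when matching ignores case, but creates a new Taxon when case_sensitive is True.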
- classmethod get(**kwargs)
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified, then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy

    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return a new object of this class from the file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return a new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return a new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from the URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()
Returns character map key, value pairs in key-order.
- property max_sequence_size
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is one of the existing casing variants of ‘foo’). If True (and unify_taxa_by_label is True), Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to the memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    from dendropy import Tree, TaxonNamespace

    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace

    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)

    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    import copy
    from dendropy import Tree, TaxonNamespace

    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
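The interaction between unify_taxa_by_label, case sensitivity, and taxon_mapping_memo can be modeled with a small sketch. Everything here is a hypothetical stand-in rather than DendroPy code: Taxon is a minimal label holder, migrate() re-points a list of taxon references at fresh Taxon objects, the case_sensitive flag plays the role of self.taxon_namespace.is_case_sensitive, and memo plays the role of taxon_mapping_memo (pre-supplied entries win over label matching).

```python
from typing import Dict, List, Optional


class Taxon:
    """Hypothetical minimal stand-in for dendropy.Taxon."""

    def __init__(self, label: str) -> None:
        self.label = label


def migrate(old_refs: List[Taxon], unify_taxa_by_label: bool = True,
            case_sensitive: bool = False,
            memo: Optional[Dict[Taxon, Taxon]] = None) -> List[Taxon]:
    """Return references re-pointed at Taxon objects in a fresh namespace."""
    memo = {} if memo is None else memo  # explicit mappings take precedence
    by_label: Dict[str, Taxon] = {}      # first new taxon seen for each label key
    new_refs: List[Taxon] = []
    for old in old_refs:
        if old not in memo:
            key = old.label if case_sensitive else old.label.lower()
            if unify_taxa_by_label and key in by_label:
                memo[old] = by_label[key]     # reuse the unified taxon
            else:
                memo[old] = Taxon(old.label)  # fresh copy with the same label
                by_label.setdefault(key, memo[old])
        new_refs.append(memo[old])
    return new_refs


# Usage: the 'Foo', 'Foo', 'FOO', 'FoO' example from the text.
a, b, c, d = Taxon("Foo"), Taxon("Foo"), Taxon("FOO"), Taxon("FoO")
unified = migrate([a, b, c, d])  # all four map to one new Taxon
```

With case_sensitive=True, only the two ‘Foo’ objects are merged; with unify_taxa_by_label=False, every distinct old Taxon keeps its own new counterpart.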
- new_character_subset(label, character_indices)
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
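- new_sequence(taxon, values=None)
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
taxon (Taxon) – The Taxon with which the new sequence will be associated.
values (iterable of values, optional) – Values with which to populate the new sequence.
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.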
- pack(value=None, size=None, append=True)
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
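Since pack() is described as fill_taxa() followed by fill(), the two steps can be shown together in one sketch. Assumptions: a dict of label to list stands in for the matrix, an iterable of labels stands in for the taxon namespace, pack() here is a hypothetical free function, and value is required (how DendroPy handles the default value=None is not modeled).

```python
from typing import Dict, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def pack(matrix: Dict[str, List[T]], namespace: Iterable[str], value: T,
         size: Optional[int] = None, append: bool = True) -> None:
    """fill_taxa() followed by fill(), on a plain dict (sketch)."""
    # Step 1 (fill_taxa): give every taxon in the namespace a row, empty if new.
    for taxon in namespace:
        matrix.setdefault(taxon, [])
    # Step 2 (fill): pad every row up to `size` (default: the longest row).
    target = size if size is not None else max(
        (len(seq) for seq in matrix.values()), default=0)
    for seq in matrix.values():
        padding = [value] * max(0, target - len(seq))
        if append:
            seq.extend(padding)
        else:
            seq[:0] = padding
```

For example, packing {"t1": ACG} against a namespace of t1, t2, t3 with "?" gives t2 and t3 a row of three "?" each, matching the length of the longest existing row.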
- poll_taxa(taxa=None)
Returns a set populated with all Taxon instances associated with self.
- Parameters:
taxa (set) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)
- reindex_taxa(taxon_namespace=None, clear=False)
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from the given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to a state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)
All entities with any reference to a state alphabet will have the reference reassigned to the state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to a state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
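The contrast between remove_sequences() (strict) and discard_sequences() (lenient) follows the same pattern as Python's set.remove() and set.discard(). The sketch below models it on a plain dict keyed by taxon label; both functions are hypothetical stand-ins. Note that in this sketch, rows deleted before a KeyError is raised stay deleted; whether DendroPy behaves the same way on a partial failure is not stated here.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for a CharacterMatrix: taxon label -> list of symbols.
Matrix = Dict[str, List[str]]


def remove_sequences(matrix: Matrix, taxa: Iterable[str]) -> None:
    """Delete rows for `taxa`; raise KeyError if any taxon has no row."""
    for taxon in taxa:
        del matrix[taxon]  # KeyError propagates for unknown taxa


def discard_sequences(matrix: Matrix, taxa: Iterable[str]) -> None:
    """Delete rows for `taxa` that exist; silently skip the rest."""
    for taxon in taxa:
        matrix.pop(taxon, None)
```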
- replace_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- sequences()
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default], then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
For example, given the following matrix of DNA characters:
    T1  AGN
    T2  C-T
    T3  GC?
Return with gaps_as_missing==True:
    {
        <T1> : [ set([0]), set([2]),         set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]),   set([3])       ],
        <T3> : [ set([2]), set([1]),         set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3])   ],
        <T2> : [ set([1]), set([4]), set([3])         ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, ‘?’, ‘N’, and ‘-’ all map to the same set, i.e., the set of all bases.
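The symbol-to-state-set mapping in the example above can be reproduced with a small lookup. This sketch is hypothetical and deliberately narrow: it covers only the symbols A, C, G, T, N, - and ? with the indices used in the example (A=0, C=1, G=2, T=3, gap=4), takes plain strings keyed by label instead of Taxon objects, and ignores char_indices, gap_state, and no_data_state. It is not how DendroPy derives state sets from its state alphabets.

```python
from typing import Dict, List, Set

# Fundamental state indices as in the example: A=0, C=1, G=2, T=3; gap=4.
BASES: Dict[str, Set[int]] = {"A": {0}, "C": {1}, "G": {2}, "T": {3}}
ALL_BASES: Set[int] = {0, 1, 2, 3}
GAP = 4


def state_sets(seq: str, gaps_as_missing: bool = True) -> List[Set[int]]:
    """Map each DNA symbol to its set of fundamental state indices (sketch)."""
    result: List[Set[int]] = []
    for symbol in seq:
        if symbol in BASES:
            result.append(set(BASES[symbol]))
        elif symbol == "N":
            result.append(set(ALL_BASES))  # any base, never the gap state
        elif symbol == "-":
            result.append(set(ALL_BASES) if gaps_as_missing else {GAP})
        elif symbol == "?":
            result.append(set(ALL_BASES) if gaps_as_missing else ALL_BASES | {GAP})
        else:
            raise ValueError(f"symbol not handled in this sketch: {symbol!r}")
    return result


def taxon_state_sets_map(matrix: Dict[str, str],
                         gaps_as_missing: bool = True) -> Dict[str, List[Set[int]]]:
    """Apply state_sets() to every row of a label -> sequence mapping."""
    return {taxon: state_sets(seq, gaps_as_missing) for taxon, seq in matrix.items()}


print(taxon_state_sets_map({"T2": "C-T"}, gaps_as_missing=False))
# {'T2': [{1}, {4}, {3}]}
```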
- update_sequences(other_matrix)
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of the same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
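replace_sequences() and update_sequences() differ only in whether rows for taxa found solely in other_matrix are added. The sketch below models both on plain dicts keyed by taxon label; the functions are hypothetical stand-ins that omit the type and TaxonNamespace checks, and list(seq) provides the shallow copy mentioned in the Notes (the row list is new, its elements are shared).

```python
from typing import Dict, List

# Hypothetical stand-in for a CharacterMatrix: taxon label -> list of symbols.
Matrix = Dict[str, List[str]]


def replace_sequences(target: Matrix, other: Matrix) -> None:
    """Overwrite rows for taxa present in both matrices; ignore the rest."""
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon] = list(seq)  # shallow copy of the other row


def update_sequences(target: Matrix, other: Matrix) -> None:
    """Like replace_sequences(), but rows for taxa only in `other` are added too."""
    for taxon, seq in other.items():
        target[taxon] = list(seq)  # shallow copy, whether shared or new
```

Given target {"t1": AAA, "t2": CCC} and other {"t1": GG, "t3": TT}, replace_sequences() changes only t1, while update_sequences() changes t1 and also adds t3; t2 is untouched in both cases.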
- update_taxon_namespace()
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()
Iterates values (i.e., sequences) in this matrix.
- property vector_size
Number of characters in the first sequence in the matrix.
- Returns:
n (integer) – Number of characters in the first sequence in the matrix.
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to the file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to the file-like object dest.