dendropy.datamodel.charmatrixmodel: Character Sequences and Matrices¶
Character Sequences¶
- class dendropy.datamodel.charmatrixmodel.CharacterDataSequence(character_values=None, character_types=None, character_annotations=None)[source]¶
A sequence of character values for a particular taxon or entry in a data matrix.
Objects of this class can be treated (almost) as simple lists, where the elements are the values of characters: typically real values in the case of continuous data, and special instances of StateIdentity objects in the case of discrete data.
Character type data (represented by CharacterType instances) and metadata annotations (represented by AnnotationSet instances), if any, are maintained in parallel lists that need to be accessed separately using the index of the value to which the data correspond. So, for example, the AnnotationSet object containing the metadata annotations for the first value in a sequence, s[0], is available through s.annotations_at(0), while the character type information for that first element is available through s.character_type_at(0) and can be set through s.set_character_type_at(0, c).
In most cases where metadata annotations and character type information are not needed, treating objects of this class as a simple list provides all the functionality needed. Where metadata annotations or character type information are required, all the standard list mutation methods (e.g., CharacterDataSequence.insert, CharacterDataSequence.append, CharacterDataSequence.extend) also take optional character_type and character_annotations arguments in addition to the primary character_value argument, thus allowing the value, character type, and annotation set to be set simultaneously. While iteration over character values is available through the standard list iteration interface, the method CharacterDataSequence.cell_iter() provides for iterating over <character-value, character-type, character-annotation-set> triplets.
- Parameters:
character_values (iterable of values) – A set of values for this sequence.
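The parallel-list layout described above can be pictured with a small stand-alone model. This is only a sketch: ParallelSequence is a hypothetical toy class, not DendroPy code, and it mirrors just the append and cell_iter behaviour described in this section.

```python
class ParallelSequence:
    """Toy stand-in for CharacterDataSequence (hypothetical, not DendroPy code)."""

    def __init__(self):
        self.values = []
        self.types = []        # one entry per value, None if unset
        self.annotations = []  # one entry per value, None if unset

    def append(self, character_value, character_type=None, character_annotations=None):
        # Grow all three lists together so that indices stay aligned.
        self.values.append(character_value)
        self.types.append(character_type)
        self.annotations.append(character_annotations)

    def __iter__(self):
        # Plain iteration yields the values only, as with a list.
        return iter(self.values)

    def cell_iter(self):
        # Yields (value, type, annotations) triplets.
        return zip(self.values, self.types, self.annotations)


seq = ParallelSequence()
seq.append("A")
seq.append("C", character_type="codon-pos-1")  # placeholder type, an assumption
cells = list(seq.cell_iter())
```

Keeping the type and annotation data in separate lists is what lets the object behave like a plain list of values when the extra data are not needed.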
- annotations_at(idx)[source]¶
Return metadata annotations of character at idx.
- Parameters:
idx (integer) – Index of element annotations to return.
- Returns:
c (AnnotationSet) – AnnotationSet representing metadata annotations of character at index idx.
- append(character_value, character_type=None, character_annotations=None)[source]¶
Adds a value to self.
- Parameters:
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- cell_iter()[source]¶
Iterate over triplets of character values and associated CharacterType and AnnotationSet instances.
- character_type_at(idx)[source]¶
Return type of character at idx.
- Parameters:
idx (integer) – Index of element character type to return.
- Returns:
c (CharacterType) – CharacterType associated with character index idx.
- extend(character_values, character_types=None, character_annotations=None)[source]¶
Extends self with values.
- Parameters:
character_values (iterable of objects) – Values to be stored.
character_types (iterable of CharacterType objects) – Descriptions of character values.
character_annotations (iterable of AnnotationSet objects) – Metadata annotations associated with characters.
- has_annotations_at(idx)[source]¶
Return True if character at idx has metadata annotations.
- Parameters:
idx (integer) – Index of element annotations to check.
- Returns:
b (bool) – True if character at idx has metadata annotations, False otherwise.
- insert(idx, character_value, character_type=None, character_annotations=None)[source]¶
Insert value and associated character type and metadata annotations for element at idx.
- Parameters:
idx (integer) – Index of element to set.
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- set_annotations_at(idx, annotations)[source]¶
Set metadata annotations of character at idx.
- Parameters:
idx (integer) – Index of element annotations to set.
- set_at(idx, character_value, character_type=None, character_annotations=None)[source]¶
Set value and associated character type and metadata annotations for element at idx.
- Parameters:
idx (integer) – Index of element to set.
character_value (object) – Value to be stored.
character_type (CharacterType) – Description of character value.
character_annotations (AnnotationSet) – Metadata annotations associated with this character.
- set_character_type_at(idx, character_type)[source]¶
Set type of character at idx.
- Parameters:
idx (integer) – Index of element character type to set.
- symbols_as_list()[source]¶
Returns a list of string representations of the values of this vector.
- Returns:
v (list) – List of string representations of the values making up this vector.
- symbols_as_string(sep='')[source]¶
Returns values of this vector as a single string, with individual value elements separated by sep.
- Returns:
s (string) – String representation of values making up this vector.
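As a rough illustration of the two symbol helpers, the sketch below models them over a plain list. It assumes that each value is rendered with str(); the functions here are free-standing stand-ins, not the DendroPy methods themselves.

```python
def symbols_as_list(values):
    # One string per value (assumption: str() gives the symbol).
    return [str(v) for v in values]


def symbols_as_string(values, sep=""):
    # Join the per-value strings with the requested separator.
    return sep.join(symbols_as_list(values))


states = ["T", "G", "-", "A", "A"]
joined = symbols_as_string(states)
spaced = symbols_as_string([0.5, 1.25], sep=" ")
```

An empty separator suits single-character discrete states, while a non-empty sep is the natural choice for continuous values.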
Character Types¶
- class dendropy.datamodel.charmatrixmodel.CharacterType(label=None, state_alphabet=None)[source]¶
A character format or type of a particular column: i.e., maps a particular set of character state definitions to a column in a character matrix.
- property state_alphabet¶
The StateAlphabet representing the state alphabet for this column: i.e., the collection of symbols and the state identities to which they map.
- taxon_namespace_scoped_copy(memo=None)[source]¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
Character Subsets¶
- class dendropy.datamodel.charmatrixmodel.CharacterSubset(label=None, character_indices=None)[source]¶
Tracks definition of a subset of characters.
- Parameters:
label (str) – Name of this subset.
character_indices (iterable of int) – Iterable of 0-based (integer) indices of column positions that constitute this subset.
Character Matrices¶
The CharacterMatrix Class¶
- class dendropy.datamodel.charmatrixmodel.CharacterMatrix(*args, **kwargs)[source]¶
A data structure that manages association of operational taxonomic unit concepts to sequences of character state identities or values.
This is a base class that provides general functionality; derived classes specialize for particular data types. You will not be using the class directly, but rather one of the derived classes below, specialized for data types such as DNA, RNA, continuous, etc.
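This class and derived classes behave like a dictionary where the keys are Taxon objects and the values are CharacterDataSequence objects. Access to sequences based on taxon labels as well as indexes is also provided. Numerous methods are provided to manipulate and iterate over sequences. Character partitions can be managed through CharacterSubset objects, while management of detailed metadata on character types is available through CharacterType objects.
Objects can be instantiated by reading data from external sources through the usual get_from_stream(), get_from_path(), or get_from_string() functions. In addition, a single matrix object can be instantiated from multiple matrices (concatenate()) or data sources (concatenate_from_paths()).
A range of methods also exist for importing data from another matrix object. These vary depending on how “new” and “existing” sequences are treated. A “new” sequence is a sequence in the other matrix associated with a Taxon object for which there is no sequence defined in the current matrix. An “existing” sequence is a sequence in the other matrix associated with a Taxon object for which there is a sequence defined in the current matrix.
+---------------------------------+-------------------------------------+------------------------------------+
|                                 | New Sequences: IGNORED              | New Sequences: ADDED               |
+=================================+=====================================+====================================+
| Existing Sequences: IGNORED     | [NO-OP]                             | CharacterMatrix.add_sequences()    |
+---------------------------------+-------------------------------------+------------------------------------+
| Existing Sequences: OVERWRITTEN | CharacterMatrix.replace_sequences() | CharacterMatrix.update_sequences() |
+---------------------------------+-------------------------------------+------------------------------------+
| Existing Sequences: EXTENDED    | CharacterMatrix.extend_sequences()  | CharacterMatrix.extend_matrix()    |
+---------------------------------+-------------------------------------+------------------------------------+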
If character subsets have been defined, these subsets can be exported to independent matrices.
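The “new” versus “existing” distinction can be made concrete with a toy model in which a matrix is just a dict mapping taxon labels to lists of states. This is a sketch under that assumption: the merge function and its new/existing flags are hypothetical, and unlike the real methods (which modify self) it returns a fresh dict. The comments name the method whose documented behaviour each call is meant to mirror.

```python
def merge(current, other, new="ignore", existing="ignore"):
    """Toy model: matrices are dicts of taxon label -> list of states."""
    result = {taxon: list(seq) for taxon, seq in current.items()}
    for taxon, seq in other.items():
        if taxon not in current:
            if new == "add":
                result[taxon] = list(seq)
        elif existing == "overwrite":
            result[taxon] = list(seq)
        elif existing == "extend":
            result[taxon].extend(seq)
    return result


current = {"t1": list("AC"), "t2": list("GT")}
other = {"t2": list("TT"), "t3": list("CC")}

added = merge(current, other, new="add")                            # cf. add_sequences
replaced = merge(current, other, existing="overwrite")              # cf. replace_sequences
updated = merge(current, other, new="add", existing="overwrite")    # cf. update_sequences
extended = merge(current, other, existing="extend")                 # cf. extend_sequences
extended_all = merge(current, other, new="add", existing="extend")  # cf. extend_matrix
```

Here "t3" plays the role of a new sequence and "t2" the role of an existing one.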
- __delitem__(key)[source]¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)[source]¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __setitem__(key, values)[source]¶
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
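The three accepted key forms can be sketched as a small resolution function. This is a toy model only: ToyTaxon and resolve_key are hypothetical names, the namespace is a plain list, and raising KeyError for an undefined taxon is an assumption rather than documented behaviour.

```python
class ToyTaxon:
    """Hypothetical stand-in for a Taxon: just a label."""

    def __init__(self, label):
        self.label = label


def resolve_key(namespace, key):
    """Map an index, label, or taxon object to a taxon in `namespace` (a list)."""
    if isinstance(key, int):
        return namespace[key]           # integer: position in the namespace
    if isinstance(key, str):
        for taxon in namespace:         # string: first taxon with that label
            if taxon.label == key:
                return taxon
        raise KeyError(key)             # error type is an assumption
    if key not in namespace:            # otherwise: the taxon object itself
        raise KeyError(key)
    return key


namespace = [ToyTaxon("s1"), ToyTaxon("s2")]
by_index = resolve_key(namespace, 1)
by_label = resolve_key(namespace, "s1")
by_object = resolve_key(namespace, namespace[0])
```

In every branch the taxon has to exist in the namespace already, which matches the requirement stated for the key parameter.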
- add_character_subset(char_subset)[source]¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)[source]¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of CharacterDataSequence
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
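The difference between a taxon-namespace-scoped copy and an exhaustive deep copy can be imitated with the standard copy module. The sketch below is not how DendroPy implements clone(); it uses hypothetical ToyTaxon and ToyMatrix classes and seeds the deepcopy memo so that taxon objects are reused instead of copied.

```python
import copy


class ToyTaxon:
    def __init__(self, label):
        self.label = label


class ToyMatrix:
    def __init__(self, taxa):
        self.taxa = taxa
        self.sequences = {taxon: [] for taxon in taxa}


taxa = [ToyTaxon("s1"), ToyTaxon("s2")]
original = ToyMatrix(taxa)
original.sequences[taxa[0]].extend("ACGT")

# Seeding the memo makes deepcopy treat every taxon as "already copied",
# so the copy keeps references to the original taxon objects.
memo = {id(taxon): taxon for taxon in taxa}
scoped = copy.deepcopy(original, memo)  # analogous to depth=1
full = copy.deepcopy(original)          # analogous to depth=2
```

In the scoped copy the sequences are independent lists but remain keyed by the original taxon objects, whereas the full copy gets its own taxon objects as well.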
- coerce_values(values)[source]¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values)
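A fixed-alphabet override of coerce_values might look roughly like the following. This is a sketch with a hypothetical ToyState class and a hand-built five-symbol table, not the DendroPy DNA alphabet; the upper-casing of symbols is also an assumption.

```python
class ToyState:
    """Hypothetical stand-in for a StateIdentity: wraps one symbol."""

    def __init__(self, symbol):
        self.symbol = symbol

    def __str__(self):
        return self.symbol


# One shared state object per symbol, as a fixed alphabet would provide.
TOY_DNA_STATES = {symbol: ToyState(symbol) for symbol in "ACGT-"}


def coerce_values(values):
    # Dereference each symbol string to its state object.
    # An unknown symbol raises KeyError in this sketch.
    return [TOY_DNA_STATES[str(v).upper()] for v in values]


coerced = coerce_values("TG-aa")
symbols = [str(state) for state in coerced]
```

Because every occurrence of a symbol maps to the same state object, two cells holding "A" can be compared by identity.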
- classmethod concatenate(char_matrices)[source]¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)[source]¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)[source]¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
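How concatenation can record its component parts as character subsets is sketched below for matrices modelled as dicts of taxon label to string. The function, the "partN" subset labels, and the ValueError checks are assumptions for illustration; the real classmethods operate on CharacterMatrix objects.

```python
def concatenate(matrices):
    """Toy model: each matrix is a dict of taxon label -> aligned string."""
    taxa = set(matrices[0])
    combined = {taxon: "" for taxon in taxa}
    subsets = {}
    offset = 0
    for i, matrix in enumerate(matrices):
        if set(matrix) != taxa:
            raise ValueError("all taxa must be present in all matrices")
        lengths = {len(seq) for seq in matrix.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a matrix must be equal length")
        width = lengths.pop()
        for taxon in taxa:
            combined[taxon] += matrix[taxon]
        # Record which 0-based columns came from this component.
        subsets["part%d" % i] = list(range(offset, offset + width))
        offset += width
    return combined, subsets


combined, subsets = concatenate([
    {"t1": "AC", "t2": "GT"},
    {"t1": "TTT", "t2": "CCC"},
])
```

The recorded index ranges are what would later allow each component to be exported again as its own matrix.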
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
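The remapping of bound-attribute owners can be illustrated with a minimal model. BoundAnnotation, Thing, and copy_annotations below are hypothetical names; only the “(obj, attr_name)” tuple, the getattr(*self._value) lookup, and the default id(other): self mapping are taken from the description above.

```python
class BoundAnnotation:
    """Toy dynamic annotation: its value is read from (obj, attr_name)."""

    def __init__(self, obj, attr_name):
        self._value = (obj, attr_name)

    @property
    def value(self):
        return getattr(*self._value)


class Thing:
    def __init__(self, label):
        self.label = label
        self.annotations = []


def copy_annotations(target, source, attribute_object_mapper=None):
    if attribute_object_mapper is None:
        # Default: anything bound to `source` is rebound to `target`.
        attribute_object_mapper = {id(source): target}
    for annotation in source.annotations:
        obj, attr_name = annotation._value
        obj = attribute_object_mapper.get(id(obj), obj)
        target.annotations.append(BoundAnnotation(obj, attr_name))


a = Thing("a")
a.annotations.append(BoundAnnotation(a, "label"))

b = Thing("b")
copy_annotations(b, a)       # rebinds the annotation to b

c = Thing("c")
copy_annotations(c, a, {})   # empty mapper: annotation stays bound to a
```

Passing an empty dictionary rather than None is what switches the remapping off, mirroring the last sentence of the parameter description.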
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)[source]¶
Returns description of object, up to level depth.
- discard_sequences(taxa)[source]¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)[source]¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)[source]¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
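Column export can be pictured as filtering every sequence by the same set of 0-based indices. The sketch below models a matrix as a dict of taxon label to list of states; keeping the columns in their original order is an assumption of this toy version.

```python
def export_character_indices(matrix, indices):
    """Toy model: keep only the columns whose 0-based index is in `indices`."""
    wanted = set(indices)
    return {
        taxon: [state for i, state in enumerate(seq) if i in wanted]
        for taxon, seq in matrix.items()
    }


matrix = {"t1": list("ACGT"), "t2": list("TGCA")}
first_and_third = export_character_indices(matrix, [0, 2])
```

The result has the same taxa as the source, which parallels the note that the exported matrix still references the same taxon set.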
- extend_matrix(other_matrix)[source]¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)[source]¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)[source]¶
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
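The padding rule can be written out for a dict of lists as follows. This is a stand-alone sketch of the described behaviour (free function, in-place modification of plain lists), not the DendroPy method.

```python
def fill(sequences, value, size=None, append=True):
    """Toy model: pad every list in `sequences` (label -> list) in place."""
    if size is None:
        # Default target: the length of the longest sequence.
        size = max(len(seq) for seq in sequences.values())
    for seq in sequences.values():
        missing = size - len(seq)
        if missing > 0:
            padding = [value] * missing
            if append:
                seq.extend(padding)   # pad at the end
            else:
                seq[:0] = padding     # pad at the front


ragged = {"t1": list("ACGT"), "t2": list("AC")}
fill(ragged, "-")

front = {"t1": list("AC")}
fill(front, "?", size=4, append=False)
```

Sequences that are already at least size long are left untouched in this sketch.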
- fill_taxa()[source]¶
Adds a new (empty) sequence for each Taxon instance in current taxon namespace that does not have a sequence.
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)[source]¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
- classmethod get(**kwargs)[source]¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- keep_sequences(taxa)[source]¶
Discards all sequences not associated with any of the Taxon instances specified in taxa.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
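The effect of case sensitivity on label unification, as described for unify_taxa_by_label, can be sketched with labels alone. The function below is hypothetical and makes one assumption the documentation leaves open: the first casing variant encountered is the one kept.

```python
def unify_labels(old_labels, case_sensitive=False):
    """Toy model: return, for each old label, the label of its new taxon."""
    canonical = {}
    new_labels = []
    for label in old_labels:
        key = label if case_sensitive else label.lower()
        # First label seen for a key becomes the shared one (an assumption).
        new_labels.append(canonical.setdefault(key, label))
    return new_labels


old = ["Foo", "Foo", "FOO", "FoO"]
insensitive = unify_labels(old)
sensitive = unify_labels(old, case_sensitive=True)
```

With case-insensitive matching all four old labels collapse onto one new taxon; with case-sensitive matching only the two identical ‘Foo’ labels do.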
- new_character_subset(label, character_indices)[source]¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)[source]¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)[source]¶
Adds missing sequences for all Taxon instances in current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)[source]¶
Returns a set populated with all Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remove_sequences(taxa)[source]¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
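The three removal methods differ mainly in how they treat taxa that have no sequence. A dict-based sketch of that contrast follows; the free functions are toy stand-ins named after the methods, not DendroPy code.

```python
def remove_sequences(sequences, taxa):
    for taxon in taxa:
        del sequences[taxon]          # raises KeyError if there is no sequence


def discard_sequences(sequences, taxa):
    for taxon in taxa:
        sequences.pop(taxon, None)    # silently skips missing taxa


def keep_sequences(sequences, taxa):
    for taxon in list(sequences):     # copy keys: the dict changes while looping
        if taxon not in taxa:
            del sequences[taxon]


seqs = {"t1": "AC", "t2": "GT", "t3": "TT"}
discard_sequences(seqs, ["t2", "missing"])
after_discard = sorted(seqs)

raised = False
try:
    remove_sequences(seqs, ["missing"])
except KeyError:
    raised = True

keep_sequences(seqs, ["t1"])
after_keep = sorted(seqs)
```

The discard variant is the safer choice when the list of taxa may include entries that were never given a sequence.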
- replace_sequences(other_matrix)[source]¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()[source]¶
List of all sequences in self.
- Returns:
s (list of
CharacterDataSequenceobjects in self)
- taxon_namespace_scoped_copy(memo=None)[source]¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for
TaxonNamespaceandTaxonobjects: these are preserved as references.
- update_sequences(other_matrix)[source]¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrixand adds sequences forTaxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixbut not inselfwill be added toself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.
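The difference between replacing, adding, and updating sequences may be easier to see side by side. The sketch below is a toy model under stated assumptions, not the library's code: matrices are plain dicts keyed by hypothetical taxon labels, and the function names `replace_model`, `add_model`, and `update_model` are invented for illustration.

```python
def replace_model(self_m, other_m):
    # Only taxa present in both matrices are touched.
    for taxon in self_m.keys() & other_m.keys():
        self_m[taxon] = list(other_m[taxon])   # shallow copy of the sequence


def add_model(self_m, other_m):
    # Only taxa missing from self are touched.
    for taxon in other_m.keys() - self_m.keys():
        self_m[taxon] = list(other_m[taxon])   # shallow copy of the sequence


def update_model(self_m, other_m):
    # Updating is replacing shared taxa plus adding missing ones.
    replace_model(self_m, other_m)
    add_model(self_m, other_m)


a = {"t1": ["A", "C"], "t2": ["G", "T"]}
b = {"t2": ["T", "T"], "t3": ["C", "C"]}
update_model(a, b)
```

In this model, `t1` is left alone, `t2` takes the sequence from `b`, and `t3` is newly added; the copied lists are distinct objects from those in `b`.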
- update_taxon_namespace()[source]¶
All
Taxonobjects inselfthat are not inself.taxon_namespacewill be added.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out
selfinschemaformat.Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by
dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object
dest.
ContinuousCharacterMatrix: Continuous Data¶
- class dendropy.datamodel.charmatrixmodel.ContinuousCharacterMatrix(*args, **kwargs)[source]¶
Specializes
CharacterMatrixfor continuous data.Sequences stored using
ContinuousCharacterDataSequence, with values of elements assumed to befloat.- __delitem__(key)¶
Removes sequence for
key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for
key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified
Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.- Returns:
s (
CharacterDataSequence) – A sequence associated with theTaxoninstance referenced bykey.
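The three accepted key forms (integer index, label string, or taxon object) can be sketched as a small lookup routine. This is an illustrative model only: `ToyTaxon` and `resolve_key` are hypothetical names, the namespace is a plain list, and raising `KeyError` for unknown keys is a choice of this sketch rather than a statement about DendroPy's error handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTaxon:
    """Hypothetical stand-in for a taxon: just a label."""
    label: str


def resolve_key(namespace, key):
    """Map an integer index, a label string, or a taxon object to a taxon."""
    if isinstance(key, int):
        return namespace[key]                 # index into the namespace
    if isinstance(key, str):
        for taxon in namespace:               # first taxon with a matching label
            if taxon.label == key:
                return taxon
        raise KeyError(key)
    if key in namespace:                      # already a taxon in the namespace
        return key
    raise KeyError(key)


namespace = [ToyTaxon("s1"), ToyTaxon("s2")]
by_index = resolve_key(namespace, 1)
by_label = resolve_key(namespace, "s1")
```

In every branch the taxon must already exist in the namespace, mirroring the requirement stated above.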
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence
values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified
Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for
Taxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixbut not inselfwill be added toselfas a shallow-copy.All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
ContinuousCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of
self.- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for annotation_set of top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
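The three copy depths can be illustrated with the standard `copy` module. The sketch below is a simplified model, not the library's implementation: it ignores annotations entirely, represents a matrix as a dict from hypothetical `ToyTaxon` objects to lists, and uses an invented helper name `clone_model`.

```python
import copy


class ToyTaxon:
    """Hypothetical stand-in for a taxon object."""
    def __init__(self, label):
        self.label = label


def clone_model(matrix, depth=1):
    if depth == 0:
        # Shallow: new container, same taxa and same sequence objects.
        return copy.copy(matrix)
    if depth == 1:
        # Taxon-namespace-scoped: sequences are cloned, taxa stay as references.
        return {taxon: copy.deepcopy(seq) for taxon, seq in matrix.items()}
    # Exhaustive: everything is cloned, including the taxa.
    return copy.deepcopy(matrix)


t1 = ToyTaxon("t1")
original = {t1: [1.0, 2.0]}
shallow, scoped, deep = (clone_model(original, d) for d in (0, 1, 2))
```

Only the depth-2 copy ends up keyed by a new taxon object; the depth-1 copy shares `t1` but owns its own sequence list.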
- coerce_values(values)¶
Converts elements of
valuesto type of matrix.This method is called by
CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
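As a rough illustration of the conversion described for fixed-alphabet matrices, the sketch below maps symbols to placeholder state objects. Everything here is an assumption for demonstration: `DNA_STATES` uses tuples in place of real StateIdentity objects, the alphabet is deliberately tiny, and `coerce_values_model` is a hypothetical name.

```python
# Placeholder "state objects": one tuple per symbol of a small, fixed alphabet.
DNA_STATES = {symbol: ("state", symbol) for symbol in "ACGT-"}


def coerce_values_model(values):
    """Replace each symbol with its state object; reject anything unknown."""
    coerced = []
    for symbol in values:
        if symbol not in DNA_STATES:
            # No conversion is possible for this element.
            raise TypeError(f"cannot convert {symbol!r} to a state")
        coerced.append(DNA_STATES[symbol])
    return coerced


states = coerce_values_model("AC-")
```

A matrix type that needs no conversion would simply return `values` unchanged instead.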
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
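Concatenation with recorded component parts can be sketched on plain dicts. This toy model is not DendroPy's code: it reads the length requirement as "sequences within each matrix are aligned", which is an assumption, and the names `concatenate_model` and `part0`, `part1`, ... are hypothetical.

```python
def concatenate_model(matrices):
    """Join matrices column-wise and record the column range of each part."""
    taxa = set(matrices[0])
    combined = {taxon: [] for taxon in matrices[0]}
    subsets = {}
    offset = 0
    for index, matrix in enumerate(matrices):
        lengths = {len(seq) for seq in matrix.values()}
        # Every taxon must appear in every part, and each part must be aligned.
        if set(matrix) != taxa or len(lengths) != 1:
            raise ValueError("all taxa must be present and sequences aligned")
        width = lengths.pop()
        for taxon, seq in matrix.items():
            combined[taxon].extend(seq)
        # Remember which 0-based columns came from this part.
        subsets[f"part{index}"] = list(range(offset, offset + width))
        offset += width
    return combined, subsets


m1 = {"t1": list("AC"), "t2": list("GT")}
m2 = {"t1": list("A"), "t2": list("C")}
combined, subsets = concatenate_model([m1, m2])
```

The `subsets` mapping plays the role of the character subsets mentioned above: it lets a caller recover which columns originated from which input.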
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in
paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in
streams, assuming data format/schemaschema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from
other, which must be ofAnnotabletype.Copies are deep-copies, in that the
Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotate_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (
Annotable) – Source of annotations to copy.attribute_object_mapper (dict) – Like the
memoof__deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attributeAnnotationgives objectxas the parent or owner of the attribute (that is, the first element of theAnnotation._valuetuple isother) andid(x)is found inattribute_object_mapper, then in the copy the owner of the attribute is changed toattribute_object_mapper[id(x)]. Ifattribute_object_mapperisNone(default), then the following mapping is automatically inserted:id(other): self. That is, any references tootherin anyAnnotationobject will be remapped toself. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to
otherin any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references toself. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level
depth.
- discard_sequences(taxa)¶
Removes sequences associated with
Taxoninstances specified intaxaif they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in
indices. Note that this new matrix will still reference the same taxon set.
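Column export amounts to picking the same positions out of every sequence. The following is a minimal sketch on plain dicts, with the hypothetical helper name `export_columns_model`; it is meant to show the idea, not the library's implementation.

```python
def export_columns_model(matrix, indices):
    """Return a new mapping holding only the requested 0-based columns."""
    # The keys (taxa) are reused as-is; only the sequences are rebuilt.
    return {taxon: [seq[i] for i in indices] for taxon, seq in matrix.items()}


matrix = {"t1": list("ACGT"), "t2": list("TGCA")}
subset = export_columns_model(matrix, [0, 2])
```

The original mapping is left untouched, and the result is keyed by the same taxa, which corresponds to the new matrix still referencing the same taxon set.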
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset,
character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in
selfwith characters associated with correspondingTaxonobjects inother_matrixand adds sequences forTaxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in
selfwith characters associated with correspondingTaxonobjects inother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixthat is also inselfwill be appended to the sequence currently associated with thatTaxonreference inself.All other sequences will be ignored.
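The rule in the notes above (append for shared taxa, ignore the rest) can be modeled in a few lines. This is a sketch on plain dicts with a hypothetical helper name, `extend_sequences_model`, and it does not model the `is_add_new_sequences` argument shown in the signature.

```python
def extend_sequences_model(self_m, other_m):
    for taxon, seq in other_m.items():
        if taxon in self_m:
            # Shared taxon: append the other matrix's characters to the row.
            self_m[taxon].extend(seq)
        # Taxa found only in other_m are ignored in this model.


a = {"t1": ["A"], "t2": ["C"]}
b = {"t2": ["G", "G"], "t3": ["T"]}
extend_sequences_model(a, b)
```

Here only `t2` grows; `t1` is unchanged and `t3` is not added.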
- fill(value, size=None, append=True)¶
Pads out all sequences in
selfby addingvalueto each sequence until its length issizelong or equal to the length of the longest sequence ifsizeis not specified.- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a
StateIdentityfor discrete character).size (integer or None) – The size (length) up to which the sequences will be padded. If
None, then the maximum (longest) sequence size will be used.append (boolean) – If
True(default), then new values will be added to the end of each sequence. IfFalse, then new values will be inserted to the front of each sequence.
- fill_taxa()¶
Adds a new (empty) sequence for each
Taxoninstance in current taxon namespace that does not have a sequence.
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating
Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three
Taxonobjects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to aCharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
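The way string keys are dereferenced (first existing match, otherwise create) can be sketched as follows. This is a toy model under stated assumptions: taxa are represented by their label strings in a plain list, and `taxon_for_label` is a hypothetical helper, not part of the library.

```python
def taxon_for_label(namespace, label, case_sensitive=False):
    """Return the first existing entry matching `label`, adding it if absent."""
    wanted = label if case_sensitive else label.lower()
    for existing in namespace:
        candidate = existing if case_sensitive else existing.lower()
        if candidate == wanted:
            return existing          # reuse the first match
    namespace.append(label)          # no match: create a new entry
    return label


namespace = ["Python regius"]
reused = taxon_for_label(namespace, "python REGIUS")
created = taxon_for_label(namespace, "python REGIUS", case_sensitive=True)
```

With case ignored, the existing entry is reused; with case respected, the differently cased label is treated as new and appended.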
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “
file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, or “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or
Noneotherwise.taxon_namespace (
TaxonNamespace) – TheTaxonNamespaceinstance to use to manage the taxon names. If not specified, a new one will be created.matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If
True, then unsupported or unrecognized keyword arguments will not result in an error. Default isFalse: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string
src.- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object
src.- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string
src.- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by
src.- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current
self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (
TaxonNamespace) – TheTaxonNamespaceinto the scope of which this object will be moved.unify_taxa_by_label (boolean, optional) – If
True, then references to distinctTaxonobjects with identical labels in the current namespace will be replaced with a reference to a singleTaxonobject in the new namespace. IfFalse: references to distinctTaxonobjects will remain distinct, even if the labels are the same.taxon_mapping_memo (dictionary) – Similar to
memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace

    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)

    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())

    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
- new_character_subset(label, character_indices)¶
Defines a set of character (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new
CharacterDataSequenceassociated withTaxontaxon, and populates it with values invalues.- Parameters:
- Returns:
s (
CharacterDataSequence) – A newCharacterDataSequenceassociated withTaxontaxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all
Taxoninstances in current namespace, and then pads out all sequences inselfby addingvalueto each sequence until its length issizelong or equal to the length of the longest sequence ifsizeis not specified. A combination ofCharacterMatrix.fill_taxaandCharacterMatrix.fill.- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a
StateIdentityfor discrete character).size (integer or None) – The size (length) up to which the sequences will be padded. If
None, then the maximum (longest) sequence size will be used.append (boolean) – If
True(default), then new values will be added to the end of each sequence. IfFalse, then new values will be inserted to the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of
Taxoninstances associated withself.- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with
self.
- purge_taxon_namespace()¶
Remove all
Taxoninstances inself.taxon_namespacethat are not associated withselfor any item inself.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use
migrate_taxon_namespace()instead. Rebuildstaxon_namespacefrom scratch, or assignsTaxonobjects from givenTaxonNamespaceobjecttaxon_namespacebased on label values.
- remove_sequences(taxa)¶
Removes sequences associated with
Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of
CharacterDataSequenceobjects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for
TaxonNamespaceandTaxonobjects: these are preserved as references.
- update_sequences(other_matrix)¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrixand adds sequences forTaxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixbut not inselfwill be added toself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.
- update_taxon_namespace()¶
All
Taxonobjects inselfthat are not inself.taxon_namespacewill be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out
selfinschemaformat.Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by
dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object
dest.
DnaCharacterMatrix: DNA Data¶
- class dendropy.datamodel.charmatrixmodel.DnaCharacterMatrix(*args, **kwargs)[source]¶
Specializes
CharacterMatrixfor DNA data.- __delitem__(key)¶
Removes sequence for
key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for
key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified
Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.- Returns:
s (
CharacterDataSequence) – A sequence associated with theTaxoninstance referenced bykey.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence
values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified
Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
DnaCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- coerce_values(values)¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
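As a rough illustration of what concatenation with recorded character subsets means, the following plain-Python sketch joins alignments column-wise and records which columns each component contributed. It does not use DendroPy: alignments are modelled as dicts of strings, the subset labels ("locus000", ...) are made up for the example, and the length check is interpreted as "every sequence within a component has the same length", which is an assumption.

```python
def concatenate(alignments):
    """Join alignments column-wise; return (combined, subsets).

    `alignments` is a list of dicts mapping taxon label -> sequence string.
    `subsets` maps an illustrative label to the 0-based column indices
    occupied by the corresponding component in the combined alignment.
    """
    taxa = set(alignments[0])
    combined = {label: "" for label in alignments[0]}
    subsets = {}
    offset = 0
    for idx, aln in enumerate(alignments):
        if set(aln) != taxa:
            raise ValueError("all taxa must be present in all alignments")
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment differ in length")
        ncols = lengths.pop()
        for label in combined:
            combined[label] += aln[label]
        subsets["locus{:03d}".format(idx)] = list(range(offset, offset + ncols))
        offset += ncols
    return combined, subsets


a1 = {"s1": "ACGT", "s2": "ACGA"}
a2 = {"s1": "TT", "s2": "TC"}
combined, subsets = concatenate([a1, a2])
```

Keeping the column ranges alongside the combined data is what allows each component to be recovered later by column index.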
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
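Column export by 0-based index amounts to picking the same positions out of every sequence. A minimal plain-Python sketch, with the matrix modelled as a dict of lists and `export_columns` a hypothetical helper rather than a DendroPy function:

```python
def export_columns(sequences, indices):
    """Return a new dict keeping only the 0-based columns in `indices`."""
    return {
        label: [seq[i] for i in indices]
        for label, seq in sequences.items()
    }


matrix = {"s1": list("ACGT"), "s2": list("TGCA")}

# Export by explicit indices.
first_and_third = export_columns(matrix, [0, 2])

# Export by a named subset, modelled here as a label -> indices mapping.
character_subsets = {"even_positions": [1, 3]}
even = export_columns(matrix, character_subsets["even_positions"])
```

The keys of the result are the same taxon labels as in the input, which parallels the note that the exported matrix still references the same taxa.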
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
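The padding behaviour described for `value`, `size`, and `append` can be sketched in plain Python as follows. This is an illustration of the rule only, with the matrix modelled as a dict of lists; the standalone `fill` function here is a hypothetical helper, not the method itself.

```python
def fill(sequences, value, size=None, append=True):
    """Pad every list in `sequences` (label -> list) in place with `value`."""
    if size is None:
        # Default target: the length of the longest sequence.
        size = max((len(seq) for seq in sequences.values()), default=0)
    for seq in sequences.values():
        padding = [value] * (size - len(seq))  # empty if already long enough
        if append:
            seq.extend(padding)
        else:
            seq[:0] = padding
    return sequences


rows = {"s1": list("ACGT"), "s2": list("AC"), "s3": []}
fill(rows, "-")

front = fill({"s1": list("AC"), "s2": list("A")}, "?", size=3, append=False)
```

Sequences that already meet the target length are left untouched, because multiplying a list by a non-positive count yields an empty padding list.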
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (the default in the signature above), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
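To make the definition of the folded spectrum concrete, here is a small plain-Python sketch that counts, for each column, how many sequences carry the minor allele. It is not DendroPy's implementation. It assumes equal-length sequences with at most two alleles per column, and it returns a vector indexed from 0 to floor(N/2), which is a choice of this sketch and may differ from the length the method returns.

```python
from collections import Counter


def folded_sfs(sequences):
    """Return [f_0, ..., f_m] with m = len(sequences) // 2.

    f_i counts the columns whose minor allele is carried by i sequences.
    Monomorphic columns are counted in f_0.
    """
    n = len(sequences)
    spectrum = [0] * (n // 2 + 1)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 2:
            raise ValueError("more than two alleles at a site")
        # With at most two alleles, everything except the most common
        # allele belongs to the minor allele.
        minor = n - max(counts.values())
        spectrum[minor] += 1
    return spectrum


sample = ["0011", "0110", "0101", "0100"]
sfs = folded_sfs(sample)
```

For `sample`, the first column is monomorphic, the second has a minor allele carried by one sequence, and the last two columns are split two against two.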
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
- new_character_subset(label, character_indices)¶
Defines a set of character (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
    T1 AGN
    T2 C-T
    T3 GC?
Return with gaps_as_missing==True:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
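The mapping rules in the example above can be reproduced with a short plain-Python sketch. It hard-codes the indices used in the example (A=0, C=1, G=2, T=3, gap=4) and handles only the symbols that appear there; `state_sets` is a hypothetical helper and taxa are represented by their labels rather than by Taxon objects.

```python
FUNDAMENTAL = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4


def state_sets(sequence, gaps_as_missing=True):
    """Translate a DNA string into a list of sets of fundamental state indices."""
    all_bases = set(FUNDAMENTAL.values())
    result = []
    for symbol in sequence:
        if symbol in FUNDAMENTAL:
            result.append({FUNDAMENTAL[symbol]})
        elif symbol == "N":
            # Any base, but never the gap state.
            result.append(set(all_bases))
        elif symbol == "?":
            # Missing data: any base, plus the gap when gaps are a state.
            result.append(set(all_bases) if gaps_as_missing
                          else all_bases | {GAP_INDEX})
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        else:
            raise ValueError("unsupported symbol: {!r}".format(symbol))
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
as_missing = {label: state_sets(seq, True) for label, seq in matrix.items()}
as_state = {label: state_sets(seq, False) for label, seq in matrix.items()}
```

The two dictionaries differ only in the cells holding ‘-’ and ‘?’, which is exactly the distinction drawn in the notes above.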
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
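The sequence-merging methods of this class (add_sequences, replace_sequences, update_sequences, extend_sequences, and extend_matrix) differ only in how they treat taxa that are shared with other_matrix and taxa that are new. The following plain-Python sketch contrasts them using dicts of lists in place of matrices. It illustrates the documented rules only and is not DendroPy code; the same-type and same-namespace checks are omitted.

```python
def add_sequences(target, other):
    """Add sequences for taxa present in `other` but absent from `target`."""
    for taxon, seq in other.items():
        if taxon not in target:
            target[taxon] = list(seq)  # shallow copy of the sequence


def replace_sequences(target, other):
    """Replace sequences for taxa shared by `target` and `other`."""
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon] = list(seq)


def update_sequences(target, other):
    """Replace shared sequences and add the missing ones."""
    for taxon, seq in other.items():
        target[taxon] = list(seq)


def extend_sequences(target, other):
    """Append to sequences of shared taxa; ignore all others."""
    for taxon, seq in other.items():
        if taxon in target:
            target[taxon].extend(seq)


def extend_matrix(target, other):
    """Append to shared sequences and add the missing ones."""
    for taxon, seq in other.items():
        target.setdefault(taxon, []).extend(seq)


def fresh():
    """Build a new (target, other) pair so each operation starts clean."""
    return ({"s1": list("AC"), "s2": list("GG")},
            {"s2": list("TT"), "s3": list("CA")})


results = {}
for op in (add_sequences, replace_sequences, update_sequences,
           extend_sequences, extend_matrix):
    target, other = fresh()
    op(target, other)
    results[op.__name__] = {t: "".join(s) for t, s in target.items()}
```

Here "s2" is the shared taxon and "s3" is the new one, so comparing the entries of `results` shows at a glance which operations touch which kind of taxon.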
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")
    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
RnaCharacterMatrix: RNA Data¶
- class dendropy.datamodel.charmatrixmodel.RnaCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for RNA data.
- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already have been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence values to the taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly. If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must already have been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must already be defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
RnaCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- coerce_values(values)¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
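The idea of concatenating alignments while recording each component as a character subset can be sketched in plain Python. This is only a model under stated assumptions: dictionaries of strings stand in for matrices, the function name `concatenate_alignments` is hypothetical, each subset is recorded as a half-open column range, and the length check is applied to the sequences within each alignment (one possible reading of the requirement above).

```python
def concatenate_alignments(parts):
    """Join per-taxon sequences and record each part's column range."""
    taxa = set(parts[0])
    combined = {taxon: "" for taxon in taxa}
    subsets = []  # (part index, first column, one past the last column)
    offset = 0
    for i, part in enumerate(parts):
        if set(part) != taxa:
            raise ValueError("all taxa must be present in all alignments")
        lengths = {len(seq) for seq in part.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in an alignment must share one length")
        width = lengths.pop()
        for taxon, seq in part.items():
            combined[taxon] += seq
        subsets.append((i, offset, offset + width))
        offset += width
    return combined, subsets


gene1 = {"t1": "ACGT", "t2": "ACGA"}
gene2 = {"t1": "GG", "t2": "GC"}
combined, subsets = concatenate_alignments([gene1, gene2])
```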
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in
streams, assuming data format/schemaschema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (
Annotable) – Source of annotations to copy.attribute_object_mapper (dict) – Like the
memoof__deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attributeAnnotationgives objectxas the parent or owner of the attribute (that is, the first element of theAnnotation._valuetuple isother) andid(x)is found inattribute_object_mapper, then in the copy the owner of the attribute is changed toattribute_object_mapper[id(x)]. Ifattribute_object_mapperisNone(default), then the following mapping is automatically inserted:id(other): self. That is, any references tootherin anyAnnotationobject will be remapped toself. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to
otherin any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references toself. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level
depth.
- discard_sequences(taxa)¶
Removes sequences associated with
Taxoninstances specified intaxaif they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in
indices. Note that this new matrix will still reference the same taxon set.
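Column export by 0-based index can be modeled in a few lines of plain Python. This is a sketch with a hypothetical `export_columns` helper and a dictionary of lists standing in for the matrix; it is not DendroPy's implementation.

```python
def export_columns(matrix, indices):
    """Return a new mapping holding only the requested 0-based columns."""
    return {taxon: [seq[i] for i in indices] for taxon, seq in matrix.items()}


matrix = {"t1": list("ACGT"), "t2": list("TGCA")}
subset = export_columns(matrix, [0, 2])
```

The keys of `subset` are the same taxon labels as in `matrix`, mirroring the note that the exported matrix still references the same taxon set.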
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset,
character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in
selfwith characters associated with correspondingTaxonobjects inother_matrixand adds sequences forTaxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix that is not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in
selfwith characters associated with correspondingTaxonobjects inother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixthat is also inselfwill be appended to the sequence currently associated with thatTaxonreference inself.All other sequences will be ignored.
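The difference between extending sequences only and extending the whole matrix can be shown with a plain-Python model. This is a sketch under assumptions: dictionaries of lists stand in for matrices, both helper functions are hypothetical, and the optional `is_add_new_sequences` argument is not modeled.

```python
def extend_sequences(self_map, other_map):
    """Append other's characters for shared taxa; ignore everything else."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon].extend(seq)
    return self_map


def extend_matrix(self_map, other_map):
    """Append for shared taxa and also add sequences for new taxa."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon].extend(seq)
        else:
            self_map[taxon] = list(seq)
    return self_map


other = {"t1": list("T"), "t3": list("CC")}
only_extended = extend_sequences({"t1": list("AC"), "t2": list("GG")}, other)
fully_extended = extend_matrix({"t1": list("AC"), "t2": list("GG")}, other)
```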
- fill(value, size=None, append=True)¶
Pads out all sequences in
selfby addingvalueto each sequence until its length issizelong or equal to the length of the longest sequence ifsizeis not specified.- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a
StateIdentityfor discrete character).size (integer or None) – The size (length) up to which the sequences will be padded. If
None, then the maximum (longest) sequence size will be used.append (boolean) – If
True(default), then new values will be added to the end of each sequence. IfFalse, then new values will be inserted to the front of each sequence.
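The padding rule can be sketched in plain Python. This is a model only: lists of symbols stand in for sequences, and the standalone `fill` function below is hypothetical rather than the library method.

```python
def fill(sequences, value, size=None, append=True):
    """Pad each list in `sequences` in place until it holds `size` items."""
    if size is None:
        # Default target: the length of the longest sequence.
        size = max(len(seq) for seq in sequences)
    for seq in sequences:
        padding = [value] * (size - len(seq))  # empty if seq is long enough
        if append:
            seq.extend(padding)
        else:
            seq[:0] = padding  # insert at the front
    return sequences


padded_end = fill([list("ACG"), list("A")], "-")
padded_front = fill([list("ACG"), list("A")], "-", append=False)
```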
- fill_taxa()¶
Adds a new (empty) sequence for each
Taxoninstance in current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
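The folding idea can be sketched in plain Python. This is a minimal model and not DendroPy's algorithm: strings stand in for sequences, the vector is sized as $(f_0, \ldots, f_m)$ with $m = \lceil N/2 \rceil$, monomorphic sites are counted in $f_0$, and sites with more than two alleles are skipped (an assumption of this sketch).

```python
import math
from collections import Counter


def folded_sfs(sequences):
    """Count, per minor-allele count, the number of sites in the sample."""
    n = len(sequences)
    m = math.ceil(n / 2)
    spectrum = [0] * (m + 1)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 2:
            continue  # this sketch skips sites with more than two alleles
        minor = min(counts.values()) if len(counts) == 2 else 0
        spectrum[minor] += 1
    return spectrum


sample = ["AAGT", "AAGA", "ACGA", "ACTA"]
spectrum = folded_sfs(sample)
```

For `sample`, the first column is monomorphic, the second has a minor allele count of 2, and the last two each have a minor allele count of 1.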
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:

    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)

Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
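The way string keys are dereferenced against the namespace can be modeled in plain Python. This is a sketch under assumptions: a list of label strings stands in for the taxon namespace, and `resolve_label` is a hypothetical helper, not part of DendroPy.

```python
def resolve_label(namespace, label, case_sensitive=False):
    """Return the first matching label in `namespace`, adding it if absent."""
    for existing in namespace:
        if existing == label:
            return existing
        if not case_sensitive and existing.lower() == label.lower():
            return existing
    namespace.append(label)  # no match: create a new entry
    return label


namespace = ["Python_regius"]
matched = resolve_label(namespace, "python_regius")
size_after_match = len(namespace)
created = resolve_label(namespace, "python_regius", case_sensitive=True)
size_after_create = len(namespace)
```

With case ignored, the lower-case key resolves to the existing entry; with case respected, the same key produces a new entry.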
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or
Noneotherwise.taxon_namespace (
TaxonNamespace) – TheTaxonNamespaceinstance to use to manage the taxon names. If not specified, a new one will be created.matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If
True, then unsupported or unrecognized keyword arguments will not result in an error. Default isFalse: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.Examples:

    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")

- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string
src.- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object
src.- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string
src.- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by
src.- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.
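The effect of label unification under the two case-sensitivity settings can be modeled in plain Python. This is a sketch only: label strings stand in for Taxon objects, `unify_by_label` is a hypothetical helper, and keeping the first casing variant seen is an arbitrary choice of this sketch (the description above only promises some existing variant).

```python
def unify_by_label(labels, case_sensitive=False):
    """Map each old label to the label of its counterpart in a new namespace."""
    new_namespace = {}
    mapping = []
    for label in labels:
        key = label if case_sensitive else label.lower()
        new_namespace.setdefault(key, label)  # first variant seen wins here
        mapping.append(new_namespace[key])
    return mapping, list(new_namespace.values())


old = ["Foo", "Foo", "FOO", "FoO"]
merged, merged_namespace = unify_by_label(old)
distinct, distinct_namespace = unify_by_label(old, case_sensitive=True)
```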
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:

    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)

You can also use this method to get a copy of a structure and then move it to a new namespace:

    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)

See also
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new
CharacterDataSequenceassociated withTaxontaxon, and populates it with values invalues.- Parameters:
- Returns:
s (
CharacterDataSequence) – A newCharacterDataSequenceassociated withTaxontaxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all
Taxoninstances in current namespace, and then pads out all sequences inselfby addingvalueto each sequence until its length issizelong or equal to the length of the longest sequence ifsizeis not specified. A combination ofCharacterMatrix.fill_taxaandCharacterMatrix.fill.- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a
StateIdentityfor discrete character).size (integer or None) – The size (length) up to which the sequences will be padded. If
None, then the maximum (longest) sequence size will be used.append (boolean) – If
True(default), then new values will be added to the end of each sequence. IfFalse, then new values will be inserted to the front of each sequence.
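Packing combines the two steps described above: add empty sequences for taxa that lack one, then pad everything to a common length. The following plain-Python model is a sketch with a hypothetical `pack` helper, a list of labels standing in for the namespace, and padding at the end only.

```python
def pack(matrix, namespace, value, size=None):
    """Add empty sequences for missing taxa, then pad all to one length."""
    for taxon in namespace:
        matrix.setdefault(taxon, [])  # step 1: fill in missing taxa
    if size is None:
        size = max(len(seq) for seq in matrix.values())
    for seq in matrix.values():
        seq.extend([value] * (size - len(seq)))  # step 2: pad at the end
    return matrix


packed = pack({"t1": list("ACG"), "t2": list("A")}, ["t1", "t2", "t3"], "-")
```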
- poll_taxa(taxa=None)¶
Returns a set populated with all of
Taxoninstances associated withself.- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all
Taxoninstances inself.taxon_namespacethat are not associated withselfor any item inself.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use
migrate_taxon_namespace()instead. Rebuildstaxon_namespacefrom scratch, or assignsTaxonobjects from givenTaxonNamespaceobjecttaxon_namespacebased on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with
Taxoninstances specified intaxa. A KeyError is raised if aTaxoninstance is specified for which there is no associated sequences.
- replace_sequences(other_matrix)¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of
CharacterDataSequenceobjects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for
TaxonNamespaceandTaxonobjects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:

    T1 AGN
    T2 C-T
    T3 GC?

Return with gaps_as_missing==True:

    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }

Return with gaps_as_missing==False:

    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }

Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
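The mapping rules in this example can be reproduced with a small plain-Python model. This is a sketch, not DendroPy's implementation: the `BASES` table, `GAP_INDEX`, and the `state_sets` function are hypothetical names, and only the symbols used in the example (the four bases, ‘N’, ‘-’, and ‘?’) are handled.

```python
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4  # index used for the gap when it counts as a fundamental state


def state_sets(sequence, gaps_as_missing=True):
    """Map each symbol to the set of fundamental state indexes it represents."""
    all_bases = set(BASES.values())
    result = []
    for symbol in sequence:
        if symbol in BASES:
            result.append({BASES[symbol]})
        elif symbol == "N":
            result.append(set(all_bases))  # any base, never the gap
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        elif symbol == "?":
            extra = set() if gaps_as_missing else {GAP_INDEX}
            result.append(all_bases | extra)
        else:
            raise ValueError("unsupported symbol: %r" % symbol)
    return result


t2_missing = state_sets("C-T")
t2_state = state_sets("C-T", gaps_as_missing=False)
t3_state = state_sets("GC?", gaps_as_missing=False)
```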
- update_sequences(other_matrix)¶
Replaces sequences for
Taxonobjects shared betweenselfandother_matrixand adds sequences forTaxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixbut not inselfwill be added toself.Each sequence in
selfassociated with aTaxonthat is also represented inother_matrixwill be replaced with a shallow-copy of the corresponding sequence fromother_matrix.
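The add, replace, and update operations differ only in which taxa they touch. The plain-Python model below is a sketch: dictionaries of lists stand in for matrices, and the three functions are hypothetical helpers named after the methods they illustrate.

```python
def add_sequences(self_map, other_map):
    """Add sequences only for taxa that are new to self_map."""
    for taxon, seq in other_map.items():
        if taxon not in self_map:
            self_map[taxon] = list(seq)  # shallow copy
    return self_map


def replace_sequences(self_map, other_map):
    """Replace sequences only for taxa shared by both maps."""
    for taxon, seq in other_map.items():
        if taxon in self_map:
            self_map[taxon] = list(seq)
    return self_map


def update_sequences(self_map, other_map):
    """Replace shared sequences and add new ones."""
    for taxon, seq in other_map.items():
        self_map[taxon] = list(seq)
    return self_map


def fresh():
    return {"t1": list("AC"), "t2": list("GG")}


other = {"t1": list("TT"), "t3": list("CC")}
added = add_sequences(fresh(), other)
replaced = replace_sequences(fresh(), other)
updated = update_sequences(fresh(), other)
```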
- update_taxon_namespace()¶
All
Taxonobjects inselfthat are not inself.taxon_namespacewill be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out
selfinschemaformat.Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.Examples

    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")
    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")

- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by
dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object
dest.
ProteinCharacterMatrix: Protein (Amino Acid) Data¶
- class dendropy.datamodel.charmatrixmodel.ProteinCharacterMatrix(*args, **kwargs)[source]¶
Specializes
CharacterMatrixfor protein or amino acid data.- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.- Returns:
s (
CharacterDataSequence) – A sequence associated with theTaxoninstance referenced bykey.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or
Taxon) – If an integer, assumed to be an index of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. If a string, assumed to be a label of aTaxonobject in the currentTaxonNamespaceobject ofself.taxon_namespace. Otherwise, assumed to beTaxoninstance directly. In all cases, theTaxonobject must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for
Taxonobjects that are inother_matrixbut not inself.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixbut not inselfwill be added toselfas a shallow-copy.All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “
schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
ProteinCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of
self.- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: exhaustive deep-copy: All objects are cloned.
- coerce_values(values)¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in
streams, assuming data format/schemaschema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (
Annotable) – Source of annotations to copy.attribute_object_mapper (dict) – Like the
memoof__deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attributeAnnotationgives objectxas the parent or owner of the attribute (that is, the first element of theAnnotation._valuetuple isother) andid(x)is found inattribute_object_mapper, then in the copy the owner of the attribute is changed toattribute_object_mapper[id(x)]. Ifattribute_object_mapperisNone(default), then the following mapping is automatically inserted:id(other): self. That is, any references tootherin anyAnnotationobject will be remapped toself. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
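The id-based remapping of “(obj, attr_name)” tuples can be sketched in isolation. The Owner class and the remap_bound_value helper below are hypothetical stand-ins written for this illustration; they are not part of DendroPy.

```python
class Owner:
    """Hypothetical object that owns a dynamically bound attribute."""

    def __init__(self, name):
        self.name = name


def remap_bound_value(value, attribute_object_mapper):
    """Return an (obj, attr_name) tuple with obj remapped by its id, if mapped."""
    obj, attr_name = value
    new_obj = attribute_object_mapper.get(id(obj), obj)
    return (new_obj, attr_name)


other = Owner("original")
new_owner = Owner("copy")
value = (other, "name")

# Mapping id(other) to the new owner mirrors the default described above.
remapped = remap_bound_value(value, {id(other): new_owner})
# An empty mapper leaves the original reference in place.
untouched = remap_bound_value(value, {})

remapped_value = getattr(*remapped)
untouched_value = getattr(*untouched)
```

With the mapping in place the bound value is read from the new owner; with an empty mapping it is still read from the original object.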
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix that is not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
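The difference between extend_sequences and extend_matrix can be sketched with dictionaries of lists standing in for matrices. The two functions below are illustrative models of the documented behaviour, not DendroPy code, and they do not model the is_add_new_sequences argument.

```python
def extend_sequences(self_seqs, other_seqs):
    """Append characters only for taxa already present in self_seqs."""
    for taxon, values in other_seqs.items():
        if taxon in self_seqs:
            self_seqs[taxon].extend(values)


def extend_matrix(self_seqs, other_seqs):
    """Append for shared taxa and add sequences for taxa new to self_seqs."""
    for taxon, values in other_seqs.items():
        if taxon in self_seqs:
            self_seqs[taxon].extend(values)
        else:
            self_seqs[taxon] = list(values)


other = {"t1": list("GG"), "t3": list("TT")}

a = {"t1": list("AC"), "t2": list("GT")}
extend_sequences(a, other)   # "t3" is ignored

b = {"t1": list("AC"), "t2": list("GT")}
extend_matrix(b, other)      # "t3" is added
```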
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
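The padding rule can be sketched on plain lists. The fill function below is a stand-alone model of the behaviour described above, written for illustration; it is not the library's implementation.

```python
def fill(sequences, value, size=None, append=True):
    """Pad each list in `sequences` in place up to `size` items."""
    if size is None:
        # Default target: the length of the longest sequence.
        size = max((len(seq) for seq in sequences), default=0)
    for seq in sequences:
        missing = size - len(seq)
        if missing <= 0:
            continue  # already long enough; never truncated
        padding = [value] * missing
        if append:
            seq.extend(padding)   # pad at the end
        else:
            seq[:0] = padding     # pad at the front


rows = [list("ACGT"), list("AC"), []]
fill(rows, "-")

front_padded = [list("AC"), list("A")]
fill(front_padded, "?", size=3, append=False)
```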
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, …, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, …, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele is present in $i$ copies.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
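The folded spectrum can be sketched for sequences given as plain strings. This is a simplified model with its own assumptions, not the library's algorithm: the result holds the entries $f_0$ through $f_m$ (so it has $m + 1$ items), monomorphic sites are counted in $f_0$, and sites with more than two alleles are skipped.

```python
import math
from collections import Counter


def folded_sfs(sequences):
    """Folded site frequency spectrum of equal-length sequences (sketch)."""
    n = len(sequences)
    m = math.ceil(n / 2)
    spectrum = [0] * (m + 1)
    for column in zip(*sequences):
        counts = Counter(column)
        if len(counts) > 2:
            continue  # assumption: only mono- and biallelic sites are counted
        # Minor allele count: 0 for monomorphic sites, else the rarer allele.
        minor = min(counts.values()) if len(counts) == 2 else 0
        spectrum[minor] += 1
    return spectrum


seqs = ["AAGT", "AAGT", "ACGT", "ACTT"]
spectrum = folded_sfs(seqs)
```

For these four sequences, columns 0 and 3 are monomorphic, column 2 has a singleton, and column 1 is split two against two.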
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects or Taxon objects directly. If key is specified as string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterable of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
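How a string key is resolved against existing labels can be sketched as follows. The resolve_label helper and the list of labels standing in for a taxon namespace are hypothetical; the sketch only models the "first existing match, else create" rule and the case-sensitivity switch described above.

```python
def resolve_label(label, namespace_labels, case_sensitive=False):
    """Return the first matching existing label, adding `label` if none match."""
    for existing in namespace_labels:
        if case_sensitive:
            matches = existing == label
        else:
            matches = existing.lower() == label.lower()
        if matches:
            return existing
    # No match: a new entry is created and added to the namespace.
    namespace_labels.append(label)
    return label


labels = ["S1", "s2"]
insensitive = resolve_label("s1", labels)                     # reuses "S1"
sensitive = resolve_label("s1", labels, case_sensitive=True)  # adds "s1"
```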
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g. label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
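The interaction between unify_taxa_by_label and case sensitivity can be sketched on a list of labels. The map_labels function is a hypothetical model of the correspondence rules described above, using list positions in place of Taxon objects; it is not DendroPy code.

```python
def map_labels(old_labels, unify_taxa_by_label=True, is_case_sensitive=False):
    """Return (new_labels, mapping), where mapping[i] is the index in
    new_labels of the counterpart of the i-th old taxon."""
    new_labels = []
    mapping = []
    seen = {}  # normalized label -> index in new_labels
    for label in old_labels:
        key = label if is_case_sensitive else label.lower()
        if unify_taxa_by_label and key in seen:
            # Reuse the single counterpart already created for this label.
            mapping.append(seen[key])
            continue
        new_labels.append(label)
        index = len(new_labels) - 1
        seen.setdefault(key, index)
        mapping.append(index)
    return new_labels, mapping


old = ["Foo", "Foo", "FOO", "FoO"]
unified = map_labels(old)
case_sensitive = map_labels(old, is_case_sensitive=True)
distinct = map_labels(old, unify_taxa_by_label=False)
```

With case-insensitive unification all four old taxa collapse onto one new entry; with case-sensitive matching only the two identical ‘Foo’ labels are merged.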
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., Given the following matrix of DNA characters:
    T1 AGN
    T2 C-T
    T3 GC?
Return with gaps_as_missing==True
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
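The mapping in this example can be reproduced with a small stand-alone sketch. The BASES table, the GAP_INDEX constant and the state_sets function are hypothetical names chosen for illustration; only the symbols A, C, G, T, N, '-' and '?' from the example are handled.

```python
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4  # index used for the gap when it counts as a fundamental state


def state_sets(sequence, gaps_as_missing=True):
    """Map each symbol of a DNA string to a set of fundamental state indices."""
    all_bases = set(BASES.values())
    result = []
    for symbol in sequence:
        if symbol in BASES:
            result.append({BASES[symbol]})
        elif symbol == "N":
            # Any base, but never the gap state.
            result.append(set(all_bases))
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        elif symbol == "?":
            # Missing data: any base, plus the gap state when gaps are a state.
            result.append(set(all_bases) if gaps_as_missing
                          else all_bases | {GAP_INDEX})
        else:
            raise ValueError("unsupported symbol: %r" % symbol)
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
as_missing = {t: state_sets(s) for t, s in matrix.items()}
as_state = {t: state_sets(s, gaps_as_missing=False) for t, s in matrix.items()}
```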
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
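The relationship between add_sequences, replace_sequences and update_sequences can be sketched with dictionaries of lists. The functions below are illustrative models of the documented rules rather than DendroPy code; list(values) stands in for the shallow copy.

```python
def add_sequences(self_seqs, other_seqs):
    """Add sequences only for taxa missing from self_seqs."""
    for taxon, values in other_seqs.items():
        if taxon not in self_seqs:
            self_seqs[taxon] = list(values)


def replace_sequences(self_seqs, other_seqs):
    """Replace sequences only for taxa present in both."""
    for taxon, values in other_seqs.items():
        if taxon in self_seqs:
            self_seqs[taxon] = list(values)


def update_sequences(self_seqs, other_seqs):
    """Replace shared taxa and add new ones (both rules combined)."""
    for taxon, values in other_seqs.items():
        self_seqs[taxon] = list(values)


other = {"t1": list("GG"), "t3": list("TT")}

added = {"t1": list("AC"), "t2": list("GT")}
add_sequences(added, other)

replaced = {"t1": list("AC"), "t2": list("GT")}
replace_sequences(replaced, other)

updated = {"t1": list("AC"), "t2": list("GT")}
update_sequences(updated, other)
```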
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")
    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
RestrictionSitesCharacterMatrix: Restriction Sites Data¶
- class dendropy.datamodel.charmatrixmodel.RestrictionSitesCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for restriction site data.
- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
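The three accepted key forms, and the creation of an empty sequence on first access, can be sketched as follows. The Taxon class here is a minimal hypothetical stand-in that only carries a label, and resolve_key and get_sequence are illustrative helpers, not DendroPy functions.

```python
class Taxon:
    """Hypothetical stand-in for a taxon object; only carries a label."""

    def __init__(self, label):
        self.label = label


def resolve_key(key, taxa):
    """Resolve an index, a label, or a Taxon to a Taxon defined in `taxa`."""
    if isinstance(key, int):
        return taxa[key]
    if isinstance(key, str):
        for taxon in taxa:
            if taxon.label == key:
                return taxon
        raise KeyError(key)
    # Otherwise the key is taken to be a Taxon, which must already be defined.
    if any(taxon is key for taxon in taxa):
        return key
    raise KeyError(key)


def get_sequence(sequences, key, taxa):
    """Return the sequence for `key`, creating an empty one if needed."""
    taxon = resolve_key(key, taxa)
    return sequences.setdefault(taxon, [])


taxa = [Taxon("t1"), Taxon("t2")]
sequences = {taxa[0]: list("0110")}

by_index = get_sequence(sequences, 0, taxa)
by_label = get_sequence(sequences, "t1", taxa)
by_object = get_sequence(sequences, taxa[0], taxa)
created = get_sequence(sequences, "t2", taxa)  # no sequence yet: one is created
```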
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of RestrictionSitesCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for annotation_set of top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- coerce_values(values)¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
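The kind of conversion a fixed-alphabet override would perform can be sketched as follows. The State class and the ALPHABET table are hypothetical stand-ins for state identity objects and a state alphabet; the choice to raise TypeError for unconvertible values is an assumption made for this sketch.

```python
class State:
    """Hypothetical stand-in for a state identity object."""

    def __init__(self, symbol):
        self.symbol = symbol

    def __repr__(self):
        return "State(%r)" % self.symbol


# One shared State instance per symbol, as in a fixed state alphabet.
ALPHABET = {symbol: State(symbol) for symbol in "ACGT-?"}


def coerce_values(values):
    """Convert an iterable of symbols into a list of State objects."""
    coerced = []
    for value in values:
        if isinstance(value, State):
            coerced.append(value)  # already the right type
        elif isinstance(value, str) and value.upper() in ALPHABET:
            coerced.append(ALPHABET[value.upper()])
        else:
            raise TypeError("cannot convert %r to a state" % (value,))
    return coerced


states = coerce_values("TG-AA")
```

Because the alphabet holds one object per symbol, the two trailing "A" characters resolve to the same State instance.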
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotate_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix that is not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
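The difference between extend_sequences and extend_matrix can be sketched with plain dicts standing in for matrices. The two functions below are illustrative stand-ins that share the method names only for readability; they are not DendroPy's implementation and ignore the is_add_new_sequences argument.

```python
def extend_sequences(target, other):
    """Append other's characters to sequences of taxa already present in target."""
    for label, seq in other.items():
        if label in target:
            target[label].extend(seq)


def extend_matrix(target, other):
    """As extend_sequences, but also add sequences for taxa missing from target."""
    for label, seq in other.items():
        if label in target:
            target[label].extend(seq)
        else:
            target[label] = list(seq)


first = {"t1": list("AC"), "t2": list("GG")}
second = {"t1": list("TT"), "t3": list("CA")}

only_shared = {label: list(seq) for label, seq in first.items()}
extend_sequences(only_shared, second)  # t3 is ignored

with_new = {label: list(seq) for label, seq in first.items()}
extend_matrix(with_new, second)        # t3 is added

print(sorted(only_shared), sorted(with_new))
```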
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
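The padding behaviour described above can be sketched on a plain dict of lists. This is a minimal stand-in written for illustration, assuming that sequences already longer than size are left unchanged; it is not DendroPy's code.

```python
def fill(sequences, value, size=None, append=True):
    """Pad every sequence in place with `value` up to a common length."""
    if size is None:
        # Default to the length of the longest sequence.
        size = max((len(seq) for seq in sequences.values()), default=0)
    for seq in sequences.values():
        padding = [value] * max(0, size - len(seq))
        if append:
            seq.extend(padding)   # pad at the end
        else:
            seq[:0] = padding     # pad at the front


ragged = {"a": list("ACGT"), "b": list("AC"), "c": []}
fill(ragged, "-")

front_padded = {"a": list("A")}
fill(front_padded, "?", size=3, append=False)

print(ragged, front_padded)
```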
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
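To make the folding step concrete, here is a standalone sketch that counts sites by minor-allele count. It assumes biallelic sites coded as 0 (ancestral) and 1 (derived) and returns a vector indexed 0 to m with m = ceil(N/2), following the vector definition above. The function name folded_sfs is hypothetical, and the exact length and padding of DendroPy's own output may differ.

```python
import math


def folded_sfs(sites, num_samples):
    """Count sites by minor-allele count.

    `sites` is an iterable of columns, each a sequence of 0/1 allele codes
    with one entry per sampled chromosome.
    """
    m = math.ceil(num_samples / 2)
    spectrum = [0] * (m + 1)
    for column in sites:
        derived = sum(column)
        # Folding: the minor allele is whichever allele is rarer at the site.
        minor = min(derived, num_samples - derived)
        spectrum[minor] += 1
    return spectrum


sites = [
    [0, 0, 0, 1],  # one derived copy   -> minor count 1
    [1, 1, 1, 0],  # three derived      -> minor count 1
    [0, 1, 0, 1],  # two derived        -> minor count 2
    [0, 0, 0, 0],  # monomorphic        -> minor count 0
]
spectrum = folded_sfs(sites, num_samples=4)
print(spectrum)
```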
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The coerce_values() method will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to constructor of CharacterMatrix when creating new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with value given in taxon_namespace if this is not None, or a new TaxonNamespace object. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True: if unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
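The interaction between unify_taxa_by_label and label case sensitivity can be sketched without DendroPy. In the standalone illustration below, the Taxon class and the map_taxa helper are hypothetical stand-ins, and the namespace is just a list; the real method also rewires every member object, which is not shown.

```python
class Taxon:
    """Minimal stand-in for a taxon: just a label."""

    def __init__(self, label):
        self.label = label


def map_taxa(old_taxa, unify_by_label=True, case_sensitive=True):
    """Map each old Taxon to a Taxon in a fresh namespace.

    Returns (mapping, new_namespace); mapping is keyed by id(old_taxon).
    """
    new_namespace = []
    by_label = {}
    mapping = {}
    for taxon in old_taxa:
        key = taxon.label if case_sensitive else taxon.label.lower()
        if unify_by_label and key in by_label:
            # Reuse the Taxon already created for this label.
            mapping[id(taxon)] = by_label[key]
            continue
        replacement = Taxon(taxon.label)
        new_namespace.append(replacement)
        by_label.setdefault(key, replacement)
        mapping[id(taxon)] = replacement
    return mapping, new_namespace


old = [Taxon("Foo"), Taxon("Foo"), Taxon("FOO"), Taxon("FoO")]
_, unified_insensitive = map_taxa(old, unify_by_label=True, case_sensitive=False)
_, unified_sensitive = map_taxa(old, unify_by_label=True, case_sensitive=True)
_, not_unified = map_taxa(old, unify_by_label=False)
print(len(unified_insensitive), len(unified_sensitive), len(not_unified))
```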
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size long or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete character).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted to the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to the state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
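Remapping by symbol amounts to a lookup from each element's symbol into the target alphabet. The sketch below shows that lookup in isolation; the State class and remap_by_symbol function are hypothetical stand-ins for illustration and do not reflect how DendroPy stores state alphabets.

```python
class State:
    """Minimal stand-in for a state alphabet element identified by a symbol."""

    def __init__(self, symbol):
        self.symbol = symbol


def remap_by_symbol(sequence, target_alphabet):
    """Return a new list whose states come from `target_alphabet`.

    Raises KeyError if a symbol has no counterpart in the target alphabet.
    """
    by_symbol = {state.symbol: state for state in target_alphabet}
    return [by_symbol[state.symbol] for state in sequence]


old_alphabet = [State(s) for s in "ACGT"]
new_alphabet = [State(s) for s in "ACGT"]

sequence = [old_alphabet[0], old_alphabet[2], old_alphabet[2]]  # A, G, G
remapped = remap_by_symbol(sequence, new_alphabet)
print([state.symbol for state in remapped])
```

Every element of the result is an object from the target alphabet, so nothing in the remapped sequence still points into the old alphabet.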
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
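The contrast between remove_sequences and discard_sequences mirrors that between set.remove and set.discard in Python. A minimal sketch on a plain dict, with function names reused only for readability and labels standing in for Taxon objects:

```python
def remove_sequences(matrix, taxa):
    """Delete sequences; a missing taxon raises KeyError."""
    for taxon in taxa:
        del matrix[taxon]


def discard_sequences(matrix, taxa):
    """Delete sequences if present; missing taxa are silently skipped."""
    for taxon in taxa:
        matrix.pop(taxon, None)


matrix = {"t1": list("AC"), "t2": list("GT")}
discard_sequences(matrix, ["t2", "t9"])  # "t9" is absent: no error

try:
    remove_sequences(matrix, ["t9"])
    error_raised = False
except KeyError:
    error_raised = True

print(sorted(matrix), error_raised)
```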
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
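One generic way to obtain this kind of copy in Python is to pre-seed the memo dictionary of copy.deepcopy so that selected objects are returned as-is instead of being cloned. The sketch below shows the technique on stand-in classes; Taxon, Matrix, and scoped_copy are hypothetical, and whether DendroPy uses the memo argument in exactly this way is an assumption.

```python
import copy


class Taxon:
    def __init__(self, label):
        self.label = label


class Matrix:
    def __init__(self, taxa):
        self.taxon_namespace = taxa  # list of Taxon objects
        self.sequences = {}          # Taxon -> list of values


def scoped_copy(matrix):
    """Deep-copy `matrix` but keep the namespace and Taxon objects shared."""
    # deepcopy returns memo[id(x)] whenever it meets an object x already in memo.
    memo = {id(matrix.taxon_namespace): matrix.taxon_namespace}
    for taxon in matrix.taxon_namespace:
        memo[id(taxon)] = taxon
    return copy.deepcopy(matrix, memo)


t1, t2 = Taxon("t1"), Taxon("t2")
original = Matrix([t1, t2])
original.sequences[t1] = ["A", "C"]

clone = scoped_copy(original)
print(clone.taxon_namespace is original.taxon_namespace)
print(clone.sequences[t1] is original.sequences[t1])
```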
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
    T1 AGN
    T2 C-T
    T3 GC?
Return with gaps_as_missing==True:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
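The mapping rules in the example above can be reproduced with a small standalone function. The sketch assumes the index assignment used in the example (A=0, C=1, G=2, T=3, gap=4) and handles only the symbols that appear there; state_sets is a hypothetical helper, not a DendroPy function.

```python
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4  # index of the gap state when it is treated as fundamental


def state_sets(sequence, gaps_as_missing=True):
    """Translate DNA symbols into sets of fundamental state indices."""
    all_bases = set(BASES.values())
    result = []
    for symbol in sequence:
        if symbol in BASES:
            result.append({BASES[symbol]})
        elif symbol == "N":
            result.append(set(all_bases))
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        elif symbol == "?":
            result.append(set(all_bases) if gaps_as_missing else all_bases | {GAP_INDEX})
        else:
            raise ValueError("unsupported symbol: {!r}".format(symbol))
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
as_missing = {label: state_sets(seq, True) for label, seq in matrix.items()}
as_state = {label: state_sets(seq, False) for label, seq in matrix.items()}
print(as_missing["T2"], as_state["T2"], as_state["T3"])
```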
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
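Read together with replace_sequences and add_sequences, update_sequences behaves like the combination of the two. The following sketch expresses that relationship on plain dicts; the functions are illustrative stand-ins that borrow the method names and are not DendroPy's implementation.

```python
def replace_sequences(target, other):
    """Replace sequences of taxa found in both mappings with shallow copies."""
    for label in target:
        if label in other:
            target[label] = list(other[label])


def add_sequences(target, other):
    """Add shallow copies of sequences for taxa missing from target."""
    for label, seq in other.items():
        if label not in target:
            target[label] = list(seq)


def update_sequences(target, other):
    """Replace shared sequences, then add the new ones."""
    replace_sequences(target, other)
    add_sequences(target, other)


current = {"t1": list("AAAA"), "t2": list("CCCC")}
incoming = {"t2": list("GG"), "t3": list("TT")}
update_sequences(current, incoming)
print({label: "".join(seq) for label, seq in current.items()})
```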
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")
    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
InfiniteSitesCharacterMatrix : Infinite Sites Data¶
- class dendropy.datamodel.charmatrixmodel.InfiniteSitesCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for infinite sites data.
- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (integer) – Number of sequences in matrix.
- __setitem__(key, values)¶
Assigns sequence values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of
InfiniteSitesCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- coerce_values(values)¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
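A sketch of concatenation with the component parts recorded as column ranges is given below. It uses dicts of label to list, treats "same length" as meaning that the sequences within each component matrix are aligned to one width (an interpretation, not something the description states explicitly), and invents subset names of the form "part0", "part1" purely for illustration.

```python
def concatenate(matrices):
    """Join matrices column-wise and record each part as a range of columns."""
    labels = set(matrices[0])
    combined = {label: [] for label in matrices[0]}
    subsets = {}
    offset = 0
    for index, matrix in enumerate(matrices):
        if set(matrix) != labels:
            raise ValueError("all taxa must be present in every matrix")
        lengths = {len(seq) for seq in matrix.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within a matrix must have equal length")
        width = lengths.pop()
        for label, seq in matrix.items():
            combined[label].extend(seq)
        # Remember which columns of the result came from this component.
        subsets["part{}".format(index)] = range(offset, offset + width)
        offset += width
    return combined, subsets


gene1 = {"a": list("AC"), "b": list("GT")}
gene2 = {"a": list("TTT"), "b": list("CCC")}
combined, subsets = concatenate([gene1, gene2])
print({k: "".join(v) for k, v in combined.items()}, subsets)
```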
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is fo
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object ids to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If no reattribution mappings are desired at all, then an empty dictionary should be passed instead.
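The owner remapping described for attribute_object_mapper can be sketched with a stripped-down annotation class. Everything below (Annotation, Node, copy_annotations) is a hypothetical stand-in that keeps only the (obj, attr_name) tuple and the getattr(*self._value) lookup mentioned above; it is not DendroPy's Annotation class.

```python
class Annotation:
    """Stand-in for a dynamic bound-attribute annotation."""

    def __init__(self, name, owner, attr_name):
        self.name = name
        self._value = (owner, attr_name)

    @property
    def value(self):
        # Resolved on access, so it follows whatever object owns the attribute.
        return getattr(*self._value)


class Node:
    def __init__(self, label):
        self.label = label


def copy_annotations(annotations, other, new_owner, attribute_object_mapper=None):
    """Copy annotations, rebinding owners found in the mapper."""
    if attribute_object_mapper is None:
        # Default: anything bound to `other` is rebound to the new owner.
        attribute_object_mapper = {id(other): new_owner}
    copies = []
    for annotation in annotations:
        owner, attr_name = annotation._value
        owner = attribute_object_mapper.get(id(owner), owner)
        copies.append(Annotation(annotation.name, owner, attr_name))
    return copies


a, b = Node("a"), Node("b")
source = [Annotation("label", a, "label")]

rebound = copy_annotations(source, a, b)          # default mapping
unchanged = copy_annotations(source, a, b, {})    # empty dict: no remapping
print(rebound[0].value, unchanged[0].value)
```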
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to
otherin any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references toself. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level
depth.
- discard_sequences(taxa)¶
Removes sequences associated with
Taxoninstances specified intaxaif they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in
indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset,
character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in
selfwith characters associated with correspondingTaxonobjects inother_matrix.- Parameters:
other_matrix (
CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrixmust be of same type asself.other_matrixmust have the sameTaxonNamespaceasself.Each sequence associated with a
Taxonreference inother_matrixthat is also inselfwill be appended to the sequence currently associated with thatTaxonreference inself.All other sequences will be ignored.
- fill(value, size=None, append=True)¶
Pads out all sequences in
selfby addingvalueto each sequence until its length issizelong or equal to the length of the longest sequence ifsizeis not specified.- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a
StateIdentityfor discrete character).size (integer or None) – The size (length) up to which the sequences will be padded. If
None, then the maximum (longest) sequence size will be used.append (boolean) – If
True(default), then new values will be added to the end of each sequence. IfFalse, then new values will be inserted to the front of each sequence.
- fill_taxa()¶
Adds a new (empty) sequence for each
Taxoninstance in current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
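To make the definition concrete, here is a small stand-alone sketch of a folded spectrum computed from aligned strings. It is not DendroPy's algorithm: it assumes equal-length sequences, counts only monomorphic and biallelic sites, skips sites with more than two alleles, and returns the full vector $(f_0, \ldots, f_m)$ with $m = \lceil N/2 \rceil$. The function name folded_sfs is hypothetical.

```python
from collections import Counter
from math import ceil


def folded_sfs(sequences):
    """Return [f_0, ..., f_m] for equal-length sequences, m = ceil(N / 2)."""
    n = len(sequences)
    spectrum = [0] * (ceil(n / 2) + 1)
    for column in zip(*sequences):          # iterate over sites (columns)
        counts = Counter(column)
        if len(counts) == 1:
            spectrum[0] += 1                # monomorphic site
        elif len(counts) == 2:
            # Minor allele count is the smaller of the two allele counts.
            spectrum[min(counts.values())] += 1
        # Sites with more than two alleles are skipped in this sketch.
    return spectrum


sfs = folded_sfs(["AAT", "ACT", "ACT", "ACA"])
```

For the four sequences above, the first column is monomorphic and the other two columns each carry a minor allele in exactly one sequence.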
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating Taxon objects and sequences as needed.
Keys must be strings representing labels of Taxon objects, or Taxon objects directly. If a key is specified as a string, then it will be dereferenced to the first existing Taxon object in the current taxon namespace with the same label. If no such Taxon object can be found, then a new Taxon object is created and added to the current namespace. If a key is specified as a Taxon object, then this is used directly. If it is not in the current taxon namespace, it will be added.
Values are the sequences (more generally, iterables of values). If values are of type CharacterDataSequence, then they are added as-is. Otherwise, CharacterDataSequence instances are created for them. Values may be coerced into types compatible with particular matrices. The classmethod coerce_values() will be called for this.
Examples
The following creates a DnaCharacterMatrix instance with three sequences:
    d = {
        "s1" : "TCCAA",
        "s2" : "TGCAA",
        "s3" : "TG-AA",
    }
    dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    dna1 = dendropy.DnaCharacterMatrix.get(
            file=open("pythonidae.fasta"),
            schema="fasta")
    dna2 = dendropy.DnaCharacterMatrix.get(
            url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
            schema="nexus")
    aa1 = dendropy.ProteinCharacterMatrix.get(
            file=open("pythonidae.dat"),
            schema="phylip")
    std1 = dendropy.StandardCharacterMatrix.get(
            path="python_morph.nex",
            schema="nexus")
    std2 = dendropy.StandardCharacterMatrix.get(
            data=">t1\n01011\n\n>t2\n11100",
            schema="fasta")
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
The current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False, references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
    # Get handle to the new TaxonNamespace
    other_taxon_namespace = some_other_data.taxon_namespace
    # Get a taxon-namespace scoped copy of a tree
    # in another namespace
    t2 = Tree(t1)
    # Replace taxon namespace of copy
    t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
    t2 = Tree(t1)
    t2.migrate_taxon_namespace(TaxonNamespace())
    # Note: the same effect can be achieved by:
    t3 = copy.deepcopy(t1)
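The interaction between unify_taxa_by_label and case sensitivity can be modeled with plain labels. The sketch below is an illustration only and does not use DendroPy: the function name map_taxa and its return value (the list of new labels plus, for each old taxon, the index of its counterpart in the new namespace) are assumptions.

```python
def map_taxa(old_labels, unify_taxa_by_label=True, is_case_sensitive=False):
    """Return (new_labels, mapping) for a list of old taxon labels."""
    new_labels = []
    seen = {}      # normalized label -> index in new_labels
    mapping = []   # mapping[i] is the new index for old taxon i
    for label in old_labels:
        key = label if is_case_sensitive else label.lower()
        if unify_taxa_by_label and key in seen:
            mapping.append(seen[key])       # reuse the existing new taxon
            continue
        new_labels.append(label)            # create a new taxon
        index = len(new_labels) - 1
        seen.setdefault(key, index)
        mapping.append(index)
    return new_labels, mapping


labels = ["Foo", "Foo", "FOO", "FoO"]
unified = map_taxa(labels)
case_sensitive = map_taxa(labels, is_case_sensitive=True)
distinct = map_taxa(labels, unify_taxa_by_label=False)
```

With case-insensitive unification all four old taxa collapse to one new taxon; with case-sensitive unification only the two identical ‘Foo’ labels are merged.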
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- poll_taxa(taxa=None)¶
Returns a set populated with all Taxon instances associated with self.
- Parameters:
taxa (set) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Removes all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
    T1 AGN
    T2 C-T
    T3 GC?
Return with gaps_as_missing==True:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
    }
Return with gaps_as_missing==False:
    {
        <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
        <T2> : [ set([1]), set([4]), set([3]) ],
        <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
    }
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
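The gap handling described above can be reproduced with a few lines of plain Python. This sketch hard-codes a DNA alphabet with the state indices used in the example (A=0, C=1, G=2, T=3, gap=4) and handles only the symbols shown there; it is a model of the documented behavior, not the library's implementation, and the name state_sets is hypothetical.

```python
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4


def state_sets(sequence, gaps_as_missing=True):
    """Map each symbol of a DNA string to a set of fundamental state indices."""
    all_bases = set(BASES.values())
    result = []
    for symbol in sequence:
        if symbol in BASES:
            result.append({BASES[symbol]})
        elif symbol == "N":
            result.append(set(all_bases))          # any base, never the gap
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        elif symbol == "?":
            # Missing data: any base, plus the gap when gaps are a state.
            result.append(set(all_bases) if gaps_as_missing
                          else all_bases | {GAP_INDEX})
        else:
            raise ValueError("unsupported symbol: %r" % symbol)
    return result


matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
as_missing = {t: state_sets(s, True) for t, s in matrix.items()}
as_state = {t: state_sets(s, False) for t, s in matrix.items()}
```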
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
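The three related operations (add_sequences, replace_sequences, update_sequences) differ only in which taxa they touch. A plain-Python model makes the distinction explicit. As an assumption for illustration, matrices are dicts from taxon label to list, and a shallow copy of a sequence is modeled with list(); none of this is DendroPy's own code.

```python
def add_sequences(self_matrix, other_matrix):
    """Add sequences only for taxa missing from self_matrix."""
    for taxon, seq in other_matrix.items():
        if taxon not in self_matrix:
            self_matrix[taxon] = list(seq)


def replace_sequences(self_matrix, other_matrix):
    """Replace sequences only for taxa present in both matrices."""
    for taxon, seq in other_matrix.items():
        if taxon in self_matrix:
            self_matrix[taxon] = list(seq)


def update_sequences(self_matrix, other_matrix):
    """Replace shared sequences and add new ones (like dict.update)."""
    for taxon, seq in other_matrix.items():
        self_matrix[taxon] = list(seq)


def base():
    return {"s1": list("AC"), "s2": list("GT")}


other = {"s1": list("TT"), "s3": list("CC")}

added, replaced, updated = base(), base(), base()
add_sequences(added, other)
replace_sequences(replaced, other)
update_sequences(updated, other)
```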
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")
    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.
StandardCharacterMatrix: “Standard” Data¶
- class dendropy.datamodel.charmatrixmodel.StandardCharacterMatrix(*args, **kwargs)[source]¶
Specializes CharacterMatrix for “standard” data (i.e., generic discrete character data).
A default state alphabet consisting of state symbols of 0-9 will automatically be created unless default_state_alphabet=None is passed in. To specify a different default state alphabet:
    default_state_alphabet=dendropy.new_standard_state_alphabet("abc")
    default_state_alphabet=dendropy.new_standard_state_alphabet("ij")
- __delitem__(key)¶
Removes sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- __getitem__(key)¶
Retrieves sequence for key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- Returns:
s (CharacterDataSequence) – A sequence associated with the Taxon instance referenced by key.
- __iter__()¶
Returns an iterator over character map’s ordered keys.
- __len__()¶
Number of sequences in matrix.
- Returns:
n (Number of sequences in matrix.)
- __setitem__(key, values)¶
Assigns sequence values to taxon specified by key, which can be an index or a label of a Taxon instance in the current taxon namespace, or a Taxon instance directly.
If no sequence is currently associated with the specified Taxon, a new one will be created. Note that the Taxon object must have already been defined in the current taxon namespace.
- Parameters:
key (integer, string, or Taxon) – If an integer, assumed to be an index of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. If a string, assumed to be a label of a Taxon object in the current TaxonNamespace object of self.taxon_namespace. Otherwise, assumed to be a Taxon instance directly. In all cases, the Taxon object must be (already) defined in the current taxon namespace.
- add_character_subset(char_subset)¶
Adds a CharacterSubset object. Raises an error if one already exists with the same label.
- add_sequences(other_matrix)¶
Adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to add sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self as a shallow-copy.
All other sequences will be ignored.
- as_string(schema, **kwargs)¶
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
- character_sequence_type¶
alias of StandardCharacterDataSequence
- clear()¶
Removes all sequences from matrix.
- clone(depth=1)¶
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
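The three copy depths can be approximated with the standard copy module. The sketch below uses toy Taxon and Matrix classes (hypothetical stand-ins, not the DendroPy classes) and shows one possible way to get a taxon-namespace-scoped copy: pre-seeding the deepcopy memo so that the namespace and its taxa are returned as-is. How DendroPy implements this internally is not shown in this reference, and the special treatment of annotations at depth 0 is not modeled here.

```python
import copy


class Taxon:
    def __init__(self, label):
        self.label = label


class Matrix:
    def __init__(self, taxon_namespace):
        self.taxon_namespace = taxon_namespace   # list of Taxon objects
        self.sequences = {}                      # Taxon -> list of values


def taxon_scoped_copy(matrix):
    """Deep-copy everything except the namespace and its Taxon objects."""
    memo = {id(matrix.taxon_namespace): matrix.taxon_namespace}
    for taxon in matrix.taxon_namespace:
        memo[id(taxon)] = taxon   # deepcopy returns memoized objects unchanged
    return copy.deepcopy(matrix, memo)


m = Matrix([Taxon("s1"), Taxon("s2")])
t1 = m.taxon_namespace[0]
m.sequences[t1] = list("ACGT")

shallow = copy.copy(m)          # depth 0: members are shared references
scoped = taxon_scoped_copy(m)   # depth 1: only taxa and namespace are shared
deep = copy.deepcopy(m)         # depth 2: everything is cloned
```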
- coerce_values(values)[source]¶
Converts elements of values to type of matrix.
This method is called by CharacterMatrix.from_dict to create sequences from iterables of values. This method should be overridden by derived classes to ensure that values consists of types compatible with the particular type of matrix. For example, a CharacterMatrix type with a fixed state alphabet (such as DnaCharacterMatrix) would dereference the string elements of values to return a list of StateIdentity objects corresponding to the symbols represented by the strings. If there is no value-type conversion done, then values should be returned as-is. If no value-type conversion is possible (e.g., when the type of a value is dependent on positional information), then a TypeError should be raised.
- Parameters:
values (iterable) – Iterable of values to be converted.
- Returns:
v (list of values.)
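For a matrix type with a fixed alphabet, the dereferencing step described above amounts to a symbol lookup. The following sketch uses a toy StateIdentity class and a hand-built symbol table (both hypothetical stand-ins for the real DendroPy objects) and raises ValueError for unknown symbols, which is an assumption of this example rather than documented behavior.

```python
class StateIdentity:
    """Toy stand-in for a state object identified by its symbol."""

    def __init__(self, symbol):
        self.symbol = symbol


# One shared state object per symbol, as in a fixed state alphabet.
DNA_STATES = {symbol: StateIdentity(symbol) for symbol in "ACGTN-?"}


def coerce_values(values):
    """Convert an iterable of symbols into a list of state objects."""
    coerced = []
    for value in values:
        if isinstance(value, StateIdentity):
            coerced.append(value)            # already the right type
        elif value in DNA_STATES:
            coerced.append(DNA_STATES[value])
        else:
            raise ValueError("no state for symbol %r" % (value,))
    return coerced


states = coerce_values("TG-AA")
```

Because the lookup returns shared state objects, the two trailing "A" symbols resolve to the same object.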
- classmethod concatenate(char_matrices)¶
Creates and returns a single character matrix from multiple CharacterMatrix objects specified as a list, ‘char_matrices’. All the CharacterMatrix objects in the list must be of the same type, and share the same TaxonNamespace reference. All taxa must be present in all alignments, and all alignments must be of the same length. Component parts will be recorded as character subsets.
- classmethod concatenate_from_paths(paths, schema, **kwargs)¶
Read a character matrix from each file path given in paths, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- classmethod concatenate_from_streams(streams, schema, **kwargs)¶
Read a character matrix from each file object given in streams, assuming data format/schema schema, and passing any keyword arguments down to the underlying specialized reader. Merge the character matrices and return the combined character matrix. Component parts will be recorded as character subsets.
- copy_annotations_from(other, attribute_object_mapper=None)¶
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is other) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
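The remapping of dynamic bound-attribute annotations can be sketched with a toy Annotation class. Everything below is a simplified model written for illustration (the class, the is_attribute flag, the Node class, and the copy_annotations helper are assumptions, not DendroPy's actual classes or signatures); it only demonstrates how an id-keyed mapper redirects the “(obj, attr_name)” tuple to a new owner.

```python
class Annotation:
    """Toy annotation: a plain value, or a bound (obj, attr_name) tuple."""

    def __init__(self, name, value, is_attribute=False):
        self.name = name
        self._value = value
        self.is_attribute = is_attribute

    @property
    def value(self):
        if self.is_attribute:
            return getattr(*self._value)   # i.e. getattr(obj, attr_name)
        return self._value


def copy_annotations(annotations, other, new_owner,
                     attribute_object_mapper=None):
    """Copy annotations, re-binding attribute owners via an id-keyed dict."""
    if attribute_object_mapper is None:
        # Default: references to `other` are remapped to the new owner.
        attribute_object_mapper = {id(other): new_owner}
    copies = []
    for ann in annotations:
        value = ann._value
        if ann.is_attribute:
            obj, attr_name = value
            obj = attribute_object_mapper.get(id(obj), obj)
            value = (obj, attr_name)
        copies.append(Annotation(ann.name, value, ann.is_attribute))
    return copies


class Node:
    def __init__(self, label):
        self.label = label


src, dest = Node("original"), Node("copy")
anns = [Annotation("label", (src, "label"), is_attribute=True)]
remapped = copy_annotations(anns, src, dest)
unmapped = copy_annotations(anns, src, dest, attribute_object_mapper={})
```

With the default mapper the copied annotation reads its value from the new owner; with an empty dictionary it keeps reading from the original object.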
- deep_copy_annotations_from(other, memo=None)¶
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- description(depth=1, indent=0, itemize='', output=None)¶
Returns description of object, up to level depth.
- discard_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa if they exist.
- export_character_indices(indices)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the 0-based indices in indices. Note that this new matrix will still reference the same taxon set.
- export_character_subset(character_subset)¶
Returns a new CharacterMatrix (of the same type) consisting only of columns given by the CharacterSubset, character_subset. Note that this new matrix will still reference the same taxon set.
- extend_matrix(other_matrix)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
Each sequence associated with a Taxon reference in other_matrix that is not in self will be added to self.
- extend_sequences(other_matrix, is_add_new_sequences=False)¶
Extends sequences in self with characters associated with corresponding Taxon objects in other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to extend sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix that is also in self will be appended to the sequence currently associated with that Taxon reference in self.
All other sequences will be ignored.
- fill(value, size=None, append=True)¶
Pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
- fill_taxa()¶
Adds a new (empty) sequence for each Taxon instance in the current taxon namespace that does not have a sequence.
- folded_site_frequency_spectrum(is_pad_vector_to_unfolded_length=False)¶
Returns the folded or minor site/allele frequency spectrum.
Given $N$ chromosomes, the site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_N)$, where the value $f_i$ is the number of sites where $i$ derived alleles are segregating in the sample: 0 alleles, 1 allele, 2 alleles, etc.
The folded site frequency spectrum is a vector $(f_0, f_1, f_2, \ldots, f_m)$, $m = \lceil \frac{N}{2} \rceil$, where the value $f_i$ is the number of sites at which the minor allele occurs $i$ times.
- Parameters:
is_pad_vector_to_unfolded_length (bool) – If False (the default), then the vector length will be $\lceil \frac{N}{2} \rceil$, where $N$ is the number of taxa. If True, the length of the vector will be the number of taxa + 1, with the first element the number of monomorphic sites not contributing to the site frequency spectrum.
- Returns:
v (list[int]) – A vector of integers representing the folded site frequency spectrum.
- classmethod from_dict(source_dict, char_matrix=None, case_sensitive_taxon_labels=False, **kwargs)¶
Populates character matrix from dictionary (or similar mapping type), creating
Taxonobjects and sequences as needed.Keys must be strings representing labels
Taxonobjects orTaxonobjects directly. If key is specified as string, then it will be dereferenced to the first existingTaxonobject in the current taxon namespace with the same label. If no suchTaxonobject can be found, then a newTaxonobject is created and added to the current namespace. If a key is specified as aTaxonobject, then this is used directly. If it is not in the current taxon namespace, it will be added.Values are the sequences (more generally, iterable of values). If values are of type
CharacterDataSequence, then they are added as-is. OtherwiseCharacterDataSequenceinstances are created for them. Values may be coerced into types compatible with particular matrices. The classmethodcoerce_values()will be called for this.Examples
The following creates a
DnaCharacterMatrixinstance with three sequences:d = { "s1" : "TCCAA", "s2" : "TGCAA", "s3" : "TG-AA", } dna = DnaCharacterMatrix.from_dict(d)
Three Taxon objects will be created, corresponding to the labels ‘s1’, ‘s2’, ‘s3’. Each associated string sequence will be converted to a CharacterDataSequence, with each symbol (“A”, “C”, etc.) being replaced by the DNA state represented by the symbol.
- Parameters:
source_dict (dict or other mapping type) – Keys must be strings representing labels of Taxon objects, or Taxon objects directly. Values are sequences. See above for details.
char_matrix (CharacterMatrix) – Instance of CharacterMatrix to populate with data. If not specified, a new one will be created using keyword arguments specified by kwargs.
case_sensitive_taxon_labels (boolean) – If True, string labels specified as keys in source_dict will be matched to Taxon objects in the current taxon namespace with case being respected. If False, then case will be ignored.
**kwargs (keyword arguments, optional) – Keyword arguments to be passed to the constructor of CharacterMatrix when creating a new instance to populate, if no target instance is provided via char_matrix.
- Returns:
char_matrix (CharacterMatrix) – CharacterMatrix populated by data from source_dict.
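The key-resolution rule described for string keys (use the first existing label that matches, otherwise create a new one, with optional case sensitivity) can be sketched as follows. This is only an illustration under stated assumptions: a plain list of label strings stands in for the taxon namespace, and `resolve_key` is a hypothetical helper, not part of DendroPy.

```python
def resolve_key(label, namespace_labels, case_sensitive=False):
    """Return the first existing label matching `label`, adding it if absent."""
    def normalize(s):
        return s if case_sensitive else s.lower()

    for existing in namespace_labels:
        if normalize(existing) == normalize(label):
            return existing            # reuse the first match
    namespace_labels.append(label)     # no match: create a new entry
    return label

labels = ["s1", "S2"]
print(resolve_key("s2", labels))                       # S2
print(resolve_key("s2", labels, case_sensitive=True))  # s2
print(labels)                                          # ['s1', 'S2', 's2']
```

With case ignored, "s2" resolves to the existing "S2"; with case respected, a new entry is added instead.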
- classmethod get(**kwargs)¶
Instantiate and return a new character matrix object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “fasta”, “nexus”, “nexml”, “phylip”, etc. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
label (str) – Name or identifier to be assigned to the new object; if not given, will be assigned the one specified in the data source, or None otherwise.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
matrix_offset (int) – 0-based index of character block or matrix in source to be parsed. If not specified then the first matrix (offset = 0) is assumed.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
import dendropy

dna1 = dendropy.DnaCharacterMatrix.get(
        file=open("pythonidae.fasta"),
        schema="fasta")
dna2 = dendropy.DnaCharacterMatrix.get(
        url="http://purl.org/phylo/treebase/phylows/matrix/TB2:M2610?format=nexus",
        schema="nexus")
aa1 = dendropy.ProteinCharacterMatrix.get(
        file=open("pythonidae.dat"),
        schema="phylip")
std1 = dendropy.StandardCharacterMatrix.get(
        path="python_morph.nex",
        schema="nexus")
std2 = dendropy.StandardCharacterMatrix.get(
        data=">t1\n01011\n\n>t2\n11100",
        schema="fasta")
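The "exactly one of file, path, url, data" requirement can be pictured with a small validation sketch. The helper `pick_source` and the choice of `TypeError` are assumptions for illustration only; the documentation above does not say how DendroPy itself reports a violation.

```python
SOURCE_KEYS = ("file", "path", "url", "data")

def pick_source(**kwargs):
    """Return the (name, value) pair of the single source argument given."""
    given = [(k, kwargs[k]) for k in SOURCE_KEYS if kwargs.get(k) is not None]
    if len(given) != 1:
        # Assumption: TypeError is used here purely for illustration.
        raise TypeError(
            "exactly one of %s must be specified, got %d"
            % (", ".join(SOURCE_KEYS), len(given)))
    return given[0]

print(pick_source(path="python_morph.nex", schema="nexus"))
# ('path', 'python_morph.nex')
```

Other keyword arguments such as schema pass through untouched; only the four source keys are counted.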
- classmethod get_from_path(src, schema, **kwargs)¶
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)¶
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)¶
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)¶
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- items()¶
Returns character map key, value pairs in key-order.
- property max_sequence_size¶
Maximum number of characters across all sequences in matrix.
- Returns:
n (integer) – Maximum number of characters across all sequences in matrix.
- migrate_taxon_namespace(taxon_namespace, unify_taxa_by_label=True, taxon_mapping_memo=None)¶
Move this object and all members to a new operational taxonomic unit concept namespace scope.
Current self.taxon_namespace value will be replaced with the value given in taxon_namespace if this is not None, or with a new TaxonNamespace object otherwise. Following this, reconstruct_taxon_namespace() will be called: each distinct Taxon object associated with self or members of self that is not already in taxon_namespace will be replaced with a new Taxon object that will be created with the same label and added to self.taxon_namespace. Calling this method results in the object (and all its member objects) being associated with a new, independent taxon namespace.
Label mapping case sensitivity follows the self.taxon_namespace.is_case_sensitive setting. If False and unify_taxa_by_label is also True, then the establishment of correspondence between Taxon objects in the old and new namespaces will be based on case-insensitive matching of labels. E.g., if there are four Taxon objects with labels ‘Foo’, ‘Foo’, ‘FOO’, and ‘FoO’ in the old namespace, then all objects that reference these will reference a single new Taxon object in the new namespace (with a label that is some existing casing variant of ‘foo’). If True and unify_taxa_by_label is True, Taxon objects with labels identical except in case will be considered distinct.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace into the scope of which this object will be moved.
unify_taxa_by_label (boolean, optional) – If True, then references to distinct Taxon objects with identical labels in the current namespace will be replaced with a reference to a single Taxon object in the new namespace. If False: references to distinct Taxon objects will remain distinct, even if the labels are the same.
taxon_mapping_memo (dictionary) – Similar to memo of deepcopy, this is a dictionary that maps Taxon objects in the old namespace to corresponding Taxon objects in the new namespace. Mostly for internal use when migrating complex data to a new namespace. Note that any mappings here take precedence over all other options: if a Taxon object in the old namespace is found in this dictionary, the counterpart in the new namespace will be whatever value is mapped, regardless of, e.g., label values.
Examples
Use this method to move an object from one taxon namespace to another.
For example, to get a copy of an object associated with another taxon namespace and associate it with a different namespace:
# Get handle to the new TaxonNamespace
other_taxon_namespace = some_other_data.taxon_namespace

# Get a taxon-namespace scoped copy of a tree
# in another namespace
t2 = Tree(t1)

# Replace taxon namespace of copy
t2.migrate_taxon_namespace(other_taxon_namespace)
You can also use this method to get a copy of a structure and then move it to a new namespace:
t2 = Tree(t1)
t2.migrate_taxon_namespace(TaxonNamespace())

# Note: the same effect can be achieved by:
t3 = copy.deepcopy(t1)
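The label-unification behaviour described above (the ‘Foo’, ‘Foo’, ‘FOO’, ‘FoO’ case) can be sketched in plain Python. This is not DendroPy code: `unify_labels` is a hypothetical helper, labels stand in for Taxon objects, and choosing the first-seen casing as the representative is an assumption, since the text only promises "some existing casing variant".

```python
def unify_labels(old_labels, case_sensitive=False):
    """Pair each old label with its representative in the new namespace."""
    representatives = {}   # normalized label -> first label seen
    mapping = []
    for label in old_labels:
        key = label if case_sensitive else label.lower()
        representatives.setdefault(key, label)
        mapping.append((label, representatives[key]))
    return mapping

old = ["Foo", "Foo", "FOO", "FoO"]
print(unify_labels(old))
# [('Foo', 'Foo'), ('Foo', 'Foo'), ('FOO', 'Foo'), ('FoO', 'Foo')]
print(unify_labels(old, case_sensitive=True))
# [('Foo', 'Foo'), ('Foo', 'Foo'), ('FOO', 'FOO'), ('FoO', 'FoO')]
```

Case-insensitive matching collapses all four labels onto one target, while case-sensitive matching leaves three distinct targets.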
- new_character_subset(label, character_indices)¶
Defines a set of characters (columns) that make up a character set. Raises an error if one already exists with the same label. Column indices are 0-based.
- new_sequence(taxon, values=None)¶
Creates a new CharacterDataSequence associated with Taxon taxon, and populates it with values in values.
- Parameters:
- Returns:
s (CharacterDataSequence) – A new CharacterDataSequence associated with Taxon taxon.
- pack(value=None, size=None, append=True)¶
Adds missing sequences for all Taxon instances in the current namespace, and then pads out all sequences in self by adding value to each sequence until its length is size, or equal to the length of the longest sequence if size is not specified. A combination of CharacterMatrix.fill_taxa and CharacterMatrix.fill.
- Parameters:
value (object) – A valid value (e.g., a numeric value for continuous characters, or a StateIdentity for discrete characters).
size (integer or None) – The size (length) up to which the sequences will be padded. If None, then the maximum (longest) sequence size will be used.
append (boolean) – If True (default), then new values will be added to the end of each sequence. If False, then new values will be inserted at the front of each sequence.
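The two steps described for pack (add missing sequences, then pad to a common length) can be sketched on a plain dictionary of lists. This is an illustration only: the stand-alone function `pack` below and its `taxa` argument are hypothetical, and a dict keyed by label stands in for the matrix.

```python
def pack(matrix, taxa, value, size=None, append=True):
    """Pad a dict of label -> list in place so every taxon has `size` values."""
    for taxon in taxa:
        matrix.setdefault(taxon, [])        # add missing sequences
    if size is None:
        size = max((len(seq) for seq in matrix.values()), default=0)
    for seq in matrix.values():
        padding = [value] * max(0, size - len(seq))
        if append:
            seq.extend(padding)             # pad at the end
        else:
            seq[:0] = padding               # pad at the front
    return matrix

m = {"s1": list("ACGT"), "s2": list("AC")}
pack(m, ["s1", "s2", "s3"], "-")
print({k: "".join(v) for k, v in m.items()})
# {'s1': 'ACGT', 's2': 'AC--', 's3': '----'}
```

Passing append=False would instead give '--AC' for s2, matching the front-insertion behaviour described for that flag.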
- poll_taxa(taxa=None)¶
Returns a set populated with all of the Taxon instances associated with self.
- Parameters:
taxa (set()) – Set to populate. If not specified, a new one will be created.
- Returns:
taxa (set[Taxon]) – Set of taxa associated with self.
- purge_taxon_namespace()¶
Remove all Taxon instances in self.taxon_namespace that are not associated with self or any item in self.
- reconstruct_taxon_namespace(unify_taxa_by_label=True, taxon_mapping_memo=None)¶
- reindex_taxa(taxon_namespace=None, clear=False)¶
DEPRECATED: Use migrate_taxon_namespace() instead. Rebuilds taxon_namespace from scratch, or assigns Taxon objects from given TaxonNamespace object taxon_namespace based on label values.
- remap_to_default_state_alphabet_by_symbol(purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to the default state alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in the default state alphabet that has the same symbol. Raises ValueError if no matching symbol can be found.
- remap_to_state_alphabet_by_symbol(state_alphabet, purge_other_state_alphabets=True)¶
All entities with any reference to a state alphabet will have the reference reassigned to state alphabet state_alphabet, and all entities with any reference to a state alphabet element will have the reference reassigned to any state alphabet element in state_alphabet that has the same symbol. Raises KeyError if no matching symbol can be found.
- remove_sequences(taxa)¶
Removes sequences associated with Taxon instances specified in taxa. A KeyError is raised if a Taxon instance is specified for which there is no associated sequence.
- replace_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to replace sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
All other sequences will be ignored.
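The replacement rule in these notes can be sketched with dictionaries keyed by label standing in for matrices. The stand-alone function `replace_sequences` below is a hypothetical illustration, not DendroPy's method, and it skips the type and namespace checks mentioned above.

```python
def replace_sequences(target, other):
    """Overwrite sequences in `target` for labels that `other` also has."""
    for label in target:
        if label in other:
            target[label] = list(other[label])   # shallow copy
    return target

a = {"t1": list("AAAA"), "t2": list("CCCC")}
b = {"t2": list("GGGG"), "t3": list("TTTT")}
replace_sequences(a, b)
print(sorted(a))            # ['t1', 't2']
print("".join(a["t2"]))     # GGGG
```

Only the shared label t2 changes. The sequence for t3 is ignored because it has no counterpart in the target.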
- property sequence_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- sequences()¶
List of all sequences in self.
- Returns:
s (list of CharacterDataSequence objects in self)
- taxon_namespace_scoped_copy(memo=None)¶
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- taxon_state_sets_map(char_indices=None, gaps_as_missing=True, gap_state=None, no_data_state=None)¶
Returns a dictionary that maps taxon objects to lists of sets of fundamental state indices.
- Parameters:
char_indices (iterable of ints) – An iterable of indexes of characters to include (by column). If not given or None [default], then all characters are included.
gaps_as_missing (boolean) – If True [default] then gap characters will be treated as missing data values. If False, then they will be treated as an additional (fundamental) state.
- Returns:
d (dict) – A dictionary with Taxon objects as keys and a list of sets of fundamental state indexes as values.
E.g., given the following matrix of DNA characters:
T1 AGN
T2 C-T
T3 GC?
Return with gaps_as_missing==True
{
    <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
    <T2> : [ set([1]), set([0,1,2,3]), set([3]) ],
    <T3> : [ set([2]), set([1]), set([0,1,2,3]) ],
}
Return with gaps_as_missing==False
{
    <T1> : [ set([0]), set([2]), set([0,1,2,3]) ],
    <T2> : [ set([1]), set([4]), set([3]) ],
    <T3> : [ set([2]), set([1]), set([0,1,2,3,4]) ],
}
Note that when gaps are treated as a fundamental state, not only does ‘-’ map to a distinct and unique state (4), but ‘?’ (missing data) maps to a set consisting of all bases and the gap state, whereas ‘N’ maps to a set of all bases but not including the gap state.
When gaps are treated as missing, on the other hand, then ‘?’ and ‘N’ and ‘-’ all map to the same set, i.e. of all the bases.
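The mapping in the example above can be reproduced with a short stand-alone sketch. It is not DendroPy's implementation: `state_sets` is a hypothetical helper, the index assignments (A=0, C=1, G=2, T=3, gap=4) are taken from the example, and only the symbols A, C, G, T, N, ‘-’ and ‘?’ are handled (other ambiguity codes are left out as a simplifying assumption).

```python
BASES = {"A": 0, "C": 1, "G": 2, "T": 3}
GAP_INDEX = 4

def state_sets(seq, gaps_as_missing=True):
    """Map each DNA symbol to a set of fundamental state indices."""
    all_bases = set(BASES.values())
    result = []
    for symbol in seq:
        if symbol in BASES:
            result.append({BASES[symbol]})
        elif symbol == "N":
            result.append(set(all_bases))          # any base, never the gap
        elif symbol == "-":
            result.append(set(all_bases) if gaps_as_missing else {GAP_INDEX})
        elif symbol == "?":
            result.append(set(all_bases) if gaps_as_missing
                          else all_bases | {GAP_INDEX})
        else:
            raise KeyError(symbol)                 # unsupported in this sketch
    return result

matrix = {"T1": "AGN", "T2": "C-T", "T3": "GC?"}
for label, seq in matrix.items():
    print(label, state_sets(seq, gaps_as_missing=False))
```

Running it with gaps_as_missing=False shows ‘-’ as the single state 4 and ‘?’ as all five states, while ‘N’ stays limited to the four bases.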
- update_sequences(other_matrix)¶
Replaces sequences for Taxon objects shared between self and other_matrix and adds sequences for Taxon objects that are in other_matrix but not in self.
- Parameters:
other_matrix (CharacterMatrix) – Matrix from which to update sequences.
Notes
other_matrix must be of same type as self.
other_matrix must have the same TaxonNamespace as self.
Each sequence associated with a Taxon reference in other_matrix but not in self will be added to self.
Each sequence in self associated with a Taxon that is also represented in other_matrix will be replaced with a shallow-copy of the corresponding sequence from other_matrix.
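As a rough picture of the update rule, the sketch below uses dictionaries keyed by label in place of matrices. The stand-alone function `update_sequences` is hypothetical and omits the type and namespace checks listed in the notes.

```python
def update_sequences(target, other):
    """Copy every sequence from `other` into `target`, adding or replacing."""
    for label, seq in other.items():
        target[label] = list(seq)    # shallow copy of each sequence
    return target

a = {"t1": list("AAAA"), "t2": list("CCCC")}
b = {"t2": list("GGGG"), "t3": list("TTTT")}
update_sequences(a, b)
print({k: "".join(v) for k, v in a.items()})
# {'t1': 'AAAA', 't2': 'GGGG', 't3': 'TTTT'}
```

The shared label t2 is replaced and the new label t3 is added, while t1 is left as it was.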
- update_taxon_namespace()¶
All Taxon objects in self that are not in self.taxon_namespace will be added.
- values()¶
Iterates values (i.e. sequences) in this matrix.
- property vector_size¶
Number of characters in first sequence in matrix.
- Returns:
n (integer) – Number of characters in first sequence in matrix.
- write(**kwargs)¶
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples
# Using a file path:
d.write(path="path/to/file.dat", schema="nexus")

# Using an open file:
with open("path/to/file.dat", "w") as f:
    d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)¶
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)¶
Writes to file-like object dest.


