dendropy.datamodel.datasetmodel: Datasets – Aggregate Collections of Taxon, Character, and Tree Data
- class dendropy.datamodel.datasetmodel.DataSet(*args, **kwargs)
A phylogenetic data object that coordinates collections of TaxonNamespace, TreeList, and (various kinds of) CharacterMatrix objects.
A DataSet has three attributes:
taxon_namespaces
A list of TaxonNamespace objects, each representing a distinct namespace for operational taxonomic unit concept definitions.
tree_lists
A list of TreeList objects, each representing a collection of Tree objects.
char_matrices
A list of CharacterMatrix-derived objects (e.g. DnaCharacterMatrix).
Multiple TaxonNamespace objects within a DataSet are allowed so as to support reading/loading of data from external sources that have multiple independent taxon namespaces defined within the same source or document (e.g., a Mesquite file with multiple taxa blocks, or a NeXML file with multiple OTU sections). Ideally, however, this would not be how data is managed. Recommended idiomatic usage would be to use a DataSet to manage multiple types of data that all share and reference the same, single taxon namespace.
This convention can be enforced by setting the DataSet instance to “attached taxon namespace” mode:
    import dendropy
    ds = dendropy.DataSet()
    tns = dendropy.TaxonNamespace()
    ds.attach_taxon_namespace(tns)
After setting this mode, all subsequent data read or created will be coerced to use the same, common operational taxonomic unit concept namespace.
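As a rough illustration of attached mode, the sketch below reads several Newick strings into one DataSet and returns it together with the attached namespace, so that a caller can check that every tree list references that namespace. The helper name read_with_attached_namespace and the Newick strings are made up for this example. The DendroPy import sits inside the function only so that the snippet can be loaded where DendroPy may not be installed.

```python
def read_with_attached_namespace(newick_sources):
    """Read each Newick string into one DataSet under a single shared namespace.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    import dendropy  # requires the DendroPy package

    ds = dendropy.DataSet()
    tns = dendropy.TaxonNamespace()
    ds.attach_taxon_namespace(tns)  # switch to "attached taxon namespace" mode
    for source in newick_sources:
        # No taxon_namespace argument is passed, so the attached one is used.
        ds.read(data=source, schema="newick")
    return ds, tns

# Example call (needs DendroPy installed):
#   ds, tns = read_with_attached_namespace(["((A,B),(C,D));", "((A,C),(B,D));"])
#   all(tl.taxon_namespace is tns for tl in ds.tree_lists)
```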
Note that unless there is a need to collect and serialize a collection of data to the same file or external source, it is probably better semantically to use more specific data structures (e.g., a TreeList object for trees or a DnaCharacterMatrix object for an alignment). Similarly, when deserializing an external data source, if just a single type or collection of data is needed (e.g., the collection of trees from a file that includes both trees and an alignment), then it is semantically cleaner to deserialize the data into a more specific structure (e.g., a TreeList to get all the trees). However, when deserializing a mixed external data source (e.g., one with multiple alignments, or with trees and one or more alignments) and you need to access/use more than a single collection, it is more efficient to read the entire data source at once into a DataSet object and then independently extract the data objects as you need them from the various collections.
The constructor can take one argument. This can either be another DataSet instance or an iterable of TaxonNamespace, TreeList, or CharacterMatrix-derived instances.
In the former case, the newly-constructed DataSet will be a shallow-copy clone of the argument.
In the latter case, the newly-constructed DataSet will have the elements of the iterable added to the respective collections (taxon_namespaces, tree_lists, or char_matrices, as appropriate). This is essentially like calling DataSet.add on each element separately.
- add(data_object, **kwargs)
Generic add for TaxonNamespace, TreeList or CharacterMatrix objects.
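A minimal sketch of the generic add, assuming a single TaxonNamespace shared by an empty TreeList and an empty DnaCharacterMatrix. The helper name collect_into_dataset and the taxon labels are invented for illustration. The DendroPy import is placed inside the function so the snippet loads even where DendroPy is unavailable.

```python
def collect_into_dataset():
    """Build a DataSet from objects that share one TaxonNamespace.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    import dendropy  # requires the DendroPy package

    tns = dendropy.TaxonNamespace(["A", "B", "C"])
    trees = dendropy.TreeList(taxon_namespace=tns)
    chars = dendropy.DnaCharacterMatrix(taxon_namespace=tns)

    ds = dendropy.DataSet()
    for obj in (tns, trees, chars):
        ds.add(obj)  # routed to the matching collection by type
    ds.add(trees)  # already present, so no duplicate entry is expected
    return ds
```

After the call, each of taxon_namespaces, tree_lists, and char_matrices should hold exactly one object.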
- add_char_matrix(char_matrix)
Adds a CharacterMatrix or CharacterMatrix-derived instance to this dataset if it is not already there.
- Parameters:
char_matrix (CharacterMatrix) – The CharacterMatrix object to be added.
- add_taxon_namespace(taxon_namespace)
Adds a taxonomic unit concept namespace represented by a TaxonNamespace instance to this dataset if it is not already there.
- Parameters:
taxon_namespace (TaxonNamespace) – The TaxonNamespace object to be added.
- add_taxon_set(taxon_set)
DEPRECATED: Use add_taxon_namespace() instead.
- add_tree_list(tree_list)
Adds a TreeList instance to this dataset if it is not already there.
- as_string(schema, **kwargs)
Composes and returns string representation of the data.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
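For instance, a dataset read from one schema can be rendered as a string in another. The sketch below assumes a Newick string as input and NEXUS as output; the helper name dataset_to_nexus is hypothetical, and the DendroPy import is kept inside the function so the snippet loads without DendroPy installed.

```python
def dataset_to_nexus(newick_source):
    """Parse a Newick string into a DataSet and return it serialized as NEXUS.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    import dendropy  # requires the DendroPy package

    ds = dendropy.DataSet.get(data=newick_source, schema="newick")
    return ds.as_string(schema="nexus")

# Example call (needs DendroPy installed):
#   print(dataset_to_nexus("((A,B),(C,D));"))
```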
- attach_taxon_namespace(taxon_namespace=None)
Forces all read() calls of this DataSet to use the same TaxonNamespace. If taxon_namespace is None, then a new TaxonNamespace will be created, added to self.taxon_namespaces, and that is the TaxonNamespace that will be attached.
- attach_taxon_set(taxon_set=None)
DEPRECATED: Use attach_taxon_namespace() instead.
- clone(depth=1)
Creates and returns a copy of self.
- Parameters:
depth (integer) –
The depth of the copy:
0: shallow-copy: All member objects are references, except for the annotation_set of the top-level object and member Annotation objects: these are full, independent instances (though any complex objects in the value field of Annotation objects are also just references).
1: taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon instances: these are references.
2: Exhaustive deep-copy: all objects are cloned.
- copy_annotations_from(other, attribute_object_mapper=None)
Copies annotations from other, which must be of Annotable type.
Copies are deep-copies, in that the Annotation objects added to the annotation_set AnnotationSet collection of self are independent copies of those in the annotation_set collection of other. However, dynamic bound-attribute annotations retain references to the original objects as given in other, which may or may not be desirable. This is handled by updating the objects to which attributes are bound via mappings found in attribute_object_mapper. In dynamic bound-attribute annotations, the _value attribute of the annotations object (Annotation._value) is a tuple consisting of “(obj, attr_name)”, which instructs the Annotation object to return “getattr(obj, attr_name)” (via: “getattr(*self._value)”) when returning the value of the Annotation. “obj” is typically the object to which the AnnotationSet belongs (i.e., self). When a copy of Annotation is created, the object reference given in the first element of the _value tuple of dynamic bound-attribute annotations is unchanged, unless the id of the object reference is found in attribute_object_mapper.
- Parameters:
other (Annotable) – Source of annotations to copy.
attribute_object_mapper (dict) – Like the memo of __deepcopy__, maps object id’s to objects. The purpose of this is to update the parent or owner objects of dynamic attribute annotations. If a dynamic attribute Annotation gives object x as the parent or owner of the attribute (that is, the first element of the Annotation._value tuple is x) and id(x) is found in attribute_object_mapper, then in the copy the owner of the attribute is changed to attribute_object_mapper[id(x)]. If attribute_object_mapper is None (default), then the following mapping is automatically inserted: id(other): self. That is, any references to other in any Annotation object will be remapped to self. If really no reattribution mappings are desired, then an empty dictionary should be passed instead.
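The remapping rule can be pictured with a small, library-free sketch. The classes SketchAnnotation and Labeled below are hypothetical stand-ins written for this illustration; they are not DendroPy's Annotation or Annotable implementations and only mimic the (obj, attr_name) tuple and the id-keyed mapper described above.

```python
class SketchAnnotation:
    """Hypothetical stand-in for a dynamic bound-attribute annotation."""

    def __init__(self, owner, attr_name):
        self._value = (owner, attr_name)

    @property
    def value(self):
        # Resolved on every access, so it follows the owner's current state.
        return getattr(*self._value)

    def copy_for(self, attribute_object_mapper):
        owner, attr_name = self._value
        # Re-bind to a mapped owner if one is registered under id(owner).
        new_owner = attribute_object_mapper.get(id(owner), owner)
        return SketchAnnotation(new_owner, attr_name)


class Labeled:
    """Hypothetical annotated object with a single 'label' attribute."""

    def __init__(self, label):
        self.label = label
        self.annotations = []

    def copy_annotations_from(self, other, attribute_object_mapper=None):
        if attribute_object_mapper is None:
            # Default: references to `other` are remapped to `self`.
            attribute_object_mapper = {id(other): self}
        for ann in other.annotations:
            self.annotations.append(ann.copy_for(attribute_object_mapper))


source = Labeled("source")
source.annotations.append(SketchAnnotation(source, "label"))

remapped = Labeled("target")
remapped.copy_annotations_from(source)  # default mapper

unmapped = Labeled("other target")
unmapped.copy_annotations_from(source, attribute_object_mapper={})  # no remapping

print(remapped.annotations[0].value)  # follows the new owner
print(unmapped.annotations[0].value)  # still follows the original owner
```

With the default mapper the copied annotation reports the label of the new owner; with an empty dictionary it keeps reporting the label of the original object.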
- deep_copy_annotations_from(other, memo=None)
Note that all references to other in any annotation value (and sub-annotation, and sub-sub-sub-annotation, etc.) will be replaced with references to self. This may not always make sense (i.e., a reference to a particular entity may be absolute regardless of context).
- detach_taxon_set()
DEPRECATED: Use detach_taxon_namespace() instead.
- classmethod get(**kwargs)
Instantiate and return a new DataSet object from a data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
exclude_trees (bool) – If True, then all tree data in the data source will be skipped.
exclude_chars (bool) – If True, then all character data in the data source will be skipped.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created.
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy
    dataset1 = dendropy.DataSet.get(
            path="pythonidae.chars_and_trees.nex",
            schema="nexus")
    dataset2 = dendropy.DataSet.get(
            url="http://purl.org/phylo/treebase/phylows/study/TB2:S1925?format=nexml",
            schema="nexml")
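The “data” source and the exclude_chars option can be combined as in the following sketch, which assumes a small in-memory NEXUS document. The constant EXAMPLE_NEXUS and the helper load_trees_only are invented for this illustration; the DendroPy import is placed inside the function so the snippet loads without DendroPy installed.

```python
# Made-up NEXUS document with one taxa block, one DNA matrix, and one tree.
EXAMPLE_NEXUS = """#NEXUS
BEGIN TAXA;
    DIMENSIONS NTAX=3;
    TAXLABELS A B C;
END;
BEGIN CHARACTERS;
    DIMENSIONS NCHAR=4;
    FORMAT DATATYPE=DNA;
    MATRIX
        A ACGT
        B ACGA
        C ACCT
    ;
END;
BEGIN TREES;
    TREE t1 = (A,(B,C));
END;
"""


def load_trees_only(nexus_text):
    """Read a mixed NEXUS source but skip its character data.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    import dendropy  # requires the DendroPy package

    return dendropy.DataSet.get(
        data=nexus_text,
        schema="nexus",
        exclude_chars=True,  # character blocks are skipped
    )

# Example call (needs DendroPy installed):
#   ds = load_trees_only(EXAMPLE_NEXUS)
#   len(ds.tree_lists), len(ds.char_matrices)
```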
- classmethod get_from_path(src, schema, **kwargs)
Factory method to return new object of this class from file specified by string src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_stream(src, schema, **kwargs)
Factory method to return new object of this class from file-like object src.
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_string(src, schema, **kwargs)
Factory method to return new object of this class from string src.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- classmethod get_from_url(src, schema, strip_markup=False, **kwargs)
Factory method to return a new object of this class from URL given by src.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
pdo (phylogenetic data object) – New instance of object, constructed and populated from data given in source.
- new_char_matrix(char_matrix_type, *args, **kwargs)
Creates a new CharacterMatrix (of class char_matrix_type) and adds it to the char_matrices of self.
- new_taxon_namespace(*args, **kwargs)
Creates a new TaxonNamespace object, according to the arguments given (passed to TaxonNamespace()), and adds it to this DataSet.
- new_taxon_set(*args, **kwargs)
DEPRECATED: Use new_taxon_namespace() instead.
- read(**kwargs)
Add data to self from data source.
Mandatory Source-Specification Keyword Argument (Exactly One Required):
file (file) – File or file-like object of data opened for reading.
path (str) – Path to file of data.
url (str) – URL of data.
data (str) – Data given directly.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data given by the “file”, “path”, “data”, or “url” argument specified above: “newick”, “nexus”, or “nexml”. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional General Keyword Arguments:
exclude_trees (bool) – If True, then all tree data in the data source will be skipped.
exclude_chars (bool) – If True, then all character data in the data source will be skipped.
taxon_namespace (TaxonNamespace) – The TaxonNamespace instance to use to manage the taxon names. If not specified, a new one will be created unless the DataSet object is in attached taxon namespace mode (self.attached_taxon_namespace is not None but assigned to a specific TaxonNamespace instance).
ignore_unrecognized_keyword_arguments (bool) – If True, then unsupported or unrecognized keyword arguments will not result in an error. Default is False: unsupported keyword arguments will result in an error.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is interpreted and processed, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    import dendropy
    ds = dendropy.DataSet()
    ds.read(
            path="pythonidae.chars_and_trees.nex",
            schema="nexus")
    ds.read(
            url="http://purl.org/phylo/treebase/phylows/study/TB2:S1925?format=nexml",
            schema="nexml")
- read_from_path(src, schema, **kwargs)
Reads data from file specified by src.
- Parameters:
src (string) – Full file path to source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_stream(src, schema, **kwargs)
Reads from file (exactly equivalent to just read(), provided here as a separate method for completeness).
- Parameters:
src (file or file-like) – Source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_string(src, schema, **kwargs)
Reads a string.
- Parameters:
src (string) – Data as a string.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- read_from_url(src, schema, **kwargs)
Reads a URL source.
- Parameters:
src (string) – URL of location providing source of data.
schema (string) – Specification of data format (e.g., “nexus”).
kwargs (keyword arguments, optional) – Arguments to customize parsing, instantiation, processing, and accession of objects read from the data source, including schema- or format-specific handling. These will be passed to the underlying schema-specific reader for handling.
- Returns:
n (tuple[integer]) – A value indicating size of data read, where “size” depends on the object:
Tree: undefined
TreeList: number of trees
CharacterMatrix: number of sequences
DataSet: tuple (number of taxon namespaces, number of tree lists, number of matrices)
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
- unify_taxon_namespaces(taxon_namespace=None, case_sensitive_label_mapping=True, attach_taxon_namespace=True)
Reindexes taxa across all subcomponents, mapping them to a single taxon namespace.
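One plausible use, sketched under the assumption that separate read() calls without an attached namespace leave the tree lists with independent namespaces: read several sources, then unify them. The helper name read_then_unify and the Newick strings in the example call are invented for this illustration, and the default arguments of unify_taxon_namespaces are relied on as listed in the signature above.

```python
def read_then_unify(newick_sources):
    """Read several Newick strings, then remap all data to one namespace.

    Hypothetical helper for illustration; not part of DendroPy.
    """
    import dendropy  # requires the DendroPy package

    ds = dendropy.DataSet()
    for source in newick_sources:
        # Not in attached mode, so each read may bring its own namespace.
        ds.read(data=source, schema="newick")
    ds.unify_taxon_namespaces()  # defaults as given in the signature
    return ds

# Example call (needs DendroPy installed):
#   ds = read_then_unify(["((A,B),(C,D));", "((A,C),(B,D));"])
#   len({id(tl.taxon_namespace) for tl in ds.tree_lists})
```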
- write(**kwargs)
Writes out self in schema format.
Mandatory Destination-Specification Keyword Argument (Exactly One of the Following Required):
file (file) – File or file-like object opened for writing.
path (str) – Path to file to which to write.
Mandatory Schema-Specification Keyword Argument:
schema (str) – Identifier of format of data. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Optional Schema-Specific Keyword Arguments:
These provide control over how the data is formatted, and supported argument names and values depend on the schema as specified by the value passed as the “schema” argument. See “DendroPy Schemas: Phylogenetic and Evolutionary Biology Data Formats” for more details.
Examples:
    # Using a file path:
    d.write(path="path/to/file.dat", schema="nexus")

    # Using an open file:
    with open("path/to/file.dat", "w") as f:
        d.write(file=f, schema="nexus")
- write_to_path(dest, schema, **kwargs)
Writes to file specified by dest.
- write_to_stream(dest, schema, **kwargs)
Writes to file-like object dest.