dendropy.datamodel.charstatemodel: Character State Identities and Alphabets
The StateAlphabet Class
- class dendropy.datamodel.charstatemodel.StateAlphabet(fundamental_states=None, ambiguous_states=None, polymorphic_states=None, symbol_synonyms=None, no_data_symbol=None, gap_symbol=None, label=None, case_sensitive=True)
A master registry mapping state symbols to their definitions.
There are two classes or “denominations” of states:
- fundamental states
These are the basic, atomic, self-contained states of the alphabet, distinct and mutually-exclusive from every other fundamental state. E.g., for DNA: adenine, guanine, cytosine, and thymine.
- multi-state states
These states are second-level or “pseudo-states”, in that they are not properly states in and of themselves, but rather each consists of a set of other states. That is, a multi-state state is a set of two or more fundamental states. Multi-state states are of one of two types: “ambiguous” and “polymorphic” states.
“Ambiguous” states represent states in which the true fundamental state is unknown, but is one of the fundamental states to which the ambiguous state maps. They are an expression of uncertainty or lack of knowledge about the identity of the state. An example of an ambiguous state would be ‘N’, representing any base in molecular sequence data.
“Polymorphic” states represent states in which the entity actually has multiple fundamental states simultaneously. Here there is no uncertainty or lack of knowledge about the state: the state is known definitively, and it consists of multiple fundamental states. An example of a polymorphic state would be the range of a widespread species found in multiple geographic units.
Note that multi-state states can be specified in terms of other multi-state states, but that upon instantiation, these member multi-states will be expanded to their fundamental states.
State definitions or identities are immutable: their symbology and mappings cannot be changed after creation/initialization. State definitions and identities can, however, be added to or removed from a state alphabet.
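To make the two denominations concrete, here is a minimal standalone sketch. It is not DendroPy's implementation and does not import the library: the names `State`, `FUNDAMENTAL`, `AMBIGUOUS`, `POLYMORPHIC`, `expand`, and `new_multistate` are hypothetical and exist only for this illustration. It shows a multi-state defined partly in terms of another multi-state being expanded to fundamental states at creation.

```python
from dataclasses import dataclass

# Hypothetical denomination tags, used only for this illustration.
FUNDAMENTAL, AMBIGUOUS, POLYMORPHIC = "fundamental", "ambiguous", "polymorphic"


@dataclass(frozen=True)  # frozen: a definition cannot change after creation
class State:
    symbol: str
    denomination: str = FUNDAMENTAL
    members: tuple = ()  # fundamental member states; empty if fundamental


def expand(states):
    """Flatten states to unique fundamental states, keeping first-seen order."""
    result = []
    for state in states:
        parts = state.members if state.members else (state,)
        for part in parts:
            if part not in result:
                result.append(part)
    return tuple(result)


def new_multistate(symbol, denomination, member_states):
    # Member multi-states are expanded on creation, so `members`
    # only ever holds fundamental states.
    return State(symbol, denomination, expand(member_states))


A, C, G, T = (State(s) for s in "ACGT")
R = new_multistate("R", AMBIGUOUS, [A, G])     # purine: A or G
N = new_multistate("N", AMBIGUOUS, [R, C, T])  # defined via another multi-state

print([s.symbol for s in N.members])  # ['A', 'G', 'C', 'T']
```

Using a frozen dataclass is one simple way to model the immutability described above; the alphabet itself would remain free to add or remove such definitions.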
- Parameters:
label (string, optional) – The name for this state alphabet.
fundamental_states (iterable of strings) – An iterable of symbols defining the fundamental (i.e., non-ambiguous and non-polymorphic) states of this alphabet, with a 1-to-1 correspondence between symbols and states. Each state will also be automatically indexed based on its position in this list. For DNA, this would be something like 'ACGT' or ('A', 'C', 'G', 'T'). For “standard” characters, this would be something like '01' or ('0', '1').
no_data_symbol (string) – If specified, automatically creates a “no data” ambiguous state, represented by the (canonical, or primary) symbol “no_data_symbol”, which maps to all fundamental states. This will also insert None into all symbol look-up maps, which, when dereferenced, will return this state. Furthermore, the attribute self.no_data_symbol will return this symbol and self.no_data_state will return this state. The “no data” state will be an ambiguous multistate type.
ambiguous_states (iterable of tuples) – An iterable consisting of tuples expressing ambiguous state symbols and the set of symbols representing the fundamental states to which they map. The first element in the tuple is the symbol used to represent the ambiguous state; this can be blank (“”), but if not blank it needs to be unique across all symbols (including case-variants if the state alphabet is case-insensitive). The second element is an iterable of fundamental state symbols to which this ambiguous state maps. The fundamental state symbols must have already been defined, i.e. given in the value passed to fundamental_states. Note: a dictionary may seem like a more tractable structure than an iterable of tuples, but we may need to specify multiple anonymous or blank ambiguous states.
polymorphic_states (iterable of tuples) – An iterable consisting of tuples expressing polymorphic state symbols and the set of symbols representing the fundamental states to which they map. The first element in the tuple is the symbol used to represent the polymorphic state; this can be blank (“”), but if not blank it needs to be unique across all symbols (including case-variants if the state alphabet is case-insensitive). The second element is an iterable of fundamental state symbols to which this polymorphic state maps. The fundamental state symbols must have already been defined, i.e. given in the value passed to fundamental_states. Note: a dictionary may seem like a more tractable structure than an iterable of tuples, but we may need to specify multiple anonymous or blank polymorphic states.
symbol_synonyms (dictionary) – A mapping of symbols, with keys being the new symbols and values being (already-defined) symbols of states to which they map. This provides a mechanism by which states with multiple symbols can be managed. For example, an ambiguous state, “unknown”, representing all fundamental states might be defined with ‘?’ as its primary symbol, and a synonym symbol for this state might be ‘X’.
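The note about tuples versus dictionaries can be illustrated without calling the library at all. The sketch below only demonstrates the argument shapes described above for a DNA-like alphabet; the particular symbols chosen are illustrative assumptions, and nothing here constructs a real StateAlphabet.

```python
# Argument shapes as described above, for a DNA-like alphabet.
fundamental_states = "ACGT"
ambiguous_states = [
    ("N", "ACGT"),  # symbol, then the fundamental symbols it maps to
    ("R", "AG"),
    ("Y", "CT"),
    ("", "AC"),     # anonymous (blank-symbol) ambiguous state
    ("", "GT"),     # a second anonymous state
]
symbol_synonyms = {"X": "N"}  # new symbol -> already-defined symbol

# A dictionary keyed by symbol would silently keep only one blank entry.
as_dict = dict(ambiguous_states)

print(len(ambiguous_states), len(as_dict))  # 5 4
```

The list of tuples preserves both anonymous states, whereas the dictionary keeps only the last one assigned to the blank key.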
- __getitem__(key)
Returns state identity corresponding to key.
- Parameters:
key (integer or string) – If an integer value, looks up and returns state identity by index. If a string value, looks up and returns state identity by symbol.
- Returns:
s (StateIdentity instance) – A StateIdentity corresponding to key.
- Raises:
KeyError – If key is not valid.
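The dual lookup described above (integer index or string symbol, with KeyError for an invalid key) can be sketched as follows. `TinyAlphabet` is a hypothetical stand-in whose states are plain strings; it follows the documented behaviour only and is not the library's code.

```python
class TinyAlphabet:
    """Hypothetical stand-in; states are plain strings here."""

    def __init__(self, symbols):
        self._states = list(symbols)
        self._by_symbol = {s: s for s in self._states}

    def __getitem__(self, key):
        if isinstance(key, int):
            try:
                return self._states[key]
            except IndexError:
                # Report a bad index the same way as a bad symbol.
                raise KeyError(key) from None
        return self._by_symbol[key]  # raises KeyError for unknown symbols


alphabet = TinyAlphabet("ACGT")
print(alphabet[0], alphabet["G"])  # A G
```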
- __iter__()
Returns StateAlphabet.state_iter: iterator over all state identities.
- ambiguous_symbol_iter(include_synonyms=True)
Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to ambiguous states.
- property canonical_symbol_state_map
Dictionary with state symbols as keys and states as values. Does not include symbol synonyms or case variations.
- compile_lookup_mappings()
Builds lookup tables/mappings for quick referencing and dereferencing of symbols/states.
- compile_member_states_lookup_mappings()
Builds lookup tables/mappings for quick referencing and dereferencing of ambiguous/polymorphic states based on the fundamental states to which they map.
- compile_symbol_lookup_mappings()
Builds lookup tables/mappings for quick referencing and dereferencing of state symbology.
- property full_symbol_state_map
Dictionary with state symbols as keys and states as values. Includes symbol synonyms or case variations.
- fundamental_symbol_iter(include_synonyms=True)
Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to fundamental states.
- get_canonical_symbol_for_symbol(symbol)
Returns the canonical state symbol for the state to which symbol maps. E.g., in a DNA alphabet, return ‘A’ for ‘a’.
- Parameters:
symbol (string)
- Returns:
s (string) – Canonical symbol for state with symbol or synonym symbol of symbol.
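The difference between a canonical map and a full map (with synonyms and case variants) can be sketched in plain Python. `build_symbol_maps` is a hypothetical helper, and for simplicity both maps here resolve to canonical symbols rather than to state objects; this is an illustration of the idea, not how the library builds its tables.

```python
def build_symbol_maps(canonical_symbols, synonyms, case_sensitive=True):
    """Return (canonical_map, full_map); both map a symbol to its canonical symbol."""
    canonical_map = {s: s for s in canonical_symbols}
    full_map = dict(canonical_map)
    for synonym, target in synonyms.items():
        full_map[synonym] = canonical_map[target]
    if not case_sensitive:
        # Add case variants without overwriting symbols already defined.
        for symbol, target in list(full_map.items()):
            full_map.setdefault(symbol.lower(), target)
            full_map.setdefault(symbol.upper(), target)
    return canonical_map, full_map


canonical_map, full_map = build_symbol_maps("ACGT?", {"X": "?"}, case_sensitive=False)
print(full_map["a"], full_map["x"], "a" in canonical_map)  # A ? False
```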
- get_fundamental_states_for_symbols(symbols)
Returns list of fundamental states corresponding to symbols.
- Parameters:
symbols (iterable of symbols)
- Returns:
s (list of StateIdentity) – A list of fundamental StateIdentity instances corresponding to symbols given in symbols, with multi-state states expanded into their fundamental states.
- get_states_for_symbols(symbols)
Returns list of states corresponding to symbols.
- Parameters:
symbols (iterable of symbols)
- Returns:
s (list of StateIdentity) – A list of StateIdentity instances corresponding to symbols given in symbols.
- match_ambiguous_state(symbols)
Returns ambiguous state with fundamental member states represented by symbols given in symbols.
- Parameters:
symbols (iterable of symbols)
- Returns:
s (StateIdentity instance)
- match_polymorphic_state(symbols)
Returns polymorphic state with fundamental member states represented by symbols given in symbols.
- Parameters:
symbols (iterable of symbols)
- Returns:
s (StateIdentity instance)
- match_state(symbols, state_denomination)
Returns ambiguous or polymorphic state with fundamental member states represented by symbols given in symbols.
- Parameters:
symbols (iterable of string symbols) – Symbols representing states to be dereferenced.
state_denomination (StateAlphabet.AMBIGUOUS_STATE or StateAlphabet.POLYMORPHIC_STATE)
- Returns:
s (StateIdentity instance)
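One plausible way to look up a multi-state by its members is a table keyed on the denomination and the set of fundamental symbols. The sketch below assumes that member order does not matter, which is an assumption of this illustration rather than something the text above states; the constants, the `member_lookup` table, and the `{AG}` symbol are hypothetical stand-ins.

```python
AMBIGUOUS_STATE, POLYMORPHIC_STATE = "ambiguous", "polymorphic"

# (denomination, frozenset of fundamental symbols) -> multi-state symbol
member_lookup = {
    (AMBIGUOUS_STATE, frozenset("AG")): "R",
    (AMBIGUOUS_STATE, frozenset("CT")): "Y",
    (POLYMORPHIC_STATE, frozenset("AG")): "{AG}",
}


def match_state(symbols, state_denomination):
    """Find the multi-state whose member set equals `symbols`, ignoring order."""
    return member_lookup[(state_denomination, frozenset(symbols))]


print(match_state("GA", AMBIGUOUS_STATE))           # R
print(match_state(["A", "G"], POLYMORPHIC_STATE))   # {AG}
```

Keying on the denomination as well as the member set keeps an ambiguous state and a polymorphic state with the same members distinct.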
- multistate_state_iter()
Returns an iterator over all ambiguous and polymorphic state identities.
- multistate_symbol_iter(include_synonyms=True)
Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to multistate states.
- new_ambiguous_state(symbol, **kwargs)
Adds a new ambiguous state to the collection of states in this alphabet.
- Parameters:
symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.
**kwargs (keyword arguments, mandatory) – Exactly one of the following must be specified:
- member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.
- member_states (iterable of StateIdentity objects) – List of StateIdentity objects representing states to which this state maps.
- Returns:
s (StateIdentity) – The new state created and added.
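The "exactly one of" rule for the keyword arguments can be enforced with a small check. The following is a hedged sketch: `resolve_member_symbols` is a hypothetical helper, and raising TypeError is an assumption of this example, since the text above does not say which exception the library raises.

```python
def resolve_member_symbols(**kwargs):
    """Return member symbols from exactly one of the two accepted keywords."""
    allowed = ("member_state_symbols", "member_states")
    unknown = set(kwargs) - set(allowed)
    if unknown:
        raise TypeError(f"unexpected keyword arguments: {sorted(unknown)}")
    given = [name for name in allowed if kwargs.get(name) is not None]
    if len(given) != 1:
        raise TypeError(f"specify exactly one of {allowed}")
    if given[0] == "member_state_symbols":
        return tuple(kwargs["member_state_symbols"])
    # State objects are read through their `symbol` property.
    return tuple(state.symbol for state in kwargs["member_states"])


print(resolve_member_symbols(member_state_symbols="AG"))  # ('A', 'G')
```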
- new_fundamental_state(symbol)
Adds a new fundamental state to the collection of states in this alphabet.
- Parameters:
symbol (string) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Cannot be blank (“”) or None.
- Returns:
s (StateIdentity) – The new state created and added.
- new_multistate(symbol, state_denomination, **kwargs)
Adds a new polymorphic or ambiguous state to the collection of states in this alphabet.
- Parameters:
symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.
state_denomination (enum) – StateAlphabet.POLYMORPHIC_STATE or StateAlphabet.AMBIGUOUS_STATE.
**kwargs (keyword arguments, mandatory) – Exactly one of the following must be specified:
- member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.
- member_states (iterable of StateIdentity objects) – List of StateIdentity objects representing states to which this state maps.
- Returns:
s (StateIdentity) – The new state created and added.
- new_polymorphic_state(symbol, **kwargs)
Adds a new polymorphic state to the collection of states in this alphabet.
- Parameters:
symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.
**kwargs (keyword arguments, mandatory) – Exactly one of the following must be specified:
- member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.
- member_states (iterable of StateIdentity objects) – List of StateIdentity objects representing states to which this state maps.
- Returns:
s (StateIdentity) – The new state created and added.
- new_symbol_synonym(symbol_synonym, referenced_symbol)
Defines an alternative symbol mapping for an existing state.
- Parameters:
symbol_synonym (string) – The (new) alternative symbol.
referenced_symbol (string) – The symbol for the state to which the alternative symbol will also map.
- Returns:
s (StateIdentity) – The state to which this synonym maps.
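As a rough illustration of registering a synonym, the sketch below adds a second key to a symbol-to-state dictionary and returns the state it now maps to. The free function, the plain dictionary, and the ValueError raised for a symbol already in use are all assumptions of this example; only the requirement that symbols be unique comes from the documentation above.

```python
def new_symbol_synonym(symbol_map, symbol_synonym, referenced_symbol):
    """Add `symbol_synonym` as another key for the state of `referenced_symbol`."""
    if symbol_synonym in symbol_map:
        raise ValueError(f"symbol {symbol_synonym!r} is already in use")
    state = symbol_map[referenced_symbol]  # KeyError if not yet defined
    symbol_map[symbol_synonym] = state
    return state


# States are plain strings here purely for illustration.
symbol_map = {"A": "adenine", "?": "unknown"}
print(new_symbol_synonym(symbol_map, "X", "?"))  # unknown
```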
- polymorphic_symbol_iter(include_synonyms=True)
Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to polymorphic states.
- set_state_as_attribute(state, attr_name=None)
Sets the given state as an attribute of this alphabet. The name of the attribute will be attr_name if specified, or the state symbol otherwise.
- Parameters:
state (StateIdentity) – The state to be made an attribute of this alphabet.
attr_name (string) – The name of the attribute. If not specified, the state symbol will be used.
- property states
Tuple of all state identities in this alphabet.
- symbol_state_pair_iter(include_synonyms=True)
Returns an iterator over all symbols paired with the state to which the symbols map.
- property symbols
Tuple of all state symbols in this alphabet.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
The StateIdentity Class
- class dendropy.datamodel.charstatemodel.StateIdentity(symbol=None, index=None, state_denomination=0, member_states=None)
A character state definition, which can either be a fundamental state or a mapping to a set of other character states (for polymorphic or ambiguous characters).
A state is immutable with respect to its definition and identity. Specifically, its ‘symbol’, ‘index’, ‘multistate’, and ‘member_states’ properties are set upon definition/creation, and after that are read-only.
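Read-only properties are one common Python idiom for this kind of immutability. The class below is a hypothetical stand-in named `ReadOnlyState`, written only to show the idiom; it is not the library's StateIdentity and omits most of its attributes.

```python
class ReadOnlyState:
    """Hypothetical stand-in: values are fixed at creation, then read-only."""

    def __init__(self, symbol, index=None, member_states=None):
        self._symbol = symbol
        self._index = index
        self._member_states = tuple(member_states or ())

    @property
    def symbol(self):
        return self._symbol

    @property
    def index(self):
        return self._index

    @property
    def member_states(self):
        return self._member_states


g = ReadOnlyState("G", index=2)
try:
    g.symbol = "X"  # no setter defined, so assignment fails
except AttributeError:
    print("symbol is read-only")
```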
- Parameters:
symbol (string) – A text symbol or token representation of this character state. E.g., ‘G’ for the base guanine in a DNA state alphabet, or ‘1’ for presence of a wing in a morphological data set.
index (integer) – The (0-based) numeric index for this state in the state alphabet. E.g., for a DNA alphabet: 0 = ‘A’/adenine, 1 = ‘C’/cytosine, 2 = ‘G’/guanine, 3 = ‘T’/thymine. Or for a “standard” alphabet: 0 = ‘0’, 1 = ‘1’. Note that ambiguous and polymorphic state definitions typically are not indexed.
state_denomination ('enum') – One of: StateAlphabet.FUNDAMENTAL_STATE, StateAlphabet.AMBIGUOUS_STATE, or StateAlphabet.POLYMORPHIC_STATE.
member_states (iterable of StateIdentity instances) – If a multi-state, then a collection of StateIdentity instances to which this state maps.
- property fundamental_indexes
Returns a tuple of fundamental state indexes (i.e., tuple of index values of single states) to which this state maps.
- property fundamental_indexes_with_gaps_as_missing
Returns a tuple of fundamental state indexes (i.e., tuple of index values of single states) to which this state maps, with gaps being substituted with missing (no-data) states.
- property fundamental_states
Returns a tuple of fundamental states (i.e., tuple of single states) to which this state maps.
- property fundamental_symbols
Returns a tuple of fundamental state symbols (i.e., tuple of symbols representing single states) to which this state maps.
- is_exact_correspondence(other)
Tries to determine if two StateIdentity definitions are equivalent by matching symbols.
- property is_fundamental_state
True if a FUNDAMENTAL state.
- property is_single_state
True if a FUNDAMENTAL state.
- property member_states
Returns the (fundamental) member states that this state maps to if not itself a fundamental state.
- property member_states_str
Representation of member states of self.
- property state_denomination
Type of multi-statedness: FUNDAMENTAL (not a multistate), AMBIGUOUS, or POLYMORPHIC.
- property symbol
Canonical (primary) symbol of this state.
- property symbol_synonyms
The collection of symbol synonyms (alternatives/equivalents to the canonical symbol) which also map to this state.
- taxon_namespace_scoped_copy(memo=None)
Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.