dendropy.datamodel.charstatemodel: Character State Identities and Alphabets

The StateAlphabet Class

class dendropy.datamodel.charstatemodel.StateAlphabet(fundamental_states=None, ambiguous_states=None, polymorphic_states=None, symbol_synonyms=None, no_data_symbol=None, gap_symbol=None, label=None, case_sensitive=True)[source]

A master registry mapping state symbols to their definitions.

There are two classes or “denominations” of states:

  • fundamental states

    These are the basic, atomic, self-contained states of the alphabet, distinct and mutually-exclusive from every other fundamental state. E.g., for DNA: adenine, guanine, cytosine, and thymine.

  • multi-state states

    These are second-level or “pseudo-states”: they are not properly states in and of themselves, but rather each consists of a set of other states. That is, a multi-state state is a set of two or more fundamental states. Multi-state states are of one of two types: “ambiguous” and “polymorphic”.

    “Ambiguous” states represent cases in which the true fundamental state is unknown, but is one of the fundamental states to which the ambiguous state maps. They are an expression of uncertainty or lack of knowledge about the identity of the state. An example is ‘N’, representing any base in molecular sequence data.

    “Polymorphic” states represent cases in which the entity actually has multiple fundamental states simultaneously. Here there is no uncertainty or lack of knowledge about the state: the state is known definitively, and it consists of multiple fundamental states. An example is the range of a widespread species found in multiple geographic units.

    Note that multi-state states can be specified in terms of other multi-state states, but upon instantiation these member multi-states are expanded to their fundamental states.
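The expansion of multi-state states into fundamental states can be sketched with plain Python containers. This is a toy model that does not use DendroPy; the names FUNDAMENTAL, MULTISTATE, and expand are hypothetical and chosen only for illustration.

```python
# Toy model only: plain tuples and dicts, not DendroPy objects.
FUNDAMENTAL = ("A", "C", "G", "T")

# Hypothetical multi-state definitions; "N" is defined via other multi-states.
MULTISTATE = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "N": ("R", "Y"),
}


def expand(symbol):
    """Return the set of fundamental symbols that `symbol` ultimately covers."""
    if symbol in FUNDAMENTAL:
        return {symbol}
    expanded = set()
    for member in MULTISTATE[symbol]:
        # Members may themselves be multi-states, so expand recursively.
        expanded |= expand(member)
    return expanded


print(sorted(expand("R")))  # ['A', 'G']
print(sorted(expand("N")))  # ['A', 'C', 'G', 'T']
```

Even though "N" is written in terms of "R" and "Y", the expanded result contains only fundamental symbols, which mirrors the behaviour described above.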

State definitions or identities are immutable: their symbology and mappings cannot be changed after creation/initialization. State definitions and identities, however, can be added/removed from a state alphabet.
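The split between immutable state definitions and a mutable collection of states can be illustrated with a frozen dataclass. This is a minimal sketch under the assumption that a simple value object is an adequate stand-in; ToyState and ToyAlphabet are hypothetical names and are not part of DendroPy.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ToyState:
    """Hypothetical stand-in for a state definition; fields are read-only."""
    symbol: str
    members: tuple = ()


class ToyAlphabet:
    """Hypothetical stand-in for an alphabet; the collection itself is mutable."""

    def __init__(self):
        self._states = []

    def add(self, state):
        self._states.append(state)
        return state

    def remove(self, state):
        self._states.remove(state)

    def __len__(self):
        return len(self._states)


alphabet = ToyAlphabet()
a = alphabet.add(ToyState("A"))
g = alphabet.add(ToyState("G"))

# Changing a definition after creation is rejected ...
try:
    a.symbol = "X"
    mutated = True
except FrozenInstanceError:
    mutated = False

# ... but states can still be removed from (or added to) the alphabet.
alphabet.remove(g)
```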

Parameters:
  • label (string, optional) – The name for this state alphabet.

  • fundamental_states (iterable of strings) – An iterable of symbols defining the fundamental (i.e., non-ambiguous and non-polymorphic) states of this alphabet, with a 1-to-1 correspondence between symbols and states. Each state will also be automatically indexed based on its position in this list. For DNA, this would be something like: 'ACGT' or ('A', 'C', 'G', 'T'). For “standard” characters, this would be something like '01' or ('0', '1').

  • no_data_symbol (string) – If specified, automatically creates a “no data” ambiguous state, represented by the (canonical, or primary) symbol “no_data_symbol”, which maps to all fundamental states. This will also insert None into all symbol look-up maps, which, when dereferenced will return this state. Furthermore, the attribute self.no_data_symbol will return this symbol and self.no_data_state will return this state. The ‘no data’ state will be an ambiguous multistate type.

  • ambiguous_states (iterable of tuples) – An iterable consisting of tuples expressing ambiguous state symbols and the set of symbols representing the fundamental states to which they map. The first element in the tuple is the symbol used to represent the ambiguous state; this can be blank (“”), but if not blank it needs to be unique across all symbols (including case-variants if the state alphabet is case-insensitive). The second element is an iterable of fundamental state symbols to which this ambiguous state maps. The fundamental state symbols must have already been defined, i.e., given in the value passed to fundamental_states. Note: a dictionary may seem like a more tractable structure than an iterable of tuples, but we may need to specify multiple anonymous or blank ambiguous states.

  • polymorphic_states (iterable of tuples) – An iterable consisting of tuples expressing polymorphic state symbols and the set of symbols representing the fundamental states to which they map. The first element in the tuple is the symbol used to represent the polymorphic state; this can be blank (“”), but if not blank it needs to be unique across all symbols (including case-variants if the state alphabet is case-insensitive). The second element is an iterable of fundamental state symbols to which this polymorphic state maps. The fundamental state symbols must have already been defined, i.e., given in the value passed to fundamental_states. Note: a dictionary may seem like a more tractable structure than an iterable of tuples, but we may need to specify multiple anonymous or blank polymorphic states.

  • symbol_synonyms (dictionary) – A mapping of symbols, with keys being the new symbols and values being (already-defined) symbols of states to which they map. This provides a mechanism by which states with multiple symbols can be managed. For example, an ambiguous state, “unknown”, representing all fundamental states might be defined with ‘?’ as its primary symbol, and a synonym symbol for this state might be ‘X’.
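How these parameters could combine into a single symbol look-up table is sketched below. This is a toy model in plain Python, not DendroPy's implementation: build_symbol_map is a hypothetical helper, and states are represented simply as frozensets of fundamental symbols.

```python
def build_symbol_map(fundamental_states, ambiguous_states=(),
                     symbol_synonyms=None, no_data_symbol=None):
    """Map every symbol to the frozenset of fundamental symbols it stands for."""
    symbol_map = {symbol: frozenset([symbol]) for symbol in fundamental_states}
    for symbol, members in ambiguous_states:
        symbol_map[symbol] = frozenset(members)
    if no_data_symbol is not None:
        # The "no data" state maps to all fundamental states and is also
        # reachable through the key None, as described for no_data_symbol.
        no_data = frozenset(fundamental_states)
        symbol_map[no_data_symbol] = no_data
        symbol_map[None] = no_data
    for synonym, target in (symbol_synonyms or {}).items():
        # A synonym is a new key pointing at an already-defined state.
        symbol_map[synonym] = symbol_map[target]
    return symbol_map


lookup = build_symbol_map(
    "ACGT",
    ambiguous_states=[("R", "AG")],
    symbol_synonyms={"X": "?"},
    no_data_symbol="?",
)
```

In this sketch, lookup["?"], lookup["X"], and lookup[None] all refer to the same set of four fundamental symbols.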

__getitem__(key)[source]

Returns state identity corresponding to key.

Parameters:

key (integer or string) – If an integer value, looks up and returns state identity by index. If a string value, looks up and returns state identity by symbol.

Returns:

s (StateIdentity instance) – The StateIdentity corresponding to key.

Raises:

KeyError if key is not valid.
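The dual look-up described for __getitem__ (integer index or string symbol, with KeyError on failure) can be modelled as follows. This is a sketch with a hypothetical ToyAlphabet class that represents each state as an (index, symbol) tuple; it is not DendroPy code.

```python
class ToyAlphabet:
    """Hypothetical alphabet supporting look-up by index or by symbol."""

    def __init__(self, symbols):
        self._index_state_map = {}
        self._symbol_state_map = {}
        for index, symbol in enumerate(symbols):
            state = (index, symbol)
            self._index_state_map[index] = state
            self._symbol_state_map[symbol] = state

    def __getitem__(self, key):
        # Integers are treated as indexes, everything else as a symbol.
        # Both dictionaries raise KeyError for unknown keys.
        if isinstance(key, int):
            return self._index_state_map[key]
        return self._symbol_state_map[key]


dna = ToyAlphabet("ACGT")
by_index = dna[2]
by_symbol = dna["G"]

try:
    dna["Z"]
    raised = False
except KeyError:
    raised = True
```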

__iter__()[source]

Returns StateAlphabet.state_iter: iterator over all state identities.

__len__()[source]

Number of states.

ambiguous_state_iter()[source]

Returns an iterator over all ambiguous state identities.

ambiguous_symbol_iter(include_synonyms=True)[source]

Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to ambiguous states.

property canonical_symbol_state_map

Dictionary with state symbols as keys and states as values. Does not include symbol synonyms or case variations.

compile_lookup_mappings()[source]

Builds lookup tables/mappings for quick referencing and dereferencing of symbols/states.

compile_member_states_lookup_mappings()[source]

Builds lookup tables/mappings for quick referencing and dereferencing of ambiguous/polymorphic states based on the fundamental states to which they map.

compile_symbol_lookup_mappings()[source]

Builds lookup tables/mappings for quick referencing and dereferencing of state symbology.

property full_symbol_state_map

Dictionary with state symbols as keys and states as values. Includes symbol synonyms and case variations.

fundamental_state_iter()[source]

Returns an iterator over all fundamental state identities.

fundamental_symbol_iter(include_synonyms=True)[source]

Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to fundamental states.

get_canonical_symbol_for_symbol(symbol)[source]

Returns the canonical state symbol for the state to which symbol maps. E.g., in a DNA alphabet, return ‘A’ for ‘a’.

Parameters:

symbol (string) –

Returns:

s (string) – The canonical symbol of the state that symbol refers to, whether symbol is itself the canonical symbol or a synonym.
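Resolving a symbol to its canonical form, including case variants in a case-insensitive alphabet, can be sketched with a single dictionary. The helper build_canonical_map is hypothetical and stands in for whatever internal tables the library maintains.

```python
def build_canonical_map(canonical_symbols, synonyms=None, case_sensitive=True):
    """Return a dict mapping every accepted symbol to its canonical symbol."""
    canonical = {}

    def register(symbol, target):
        canonical[symbol] = target
        if not case_sensitive:
            # Case variants act as implicit synonyms.
            canonical[symbol.lower()] = target
            canonical[symbol.upper()] = target

    for symbol in canonical_symbols:
        register(symbol, symbol)
    for synonym, target in (synonyms or {}).items():
        register(synonym, canonical[target])
    return canonical


dna = build_canonical_map("ACGTN", synonyms={"X": "N"}, case_sensitive=False)
strict = build_canonical_map("ACGT")

print(dna["a"])  # A
print(dna["x"])  # N
```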

get_fundamental_states_for_symbols(symbols)[source]

Returns list of fundamental states corresponding to symbols.

Parameters:

symbols (iterable of symbols) –

Returns:

s (list of StateIdentity) – A list of fundamental StateIdentity instances corresponding to the symbols given in symbols, with multi-state states expanded into their fundamental states.

get_states_for_symbols(symbols)[source]

Returns list of states corresponding to symbols.

Parameters:

symbols (iterable of symbols) –

Returns:

s (list of StateIdentity) – A list of StateIdentity instances corresponding to the symbols given in symbols.
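The difference between the two look-ups above (one state per symbol versus expansion into fundamental states) can be illustrated with a toy table. The names below are hypothetical, states are represented as plain tuples, and whether the real method removes duplicates is not stated here, so this sketch simply concatenates.

```python
# Hypothetical table: symbol -> fundamental member symbols.
SYMBOL_TO_MEMBERS = {
    "A": ("A",),
    "C": ("C",),
    "G": ("G",),
    "T": ("T",),
    "R": ("A", "G"),  # ambiguous: A or G
}


def states_for_symbols(symbols):
    """One (symbol, members) entry per input symbol, multi-states kept whole."""
    return [(symbol, SYMBOL_TO_MEMBERS[symbol]) for symbol in symbols]


def fundamental_states_for_symbols(symbols):
    """Fundamental symbols only, with multi-states expanded in place."""
    expanded = []
    for symbol in symbols:
        expanded.extend(SYMBOL_TO_MEMBERS[symbol])
    return expanded


whole = states_for_symbols("AR")
flat = fundamental_states_for_symbols("AR")
```

Here whole has two entries, one of them the multi-state "R", while flat contains only fundamental symbols.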

match_ambiguous_state(symbols)[source]

Returns ambiguous state with fundamental member states represented by symbols given in symbols.

Parameters:

symbols (iterable of symbols) –

Returns:

s (StateIdentity instance)

match_polymorphic_state(symbols)[source]

Returns polymorphic state with fundamental member states represented by symbols given in symbols.

Parameters:

symbols (iterable of symbols) –

Returns:

s (StateIdentity instance)

match_state(symbols, state_denomination)[source]

Returns ambiguous or polymorphic state with fundamental member states represented by symbols given in symbols.

Parameters:
  • symbols (iterable of string symbols) – Symbols representing states to be dereferenced.

  • state_denomination ({StateAlphabet.AMBIGUOUS_STATE or StateAlphabet.POLYMORPHIC_STATE}) –

Returns:

s (StateIdentity instance)
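Matching a multi-state state by its member symbols amounts to a reverse look-up keyed on the set of members, kept separately per denomination. The sketch below is a toy model: ToyMatcher and the string constants AMBIGUOUS and POLYMORPHIC are hypothetical stand-ins, not the library's constants.

```python
AMBIGUOUS = "ambiguous"      # hypothetical stand-in for the ambiguous denomination
POLYMORPHIC = "polymorphic"  # hypothetical stand-in for the polymorphic denomination


class ToyMatcher:
    """Reverse look-up from a set of member symbols to a multi-state symbol."""

    def __init__(self):
        self._by_members = {AMBIGUOUS: {}, POLYMORPHIC: {}}

    def register(self, symbol, members, denomination):
        # frozenset makes the key independent of member order.
        self._by_members[denomination][frozenset(members)] = symbol

    def match_state(self, symbols, denomination):
        return self._by_members[denomination][frozenset(symbols)]


matcher = ToyMatcher()
matcher.register("R", "AG", AMBIGUOUS)
matcher.register("{AG}", "AG", POLYMORPHIC)

print(matcher.match_state("GA", AMBIGUOUS))    # R
print(matcher.match_state("AG", POLYMORPHIC))  # {AG}
```

The same member set can map to different states depending on the denomination, which is why the denomination is part of the look-up.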

multistate_state_iter()[source]

Returns an iterator over all ambiguous and polymorphic state identities.

multistate_symbol_iter(include_synonyms=True)[source]

Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to multistate states.

new_ambiguous_state(symbol, **kwargs)[source]

Adds a new ambiguous state to the collection of states in this alphabet.

Parameters:
  • symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.

  • **kwargs (keyword arguments, mandatory) –

    Exactly one of the following must be specified:

    member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.

    member_states (iterable of StateIdentity objects) – List of StateIdentity instances representing states to which this state maps.

Returns:

s (StateIdentity) – The new state created and added.

new_fundamental_state(symbol)[source]

Adds a new fundamental state to the collection of states in this alphabet.

Parameters:

symbol (string) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Cannot be blank (“”) or None.

Returns:

s (StateIdentity) – The new state created and added.

new_multistate(symbol, state_denomination, **kwargs)[source]

Adds a new polymorphic or ambiguous state to the collection of states in this alphabet.

Parameters:
  • symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.

  • state_denomination (enum) – StateAlphabet.POLYMORPHIC_STATE or StateAlphabet.AMBIGUOUS_STATE

  • **kwargs (keyword arguments, mandatory) –

    Exactly one of the following must be specified:

    member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.

    member_states (iterable of StateIdentity objects) – List of StateIdentity instances representing states to which this state maps.

Returns:

s (StateIdentity) – The new state created and added.
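The "exactly one of" rule for the keyword arguments can be enforced with a small check like the one below. This is only a sketch of the validation idea: resolve_members is a hypothetical helper, and the exception type the library actually raises is not documented here, so TypeError is an assumption.

```python
def resolve_members(**kwargs):
    """Return (argument_name, members) if exactly one member argument is given."""
    allowed = ("member_state_symbols", "member_states")
    unknown = set(kwargs) - set(allowed)
    if unknown:
        raise TypeError("Unexpected keyword arguments: %s" % sorted(unknown))
    given = [name for name in allowed if kwargs.get(name) is not None]
    if len(given) != 1:
        # Assumption: TypeError is used here purely for illustration.
        raise TypeError(
            "Exactly one of 'member_state_symbols' or 'member_states' is required"
        )
    name = given[0]
    return name, tuple(kwargs[name])


name, members = resolve_members(member_state_symbols="AG")

try:
    resolve_members()
    rejected_empty = False
except TypeError:
    rejected_empty = True

try:
    resolve_members(member_state_symbols="AG", member_states=["a", "g"])
    rejected_both = False
except TypeError:
    rejected_both = True
```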

new_polymorphic_state(symbol, **kwargs)[source]

Adds a new polymorphic state to the collection of states in this alphabet.

Parameters:
  • symbol (string or None) – The symbol used to represent this state. Cannot have previously been used to refer to any other state, fundamental or otherwise, as a primary or synonymous symbol (including implicit synonyms given by case-variants if the state alphabet is not case-sensitive). Can be blank (“”) or None.

  • **kwargs (keyword arguments, mandatory) –

    Exactly one of the following must be specified:

    member_state_symbols (iterable of strings) – List of symbols representing states to which this state maps. Symbols representing multistates will be taken to refer to the set of fundamental states to which they, in turn, map.

    member_states (iterable of StateIdentity objects) – List of StateIdentity instances representing states to which this state maps.

Returns:

s (StateIdentity) – The new state created and added.

new_symbol_synonym(symbol_synonym, referenced_symbol)[source]

Defines an alternative symbol mapping for an existing state.

Parameters:
  • symbol_synonym (string) – The (new) alternative symbol.

  • referenced_symbol (string) – The symbol for the state to which the alternative symbol will also map.

Returns:

s (StateIdentity) – The state to which this synonym maps.

polymorphic_state_iter()[source]

Returns an iterator over all polymorphic state identities.

polymorphic_symbol_iter(include_synonyms=True)[source]

Returns an iterator over all symbols (including synonyms, unless include_synonyms is False) that map to polymorphic states.

set_state_as_attribute(state, attr_name=None)[source]

Sets the given state as an attribute of this alphabet. The name of the attribute will be attr_name if specified, or the state symbol otherwise.

Parameters:
  • state (StateIdentity) – The state to be made an attribute of this alphabet.

  • attr_name (string) – The name of the attribute. If not specified, the state symbol will be used.

state_iter()[source]

Returns an iterator over all state identities.

property states

Tuple of all state identities in this alphabet.

symbol_state_pair_iter(include_synonyms=True)[source]

Returns an iterator over all symbols paired with the state to which the symbols map.

property symbols

Tuple of all state symbols in this alphabet.

taxon_namespace_scoped_copy(memo=None)[source]

Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.
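The idea of a copy in which everything is duplicated except the taxon objects can be sketched with copy.deepcopy and a pre-seeded memo dictionary. This is a general Python technique shown on hypothetical classes (ToyTaxon, ToyRow); it is an assumption that it resembles what the library does internally.

```python
import copy


class ToyTaxon:
    """Hypothetical stand-in for a taxon object that must stay shared."""

    def __init__(self, label):
        self.label = label


class ToyRow:
    """Hypothetical object holding a taxon reference plus its own data."""

    def __init__(self, taxon, symbols):
        self.taxon = taxon
        self.symbols = list(symbols)

    def taxon_scoped_copy(self):
        # Seeding the memo tells deepcopy that the taxon has "already been
        # copied" to itself, so the clone keeps a reference to the original.
        memo = {id(self.taxon): self.taxon}
        return copy.deepcopy(self, memo)


row = ToyRow(ToyTaxon("Python regius"), "ACGT")
clone = row.taxon_scoped_copy()
```

After the call, clone.symbols is an independent list with equal contents, while clone.taxon is the very same object as row.taxon.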

The StateIdentity Class

class dendropy.datamodel.charstatemodel.StateIdentity(symbol=None, index=None, state_denomination=0, member_states=None)[source]

A character state definition, which can either be a fundamental state or a mapping to a set of other character states (for polymorphic or ambiguous characters).

A state is immutable with respect to its definition and identity. Specifically, its ‘symbol’, ‘index’, ‘multistate’, and ‘member_states’ properties are set upon definition/creation, and after that are read-only.

Parameters:
  • symbol (string) – A text symbol or token representation of this character state. E.g., ‘G’ for the base guanine in a DNA state alphabet, or ‘1’ for presence of a wing in a morphological data set.

  • index (integer) – The (0-based) numeric index for this state in the state alphabet. E.g., for a DNA alphabet: 0 = ‘A’/adenine, 1 = ‘C’/cytosine, 2 = ‘G’/guanine, 3 = ‘T’/thymine. Or for a “standard” alphabet: 0 = ‘0’, 1 = ‘1’. Note that ambiguous and polymorphic state definitions typically are not indexed.

  • state_denomination ('enum') – One of: StateAlphabet.FUNDAMENTAL_STATE, StateAlphabet.AMBIGUOUS_STATE, or StateAlphabet.POLYMORPHIC_STATE.

  • member_states (iterable of StateIdentity instances) – If a multi-state, then a collection of StateIdentity instances to which this state maps.

property fundamental_indexes

Returns a tuple of fundamental state indexes (i.e., tuple of index values of single states) to which this state maps.

property fundamental_indexes_with_gaps_as_missing

Returns a tuple of fundamental state indexes (i.e., tuple of index values of single states) to which this state maps, with gaps being substituted with missing (no-data) states.

property fundamental_states

Returns a tuple of fundamental states (i.e., tuple of single states) to which this state maps.

property fundamental_symbols

Returns a tuple of fundamental state symbols (i.e., tuple of symbols representing single states) to which this state maps.

is_exact_correspondence(other)[source]

Tries to determine if two StateIdentity definitions are equivalent by matching symbols.

property is_fundamental_state

True if a FUNDAMENTAL state.

property is_single_state

True if a FUNDAMENTAL state.

property member_states

Returns the (fundamental) member states that this state maps to if not itself a fundamental state.

property member_states_str

Representation of member states of self.

property state_denomination

Type of multi-statedness: FUNDAMENTAL (not a multistate), AMBIGUOUS, or POLYMORPHIC.

property symbol

Canonical (primary) symbol of this state.

property symbol_synonyms

The collection of symbol synonyms (alternatives/equivalents to the canonical symbol) which also map to this state.

taxon_namespace_scoped_copy(memo=None)[source]

Cloning level: 1. Taxon-namespace-scoped copy: All member objects are full independent instances, except for TaxonNamespace and Taxon objects: these are preserved as references.