Working with GenBank Molecular Sequence Databases

The genbank module provides the classes and methods to download sequences from GenBank and instantiate them into DendroPy phylogenetic data objects. Three classes are provided, all of which have an identical interface, varying only in the type of data retrieved:

GenBankDna

Acquire and manage DNA sequence data from the GenBank Nucleotide database.

GenBankRna

Acquire and manage RNA sequence data from the GenBank Nucleotide database.

GenBankProtein

Acquire and manage AA sequence data from the GenBank Protein database.

Quick Start

The basic way to retrieve sequence data is to create a GenBankDna, GenBankRna, or GenBankProtein object, and pass in a list of identifiers to be retrieved using the “ids” argument. The value of this argument should be a container with either GenBank accession identifiers or GI numbers:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(ids=['EU105474', 'EU105475'])
>>> for gb in gb_dna:
...     print(gb)
gi|158930545|gb|EU105474.1| Homo sapiens Ache non-coding region T864 genomic sequence
gi|158930546|gb|EU105475.1| Homo sapiens Arara non-coding region T864 genomic sequence

The records are stored as GenBankAccessionRecord objects. These records store the full information available in a GenBank record, including the references, feature table, qualifiers, and other details, and these are available as attributes of the GenBankAccessionRecord objects (e.g., “primary_accession”, “taxonomy”, “feature_table” and so on).

To generate a CharacterMatrix object from the collection of sequences, call the generate_char_matrix method:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(ids=['EU105474', 'EU105475'])
>>> char_matrix = gb_dna.generate_char_matrix()
>>> print(char_matrix.as_string("nexus"))
#NEXUS
BEGIN TAXA;

    DIMENSIONS NTAX=2;
    TAXLABELS
        EU105474
        EU105475
;
END;
BEGIN CHARACTERS;
    DIMENSIONS  NCHAR=494;
    FORMAT DATATYPE=DNA GAP=- MISSING=? MATCHCHAR=.;
    MATRIX
EU105474    TCTCTTATCA...
EU105475    TCTCTTATCA...
;
END;

As can be seen, by default the taxon labels assigned to the sequences are set to the identifier used to request the sequences. This, and many other aspects of the character matrix generation, including annotation of taxa and sequences, can be customized, as discussed in detail below.

Acquiring Data from GenBank

The GenBankDna, GenBankRna, and GenBankProtein classes provide for the downloading and management of DNA, RNA, and protein (AA) sequences from GenBank. The first two of these query the “nucleotide” or “nuccore” database, while the last queries the “protein” database. The constructors of these classes accept the following arguments:

ids

A list of accession identifiers or GI numbers of the records to be downloaded. E.g. “ids=['EU105474', 'EU105475']”, “ids=['158930545', 'EU105475']”, or “ids=['158930545', '158930546']”. If “prefix” is specified, this string will be prepended to all values in the list.

id_range
A tuple of integers that specify the first and last values (inclusive) of accession or GI numbers of the records to be downloaded. If “prefix” is specified, this string will be prepended to all numbers in this range. Thus specifying “id_range=(158930545, 158930550)” is exactly equivalent to specifying “ids=[158930545, 158930546, 158930547, 158930548, 158930549, 158930550]”, while specifying “id_range=(105474, 105479), prefix="EU"” is exactly equivalent to specifying “ids=["EU105474", "EU105475", "EU105476", "EU105477", "EU105478", "EU105479"]”.

prefix
This string will be prepended to all values resulting from the “ids” and “id_range” arguments.

verify
By default, the results of the download are checked to make sure there is a one-to-one correspondence between requested ids and retrieved records. Setting “verify=False” skips this checking.

So, for example, the following are all different ways of instantiating a GenBank resource data store:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(ids=['EU105474', 'EU105475'])
>>> gb_dna = genbank.GenBankDna(ids=['158930545', 'EU105475'])
>>> gb_dna = genbank.GenBankDna(ids=['158930545', '158930546'])
>>> gb_dna = genbank.GenBankDna(ids=['105474', '105475'], prefix="EU")
>>> gb_dna = genbank.GenBankDna(id_range=(105474, 105478), prefix="EU")
>>> gb_dna = genbank.GenBankDna(id_range=(158930545, 158930546))
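
The way “ids”, “id_range”, and “prefix” combine can be pictured with a few lines of plain Python. The sketch below is not DendroPy's implementation: `expand_ids` is a hypothetical helper that only reproduces the equivalences described above, it returns every identifier as a string, and it makes no network requests.

```python
def expand_ids(ids=None, id_range=None, prefix=None):
    """Return the list of identifier strings implied by the arguments.

    Hypothetical illustration only; not part of DendroPy.
    """
    prefix = prefix or ""
    expanded = []
    if ids is not None:
        # Each explicit identifier gets the prefix prepended.
        expanded.extend(prefix + str(i) for i in ids)
    if id_range is not None:
        first, last = id_range
        # The range includes both the first and the last value.
        expanded.extend(prefix + str(n) for n in range(first, last + 1))
    return expanded


print(expand_ids(ids=["105474", "105475"], prefix="EU"))
print(expand_ids(id_range=(105474, 105479), prefix="EU"))
```

The second call lists the six identifiers from EU105474 through EU105479, matching the equivalence given for “id_range” above.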

You can add more records to an existing GenBankDna, GenBankRna, or GenBankProtein object by using the “acquire” or “acquire_range” methods. The “acquire” method takes a sequence of accession identifiers or GI numbers as its first argument (“ids”). An optional string prefix to be prepended can be supplied using the second argument, “prefix”, and verification can be disabled by specifying False for the third argument, “verify”. The “acquire_range” method takes two mandatory integer arguments: the first and last values of the range of accession or GI numbers of the records to be downloaded. As with “acquire”, a string prefix can optionally be supplied using the “prefix” argument, and verification can be disabled by specifying “verify=False”. For example:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(['EU105474', 'EU105475'])
>>> print(len(gb_dna))
2
>>> gb_dna.acquire([158930547, 158930548])
>>> print(len(gb_dna))
4
>>> gb_dna.acquire_range(105479, 105480, prefix="EU")
>>> print(len(gb_dna))
6
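
The one-to-one check that “verify” controls can be illustrated without touching the network. The following is a minimal sketch under stated assumptions, not the check DendroPy actually performs: `check_one_to_one` is a hypothetical function, and the records are plain dictionaries whose keys are borrowed from the record attributes “gi”, “primary_accession”, and “accession_version”.

```python
def check_one_to_one(requested_ids, records):
    """Raise ValueError unless requested ids and records pair up one-to-one.

    Hypothetical illustration only; not part of DendroPy.
    """
    matched = set()
    for requested in requested_ids:
        requested = str(requested)
        # Indexes of all records that this identifier could refer to.
        hits = [
            idx for idx, rec in enumerate(records)
            if requested in (rec["gi"], rec["primary_accession"], rec["accession_version"])
        ]
        if len(hits) != 1:
            raise ValueError("{}: expected 1 record, found {}".format(requested, len(hits)))
        if hits[0] in matched:
            raise ValueError("{}: record requested more than once".format(requested))
        matched.add(hits[0])
    if len(matched) != len(records):
        raise ValueError("retrieved records that were not requested")


# Stand-in data for the two records used throughout this page.
records = [
    {"gi": "158930545", "primary_accession": "EU105474", "accession_version": "EU105474.1"},
    {"gi": "158930546", "primary_accession": "EU105475", "accession_version": "EU105475.1"},
]
check_one_to_one([158930545, "EU105475"], records)  # passes silently
```

A request for an identifier with no matching record, or two requests that resolve to the same record, would raise an error in this sketch.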

Accessing GenBank Records

The GenBank records accumulated in GenBankDna, GenBankRna, and GenBankProtein objects are represented by collections of GenBankAccessionRecord objects. Each of these GenBankAccessionRecord objects represents the full information from the GenBank source as a rich Python object.

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(['EU105474', 'EU105475'])
>>> for gb_rec in gb_dna:
...    print(gb_rec.gi)
...    print(gb_rec.locus)
...    print(gb_rec.length)
...    print(gb_rec.moltype)
...    print(gb_rec.topology)
...    print(gb_rec.strandedness)
...    print(gb_rec.division)
...    print(gb_rec.update_date)
...    print(gb_rec.create_date)
...    print(gb_rec.definition)
...    print(gb_rec.primary_accession)
...    print(gb_rec.accession_version)
...    print("(other seq ids)")
...    for osi_key, osi_value in gb_rec.other_seq_ids.items():
...        print("    ", osi_key, osi_value)
...    print(gb_rec.source)
...    print(gb_rec.organism)
...    print(gb_rec.taxonomy)
...    print("(references)")
...    for ref in gb_rec.references:
...        print("    ", ref.number , ref.position , ref.authors , ref.consrtm , ref.title , ref.journal , ref.medline_id , ref.pubmed_id , ref.remark)
...    print("(feature_table)")
...    for feature in gb_rec.feature_table:
...        print("    ", feature.key, feature.location)
...        for qualifier in feature.qualifiers:
...            print("        ", qualifier.name, qualifier.value)
...
158930545
EU105474
494
DNA
linear
double
PRI
27-NOV-2007
27-NOV-2007
Homo sapiens Ache non-coding region T864 genomic sequence
EU105474
EU105474.1
(other seq ids)
    gb EU105474.1
    gi 158930545
Homo sapiens (human)
Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Eutel...
(references)
    1 1..494 [] None Statistical evaluation of alternativ...
    2 1..494 [] None Direct Submission Submitted (17-AUG-...
(feature_table)
    source 1..494
        organism Homo sapiens
        mol_type genomic DNA
        db_xref taxon:9606
        chromosome 18
        note Ache
    misc_feature 1..494
        note non-coding region T864
.
.
.
(etc.)
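
The nested loop over features and qualifiers shown above can be wrapped in a small lookup helper. This is a sketch only: `qualifier_values` is a hypothetical function, and the record is a stand-in built from `types.SimpleNamespace` that mirrors the attribute names used in the loop (“feature_table”, “key”, “qualifiers”, “name”, “value”) so that it runs without a download.

```python
from types import SimpleNamespace


def qualifier_values(record, feature_key, qualifier_name):
    """Collect the values of matching qualifiers on matching features.

    Hypothetical illustration only; not part of DendroPy.
    """
    values = []
    for feature in record.feature_table:
        if feature.key != feature_key:
            continue
        for qualifier in feature.qualifiers:
            if qualifier.name == qualifier_name:
                values.append(qualifier.value)
    return values


# Stand-in mirroring part of the feature table printed above for EU105474.
record = SimpleNamespace(feature_table=[
    SimpleNamespace(key="source", location="1..494", qualifiers=[
        SimpleNamespace(name="organism", value="Homo sapiens"),
        SimpleNamespace(name="note", value="Ache"),
    ]),
    SimpleNamespace(key="misc_feature", location="1..494", qualifiers=[
        SimpleNamespace(name="note", value="non-coding region T864"),
    ]),
])

print(qualifier_values(record, "source", "note"))
```

With a real GenBankAccessionRecord, the same loop should apply unchanged as long as the attribute names match those printed in the example above.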

Generating Character Matrix Objects from GenBank Data

The “generate_char_matrix()” method of GenBankDna, GenBankRna, and GenBankProtein objects creates and returns a CharacterMatrix object of the appropriate type out of the data collected in them. When called without any arguments, it generates a new TaxonNamespace block, creating one new Taxon object for every sequence in the collection with a label corresponding to the identifier used to request the sequence:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
>>> char_matrix = gb_dna.generate_char_matrix()
>>> print(char_matrix.as_string("nexus"))
#NEXUS

BEGIN TAXA;

    DIMENSIONS NTAX=2;
    TAXLABELS
        158930545
        EU105475
;
END;

BEGIN CHARACTERS;
    DIMENSIONS  NCHAR=494;
    FORMAT DATATYPE=DNA GAP=- MISSING=? MATCHCHAR=.;
    MATRIX
158930545    TCTCTTATCAAACTA...
EU105475     TCTCTTATCAAACTA...
    ;
END;


BEGIN SETS;
END;

Customizing/Controlling Sequence Taxa

The taxon assignment can be controlled in one of two ways:

Using the “label_components” and optionally the “label_component_separator” arguments.

Specifying a custom function using the “gb_to_taxon_fn” argument that takes two arguments, a GenBankAccessionRecord object and a TaxonNamespace argument, and returns the Taxon object to be assigned to the sequence; this approach requires specification of a TaxonNamespace object passed using the “taxon_namespace” argument.

Changed in version 4.4.0: Previous versions took just the first argument, the GenBankAccessionRecord object, which meant client code had to independently keep track of and use the correct TaxonNamespace object to create Taxon objects. The function now has direct access to it.

Specifying a Custom Label for Sequence Taxa

The “label_components” and the “label_component_separator” arguments allow for customization of the taxon labels of the Taxon objects created for each sequence. The “label_components” argument should be assigned an ordered container (e.g., a list) of strings that correspond to attributes of objects of the GenBankAccessionRecord class. The values of these attributes will be concatenated to compose the Taxon object label. By default, the components will be separated by spaces, but you can override this by passing the string to be used by the “label_component_separator” argument. For example:

>>> from dendropy.interop import genbank
>>> gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
>>> char_matrix = gb_dna.generate_char_matrix(
... label_components=["accession", "organism", ],
... label_component_separator="_")
>>> print([t.label for t in char_matrix.taxon_namespace])
['EU105474_Homo_sapiens', 'EU105475_Homo_sapiens']
>>> char_matrix = gb_dna.generate_char_matrix(
... label_components=["organism", "moltype", "gi"],
... label_component_separator=".")
>>> print([t.label for t in char_matrix.taxon_namespace])
['Homo.sapiens.DNA.158930545', 'Homo.sapiens.DNA.158930546']
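
How the label is assembled can be sketched in plain Python. This is not DendroPy's code: `compose_label` is a hypothetical helper, the record is a `types.SimpleNamespace` stand-in, and the replacement of spaces inside a component by the separator is an assumption inferred from the output shown above (“Homo sapiens” appearing as “Homo.sapiens”).

```python
from types import SimpleNamespace


def compose_label(record, label_components, label_component_separator=" "):
    """Join the named record attributes into one label.

    Hypothetical illustration only; not part of DendroPy.
    """
    parts = []
    for name in label_components:
        value = str(getattr(record, name))
        # Assumption inferred from the output above: spaces inside a
        # component are also replaced by the separator.
        parts.append(value.replace(" ", label_component_separator))
    return label_component_separator.join(parts)


# Stand-in carrying three attributes of the first record.
record = SimpleNamespace(organism="Homo sapiens", moltype="DNA", gi="158930545")
print(compose_label(record, ["organism", "moltype", "gi"], "."))
```

With the default separator of a single space, the replacement step leaves each component unchanged.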

Specifying a Custom Taxon-Discovery Function

Full control over the Taxon object assignment process is given by using the “gb_to_taxon_fn” argument. This should be used to specify a function that takes two arguments, a GenBankAccessionRecord object and a TaxonNamespace object, and returns the Taxon object to be assigned to the sequence. A TaxonNamespace object must also be passed using the “taxon_namespace” argument, so that it can be assigned to the CharacterMatrix object.

A simple example that illustrates the usage of the “gb_to_taxon_fn” argument by creating a custom label:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank

def gb_to_taxon(gb, taxon_namespace):
    locality = gb.feature_table.find("source").qualifiers.find("note").value
    label = "GI" + gb.gi + "." + locality
    taxon = taxon_namespace.require_taxon(label=label)
    return taxon

taxon_namespace = dendropy.TaxonNamespace()

gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(
    taxon_namespace=taxon_namespace,
    gb_to_taxon_fn=gb_to_taxon)
print([t.label for t in char_matrix.taxon_namespace])

which results in:

['GI158930545.Ache', 'GI158930546.Arara']

A more complex case is one where you already have a TaxonNamespace with existing Taxon objects that you want to associate with the sequences. The following illustrates how to do this:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank

tree = dendropy.Tree.get_from_string(
    "(Ache, (Arara, (Bribri, (Guatuso, Guaymi))))",
    "newick")
def gb_to_taxon(gb, taxon_namespace):
    locality = gb.feature_table.find("source").qualifiers.find("note").value
    taxon = tree.taxon_namespace.get_taxon(label=locality)
    assert taxon is not None
    return taxon

gb_ids = [158930545, 158930546, 158930547, 158930548, 158930549]

gb_dna = genbank.GenBankDna(ids=gb_ids)
char_matrix = gb_dna.generate_char_matrix(
    taxon_namespace=tree.taxon_namespace,
    gb_to_taxon_fn=gb_to_taxon)
print([t.label for t in char_matrix.taxon_namespace])
print(tree.taxon_namespace is char_matrix.taxon_namespace)
for taxon in tree.taxon_namespace:
    print("{}: {}".format(
        taxon.label,
        char_matrix[taxon].symbols_as_string()[:10]))

which results in:

['Ache', 'Arara', 'Bribri', 'Guatuso', 'Guaymi']
True
Ache: TCTCTTATCA
Arara: TCTCTTATCA
Bribri: TCTCTTATCA
Guatuso: TCTCTTATCA
Guaymi: TCTCTTATCA

The important thing to note here is that the Taxon objects in the DnaCharacterMatrix do not just have the same labels as the Taxon objects in the Tree, “tree”, but actually are the same objects (i.e., they reference the same operational taxonomic units within DendroPy).
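
The difference between sharing labels and sharing objects can be shown with a toy model. The `Taxon` and `Namespace` classes below are deliberately minimal stand-ins, not DendroPy's classes. They assume only the behaviour the scripts above rely on: “require_taxon” returns the existing object for a label or creates one, and “get_taxon” returns None when the label is absent.

```python
class Taxon:
    """Stand-in for a taxon: just a label."""

    def __init__(self, label):
        self.label = label


class Namespace:
    """Stand-in namespace holding one Taxon object per label."""

    def __init__(self):
        self._taxa = {}

    def require_taxon(self, label):
        # Get the existing object, or create and register a new one.
        if label not in self._taxa:
            self._taxa[label] = Taxon(label)
        return self._taxa[label]

    def get_taxon(self, label):
        # Lookup only; None signals that the label is unknown.
        return self._taxa.get(label)


shared = Namespace()
tree_taxon = shared.require_taxon("Ache")    # created for the tree
matrix_taxon = shared.get_taxon("Ache")      # looked up for the matrix
separate = Namespace().require_taxon("Ache")  # same label, other namespace

print(tree_taxon is matrix_taxon)
print(tree_taxon is separate)
print(tree_taxon.label == separate.label)
```

Only the two objects drawn from the shared namespace are identical; the third merely carries an equal label.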

Adding the GenBank Record as an Attribute

It is sometimes useful to maintain a handle on the original GenBank record in the CharacterMatrix resulting from “generate_char_matrix()”. The “set_taxon_attr” and “set_seq_attr” arguments of the “generate_char_matrix()” method allow you to do this. The values supplied to these arguments should be strings that specify the name of the attribute that will be created on the Taxon or CharacterDataSequence objects, respectively. The value of this attribute will be the GenBankAccessionRecord that underlies the Taxon or CharacterDataSequence object. For example:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank
gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(set_taxon_attr="gb_rec")
for taxon in char_matrix.taxon_namespace:
    print("Data for taxon '{}' is based on GenBank record: {}".format(
        taxon.label,
        taxon.gb_rec.definition))

will result in:

Data for taxon '158930545' is based on GenBank record: Homo sapiens Ache non-coding region T864 genomic sequence
Data for taxon 'EU105475' is based on GenBank record: Homo sapiens Arara non-coding region T864 genomic sequence

Alternatively, the following:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank
gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(set_seq_attr="gb_rec")
for sidx, sequence in enumerate(char_matrix.vectors()):
    print("Sequence {} ('{}') is based on GenBank record: {}".format(
        sidx+1,
        char_matrix.taxon_namespace[sidx].label,
        sequence.gb_rec.defline))

will result in:

Sequence 1 ('158930545') is based on GenBank record: gi|158930545|gb|EU105474.1| Homo sapiens Ache non-coding region T864 genomic sequence
Sequence 2 ('EU105475') is based on GenBank record: gi|158930546|gb|EU105475.1| Homo sapiens Arara non-coding region T864 genomic sequence

Annotating with GenBank Data and Metadata

To persist the information in the GenBankAccessionRecord object through serialization and deserialization, you can request that this information be added as an Annotation (see “Working with Metadata Annotations”) to the corresponding Taxon or CharacterDataSequence object.

Reference Annotation

Specifying “add_ref_annotation_to_taxa=True” will result in a reference-style metadata annotation added to the Taxon object, while specifying “add_ref_annotation_to_seqs=True” will result in a reference-style metadata annotation added to the sequence. The reference-style annotation is a brief, single annotation that points to the URL of the original record. As with metadata annotations in general, you really need to be using the NeXML format for full functionality.

So, for example:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank
gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(add_ref_annotation_to_taxa=True)
print(char_matrix.as_string("nexml"))

will result in:

<?xml version="1.0" encoding="ISO-8859-1"?>
<nex:nexml
    version="0.9"
    xsi:schemaLocation="http://www.nexml.org/2009 ../xsd/nexml.xsd"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>
    <otus id="d4320533416">
        <otu id="d4323884688" label="158930545">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/158930545" id="d4323884752" />
        </otu>
        <otu id="d4323884816" label="EU105475">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/EU105475" id="d4323990736" />
        </otu>
    </otus>
    .
    .
    .
</nex:nexml>

Alternatively:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank
gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(add_ref_annotation_to_seqs=True)
print(char_matrix.as_string("nexml"))

will result in:

<?xml version="1.0" encoding="ISO-8859-1"?>
<nex:nexml
    version="0.9"
    xsi:schemaLocation="http://www.nexml.org/2009 ../xsd/nexml.xsd"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>
        <matrix>
            <row id="d4320533856" otu="d4322811536">
                <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/158930545" id="d4322811600" />
                <seq>TCTCTTATCAAAC...</seq>
            </row>
            <row id="d4320534384" otu="d4322811664">
                <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/EU105475" id="d4322917584" />
                <seq>TCTCTTATCAAAC...</seq>
            </row>
        </matrix>
    </characters>
</nex:nexml>
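
Once written out as NeXML, the reference annotations can be read back with any XML parser. The sketch below uses only the Python standard library and a hand-trimmed sample modelled on the taxon-annotated output shown earlier in this section (element ids shortened). `source_links` is a hypothetical helper, not part of DendroPy.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed sample modelled on the taxon-annotated output above.
SAMPLE = """\
<nex:nexml version="0.9"
    xmlns="http://www.nexml.org/2009"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:dcterms="http://purl.org/dc/terms/">
    <otus id="d1">
        <otu id="d2" label="158930545">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/158930545" id="d3" />
        </otu>
        <otu id="d4" label="EU105475">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/EU105475" id="d5" />
        </otu>
    </otus>
</nex:nexml>
"""

NS = {"nex": "http://www.nexml.org/2009"}


def source_links(nexml_text):
    """Map each otu label to the href of its dcterms:source annotation."""
    root = ET.fromstring(nexml_text)
    links = {}
    for otu in root.iterfind(".//nex:otu", NS):
        for meta in otu.iterfind("nex:meta", NS):
            if meta.get("rel") == "dcterms:source":
                links[otu.get("label")] = meta.get("href")
    return links


for label, href in source_links(SAMPLE).items():
    print(label, href)
```

The same approach should work for sequence-level annotations by searching for the “row” elements instead of “otu”.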

Full Annotation

Specifying “add_full_annotation_to_taxa=True” or “add_full_annotation_to_seqs=True” will result in the entire GenBank record being added as a set of annotations to the Taxon or CharacterDataSequence object, respectively.

For example:

#! /usr/bin/env python

import dendropy
from dendropy.interop import genbank
gb_dna = genbank.GenBankDna(ids=[158930545, 'EU105475'])
char_matrix = gb_dna.generate_char_matrix(add_full_annotation_to_taxa=True)
print(char_matrix.as_string("nexml"))

will result in the following:

<?xml version="1.0" encoding="ISO-8859-1"?>
<nex:nexml
    version="0.9"
    xsi:schemaLocation="http://www.nexml.org/2009 ../xsd/nexml.xsd"
    xmlns:genbank="http://www.ncbi.nlm.nih.gov/dtd/INSD_INSDSeq.mod.dtd"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns="http://www.nexml.org/2009"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xml="http://www.w3.org/XML/1998/namespace"
    xmlns:nex="http://www.nexml.org/2009"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>
    <otus id="d4320533416">
        <otu id="d4323884688" label="158930545">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/158930545" id="d4323884752" >
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_locus" content="EU105474" id="d4323884880" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_length" content="494" id="d4323884944" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_moltype" content="DNA" id="d4323885008" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_topology" content="linear" id="d4323901520" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_strandedness" content="double" id="d4323901584" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_division" content="PRI" id="d4323901648" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_update-date" content="27-NOV-2007" id="d4323901712" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_create-date" content="27-NOV-2007" id="d4323901776" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_definition" content="Homo sapiens Ache non-coding region T864 genomic sequence" id="d4323901840" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_primary-accesison" content="EU105474" id="d4323901904" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_accession-version" content="EU105474.1" id="d4323901968" />
                <meta xsi:type="nex:ResourceMeta" rel="genbank:otherSeqIds" id="d4323902032" >
                    <meta xsi:type="nex:LiteralMeta" property="genbank:gb" content="EU105474.1" id="d4323902160" />
                    <meta xsi:type="nex:LiteralMeta" property="genbank:gi" content="158930545" id="d4323902224" />
                </meta>
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_source" content="Homo sapiens (human)" id="d4323902096" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_organism" content="Homo sapiens" id="d4323902288" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_taxonomy" content="Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae; Homo" id="d4323902352" />
                <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDSeq_references" id="d4323902416" >
                    <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDReference_reference" id="d4323902544" >
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_reference" content="1" id="d4323902672" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_position" content="1..494" id="d4323902736" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_title" content="Statistical evaluation of alternative models of human evolution" id="d4323902800" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_journal" content="Proc. Natl. Acad. Sci. U.S.A. 104 (45), 17614-17619 (2007)" id="d4323902864" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_pubmed" content="17978179" id="d4323902928" />
                    </meta>
                    <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDReference_reference" id="d4323902608" >
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_reference" content="2" id="d4323903056" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_position" content="1..494" id="d4323903120" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_title" content="Direct Submission" id="d4323903184" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDReference_journal" content="Submitted (17-AUG-2007) Laboratorio de Biologia Genomica e Molecular, Pontificia Universidade Catolica do Rio Grande do Sul, Av Ipiranga 6681, Predio 12C, Sala 172, Porto Alegre, RS 90619-900, Brazil" id="d4323903248" />
                    </meta>
                </meta>
                <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDSeq_feature-table" id="d4323902480" >
                    <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDSeq_feature" id="d4323903312" >
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDFeature_key" content="source" id="d4323903440" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDFeature_location" content="1..494" id="d4323903504" />
                        <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDFeature_quals" id="d4323903376" >
                            <meta xsi:type="nex:LiteralMeta" property="genbank:organism" content="Homo sapiens" id="d4323903632" />
                            <meta xsi:type="nex:LiteralMeta" property="genbank:mol_type" content="genomic DNA" id="d4323903696" />
                            <meta xsi:type="nex:LiteralMeta" property="genbank:db_xref" content="taxon:9606" id="d4323903760" />
                            <meta xsi:type="nex:LiteralMeta" property="genbank:chromosome" content="18" id="d4323903824" />
                            <meta xsi:type="nex:LiteralMeta" property="genbank:note" content="Ache" id="d4323903888" />
                        </meta>
                    </meta>
                    <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDSeq_feature" id="d4323903568" >
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDFeature_key" content="misc_feature" id="d4323904016" />
                        <meta xsi:type="nex:LiteralMeta" property="genbank:INSDFeature_location" content="1..494" id="d4323904080" />
                        <meta xsi:type="nex:ResourceMeta" rel="genbank:INSDFeature_quals" id="d4323903952" >
                            <meta xsi:type="nex:LiteralMeta" property="genbank:note" content="non-coding region T864" id="d4323904208" />
                        </meta>
                    </meta>
                </meta>
            </meta>
        </otu>
        <otu id="d4323884816" label="EU105475">
            <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/EU105475" id="d4324005904" >
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_locus" content="EU105475" id="d4324006032" />
                <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_length" content="494" id="d4324006096" />
                .
                .
                .
                (etc.)
            </meta>
        </otu>
    </otus>
    .
    .
    .
    (etc.)
</nex:nexml>
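
Because the full annotation is nested, reading it back calls for a recursive walk. The following sketch, again using only the standard library, flattens a heavily trimmed single-taxon sample modelled on the output above into (path, content) pairs. `flatten_meta` is a hypothetical helper, and the assumption that an annotation with a “property” attribute is a leaf while one with “rel” is a container is drawn from the structure of that output.

```python
import xml.etree.ElementTree as ET

NEXML_NS = "{http://www.nexml.org/2009}"

# Heavily trimmed single-taxon sample modelled on the output above.
SAMPLE = """\
<otu xmlns="http://www.nexml.org/2009"
     xmlns:nex="http://www.nexml.org/2009"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     id="d1" label="158930545">
  <meta xsi:type="nex:ResourceMeta" rel="dcterms:source" href="http://www.ncbi.nlm.nih.gov/nucleotide/158930545" id="d2">
    <meta xsi:type="nex:LiteralMeta" property="genbank:INSDSeq_locus" content="EU105474" id="d3" />
    <meta xsi:type="nex:ResourceMeta" rel="genbank:otherSeqIds" id="d4">
      <meta xsi:type="nex:LiteralMeta" property="genbank:gi" content="158930545" id="d5" />
    </meta>
  </meta>
</otu>
"""


def flatten_meta(element, path=()):
    """Yield (path, content) for every literal annotation under element."""
    for meta in element.findall(NEXML_NS + "meta"):
        if meta.get("property") is not None:
            # Leaf: a literal value.
            yield path + (meta.get("property"),), meta.get("content")
        else:
            # Container: descend, recording its "rel" in the path.
            yield from flatten_meta(meta, path + (meta.get("rel"),))


for path, content in flatten_meta(ET.fromstring(SAMPLE)):
    print(" / ".join(path), "=", content)
```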
