Taxonomy — Graphs
Taxonomy tree graphs
taxon_subtree materialises a rooted taxonomic subtree as an explicit directed graph — a TaxonomyTree wrapping a Graphs.jl SimpleDiGraph. Where child_taxa and parent_taxa return flat name lists, taxon_subtree preserves the full parent–child structure and enables graph algorithms, tree traversals, and subtree visualisation.
using PaleobiologyDB, PaleobiologyDB.Taxonomy
import Graphs
Building a subtree
# Full descendant subtree of Carnivora (every rank, every node)
tree = taxon_subtree("Carnivora")
Graphs.nv(tree.graph) # → total number of taxa
Graphs.ne(tree.graph) # → total number of parent → child edges
# The root node
root_taxon(tree).name # → "Carnivora"
root_taxon(tree).rank # → "order"
Truncating at a leaf rank
Pass leaf_rank to stop the traversal at a given rank. By default (strict_leaf_rank = true), only nodes at exactly leaf_rank become leaves; any taxa at finer ranks that are direct children of a coarser node (a PBDB data pattern common in less well-resolved groups) are excluded. The intermediate ranks between the root and leaf_rank are retained as interior nodes.
# Carnivora subtree truncated at family level (strict default)
# interior nodes: order, suborder, …, superfamily
# leaf nodes: family-rank taxa only; orphaned genera/species excluded
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
Graphs.nv(tree.graph) # order + suborders + superfamilies + families
Graphs.ne(tree.graph) # one edge per parent → child pair
# leaf_taxa returns exclusively family-rank nodes
all(n.rank == "family" for n in leaf_taxa(tree)) # → true
# Genus-level subtree of Canidae (strict default)
# interior: family, subfamily, tribe, …
# leaves: genus-rank taxa only
t2 = taxon_subtree("Canidae"; leaf_rank = "genus")
# The leaf nodes are exactly the genera
leaf_taxa(t2) .|> (n -> n.name)
# → ["Borophagus", "Canis", "Urocyon", "Vulpes", …]
# Non-strict: also include orphaned finer-ranked taxa as leaves
t3 = taxon_subtree("Pterosauria"; leaf_rank = "family", strict_leaf_rank = false)
# Without leaf_rank → full descendant tree to finest available rank
t4 = taxon_subtree("Canis"; leaf_rank = nothing)
leaf_taxa(t4) |> length # number of species under Canis
Accessor functions
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
# Root node
r = root_taxon(tree)
r.name # → "Carnivora"
r.rank # → "order"
r.pbdb_id # → PBDB orig_no
# Leaf nodes (sorted by name)
leaves = leaf_taxa(tree)
leaves .|> (n -> n.name) # → ["Ailuridae", "Amphicyonidae", "Canidae", …]
# All nodes at a specific rank (sorted by name)
taxa_at_rank(tree, "family") # same as leaf_taxa when leaf_rank = "family"
# In a full (untruncated) tree, taxa_at_rank selects any rank
full_tree = taxon_subtree("Carnivora")
taxa_at_rank(full_tree, "genus") |> length # number of genera
taxa_at_rank(full_tree, "species") |> length # number of species
TaxonNode fields
Every node in the tree is a TaxonNode carrying the full set of fields from the PBDB taxa list snapshot:
node = root_taxon(tree)
node.name # accepted taxon name (String)
node.rank # rank string (String, e.g. "order")
node.pbdb_id # PBDB orig_no (Int)
node.accepted_id # PBDB accepted_no (Union{Int,Missing})
# == pbdb_id for accepted (non-synonym) taxa
# != pbdb_id for synonyms — points to the valid name's orig_no
# missing when not recorded in the snapshot
node.parent_id # parent orig_no, or missing for the subtree root (Union{Int,Missing})
Using the graph with Graphs.jl
The .graph field is a standard Graphs.SimpleDiGraph{Int}, so any algorithm from Graphs.jl that accepts an AbstractGraph works directly:
import Graphs
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
g = tree.graph
# Basic metrics
Graphs.nv(g) # number of vertices
Graphs.ne(g) # number of edges
Graphs.is_directed(g) # → true (edges run parent → child)
# Neighbours of the root
Graphs.outneighbors(g, tree.root) # vertex indices of root's children
Graphs.inneighbors(g, tree.root) # → Int[] (root has no parent in subtree)
# Traverse from any vertex
v = tree.vertex_of[tree.taxa[1].pbdb_id] # vertex for a node by pbdb_id
Connecting the tree to an occurrence DataFrame
A common workflow is to build a clade tree and then map occurrences onto it to see which sub-groups are represented in a dataset:
using PaleobiologyDB, PaleobiologyDB.Taxonomy
import Graphs
# Fetch occurrences and build the family tree
df = pbdb_occurrences(base_name = "Carnivora", interval = "Miocene", show = "full")
tree = taxon_subtree("Carnivora"; leaf_rank = "family")
# Which families appear in the occurrence data?
occ_families = Set(skipmissing(df.family))
sampled = [n for n in leaf_taxa(tree) if n.name in occ_families]
missing_ = [n for n in leaf_taxa(tree) if n.name ∉ occ_families]
println("$(length(sampled)) of $(length(leaf_taxa(tree))) families sampled in the Miocene")
PaleobiologyDB.Taxonomy.TaxonNode — Type
TaxonNode
A single node in a taxonomic tree, carrying the full set of identity fields from the PBDB taxa list snapshot.
Fields
name::String — accepted taxon name (as in PBDB)
rank::String — taxonomic rank (e.g. "genus", "family")
pbdb_id::Int — PBDB orig_no integer identifier
accepted_id::Union{Int,Missing} — PBDB accepted_no; equals pbdb_id for accepted (non-synonym) taxa, points to the valid name for synonyms, missing when not recorded in the snapshot
parent_id::Union{Int,Missing} — orig_no of the parent node, or missing when this node is the root of the subtree
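The accepted_id rule above can be turned into a small synonym check. The following is only a sketch: is_synonym is a hypothetical helper, not part of the package, and the records are invented named tuples standing in for TaxonNode values.

```julia
# Hypothetical helper: a node is a synonym when accepted_id is recorded
# and differs from its own pbdb_id.
is_synonym(node) = !ismissing(node.accepted_id) && node.accepted_id != node.pbdb_id

# Invented records with the same field names as TaxonNode.
accepted_name = (name = "NameA", pbdb_id = 10, accepted_id = 10)
synonym_name  = (name = "NameB", pbdb_id = 11, accepted_id = 10)
unrecorded    = (name = "NameC", pbdb_id = 12, accepted_id = missing)

is_synonym(accepted_name)  # false
is_synonym(synonym_name)   # true
is_synonym(unrecorded)     # false (status unknown, so not flagged)
```

Treating a missing accepted_id as "not a synonym" is a design choice of this sketch; callers who need to distinguish the unknown case should test ismissing separately.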
Construction
TaxonNode("Canis", "genus", 41045, 41045, 2)
# argument order: name, rank, pbdb_id, accepted_id, parent_id
See also TaxonomyTree, taxon_subtree.
PaleobiologyDB.Taxonomy.TaxonomyTree — Type
TaxonomyTree
A rooted, directed tree representing a taxonomic subtree extracted from the PBDB taxa list snapshot.
Fields
graph::Graphs.SimpleDiGraph{Int} — directed graph; edges run parent → child. Vertices are integers in 1 .. Graphs.nv(graph).
taxa::Vector{TaxonNode} — taxa[v] is the TaxonNode for vertex v.
vertex_of::Dict{Int,Int} — maps PBDB orig_no → vertex index, allowing O(1) lookup by numeric PBDB identifier.
root::Int — vertex index of the root node (always 1).
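To make the relationship between these fields concrete, here is a minimal sketch that rebuilds the same layout from a flat node list. ToyNode, ToyTree, and build_toy_tree are hypothetical stand-ins, the identifiers are invented, and a plain children vector replaces the Graphs.jl digraph so the example needs no packages. It is not the package's actual construction code.

```julia
# Hypothetical stand-ins for TaxonNode / TaxonomyTree (illustration only).
struct ToyNode
    name::String
    rank::String
    pbdb_id::Int
    parent_id::Union{Int,Missing}
end

struct ToyTree
    children::Vector{Vector{Int}}  # children[v] = child vertices of v (stands in for the digraph)
    taxa::Vector{ToyNode}          # taxa[v] is the node stored at vertex v
    vertex_of::Dict{Int,Int}       # pbdb_id => vertex index
    root::Int
end

# Assumes nodes[1] is the root and every parent_id refers to a node in the list.
function build_toy_tree(nodes::Vector{ToyNode})
    vertex_of = Dict(n.pbdb_id => v for (v, n) in enumerate(nodes))
    children = [Int[] for _ in nodes]
    for (v, n) in enumerate(nodes)
        ismissing(n.parent_id) && continue          # the root has no parent
        push!(children[vertex_of[n.parent_id]], v)  # edge parent → child
    end
    return ToyTree(children, nodes, vertex_of, 1)
end

# Invented identifiers; these are not real PBDB orig_no values.
toy_nodes = [
    ToyNode("Canidae", "family", 100, missing),
    ToyNode("Canis", "genus", 101, 100),
    ToyNode("Vulpes", "genus", 102, 100),
]
toy = build_toy_tree(toy_nodes)

v = toy.vertex_of[102]   # O(1) lookup by identifier
toy.taxa[v].name         # "Vulpes"
```

The invariant worth noticing is that vertex_of and taxa are inverse lookups: vertex_of[taxa[v].pbdb_id] == v for every vertex v.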
Working with Graphs.jl
TaxonomyTree wraps a standard Graphs.SimpleDiGraph, so any function from Graphs.jl that accepts an AbstractGraph works directly on tree.graph:
using Graphs
t = taxon_subtree("Carnivora"; leaf_rank = "family")
Graphs.nv(t.graph) # number of nodes
Graphs.ne(t.graph) # number of edges
Graphs.outneighbors(t.graph, t.root) # vertex indices of root's children
Graphs.is_tree(t.graph) # always true for a valid subtree
See also taxon_subtree, TaxonNode.
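As a further sketch of what the wrapped graph allows, the example below computes vertex depths and a leaf-to-root path with Graphs.jl. The five-vertex digraph is a hand-built stand-in for tree.graph (root at vertex 1, edges parent → child), and path_to_root is a hypothetical helper, not a package function.

```julia
import Graphs

# Toy stand-in for `tree.graph`: vertex 1 is the root, edges run parent → child.
g = Graphs.SimpleDiGraph(5)
Graphs.add_edge!(g, 1, 2)
Graphs.add_edge!(g, 1, 3)
Graphs.add_edge!(g, 2, 4)
Graphs.add_edge!(g, 2, 5)

# Depth of every vertex below the root (breadth-first distances).
depths = Graphs.gdistances(g, 1)   # [0, 1, 1, 2, 2]

# Hypothetical helper: climb from a vertex to the root. In a tree each
# non-root vertex has exactly one in-neighbour, its parent.
function path_to_root(g, v)
    path = [v]
    while !isempty(Graphs.inneighbors(g, v))
        v = first(Graphs.inneighbors(g, v))
        push!(path, v)
    end
    return path
end

path_to_root(g, 5)   # [5, 2, 1]
```

With a real TaxonomyTree, the vertex indices in such a path can be mapped back to names through the taxa field.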
PaleobiologyDB.Taxonomy.taxon_subtree — Function
taxon_subtree(taxon_name; leaf_rank=nothing, strict_leaf_rank=true) -> TaxonomyTree
Build and return a TaxonomyTree rooted at taxon_name, descending through the taxonomic hierarchy down to (and including) leaf_rank.
The tree is derived from the Scratch-cached PBDB taxa list snapshot, so building it makes no per-call network requests. The snapshot itself is downloaded on first use and refreshed automatically when older than 30 days.
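The 30-day refresh rule can be pictured as a file-age check. This is a minimal sketch under the assumption that the snapshot is a single file whose modification time marks the last download; snapshot_is_stale and its keyword are hypothetical, and the package's real caching logic may differ.

```julia
# Hypothetical staleness check: a missing file, or one older than
# `max_age_days`, counts as stale and would trigger a fresh download.
function snapshot_is_stale(path; max_age_days = 30)
    isfile(path) || return true
    age_seconds = time() - mtime(path)
    return age_seconds > max_age_days * 24 * 60 * 60
end

# Demonstration with a freshly created temporary file.
path, io = mktemp()
close(io)
snapshot_is_stale(path)                      # false: just created
snapshot_is_stale(path * ".does-not-exist")  # true: nothing cached yet
```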
Arguments
taxon_name::AbstractString — accepted taxon name exactly as it appears in PBDB (e.g. "Carnivora", "Canidae", "Canis").
leaf_rank::Union{AbstractString,Nothing} (keyword, default nothing) — the rank at which to stop recursing. Must be one of:
"subspecies" "species" "subgenus" "genus" "subtribe" "tribe" "subfamily" "family" "superfamily" "infraorder" "suborder" "order" "superorder" "infraclass" "subclass" "class" "superclass" "subphylum" "phylum" "kingdom"
When nothing (default), the entire descendant subtree is collected. When given, nodes at leaf_rank become leaves of the returned tree; their children in the full PBDB tree are not included. Intermediate ranks between the root rank and leaf_rank are included as interior nodes.
strict_leaf_rank::Bool (keyword, default true) — controls how nodes at ranks finer than leaf_rank are handled.
In real PBDB data, taxa at finer ranks (e.g. a genus or species) are sometimes direct children of coarse-rank nodes (e.g. an order) without an intervening family. When strict_leaf_rank = true (default), such nodes are excluded entirely from the returned tree, so the leaves will all be at exactly leaf_rank (or at an unranked-clade rank if no ranked leaf was reachable). When strict_leaf_rank = false, finer-ranked nodes are included as leaf nodes, preserving all PBDB parent–child edges.
Has no effect when leaf_rank is nothing.
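The difference between the strict and non-strict modes is easiest to see on a tiny hand-made hierarchy. The sketch below is an assumption-laden illustration, not the package's implementation: truncated_names, rank_order, and toy_taxa are hypothetical, every taxon name is invented, and unranked clades are ignored.

```julia
# Ranks ordered from finest to coarsest (a subset of the full list above).
rank_order = ["species", "genus", "family", "order"]

# Hypothetical child table: name => (rank, child names). All names are invented.
# "GenusX" is an orphaned genus attached directly to the order.
toy_taxa = Dict(
    "OrderA"  => ("order",  ["FamilyB", "GenusX"]),
    "FamilyB" => ("family", ["GenusY"]),
    "GenusX"  => ("genus",  String[]),
    "GenusY"  => ("genus",  String[]),
)

function truncated_names(table, root, leaf_rank; strict = true, order = rank_order)
    cutoff = findfirst(==(leaf_rank), order)
    cutoff === nothing && throw(ArgumentError("unknown rank: $leaf_rank"))
    kept = String[]
    function visit(name)
        rank, children = table[name]
        i = findfirst(==(rank), order)
        if i < cutoff                 # finer than leaf_rank: an orphaned taxon
            strict || push!(kept, name)
            return
        end
        push!(kept, name)
        i == cutoff && return         # at leaf_rank: keep it, do not descend
        foreach(visit, children)      # coarser than leaf_rank: interior node
        return
    end
    visit(root)
    return sort(kept)
end

truncated_names(toy_taxa, "OrderA", "family")                  # ["FamilyB", "OrderA"]
truncated_names(toy_taxa, "OrderA", "family"; strict = false)  # ["FamilyB", "GenusX", "OrderA"]
```

In both calls GenusY is dropped because it sits below a family-rank leaf; only the orphaned GenusX changes with the strict flag.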
Returns
A TaxonomyTree rooted at the named taxon. Returns a single-node tree (root only, no edges) when taxon_name is not found in the snapshot.
Throws ArgumentError if leaf_rank is not a valid PBDB rank string.
Examples
using PaleobiologyDB, PaleobiologyDB.Taxonomy
import Graphs
# Full subtree of Carnivora (every descendant at every rank)
t = taxon_subtree("Carnivora")
Graphs.nv(t.graph) # thousands of nodes
root_taxon(t).rank # "order"
# Strict default: leaves are exactly families; orphaned genera excluded
t2 = taxon_subtree("Carnivora"; leaf_rank = "family")
leaf_taxa(t2) .|> (n -> n.name) # ["Ailuridae", "Canidae", "Felidae", …]
all(n.rank == "family" for n in leaf_taxa(t2)) # true
# Non-strict: orphaned genera/species included as leaves
t3 = taxon_subtree("Pterosauria"; leaf_rank = "family", strict_leaf_rank = false)
# taxa at genus or species rank parented directly to order appear as leaves
# Genus-level subtree of Canidae
t4 = taxon_subtree("Canidae"; leaf_rank = "genus")
# Unknown taxon → single-node tree
t5 = taxon_subtree("INVALID")
Graphs.nv(t5.graph) # 1
See also root_taxon, leaf_taxa, taxa_at_rank, child_taxa, TaxonomyTree.
PaleobiologyDB.Taxonomy.root_taxon — Function
root_taxon(tree::TaxonomyTree) -> TaxonNode
Return the root TaxonNode of tree.
Examples
t = taxon_subtree("Carnivora")
root_taxon(t).name # "Carnivora"
root_taxon(t).rank # "order"
See also leaf_taxa, taxa_at_rank.
PaleobiologyDB.Taxonomy.leaf_taxa — Function
leaf_taxa(tree::TaxonomyTree) -> Vector{TaxonNode}
Return all leaf nodes of tree, that is, vertices with no outgoing edges (no children in the subtree), sorted by name.
When taxon_subtree was called with a leaf_rank, these are typically all nodes at exactly leaf_rank. The exception is interior nodes at coarser ranks that have no leaf_rank-level descendants: those nodes are included in the tree but have no children, so they also appear as leaves. Without leaf_rank, they are the most finely resolved taxa included in the tree.
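The leaf rule described above (no outgoing edges) and its exception can be sketched on a toy tree. Everything here is hypothetical: the names are invented, and a plain children vector stands in for the out-neighbour lists of tree.graph.

```julia
# Invented five-node tree truncated at family rank. "SuborderC" has no
# family-rank descendants, so it ends up childless.
taxon_names = ["OrderA", "SuborderB", "SuborderC", "FamilyE", "FamilyD"]
taxon_ranks = ["order", "suborder", "suborder", "family", "family"]
children    = [[2, 3], [4, 5], Int[], Int[], Int[]]   # children[v] = child vertices of v

# A leaf is any vertex without children; results are sorted by name.
leaves = sort([v for v in eachindex(children) if isempty(children[v])];
              by = v -> taxon_names[v])

[(taxon_names[v], taxon_ranks[v]) for v in leaves]
# [("FamilyD", "family"), ("FamilyE", "family"), ("SuborderC", "suborder")]
```

Code that needs only nodes at a given rank can filter on the rank field (or use taxa_at_rank) rather than rely on leaf status alone.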
Examples
t = taxon_subtree("Carnivora"; leaf_rank = "family")
leaf_taxa(t) .|> (n -> n.name) # ["Ailuridae", "Canidae", …]
See also root_taxon, taxa_at_rank.
PaleobiologyDB.Taxonomy.taxa_at_rank — Function
taxa_at_rank(tree::TaxonomyTree, rank::AbstractString) -> Vector{TaxonNode}
Return all nodes in tree whose rank field equals rank, sorted by name.
Returns an empty vector when no nodes at rank are present.
Throws ArgumentError if rank is not a valid PBDB rank string.
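The three behaviours above (filter by rank, empty result, ArgumentError on an invalid rank) can be sketched in a few lines. nodes_at_rank and valid_ranks are hypothetical names, the rank list is deliberately shortened, and named tuples stand in for TaxonNode values; this is not the package's source.

```julia
# Shortened, hypothetical list of accepted rank strings.
valid_ranks = ["species", "genus", "family", "order"]

function nodes_at_rank(nodes, rank; valid = valid_ranks)
    rank in valid || throw(ArgumentError("not a valid rank: $rank"))
    return sort([n for n in nodes if n.rank == rank]; by = n -> n.name)
end

# Named tuples with the two fields the function reads.
toy_records = [
    (name = "Vulpes", rank = "genus"),
    (name = "Canidae", rank = "family"),
    (name = "Canis", rank = "genus"),
]

[n.name for n in nodes_at_rank(toy_records, "genus")]  # ["Canis", "Vulpes"]
nodes_at_rank(toy_records, "species")                  # empty vector
```

Validating the rank before filtering is what separates "no nodes at this rank" (an empty vector) from "this is not a rank at all" (an error).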
Examples
t = taxon_subtree("Carnivora")
taxa_at_rank(t, "family") .|> (n -> n.name) # ["Ailuridae", "Canidae", …]
taxa_at_rank(t, "genus") |> length # number of genera in Carnivora
See also root_taxon, leaf_taxa.