Basics of paleobiological fossil collection analyses

Author

Jeet Sukumaran

Proficiencies
  • Background concepts:
    • What the Paleobiology Database is and what kinds of data it holds
    • What a fossil occurrence record is, and what the fields min_ma / max_ma / direct_ma encode
    • Why data quality has three distinct dimensions: taxonomic resolution, chronological precision, and spatial completeness
    • Why real-world data retrieved from a live API will almost always contain missing values, and why that matters before any analysis or plotting
  • Core skills:
    • Exploring the PaleobiologyDB.jl package API and its built-in documentation system
    • Calling pbdb_occurrences with the parameters base_name, show, vocab, extids, and limit
    • Inspecting column names, types, and missing-value counts returned by an API query
    • Filtering by taxonomic rank using filter and reviewing equivalent approaches with subset and boolean indexing
    • Detecting data coverage across multiple chronological fields using regex column selection
    • Removing rows with missing values using dropmissing on targeted column sets
    • Assessing taxonomic diversity with groupby and combine
    • Computing a derived column (temporal midpoint) with transform
    • Plotting fossil occurrence locations on a world map using GeoMakie.GeoAxis
    • Adding land outlines with poly! and occurrence points with scatter!
    • Encoding age as color using a continuous colormap and Colorbar
    • Plotting temporal distributions with hist! and a reversed time axis

1 Walkthrough of an example analysis

We will use Julia to visualize the distribution of fossil specimens of particular species or groups of species across space and time. We will focus on the Canidae (the dog family) during the Miocene (about 23 to 5 million years ago), a critical time for mammalian diversification.

Proficiencies
  1. Install Julia packages.
  2. Load Julia packages.
  3. Download fossil occurrence data from the Paleobiology Database.
  4. Clean and curate data to the required standards.
  5. Visualize data.

1.1 Install packages

You only need to do this once per project environment. That includes the global environment, which is the one active by default when you start Julia with the julia command.

using Pkg
Pkg.add(["PaleobiologyDB", "DataFrames", "GeoMakie", "CairoMakie"])
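
As a minimal sketch of what "project environment" means here: activating a directory makes it the active project, and any packages added afterwards are recorded in that directory rather than in the global environment. The directory below is a throwaway temporary one, chosen only for illustration; in practice you would activate your own project folder, for example with Pkg.activate(".").

```julia
using Pkg

# Hypothetical project directory; a temporary one keeps this sketch free of side effects.
project_dir = mktempdir()

# Make that directory the active project environment.
Pkg.activate(project_dir)

# Packages added from here on would be recorded in project_dir/Project.toml
# (left commented out so the sketch runs without network access):
# Pkg.add(["PaleobiologyDB", "DataFrames", "GeoMakie", "CairoMakie"])

# Show which Project.toml is currently active.
println(Base.active_project())
```

Calling Pkg.activate() with no arguments switches back to the default environment.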

1.2 Load packages

using PaleobiologyDB
using DataFrames
using GeoMakie, CairoMakie

1.3 Download data

We will ask the database for every recorded occurrence of a Canidae fossil from the Miocene epoch.

canids = pbdb_occurrences(
    base_name = "Canidae",
    interval  = "Miocene",
    show      = "full",
    vocab     = "pbdb"
)

println("Found ", nrow(canids), " Miocene canid fossils!")
Found 1345 Miocene canid fossils!
  • base_name = "Canidae" returns occurrences identified as Canidae or as any taxon nested within it: fossils identified to genus or species within the family, as well as fossils identified only to family level as “Canidae” itself.
  • interval = "Miocene" restricts the results to fossil occurrences from the Miocene Epoch.
  • show = "full" requests the extended set of fields for each record.
  • vocab = "pbdb" returns human-readable column names rather than the short API codes.

1.4 Prepare the data

Real-world data is often messy. Some fossil records may be missing exact GPS coordinates. Before we can make a map, we must remove any rows that lack a latitude or longitude value.

clean_canids = dropmissing(canids, [:lat, :lng])

println("Records with valid coordinates: ", nrow(clean_canids))
println("Records dropped: ", nrow(canids) - nrow(clean_canids))
Records with valid coordinates: 1345
Records dropped: 0
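
Since this particular query happens to return no rows without coordinates, the effect of dropmissing is easier to see on a small made-up table. The sketch below uses toy values (not PBDB data) and hypothetical record labels; only rows in which both lat and lng are present survive.

```julia
using DataFrames

# Toy table (invented values, not PBDB records): two of the three rows
# lack one of the coordinates.
toy = DataFrame(
    record = ["A", "B", "C"],
    lat    = [42.8, missing, 35.5],
    lng    = [-103.1, -120.2, missing],
)

# Keep only rows where BOTH listed columns are non-missing.
clean_toy = dropmissing(toy, [:lat, :lng])

println("Rows kept: ", nrow(clean_toy))
println("Rows dropped: ", nrow(toy) - nrow(clean_toy))

# By default dropmissing also narrows the column types, so lat and lng
# are now plain Float64 rather than Union{Missing, Float64}.
println(eltype(clean_toy.lat))
```

Passing the column list matters: calling dropmissing without it would drop any row that has a missing value in any column, which for a wide PBDB table would likely discard nearly everything.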

1.5 Visualize the data

Now we will plot these points on a world map. In the Miocene the continents were roughly where they are today, but the climate and faunal dispersal corridors were quite different.

fig = Figure()
ax  = GeoAxis(fig[1, 1]; title = "Miocene Canidae Occurrences")

lines!(ax, GeoMakie.coastlines())  # coastline outlines give the points geographic context
scatter!(ax, clean_canids.lng, clean_canids.lat;
         color = :red, markersize = 5)

display(fig)
CairoMakie.Screen{IMAGE}

2 Data acquisition and curation

2.1 The Paleobiology Database

The Paleobiology Database (PBDB) is a community-curated, openly accessible repository of fossil occurrence records spanning the history of life on Earth. Each record — an occurrence — documents that a particular taxon was found at a particular place, in a stratigraphic unit of known geological age. The PBDB currently holds several million such records contributed by researchers worldwide.

Tip: Occurrences, collections, and taxa

The PBDB organizes its data around three main entities.

A collection is a set of fossils from the same physical locality and stratigraphic context — essentially one fossil site. An occurrence links a taxon to a collection: it records that taxon X was identified in collection Y. A taxon record holds the name, classification, and synonymy information for a biological group.

When we ask for “occurrences of Canidae,” we are asking for every row in the database that links a canid identification to a dated collection, along with the geographic and stratigraphic context of that collection.
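
The relationship between the three entities can be sketched with toy tables. This is an illustration of the idea only, not the PBDB's actual schema: the values are invented, and the taxon_no and taxon_name columns are hypothetical names used for the example. Each occurrence row points at one taxon and one collection, and joining the tables produces the kind of flat, one-row-per-occurrence table that an occurrence query returns.

```julia
using DataFrames

# Toy "collections": one row per fossil site, with its location.
collections = DataFrame(
    collection_no = [1, 2],
    lat           = [42.8, 25.0],
    lng           = [-103.1, 102.1],
)

# Toy "taxa": one row per named biological group.
taxa = DataFrame(
    taxon_no   = [10, 11],
    taxon_name = ["Taxon A", "Taxon B"],
)

# Toy "occurrences": each row links one taxon to one collection.
occurrences = DataFrame(
    occurrence_no = [100, 101, 102],
    taxon_no      = [10, 11, 10],
    collection_no = [1, 1, 2],
)

# Attach the taxon name, then the collection's location, to every occurrence.
with_taxa = innerjoin(occurrences, taxa, on = :taxon_no)
flat      = innerjoin(with_taxa, collections, on = :collection_no)

println(flat)
```

Note that Taxon A appears twice in the joined table, once per collection in which it was found: the number of occurrences is not the number of taxa.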

The PaleobiologyDB.jl package provides a Julia interface to the PBDB’s web API. Every API endpoint corresponds to a Julia function; calling the function returns a DataFrame.

2.2 Setting up the environment

We will use four packages in this chapter. PaleobiologyDB fetches data from the PBDB API. DataFrames is the standard Julia tabular-data package we used in the previous chapter. CairoMakie is the rendering backend for Makie plots. GeoMakie extends Makie with geographic axes and projections — we introduce it below.

using Pkg
Pkg.add(["PaleobiologyDB", "GeoMakie", "CairoMakie", "DataFrames"])

You can confirm which versions are active at any time:

using Pkg
Pkg.status()
Status `~/site/storage/local/authoring/teaching/20260315_just-enough-julia/20260108_just-enough-julia-for-scientific-reasoning/20260108_just-enough-julia-for-scientific-reasoning/Project.toml`
  [46ada45e] Agents v7.0.0
⌃ [13f3f980] CairoMakie v0.15.8
  [a93c6f00] DataFrames v1.8.1
  [31c24e10] Distributions v0.25.123
  [dedd4f52] GBIF2 v0.2.1
⌃ [e9467ef8] GLMakie v0.13.8
  [db073c08] GeoMakie v0.7.16
⌅ [ee78f7c6] Makie v0.24.8
⌃ [ee6415c8] OccurrencesInterface v1.2.1
  [c9cb2f45] PaleobiologyDB v1.1.2
⌃ [91a5bcdd] Plots v1.41.5
⌃ [72b53823] SpeciesDistributionToolkit v1.8.1
  [2913bbd2] StatsBase v0.34.10
  [9a3f8284] Random v1.11.0
Info Packages marked with ⌃ and ⌅ have new versions available. Those with ⌃ may be upgradable, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

After installation, load the packages:

using PaleobiologyDB
using DataFrames
using CairoMakie
using GeoMakie

2.3 Exploring the package

Before retrieving any data, it is worth understanding what functions PaleobiologyDB exposes. The package models the PBDB API closely, so listing its exported names gives you a map of the available endpoints:

foreach(println, names(PaleobiologyDB))
PaleobiologyDB
pbdb_collection
pbdb_collections
pbdb_collections_geo
pbdb_config
pbdb_interval
pbdb_intervals
pbdb_measurements
pbdb_occurrence
pbdb_occurrences
pbdb_opinion
pbdb_opinions
pbdb_opinions_taxa
pbdb_ref_collections
pbdb_ref_occurrences
pbdb_ref_specimens
pbdb_ref_taxa
pbdb_reference
pbdb_references
pbdb_scale
pbdb_scales
pbdb_specimen
pbdb_specimens
pbdb_strata
pbdb_strata_auto
pbdb_taxa
pbdb_taxa_auto
pbdb_taxon

pbdb_occurrences is the function we want. Read its documentation directly in the notebook:

@doc pbdb_occurrences
pbdb_occurrences(; kwargs...)

Get information about fossil occurrence records stored in the Paleobiology Database.

Arguments

  • kwargs...: Filtering and output parameters. Common options include:

    • limit: Maximum number of records to return (Int or "all").

    • taxon_name: Return only records with the specified taxonomic name(s).

    • base_name: Return records for the specified name(s) and all descendant taxa.

    • lngmin, lngmax, latmin, latmax: Geographic bounding box.

    • min_ma, max_ma: Minimum and maximum age in millions of years.

    • interval: Named geologic interval (e.g. "Miocene").

    • cc: Country/continent codes (ISO two-letter or three-letter).

    • show: Extra information blocks ("coords", "classext", "ident", etc.). show = "full" for everything.

    • extids: Set extids = true to show the newer string identifiers.

    • vocab: Vocabulary for field names ("pbdb" for full names, "com" for short codes).

Returns

A DataFrame with fossil occurrence records matching the query.

Examples


# `taxon_name` retrieves *only* units of this rank
occs = pbdb_occurrences(
    taxon_name="Canis",
    show="full", # all columns
    limit=100,
)

# `base_name` retrieves units of this and nested rank
occs = pbdb_occurrences(
    base_name="Canis",
    show=["coords","classext"],
    limit=100,
)
Tip: PBDB data service API help

The PaleobiologyDB package includes a built-in help system that mirrors the upstream API documentation, going well beyond what fits in a single function docstring:

using PaleobiologyDB.ApiHelp
names(ApiHelp)

There is even help on using the help system itself:

?ApiHelp
?pbdb_help()
?pbdb_endpoints()
?pbdb_parameters()
?pbdb_fields()
?pbdb_search()

This is particularly useful when you want to know which show blocks are available, what each field name means, or how to construct a bounding-box query.

2.4 Acquiring occurrence data

The primary function for retrieving occurrence records is pbdb_occurrences. Its arguments map directly onto the PBDB API’s query parameters.

When first exploring a new group, request a limited sample so the API call is fast and the result is manageable:

occs = pbdb_occurrences(
    ;   # the ';' indicates end of positional arguments
    base_name = "Carnivora",
    show      = "full",
    vocab     = "pbdb",
    extids    = true,
    limit     = 1000,
)
1000×135 DataFrame
35 columns and 975 rows omitted
Row occurrence_no reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode preservation_quality ⋯
String15 String15? Missing String15 String String15 String15 String31? String Missing String15 String15 String31 String31? Float64 Float64 String31 Int64 String15 String15 String15 String15 String31 String31 Missing Int64? String15? Float64 Float64 String String Missing String? String3 String15 String31? String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64? Float64? String3? String7? Float64? Missing String3? String7? Float64? Float64? String3? String7? String31 String15? String? String15 Missing Missing String15? String15? String3? String15? Missing Missing Missing Missing String String String31 String String31? String31? String1 String15 String31 String15? String15? String1 String31? Missing String? String String7? String7? String7? String15? Missing Missing Missing String String7? ⋯
1 occ:117266 missing missing col:9070 Cynodictis lacustris species txn:349281 missing Cynodictis lacustris missing species txn:349281 Late Eocene missing 37.71 33.9 Hooker 1994 1994 ref:11154 Chordata Mammalia Carnivora Amphicyonidae Cynodictis missing missing missing -1.0856 50.6772 move Howgate Bay, Bembridge Marls, Isle of Wight missing missing UK England missing stated in text 4 missing missing outcrop Howgate Bay, southern end (National Grid Reference SZ 647868) gplates 315 mid 4.41 44.84 UK missing missing missing missing missing missing missing missing missing missing missing missing missing Bembridge Marls missing formation missing missing missing missing missing missing missing missing missing missing Daley calls this Division WH VI (Lattorfian), per his 1973 publication. Most other authors consider this unit to be Upper Eocene. a thin grey marl, interbedded with poorly fossiliferous, colour mottled, red and green muds marl gray missing missing Y mudstone shelly/skeletal,green,red missing missing Y missing missing macrofossils missing missing missing missing missing missing missing body missing ⋯
2 occ:137493 missing missing col:11601 Enaliarctos mealsi n. gen. n. sp. species txn:71871 missing Enaliarctos mealsi missing species txn:71871 Chattian missing 27.3 23.04 Mitchell and Tedford 1973 1973 ref:4383 Chordata Mammalia Carnivora NO_FAMILY_SPECIFIED Enaliarctos missing 14 specimens -118.848 35.4928 Pyramid Hill Sand Member grit zone missing LACMVP Loc. 1603, 1626, 1627; UCMP Loc. V-7032 US California Kern based on nearby landmark seconds missing missing local area low hills in a northwest southeast trending belt between the Sierra Nevada and the area of the city of Bakersfield; LACM 1626 is in the "center of SE1/4 of Sect 15, T 28 S, R 29 E" (Howard 1969: basis of coordinate) gplates 130 mid -105.16 39.96 US missing missing missing missing missing missing missing missing missing missing missing missing missing Jewett missing Pyramid Hill bed missing missing missing missing missing missing missing missing missing missing grit zone; age assignments vary; age for this collection originally listed as Chattian after Scheirer and Magoon, 2007 and Barboza et al., 2017; but Aquitanian after discussion in Shimada et al. 2014 contains pebbles, rounded black chert grains, angular quartz clasts, and is referred to as the ""grit zone"" by some geologists sandstone concretionary lithified missing Y missing missing marine indet. missing macrofossils missing missing missing missing missing missing missing body,concretion,permineralized,original phosphate good ⋯
3 occ:137495 missing missing col:11601 Pinnarctidion bishopi species txn:72007 missing Pinnarctidion bishopi missing species txn:72007 Chattian missing 27.3 23.04 Barnes 1979 1979 ref:4175 Chordata Mammalia Carnivora NO_FAMILY_SPECIFIED Pinnarctidion missing 2 specimens -118.848 35.4928 Pyramid Hill Sand Member grit zone missing LACMVP Loc. 1603, 1626, 1627; UCMP Loc. V-7032 US California Kern based on nearby landmark seconds missing missing local area low hills in a northwest southeast trending belt between the Sierra Nevada and the area of the city of Bakersfield; LACM 1626 is in the "center of SE1/4 of Sect 15, T 28 S, R 29 E" (Howard 1969: basis of coordinate) gplates 130 mid -105.16 39.96 US missing missing missing missing missing missing missing missing missing missing missing missing missing Jewett missing Pyramid Hill bed missing missing missing missing missing missing missing missing missing missing grit zone; age assignments vary; age for this collection originally listed as Chattian after Scheirer and Magoon, 2007 and Barboza et al., 2017; but Aquitanian after discussion in Shimada et al. 2014 contains pebbles, rounded black chert grains, angular quartz clasts, and is referred to as the ""grit zone"" by some geologists sandstone concretionary lithified missing Y missing missing marine indet. missing macrofossils missing missing missing missing missing missing missing body,concretion,permineralized,original phosphate good ⋯
4 occ:138737 missing missing col:11798 Indarctos sinensis species txn:90198 missing Indarctos atticus missing species txn:90196 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Ursidae Indarctos missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
5 occ:138738 missing missing col:11798 Protursus sp. genus txn:116932 missing Protursus missing genus txn:116932 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Ailuridae Protursus missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
6 occ:138739 missing missing col:11798 Ursinae indet. subfamily txn:65407 missing Ursinae missing subfamily txn:65407 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:10885 Chordata Mammalia Carnivora Ursidae missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
7 occ:138740 missing missing col:11798 Proputorius lufengensis species var:509396 recombined as Cernictis lufengensis missing species txn:509396 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Mustelidae Cernictis missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
8 occ:138741 missing missing col:11798 Sivaonyx bathygnathus species txn:438269 Sivaonyx bathygnathus missing species txn:156337 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Mustelidae Sivaonyx missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
9 occ:138742 missing missing col:11798 Lutra sp. genus txn:41122 Lutra missing genus txn:41122 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Mustelidae Lutra missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
10 occ:138743 missing missing col:11798 Ictitherium gaudryi genus txn:41026 species not entered Ictitherium missing genus txn:41026 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Hyaenidae Ictitherium missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
11 occ:138744 missing missing col:11798 Viverrinae indet. subfamily txn:72213 Viverrinae missing subfamily txn:72213 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Viverridae missing missing 102.067 25.0167 2 spp. Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
12 occ:138826 missing missing col:11799 Machaerodus sp. genus txn:47826 Machaerodus missing genus txn:47826 Late Miocene Early Pliocene 11.63 3.6 Maxson 1946 1946 ref:4198 Chordata Mammalia Carnivora Felidae Machaerodus missing missing 32.8667 39.9333 Yozgat missing TR based on nearby landmark minutes missing missing local area 10 km southwest of Kucuk Yozgat village 40 km east of Ankara\\r\\nLat/Long given for Ankara gplates 510 mid 33.18 38.28 TR missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing Silty strata interbedded in a thick non-marine section of sandstones and conglomerates siltstone Y terrestrial indet. missing missing missing missing missing missing missing missing ⋯
13 occ:147936 missing missing col:13061 Atopotarus courseni n. gen. n. sp. species txn:53071 Atopotarus courseni missing species txn:53071 Late Hemingfordian Early Barstovian 18.5 12.5 Downs 1956 1956 ref:4342 Chordata Mammalia Carnivora Desmatophocidae Atopotarus missing missing -118.34 33.773 Coursen garden missing LACM 1098 US California Los Angeles 3 missing missing small collection Garden walk of Mr. and Mrs. Walter H. Coursen, Jr., 3 Meadowlark Lane, Rolling Hills, CA. directly west of hill 443 and north & degrees west of hill 918 (see Woodring, et al., 1946, geol. map). gplates not computable using this model mid missing missing US missing missing missing missing missing missing missing missing missing missing missing missing missing Monterey missing Altamira member missing missing missing missing missing missing missing missing missing missing hard, siliceous limestone with parallel layers of opaline chert wich occasionaly replaced portions of the actual skeleton "limestone" lithified cherty/siliceous chert lithified carbonate indet. missing macrofossils missing missing missing missing missing missing missing body,original phosphate,replaced with silica good ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
989 occ:183590 missing col:17925 Mesocyon socialis genus txn:41220 species not entered Mesocyon missing genus txn:41220 Early Hemingfordian 18.5 16.3 Fremd 1994 1994 ref:1547 Chordata Mammalia Carnivora Canidae Mesocyon missing missing -120.2 44.9 Haystack Member missing US Oregon Wheeler based on political unit 1 missing although the Haystack Valley is in Wheeler County, some of this material is probably from Grant County gplates 129 mid -110.81 48.56 US missing missing 22.6 missing Ma Ar/Ar missing missing John Day member missing missing JohnD 11 bottom to top missing missing missing missing overlies "? ATR Tuff" dated at 22.6 Ma (AA)\\r\\nsaid to be 20 Ma or younger based on biochronology, i.e., early Hemingfordian not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
990 occ:183591 missing col:17925 Paradaphoenus sp. genus txn:41289 Paradaphoenus missing genus txn:41289 Early Hemingfordian 18.5 16.3 Fremd 1994 1994 ref:1547 Chordata Mammalia Carnivora Amphicyonidae Paradaphoenus missing missing -120.2 44.9 presumably includes type of "Daphoenus minimus" from the "Scenic Member" (Hough 1948) Haystack Member missing US Oregon Wheeler based on political unit 1 missing although the Haystack Valley is in Wheeler County, some of this material is probably from Grant County gplates 129 mid -110.81 48.56 US missing missing 22.6 missing Ma Ar/Ar missing missing John Day member missing missing JohnD 11 bottom to top missing missing missing missing overlies "? ATR Tuff" dated at 22.6 Ma (AA)\\r\\nsaid to be 20 Ma or younger based on biochronology, i.e., early Hemingfordian not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
991 occ:183607 missing col:17927 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.8 "'B' Quarry" Hemingford B Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
992 occ:183608 missing col:17927 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.8 "'B' Quarry" Hemingford B Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
993 occ:183615 missing col:17928 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.8 "'C' Quarry" Hemingford C Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
994 occ:183616 missing col:17929 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 0 missing UNSM Bx-0 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
995 occ:183622 missing col:17932 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
996 occ:183623 missing col:17932 Phlaocyon marslandensis species txn:50347 Phlaocyon marslandensis missing species txn:50347 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Phlaocyon missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
997 occ:183624 missing col:17932 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
998 occ:183625 missing col:17932 Metatomarctus canavus species txn:48462 Metatomarctus canavus missing species txn:45454 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Metatomarctus missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
999 occ:183626 rei:30999 missing col:17933 Daphoenodon (Borocyon) robustum species txn:318454 Daphoenodon (Borocyon) robustum missing species txn:44589 Early Hemingfordian 18.5 16.3 Hunt 2009 2009 ref:54871 Chordata Mammalia Carnivora Amphicyonidae Daphoenodon (Borocyon) missing missing -103.1 42.2 Hemingford Quarry 7A missing UNSM Bx-7A US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
1000 occ:183627 missing col:17934 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 7B missing UNSM Bx-7B US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
Tip: What the key arguments do

base_name = "Carnivora" retrieves occurrences for Carnivora and all of its nested descendant taxa. Using taxon_name instead would restrict results to records identified to exactly that name.

show = "full" requests the complete set of response fields, including geographic coordinates, paleocoordinates, classification hierarchy, and stratigraphic details. You can also pass a list of specific blocks — for example show = ["coords", "class"] — to request only what you need.

vocab = "pbdb" uses the long, human-readable field names (e.g. accepted_name, early_interval) rather than the compact API codes.

extids = true returns modern string-style record identifiers (e.g. "occ:12345") rather than bare integers.

limit = 1000 caps the number of records returned. Remove the limit (or set limit = "all") only when you are ready to work with the full dataset.
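These keyword arguments share their names with query parameters of the PBDB data service (version 1.2, `occs/list` endpoint). As a rough sketch of that mapping, the helper below assembles the corresponding request URL by hand. The function name `pbdb_query_url` is hypothetical and not part of any package, and the sketch assumes the arguments are forwarded to the web service unchanged. It only builds a string; it does not contact the server.

```julia
# Hypothetical helper (not the package's API): assemble a PBDB data service
# URL from keyword arguments to illustrate how the parameters are passed.
function pbdb_query_url(; kwargs...)
    base = "https://paleobiodb.org/data1.2/occs/list.csv"
    # A vector of blocks such as ["coords", "class"] becomes "coords,class".
    render(v) = v isa AbstractVector ? join(v, ",") : string(v)
    query = join(("$(k)=$(render(v))" for (k, v) in kwargs), "&")
    return "$base?$query"
end

url = pbdb_query_url(base_name = "Carnivora", show = ["coords", "class"],
                     vocab = "pbdb", extids = true, limit = 1000)
println(url)
```

Values containing spaces or other reserved characters would need URL-encoding, which this sketch leaves out for brevity.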

The return value is a standard DataFrame:

typeof(occs)
DataFrame
size(occs)
(1000, 135)

2.5 Inspecting the data

Before doing anything else, examine what came back.

2.5.1 List the columns of the dataset

The function foreach applies a function (its first argument) to every element of a collection (its second argument). Here the function is println and the collection is names(occs), which returns the column names of the occs DataFrame as an array of strings. Put together, foreach calls println on each name in turn, so every column name is printed to standard output on its own line.

foreach(println, names(occs))
occurrence_no
reid_no
flags
collection_no
identified_name
identified_rank
identified_no
difference
accepted_name
accepted_attr
accepted_rank
accepted_no
early_interval
late_interval
max_ma
min_ma
ref_author
ref_pubyr
reference_no
phylum
class
order
family
genus
plant_organ
abund_value
abund_unit
lng
lat
occurrence_comments
collection_name
container_no
collection_aka
cc
state
county
latlng_basis
latlng_precision
altitude_value
altitude_unit
geogscale
geogcomments
paleomodel
geoplate
paleoage
paleolng
paleolat
cc_1
protected
direct_ma_value
direct_ma_error
direct_ma_unit
direct_ma_method
max_ma_value
max_ma_error
max_ma_unit
max_ma_method
min_ma_value
min_ma_error
min_ma_unit
min_ma_method
formation
geological_group
member
stratscale
zone
zone_type
localsection
localbed
localbedunit
localorder
regionalsection
regionalbed
regionalbedunit
regionalorder
stratcomments
lithdescript
lithology1
lithadj1
lithification1
minor_lithology1
fossilsfrom1
lithology2
lithadj2
lithification2
minor_lithology2
fossilsfrom2
environment
tectonic_setting
geology_comments
size_classes
articulated_parts
associated_parts
common_body_parts
rare_body_parts
feed_pred_traces
artifacts
component_comments
pres_mode
preservation_quality
spatial_resolution
temporal_resolution
lagerstatten
concentration
orientation
abund_in_sediment
sorting
fragmentation
bioerosion
encrustation
preservation_comments
collection_type
collection_methods
museum
collection_coverage
collection_size
rock_censused
collectors
collection_dates
collection_comments
taxonomy_comments
research_group
taxon_environment
environment_basis
motility
life_habit
vision
diet
reproduction
ontogeny
ecospace_comments
composition
architecture
thickness
reinforcement
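To make the foreach pattern concrete, here is a minimal sketch that needs only base Julia. The vector `cols` is a hand-written stand-in for the result of names(occs), so the example runs without the dataset. It also shows one way to search a long list of column names instead of scrolling through it.

```julia
# Hand-written stand-in for names(occs); a real call returns all 135 names.
cols = ["occurrence_no", "accepted_name", "max_ma", "min_ma", "lng", "lat"]

# foreach calls println once per element and returns nothing.
foreach(println, cols)

# The same thing written as an explicit loop.
for col in cols
    println(col)
end

# Searching instead of scrolling: keep only the names ending in "_ma".
age_cols = filter(endswith("_ma"), cols)
println(age_cols)
```

The same filter can be applied directly to names(occs) to narrow down the columns worth a closer look.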

2.5.2 Detailed summary of the dataset structure

describe gives the most useful first-look summary: column name, element type, and the number of missing values in each column.

describe(occs)
135×7 DataFrame
110 rows omitted
Row variable mean min median max nmissing eltype
Symbol Union… Any Union… Any Int64 Type
1 occurrence_no occ:117266 occ:183627 0 String15
2 reid_no rei:674 27 Union{Missing, String15}
3 flags 1000 Missing
4 collection_no col:11601 col:99845 0 String15
5 identified_name "Hesperocyon" coloradensis cf. Promartes sp. 0 String
6 identified_rank family suborder 0 String15
7 identified_no txn:104016 var:52393 0 String15
8 difference species not entered 6 Union{Missing, String31}
9 accepted_name Acheronictis webbi Zodiolestes daimonelixensis 0 String
10 accepted_attr 1000 Missing
11 accepted_rank family suborder 0 String15
12 accepted_no txn:100309 txn:90655 0 String15
13 early_interval Arikareean Whitneyan 0 String31
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
124 environment_basis 1000 Missing
125 motility actively mobile actively mobile 0 String15
126 life_habit amphibious scansorial 0 String15
127 vision 1000 Missing
128 diet carnivore omnivore 0 String31
129 reproduction viviparous viviparous 0 String15
130 ontogeny modification of parts 1 Union{Missing, String31}
131 ecospace_comments life habit and diet based on Nowak 1999; see also Van Valkenburgh 1988 23 Union{Missing, String}
132 composition hydroxyapatite hydroxyapatite 0 String15
133 architecture 1000 Missing
134 thickness 1000 Missing
135 reinforcement 1000 Missing

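By default, Julia truncates large results displayed in the REPL, showing only the first and last rows. To list every row, call show with the keyword argument allrows = true. Columns that do not fit the display width are still cut off (note the "4 columns omitted" line at the end of the output); adding allcols = true prints those as well.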

show(describe(occs), allrows = true)
135×7 DataFrame
 Row │ variable               mean      min                              media ⋯
     │ Symbol                 Union…    Any                              Union ⋯
─────┼──────────────────────────────────────────────────────────────────────────
   1 │ occurrence_no                    occ:117266                             ⋯
   2 │ reid_no                                                                
   3 │ flags                                                                  
   4 │ collection_no                    col:11601                             
   5 │ identified_name                  "Hesperocyon" coloradensis             ⋯
   6 │ identified_rank                  family                                
   7 │ identified_no                    txn:104016                            
   8 │ difference                                                             
   9 │ accepted_name                    Acheronictis webbi                     ⋯
  10 │ accepted_attr                                                          
  11 │ accepted_rank                    family                                
  12 │ accepted_no                      txn:100309                            
  13 │ early_interval                   Arikareean                             ⋯
  14 │ late_interval                                                          
  15 │ max_ma                 29.0917   0.0117                           29.5
  16 │ min_ma                 25.1156   0.0                              24.7
  17 │ ref_author                       Albright 1996                          ⋯
  18 │ ref_pubyr              1987.26   1879                             1994.
  19 │ reference_no                     ref:1028                              
  20 │ phylum                           Chordata                              
  21 │ class                            Mammalia                               ⋯
  22 │ order                            Carnivora                             
  23 │ family                                                                 
  24 │ genus                                                                  
  25 │ plant_organ                                                             ⋯
  26 │ abund_value            1.86087   1                                1.0
  27 │ abund_unit                                                             
  28 │ lng                    -96.8768  -120.5                           -103.
  29 │ lat                    41.7539   -38.0                            42.8  ⋯
  30 │ occurrence_comments                                                    
  31 │ collection_name                  15 Mi. North of Harrison              
  32 │ container_no                                                           
  33 │ collection_aka                                                          ⋯
  34 │ cc                               AR                                    
  35 │ state                                                                  
  36 │ county                                                                 
  37 │ latlng_basis                                                            ⋯
  38 │ latlng_precision                 1                                     
  39 │ altitude_value         2366.64   38                               1059.
  40 │ altitude_unit                                                          
  41 │ geogscale                                                               ⋯
  42 │ geogcomments                                                           
  43 │ paleomodel                       gplates                               
  44 │ geoplate                         101                                   
  45 │ paleoage                         mid                                    ⋯
  46 │ paleolng               -86.441   -111.56                          -90.2
  47 │ paleolat               47.0513   -37.9                            48.02
  48 │ cc_1                             AR                                    
  49 │ protected                                                               ⋯
  50 │ direct_ma_value        26.3      24.5                             27.0
  51 │ direct_ma_error        0.7       0.5                              0.5
  52 │ direct_ma_unit                                                         
  53 │ direct_ma_method                                                        ⋯
  54 │ max_ma_value           27.7867   22.6                             28.7
  55 │ max_ma_error                                                           
  56 │ max_ma_unit                                                            
  57 │ max_ma_method                                                           ⋯
  58 │ min_ma_value           27.7604   22.6                             28.7
  59 │ min_ma_error           0.07      0.07                             0.07
  60 │ min_ma_unit                                                            
  61 │ min_ma_method                                                           ⋯
  62 │ formation                                                              
  63 │ geological_group                                                       
  64 │ member                                                                 
  65 │ stratscale                                                              ⋯
  66 │ zone                                                                   
  67 │ zone_type                                                              
  68 │ localsection                                                           
  69 │ localbed                                                                ⋯
  70 │ localbedunit                                                           
  71 │ localorder                                                             
  72 │ regionalsection                                                        
  73 │ regionalbed                                                             ⋯
  74 │ regionalbedunit                                                        
  75 │ regionalorder                                                          
  76 │ stratcomments                                                          
  77 │ lithdescript                                                            ⋯
  78 │ lithology1                                                             
  79 │ lithadj1                                                               
  80 │ lithification1                                                         
  81 │ minor_lithology1                                                        ⋯
  82 │ fossilsfrom1                                                           
  83 │ lithology2                                                             
  84 │ lithadj2                                                               
  85 │ lithification2                                                          ⋯
  86 │ minor_lithology2                                                       
  87 │ fossilsfrom2                                                           
  88 │ environment                                                            
  89 │ tectonic_setting                                                        ⋯
  90 │ geology_comments                                                       
  91 │ size_classes                                                           
  92 │ articulated_parts                                                      
  93 │ associated_parts                                                        ⋯
  94 │ common_body_parts                                                      
  95 │ rare_body_parts                                                        
  96 │ feed_pred_traces                                                       
  97 │ artifacts                                                               ⋯
  98 │ component_comments                                                     
  99 │ pres_mode                                                              
 100 │ preservation_quality                                                   
 101 │ spatial_resolution                                                      ⋯
 102 │ temporal_resolution                                                    
 103 │ lagerstatten                                                           
 104 │ concentration                                                          
 105 │ orientation                                                             ⋯
 106 │ abund_in_sediment                                                      
 107 │ sorting                                                                
 108 │ fragmentation                                                          
 109 │ bioerosion                                                              ⋯
 110 │ encrustation                                                           
 111 │ preservation_comments                                                  
 112 │ collection_type                                                        
 113 │ collection_methods                                                      ⋯
 114 │ museum                                                                 
 115 │ collection_coverage                                                    
 116 │ collection_size                                                        
 117 │ rock_censused                                                           ⋯
 118 │ collectors                                                             
 119 │ collection_dates                                                       
 120 │ collection_comments                                                    
 121 │ taxonomy_comments                                                       ⋯
 122 │ research_group                   marine invertebrate,paleobotany       
 123 │ taxon_environment                coastal                               
 124 │ environment_basis                                                      
 125 │ motility                         actively mobile                        ⋯
 126 │ life_habit                       amphibious                            
 127 │ vision                                                                 
 128 │ diet                             carnivore                             
 129 │ reproduction                     viviparous                             ⋯
 130 │ ontogeny                                                               
 131 │ ecospace_comments                                                      
 132 │ composition                      hydroxyapatite                        
 133 │ architecture                                                            ⋯
 134 │ thickness                                                              
 135 │ reinforcement                                                          
                                                               4 columns omitted
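When only some of the summary statistics matter, describe can be asked for just those, which keeps the table narrow enough to read. The sketch below uses a small made-up DataFrame named `toy` in place of occs (the values are invented for illustration) and then extracts the names of the columns that contain missing values.

```julia
using DataFrames

# Toy stand-in for occs; the values are invented for illustration.
toy = DataFrame(
    accepted_name = ["Desmocyon matthewi", "Mesocyon", "Cynarctoides acridens"],
    max_ma = [18.5, 18.5, 18.5],
    lng = [-103.1, missing, -103.1],
    lat = [42.8, missing, 42.2],
)

# Request only the statistics of interest.
summary_tbl = describe(toy, :nmissing, :eltype)

# Print every row and every column, regardless of display size.
show(summary_tbl, allrows = true, allcols = true)

# Names of the columns that contain at least one missing value.
incomplete = summary_tbl[summary_tbl.nmissing .> 0, :variable]
```

Applied to occs, the same indexing expression would list every column that needs attention before analysis.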

Pay attention to the nmissing column. Data from a live database rarely arrives perfectly complete. Coordinates in particular (lng, lat) are sometimes absent from older records that predate GPS-referenced collection practice, and not every occurrence is resolved to species level. Depending on the study and its analytical objectives, we may need to drop rows that are missing values in required fields, or that fall short of the required certainty and resolution of taxonomic identification.

It is also worth seeing how records are distributed across taxonomic ranks, since finer ranks carry more information for most analyses:

combine(groupby(occs, :accepted_rank), nrow)
6×2 DataFrame
 Row │ accepted_rank  nrow
     │ String15       Int64
─────┼──────────────────────
   1 │ species          697
   2 │ genus            217
   3 │ subfamily         19
   4 │ family            42
   5 │ order             24
   6 │ suborder           1
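The raw counts are easier to compare once they are sorted and expressed as proportions. A minimal sketch on a made-up column of ranks follows; the DataFrame `toy` and the column names `n` and `share` are illustrative choices, not part of the dataset.

```julia
using DataFrames

# Toy stand-in for the accepted_rank column of occs.
toy = DataFrame(accepted_rank = ["species", "genus", "species",
                                 "family", "species", "genus"])

# Count the rows per rank, naming the count column n.
counts = combine(groupby(toy, :accepted_rank), nrow => :n)

# Most frequent rank first.
sort!(counts, :n, rev = true)

# Fraction of all records at each rank.
counts.share = counts.n ./ sum(counts.n)

println(counts)
```

In the real result above, the same calculation would put species-level records at 697 out of 1000.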

2.6 Data quality assessment

Auditing the data means more than confirming that records are present — it means ensuring the data that is present meets our standards. For paleobiological occurrence data, quality has three distinct dimensions:

  1. Taxonomic resolution — Is the occurrence identified to the rank we need?
  2. Chronological precision — Does the record carry usable age information?
  3. Spatial completeness — Are coordinates available?

We address each in turn.
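As a preview, the three checks can be sketched as Boolean masks over a small made-up table. The DataFrame `toy` and its values are invented for illustration, and the 5-million-year cutoff for an acceptable age range (`max_span`) is an assumption chosen for this example, not a standard.

```julia
using DataFrames

# Toy stand-in for occs; the values are invented for illustration.
toy = DataFrame(
    accepted_name = ["Desmocyon matthewi", "Mesocyon", "Indarctos atticus", "Canidae"],
    accepted_rank = ["species", "genus", "species", "family"],
    max_ma = [18.5, 18.5, 11.63, 37.71],
    min_ma = [16.3, 16.3, 7.246, 24.7],
    lng = [-103.1, -120.2, missing, -96.0],
    lat = [42.8, 44.9, missing, 41.0],
)

# 1. Taxonomic resolution: identified to species or subspecies.
is_species = in.(toy.accepted_rank, Ref(("species", "subspecies")))

# 2. Chronological precision: age range no wider than an assumed threshold.
max_span = 5.0  # million years; an arbitrary cutoff for this sketch
is_dated = (toy.max_ma .- toy.min_ma) .<= max_span

# 3. Spatial completeness: both coordinates present.
has_coords = .!ismissing.(toy.lng) .& .!ismissing.(toy.lat)

# Keep only the rows that pass all three checks.
passes = is_species .& is_dated .& has_coords
clean = toy[passes, :]
println(clean)
```

Keeping the three masks separate makes it easy to report how many records each criterion removes before combining them.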

2.6.1 Taxonomic resolution

Many PBDB occurrences are identified only to genus, family, or higher. For analyses that require species-level identities, those records add noise rather than signal. We define a filter that accepts occurrences at species and subspecies rank:

# Keep only occurrences whose accepted name is resolved to species or subspecies.
clean_taxonomy_flt = row -> row.accepted_rank in ("species", "subspecies")

occs_species = filter(clean_taxonomy_flt, occs)
697×135 DataFrame
35 columns and 672 rows omitted
Row occurrence_no reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode preservation_quality ⋯
String15 String15? Missing String15 String String15 String15 String31? String Missing String15 String15 String31 String31? Float64 Float64 String31 Int64 String15 String15 String15 String15 String31 String31 Missing Int64? String15? Float64 Float64 String String Missing String? String3 String15 String31? String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64? Float64? String3? String7? Float64? Missing String3? String7? Float64? Float64? String3? String7? String31 String15? String? String15 Missing Missing String15? String15? String3? String15? Missing Missing Missing Missing String String String31 String String31? String31? String1 String15 String31 String15? String15? String1 String31? Missing String? String String7? String7? String7? String15? Missing Missing Missing String String7? ⋯
1 occ:117266 missing missing col:9070 Cynodictis lacustris species txn:349281 missing Cynodictis lacustris missing species txn:349281 Late Eocene missing 37.71 33.9 Hooker 1994 1994 ref:11154 Chordata Mammalia Carnivora Amphicyonidae Cynodictis missing missing missing -1.0856 50.6772 move Howgate Bay, Bembridge Marls, Isle of Wight missing missing UK England missing stated in text 4 missing missing outcrop Howgate Bay, southern end (National Grid Reference SZ 647868) gplates 315 mid 4.41 44.84 UK missing missing missing missing missing missing missing missing missing missing missing missing missing Bembridge Marls missing formation missing missing missing missing missing missing missing missing missing missing Daley calls this Division WH VI (Lattorfian), per his 1973 publication. Most other authors consider this unit to be Upper Eocene. a thin grey marl, interbedded with poorly fossiliferous, colour mottled, red and green muds marl gray missing missing Y mudstone shelly/skeletal,green,red missing missing Y missing missing macrofossils missing missing missing missing missing missing missing body missing ⋯
2 occ:137493 missing missing col:11601 Enaliarctos mealsi n. gen. n. sp. species txn:71871 missing Enaliarctos mealsi missing species txn:71871 Chattian missing 27.3 23.04 Mitchell and Tedford 1973 1973 ref:4383 Chordata Mammalia Carnivora NO_FAMILY_SPECIFIED Enaliarctos missing 14 specimens -118.848 35.4928 Pyramid Hill Sand Member grit zone missing LACMVP Loc. 1603, 1626, 1627; UCMP Loc. V-7032 US California Kern based on nearby landmark seconds missing missing local area low hills in a northwest southeast trending belt between the Sierra Nevada and the area of the city of Bakersfield; LACM 1626 is in the "center of SE1/4 of Sect 15, T 28 S, R 29 E" (Howard 1969: basis of coordinate) gplates 130 mid -105.16 39.96 US missing missing missing missing missing missing missing missing missing missing missing missing missing Jewett missing Pyramid Hill bed missing missing missing missing missing missing missing missing missing missing grit zone; age assignments vary; age for this collection originally listed as Chattian after Scheirer and Magoon, 2007 and Barboza et al., 2017; but Aquitanian after discussion in Shimada et al. 2014 contains pebbles, rounded black chert grains, angular quartz clasts, and is referred to as the ""grit zone"" by some geologists sandstone concretionary lithified missing Y missing missing marine indet. missing macrofossils missing missing missing missing missing missing missing body,concretion,permineralized,original phosphate good ⋯
3 occ:137495 missing missing col:11601 Pinnarctidion bishopi species txn:72007 missing Pinnarctidion bishopi missing species txn:72007 Chattian missing 27.3 23.04 Barnes 1979 1979 ref:4175 Chordata Mammalia Carnivora NO_FAMILY_SPECIFIED Pinnarctidion missing 2 specimens -118.848 35.4928 Pyramid Hill Sand Member grit zone missing LACMVP Loc. 1603, 1626, 1627; UCMP Loc. V-7032 US California Kern based on nearby landmark seconds missing missing local area low hills in a northwest southeast trending belt between the Sierra Nevada and the area of the city of Bakersfield; LACM 1626 is in the "center of SE1/4 of Sect 15, T 28 S, R 29 E" (Howard 1969: basis of coordinate) gplates 130 mid -105.16 39.96 US missing missing missing missing missing missing missing missing missing missing missing missing missing Jewett missing Pyramid Hill bed missing missing missing missing missing missing missing missing missing missing grit zone; age assignments vary; age for this collection originally listed as Chattian after Scheirer and Magoon, 2007 and Barboza et al., 2017; but Aquitanian after discussion in Shimada et al. 2014 contains pebbles, rounded black chert grains, angular quartz clasts, and is referred to as the ""grit zone"" by some geologists sandstone concretionary lithified missing Y missing missing marine indet. missing macrofossils missing missing missing missing missing missing missing body,concretion,permineralized,original phosphate good ⋯
4 occ:138737 missing missing col:11798 Indarctos sinensis species txn:90198 missing Indarctos atticus missing species txn:90196 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Ursidae Indarctos missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
5 occ:138740 missing missing col:11798 Proputorius lufengensis species var:509396 recombined as Cernictis lufengensis missing species txn:509396 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Mustelidae Cernictis missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
6 occ:138741 missing missing col:11798 Sivaonyx bathygnathus species txn:438269 Sivaonyx bathygnathus missing species txn:156337 Tortonian missing 11.63 7.246 Qi 1985 1985 ref:11437 Chordata Mammalia Carnivora Mustelidae Sivaonyx missing missing 102.067 25.0167 Lufeng missing CN Yunnan Lufeng stated in text minutes missing missing outcrop Located about 9 km north of Lufeng, at the southern side of Miaoshanpo-Mountain; IVPP-Point 75033, called section D. gplates 611 mid 101.5 23.6 CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing member missing missing missing missing missing missing missing missing missing missing Quarry sequence is divided into 8 beds (numbered from top), in the Xi River Basin north of Lufeng, China. These unconformably overly Mesozoic sediments. No Formation names are given.\\r\\nHan 1985: "Its geological age is Late Miocene, equivalent to the Turolian of European sequence of land mammal ages." Qi 1986: Age "8 Ma or perhaps a bit later" Quarry section at Lufeng is 8 m thick. Detailed microstratigraphy given in Qi 1985 and Badgley et al 1988. lignite black unlithified sandy Y mudstone unlithified calcareous Y fluvial-lacustrine indet. missing macrofossils,mesofossils,microfossils missing missing missing missing missing missing missing body ⋯
7 occ:147936 missing missing col:13061 Atopotarus courseni n. gen. n. sp. species txn:53071 Atopotarus courseni missing species txn:53071 Late Hemingfordian Early Barstovian 18.5 12.5 Downs 1956 1956 ref:4342 Chordata Mammalia Carnivora Desmatophocidae Atopotarus missing missing -118.34 33.773 Coursen garden missing LACM 1098 US California Los Angeles 3 missing missing small collection Garden walk of Mr. and Mrs. Walter H. Coursen, Jr., 3 Meadowlark Lane, Rolling Hills, CA. directly west of hill 443 and north & degrees west of hill 918 (see Woodring, et al., 1946, geol. map). gplates not computable using this model mid missing missing US missing missing missing missing missing missing missing missing missing missing missing missing missing Monterey missing Altamira member missing missing missing missing missing missing missing missing missing missing hard, siliceous limestone with parallel layers of opaline chert wich occasionaly replaced portions of the actual skeleton "limestone" lithified cherty/siliceous chert lithified carbonate indet. missing macrofossils missing missing missing missing missing missing missing body,original phosphate,replaced with silica good ⋯
8 occ:149438 missing missing col:13192 Allodesmus kernensis species txn:72005 Allodesmus kernensis missing species txn:72005 Langhian 15.98 13.82 Barnes 1972 1972 ref:4384 Chordata Mammalia Carnivora Desmatophocidae Allodesmus missing missing -118.985 35.3861 CAS locality 275 missing US California Kern based on political unit seconds missing missing outcrop Kern River, Miocene, 1 mi west of Kern River, 4 miles above oil city, Sec. 28, T 28 S, R 28 E gplates 130 mid -111.56 38.5 US missing missing missing missing missing missing missing missing missing missing missing missing missing Round Mountain missing missing missing missing missing missing missing missing missing missing missing Age after Scheirer and Magoon 2007 siltstone marine indet. missing macrofossils missing missing missing missing missing missing missing body ⋯
9 occ:150060 missing missing col:13293 Panthera pardus species txn:72185 Panthera pardus missing species txn:104158 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 ref:4412 Chordata Mammalia Carnivora Felidae Panthera missing missing 111.567 22.7667 Xiashan Cave, lower part (Guangdong Province) missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing missing missing missing body ⋯
10 occ:150061 missing missing col:13293 Panthera tigris species txn:90651 Panthera tigris missing species txn:104157 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 ref:4412 Chordata Mammalia Carnivora Felidae Panthera missing missing 111.567 22.7667 Xiashan Cave, lower part (Guangdong Province) missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing missing missing missing body ⋯
11 occ:150062 missing missing col:13293 Crocuta crocuta species txn:53881 Crocuta crocuta missing species txn:232133 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 ref:4412 Chordata Mammalia Carnivora Hyaenidae Crocuta missing missing 111.567 22.7667 C. crocuta ultima Xiashan Cave, lower part (Guangdong Province) missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing missing missing missing body ⋯
12 occ:150063 missing missing col:13293 Paguma larvata species txn:232934 Paguma larvata missing species txn:232933 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 ref:4412 Chordata Mammalia Carnivora Viverridae Paguma missing missing 111.567 22.7667 Xiashan Cave, lower part (Guangdong Province) missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing missing missing missing body ⋯
13 occ:150064 missing missing col:13293 Herpestes cf. urva species txn:240142 Herpestes urva missing species txn:240141 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 ref:4412 Chordata Mammalia Carnivora Herpestidae Herpestes missing missing 111.567 22.7667 Xiashan Cave, lower part (Guangdong Province) missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing missing missing missing body ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
686 occ:183585 missing col:17923 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.8 Havorka Quarry missing Hovorka's Quarry; UNSM Bx-21 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
687 occ:183587 missing col:17924 Desmocyon thomsoni species txn:45611 Desmocyon thomsoni missing species txn:45465 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.8 Hay Springs Creek (Runningwater) missing US Nebraska Dawes based on political unit 1 missing gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
688 occ:183607 missing col:17927 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.8 "'B' Quarry" Hemingford B Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
689 occ:183608 missing col:17927 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.8 "'B' Quarry" Hemingford B Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
690 occ:183615 missing col:17928 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.8 "'C' Quarry" Hemingford C Quarry missing US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -94.98 46.73 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
691 occ:183616 missing col:17929 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 0 missing UNSM Bx-0 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
692 occ:183622 missing col:17932 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
693 occ:183623 missing col:17932 Phlaocyon marslandensis species txn:50347 Phlaocyon marslandensis missing species txn:50347 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Phlaocyon missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
694 occ:183624 missing col:17932 Desmocyon matthewi species txn:45610 Desmocyon matthewi missing species txn:45610 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Desmocyon missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
695 occ:183625 missing col:17932 Metatomarctus canavus species txn:48462 Metatomarctus canavus missing species txn:45454 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Metatomarctus missing missing -103.1 42.2 Hemingford Quarry 7 missing UNSM Bx-7 US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
696 occ:183626 rei:30999 missing col:17933 Daphoenodon (Borocyon) robustum species txn:318454 Daphoenodon (Borocyon) robustum missing species txn:44589 Early Hemingfordian 18.5 16.3 Hunt 2009 2009 ref:54871 Chordata Mammalia Carnivora Amphicyonidae Daphoenodon (Borocyon) missing missing -103.1 42.2 Hemingford Quarry 7A missing UNSM Bx-7A US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
697 occ:183627 missing col:17934 Cynarctoides acridens species txn:45433 Cynarctoides acridens missing species txn:45441 Early Hemingfordian 18.5 16.3 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Cynarctoides missing missing -103.1 42.2 Hemingford Quarry 7B missing UNSM Bx-7B US Nebraska Box Butte based on political unit 1 missing small collection gplates 101 mid -95.06 46.14 US missing missing missing missing missing missing Runningwater missing missing PineR 16 bottom to top missing missing missing missing not reported terrestrial indet. missing macrofossils missing missing missing body ⋯
Tip: Equivalent filtering approaches

DataFrames provides several syntactically different but logically equivalent ways to select rows. The following all produce the same result:

## Boolean indexing
occs_species = occs[occs.accepted_rank .== "species", :]

## filter with a row-function
occs_species = filter(r -> r.accepted_rank == "species", occs)

## subset with a column-function
occs_species = subset(occs, :accepted_rank => ByRow(r -> r == "species"))

## subset with a partially applied operator
occs_species = subset(occs, :accepted_rank => ByRow(==("species")))

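filter passes each row to the function as a DataFrameRow, which makes it natural when the condition involves several columns at once. subset takes column => function pairs: each function receives the named column as a vector (ByRow applies it element by element), and a row is kept only if every pair returns true for it. That makes subset more composable when building up complex multi-column filters incrementally.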
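As a minimal sketch of that difference, the two-condition filter below is written both ways. The table toy and its values are made up for illustration and are not PBDB records.

```julia
using DataFrames

# Toy stand-in for the occurrence table; values are illustrative only.
toy = DataFrame(
    accepted_rank = ["species", "genus", "species", "species"],
    max_ma = [27.3, 11.63, 18.5, 0.774],
    min_ma = [23.04, 7.246, 12.5, 0.0117],
)

# filter: one function sees the whole row, so both conditions sit together.
by_filter = filter(r -> r.accepted_rank == "species" && r.max_ma < 20, toy)

# subset: one `column => function` pair per condition; a row is kept only
# if every pair returns true for it.
by_subset = subset(
    toy,
    :accepted_rank => ByRow(==("species")),
    :max_ma => ByRow(<(20)),
)
```

Both calls keep the same two rows here. With subset, a further condition can be added as one more pair without touching the existing ones.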

2.6.2 Chronological precision

All PBDB columns that encode ages contain _ma in their names (Ma: millions of years ago). Selecting them with a regular expression gives a quick overview:

occs_species[:, r".*_ma.*"]
697×14 DataFrame
672 rows omitted
Row max_ma min_ma direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method
Float64 Float64 Float64? Float64? String3? String7? Float64? Missing String3? String7? Float64? Float64? String3? String7?
1 37.71 33.9 missing missing missing missing missing missing missing missing missing missing missing missing
2 27.3 23.04 missing missing missing missing missing missing missing missing missing missing missing missing
3 27.3 23.04 missing missing missing missing missing missing missing missing missing missing missing missing
4 11.63 7.246 missing missing missing missing missing missing missing missing missing missing missing missing
5 11.63 7.246 missing missing missing missing missing missing missing missing missing missing missing missing
6 11.63 7.246 missing missing missing missing missing missing missing missing missing missing missing missing
7 18.5 12.5 missing missing missing missing missing missing missing missing missing missing missing missing
8 15.98 13.82 missing missing missing missing missing missing missing missing missing missing missing missing
9 0.774 0.0117 missing missing missing missing missing missing missing missing missing missing missing missing
10 0.774 0.0117 missing missing missing missing missing missing missing missing missing missing missing missing
11 0.774 0.0117 missing missing missing missing missing missing missing missing missing missing missing missing
12 0.774 0.0117 missing missing missing missing missing missing missing missing missing missing missing missing
13 0.774 0.0117 missing missing missing missing missing missing missing missing missing missing missing missing
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
686 18.5 16.3 missing missing missing missing missing missing
687 18.5 16.3 missing missing missing missing missing missing
688 18.5 16.3 missing missing missing missing missing missing
689 18.5 16.3 missing missing missing missing missing missing
690 18.5 16.3 missing missing missing missing missing missing
691 18.5 16.3 missing missing missing missing missing missing
692 18.5 16.3 missing missing missing missing missing missing
693 18.5 16.3 missing missing missing missing missing missing
694 18.5 16.3 missing missing missing missing missing missing
695 18.5 16.3 missing missing missing missing missing missing
696 18.5 16.3 missing missing missing missing missing missing
697 18.5 16.3 missing missing missing missing missing missing

There are several trade-offs when choosing which age columns to require:

  • direct_ma_value is populated when a collection has been directly dated (e.g. by radiometric methods): highest precision, lowest coverage.
  • max_ma / min_ma bracket the age of the stratigraphic unit in which the fossil was found: broader coverage, lower precision.
  • direct_ma_error, max_ma_error, and min_ma_error capture the measurement uncertainty of the corresponding values and are rarely populated.

Check the data coverage for each:

println("Direct MA measurements: ", nrow(dropmissing(occs_species, r".*direct_ma.*")))
println("Max MA measurements:    ", nrow(dropmissing(occs_species, r".*max_ma.*")))
println("Min MA measurements:    ", nrow(dropmissing(occs_species, r".*min_ma.*")))
println("MA error measurements:  ", nrow(dropmissing(occs_species, r".*ma_error.*")))
Direct MA measurements: 11
Max MA measurements:    0
Min MA measurements:    12
MA error measurements:  0
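Each count above is the number of rows that are complete across every column the regex matches. A per-column tally can make it easier to see which column is responsible for a low count. The following is a sketch on a made-up table; coverage is a hypothetical helper, not part of DataFrames or of the workflow above.

```julia
using DataFrames

# Toy table with the same kind of columns; values are illustrative only.
toy = DataFrame(
    max_ma = [27.3, 11.63, 18.5],
    min_ma = [23.04, 7.246, 12.5],
    direct_ma_value = [missing, 27.0, missing],
    max_ma_error = [missing, missing, missing],
)

# Hypothetical helper: number of non-missing entries in each selected column.
function coverage(df::AbstractDataFrame, cols)
    selected = df[!, cols]
    return DataFrame(
        column = names(selected),
        n_present = [count(!ismissing, col) for col in eachcol(selected)],
    )
end

cov = coverage(toy, r"_ma")
println(cov)

# Row-wise completeness over a regex is limited by the emptiest matched column.
n_complete_max = nrow(dropmissing(toy, r"max_ma"))
```

In this toy table max_ma is fully populated, yet n_complete_max is 0 because the regex also selects the empty max_ma_error column.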

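Each call above counts rows that are complete across every column matched by its regex, not only the column named in the label. That is why "Max MA measurements" is 0: r".*max_ma.*" also selects max_ma_error, which is entirely missing (its element type in the table above is Missing). The max_ma and min_ma columns themselves have element type Float64, so they are present for every record in this download. Requiring direct_ma_value maximizes precision but discards most records: only 11 of the 697 species-level records are directly dated. For exploratory work, min_ma and max_ma together are the practical choice. The call below additionally requires direct_ma_value, so it returns only the directly dated records; omit :direct_ma_value from the column list to keep every record that has a stratigraphic bracket: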

occs_with_ages = dropmissing(occs_species, [:direct_ma_value, :max_ma, :min_ma])
11×135 DataFrame
35 columns omitted
Row occurrence_no reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode preservation_quality ⋯
String15 String15? Missing String15 String String15 String15 String31? String Missing String15 String15 String31 String31? Float64 Float64 String31 Int64 String15 String15 String15 String15 String31 String31 Missing Int64? String15? Float64 Float64 String String Missing String? String3 String15 String31? String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64 Float64? String3? String7? Float64? Missing String3? String7? Float64? Float64? String3? String7? String31 String15? String? String15 Missing Missing String15? String15? String3? String15? Missing Missing Missing Missing String String String31 String String31? String31? String1 String15 String31 String15? String15? String1 String31? Missing String? String String7? String7? String7? String15? Missing Missing Missing String String7? ⋯
1 occ:181308 missing col:17476 Palaeogale minuta species txn:49650 Palaeogale minuta missing species txn:48859 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Palaeogalidae Palaeogale missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
2 occ:181309 missing col:17476 Acheronictis webbi n. sp. species txn:43716 Acheronictis webbi missing species txn:43716 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Acheronictis missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
3 occ:181310 missing col:17476 Arikarictis chapini n. sp. species txn:44318 Arikarictis chapini missing species txn:44318 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Arikarictis missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
4 occ:181311 missing col:17476 Enhydrocyon cf. pahinsintewakpa species txn:46086 Enhydrocyon pahinsintewakpa missing species txn:51890 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Enhydrocyon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
5 occ:181312 missing col:17476 Osbornodon wangi n. sp. species txn:49466 Osbornodon wangi missing species txn:49466 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Osbornodon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
6 occ:181313 missing col:17476 Phlaocyon taylori n. sp. species txn:50350 Phlaocyon taylori missing species txn:50350 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Phlaocyon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
7 occ:183071 missing col:17837 Mammacyon cf. obtusidens species txn:47861 Mammacyon obtusidens missing species txn:47861 Harrisonian 23.1 18.5 Frailey 1978 1978 ref:1543 Chordata Mammalia Carnivora Amphicyonidae Mammacyon missing 4 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
8 occ:183073 rei:5071 missing col:17837 Phlaocyon leucosteus species txn:50345 Phlaocyon leucosteus missing species txn:50345 Harrisonian 23.1 18.5 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Phlaocyon missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
9 occ:183074 rei:5072 missing col:17837 Megalictis frazieri species txn:48010 Megalictis frazieri missing species txn:50010 Harrisonian 23.1 18.5 Alroy 2002 2002 ref:6294 Chordata Mammalia Carnivora Mustelidae Megalictis missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
10 occ:183075 rei:5073 missing col:17837 Palaeogale minuta species txn:49650 Palaeogale minuta missing species txn:48859 Harrisonian 23.1 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Palaeogalidae Palaeogale missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
11 occ:183076 missing col:17837 Arikarictis chapini species txn:44318 Arikarictis chapini missing species txn:44318 Harrisonian 23.1 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Arikarictis missing missing -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯

2.6.3 Spatial completeness

The PBDB provides two sets of coordinates for each collection: modern geographic coordinates and reconstructed paleocoordinates (the location of the fossil site at the time of deposition, accounting for plate movement). Check the coverage of each:

modern_coords = nrow(dropmissing(occs_with_ages, [:lng, :lat]))
println("Records with modern coordinates:    ", modern_coords)

paleo_coords = nrow(dropmissing(occs_with_ages, [:paleolng, :paleolat]))
println("Records with paleo-coordinates:     ", paleo_coords)

complete_spatial = nrow(dropmissing(occs_with_ages, [:lng, :lat, :paleolng, :paleolat]))
println("Records with complete spatial data: ", complete_spatial)
Records with modern coordinates:    11
Records with paleo-coordinates:     11
Records with complete spatial data: 11
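
The three counts above follow one pattern: count the rows that survive dropmissing for a given set of columns. A minimal sketch of that pattern as a reusable helper is shown below. The table toy and the helper complete_rows are hypothetical names introduced only for illustration, and the toy values are made up so that the three counts differ.

```julia
using DataFrames

# Hypothetical toy table standing in for occs_with_ages (illustrative values only).
toy = DataFrame(
    lng      = [-82.2, missing, -83.0, -83.0],
    lat      = [28.5, 28.5, missing, 30.3],
    paleolng = [-71.63, -71.63, -72.16, missing],
    paleolat = [31.26, 31.26, 33.14, missing],
)

# Hypothetical helper: number of rows that are non-missing in every listed column.
complete_rows(df, cols) = nrow(dropmissing(df, cols))

# Per-column coverage: how many entries in each column are present.
coverage = [name => count(!ismissing, toy[!, name]) for name in names(toy)]

modern   = complete_rows(toy, [:lng, :lat])
paleo    = complete_rows(toy, [:paleolng, :paleolat])
complete = complete_rows(toy, [:lng, :lat, :paleolng, :paleolat])

println(coverage)
println("modern: ", modern, "  paleo: ", paleo, "  complete: ", complete)
```

In the real dataset all three counts are equal (11), so the filters remove nothing; on a larger download the per-column coverage shows which coordinate pair is responsible for any losses.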

For visualization on a present-day map, modern coordinates are sufficient. For paleogeographic analyses, paleocoordinates are essential. We require both:

occs_clean = dropmissing(occs_with_ages, [:lng, :lat, :paleolng, :paleolat])
11×135 DataFrame
35 columns omitted
Row occurrence_no reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode preservation_quality ⋯
String15 String15? Missing String15 String String15 String15 String31? String Missing String15 String15 String31 String31? Float64 Float64 String31 Int64 String15 String15 String15 String15 String31 String31 Missing Int64? String15? Float64 Float64 String String Missing String? String3 String15 String31? String31 String7 Int64? String7? String31 String String7 String31 String3 Float64 Float64 String3 String3? Float64 Float64? String3? String7? Float64? Missing String3? String7? Float64? Float64? String3? String7? String31 String15? String? String15 Missing Missing String15? String15? String3? String15? Missing Missing Missing Missing String String String31 String String31? String31? String1 String15 String31 String15? String15? String1 String31? Missing String? String String7? String7? String7? String15? Missing Missing Missing String String7? ⋯
1 occ:181308 missing col:17476 Palaeogale minuta species txn:49650 Palaeogale minuta missing species txn:48859 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Palaeogalidae Palaeogale missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
2 occ:181309 missing col:17476 Acheronictis webbi n. sp. species txn:43716 Acheronictis webbi missing species txn:43716 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Acheronictis missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
3 occ:181310 missing col:17476 Arikarictis chapini n. sp. species txn:44318 Arikarictis chapini missing species txn:44318 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Arikarictis missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
4 occ:181311 missing col:17476 Enhydrocyon cf. pahinsintewakpa species txn:46086 Enhydrocyon pahinsintewakpa missing species txn:51890 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Enhydrocyon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
5 occ:181312 missing col:17476 Osbornodon wangi n. sp. species txn:49466 Osbornodon wangi missing species txn:49466 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Osbornodon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
6 occ:181313 missing col:17476 Phlaocyon taylori n. sp. species txn:50350 Phlaocyon taylori missing species txn:50350 Arikareean 29.5 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Canidae Phlaocyon missing missing -82.2 28.5 Brooksville 2 missing US Florida Hernando based on political unit 1 missing outcrop Florida Rock Industries Limerock Mine, 8 km NE of Brooksville\\r\\nBrooksville 2 is a series of at least five fissure-fill deposits gplates 109 mid -71.63 31.26 US 27.0 1.0 Ma other missing missing missing missing group of beds missing missing missing missing missing missing Middle Arikareean\\r\\nThe age of the deposit was considered to be 26–25 Ma by Tedford et al. (2004) based on mammalian biochronology. Later analysis by Czaplewski and Morgan (2012) and Morgan et al. (2019) suggested a slightly older age range of 28–26 Ma laminated clay and sand claystone planar lamination Y sandstone Y fissure fill missing macrofossils,mesofossils missing missing missing missing body ⋯
7 occ:183071 missing col:17837 Mammacyon cf. obtusidens species txn:47861 Mammacyon obtusidens missing species txn:47861 Harrisonian 23.1 18.5 Frailey 1978 1978 ref:1543 Chordata Mammalia Carnivora Amphicyonidae Mammacyon missing 4 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
8 occ:183073 rei:5071 missing col:17837 Phlaocyon leucosteus species txn:50345 Phlaocyon leucosteus missing species txn:50345 Harrisonian 23.1 18.5 Wang et al. 1999 1999 ref:3558 Chordata Mammalia Carnivora Canidae Phlaocyon missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
9 occ:183074 rei:5072 missing col:17837 Megalictis frazieri species txn:48010 Megalictis frazieri missing species txn:50010 Harrisonian 23.1 18.5 Alroy 2002 2002 ref:6294 Chordata Mammalia Carnivora Mustelidae Megalictis missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
10 occ:183075 rei:5073 missing col:17837 Palaeogale minuta species txn:49650 Palaeogale minuta missing species txn:48859 Harrisonian 23.1 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Palaeogalidae Palaeogale missing 1 specimens -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯
11 occ:183076 missing col:17837 Arikarictis chapini species txn:44318 Arikarictis chapini missing species txn:44318 Harrisonian 23.1 18.5 Hayes 2000 2000 ref:4229 Chordata Mammalia Carnivora Mustelidae Arikarictis missing missing -83.0 30.3 SB-1A missing Live Oak US Florida Suwannee based on political unit 1 missing small collection 1 mi N of Live Oak gplates 109 mid -72.16 33.14 US 24.5 0.5 Ma unknown missing missing missing missing group of beds missing missing missing missing missing missing 25–24 Ma (Tedford et al., 2004)\\r\\nHarrisonian not reported terrestrial indet. missing macrofossils,mesofossils missing missing missing body ⋯

2.6.4 Taxonomic diversity after filtering

After all three quality filters, examine what diversity remains:

species_counts = combine(groupby(occs_clean, :accepted_name), nrow)
genus_counts   = combine(groupby(occs_clean, :genus), nrow)
family_counts  = combine(groupby(occs_clean, :family), nrow)

println("Species:  ", nrow(species_counts))
println("Genera:   ", nrow(genus_counts))
println("Families: ", nrow(family_counts))
Species:  9
Genera:   8
Families: 4
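
As a cross-check on the groupby/combine counts, the same numbers can be obtained by counting unique values directly. The sketch below re-enters only the accepted_name, genus, and family values of the 11 rows printed for occs_clean above; the table name rows and the helper n_unique are hypothetical, introduced for illustration.

```julia
using DataFrames

# accepted_name, genus, and family for the 11 rows of occs_clean,
# transcribed from the table printed above (rows 1 to 11, in order).
rows = DataFrame(
    accepted_name = [
        "Palaeogale minuta", "Acheronictis webbi", "Arikarictis chapini",
        "Enhydrocyon pahinsintewakpa", "Osbornodon wangi", "Phlaocyon taylori",
        "Mammacyon obtusidens", "Phlaocyon leucosteus", "Megalictis frazieri",
        "Palaeogale minuta", "Arikarictis chapini",
    ],
    genus = [
        "Palaeogale", "Acheronictis", "Arikarictis", "Enhydrocyon", "Osbornodon",
        "Phlaocyon", "Mammacyon", "Phlaocyon", "Megalictis", "Palaeogale", "Arikarictis",
    ],
    family = [
        "Palaeogalidae", "Mustelidae", "Mustelidae", "Canidae", "Canidae", "Canidae",
        "Amphicyonidae", "Canidae", "Mustelidae", "Palaeogalidae", "Mustelidae",
    ],
)

# Hypothetical helper: number of distinct values in one column.
n_unique(df, col) = length(unique(df[!, col]))

# Occurrences per family, with an explicit name for the count column.
family_counts = combine(groupby(rows, :family), nrow => :n)

println("Species:  ", n_unique(rows, :accepted_name))
println("Genera:   ", n_unique(rows, :genus))
println("Families: ", n_unique(rows, :family))
println(family_counts)
```

The per-family table makes the small sample size explicit: the occurrence counts across the four families sum to the 11 rows that survived filtering.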

2.7 Missing values in Julia

Tip: The Missing type

Julia represents absent data with the singleton value missing, which belongs to the type Missing. A column that might contain absent data has an element type like Union{Missing, Float64} — it can hold either a real number or the missing sentinel.

Arithmetic and comparisons that involve missing propagate it silently:

missing + 1.0    # → missing
missing > 0.0    # → missing

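This is intentional: the result of an operation on unknown data is itself unknown. The implication for filtering is that df[df.lng .> -180, :] does not work as an ordinary Boolean filter when lng contains missing values. For those rows the comparison returns missing rather than true or false, and DataFrames.jl raises an error for a row mask that contains missing instead of guessing whether to keep or drop them. Using dropmissing (above) is the explicit, correct approach.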
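
A minimal sketch of this three-valued behaviour, using only Base Julia, is shown below. The vector lng is a hypothetical stand-in for a coordinate column. The sketch also shows coalesce, which turns the three-valued comparison result into a plain Boolean mask by stating explicitly how unknown entries should be treated.

```julia
# Hypothetical longitudes with one absent entry.
lng = [-82.2, missing, -83.0]

# Element-wise comparison propagates missing: the mask holds true, missing, true.
raw_mask = lng .> -180

# Equality with missing is also unknown; use ismissing or isequal to test for it.
unknown_eq = missing == missing        # missing, not true
known_eq   = isequal(missing, missing) # true

# Make the decision explicit: treat "unknown" as "does not pass the filter".
bool_mask = coalesce.(raw_mask, false)

# Indexing with the Boolean mask keeps only the entries known to pass.
kept = lng[bool_mask]

println(raw_mask)
println(bool_mask)
println(kept)
```

Replacing missing with false is a choice, not a default: it says that a record with an unknown longitude should not be counted as satisfying the condition.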

The dropmissing function accepts an optional column list that restricts the check to only those columns — rows with missing in other columns are kept. This deliberate targeting is important: requiring every column to be non-missing is almost always too aggressive. In our quality assessment above, we applied it three times, each time targeting only the columns relevant to that dimension of quality.

Called with no column argument, dropmissing(df) removes any row with a missing value in any column. Use this form only when you genuinely need a fully complete dataset.
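
The difference between the two forms can be seen on a small table. The sketch below uses a hypothetical three-row DataFrame named toy; the column names mirror the PBDB columns, but the values are illustrative only.

```julia
using DataFrames

# Hypothetical toy table: row 1 lacks an abundance value, row 2 lacks a longitude.
toy = DataFrame(
    lng         = [-82.2, missing, -83.0],
    lat         = [28.5, 28.5, 30.3],
    abund_value = [missing, 1, 4],
)

# Targeted: only lng and lat must be present, so row 1 survives.
targeted = dropmissing(toy, [:lng, :lat])

# Strict: every column must be present, so only row 3 survives.
strict = dropmissing(toy)

println(nrow(targeted), " rows after targeted filtering")
println(nrow(strict), " rows after strict filtering")

# By default dropmissing also narrows the element type of the checked columns,
# while columns that were not checked may still hold missing.
println(eltype(targeted.lng))
println(eltype(targeted.abund_value))
```

The strict form discards row 1 only because of a missing abundance value, which is irrelevant to a spatial analysis; this is the kind of unnecessary loss that the targeted form avoids.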

Let us confirm that our final filtered dataset is free of missing values in the columns we care about:

describe(occs_clean[:, [:lng, :lat, :paleolng, :paleolat, :min_ma, :max_ma]])
6×7 DataFrame
 Row  variable  mean      min      median   max      nmissing  eltype
      Symbol    Float64   Float64  Float64  Float64  Int64     DataType
   1  lng       -82.5636  -83.0    -82.2    -82.2    0         Float64
   2  lat        29.3182   28.5     28.5     30.3    0         Float64
   3  paleolng  -71.8709  -72.16   -71.63   -71.63   0         Float64
   4  paleolat   32.1145   31.26    31.26    33.14   0         Float64
   5  min_ma     18.5      18.5     18.5     18.5    0         Float64
   6  max_ma     26.5909   23.1     29.5     29.5    0         Float64

3 Visualization of occurrences in space and time

The systematic quality assessment above applies to any clade. With the pipeline established, we can now focus on a single family for the visualization work that follows. Canidae — the family containing wolves, foxes, and their extinct relatives — has a well-sampled fossil record extending back to the Eocene, making it a useful working example.

We pull the full Canidae dataset:

canids = pbdb_occurrences(
    base_name = "Canidae",
    show      = "full",
    vocab     = "pbdb",
)
3483×136 DataFrame
36 columns and 3458 rows omitted
Row occurrence_no record_type reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode ⋯
Int64 String3 Int64? Missing Int64 String String15 Int64 String31? String Missing String15 Int64 String31 String31 Float64 Float64 String Int64 Int64 String15 String15 String15 String7 String15 Missing String7? String15? Float64 Float64 String? String Int64? String? String3 String String31 String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64? Float64? String3? String31? Float64? Float64? String3? String31? Float64? Float64? String3? String31? String31? String31? String31? String15? String? String15? String? String31 String3? String15? String31? String15? String1? String15? String String String String String31 String31 String1 String31? String? String31? String31? String1? String31 String31? String String String7? String7? String? String31? String? String? String? String ⋯
1 150070 occ missing missing 13293 Cuon sp. genus 41204 missing Cuon missing genus 41204 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 4412 Chordata Mammalia Carnivora Canidae Cuon missing missing 111.567 22.7667 missing Xiashan Cave, lower part (Guangdong Province) missing missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y missing missing missing missing terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing body ⋯
2 176227 occ 3083 missing 16626 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Late Uintan 45.9 39.7 Bryant 1991 1991 1139 Chordata Mammalia Carnivora Canidae Hesperocyon missing 1 specimens -107.6 50.2 missing Swift Current Creek missing missing CA Saskatchewan based on nearby landmark 1 792 meters outcrop 13 mi. ESE of Swift Current gplates 101 mid -83.75 56.77 CA missing missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing group of beds missing missing missing missing missing missing locality occurs "in the upper beds" of a 15 m section "unconsolidated lens" in a formation of "sands, gravels, and conglomerates and sandstones with calcareous cement" sandstone conglomerate missing missing missing fluvial indet. missing environment of formation was "braided streams" macrofossils,mesofossils missing missing missing missing body ⋯
3 177551 occ missing missing 16840 ? Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Duchesnean Chadronian 39.7 33.9 Tabrum et al. 1996 1996 3359 Chordata Mammalia Carnivora Canidae Hesperocyon missing -112.8 45.0 missing Diamond O Ranch missing MV 6726, MV 6727, MV 6728, MV 6729, MV 6730 US Montana Beaverhead based on political unit 1 missing small collection 2 mi W of Beaverheard Rock gplates 127 mid -95.46 51.92 US FED missing missing missing missing missing missing missing missing missing missing missing missing Renova missing Climbing Arrow missing JeffB 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
4 177611 occ 3336 missing 16845 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Duchesnean 39.7 37.0 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.9 50.0 missing Lac Pelletier Lower Fauna missing SMNH Locs. 72G13-001, -003, -005, -006, -009, 72J04-001, 004, -006 CA Saskatchewan based on nearby landmark 1 missing small collection 23 km SSW of Swift Current gplates 101 mid -86.94 56.36 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing LacPl 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
5 177863 occ 3406 missing 16888 Prohesperocyon wilsoni species 50768 missing Prohesperocyon wilsoni missing species 46917 Chadronian 37.0 33.9 Gustafson 1986 1986 1758 Chordata Mammalia Carnivora Canidae Prohesperocyon missing -104.2 29.9 missing Airstrip (TMM 40504) missing US Texas Presidio based on political unit 1 missing small collection gplates 101 mid -89.99 35.83 US missing missing missing missing missing missing missing missing missing missing missing missing Capote Mountain Tuff missing missing Vieja 8 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
6 177871 occ missing missing 16891 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 "Chadronian" from 6 mi NW and 3 mi NW of Alcova Alcova missing US Wyoming Natrona based on political unit 1 missing Bates Hole gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
7 177897 occ missing missing 16896 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Harshman 1972 1972 1803 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 includes "H. cf. paterculus" Bates Hole Lower White River 1 missing US Wyoming Natrona based on political unit 1 missing gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing Bates 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
8 177901 occ 3419 missing 16898 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Alroy 2002 2002 6294 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 Bates Hole (Reed Collection) missing US Wyoming Natrona based on political unit 1 missing county assignment uncertain gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
9 177975 occ missing missing 16917 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Chadronian 37.0 33.9 Ostrander 1985 1985 2716 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.8 42.4 Bone Cove missing US Nebraska Sioux based on political unit 1 missing gplates 101 mid -86.81 48.11 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing Big Cottonwood Creek missing PineR 1 missing bottom to top missing missing missing member assignment based on LaGarry pers. commun. 19 May 2005 missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
10 177988 occ missing missing 16918 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Gustafson 1986 1986 1757 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.1 42.8 specimen from KU-NE-083 "Chadronian" of Walter Brecht Ranch listed by Wang 1994 Brecht Ranch missing TNAS US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -85.94 48.41 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing missing PineR 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
11 178029 occ missing missing 16919 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae Hesperocyon missing -109.883 49.5667 see Russell 1972: includes "Alloeodectes mcgrewi" of Russell 1984 according to Bryant 1991, Wang 1994; also includes cf. "Hyaenodon minutus" according to Bryant 1993 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
12 178030 occ missing missing 16919 Canidae indet. family 41189 missing Canidae missing family 41189 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae missing -109.883 49.5667 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
13 178086 occ missing missing 16920 Pseudocynodictis cf. paterculus species 51185 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Van Houten 1964 1964 3479 Chordata Mammalia Carnivora Canidae Hesperocyon missing -108.2 43.2 Cameron Spring missing Locality 19; Cameron Springs US Wyoming Fremont based on political unit 1 missing small collection gplates 101 mid -91.42 49.47 US missing missing missing missing missing missing missing missing missing missing missing missing White River missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
3472 2041254 occ missing missing 196908 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry and Arruda 2013 2013 93416 Chordata Mammalia Carnivora Canidae Canis missing -8.64325 40.7527 Monte Moliäo missing PT based on nearby landmark 5 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing 400 BC–AD 0; AD 0–200 not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3473 2041319 occ missing missing 275782 Canis sp. genus 41198 Canis missing genus 41198 Holocene 0.0117 0.0 Davis 2006 2006 93418 Chordata Mammalia Carnivora Canidae Canis missing -8.26563 39.9116 Alcáçova de Santarém missing PT Leiria based on political unit 6 missing Santarém, on the right bank of the Tagus River, is 78 kilometres northeast of Lisbon gplates not computable using this model mid missing missing PT missing missing missing missing missing missing They derive from 18 levels — most dated to the Iron Age, Roman and Moslem periods. not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3474 2041333 occ missing missing 275784 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Pereira 2013 2013 93419 Chordata Mammalia Carnivora Canidae Canis missing -8.2015 37.1794 Castelo de Paderne missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3475 2045484 occ missing missing 276279 Canis lupus species 44860 Canis lupus missing species 44860 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Canis missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3476 2045485 occ missing missing 276279 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Vulpes missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3477 2046556 occ missing missing 276394 Canis familiearis genus 41198 species not entered Canis missing genus 41198 Holocene 0.0117 0.0 Gomez et al. 2001 2001 93700 Chordata Mammalia Carnivora Canidae Canis missing -8.43822 37.1892 Silves missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3478 2046644 occ missing missing 276407 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Canis missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3479 2046651 occ missing missing 276407 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Vulpes missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3480 2046774 occ missing missing 276424 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry et al. 2021 2021 93709 Chordata Mammalia Carnivora Canidae Canis missing -9.18333 38.6667 Largo do Coreto missing Bandstand square, Carnide PT Setúbal District based on nearby landmark 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3481 2048227 occ missing missing 276686 Canis cf. mosbachensis species 366711 Canis mosbachensis missing species 366711 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Canis missing 11.1153 48.9103 "[right p4, 2 right m1, left mandibular fragment with m1-talonid and m2]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3482 2048228 occ missing missing 276686 Alopex aff. praeglacialis species 474917 Alopex praeglacialis missing species 474917 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Vulpes missing 11.1153 48.9103 "[right mandibular segment with almost fresh m1-2, and alveoli of p1-4 and m3, and corresponding left mandibular segment with p1-3 alveoli]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3483 2056564 occ missing missing 277762 Phlaocyon taylori species 50350 Phlaocyon taylori missing species 50350 late Early Arikareean 29.5 18.5 Albright et al. 2026 2026 94175 Chordata Mammalia Carnivora Canidae Phlaocyon missing -88.6628 31.657 Jones Branch Local Fauna missing US Mississippi Wayne County based on nearby landmark 6 missing gplates 109 mid -78.61 35.05 US missing missing missing missing missing missing Catahoula In this area, the Catahoula Formation is composed of unweathered gray-/green-colored, fissile clays interspersed with interbedded distributary channel and thick fine-grained to coarse graveliferous sands of an emergent delta with marginal marine, brackish water, and terrestrial influences. claystone terrestrial indet. macrofossils,mesofossils body ⋯
size(canids)
(3483, 136)

Drop rows missing either coordinate:

canids_geo = dropmissing(canids, [:lng, :lat])
3483×136 DataFrame
36 columns and 3458 rows omitted
Row occurrence_no record_type reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode ⋯
Int64 String3 Int64? Missing Int64 String String15 Int64 String31? String Missing String15 Int64 String31 String31 Float64 Float64 String Int64 Int64 String15 String15 String15 String7 String15 Missing String7? String15? Float64 Float64 String? String Int64? String? String3 String String31 String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64? Float64? String3? String31? Float64? Float64? String3? String31? Float64? Float64? String3? String31? String31? String31? String31? String15? String? String15? String? String31 String3? String15? String31? String15? String1? String15? String String String String String31 String31 String1 String31? String? String31? String31? String1? String31 String31? String String String7? String7? String? String31? String? String? String? String ⋯
1 150070 occ missing missing 13293 Cuon sp. genus 41204 missing Cuon missing genus 41204 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 4412 Chordata Mammalia Carnivora Canidae Cuon missing missing 111.567 22.7667 missing Xiashan Cave, lower part (Guangdong Province) missing missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y missing missing missing missing terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing body ⋯
2 176227 occ 3083 missing 16626 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Late Uintan 45.9 39.7 Bryant 1991 1991 1139 Chordata Mammalia Carnivora Canidae Hesperocyon missing 1 specimens -107.6 50.2 missing Swift Current Creek missing missing CA Saskatchewan based on nearby landmark 1 792 meters outcrop 13 mi. ESE of Swift Current gplates 101 mid -83.75 56.77 CA missing missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing group of beds missing missing missing missing missing missing locality occurs "in the upper beds" of a 15 m section "unconsolidated lens" in a formation of "sands, gravels, and conglomerates and sandstones with calcareous cement" sandstone conglomerate missing missing missing fluvial indet. missing environment of formation was "braided streams" macrofossils,mesofossils missing missing missing missing body ⋯
3 177551 occ missing missing 16840 ? Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Duchesnean Chadronian 39.7 33.9 Tabrum et al. 1996 1996 3359 Chordata Mammalia Carnivora Canidae Hesperocyon missing -112.8 45.0 missing Diamond O Ranch missing MV 6726, MV 6727, MV 6728, MV 6729, MV 6730 US Montana Beaverhead based on political unit 1 missing small collection 2 mi W of Beaverheard Rock gplates 127 mid -95.46 51.92 US FED missing missing missing missing missing missing missing missing missing missing missing missing Renova missing Climbing Arrow missing JeffB 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
4 177611 occ 3336 missing 16845 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Duchesnean 39.7 37.0 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.9 50.0 missing Lac Pelletier Lower Fauna missing SMNH Locs. 72G13-001, -003, -005, -006, -009, 72J04-001, 004, -006 CA Saskatchewan based on nearby landmark 1 missing small collection 23 km SSW of Swift Current gplates 101 mid -86.94 56.36 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing LacPl 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
5 177863 occ 3406 missing 16888 Prohesperocyon wilsoni species 50768 missing Prohesperocyon wilsoni missing species 46917 Chadronian 37.0 33.9 Gustafson 1986 1986 1758 Chordata Mammalia Carnivora Canidae Prohesperocyon missing -104.2 29.9 missing Airstrip (TMM 40504) missing US Texas Presidio based on political unit 1 missing small collection gplates 101 mid -89.99 35.83 US missing missing missing missing missing missing missing missing missing missing missing missing Capote Mountain Tuff missing missing Vieja 8 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
6 177871 occ missing missing 16891 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 "Chadronian" from 6 mi NW and 3 mi NW of Alcova Alcova missing US Wyoming Natrona based on political unit 1 missing Bates Hole gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
7 177897 occ missing missing 16896 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Harshman 1972 1972 1803 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 includes "H. cf. paterculus" Bates Hole Lower White River 1 missing US Wyoming Natrona based on political unit 1 missing gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing Bates 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
8 177901 occ 3419 missing 16898 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Alroy 2002 2002 6294 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 Bates Hole (Reed Collection) missing US Wyoming Natrona based on political unit 1 missing county assignment uncertain gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
9 177975 occ missing missing 16917 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Chadronian 37.0 33.9 Ostrander 1985 1985 2716 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.8 42.4 Bone Cove missing US Nebraska Sioux based on political unit 1 missing gplates 101 mid -86.81 48.11 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing Big Cottonwood Creek missing PineR 1 missing bottom to top missing missing missing member assignment based on LaGarry pers. commun. 19 May 2005 missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
10 177988 occ missing missing 16918 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Gustafson 1986 1986 1757 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.1 42.8 specimen from KU-NE-083 "Chadronian" of Walter Brecht Ranch listed by Wang 1994 Brecht Ranch missing TNAS US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -85.94 48.41 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing missing PineR 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
11 178029 occ missing missing 16919 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae Hesperocyon missing -109.883 49.5667 see Russell 1972: includes "Alloeodectes mcgrewi" of Russell 1984 according to Bryant 1991, Wang 1994; also includes cf. "Hyaenodon minutus" according to Bryant 1993 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
12 178030 occ missing missing 16919 Canidae indet. family 41189 missing Canidae missing family 41189 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae missing -109.883 49.5667 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
13 178086 occ missing missing 16920 Pseudocynodictis cf. paterculus species 51185 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Van Houten 1964 1964 3479 Chordata Mammalia Carnivora Canidae Hesperocyon missing -108.2 43.2 Cameron Spring missing Locality 19; Cameron Springs US Wyoming Fremont based on political unit 1 missing small collection gplates 101 mid -91.42 49.47 US missing missing missing missing missing missing missing missing missing missing missing missing White River missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
3472 2041254 occ missing missing 196908 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry and Arruda 2013 2013 93416 Chordata Mammalia Carnivora Canidae Canis missing -8.64325 40.7527 Monte Moliäo missing PT based on nearby landmark 5 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing 400 BC–AD 0; AD 0–200 not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3473 2041319 occ missing missing 275782 Canis sp. genus 41198 Canis missing genus 41198 Holocene 0.0117 0.0 Davis 2006 2006 93418 Chordata Mammalia Carnivora Canidae Canis missing -8.26563 39.9116 Alcáçova de Santarém missing PT Leiria based on political unit 6 missing Santarém, on the right bank of the Tagus River, is 78 kilometres northeast of Lisbon gplates not computable using this model mid missing missing PT missing missing missing missing missing missing They derive from 18 levels — most dated to the Iron Age, Roman and Moslem periods. not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3474 2041333 occ missing missing 275784 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Pereira 2013 2013 93419 Chordata Mammalia Carnivora Canidae Canis missing -8.2015 37.1794 Castelo de Paderne missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3475 2045484 occ missing missing 276279 Canis lupus species 44860 Canis lupus missing species 44860 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Canis missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3476 2045485 occ missing missing 276279 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Vulpes missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3477 2046556 occ missing missing 276394 Canis familiearis genus 41198 species not entered Canis missing genus 41198 Holocene 0.0117 0.0 Gomez et al. 2001 2001 93700 Chordata Mammalia Carnivora Canidae Canis missing -8.43822 37.1892 Silves missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3478 2046644 occ missing missing 276407 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Canis missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3479 2046651 occ missing missing 276407 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Vulpes missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3480 2046774 occ missing missing 276424 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry et al. 2021 2021 93709 Chordata Mammalia Carnivora Canidae Canis missing -9.18333 38.6667 Largo do Coreto missing Bandstand square, Carnide PT Setúbal District based on nearby landmark 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3481 2048227 occ missing missing 276686 Canis cf. mosbachensis species 366711 Canis mosbachensis missing species 366711 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Canis missing 11.1153 48.9103 "[right p4, 2 right m1, left mandibular fragment with m1-talonid and m2]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3482 2048228 occ missing missing 276686 Alopex aff. praeglacialis species 474917 Alopex praeglacialis missing species 474917 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Vulpes missing 11.1153 48.9103 "[right mandibular segment with almost fresh m1-2, and alveoli of p1-4 and m3, and corresponding left mandibular segment with p1-3 alveoli]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3483 2056564 occ missing missing 277762 Phlaocyon taylori species 50350 Phlaocyon taylori missing species 50350 late Early Arikareean 29.5 18.5 Albright et al. 2026 2026 94175 Chordata Mammalia Carnivora Canidae Phlaocyon missing -88.6628 31.657 Jones Branch Local Fauna missing US Mississippi Wayne County based on nearby landmark 6 missing gplates 109 mid -78.61 35.05 US missing missing missing missing missing missing Catahoula In this area, the Catahoula Formation is composed of unweathered gray-/green-colored, fissile clays interspersed with interbedded distributary channel and thick fine-grained to coarse graveliferous sands of an emergent delta with marginal marine, brackish water, and terrestrial influences. claystone terrestrial indet. macrofossils,mesofossils body ⋯
nrow(canids) - nrow(canids_geo)   # rows removed
0
nrow(canids_geo)
3483
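
No rows were removed here because, as the column-type row of the table above shows, lng and lat are already plain Float64 columns with no missing values. To see what dropmissing does when coordinates really are absent, here is a minimal sketch on a small made-up table. The table toy and its values are hypothetical and are not drawn from the canid data.

```julia
using DataFrames

# Hypothetical toy table: row 2 lacks a longitude, row 3 lacks a latitude
toy = DataFrame(
    id  = 1:4,
    lng = [10.5, missing, -8.2, 111.6],
    lat = [48.9, 39.9, missing, 22.8],
)

# Keep only rows where both coordinate columns are present
toy_geo = dropmissing(toy, [:lng, :lat])

# Count how many rows were discarded
n_removed = nrow(toy) - nrow(toy_geo)

println("rows removed:  ", n_removed)
println("remaining ids: ", toy_geo.id)
println("lng eltype:    ", eltype(toy_geo.lng))
```

By default dropmissing also narrows the element type of the listed columns so that they no longer admit missing, which is convenient for plotting functions that expect plain numbers.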

Confirm the coordinate columns are clean (nmissing should be 0 and eltype should be plain Float64):

describe(canids_geo[:, [:lng, :lat]])
2×7 DataFrame
Row variable mean min median max nmissing eltype
Symbol Float64 Float64 Float64 Float64 Int64 DataType
1 lng -67.5731 -172.993 -100.8 169.848 0 Float64
2 lat 33.6259 -53.3833 40.8 83.1012 0 Float64
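
The summary above shows every longitude within [-180, 180] and every latitude within [-90, 90], which is what valid geographic coordinates require. If you want to make that check explicit rather than reading it off the min and max columns, something along the following lines may help. This is a sketch on hypothetical values, and the names pts and suspect are made up for illustration.

```julia
using DataFrames

# Hypothetical coordinates; the third row has an impossible latitude
pts = DataFrame(
    lng = [-107.6, 11.1, 169.8],
    lat = [50.2, 48.9, 95.0],
)

# Valid geographic ranges: longitude in [-180, 180], latitude in [-90, 90]
valid_lng = -180 .<= pts.lng .<= 180
valid_lat = -90 .<= pts.lat .<= 90

# A row is acceptable only if both coordinates are in range
in_range = valid_lng .& valid_lat

# Collect the rows that fail the check for manual inspection
suspect = pts[.!in_range, :]

println("suspect rows: ", nrow(suspect))
```

Flagging suspect rows for inspection, rather than silently deleting them, keeps the decision about what to do with them visible in the workflow.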

Each occurrence record carries two age fields: min_ma (the minimum estimated age in millions of years before present, i.e. the younger bound) and max_ma (the maximum estimated age, i.e. the older bound). These bounds reflect the age range of the stratigraphic unit in which the fossil was found; they are not point estimates.

describe(canids_geo[:, [:min_ma, :max_ma, :early_interval, :late_interval]])
4×7 DataFrame
Row variable mean min median max nmissing eltype
Symbol Union… Any Union… Any Int64 DataType
1 min_ma 7.6434 0.0 3.6 39.7 0 Float64
2 max_ma 10.4491 0.0117 9.4 56.0 0 Float64
3 early_interval Arikareean late Late Arikareean 0 String31
4 late_interval late Early Hemphillian 0 String31
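
Because min_ma and max_ma are bounds, two quick checks are often worth running: that the older bound is never younger than the younger bound, and how wide each range is. The sketch below does both on a few (min_ma, max_ma) pairs copied from rows displayed earlier on this page. The table name ages and the column range_ma are hypothetical names introduced only for this illustration.

```julia
using DataFrames

# A few (min_ma, max_ma) pairs as they appear in the records shown above
ages = DataFrame(
    min_ma = [0.0, 0.0117, 33.9],
    max_ma = [0.0117, 0.129, 37.0],
)

# The older bound (max_ma) should never be smaller than the younger bound (min_ma)
bounds_ok = all(ages.max_ma .>= ages.min_ma)

# Width of each age range, in millions of years
ages.range_ma = ages.max_ma .- ages.min_ma

println("bounds consistent: ", bounds_ok)
println(ages)
```

The width of the range gives a rough sense of how much temporal uncertainty each record carries, which is worth keeping in mind when a single summary age is used for plotting.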

A natural summary age for plotting is the midpoint of the range, (min_ma + max_ma) / 2. We add this as a derived column, mid_ma, using transform:

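# The anonymous function receives the two whole columns as vectors,
# so the arithmetic is broadcast element-wise to give one midpoint per row.
canids_geo = transform(
    canids_geo,
    [:min_ma, :max_ma] => ((lo, hi) -> (lo .+ hi) ./ 2) => :mid_ma
)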
3483×137 DataFrame
37 columns and 3458 rows omitted
Row occurrence_no record_type reid_no flags collection_no identified_name identified_rank identified_no difference accepted_name accepted_attr accepted_rank accepted_no early_interval late_interval max_ma min_ma ref_author ref_pubyr reference_no phylum class order family genus plant_organ abund_value abund_unit lng lat occurrence_comments collection_name container_no collection_aka cc state county latlng_basis latlng_precision altitude_value altitude_unit geogscale geogcomments paleomodel geoplate paleoage paleolng paleolat cc_1 protected direct_ma_value direct_ma_error direct_ma_unit direct_ma_method max_ma_value max_ma_error max_ma_unit max_ma_method min_ma_value min_ma_error min_ma_unit min_ma_method formation geological_group member stratscale zone zone_type localsection localbed localbedunit localorder regionalsection regionalbed regionalbedunit regionalorder stratcomments lithdescript lithology1 lithadj1 lithification1 minor_lithology1 fossilsfrom1 lithology2 lithadj2 lithification2 minor_lithology2 fossilsfrom2 environment tectonic_setting geology_comments size_classes articulated_parts associated_parts common_body_parts rare_body_parts feed_pred_traces artifacts component_comments pres_mode ⋯
Int64 String3 Int64? Missing Int64 String String15 Int64 String31? String Missing String15 Int64 String31 String31 Float64 Float64 String Int64 Int64 String15 String15 String15 String7 String15 Missing String7? String15? Float64 Float64 String? String Int64? String? String3 String String31 String31 String7 Int64? String7? String31 String String7 String31 String3 Float64? Float64? String3 String3? Float64? Float64? String3? String31? Float64? Float64? String3? String31? Float64? Float64? String3? String31? String31? String31? String31? String15? String? String15? String? String31 String3? String15? String31? String15? String1? String15? String String String String String31 String31 String1 String31? String? String31? String31? String1? String31 String31? String String String7? String7? String? String31? String? String? String? String ⋯
1 150070 occ missing missing 13293 Cuon sp. genus 41204 missing Cuon missing genus 41204 Middle Pleistocene Late Pleistocene 0.774 0.0117 Huang et al. 1988 1988 4412 Chordata Mammalia Carnivora Canidae Cuon missing missing 111.567 22.7667 missing Xiashan Cave, lower part (Guangdong Province) missing missing CN Guangdong Luoding based on nearby landmark minutes missing missing small collection Xiashan Cave, Xiashan River Valley, at the western side of Dayunwu Mountain, 4 km from the Capital of Pingtang District (Southwest of Luoding). Lat long is for Luoding.\\n gplates not computable using this model mid missing missing CN missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing lower layer missing missing missing missing missing "The sediments of the site belong to cave accumulations..."\\nlate middle Pleistocene See stratigraphic column, Fig. 2 claystone brown,yellow unlithified sandy Y missing missing missing missing terrestrial indet. missing cave deposits macrofossils,mesofossils missing missing missing missing body ⋯
2 176227 occ 3083 missing 16626 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Late Uintan 45.9 39.7 Bryant 1991 1991 1139 Chordata Mammalia Carnivora Canidae Hesperocyon missing 1 specimens -107.6 50.2 missing Swift Current Creek missing missing CA Saskatchewan based on nearby landmark 1 792 meters outcrop 13 mi. ESE of Swift Current gplates 101 mid -83.75 56.77 CA missing missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing group of beds missing missing missing missing missing missing locality occurs "in the upper beds" of a 15 m section "unconsolidated lens" in a formation of "sands, gravels, and conglomerates and sandstones with calcareous cement" sandstone conglomerate missing missing missing fluvial indet. missing environment of formation was "braided streams" macrofossils,mesofossils missing missing missing missing body ⋯
3 177551 occ missing missing 16840 ? Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Duchesnean Chadronian 39.7 33.9 Tabrum et al. 1996 1996 3359 Chordata Mammalia Carnivora Canidae Hesperocyon missing -112.8 45.0 missing Diamond O Ranch missing MV 6726, MV 6727, MV 6728, MV 6729, MV 6730 US Montana Beaverhead based on political unit 1 missing small collection 2 mi W of Beaverheard Rock gplates 127 mid -95.46 51.92 US FED missing missing missing missing missing missing missing missing missing missing missing missing Renova missing Climbing Arrow missing JeffB 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
4 177611 occ 3336 missing 16845 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Duchesnean 39.7 37.0 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.9 50.0 missing Lac Pelletier Lower Fauna missing SMNH Locs. 72G13-001, -003, -005, -006, -009, 72J04-001, 004, -006 CA Saskatchewan based on nearby landmark 1 missing small collection 23 km SSW of Swift Current gplates 101 mid -86.94 56.36 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing LacPl 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
5 177863 occ 3406 missing 16888 Prohesperocyon wilsoni species 50768 missing Prohesperocyon wilsoni missing species 46917 Chadronian 37.0 33.9 Gustafson 1986 1986 1758 Chordata Mammalia Carnivora Canidae Prohesperocyon missing -104.2 29.9 missing Airstrip (TMM 40504) missing US Texas Presidio based on political unit 1 missing small collection gplates 101 mid -89.99 35.83 US missing missing missing missing missing missing missing missing missing missing missing missing Capote Mountain Tuff missing missing Vieja 8 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
6 177871 occ missing missing 16891 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Wang 1994 1994 6226 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 "Chadronian" from 6 mi NW and 3 mi NW of Alcova Alcova missing US Wyoming Natrona based on political unit 1 missing Bates Hole gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
7 177897 occ missing missing 16896 Hesperocyon cf. gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Harshman 1972 1972 1803 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 includes "H. cf. paterculus" Bates Hole Lower White River 1 missing US Wyoming Natrona based on political unit 1 missing gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing Bates 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
8 177901 occ 3419 missing 16898 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Alroy 2002 2002 6294 Chordata Mammalia Carnivora Canidae Hesperocyon missing -107.1 43.2 Bates Hole (Reed Collection) missing US Wyoming Natrona based on political unit 1 missing county assignment uncertain gplates 101 mid -90.2 49.33 US FED missing missing missing missing missing missing missing missing missing missing missing missing White River missing lower missing missing missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
9 177975 occ missing missing 16917 Hesperocyon sp. genus 41217 missing Hesperocyon missing genus 41217 Chadronian 37.0 33.9 Ostrander 1985 1985 2716 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.8 42.4 Bone Cove missing US Nebraska Sioux based on political unit 1 missing gplates 101 mid -86.81 48.11 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing Big Cottonwood Creek missing PineR 1 missing bottom to top missing missing missing member assignment based on LaGarry pers. commun. 19 May 2005 missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
10 177988 occ missing missing 16918 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Gustafson 1986 1986 1757 Chordata Mammalia Carnivora Canidae Hesperocyon missing -103.1 42.8 specimen from KU-NE-083 "Chadronian" of Walter Brecht Ranch listed by Wang 1994 Brecht Ranch missing TNAS US Nebraska Dawes based on political unit 1 missing small collection gplates 101 mid -85.94 48.41 US missing missing missing missing missing missing missing missing missing missing missing missing Chadron missing missing PineR 1 missing bottom to top missing missing missing missing missing missing terrestrial indet. missing missing missing missing missing body ⋯
11 178029 occ missing missing 16919 Hesperocyon gregarius species 46911 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae Hesperocyon missing -109.883 49.5667 see Russell 1972: includes "Alloeodectes mcgrewi" of Russell 1984 according to Bryant 1991, Wang 1994; also includes cf. "Hyaenodon minutus" according to Bryant 1993 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
12 178030 occ missing missing 16919 Canidae indet. family 41189 missing Canidae missing family 41189 Chadronian 37.0 33.9 Storer 1996 1996 3318 Chordata Mammalia Carnivora Canidae missing -109.883 49.5667 Calf Creek missing SMNH Locality 6; SMNH 72F10-0001; ROM V-37-44; NMC Localities 115 and 117; Bone Coulee; "Cypress Hills" in part; Hunter Quarry CA Saskatchewan based on nearby landmark minutes missing outcrop 16 km NW of Eastend and apparently north of Fort Walsh (basis of coordinate) gplates 101 mid -91.37 55.94 CA missing missing missing missing missing missing missing missing missing missing missing missing Cypress Hills missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
13 178086 occ missing missing 16920 Pseudocynodictis cf. paterculus species 51185 missing Hesperocyon gregarius missing species 44845 Chadronian 37.0 33.9 Van Houten 1964 1964 3479 Chordata Mammalia Carnivora Canidae Hesperocyon missing -108.2 43.2 Cameron Spring missing Locality 19; Cameron Springs US Wyoming Fremont based on political unit 1 missing small collection gplates 101 mid -91.42 49.47 US missing missing missing missing missing missing missing missing missing missing missing missing White River missing missing missing missing missing missing not reported missing missing missing terrestrial indet. missing macrofossils,mesofossils missing missing missing missing body ⋯
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
3472 2041254 occ missing missing 196908 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry and Arruda 2013 2013 93416 Chordata Mammalia Carnivora Canidae Canis missing -8.64325 40.7527 Monte Moliäo missing PT based on nearby landmark 5 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing 400 BC–AD 0; AD 0–200 not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3473 2041319 occ missing missing 275782 Canis sp. genus 41198 Canis missing genus 41198 Holocene 0.0117 0.0 Davis 2006 2006 93418 Chordata Mammalia Carnivora Canidae Canis missing -8.26563 39.9116 Alcáçova de Santarém missing PT Leiria based on political unit 6 missing Santarém, on the right bank of the Tagus River, is 78 kilometres northeast of Lisbon gplates not computable using this model mid missing missing PT missing missing missing missing missing missing They derive from 18 levels — most dated to the Iron Age, Roman and Moslem periods. not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3474 2041333 occ missing missing 275784 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Pereira 2013 2013 93419 Chordata Mammalia Carnivora Canidae Canis missing -8.2015 37.1794 Castelo de Paderne missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3475 2045484 occ missing missing 276279 Canis lupus species 44860 Canis lupus missing species 44860 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Canis missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3476 2045485 occ missing missing 276279 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Late Pleistocene 0.129 0.0117 Moreno-García and Pimenta 2002 2002 93654 Chordata Mammalia Carnivora Canidae Vulpes missing -8.73528 39.7553 Abrigo do Lagar Velho missing PT Leiria based on nearby landmark seconds missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported marine indet. macrofossils body,anthropogenic ⋯
3477 2046556 occ missing missing 276394 Canis familiearis genus 41198 species not entered Canis missing genus 41198 Holocene 0.0117 0.0 Gomez et al. 2001 2001 93700 Chordata Mammalia Carnivora Canidae Canis missing -8.43822 37.1892 Silves missing PT Faro based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3478 2046644 occ missing missing 276407 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Canis missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3479 2046651 occ missing missing 276407 Vulpes vulpes species 52691 Vulpes vulpes missing species 44890 Holocene 0.0117 0.0 Moreno García and Pimenta 2020 2020 93710 Chordata Mammalia Carnivora Canidae Vulpes missing -7.6611 37.643 Biblioteca de Mértola missing PT Beja Beja based on political unit 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3480 2046774 occ missing missing 276424 Canis familiaris species 104153 Canis familiaris missing species 104153 Holocene 0.0117 0.0 Detry et al. 2021 2021 93709 Chordata Mammalia Carnivora Canidae Canis missing -9.18333 38.6667 Largo do Coreto missing Bandstand square, Carnide PT Setúbal District based on nearby landmark 6 missing gplates not computable using this model mid missing missing PT missing missing missing missing missing missing not reported terrestrial indet. macrofossils body,anthropogenic ⋯
3481 2048227 occ missing missing 276686 Canis cf. mosbachensis species 366711 Canis mosbachensis missing species 366711 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Canis missing 11.1153 48.9103 "[right p4, 2 right m1, left mandibular fragment with m1-talonid and m2]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3482 2048228 occ missing missing 276686 Alopex aff. praeglacialis species 474917 Alopex praeglacialis missing species 474917 Ionian 0.774 0.129 Dehm 1962 1962 93840 Chordata Mammalia Carnivora Canidae Vulpes missing 11.1153 48.9103 "[right mandibular segment with almost fresh m1-2, and alveoli of p1-4 and m3, and corresponding left mandibular segment with p1-3 alveoli]" Schernfeld Fissure-Filling missing DE Bayern based on nearby landmark seconds missing outcrop "[Around 400m NE of the edge of Schernfeld, towards Eichstätt]" (Dehm 1962) gplates not computable using this model mid missing missing DE missing missing missing missing missing missing "Older Pleistocene, especially to the Cromerian" (Dehm 1962), this likely corresponds to the Ionian sandstone fluvial-lacustrine indet. macrofossils body ⋯
3483 2056564 occ missing missing 277762 Phlaocyon taylori species 50350 Phlaocyon taylori missing species 50350 late Early Arikareean 29.5 18.5 Albright et al. 2026 2026 94175 Chordata Mammalia Carnivora Canidae Phlaocyon missing -88.6628 31.657 Jones Branch Local Fauna missing US Mississippi Wayne County based on nearby landmark 6 missing gplates 109 mid -78.61 35.05 US missing missing missing missing missing missing Catahoula In this area, the Catahoula Formation is composed of unweathered gray-/green-colored, fissile clays interspersed with interbedded distributary channel and thick fine-grained to coarse graveliferous sands of an emergent delta with marginal marine, brackish water, and terrestrial influences. claystone terrestrial indet. macrofossils,mesofossils body ⋯
Tip: transform with multiple source columns

The specification [:min_ma, :max_ma] => function => :mid_ma tells transform to pass both source columns together as arguments to the function, and store the result in the new column :mid_ma.

The function receives two vectors and returns one. Broadcasting with .+ and ./ ensures the arithmetic is applied element-wise.
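As a minimal sketch of this specification, the toy table below (hypothetical values standing in for canids_geo) computes the midpoint in two ways. The first uses the whole-vector form with broadcasting. The second uses DataFrames' ByRow wrapper, which applies a scalar function to each row, so no broadcasting dots are needed. The column name :mid_ma_byrow is illustrative only.

```julia
using DataFrames

# Hypothetical two-row table standing in for canids_geo
toy = DataFrame(min_ma = [33.9, 0.0], max_ma = [37.0, 0.0117])

# Whole-vector form: lo and hi are entire columns, so the arithmetic is broadcast
toy = transform(
    toy,
    [:min_ma, :max_ma] => ((lo, hi) -> (lo .+ hi) ./ 2) => :mid_ma
)

# Row-wise form: ByRow calls the function once per row with scalar arguments
toy = transform(
    toy,
    [:min_ma, :max_ma] => ByRow((lo, hi) -> (lo + hi) / 2) => :mid_ma_byrow
)
```

Both forms should produce the same column; which one reads better is largely a matter of taste.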

describe(canids_geo[:, [:min_ma, :max_ma, :mid_ma]])
3×7 DataFrame
 Row │ variable  mean     min      median   max      nmissing  eltype
     │ Symbol    Float64  Float64  Float64  Float64  Int64     DataType
─────┼──────────────────────────────────────────────────────────────────
   1 │ min_ma    7.6434   0.0      3.6      39.7     0         Float64
   2 │ max_ma    10.4491  0.0117   9.4      56.0     0         Float64
   3 │ mid_ma    9.04627  0.00585  6.9075   46.855   0         Float64

3.1 Geographic visualization

GeoMakie.jl extends the standard Makie figure/axis system with GeoAxis, a specialised axis type that understands geographic coordinates and map projections. The API is otherwise identical to a regular Axis: you place it in a Figure, then call plotting functions on it.

Tip: Map projections

A projection is a mathematical transformation from a spherical Earth surface onto a flat plane. No projection is distortion-free; different projections make different trade-offs between preserving area, shape, distance, and direction.

GeoAxis accepts a dest keyword that specifies the target projection using a PROJ string. "+proj=natearth2" is the Natural Earth II projection — a pseudocylindrical projection that balances area and shape distortion reasonably well for a world map. Other common choices include "+proj=moll" (Mollweide, equal-area) and "+proj=longlat" (plain longitude/latitude, no transformation).

3.1.1 Building the map

fig = Figure(size = (900, 500))

ga = GeoAxis(
    fig[1, 1];
    dest        = "+proj=natearth2",
    title       = "Fossil occurrences of Canidae",
    xlabel      = "Longitude",
    ylabel      = "Latitude",
)

## Draw land outlines as filled polygons
poly!(ga, GeoMakie.land(); color = :whitesmoke, strokecolor = :gray60, strokewidth = 0.4)

## Overlay occurrence points
scatter!(
    ga,
    canids_geo.lng,
    canids_geo.lat;
    color      = (:firebrick, 0.5),
    markersize = 5,
)

fig
Tip: GeoMakie.land()

GeoMakie.land() returns the outlines of Earth’s land masses as polygon geometries, read from a Natural Earth GeoJSON file that is bundled with GeoMakie. poly! can render these polygons directly onto a GeoAxis.

The color argument for the occurrence scatter uses a tuple (:firebrick, 0.5), where the second element is the alpha (opacity) channel — values run from 0.0 (transparent) to 1.0 (fully opaque). Partial transparency is useful here because many points overlap; reducing opacity lets overlapping density show through.
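To see how the choice of dest changes the map built above, the following sketch redraws the land outlines once for each PROJ string named in the projections tip. It assumes CairoMakie and GeoMakie are installed. The names fig_proj and ga_i are illustrative, and the exact rendering may differ between GeoMakie versions.

```julia
using CairoMakie, GeoMakie

# The three PROJ strings mentioned in the map projections tip
projections = ["+proj=natearth2", "+proj=moll", "+proj=longlat"]

fig_proj = Figure(size = (700, 1000))

# One GeoAxis per projection, stacked vertically in the figure grid
for (i, proj) in enumerate(projections)
    ga_i = GeoAxis(fig_proj[i, 1]; dest = proj, title = proj)
    poly!(ga_i, GeoMakie.land(); color = :whitesmoke, strokecolor = :gray60, strokewidth = 0.4)
end

fig_proj
```

Comparing the panels shows how each projection treats high-latitude land masses such as Greenland and Antarctica.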

3.1.2 Coloring by age

A single color for all points treats all epochs identically. We can encode the midpoint age as a color using a continuous colormap, which reveals whether canid occurrences in different parts of the world tend to be from different time periods.

fig2 = Figure(size = (900, 520))

ga2 = GeoAxis(
    fig2[1, 1];
    dest  = "+proj=natearth2",
    title = "Canidae fossil occurrences, colored by midpoint age (Ma)",
)

poly!(ga2, GeoMakie.land(); color = :whitesmoke, strokecolor = :gray60, strokewidth = 0.4)

sc = scatter!(
    ga2,
    canids_geo.lng,
    canids_geo.lat;
    color      = canids_geo.mid_ma,
    colormap   = :viridis,
    markersize = 6,
)

Colorbar(fig2[1, 2], sc; label = "Age (Ma)", flipaxis = false)

fig2
Tip: Colorbars and the Colorbar function

When scatter! is called with a numeric color vector and a colormap, it returns a plot object that carries the mapping from data values to colors. Passing that object to Colorbar creates a legend axis showing the color scale.

fig2[1, 2] places the colorbar in column 2 of the figure grid, to the right of the map. flipaxis = false puts the colorbar labels on the left side of the bar (adjacent to the map).

3.2 Temporal visualization

The geographic map shows where canids were found; a temporal plot shows when. The temporal distribution of fossil occurrences is a central quantity in paleontology: it reflects both the evolutionary history of the group and the preservational biases of the geological record.

3.2.1 Distribution of midpoint ages

A histogram is the natural first look at the temporal distribution. The x-axis is geological age in millions of years before present (Ma). By convention, paleontological plots place older ages on the left and the present on the right, so “forward in time” means moving right. Because age values increase into the past, the axis has to be reversed to achieve this.

fig3 = Figure(size = (700, 350))
ax3  = Axis(
    fig3[1, 1];
    xlabel    = "Midpoint age (Ma)",
    ylabel    = "Number of occurrences",
    title     = "Temporal distribution of Canidae fossil occurrences",
    xreversed = true,
)

hist!(ax3, canids_geo.mid_ma; bins = 40, color = (:steelblue, 0.8))

fig3
Tip: Reversed time axis

The xreversed = true keyword on Axis reverses the direction of the x-axis so that zero (the present) is on the right and the past extends leftward. This matches the stratigraphic convention used throughout the paleontological literature, where older strata are drawn deeper (lower or further left) than younger ones.

3.2.2 Age uncertainty

Each record’s age is a range (min_ma to max_ma), not a precise date. We can visualise the spread of this uncertainty by plotting the width of each record’s age bracket:

canids_geo = transform(
    canids_geo,
    [:min_ma, :max_ma] => ((lo, hi) -> hi .- lo) => :age_range_ma
)

fig4 = Figure(size = (700, 350))
ax4  = Axis(
    fig4[1, 1];
    xlabel = "Age range width (Ma)",
    ylabel = "Number of occurrences",
    title  = "Distribution of stratigraphic age uncertainty, Canidae",
)

hist!(ax4, canids_geo.age_range_ma; bins = 40, color = (:darkorange, 0.8))

fig4

Wide age ranges reflect coarse stratigraphic resolution in the original collections. Narrow ranges indicate that the fossil was found in a well-dated stratigraphic unit. The shape of this distribution is itself informative: it reflects how well-dated different parts of the geological record are for this taxon.
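The histogram can be complemented with a couple of summary numbers. The sketch below uses only the Statistics standard library and a handful of age bounds copied from rows of the table shown earlier. The 5 Ma cut-off for a "wide" bracket is an arbitrary assumption chosen for illustration, not a standard threshold.

```julia
using Statistics

# (min_ma, max_ma) pairs copied from a few rows of the occurrence table
brackets = [
    (37.0, 39.7), (33.9, 37.0), (0.0, 0.0117),
    (0.0117, 0.129), (0.129, 0.774), (18.5, 29.5),
]

# Width of each age bracket in millions of years
widths = [hi - lo for (lo, hi) in brackets]

median_width = median(widths)

# Share of records whose bracket exceeds a hypothetical 5 Ma cut-off
share_wide = count(>(5.0), widths) / length(widths)

println("median width: ", round(median_width; digits = 3), " Ma")
println("share wider than 5 Ma: ", round(share_wide; digits = 3))
```

On the full dataset, the same two lines can be applied to canids_geo.age_range_ma in place of widths.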

3.3 Putting it together: a reusable workflow

The steps above form a complete, repeatable pipeline. Here they are collected into a compact sequence that can be adapted for any taxon:

using PaleobiologyDB, DataFrames, CairoMakie, GeoMakie

## 1. Acquire
raw = pbdb_occurrences(
    base_name = "Canidae",
    show      = "full",
    vocab     = "pbdb",
)

## 2. Validate (coordinates and age bounds required)
clean = dropmissing(raw, [:lng, :lat, :min_ma, :max_ma])

## 3. Derive midpoint age
clean = transform(
    clean,
    [:min_ma, :max_ma] => ((lo, hi) -> (lo .+ hi) ./ 2) => :mid_ma
)

## 4. Map (colored by age)
fig = Figure(size = (900, 500))
ga  = GeoAxis(fig[1, 1]; dest = "+proj=natearth2", title = "Canidae occurrences")
poly!(ga, GeoMakie.land(); color = :whitesmoke, strokecolor = :gray60, strokewidth = 0.4)
sc  = scatter!(ga, clean.lng, clean.lat; color = clean.mid_ma, colormap = :viridis, markersize = 6)
Colorbar(fig[1, 2], sc; label = "Age (Ma)", flipaxis = false)
display(fig)

## 5. Temporal histogram
fig2 = Figure(size = (700, 350))
ax   = Axis(fig2[1, 1]; xlabel = "Midpoint age (Ma)", ylabel = "Occurrences", xreversed = true)
hist!(ax, clean.mid_ma; bins = 40, color = (:steelblue, 0.8))
display(fig2)
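When adapting the pipeline to several taxa, it may help to wrap the validation and derivation steps in a function. The sketch below is one possible way to do this. prepare_occurrences is a hypothetical helper name, not part of PaleobiologyDB or DataFrames, and the toy input uses made-up rows so that it runs without a network request.

```julia
using DataFrames

# Hypothetical helper: steps 2 and 3 of the workflow as a single function
function prepare_occurrences(raw::AbstractDataFrame)
    # Keep only rows with coordinates and both age bounds
    clean = dropmissing(raw, [:lng, :lat, :min_ma, :max_ma])
    # Add the midpoint age as a new column
    return transform(
        clean,
        [:min_ma, :max_ma] => ((lo, hi) -> (lo .+ hi) ./ 2) => :mid_ma
    )
end

# Toy input with one missing longitude
toy_raw = DataFrame(
    lng    = [-107.1, missing, 11.1153],
    lat    = [43.2, 29.9, 48.9103],
    min_ma = [33.9, 33.9, 0.129],
    max_ma = [37.0, 37.0, 0.774],
)

toy_clean = prepare_occurrences(toy_raw)

# Number of records removed by the validation step
n_dropped = nrow(toy_raw) - nrow(toy_clean)
```

With real data, the call would be prepare_occurrences(raw), and n_dropped reports how many records were removed for lacking coordinates or age bounds.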

4 Exercises


Exercise 1

Repeat the complete workflow above for a different carnivoran family of your choice — for example "Felidae" (cats), "Ursidae" (bears), or "Mustelidae" (weasels and kin).

  1. How many occurrence records does the PBDB hold for your chosen family?
  2. How many were dropped due to missing coordinates?
  3. Does the geographic distribution of occurrences match your expectation given what you know about the group’s present-day range?
  4. What does the temporal histogram tell you about when the fossil record of this family is densest?

Exercise 2
The PBDB supports filtering by geographic bounding box using the lngmin, lngmax, latmin, and latmax parameters.

  1. Retrieve Canidae occurrences restricted to North America (approximately lngmin = -170, lngmax = -50, latmin = 15, latmax = 75).
  2. Retrieve Canidae occurrences restricted to Eurasia (approximately lngmin = -10, lngmax = 180, latmin = 10, latmax = 75).
  3. Produce a single figure with two panels side by side — one histogram for each region — showing the temporal distribution of midpoint ages.
  4. Do the two regions show different temporal patterns? What might explain any differences you observe?

Hint: To place two axes side by side, use Axis(fig[1, 1]; ...) and Axis(fig[1, 2]; ...).


Exercise 3
Fossil occurrences from the PBDB come with a genus field (when show = ["class"] is requested and vocab = "pbdb" is set).

  1. Using the groupby and combine functions from DataFrames, count the number of occurrences per genus in your cleaned Canidae dataset.
  2. Sort the result by occurrence count in descending order.
  3. Which five genera have the most fossil occurrences?
  4. Produce a bar chart (use barplot! in CairoMakie) showing the top ten genera by occurrence count.

Hint: groupby(df, :genus) groups the rows; combine(gdf, nrow => :count) counts rows per group.

  • © Jeet Sukumaran

Please share or adapt under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).