Core papers from the lecture

Public LiDAR & DEM data

Source | Coverage | Resolution | Access
INEGI Elevation Portal | All of Mexico, including Tabasco (Aguada Fénix), Yucatan, Campeche, Chiapas, Quintana Roo | 5 m DEM (newer areas: 1.5 m) | Free; download by map sheet
Chactun ML-Ready Dataset | Calakmul region, Mexico; 10,000+ annotated Maya structures | 0.5 m ALS-derived visualizations | Free; CC-BY on Figshare
OpenTopography | Middle Usumacinta Region, Mexico (943 km²) | Full LAS/LAZ point cloud | Free; registration required
Copernicus GLO-30 | Global coverage | 30 m | Free
PACUNAM LiDAR Initiative | 2,144 km² of Maya Biosphere Reserve, Guatemala (Tikal, El Zotz, Holmul) | High-resolution point cloud | Restricted; institutional agreement with PACUNAM / Tulane

Key insight: The Aguada Fénix discovery used free INEGI 5 m DEMs. Inomata's team analyzed roughly 85,000 km² of publicly available data and identified 478 previously unknown ceremonial complexes. You don't need restricted datasets to do meaningful work.
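
To make the resolution figures listed above concrete, here is a minimal Python sketch of the arithmetic: how many grid cells cover one square kilometre, and how many cells span a feature of a given width, at each resolution. The feature widths and the function names are hypothetical, chosen only for illustration.

```python
# Resolutions from the data sources listed above, in metres per grid cell.
RESOLUTIONS_M = {
    "Copernicus GLO-30": 30.0,
    "INEGI (standard)": 5.0,
    "INEGI (newer areas)": 1.5,
    "Chactun visualizations": 0.5,
}

# Hypothetical feature widths in metres, for illustration only.
FEATURES_M = {"small mound": 10.0, "large platform": 1000.0}


def cells_per_km2(resolution_m: float) -> float:
    """Number of grid cells needed to cover one square kilometre."""
    return (1000.0 / resolution_m) ** 2


def cells_across(feature_width_m: float, resolution_m: float) -> float:
    """Number of grid cells spanning a feature of the given width."""
    return feature_width_m / resolution_m


if __name__ == "__main__":
    for source, res in RESOLUTIONS_M.items():
        print(f"{source}: {cells_per_km2(res):,.0f} cells per km²")
        for feature, width in FEATURES_M.items():
            print(f"  {feature} ({width:g} m): {cells_across(width, res):.1f} cells across")
```

As a rough rule, a feature only a cell or two wide is hard to tell apart from noise, so coarser DEMs are better suited to large earthworks than to small mounds.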

Open-source ML tools for archaeology

Tool | What it does | License
ADAF | 8 pre-trained models for archaeological feature detection (segmentation + object detection). Jupyter notebook workflow. 84% recall on known sites. | Open
RVT (Relief Visualization Toolbox) | Converts DEMs to hillshade, sky-view factor, slope, openness. Essential preprocessing for ML on terrain data. Also available as QGIS plugin. | Apache 2.0
Arran Benchmark | ML benchmark for archaeological detection (round houses, cairns, shieling huts). Scottish data, but methodology transfers. | Open
archaeology-machine-learning | Curated list of ML resources, papers, and tools for archaeology. | Open
open-archaeo | Directory of open-source archaeological software tools. |
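
As a rough illustration of the kind of preprocessing RVT performs, the sketch below derives slope and an analytical hillshade from a DEM array using only NumPy. It is not RVT's code or API: the function name `slope_and_hillshade`, its parameters, and the assumption of a north-up raster with square cells are all illustrative, and sky-view factor and openness are not covered.

```python
import numpy as np


def slope_and_hillshade(dem, cell_size, sun_azimuth_deg=315.0, sun_elevation_deg=45.0):
    """Return (slope in degrees, hillshade in [0, 1]) for a north-up DEM.

    Hypothetical helper for illustration; assumes square cells of
    `cell_size` metres and elevations in metres.
    """
    dem = np.asarray(dem, dtype=float)

    # np.gradient returns the derivative along rows first, then columns.
    d_row, d_col = np.gradient(dem, cell_size)
    dz_dx = d_col    # x points east, the direction of increasing column
    dz_dy = -d_row   # y points north, opposite to increasing row

    # Slope is the angle between the surface and the horizontal plane.
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

    # Unit vector pointing from the ground toward the sun.
    az = np.radians(sun_azimuth_deg)    # measured clockwise from north
    el = np.radians(sun_elevation_deg)  # measured up from the horizon
    sun = np.array([np.sin(az) * np.cos(el), np.cos(az) * np.cos(el), np.sin(el)])

    # The unit surface normal is (-dz/dx, -dz/dy, 1) / norm;
    # illumination is its dot product with the sun vector.
    norm = np.sqrt(dz_dx**2 + dz_dy**2 + 1.0)
    shade = (-dz_dx * sun[0] - dz_dy * sun[1] + sun[2]) / norm

    # Faces turned away from the sun receive no direct light.
    return slope_deg, np.clip(shade, 0.0, 1.0)


if __name__ == "__main__":
    # Synthetic 5 x 5 DEM: a ramp rising 1 m per 1 m cell toward the east.
    ramp = np.tile(np.arange(5, dtype=float), (5, 1))
    slope, shade = slope_and_hillshade(ramp, cell_size=1.0)
    print("slope (deg):", slope[0, 0])
    print("hillshade:", round(float(shade[0, 0]), 3))
```

Changing `sun_azimuth_deg` shows why a single hillshade can hide features aligned with the light direction, which is one motivation for direction-independent visualizations such as sky-view factor.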

Glyph & artifact datasets

Dataset | Contents | Access
IDIAP Maya Glyph Dataset | ~1,000 annotated Maya hieroglyph images with Thompson catalog numbers | Academic; contact IDIAP Research Institute
Digital Dresden Codex | Complete facsimile of the Dresden Codex, the most elaborate surviving Maya manuscript | Public domain
Kerr Maya Vase Database | ~1,500 rollout photographs of painted Maya vessels | Restricted; varies by institution

Getting started in archaeological ML

You don't need to become a computer scientist. The most impactful contributors to this field understand the archaeology deeply and can communicate effectively with data scientists. That said, some technical literacy goes a long way.

Practical skills path

  1. Learn QGIS — Free, open-source GIS. Load LiDAR DEMs, create terrain visualizations, draw annotations. This is where archaeology meets spatial data. Start with the INEGI DEMs above.
  2. Python basics — Focus on data handling (pandas, numpy) and visualization (matplotlib). You don't need deep learning expertise to be useful in a team.
  3. Try RVT — Install the Relief Visualization Toolbox. Process a DEM into hillshade, slope, and sky-view-factor. See what the ML models actually see as input.
  4. Annotate real data — Download the Chactun dataset. Open the visualizations in QGIS. Try drawing polygons around structures. You're now creating training data.
  5. Run a pre-trained model — Set up ADAF in a Jupyter notebook. Run inference on a tile. Compare model predictions to your own annotations.
  6. Find a project — The best way to learn is to contribute. Many archaeological ML projects need people who can annotate data, validate predictions, and provide cultural context. That's domain expertise, not coding.
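
For step 5, one simple way to compare a model's output with your own annotations is pixel-wise precision, recall, and intersection-over-union on two binary masks. The sketch below assumes both have already been rasterized onto the same grid. `mask_agreement` is a hypothetical helper, not part of ADAF, and published scores may be computed per object rather than per pixel.

```python
import numpy as np


def mask_agreement(annotation, prediction):
    """Pixel-wise precision, recall, and IoU between two binary masks.

    Hypothetical helper: both inputs must have the same shape, with
    nonzero cells marking a structure.
    """
    ann = np.asarray(annotation, dtype=bool)
    pred = np.asarray(prediction, dtype=bool)
    if ann.shape != pred.shape:
        raise ValueError("annotation and prediction must share one grid")

    tp = np.count_nonzero(ann & pred)    # marked by both
    fp = np.count_nonzero(~ann & pred)   # marked by the model only
    fn = np.count_nonzero(ann & ~pred)   # marked by the annotator only

    def ratio(numerator, denominator):
        # Undefined when nothing was marked; report NaN instead of failing.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


if __name__ == "__main__":
    # Tiny made-up masks, for illustration only.
    my_annotation = [[1, 1, 0], [0, 0, 0]]
    model_output = [[1, 0, 1], [0, 0, 0]]
    print(mask_agreement(my_annotation, model_output))
```

Low recall means the model missed structures you marked. Low precision means it flagged areas you did not, which are worth a second look before you dismiss them as errors.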

Key research groups

Remote Sensing

PACUNAM Foundation (Canuto, Estrada-Belli, Garrison)
Technical University of Košice (Bundzel, Sinčák)
University of Cambridge (Orengo)

Epigraphy & Decipherment

IDIAP Research Institute (Gatica-Perez, Odobez)
MIT Computational Linguistics (Barzilay)
University of Bonn (Grube, Prager)

Predictive Modeling

University of Arizona (Inomata)
Leiden University (Verhagen)
SUNY Albany (Feinman, Nicholas)