What you're seeing: A U-Net model has analyzed this LiDAR terrain and produced a per-pixel confidence score for archaeological structures. The threshold slider sets the minimum confidence required to classify a pixel as "structure." Green pixels are true positives (correctly flagged structures). Orange pixels are false positives (false alarms). Blue pixels are false negatives (real structures the model missed at this threshold).

The confidence histogram below the slider shows how many pixels fall at each confidence level. Dragging the threshold line through the distribution lets you see exactly what you're trading off. Note: this is an earlier, less-trained model, deliberately chosen because a highly confident model snaps between near-empty and near-full masks as the threshold moves, which hides the tradeoff. The histogram makes it explicit. See Model Output for the better-trained version.
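The green/orange/blue coloring falls out of a simple comparison between the thresholded confidence map and the ground-truth mask. A minimal sketch, assuming per-pixel arrays named `conf` and `gt` (hypothetical names; the demo's actual code is not shown):

```python
import numpy as np

# Toy stand-ins for the demo's data: a per-pixel confidence map in [0, 1]
# and a binary ground-truth mask of real structures.
rng = np.random.default_rng(0)
conf = rng.random((4, 4))
gt = rng.random((4, 4)) > 0.5

threshold = 0.50                   # the slider value
pred = conf >= threshold           # pixels flagged as "structure"

tp = pred & gt                     # green: flagged and real
fp = pred & ~gt                    # orange: flagged but not real
fn = ~pred & gt                    # blue: real but not flagged

print(tp.sum(), fp.sum(), fn.sum())
```

Raising the threshold shrinks `pred`, which can only move pixels out of the green and orange sets and into the blue one; lowering it does the reverse.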

Data: Kokalj et al. 2023, CC BY 4.0. Model: U-Net + ResNet34, trained on Chactun tiles.

Tile: (select a tile) · Threshold: 0.50 (Balanced Mode)
Pixel confidence distribution — drag the bar or the slider above (x-axis: confidence, 0% to 100%)
True positive (model correct) False positive (model wrong) False negative (model missed)
TP: -- px · FP: -- px · FN: -- px
-- Precision: of flagged pixels, how many are real structures?
-- Recall: of all real structure pixels, how many were flagged?
-- IoU: overlap between prediction and ground truth
-- F1 Score: harmonic mean of precision and recall
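All four readouts are standard functions of the TP/FP/FN pixel counts. A sketch of the arithmetic (the function name and inputs are assumptions, not the demo's code):

```python
def metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute the four readouts from raw pixel counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0        # of flagged, how many real
    recall = tp / (tp + fn) if tp + fn else 0.0           # of real, how many flagged
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0    # intersection over union
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)                 # harmonic mean
    return {"precision": precision, "recall": recall, "iou": iou, "f1": f1}

m = metrics(tp=800, fp=200, fn=200)
print(m)  # precision 0.8, recall 0.8, iou ~0.667, f1 0.8
```

Note that IoU is always the strictest of the four: it penalizes both false alarms and misses in a single denominator, so it can be low even when precision and recall each look acceptable.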

What's happening

Move the slider to see the tradeoff in action.

Try this

Scenario A: Limited budget

You can only ground-truth 15 locations this field season. Drag the threshold up until the false positives nearly disappear. What's the recall cost?

Scenario B: Regional screening

You're mapping a new 500 km² region and don't want to miss anything. Drag the threshold down. How much noise do you accept?

Scenario C: Compare tiles

Switch between Dense Complex and Sparse Scatter at the same threshold. Does the same setting work equally well for both? If not, why?
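The three scenarios above all amount to sweeping the threshold and reading off different columns of the same table. A toy sweep on synthetic confidences (not the demo's data; the beta-distributed scores are an assumption to make structures score higher on average):

```python
import numpy as np

rng = np.random.default_rng(1)
gt = rng.random(10_000) > 0.9   # sparse structures, roughly 10% of pixels
# Structure pixels tend to score higher, but the distributions overlap,
# so no threshold separates them cleanly.
conf = np.where(gt, rng.beta(5, 2, gt.shape), rng.beta(2, 5, gt.shape))

for t in (0.2, 0.5, 0.8):
    pred = conf >= t
    tp = int((pred & gt).sum())
    fp = int((pred & ~gt).sum())
    fn = int((~pred & gt).sum())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"t={t:.1f}  precision={precision:.2f}  recall={recall:.2f}  false alarms={fp}")
```

A low threshold is the Scenario B setting (high recall, many false alarms to sift); a high threshold is the Scenario A setting (clean flags, but recall pays for it). Because the confidence distributions differ from tile to tile, the best threshold differs too, which is the point of Scenario C.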