Georeferenced Map Comparison: Project Index

Overview

Scanned historical maps are gorgeous and almost completely useless to a computer: they're just pixels, with no idea where on Earth they sit or what they're drawing. This project is a two-stage pipeline that fixes both problems. The first half georeferences a scan: it figures out the scan's real-world coordinates and warps it into alignment with the modern globe. The second half reads the warped map and turns what it depicts into honest vector data: ask for "named peaks" or "depth contours" and get GeoJSON back.

The public page linked above is the friendly end of it: an interactive before/after comparison where you flip between an old map and the modern basemap it was snapped onto.

Background

Georeferencing by hand is the tedious part of any historical-mapping project: you sit there clicking matching points between a scan and a basemap until the warp looks right. The question that kicked this off was whether a vision model could place those control points itself, by reading the lat/lon labels printed in a map's margins or by recognizing a drawn coastline well enough to line it up with the real one.

The catch is that "an old map" isn't one kind of thing. A clean USGS topographic sheet with a printed graticule is a completely different problem from a decorative 1850s atlas plate, which is different again from a coastline-dominated nautical chart. So the pipeline doesn't try to be one clever algorithm. It triages the map first, then dispatches it to whichever strategy actually fits.

How It Works

Stage one: georeference. A map first goes through triage: a MapProfile is built from a user hint plus an optional vision-model classification, sorting the scan into a content type. A dispatcher then routes it to a matching strategy:

modern_topo → graticule corners → polynomial-1 warp
coast_dominant → coastline ICP → thin-plate-spline warp
decorative_historical → graticule + marginalia crop → poly-1 warp
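
The routing above can be sketched as a plain lookup table. This is a minimal illustration, not the project's actual dispatcher: MapProfile is named in the text, but its fields here (user_hint, model_class), the strategy labels, and the rule that a model classification overrides the user hint are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Content type -> (control-point strategy, warp). Mirrors the routing list;
# the strategy strings are illustrative labels, not project identifiers.
ROUTES = {
    "modern_topo": ("graticule_corners", "polynomial-1"),
    "coast_dominant": ("coastline_icp", "thin-plate-spline"),
    "decorative_historical": ("graticule_marginalia_crop", "polynomial-1"),
}


@dataclass
class MapProfile:
    user_hint: Optional[str] = None    # content type suggested by the user
    model_class: Optional[str] = None  # optional vision-model classification

    def content_type(self) -> str:
        # Assumption: the model's classification, when present, wins.
        chosen = self.model_class or self.user_hint
        if chosen is None:
            raise ValueError("triage needs a hint or a classification")
        return chosen


def dispatch(profile: MapProfile) -> Tuple[str, str]:
    """Return (strategy, warp) for a triaged map, or fail loudly."""
    content_type = profile.content_type()
    if content_type not in ROUTES:
        # Unimplemented content types land here instead of guessing a route.
        raise NotImplementedError(f"no strategy for {content_type!r}")
    return ROUTES[content_type]
```

Failing loudly on an unknown content type keeps a misclassified map from being silently warped with the wrong strategy.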

For clean topos, the graticule module detects the map frame, crops the four corners, and sends them to Claude vision to OCR the printed lat/lon labels: four high-confidence ground control points, done. For coastal charts, the coastline_icp module segments the drawn land area and aligns it against Natural Earth's modern coastline using Iterative Closest Point, generating dense control points for a thin-plate-spline warp. Whatever the source of the points, they're written to a GDAL .vrt and warped out to a georeferenced GeoTIFF (affine, polynomial, or TPS depending on the strategy).
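
One common way to do the "control points to .vrt to GeoTIFF" step is with the GDAL command-line tools, sketched below. The project may well use the GDAL Python bindings instead; the function name, file names, EPSG:4326 as the control-point CRS, and bilinear resampling are assumptions. The sketch only builds the argument lists and does not run them.

```python
from pathlib import Path
from typing import List, Sequence, Tuple

# One control point: (pixel, line, longitude, latitude).
GCP = Tuple[float, float, float, float]


def build_gdal_commands(
    scan: str, gcps: Sequence[GCP], out_tif: str, warp: str = "poly1"
) -> Tuple[List[str], List[str]]:
    """Build gdal_translate and gdalwarp argument lists (hypothetical helper)."""
    vrt = str(Path(out_tif).with_suffix(".vrt"))

    # Step 1: attach the control points to the scan in a lightweight VRT.
    translate = ["gdal_translate", "-of", "VRT", "-a_srs", "EPSG:4326"]
    for pixel, line, lon, lat in gcps:
        translate += ["-gcp", str(pixel), str(line), str(lon), str(lat)]
    translate += [scan, vrt]

    # Step 2: warp using either a first-order polynomial or a thin-plate spline.
    warp_cmd = ["gdalwarp", "-t_srs", "EPSG:4326", "-r", "bilinear"]
    warp_cmd += ["-tps"] if warp == "tps" else ["-order", "1"]
    warp_cmd += [vrt, out_tif]
    return translate, warp_cmd
```

Each list can be passed to subprocess.run(cmd, check=True) on a machine with GDAL installed; four corner points fit the first-order case, while dense ICP points would use warp="tps".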

A diagnostics pass keeps it honest. Before warping, the control-point set is scored on five metrics (pixel- and coordinate-space collinearity, convex-hull coverage, bounding-box ratio, and a duplicate-coordinate check) and tagged ok, warn, or severe. That duplicate check exists for a real reason: raw ICP once produced 613 "correspondences" that were 94% duplicates, which would have made the thin-plate-spline solve singular.
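
Two of those checks, duplicate coordinates and convex-hull coverage, are easy to sketch in plain Python. The function names, the thresholds (max_dup, min_coverage), and the way the two metrics combine into ok / warn / severe are assumptions for illustration; the project's real scoring rules are not described here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def duplicate_fraction(points: Sequence[Point], ndigits: int = 6) -> float:
    """Share of points that repeat an already-seen (rounded) coordinate."""
    if not points:
        return 0.0
    unique = {(round(x, ndigits), round(y, ndigits)) for x, y in points}
    return 1.0 - len(unique) / len(points)


def hull_area(points: Sequence[Point]) -> float:
    """Convex-hull area via Andrew's monotone chain plus the shoelace formula."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq) -> List[Point]:
        chain: List[Point] = []
        for p in seq:
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    hull = half(pts) + half(reversed(pts))
    twice = sum(x1 * y2 - x2 * y1
                for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]))
    return abs(twice) / 2.0


def score_gcps(pixel_pts: Sequence[Point], world_pts: Sequence[Point],
               image_size: Tuple[int, int],
               max_dup: float = 0.5, min_coverage: float = 0.25) -> str:
    """Tag a control-point set as ok, warn, or severe (assumed thresholds)."""
    width, height = image_size
    coverage = hull_area(pixel_pts) / float(width * height)
    if duplicate_fraction(world_pts) > max_dup or coverage == 0.0:
        return "severe"  # mostly repeated targets, or collinear pixel points
    if coverage < min_coverage:
        return "warn"    # points bunched into a small part of the scan
    return "ok"
```

A zero-area hull also catches the fully collinear case, since points on a single line enclose nothing.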

Stage two: detect & catalog. The second engine takes the georeferenced GeoTIFF plus a plain-English query and returns vector features. It tiles the raster, has Claude build a detection plan (geometry type, a visual signature, and a visual anti-signature), culls to the promising tiles from a contact sheet to save cost, then runs vision detection per tile in parallel through the Anthropic Batch API with tile-level caching. Detections are converted from tile pixels to image pixels to world coordinates via the raster's affine transform, de-duplicated (proximity for points, IoU suppression for polygons, bearing-aware merging for lines), and exported as GeoJSON, always reprojected to EPSG:4326. There's a Click CLI (mad extract / prepare / detect / export) and an MCP server for driving it from a GIS.
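
The coordinate chain and the point de-duplication can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the transform is assumed to be six coefficients in GDAL geotransform order, and the greedy keep-first proximity rule stands in for whatever the project actually does.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# GDAL-style geotransform: (x_origin, x_per_col, x_per_row,
#                           y_origin, y_per_col, y_per_row)
GeoTransform = Tuple[float, float, float, float, float, float]


def tile_to_world(tile_px: Point, tile_origin: Point, gt: GeoTransform) -> Point:
    """Tile pixel -> full-image pixel -> world coordinate."""
    col = tile_origin[0] + tile_px[0]  # shift by the tile's top-left corner
    row = tile_origin[1] + tile_px[1]
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


def dedupe_points(points: Sequence[Point], tol: float) -> List[Point]:
    """Greedy proximity de-duplication: keep a point only if nothing kept is within tol."""
    kept: List[Point] = []
    for p in points:
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) > tol for q in kept):
            kept.append(p)
    return kept


# A north-up raster at 0.001 degrees per pixel (made-up numbers).
gt = (-70.0, 0.001, 0.0, 45.0, 0.0, -0.001)
# The same peak seen in two overlapping tiles resolves to the same world point.
a = tile_to_world((520, 20), (0, 0), gt)
b = tile_to_world((8, 20), (512, 0), gt)
merged = dedupe_points([a, b], tol=0.0005)
```

Overlapping tiles are the usual source of duplicates, which is why the merge happens in world coordinates rather than per tile.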

Current Status

Working end-to-end on real maps. The georeferencing side has been validated on an 1885 Michigan map, an 1853 German atlas plate of Patagonia, a Maine USGS topo, and several NOAA charts. The detection side grew to 411 passing tests and was checked against live GeoTIFFs through the QGIS MCP: depth contours pulled off a NOAA Florida nautical chart, islands off a Maine topo.

Three of five routing paths are implemented; settlement_rich and hybrid are still stubs.
OCR moved off Tesseract onto vision-model corner reading, which is more robust on decorative type.
Detection settled on Sonnet over Haiku after A/B testing for precision; Opus worked but cost ~60% more for no clear win.