Implementing Differential Privacy for Coordinate Generation

Adding scalar Laplace or Gaussian noise directly to raw latitude/longitude values produces invalid geometries, out-of-bounds points, and a privacy guarantee that no longer holds — because degrees are not metres and coordinates are not unbounded.

Part of Privacy-Preserving Generation Frameworks within Synthetic Spatial Data Architecture & Fundamentals: the parent page defines how a differential privacy budget is allocated across a generation run; this page resolves the specific failure of spending that budget at the coordinate level without shredding spatial topology or violating the projection envelope. It is the spatial-validity counterpart to the CRS contract enforcement that every geometry in the pipeline already owes.

Root Cause: Unbounded Noise on a Bounded, Non-Euclidean Domain

The standard mechanisms assume an unbounded, continuous, isotropic domain. Spatial coordinates break these assumptions. They are bounded by the coordinate reference system (CRS) extent, by administrative borders, and by network topology. In geographic coordinates (EPSG:4326) they are also not isotropic, because a degree of longitude shrinks from ~111 km at the equator to zero at the poles. Injecting raw noise into coordinate generation therefore produces three predictable failure modes:

Extent violation. Points shift outside the valid projection bounds. Downstream libraries then either raise an out-of-bounds or projection error, or silently wrap a coordinate across the antimeridian and plant a feature on the wrong side of the planet.
Topological collapse. Adjacent features — parcel boundaries, road segments, building footprints — receive independent noise realizations, opening gaps, overlaps, and self-intersections that invalidate spatial indexes and break routing.
Metric distortion. Noise applied in degrees treats latitude and longitude as uniform units. The resulting latitude-dependent scale error corrupts the distance-based sensitivity calculation, which means the formal $(\varepsilon, \delta)$ guarantee you think you proved is not the one you actually delivered.
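To make the scale error concrete, the sketch below converts a fixed noise scale in degrees of longitude into east-west metres at several latitudes. It assumes a spherical Earth with the WGS84 equatorial radius (6,378,137 m). That is an approximation: ellipsoidal values differ slightly but show the same cosine falloff. The function names are illustrative.

```python
import math

# Assumption: spherical Earth using the WGS84 equatorial radius.
# Ellipsoidal values differ slightly but follow the same cos(latitude) trend.
EARTH_RADIUS_M = 6_378_137.0


def metres_per_degree_lon(lat_deg: float) -> float:
    """Approximate ground length of one degree of longitude at a given latitude."""
    return (math.pi / 180.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


def degree_noise_in_metres(sigma_deg: float, lat_deg: float) -> float:
    """East-west ground std-dev implied by a fixed sigma expressed in degrees."""
    return sigma_deg * metres_per_degree_lon(lat_deg)


if __name__ == "__main__":
    sigma_deg = 0.0005  # the same "degree" noise applied to every record
    for lat in (0, 45, 60, 80):
        print(f"lat {lat:>2}: {degree_noise_in_metres(sigma_deg, lat):7.1f} m east-west")
```

With this approximation, the same 0.0005° sigma corresponds to about 56 m at the equator and under 10 m at 80° latitude. A sensitivity calibrated in degrees therefore protects high-latitude records far less than equatorial ones.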

The fix is a projection-aware, bounded, vector-valued mechanism. The single most consequential decision is where the noise lives: project to a locally Euclidean CRS, perturb in metres, clamp to the envelope, repair topology, then optionally reproject. Everything below builds on that ordering.

The Gaussian mechanism is preferred over Laplace for continuous coordinate domains because it composes tightly under Rényi Differential Privacy (RDP) accounting. For a target $(\varepsilon, \delta)$ with $\varepsilon < 1$ (the range in which this classical bound is proven) and L2 sensitivity $\Delta_2$, calibrate the per-axis standard deviation as:

$$\sigma \geq \frac{\Delta_2 \sqrt{2 \ln(1.25/\delta)}}{\varepsilon}$$
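Here is a worked instance with illustrative parameters that are assumed for this example, not prescribed by the pipeline. A 5 m per-axis displacement bound gives $\Delta_2 = 5\sqrt{2} \approx 7.07$ m, per the vector-sensitivity note that follows. With $\varepsilon = 0.5$ and $\delta = 10^{-6}$:

$$\sigma \geq \frac{7.07 \sqrt{2 \ln(1.25 \times 10^{6})}}{0.5} \approx \frac{7.07 \times 5.30}{0.5} \approx 74.9\ \text{m}$$

Each axis therefore receives noise with a standard deviation of roughly 75 m.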

Because X and Y are perturbed together, sensitivity is vector-valued: for a maximum permissible displacement $\Delta$ in each axis, $\Delta_2 = \Delta\sqrt{2}$. Calibrating each axis separately to the target $\varepsilon$ and combining the two releases under basic composition doubles budget consumption to $(2\varepsilon, 2\delta)$. Calibrating once to $\Delta_2$ charges the pair as a single release. The accountant must track cumulative RDP parameters across every generation step so the total guarantee stays inside the compliance threshold. The OpenDP differential privacy library implements the composition tracking this requires.
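The sketch below shows what that cumulative tracking involves. It is not OpenDP's API: `GaussianRDPAccountant`, `spend`, and the order grid are hypothetical names chosen for illustration. It uses the standard Gaussian RDP curve $\varepsilon_\alpha = \alpha \Delta_2^2 / (2\sigma^2)$, which adds up across releases. It then applies the standard conversion $\varepsilon = \varepsilon_\alpha + \ln(1/\delta)/(\alpha - 1)$, minimised over a fixed grid of orders.

```python
import math
from typing import Sequence

# Sketch of an RDP accountant for repeated Gaussian releases (NOT the OpenDP API).
# Gaussian RDP curve: eps_alpha = alpha * sens^2 / (2 * sigma^2), additive under
# composition. Conversion to (eps, delta): eps = eps_alpha + ln(1/delta) / (alpha - 1).
DEFAULT_ORDERS = (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 10.0, 16.0, 32.0, 64.0, 128.0, 256.0)


class GaussianRDPAccountant:
    def __init__(self, orders: Sequence[float] = DEFAULT_ORDERS) -> None:
        self.orders = tuple(orders)
        self.rdp = [0.0] * len(self.orders)  # cumulative RDP at each order

    def spend(self, sensitivity_l2: float, sigma: float, steps: int = 1) -> None:
        """Charge `steps` Gaussian releases with this L2 sensitivity and noise scale."""
        for i, alpha in enumerate(self.orders):
            self.rdp[i] += steps * alpha * sensitivity_l2 ** 2 / (2.0 * sigma ** 2)

    def epsilon(self, delta: float) -> float:
        """Tightest (epsilon, delta) guarantee over the tracked RDP orders."""
        return min(
            r + math.log(1.0 / delta) / (alpha - 1.0)
            for alpha, r in zip(self.orders, self.rdp)
        )


if __name__ == "__main__":
    acct = GaussianRDPAccountant()
    # 100 generation steps, each releasing one perturbed point with
    # Delta = 5 m per axis (Delta_2 = 5 * sqrt(2)) at sigma = 75 m.
    acct.spend(sensitivity_l2=5.0 * math.sqrt(2.0), sigma=75.0, steps=100)
    print(f"cumulative epsilon at delta=1e-6: {acct.epsilon(1e-6):.3f}")
```

For these example parameters, the sketch reports a cumulative $\varepsilon \approx 5.43$ at $\delta = 10^{-6}$. Basic composition of 100 releases, each worth about $\varepsilon = 0.5$ at $\sigma = 75$ m, would charge roughly 50.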

Prerequisite Check: Confirm You Are Perturbing in Metres

Before calibrating anything, confirm the geometries are in a projected CRS whose units are metres. Perturbing an EPSG:4326 frame is the root failure; this check makes it loud instead of silent:

```python
import geopandas as gpd
from pyproj import CRS


def assert_metric_crs(gdf: gpd.GeoDataFrame) -> None:
    """Fail loudly unless the frame is in a projected CRS with metre axes."""
    if gdf.crs is None:
        raise ValueError("CRS_UNDEFINED: cannot calibrate sensitivity without a projection.")
    crs = CRS.from_user_input(gdf.crs)
    if crs.is_geographic:
        raise ValueError(
            "CRS_GEOGRAPHIC: noise in degrees breaks the epsilon guarantee. "
            f"Reproject {crs.to_epsg()} to a metric CRS (e.g. a local UTM zone) first."
        )
    unit = crs.axis_info[0].unit_name
    if unit not in ("metre", "meter"):
        raise ValueError(f"CRS_NON_METRIC: axis unit is {unit!r}, expected metres.")
```

If assert_metric_crs raises, no amount of sigma tuning will recover a valid guarantee. Reproject to a local UTM zone, or to another projected CRS with metre units and low scale distortion over the study area, and re-run the check.
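Picking the local UTM zone is a small calculation. The sketch below is a simplification: it uses the regular 6° zone grid and ignores the Norway and Svalbard zone exceptions. It returns the EPSG code of the matching WGS 84 / UTM CRS (326xx for the northern hemisphere, 327xx for the southern). `utm_epsg_for` is a hypothetical helper, not part of pyproj or GeoPandas.

```python
import math


def utm_epsg_for(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing a lon/lat point.

    Simplified: regular 6-degree zones only (no Norway/Svalbard exceptions),
    and UTM's usual latitude range of -80 to 84 degrees.
    """
    if not (-180.0 <= lon <= 180.0 and -80.0 <= lat <= 84.0):
        raise ValueError("point outside the UTM longitude/latitude range")
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # lon == 180 belongs to zone 60, not a 61st zone
    return (32600 if lat >= 0 else 32700) + zone
```

The resulting code can be passed to `GeoDataFrame.to_crs(epsg=...)` before re-running `assert_metric_crs`. Recent GeoPandas releases also provide `GeoDataFrame.estimate_utm_crs()`, which performs this lookup via pyproj.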

Fix: A Projection-Aware Bounded Gaussian Pipeline

The pipeline runs as a strict sequence. Each numbered stage is a hard precondition for the next.

1. Source ingestion & CRS validation: reject implicit projections; transform geographic frames with PROJ coordinate transformation software before anything else.
2. Sensitivity derivation: set $\Delta$ from the maximum permissible displacement for the use case (for example 5.0 m for parcel centroids, 0.5 m for survey points), never an arbitrary constant.
3. Bounded Gaussian injection: apply bivariate Gaussian noise with $\sigma$ calibrated to $\Delta_2$, then clamp to the CRS validity envelope.
4. Topology reconstruction: rebuild geometries from perturbed vertices and resolve self-intersections with deterministic snapping or make_valid.
5. Inverse transformation (optional): reproject to geographic coordinates only after every validity check passes.

```python
import numpy as np
from typing import Optional, Tuple
from shapely.geometry import Polygon
from shapely.geometry.base import BaseGeometry
from shapely.validation import explain_validity, make_valid  # explain_validity is used by the CI gate


def compute_gaussian_sigma(epsilon: float, delta: float, sensitivity_l2: float) -> float:
    """Gaussian-mechanism sigma for (epsilon, delta)-DP.

    sigma >= sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    (classical bound, proven for 0 < epsilon < 1)
    """
    if not (0 < delta < 1) or epsilon <= 0:
        raise ValueError("epsilon must be > 0 and delta in (0, 1).")
    if sensitivity_l2 <= 0:
        raise ValueError("sensitivity_l2 must be > 0.")
    return float(sensitivity_l2 * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon)


def perturb_coordinates(
    coords: np.ndarray,            # (N, 2) array of [x, y] in a metric CRS
    epsilon: float,
    delta: float,
    max_displacement_m: float,
    crs_bounds: Tuple[float, float, float, float],
    rng: Optional[np.random.Generator] = None,   # pass a seeded Generator in CI
) -> np.ndarray:
    """Apply bounded bivariate Gaussian noise to projected coordinates."""
    coords = np.asarray(coords, dtype=float)
    sensitivity_l2 = max_displacement_m * np.sqrt(2.0)        # Δ√2 over both axes
    sigma = compute_gaussian_sigma(epsilon, delta, sensitivity_l2)

    rng = rng if rng is not None else np.random.default_rng()
    noise = rng.normal(loc=0.0, scale=sigma, size=coords.shape)
    perturbed = coords + noise

    # Hard clamp to the CRS validity envelope to prevent extent violation.
    # Clamping is post-processing of the noisy output, so the DP guarantee holds.
    minx, miny, maxx, maxy = crs_bounds
    perturbed[:, 0] = np.clip(perturbed[:, 0], minx, maxx)
    perturbed[:, 1] = np.clip(perturbed[:, 1], miny, maxy)
    return perturbed


def repair_polygon(vertices: np.ndarray) -> BaseGeometry:
    """Rebuild a polygon from perturbed vertices and resolve invalidity.

    make_valid may return a MultiPolygon or GeometryCollection, not only a Polygon.
    """
    poly = Polygon(vertices)
    if not poly.is_valid:
        poly = make_valid(poly)        # deterministic; resolves self-intersections
    return poly
```

Clamping introduces a small bias toward the envelope edge, but it is the price of guaranteed CRS validity — an unclamped sample that lands outside the projection bounds crashes the pipeline outright. For polygonal features, perturb shared vertices once and propagate the same realization to every feature that touches them; independent noise on a shared boundary is exactly what opens slivers between adjacent parcels. Where a shared-vertex index is unavailable, fall back to a planar topology-repair pass after injection. These spatial-validity gates are the privacy-pipeline analogue of the topology checks that CI/CD integration for spatial data already enforces on every generated batch.
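Shared-vertex propagation can be sketched as follows. The sketch assumes features arrive as NumPy vertex arrays in a metric CRS, and that vertices equal to the millimetre should be treated as the same point. `perturb_shared_vertices` and its `key_decimals` parameter are hypothetical. Clamping and `make_valid` repair would still run afterwards. For accounting purposes, each distinct vertex counts as one Gaussian release.

```python
from typing import Dict, List, Tuple

import numpy as np


def perturb_shared_vertices(
    rings: List[np.ndarray],   # each an (N_i, 2) vertex array in a metric CRS
    sigma: float,
    rng: np.random.Generator,
    key_decimals: int = 3,     # assumption: vertices equal to the millimetre are shared
) -> List[np.ndarray]:
    """Draw one noise vector per distinct vertex and reuse it wherever it appears."""
    noise_by_vertex: Dict[Tuple[float, float], np.ndarray] = {}
    out = []
    for ring in rings:
        perturbed = np.empty_like(ring, dtype=float)
        for i, (x, y) in enumerate(ring):
            key = (round(float(x), key_decimals), round(float(y), key_decimals))
            if key not in noise_by_vertex:
                # First time this location is seen: draw its single noise realization.
                noise_by_vertex[key] = rng.normal(0.0, sigma, size=2)
            perturbed[i] = ring[i] + noise_by_vertex[key]
        out.append(perturbed)
    return out
```

Because a closed ring repeats its first vertex at the end, both copies map to the same key and the ring stays closed after perturbation. The same mechanism keeps a boundary shared by two parcels coincident.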

Verification Step: Gate Privacy and Topology Together

Coordinate-level DP cannot rely on manual review. A generation batch must fail the build if either the privacy accounting or the spatial validity slips. Gate both in CI:

```python
import numpy as np
from shapely.validation import explain_validity

# perturb_coordinates and compute_gaussian_sigma come from the pipeline code above.


def test_displacement_within_noise_scale():
    """The 99.9th percentile of per-axis displacement stays inside 4 sigma."""
    coords = np.random.default_rng(0).uniform(0, 1000, size=(5000, 2))
    out = perturb_coordinates(
        coords, epsilon=0.5, delta=1e-6,
        max_displacement_m=5.0, crs_bounds=(0, 0, 1000, 1000),
    )
    sigma = compute_gaussian_sigma(0.5, 1e-6, 5.0 * np.sqrt(2))
    # ~99.9% of a Gaussian lies within 3.29 sigma per axis; a 4 sigma bound leaves
    # headroom so the gate does not flake on sampling noise. Clamping only shrinks shifts.
    shift = np.abs(out - coords)
    assert np.quantile(shift, 0.999) < 4.0 * sigma, "displacement exceeds calibrated scale"


def test_envelope_and_topology(geometries):
    """All geometries valid and inside the CRS envelope, or fail the batch.

    `geometries` is a pytest fixture that yields the generated batch.
    """
    minx, miny, maxx, maxy = (0, 0, 1000, 1000)
    errors = [explain_validity(g) for g in geometries if not g.is_valid]
    assert not errors, f"TOPOLOGY_FAILURE: {errors[:3]}"
    for g in geometries:
        gx0, gy0, gx1, gy1 = g.bounds
        assert minx <= gx0 and miny <= gy0 and gx1 <= maxx and gy1 <= maxy, "extent violation"
```

Wire the same suite into the batch gate so the build halts in three cases: cumulative privacy spend exceeds $\varepsilon_{max}$ or $\delta_{max}$; the topology error rate exceeds 0.1% of generated features; or a spatial-utility metric drifts past tolerance. Suitable utility metrics include the nearest-neighbour distance distribution and the Wasserstein distance the realism gate tracks. A failing batch never reaches a downstream consumer.
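One way to express those three conditions is a function that returns failure codes. This is a sketch under stated assumptions. The 0.1% topology threshold comes from the text. The 5 m tolerance on nearest-neighbour distances is a placeholder. The utility check compares two samples of nearest-neighbour distances with a plain 1-D Wasserstein-1 distance. All names (`GateThresholds`, `batch_gate_failures`, `wasserstein_1d`) are hypothetical.

```python
import bisect
from dataclasses import dataclass
from typing import List, Sequence


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein-1 distance: the integral of |F_a(x) - F_b(x)| over x."""
    a, b = sorted(a), sorted(b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both empirical CDFs are constant on [left, right).
        fa = bisect.bisect_right(a, left) / len(a)
        fb = bisect.bisect_right(b, left) / len(b)
        total += abs(fa - fb) * (right - left)
    return total


@dataclass
class GateThresholds:
    epsilon_max: float
    delta_max: float
    max_topology_error_rate: float = 0.001   # 0.1% of generated features
    max_nn_w1_m: float = 5.0                 # assumption: tolerance is use-case specific


def batch_gate_failures(
    spent_epsilon: float,
    spent_delta: float,
    invalid_features: int,
    total_features: int,
    nn_dist_reference: Sequence[float],
    nn_dist_synthetic: Sequence[float],
    t: GateThresholds,
) -> List[str]:
    """Return failure codes; an empty list means the batch may ship."""
    failures: List[str] = []
    if spent_epsilon > t.epsilon_max or spent_delta > t.delta_max:
        failures.append("PRIVACY_BUDGET_EXHAUSTED")
    if total_features == 0 or invalid_features / total_features > t.max_topology_error_rate:
        failures.append("TOPOLOGY_ERROR_RATE")
    if wasserstein_1d(nn_dist_reference, nn_dist_synthetic) > t.max_nn_w1_m:
        failures.append("UTILITY_DRIFT")
    return failures
```

In a CI job, a non-empty result would fail the build, for example by raising or exiting with a non-zero status.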

Edge Cases & Gotchas

Antimeridian-spanning features. A geometry straddling ±180° longitude has a bounding box nearly 360° wide in geographic coordinates, so a naïve clamp pins points to the wrong meridian and the inverse reprojection scatters them across the Pacific. Detect bound widths above 180°, split the feature at the date line, and perturb each half inside its own UTM zone before merging.
Null Island and failed geocodes. Records that defaulted to (0, 0) after a failed geocode collapse onto the equator/prime-meridian intersection. Under a global envelope they survive clamping and silently bias the released density toward Null Island. Filter Point(0, 0) before injection and assert a metric CRS so sensitivity is computed in metres, not degrees.
Floating-point precision at datum boundaries. Near a UTM zone seam or a datum edge, perturbed vertices that should be bitwise-identical across two features can diverge in the last ULP, reopening a hairline sliver after make_valid. Round shared-vertex coordinates to the grid resolution (for example millimetres) before reconstruction, and snap with a fixed tolerance so the repair is deterministic and reproducible in CI.
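The three mitigations above can be sketched as plain helpers. The first two take lon/lat tuples and run before projection; grid snapping takes projected metre coordinates. All function names are hypothetical. `crosses_antimeridian` only applies the width heuristic from the text; the split at the date line is left to the geometry library. On whole geometries, Shapely 2.x users may prefer `shapely.set_precision(geometry, grid_size)` for the snapping step.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[float, float]


def crosses_antimeridian(lons: Iterable[float]) -> bool:
    """Heuristic: a lon/lat bounding box wider than 180 degrees likely wraps the date line."""
    lons = list(lons)
    return max(lons) - min(lons) > 180.0


def drop_null_island(points: Iterable[Coord], tol: float = 1e-9) -> List[Coord]:
    """Remove (0, 0) placeholders left by failed geocodes (lon/lat, before projection)."""
    return [(x, y) for x, y in points if not (abs(x) <= tol and abs(y) <= tol)]


def snap_to_grid(coords: Iterable[Coord], grid_m: float = 0.001) -> List[Coord]:
    """Round projected coordinates to a fixed grid (default 1 mm).

    Values that round to the same grid index yield bitwise-identical floats,
    so near-identical shared vertices become exactly equal and repair is deterministic.
    """
    return [(round(x / grid_m) * grid_m, round(y / grid_m) * grid_m) for x, y in coords]
```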

Document the sensitivity derivation, CRS selection, $(\varepsilon, \delta)$ parameters, RDP composition path, clamping thresholds, and validation-gate outputs as an audit trail. Logging the privacy accountant state — never the raw source coordinates — is what lets a compliance engineer reproduce the GDPR/CCPA attestation without re-exposing the original data.
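A minimal sketch of such an audit entry follows. It records the calibration inputs and gate outcomes and deliberately has no field for source coordinates. `PrivacyAuditRecord` and its field names are hypothetical. The serialisation uses only the standard library.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, Tuple


@dataclass
class PrivacyAuditRecord:
    """Hypothetical audit entry: parameters and gate outcomes, never raw coordinates."""
    run_id: str
    crs: str                                        # e.g. the metric CRS used for perturbation
    max_displacement_m: float                       # Delta
    sensitivity_l2: float                           # Delta_2 = Delta * sqrt(2)
    epsilon: float
    delta: float
    sigma: float
    composition: str                                # e.g. "RDP"
    clamp_bounds: Tuple[float, float, float, float]
    gate_results: Dict[str, bool] = field(default_factory=dict)

    def to_json(self) -> str:
        """Stable, diff-friendly serialisation for the audit log."""
        return json.dumps(asdict(self), sort_keys=True)
```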

Frequently Asked Questions

Why Gaussian instead of the Laplace mechanism for coordinates?

Laplace gives pure $\varepsilon$-DP but composes loosely across the many perturbation steps a spatial pipeline runs. The Gaussian mechanism, accounted under RDP, yields tighter utility for the same total budget on a continuous two-dimensional domain, which matters when one geometry can carry thousands of vertices.

Can I keep working in EPSG:4326 if I scale the noise by latitude?

You can approximate it, but you should not rely on it. A per-row latitude rescale still leaves a non-uniform metric and re-introduces distortion at high latitudes and across zone seams. Projecting to a local UTM zone, perturbing in metres, and reprojecting back is both simpler to reason about and what keeps the formal guarantee intact.

How do I stop adjacent polygons from developing slivers after noise?

Perturb each shared vertex exactly once and propagate that single realization to every feature that references it, rather than letting each polygon draw its own noise on the common boundary. Where you cannot index shared vertices, run a planar topology-repair pass and round to the grid resolution before reconstruction.

Does clamping to the envelope break the differential privacy guarantee?

No — clamping is data-independent post-processing on the mechanism output, and post-processing cannot weaken a DP guarantee. It does add a small bias toward the envelope edge, so size the envelope from the true CRS bounds and keep $\sigma$ small relative to the extent so clamping is rare.
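How rare clamping is depends on how far a point sits from the edge relative to $\sigma$. For one axis, the clamp fires on a given side when the Gaussian noise exceeds the distance to that edge, so the probability is $P(N > d) = \tfrac{1}{2}\operatorname{erfc}\!\big(d / (\sigma\sqrt{2})\big)$. The helper name below is hypothetical.

```python
import math


def clamp_probability(distance_to_edge_m: float, sigma: float) -> float:
    """Chance that one axis of N(0, sigma^2) noise carries a point past an edge
    `distance_to_edge_m` away on that side, i.e. that the clamp fires there."""
    if sigma <= 0 or distance_to_edge_m < 0:
        raise ValueError("sigma must be > 0 and distance must be >= 0")
    return 0.5 * math.erfc(distance_to_edge_m / (sigma * math.sqrt(2.0)))
```

With $\sigma \approx 75$ m, a point 300 m (4σ) from an edge has roughly a 3 × 10⁻⁵ chance of being clamped on that side. A point sitting on the edge is clamped half the time.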

What sensitivity should I pick if the source resolution is unknown?

Derive $\Delta$ from the smallest displacement that still preserves topology for the use case, not from a default number. For parcel centroids that is typically a few metres; for survey-grade points it is sub-metre. If you cannot justify a value, you cannot justify the released privacy level — treat that as a blocking gap, not a tuning knob.

Privacy-Preserving Generation Frameworks — the parent page that defines how the $(\varepsilon, \delta)$ budget this mechanism spends is allocated across a run.
CRS Contract Enforcement — the projection contract every geometry must satisfy before noise is ever applied.
Syncing Synthetic Data Generation with GitHub Actions — the CI layer that runs the privacy-and-topology gate above on every batch.
Evaluating Spatial Realism with Wasserstein Distance — the utility metric that detects when the noise has degraded spatial fidelity past tolerance.