SoilPoint · blog soilpoint.dev
← SoilPoint

How to Get USDA Soil Data (SSURGO) from a Lat/Lon in Python — Without the T-SQL Pain

By Adrian · June 2026 · ~8 min read

Last updated: June 2026. I'm a solo developer (not a soil scientist) — I built the JSON API at the end of this post because I got tired of the workflow in the first half. Everything before the reveal is the genuinely-useful, free way to do it yourself.

You have a latitude and longitude. You want the soil at that point: the dominant soil series, texture, drainage class, hydrologic group, pH, available water capacity (AWC), organic matter. In the US, the authoritative source is SSURGO — the USDA-NRCS Soil Survey Geographic Database — and getting it programmatically is harder than it has any right to be.

This post walks the whole path:

  1. What SSURGO actually is, and why a single point is a multi-table join
  2. The official way: POST T-SQL to USDA's Soil Data Access (SDA), with a runnable Python example
  3. Why that's only step one — the depth-weighted aggregation you still have to write yourself
  4. The gotchas (query hangs, the SSURGO+STATSGO mix, row/byte limits, the JSON+COLUMNNAME quirk)
  5. A one-line JSON alternative
  6. Honest limits, and when you should just use the free R/Python packages instead

If you only want a working answer right now, jump to “The one-line version”. Otherwise, let’s do it the real way first so the shortcut makes sense.


What SSURGO actually is

SSURGO isn’t a grid of values you can sample. It’s a vector polygon survey with a normalized relational schema bolted onto it. To get from a coordinate to a number, you traverse a hierarchy:

spatial polygon (a "map unit" on the ground)
   └─ mapunit        ← identified by mukey; e.g. "Diablo clay, 15 to 30 percent slopes"
        └─ component ← the soil series that make up that map unit, each with a % (comppct_r)
             └─ chorizon ← the soil horizons (layers) within a component, with depths
                  └─ pH, AWC, organic matter, sand/silt/clay... measured per horizon

So “the pH at this point” is really:

That last step is the part every tutorial quietly skips, and it’s where most people get stuck. More on it below.

SSURGO values come as _l / _r / _h (low / representative / high). Use _r (representative) unless you specifically need the range.


The official way: POST T-SQL to Soil Data Access

USDA exposes SSURGO through Soil Data Access (SDA) — a web service that takes T-SQL in the body of an HTTP POST and returns the result set. The REST endpoint is:

https://SDMDataAccess.sc.egov.usda.gov/Tabular/post.rest

The docs are at sdmdataaccess.nrcs.usda.gov/WebServiceHelp.aspx. It’s authoritative, it’s free, and there’s no API key — credit where due. It’s also an ASP.NET help page with no quickstart and no lat/lon recipe, which is why you’re reading this.

Step 1: coordinate → mukey

SDA ships a spatial helper, SDA_Get_Mukey_from_intersection_with_WktWgs84, that takes a WKT point in EPSG:4326. Note the WKT order is POINT(lon lat) — longitude first.

import requests

SDA_URL = "https://SDMDataAccess.sc.egov.usda.gov/Tabular/post.rest"

def sda_query(sql: str) -> list[dict]:
    """POST a T-SQL query to SDA and return rows as dicts."""
    resp = requests.post(
        SDA_URL,
        json={"query": sql, "format": "JSON+COLUMNNAME"},
        timeout=120,  # SDA can hang ~a minute; don't set this low
    )
    resp.raise_for_status()
    table = resp.json().get("Table", [])
    if not table:
        return []
    header, *rows = table  # JSON+COLUMNNAME puts column names in row 0
    return [dict(zip(header, r)) for r in rows]

lat, lon = 37.368402, -121.771
mukey_sql = f"""
SELECT mukey
FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('point({lon} {lat})')
"""
mukey = sda_query(mukey_sql)[0]["mukey"]
print(mukey)  # -> 1882921

That JSON+COLUMNNAME format is an SDA quirk: instead of an array of objects, you get a Table array whose first row is the column names and the rest are values. The helper above re-zips it back into dicts.

If you stop here — as the most commonly-cited SDA point query does — you have a mukey and a map-unit name. That’s it. No texture, no drainage, no pH. To get usable numbers you keep going.

Step 2: mukey → dominant component → horizon properties

Now the join nobody warns you about. You need to:

  1. Find the dominant component for the mukey (MAX(comppct_r)).
  2. Pull its horizons from chorizon, each with a top/bottom depth (hzdept_r, hzdepb_r).
  3. Depth-weight each property over your target window (here, 0–30 cm).

The depth-weighting is the real work. For a property like pH over 0–30 cm, you weight each horizon’s value by how much of that horizon falls inside the window:

weighted_value = Σ (horizon_value × overlap_thickness) / Σ (overlap_thickness)

where overlap_thickness = max(0, min(horizon_bottom, 30) − max(horizon_top, 0)).

You can push some of this into SQL, but it gets ugly fast — and SDA’s T-SQL doesn’t make windowed depth math pleasant. A pragmatic split is: get the dominant component and its horizons from SDA, then do the weighting in Python where it’s readable and testable.

comp_sql = f"""
SELECT TOP 1 co.cokey, co.compname, co.comppct_r,
       co.drainagecl, co.hydgrp
FROM component AS co
WHERE co.mukey = '{mukey}'
ORDER BY co.comppct_r DESC
"""
comp = sda_query(comp_sql)[0]

hz_sql = f"""
SELECT ch.hzdept_r, ch.hzdepb_r,
       ch.ph1to1h2o_r AS ph,
       ch.awc_r       AS awc,
       ch.om_r        AS om
FROM chorizon AS ch
WHERE ch.cokey = '{comp["cokey"]}'
ORDER BY ch.hzdept_r
"""
horizons = sda_query(hz_sql)

def depth_weight(horizons, field, top=0, bottom=30):
    num = den = 0.0
    for h in horizons:
        val = h.get(field)
        if val in (None, ""):
            continue
        ht, hb = float(h["hzdept_r"]), float(h["hzdepb_r"])
        overlap = max(0.0, min(hb, bottom) - max(ht, top))
        if overlap <= 0:
            continue
        num += float(val) * overlap
        den += overlap
    return round(num / den, 2) if den else None

result = {
    "series":        comp["compname"],
    "componentPct":  comp["comppct_r"],
    "drainageClass": comp["drainagecl"],
    "hydrologicGroup": comp["hydgrp"],
    "ph_0_30":  depth_weight(horizons, "ph"),
    "awc_0_30": depth_weight(horizons, "awc"),
    "om_0_30":  depth_weight(horizons, "om"),
}
print(result)

This works. But notice what you just built: a brittle multi-query pipeline with hand-rolled depth math, no caching, and a hard dependency on a service that — as anyone who’s used it knows — likes to hang.


The gotchas nobody documents up front

If you go the SDA route in production, budget time for these:

None of this is a knock on USDA — SDA exposes an enormous, genuinely valuable dataset for free. It’s just built for SQL-literate batch analysts, not for “give me JSON for this coordinate in a request handler.”


The clunky-but-good alternatives

Before the shortcut, the honest menu:

If one of those fits your workflow, use it. The rest of this post is for the case where you just want hosted JSON per coordinate, no install, no SQL, no babysitting SDA outages.


The one-line version

That’s the API I built. It does the point-in-polygon lookup, picks the dominant component, depth-weights the 0–30 cm properties, and returns clean JSON — caching results and serving stale-but-labeled data when SDA is down, so a flaky upstream doesn’t take your request handler with it.

curl "https://soilpoint.dev/v1/soil?lat=37.368402&lon=-121.771"

Real response (San Jose, CA — Diablo clay), live as of this writing:

{
  "lat": 37.368402,
  "lon": -121.771,
  "mukey": "1882921",
  "mapUnitName": "Diablo clay, 15 to 30 percent slopes, MLRA 15",
  "soilSeries": "Diablo",
  "componentKind": "Series",
  "componentPct": 85,
  "texture": "Clay",
  "drainageClass": "Well drained",
  "hydrologicGroup": "C",
  "awc": 0.17,
  "ph": 6.83,
  "organicMatter": 2.5,
  "nccpi": null,
  "nccpiClass": null,
  "depthWindowCm": [0, 30],
  "dataVintage": "2025-08-29",
  "stale": false,
  "dataSource": "USDA-NRCS SSURGO via Soil Data Access",
  "surveyDisclaimer": "Provided as-is; not a substitute for on-site investigation; not for regulatory determinations; US coverage only. Not a USDA-endorsed product."
}

Same fields, one GET, no T-SQL, no key. Here’s the Python equivalent of the whole first half of this article:

import requests

def soil_at(lat, lon):
    r = requests.get("https://soilpoint.dev/v1/soil",
                     params={"lat": lat, "lon": lon}, timeout=30)
    r.raise_for_status()
    return r.json()

print(soil_at(42.0308, -93.6319))
# Iowa: Spillville loam, drainage "Moderately well drained",
# hydrologic group "B/D", pH 6.5, AWC 0.2, organic matter 4.5%

How it works (the short version)

So you can trust the numbers, here’s exactly what happens behind that GET — it’s the same pipeline from the first half:


Honest limits

I’d rather you hit these in this post than in production:


So which should you use?

curl "https://soilpoint.dev/v1/soil?lat=YOUR_LAT&lon=YOUR_LON"

If you try it and something’s wrong or missing, tell me — I’m a solo dev and I read everything. The fastest way to make this better is to hear where it breaks for your use case.


SoilPoint wraps USDA-NRCS Soil Data Access. It is not a USDA-endorsed product. Soil data is provided as-is and is not a substitute for on-site investigation.

Want to try it with your own coordinates?