How to Get USDA Soil Data (SSURGO) from a Lat/Lon in Python — Without the T-SQL Pain
Last updated: June 2026. I'm a solo developer (not a soil scientist) — I built the JSON API at the end of this post because I got tired of the workflow in the first half. Everything before the reveal is the genuinely-useful, free way to do it yourself.
You have a latitude and longitude. You want the soil at that point: the dominant soil series, texture, drainage class, hydrologic group, pH, available water capacity (AWC), organic matter. In the US, the authoritative source is SSURGO — the USDA-NRCS Soil Survey Geographic Database — and getting it programmatically is harder than it has any right to be.
This post walks the whole path:
- What SSURGO actually is, and why a single point is a multi-table join
- The official way: POST T-SQL to USDA's Soil Data Access (SDA), with a runnable Python example
- Why that's only step one — the depth-weighted aggregation you still have to write yourself
- The gotchas (query hangs, the SSURGO+STATSGO mix, row/byte limits, the
JSON+COLUMNNAMEquirk) - A one-line JSON alternative
- Honest limits, and when you should just use the free R/Python packages instead
If you only want a working answer right now, jump to “The one-line version”. Otherwise, let’s do it the real way first so the shortcut makes sense.
What SSURGO actually is
SSURGO isn’t a grid of values you can sample. It’s a vector polygon survey with a normalized relational schema bolted onto it. To get from a coordinate to a number, you traverse a hierarchy:
spatial polygon (a "map unit" on the ground)
└─ mapunit ← identified by mukey; e.g. "Diablo clay, 15 to 30 percent slopes"
└─ component ← the soil series that make up that map unit, each with a % (comppct_r)
└─ chorizon ← the soil horizons (layers) within a component, with depths
└─ pH, AWC, organic matter, sand/silt/clay... measured per horizon
So “the pH at this point” is really:
- Point → mukey: a point-in-polygon spatial query.
- mukey → component: a map unit is usually several soils. You pick the dominant component (highest
comppct_r), or aggregate across components. - component → horizon → property: properties live on horizons, each spanning a depth range (e.g. 0–18 cm, 18–46 cm). To get a single “0–30 cm pH” you must depth-weight the horizon values across your window.
That last step is the part every tutorial quietly skips, and it’s where most people get stuck. More on it below.
SSURGO values come as
_l/_r/_h(low / representative / high). Use_r(representative) unless you specifically need the range.
The official way: POST T-SQL to Soil Data Access
USDA exposes SSURGO through Soil Data Access (SDA) — a web service that takes T-SQL in the body of an HTTP POST and returns the result set. The REST endpoint is:
https://SDMDataAccess.sc.egov.usda.gov/Tabular/post.rest
The docs are at sdmdataaccess.nrcs.usda.gov/WebServiceHelp.aspx. It’s authoritative, it’s free, and there’s no API key — credit where due. It’s also an ASP.NET help page with no quickstart and no lat/lon recipe, which is why you’re reading this.
Step 1: coordinate → mukey
SDA ships a spatial helper, SDA_Get_Mukey_from_intersection_with_WktWgs84, that takes a WKT point in EPSG:4326. Note the WKT order is POINT(lon lat) — longitude first.
import requests
SDA_URL = "https://SDMDataAccess.sc.egov.usda.gov/Tabular/post.rest"
def sda_query(sql: str) -> list[dict]:
"""POST a T-SQL query to SDA and return rows as dicts."""
resp = requests.post(
SDA_URL,
json={"query": sql, "format": "JSON+COLUMNNAME"},
timeout=120, # SDA can hang ~a minute; don't set this low
)
resp.raise_for_status()
table = resp.json().get("Table", [])
if not table:
return []
header, *rows = table # JSON+COLUMNNAME puts column names in row 0
return [dict(zip(header, r)) for r in rows]
lat, lon = 37.368402, -121.771
mukey_sql = f"""
SELECT mukey
FROM SDA_Get_Mukey_from_intersection_with_WktWgs84('point({lon} {lat})')
"""
mukey = sda_query(mukey_sql)[0]["mukey"]
print(mukey) # -> 1882921
That JSON+COLUMNNAME format is an SDA quirk: instead of an array of objects, you get a Table array whose first row is the column names and the rest are values. The helper above re-zips it back into dicts.
If you stop here — as the most commonly-cited SDA point query does — you have a mukey and a map-unit name. That’s it. No texture, no drainage, no pH. To get usable numbers you keep going.
Step 2: mukey → dominant component → horizon properties
Now the join nobody warns you about. You need to:
- Find the dominant component for the mukey (
MAX(comppct_r)). - Pull its horizons from
chorizon, each with a top/bottom depth (hzdept_r,hzdepb_r). - Depth-weight each property over your target window (here, 0–30 cm).
The depth-weighting is the real work. For a property like pH over 0–30 cm, you weight each horizon’s value by how much of that horizon falls inside the window:
weighted_value = Σ (horizon_value × overlap_thickness) / Σ (overlap_thickness)
where overlap_thickness = max(0, min(horizon_bottom, 30) − max(horizon_top, 0)).
You can push some of this into SQL, but it gets ugly fast — and SDA’s T-SQL doesn’t make windowed depth math pleasant. A pragmatic split is: get the dominant component and its horizons from SDA, then do the weighting in Python where it’s readable and testable.
comp_sql = f"""
SELECT TOP 1 co.cokey, co.compname, co.comppct_r,
co.drainagecl, co.hydgrp
FROM component AS co
WHERE co.mukey = '{mukey}'
ORDER BY co.comppct_r DESC
"""
comp = sda_query(comp_sql)[0]
hz_sql = f"""
SELECT ch.hzdept_r, ch.hzdepb_r,
ch.ph1to1h2o_r AS ph,
ch.awc_r AS awc,
ch.om_r AS om
FROM chorizon AS ch
WHERE ch.cokey = '{comp["cokey"]}'
ORDER BY ch.hzdept_r
"""
horizons = sda_query(hz_sql)
def depth_weight(horizons, field, top=0, bottom=30):
num = den = 0.0
for h in horizons:
val = h.get(field)
if val in (None, ""):
continue
ht, hb = float(h["hzdept_r"]), float(h["hzdepb_r"])
overlap = max(0.0, min(hb, bottom) - max(ht, top))
if overlap <= 0:
continue
num += float(val) * overlap
den += overlap
return round(num / den, 2) if den else None
result = {
"series": comp["compname"],
"componentPct": comp["comppct_r"],
"drainageClass": comp["drainagecl"],
"hydrologicGroup": comp["hydgrp"],
"ph_0_30": depth_weight(horizons, "ph"),
"awc_0_30": depth_weight(horizons, "awc"),
"om_0_30": depth_weight(horizons, "om"),
}
print(result)
This works. But notice what you just built: a brittle multi-query pipeline with hand-rolled depth math, no caching, and a hard dependency on a service that — as anyone who’s used it knows — likes to hang.
The gotchas nobody documents up front
If you go the SDA route in production, budget time for these:
- Queries hang ~a minute. SDA is genuinely slow under load and occasionally just stalls. USDA’s own guidance is to wrap every call in error handling because “the web service connection may be down.” Don’t set a tight client timeout and assume failures are your bug.
- SSURGO and STATSGO2 live in the same database. Coarse STATSGO map units can match your point too. If you don’t filter, you can silently get the wrong-resolution answer. (The
SDA_Get_Mukey_...helper targets SSURGO, but be aware of the mix when you write broader queries.) - Hard limits per query: 100,000 rows and 32 MB of JSON. Batch jobs over many points must be chunked — this is a real, recurring pain even in the expert tooling. The maintainers of the R
soilDBpackage have open issues for exactly this (chunkingSDA_query, multi-line record breakage, “too many mukeys”). You are not doing it wrong; the interface is just like that. - The
JSON+COLUMNNAMEshape. As shown above, you re-zip a header row, not parse objects. Easy to miss. POINT(lon lat)order and EPSG:4326 are mandatory. Swap them and you’ll get plausible-looking soil from the wrong place.
None of this is a knock on USDA — SDA exposes an enormous, genuinely valuable dataset for free. It’s just built for SQL-literate batch analysts, not for “give me JSON for this coordinate in a request handler.”
The clunky-but-good alternatives
Before the shortcut, the honest menu:
- R:
soilDB+FedData. ThesoilDBSDA tutorial is the canonical reference, andaqpmakes depth aggregation much nicer than hand-rolling it. If you’re an R user doing research or batch analysis, this is genuinely the right tool and it’s free. Its point query, though, returns mukey + map-unit name; you still add the component/horizon joins and weighting for properties. - Python:
pysda. A community SDA client. Works, but it’s a library to install and wire up (plus geopandas), and you’re still managing SDA’s flakiness yourself. - Web Soil Survey (the GUI). Fine for one or two points by hand; it does not scale to hundreds of coordinates in a pipeline — which is the exact complaint that sends people looking for an API in the first place.
If one of those fits your workflow, use it. The rest of this post is for the case where you just want hosted JSON per coordinate, no install, no SQL, no babysitting SDA outages.
The one-line version
That’s the API I built. It does the point-in-polygon lookup, picks the dominant component, depth-weights the 0–30 cm properties, and returns clean JSON — caching results and serving stale-but-labeled data when SDA is down, so a flaky upstream doesn’t take your request handler with it.
curl "https://soilpoint.dev/v1/soil?lat=37.368402&lon=-121.771"
Real response (San Jose, CA — Diablo clay), live as of this writing:
{
"lat": 37.368402,
"lon": -121.771,
"mukey": "1882921",
"mapUnitName": "Diablo clay, 15 to 30 percent slopes, MLRA 15",
"soilSeries": "Diablo",
"componentKind": "Series",
"componentPct": 85,
"texture": "Clay",
"drainageClass": "Well drained",
"hydrologicGroup": "C",
"awc": 0.17,
"ph": 6.83,
"organicMatter": 2.5,
"nccpi": null,
"nccpiClass": null,
"depthWindowCm": [0, 30],
"dataVintage": "2025-08-29",
"stale": false,
"dataSource": "USDA-NRCS SSURGO via Soil Data Access",
"surveyDisclaimer": "Provided as-is; not a substitute for on-site investigation; not for regulatory determinations; US coverage only. Not a USDA-endorsed product."
}
Same fields, one GET, no T-SQL, no key. Here’s the Python equivalent of the whole first half of this article:
import requests
def soil_at(lat, lon):
r = requests.get("https://soilpoint.dev/v1/soil",
params={"lat": lat, "lon": lon}, timeout=30)
r.raise_for_status()
return r.json()
print(soil_at(42.0308, -93.6319))
# Iowa: Spillville loam, drainage "Moderately well drained",
# hydrologic group "B/D", pH 6.5, AWC 0.2, organic matter 4.5%
How it works (the short version)
So you can trust the numbers, here’s exactly what happens behind that GET — it’s the same pipeline from the first half:
- Point → map unit. Point-in-polygon against SSURGO via SDA’s
SDA_Get_Mukey_from_intersection_with_WktWgs84, EPSG:4326. - Dominant component. The map unit’s highest-
comppct_rcomponent is reported assoilSeries(withcomponentPctso you can judge how “dominant” it is — 85% vs 40% is a real difference). This is a dominant-component answer, not a component-weighted average across the whole map unit. - 0–30 cm depth-weighting.
ph,awc, andorganicMatterare weighted across the horizons overlapping the 0–30 cm window (depthWindowCm), using each horizon’s representative (_r) value — the same overlap-thickness weighting shown above. - NCCPI (productivity index) is included when available, best-effort — it is frequently
null(as in the example above), so code defensively. dataVintageis the survey publication date for that area, so you know how current the underlying survey is.staletells you whether you got a fresh SDA result (false) or a cached fallback served because SDA was unreachable (true). Honest staleness beats a hung request.
Honest limits
I’d rather you hit these in this post than in production:
- US only. That’s SSURGO’s coverage, not a SoilPoint choice. Outside the US, look at SoilGrids / ISRIC.
- Provided as-is. Not for regulatory determinations (wetlands, septic permitting, hydric-soil calls). SSURGO is survey-scale; for anything official you need an on-site investigation by a licensed professional.
- Survey-scale, not a point measurement. SSURGO map units generalize over an area. The value at your exact GPS pin can differ from the map-unit representative value. If you need probabilistic point estimates with uncertainty, POLARIS (30 m) is the better dataset for that question.
- NCCPI is best-effort and often null, as noted.
- Dominant-component model. If your use case needs every component in the map unit weighted together, the single-component answer isn’t what you want — use
soilDB/FedDataand aggregate yourself. - Free while I validate demand. No key required right now. If it’s useful to you, that’s the signal I’m looking for — and the honest caveat is that “free” is a current state, not a forever promise.
So which should you use?
- Doing research or batch analysis in R? Use
soilDB+aqp. It’s free, it’s excellent, and it’s the right tool. - Python person who wants a library and doesn’t mind managing SDA yourself?
pysda. - One or two points by hand? Web Soil Survey.
- Building an app that needs hosted JSON per coordinate, no install, and resilience when SDA flakes? That’s the gap I built SoilPoint for. Try it with your own coordinate — there’s a live box on the homepage, or just:
curl "https://soilpoint.dev/v1/soil?lat=YOUR_LAT&lon=YOUR_LON"
If you try it and something’s wrong or missing, tell me — I’m a solo dev and I read everything. The fastest way to make this better is to hear where it breaks for your use case.
SoilPoint wraps USDA-NRCS Soil Data Access. It is not a USDA-endorsed product. Soil data is provided as-is and is not a substitute for on-site investigation.
Want to try it with your own coordinates?