# Surface Water Body Detection
Surface Water Body Detection is a staged vector workflow that starts from LULC-derived water pixels and turns them into reusable, published waterbody layers for the stack.
This is not a single raster operation. It is a multi-step pipeline that:
- derives waterbody polygons from LULC outputs
- attaches stable micro-watershed identifiers
- enriches the features with census and hydrological context
- publishes the result into the platform's metadata and GeoServer surfaces
## Why This Workflow Matters
This is one of the clearest examples of a real CoRE Stack core workflow because it combines:
- a scientific source layer from LULC
- staged vector enrichment
- hydrological joins
- publication into platform delivery surfaces
If you want to understand how CoRE Stack turns one analytical signal into a durable platform layer, this workflow is one of the best places to look.
## Main Route Surface

These are the related SWB routes in the current computing surface:
| Route | Purpose |
|---|---|
| `/api/v1/generate_swb/` | run the main staged SWB workflow |
| `/api/v1/generate_ponds/` | compute pond layers through the local compute surface |
| `/api/v1/generate_wells/` | compute well layers through the local compute surface |
| `/api/v1/merge_swb_ponds/` | merge SWB and ponds outputs into a combined layer |
The main handler shape is simple: parse the request, collect `state`, `district`, `block`, `start_year`, `end_year`, and `gee_account_id`, then hand the work off to the Celery task in `generate_swb_layer`.
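The handler shape described above can be sketched in plain Python. This is illustrative only: the parameter names come from the docs, but the request parsing and the `enqueue` stand-in (which plays the role of `generate_swb_layer.delay(...)`) are assumptions, not the real Django/Celery code.

```python
# Minimal sketch of the generate_swb handler shape: parse, validate, dispatch.
REQUIRED_PARAMS = ["state", "district", "block", "start_year", "end_year", "gee_account_id"]

def parse_swb_request(params: dict) -> dict:
    """Collect the expected parameters, failing fast on anything missing."""
    missing = [p for p in REQUIRED_PARAMS if p not in params]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return {p: params[p] for p in REQUIRED_PARAMS}

def generate_swb(params: dict, enqueue) -> str:
    """Parse the request, then hand the work off to the async task.

    `enqueue` stands in for the Celery dispatch (generate_swb_layer.delay).
    """
    kwargs = parse_swb_request(params)
    return enqueue(**kwargs)
```

The point of the shape is that the HTTP handler does no computation itself; everything heavy lives in the task.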
## Core Code Surfaces

The most important backend files for this workflow are:

- `computing/surface_water_bodies/swb.py`
- `computing/surface_water_bodies/swb1.py`
- `computing/surface_water_bodies/swb2.py`
- `computing/surface_water_bodies/swb3.py`
- `computing/surface_water_bodies/swb4.py`
- `computing/surface_water_bodies/merge_swb_ponds.py`

The orchestration entry point is `generate_swb_layer()` in `swb.py`.
## Workflow Shape

```mermaid
flowchart LR
    A[LULC assets for ROI and year range] --> B[SWB1: vectorize water pixels]
    B --> C[SWB2: intersect with MWS and assign UID]
    C --> D[First publication sync]
    C --> E[SWB3: intersect with Water Body Census]
    E --> F[SWB4: catchment, stream order, drainage flag]
    C --> F
    F --> G[Final publication sync]
    G --> H[GeoServer layer and DB metadata]
    G --> I[Optional follow-on merge with ponds]
```
The important thing to notice is that the workflow is staged. It does not try to do everything in one function.
## Inputs And Execution Context
The main task accepts:
- `state`
- `district`
- `block`
- `start_year`
- `end_year`
- `gee_account_id`
When state, district, and block are provided, the workflow builds an ROI from the filtered micro-watershed asset for that place.
The time window is converted into:
- `start_date = {start_year}-07-01`
- `end_date = {end_year}-06-30`
That means the workflow is reasoning across seasonal LULC periods rather than a simple January-to-December window.
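The time-window conversion above is simple enough to state directly in code. A minimal sketch (the function name is illustrative; the date arithmetic itself comes straight from the docs):

```python
# The SWB window runs July 1 of start_year through June 30 of end_year,
# i.e. whole seasonal/hydrological years rather than calendar years.
def swb_date_window(start_year: int, end_year: int) -> tuple[str, str]:
    start_date = f"{start_year}-07-01"
    end_date = f"{end_year}-06-30"
    return start_date, end_date
```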
## Stage Breakdown

### SWB1: Vectorize Water Pixels

`swb1.py` is where the workflow first turns LULC into waterbody candidates.
The key logic is:
- load each yearly LULC image for the date range
- mark LULC classes `2` to `4` as water-related presence
- combine the yearly masks with an OR operation
- convert the result into polygons with `reduceToVectors`
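The real stage runs on Earth Engine images, but the masking logic itself can be shown as a NumPy analogue. This is a sketch, not the `swb1.py` code: the class range 2..4 comes from the docs, while the array shapes and function names are illustrative.

```python
import numpy as np

# NumPy analogue of the SWB1 masking logic. Each yearly LULC raster marks
# classes 2..4 as water-related presence, and the yearly masks are combined
# with a pixel-wise OR so a pixel counts if it was water in any year.
def water_mask(lulc: np.ndarray) -> np.ndarray:
    """True where the LULC class code falls in the water-related range 2..4."""
    return (lulc >= 2) & (lulc <= 4)

def combined_water_presence(yearly_lulc: list[np.ndarray]) -> np.ndarray:
    """OR the yearly masks across the whole date range."""
    return np.logical_or.reduce([water_mask(y) for y in yearly_lulc])
```

In the Earth Engine version, the equivalent combined mask is what gets handed to `reduceToVectors` to produce polygons.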
This stage also computes useful feature-level metrics such as:
- total detected water area
- seasonal water presence percentages
- optional class percentages when `is_all_classes=True`
So SWB1 is not just detection. It is already building the first analytical summary around each polygon.
### SWB2: Intersect With Micro-Watersheds

`swb2.py` attaches the waterbody polygons to the stack's main hydrological registry.
This stage:
- loads the `swb1` output
- intersects each waterbody with the ROI micro-watersheds
- aggregates `uid` values into `MWS_UID`
- generates a stable feature `UID`
That is what turns an isolated polygon into a joinable CoRE Stack object.
This stage matters because later tables, overlays, and merges need stable identifiers.
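One way to picture the identifier step is the sketch below. Everything here is hypothetical except the field names `MWS_UID` and `UID` from the docs: the actual UID scheme in `swb2.py` may be entirely different, but the key property it needs, stability across reruns, is what the hash illustrates.

```python
import hashlib

# Hypothetical SWB2-style identifier step: aggregate the uid values of the
# intersecting micro-watersheds into MWS_UID, then derive a UID that is
# stable for the same feature on every rerun.
def attach_mws_identifiers(feature: dict, intersecting_mws_uids: list[str]) -> dict:
    mws_uid = ",".join(sorted(intersecting_mws_uids))  # order-independent
    # Hash stable inputs so repeated runs reproduce the same UID.
    digest = hashlib.sha1(f"{mws_uid}|{feature['centroid']}".encode()).hexdigest()[:12]
    return {**feature, "MWS_UID": mws_uid, "UID": f"swb_{digest}"}
```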
### SWB3: Intersect With Water Body Census

`swb3.py` enriches the detected polygons with Water Body Census attributes when a state context is available.
The workflow:
- loads the state-specific WBC feature collection
- buffers census points by `90` meters
- spatially joins them to the `swb2` output
- chooses the closest polygon match by comparing spread area
- copies census attributes onto the detected feature
If a state-level census join is not possible, the workflow can continue without this enrichment.
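The matching rule can be sketched as follows. This is a simplification under stated assumptions: the 90 m buffer and the spread-area comparison come from the docs, but the real join works on GEE polygon geometries, whereas this sketch uses planar centroid distances in metres.

```python
import math

BUFFER_M = 90.0  # census points are buffered by 90 metres (per the docs)

# Simplified SWB3-style match: among detected polygons within the buffer of
# a census point, pick the one whose area is closest to the census-reported
# spread area. Centroid distance stands in for true polygon intersection.
def match_census_point(point_xy, spread_area, polygons):
    """polygons: list of dicts with 'centroid' (x, y in metres) and 'area'."""
    candidates = [
        p for p in polygons
        if math.dist(point_xy, p["centroid"]) <= BUFFER_M
    ]
    if not candidates:
        return None  # the workflow continues without this enrichment
    return min(candidates, key=lambda p: abs(p["area"] - spread_area))
```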
### SWB4: Add Catchment, Stream Order, And Drainage Context

`swb4.py` adds the hydrological context that makes the final layer much more useful.
It:
- prefers `swb3` if it exists, otherwise falls back to `swb2`
- computes maximum stream order and catchment-related properties
- adds an `on_drainage_line` flag
- exports the enriched `swb4` layer back to GEE
This is where the workflow becomes much more than detected water polygons. It becomes a planning-oriented hydrological layer.
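The stream-order and drainage-flag enrichment reduces to a small per-feature computation. A sketch under assumptions: the field names `on_drainage_line` and the maximum-stream-order idea are from the docs, while the data shapes and the `max_stream_order` key are illustrative.

```python
# Illustrative SWB4-style enrichment: given the stream segments that
# intersect a waterbody, record the maximum stream order and flag whether
# the waterbody sits on a drainage line at all.
def enrich_with_drainage(feature: dict, intersecting_streams: list[dict]) -> dict:
    orders = [s["order"] for s in intersecting_streams]
    return {
        **feature,
        "max_stream_order": max(orders) if orders else 0,
        "on_drainage_line": bool(orders),
    }
```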
## Publication And Platform Surfaces

The workflow is published more than once, not just at the very end.

In `swb.py`, `sync_asset_to_db_and_geoserver()` is called after `swb2`, and then again after `swb4`.
That publication logic does three important things:
- save layer information to the database
- make the GEE asset public
- sync the final feature collection to GeoServer
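The three responsibilities can be pictured as one function with three collaborators. The function name matches the docs; everything else here, the parameters and the three callables standing in for the database, GEE, and GeoServer clients, is an assumed shape, not the real signature.

```python
# Sketch of the publication step's three responsibilities, with stand-in
# callables for the real database, GEE, and GeoServer clients.
def sync_asset_to_db_and_geoserver(asset_id, layer_name,
                                   save_to_db, make_public, push_to_geoserver):
    save_to_db(asset_id, layer_name)         # 1. save layer information to the database
    make_public(asset_id)                    # 2. make the GEE asset public
    push_to_geoserver(asset_id, layer_name)  # 3. sync the feature collection to GeoServer
```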
The published layer name follows a consistent, predictable shape.
So this workflow is a good example of how CoRE Stack promotes an intermediate computation into a discoverable platform layer.
## Ponds, Wells, And The Merge Step

The SWB family is bigger than only `generate_swb/`.

Two adjacent routes call separate local compute surfaces:

- `generate_ponds/`
- `generate_wells/`

Then `merge_swb_ponds/` combines SWB and ponds outputs into one merged layer.
The merge step in `merge_swb_ponds.py` is different from the earlier GEE-heavy stages:

- it pulls features into GeoPandas
- falls back to GCS export when the `getInfo()` result is too large
- clips everything to the dissolved MWS outer boundary
- preserves standalone SWBs
- preserves standalone ponds
- merges intersecting SWB and pond geometries into one result
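The preserve-standalone / merge-intersecting logic can be shown without GeoPandas. This is a pure-Python sketch under loud assumptions: the real merge uses true polygon geometry, while here a bounding-box overlap test stands in for intersection, and the feature shape is invented for illustration.

```python
# Pure-Python sketch of the merge classification: standalone SWBs and
# standalone ponds are preserved, while intersecting SWB/pond pairs merge.
def bbox_intersects(a, b):
    """a, b: (minx, miny, maxx, maxy) bounding boxes."""
    return not (a[2] < b[0] or b[2] < a[0] or a[3] < b[1] or b[3] < a[1])

def merge_swb_ponds(swbs, ponds):
    """Each feature: {'id': ..., 'bbox': (minx, miny, maxx, maxy)}."""
    merged, used_ponds = [], set()
    for swb in swbs:
        hits = [p["id"] for p in ponds if bbox_intersects(swb["bbox"], p["bbox"])]
        used_ponds.update(hits)
        # Either a merged SWB+pond feature or a standalone SWB.
        merged.append({"id": swb["id"], "merged_with": hits})
    # Standalone ponds survive untouched.
    merged += [{"id": p["id"], "merged_with": []} for p in ponds if p["id"] not in used_ponds]
    return merged
```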
So the SWB family is a useful example of a workflow that crosses multiple execution styles:
- Earth Engine vector processing
- Celery orchestration
- GeoPandas merge logic
- GeoServer publication
## How To Trace This Workflow In Code

If you are debugging or extending this workflow, follow this order:

1. start in `computing/urls.py` at `generate_swb/`
2. inspect `generate_swb()` in `computing/api.py`
3. move into `generate_swb_layer()` in `swb.py`
4. follow the stages in order: `swb1.py`, `swb2.py`, `swb3.py`, `swb4.py`
5. inspect `sync_asset_to_db_and_geoserver()` to understand publication
6. inspect `merge_swb_ponds.py` if your real question is about the merged water layer rather than raw SWB output
This is one of the clearest "read the route, then the task, then the stages, then the publication layer" workflows in the stack.
## What Builders Should Learn From It
This workflow is a strong template when you need a new pipeline that:
- begins from one analytical source layer
- becomes a vector feature collection
- gains identifiers in one stage
- gains external enrichment in another stage
- gains hydrological context in another stage
- gets published into delivery surfaces as a reusable layer
That is why it is more useful to study than a one-step clipping workflow when you want to understand real CoRE Stack pipeline architecture.
## Good Next Reads
- How They Work Programmatically if you want the wider request-to-task pattern around workflows like this.
- Pipeline Integrations if you want the publication and dataset-surface view.
- Drainage Lines if you want to understand one of the hydrological layers reused inside SWB4.
- Build New Pipelines if you want to use this staged design as a template for a new workflow.