
Surface Water Body Detection

Surface Water Body Detection is a staged vector workflow that starts from LULC-derived water pixels and turns them into reusable, published waterbody layers for the stack.

This is not a single raster operation. It is a multi-step pipeline that:

  • derives waterbody polygons from LULC outputs
  • attaches stable micro-watershed identifiers
  • enriches the features with census and hydrological context
  • publishes the result into the platform's metadata and GeoServer surfaces

Why This Workflow Matters

This is one of the clearest examples of a real CoRE Stack core workflow because it combines:

  • a scientific source layer from LULC
  • staged vector enrichment
  • hydrological joins
  • publication into platform delivery surfaces

If you want to understand how CoRE Stack turns one analytical signal into a durable platform layer, this workflow is one of the best places to look.


Main Route Surface

The primary route files expose several related SWB routes in the current computing surface:

  • /api/v1/generate_swb/: run the main staged SWB workflow
  • /api/v1/generate_ponds/: compute pond layers through the local compute surface
  • /api/v1/generate_wells/: compute well layers through the local compute surface
  • /api/v1/merge_swb_ponds/: merge SWB and ponds outputs into a combined layer

The main handler shape is simple: parse the request, collect state, district, block, start_year, end_year, and gee_account_id, then hand work off to the Celery task in generate_swb_layer.

Shape of generate_swb() in computing/api.py
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def generate_swb(request):
    state = request.data.get("state")
    district = request.data.get("district")
    block = request.data.get("block")
    start_year = request.data.get("start_year")
    end_year = request.data.get("end_year")
    gee_account_id = request.data.get("gee_account_id")
    # Hand the collected parameters off to the Celery task.
    generate_swb_layer.apply_async(
        kwargs={...},
        queue="nrm",
    )
    return Response(...)  # the view returns an acknowledgement response

Core Code Surfaces

The most important backend files for this workflow are swb.py and the stage modules swb1.py through swb4.py.

The orchestration entry point is generate_swb_layer() in swb.py.
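The staged hand-off that generate_swb_layer() performs can be sketched as a generic runner. Everything here is illustrative: the stage callables, their signatures, and the publish hook are assumptions, not the actual swb.py API.

```python
def run_staged_pipeline(context, stages, publish, publish_after):
    """Run named stages in order, publishing after chosen checkpoints.

    Illustrative sketch of the generate_swb_layer() shape; the real
    stages are the swb1..swb4 modules, not plain callables like these.
    """
    asset = None
    for name, stage in stages:
        result = stage(context, asset)
        # A stage may be skipped (e.g. no census data for this state);
        # in that case the previous asset is carried forward.
        if result is not None:
            asset = result
        if name in publish_after:
            publish(asset)
    return asset


if __name__ == "__main__":
    published = []
    stages = [
        ("swb1", lambda ctx, a: "vectorized"),
        ("swb2", lambda ctx, a: a + "+uid"),
        ("swb3", lambda ctx, a: None),  # census join skipped
        ("swb4", lambda ctx, a: a + "+hydro"),
    ]
    final = run_staged_pipeline(
        {}, stages, published.append, publish_after={"swb2", "swb4"}
    )
    print(final)      # the fully enriched asset
    print(published)  # two publication checkpoints, as in swb.py
```

Note how the skipped stage (swb3 returning None) does not break the chain, mirroring how the real workflow continues when a state-level census join is not possible.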


Workflow Shape

flowchart LR
    A[LULC assets for ROI and year range] --> B[SWB1: vectorize water pixels]
    B --> C[SWB2: intersect with MWS and assign UID]
    C --> D[First publication sync]
    C --> E[SWB3: intersect with Water Body Census]
    E --> F[SWB4: catchment, stream order, drainage flag]
    C --> F
    F --> G[Final publication sync]
    G --> H[GeoServer layer and DB metadata]
    G --> I[Optional follow-on merge with ponds]

The important thing to notice is that the workflow is staged. It does not try to do everything in one function.


Inputs And Execution Context

The main task accepts:

  • state
  • district
  • block
  • start_year
  • end_year
  • gee_account_id

When state, district, and block are provided, the workflow builds an ROI from the filtered micro-watershed asset for that place.

The time window is converted into:

  • start_date = {start_year}-07-01
  • end_date = {end_year}-06-30

That means the workflow is reasoning across seasonal LULC periods rather than a simple January-to-December window.
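Under that rule, the window strings can be derived directly from the two years:

```python
def hydrological_window(start_year: int, end_year: int) -> tuple:
    """Build the July-to-June window the workflow reasons over."""
    return f"{start_year}-07-01", f"{end_year}-06-30"


start_date, end_date = hydrological_window(2017, 2023)
# start_date == "2017-07-01", end_date == "2023-06-30"
```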


Stage Breakdown

SWB1: Vectorize Water Pixels

swb1.py is where the workflow first turns LULC into waterbody candidates.

The key logic is:

  • load each yearly LULC image for the date range
  • mark LULC classes 2 to 4 as water-related presence
  • combine the yearly masks with an OR operation
  • convert the result into polygons with reduceToVectors
Core idea from swb1.py
lulc_image = ee.Image(...)
lulc_water_pixel_collec.append(lulc_image.gte(2).And(lulc_image.lte(4)))
ored = lulc_water_pixel_collec[0].Or(...)
vector_polygons = multi_band_image.reduceToVectors(...)
water_bodies = vector_polygons.filter(ee.Filter.eq("water", 1))

This stage also computes useful feature-level metrics such as:

  • total detected water area
  • seasonal water presence percentages
  • optional class percentages when is_all_classes=True

So SWB1 is not just detection. It is already building the first analytical summary around each polygon.

SWB2: Intersect With Micro-Watersheds

swb2.py attaches the waterbody polygons to the stack's main hydrological registry.

This stage:

  • loads swb1
  • intersects each waterbody with the ROI micro-watersheds
  • aggregates uid values into MWS_UID
  • generates a stable feature UID

That is what turns an isolated polygon into a joinable CoRE Stack object.

This stage matters because later tables, overlays, and merges need stable identifiers.
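The actual UID scheme used by swb2.py is not shown here, but a deterministic generator in this spirit could look like the following sketch. The hashing scheme, the centroid rounding, and the field shapes are all assumptions made for illustration.

```python
import hashlib


def make_feature_uid(mws_uids, centroid):
    """Derive a stable feature UID from the aggregated MWS uids and a
    rounded centroid, so re-runs over the same data yield the same id.

    Illustrative scheme; the real generator in swb2.py may differ.
    """
    # Aggregate intersecting micro-watershed uids into MWS_UID.
    mws_uid = "_".join(sorted(mws_uids))
    # Round the centroid so tiny floating-point jitter does not
    # change the identifier between runs.
    key = f"{mws_uid}|{round(centroid[0], 5)}|{round(centroid[1], 5)}"
    return mws_uid, hashlib.sha1(key.encode()).hexdigest()[:12]
```

The point of the sketch is the property, not the recipe: the same polygon over the same micro-watersheds must always map to the same identifier, or later joins and merges silently break.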

SWB3: Intersect With Water Body Census

swb3.py enriches the detected polygons with Water Body Census attributes when a state context is available.

The workflow:

  • loads the state-specific WBC feature collection
  • buffers census points by 90 meters
  • spatially joins them to swb2
  • chooses the closest polygon match by comparing spread area
  • copies census attributes onto the detected feature

If a state-level census join is not possible, the workflow can continue without this enrichment.
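The tie-break described above, choosing the closest match by spread area, can be sketched as a plain function. The record shape and the spread_area field name are assumptions, not the real WBC schema.

```python
def closest_census_match(feature_area, candidates):
    """Pick the census record whose reported spread area is closest
    to the detected polygon's area.

    Hypothetical stand-in for the spatial-join tie-break in swb3.py;
    candidates would be the census points whose 90 m buffers hit
    the polygon.
    """
    if not candidates:
        # No census context available; the workflow continues anyway.
        return None
    return min(candidates,
               key=lambda c: abs(c["spread_area"] - feature_area))
```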

SWB4: Add Catchment, Stream Order, And Drainage Context

swb4.py adds the hydrological context that makes the final layer much more useful.

It:

  • prefers swb3 if it exists, otherwise falls back to swb2
  • computes maximum stream order and catchment-related properties
  • adds an on_drainage_line flag
  • exports the enriched swb4 layer back to GEE

This is where the workflow becomes much more than detected water polygons. It becomes a planning-oriented hydrological layer.
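A minimal sketch of how those two properties could be summarized from intersecting stream segments, assuming a simple order field on each segment. The real swb4.py works on GEE feature collections, not plain dicts.

```python
def hydrological_context(intersecting_streams):
    """Summarize stream segments intersecting a waterbody into the
    two properties swb4 attaches (illustrative record shapes)."""
    orders = [s["order"] for s in intersecting_streams]
    return {
        "max_stream_order": max(orders) if orders else 0,
        # A waterbody that touches any stream segment sits on a
        # drainage line.
        "on_drainage_line": bool(orders),
    }
```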


Publication And Platform Surfaces

The workflow publishes its output more than once, not just at the very end.

In swb.py, sync_asset_to_db_and_geoserver() is called after swb2, and then again after swb4.

That publication logic does three important things:

  • save layer information to the database
  • make the GEE asset public
  • sync the final feature collection to GeoServer

The published layer name is shaped like:

surface_waterbodies_<asset_suffix>

So this workflow is a good example of how CoRE Stack promotes an intermediate computation into a discoverable platform layer.
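The three publication steps and the layer-name shape can be sketched as one function. The helper callables here are hypothetical stand-ins for the real database, GEE, and GeoServer helpers behind sync_asset_to_db_and_geoserver().

```python
def sync_asset_to_platform(asset_suffix, save_to_db, make_public,
                           push_to_geoserver):
    """Sketch of the publication sequence; the three callables are
    illustrative stand-ins, not the real helper APIs."""
    layer = f"surface_waterbodies_{asset_suffix}"
    save_to_db(layer)         # save layer information to the database
    make_public(layer)        # make the GEE asset public
    push_to_geoserver(layer)  # sync the feature collection to GeoServer
    return layer
```

Because the same sequence runs after swb2 and again after swb4, intermediate results become discoverable before the full enrichment finishes.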


Ponds, Wells, And The Merge Step

The SWB family is bigger than generate_swb/ alone.

Two adjacent routes call separate local compute surfaces:

  • generate_ponds/
  • generate_wells/

Then merge_swb_ponds/ combines SWB and ponds outputs into one merged layer.

The merge step in merge_swb_ponds.py is different from the earlier GEE-heavy stages:

  • pulls features into GeoPandas
  • falls back to a GCS export when the getInfo() payload is too large
  • clips everything to the dissolved MWS outer boundary
  • preserves standalone SWBs
  • preserves standalone ponds
  • merges intersecting SWB and pond geometries into one result
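The merge decision logic above can be sketched in plain Python, using axis-aligned bounding boxes as a stand-in for the real GeoPandas geometries; the record shapes and return format are illustrative only.

```python
def boxes_intersect(a, b):
    """Axis-aligned bounding boxes as (minx, miny, maxx, maxy)."""
    return (a[0] <= b[2] and b[0] <= a[2]
            and a[1] <= b[3] and b[1] <= a[3])


def merge_swb_and_ponds(swbs, ponds):
    """Sketch of the merge rules: intersecting SWB/pond pairs become
    one merged record, everything else is preserved standalone."""
    merged, used_ponds = [], set()
    for i, swb in enumerate(swbs):
        partners = [j for j, p in enumerate(ponds)
                    if boxes_intersect(swb, p)]
        if partners:
            used_ponds.update(partners)
            merged.append(("merged", i, tuple(partners)))
        else:
            merged.append(("swb", i))      # standalone SWB preserved
    merged.extend(("pond", j) for j in range(len(ponds))
                  if j not in used_ponds)  # standalone ponds preserved
    return merged
```

In the real merge_swb_ponds.py the intersection test and union run on geometries clipped to the dissolved MWS boundary; the sketch only shows which records survive and which pairs fuse.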

So the SWB family is a useful example of a workflow that crosses multiple execution styles:

  • Earth Engine vector processing
  • Celery orchestration
  • GeoPandas merge logic
  • GeoServer publication

How To Trace This Workflow In Code

If you are debugging or extending this workflow, follow this order:

  1. start in computing/urls.py at generate_swb/
  2. inspect generate_swb() in computing/api.py
  3. move into generate_swb_layer() in swb.py
  4. follow the stages in order: swb1.py, swb2.py, swb3.py, swb4.py
  5. inspect sync_asset_to_db_and_geoserver() to understand publication
  6. inspect merge_swb_ponds.py if your real question is about the merged water layer rather than raw SWB output

This is one of the clearest "read the route, then the task, then the stages, then the publication layer" workflows in the stack.


What Builders Should Learn From It

This workflow is a strong template when you need a new pipeline that:

  • begins from one analytical source layer
  • becomes a vector feature collection
  • gains identifiers in one stage
  • gains external enrichment in another stage
  • gains hydrological context in another stage
  • gets published into delivery surfaces as a reusable layer

That is why it is more useful to study than a one-step clipping workflow when you want to understand real CoRE Stack pipeline architecture.


Good Next Reads