
Surface Water Body Detection

Surface Water Body Detection is a staged vector workflow that starts from LULC-derived water pixels and turns them into reusable, published waterbody layers for the stack.

This is not a single raster operation. It is a multi-step pipeline that:

  • derives waterbody polygons from LULC outputs
  • attaches stable micro-watershed identifiers
  • enriches the features with census and hydrological context
  • publishes the result into the platform's metadata and GeoServer surfaces

Why This Workflow Matters

This is one of the clearest examples of a real CoRE Stack core workflow because it combines:

  • a scientific source layer from LULC
  • staged vector enrichment
  • hydrological joins
  • publication into platform delivery surfaces

If you want to understand how CoRE Stack turns one analytical signal into a durable platform layer, this workflow is one of the best places to look.


Main Route Surface

The primary route files expose several related SWB routes in the current computing surface:

  • /api/v1/generate_swb/: run the main staged SWB workflow
  • /api/v1/generate_ponds/: compute pond layers through the local compute surface
  • /api/v1/generate_wells/: compute well layers through the local compute surface
  • /api/v1/merge_swb_ponds/: merge SWB and ponds outputs into a combined layer

The main handler shape is simple: parse the request, collect state, district, block, start_year, end_year, and gee_account_id, then hand work off to the Celery task in generate_swb_layer.

Shape of generate_swb() in computing/api.py
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(["POST"])
def generate_swb(request):
    state = request.data.get("state")
    district = request.data.get("district")
    block = request.data.get("block")
    start_year = request.data.get("start_year")
    end_year = request.data.get("end_year")
    gee_account_id = request.data.get("gee_account_id")
    # Hand the collected parameters off to the Celery task.
    generate_swb_layer.apply_async(
        kwargs={...},
        queue="nrm",
    )
    return Response(...)  # the view returns an acknowledgement response

Core Code Surfaces

The most important backend files for this workflow are swb.py and the stage modules swb1.py through swb4.py.

The orchestration entry point is generate_swb_layer() in swb.py.
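The staged hand-off that generate_swb_layer() performs can be sketched as a generic runner. Everything here is illustrative: the stage callables, their signatures, and the publish hook are assumptions, not the actual swb.py API.

```python
def run_staged_pipeline(context, stages, publish, publish_after):
    """Run named stages in order, publishing after chosen checkpoints.

    Illustrative sketch of the generate_swb_layer() shape; the real
    stages are the swb1..swb4 modules, not plain callables like these.
    """
    asset = None
    for name, stage in stages:
        result = stage(context, asset)
        # A stage may be skipped (e.g. no census data for this state);
        # in that case the previous asset is carried forward.
        if result is not None:
            asset = result
        if name in publish_after:
            publish(asset)
    return asset


if __name__ == "__main__":
    published = []
    stages = [
        ("swb1", lambda ctx, a: "vectorized"),
        ("swb2", lambda ctx, a: a + "+uid"),
        ("swb3", lambda ctx, a: None),  # census join skipped
        ("swb4", lambda ctx, a: a + "+hydro"),
    ]
    final = run_staged_pipeline(
        {}, stages, published.append, publish_after={"swb2", "swb4"}
    )
    print(final)      # the fully enriched asset
    print(published)  # two publication checkpoints, as in swb.py
```

Note how the skipped stage (swb3 returning None) does not break the chain, mirroring how the real workflow continues when a state-level census join is not possible.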


Workflow Shape

flowchart LR
    A[LULC assets for ROI and year range] --> B[SWB1: vectorize water pixels]
    B --> C[SWB2: intersect with MWS and assign UID]
    C --> D[First publication sync]
    C --> E[SWB3: intersect with Water Body Census]
    E --> F[SWB4: catchment, stream order, drainage flag]
    C --> F
    F --> G[Final publication sync]
    G --> H[GeoServer layer and DB metadata]
    G --> I[Optional follow-on merge with ponds]

The important thing to notice is that the workflow is staged. It does not try to do everything in one function.


Inputs And Execution Context

The main task accepts:

  • state
  • district
  • block
  • start_year
  • end_year
  • gee_account_id

When state, district, and block are provided, the workflow builds an ROI from the filtered micro-watershed asset for that place.

The time window is converted into:

  • start_date = {start_year}-07-01
  • end_date = {end_year}-06-30

That means the workflow is reasoning across seasonal LULC periods rather than a simple January-to-December window.
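Under that rule, the window strings can be derived directly from the two years:

```python
def hydrological_window(start_year: int, end_year: int) -> tuple:
    """Build the July-to-June window the workflow reasons over."""
    return f"{start_year}-07-01", f"{end_year}-06-30"


start_date, end_date = hydrological_window(2017, 2023)
# start_date == "2017-07-01", end_date == "2023-06-30"
```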


Stage Breakdown

SWB1: Vectorize Water Pixels

swb1.py is where the workflow first turns LULC into waterbody candidates.

The key logic is:

  • load each yearly LULC image for the date range
  • mark LULC classes 2 to 4 as water-related presence
  • combine the yearly masks with an OR operation
  • convert the result into polygons with reduceToVectors
Core idea from swb1.py
lulc_image = ee.Image(...)
lulc_water_pixel_collec.append(lulc_image.gte(2).And(lulc_image.lte(4)))
ored = lulc_water_pixel_collec[0].Or(...)
vector_polygons = multi_band_image.reduceToVectors(...)
water_bodies = vector_polygons.filter(ee.Filter.eq("water", 1))

This stage also computes useful feature-level metrics such as:

  • total detected water area
  • seasonal water presence percentages
  • optional class percentages when is_all_classes=True

So SWB1 is not just detection. It is already building the first analytical summary around each polygon.

SWB2: Intersect With Micro-Watersheds

swb2.py attaches the waterbody polygons to the stack's main hydrological registry.

This stage:

  • loads swb1
  • intersects each waterbody with the ROI micro-watersheds
  • aggregates uid values into MWS_UID
  • generates a stable feature UID

That is what turns an isolated polygon into a joinable CoRE Stack object.

This stage matters because later tables, overlays, and merges need stable identifiers.
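The actual UID scheme used by swb2.py is not shown here, but a deterministic generator in this spirit could look like the following sketch. The hashing scheme, the centroid rounding, and the field shapes are all assumptions made for illustration.

```python
import hashlib


def make_feature_uid(mws_uids, centroid):
    """Derive a stable feature UID from the aggregated MWS uids and a
    rounded centroid, so re-runs over the same data yield the same id.

    Illustrative scheme; the real generator in swb2.py may differ.
    """
    # Aggregate intersecting micro-watershed uids into MWS_UID.
    mws_uid = "_".join(sorted(mws_uids))
    # Round the centroid so tiny floating-point jitter does not
    # change the identifier between runs.
    key = f"{mws_uid}|{round(centroid[0], 5)}|{round(centroid[1], 5)}"
    return mws_uid, hashlib.sha1(key.encode()).hexdigest()[:12]
```

The point of the sketch is the property, not the recipe: the same polygon over the same micro-watersheds must always map to the same identifier, or later joins and merges silently break.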

SWB3: Intersect With Water Body Census

swb3.py enriches the detected polygons with Water Body Census attributes when a state context is available.

The workflow:

  • loads the state-specific WBC feature collection
  • buffers census points by 90 meters
  • spatially joins them to swb2
  • chooses the closest polygon match by comparing spread area
  • copies census attributes onto the detected feature

If a state-level census join is not possible, the workflow can continue without this enrichment.
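The tie-break described above, choosing the closest match by spread area, can be sketched as a plain function. The record shape and the spread_area field name are assumptions, not the real WBC schema.

```python
def closest_census_match(feature_area, candidates):
    """Pick the census record whose reported spread area is closest
    to the detected polygon's area.

    Hypothetical stand-in for the spatial-join tie-break in swb3.py;
    candidates would be the census points whose 90 m buffers hit
    the polygon.
    """
    if not candidates:
        # No census context available; the workflow continues anyway.
        return None
    return min(candidates,
               key=lambda c: abs(c["spread_area"] - feature_area))
```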

SWB4: Add Catchment, Stream Order, And Drainage Context

swb4.py adds the hydrological context that makes the final layer much more useful.

It:

  • prefers swb3 if it exists, otherwise falls back to swb2
  • computes maximum stream order and catchment-related properties
  • adds an on_drainage_line flag
  • exports the enriched swb4 layer back to GEE

This is where the workflow becomes much more than detected water polygons. It becomes a planning-oriented hydrological layer.
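A minimal sketch of how those two properties could be summarized from intersecting stream segments, assuming a simple order field on each segment. The real swb4.py works on GEE feature collections, not plain dicts.

```python
def hydrological_context(intersecting_streams):
    """Summarize stream segments intersecting a waterbody into the
    two properties swb4 attaches (illustrative record shapes)."""
    orders = [s["order"] for s in intersecting_streams]
    return {
        "max_stream_order": max(orders) if orders else 0,
        # A waterbody that touches any stream segment sits on a
        # drainage line.
        "on_drainage_line": bool(orders),
    }
```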


Publication And Platform Surfaces

The workflow publishes its output more than once, not just at the very end.

In swb.py, sync_asset_to_db_and_geoserver() is called after swb2, and then again after swb4.

That publication logic does three important things:

  • save layer information to the database
  • make the GEE asset public
  • sync the final feature collection to GeoServer

The published layer name is shaped like:

surface_waterbodies_<asset_suffix>

So this workflow is a good example of how CoRE Stack promotes an intermediate computation into a discoverable platform layer.
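The three publication steps and the layer-name shape can be sketched as one function. The helper callables here are hypothetical stand-ins for the real database, GEE, and GeoServer helpers behind sync_asset_to_db_and_geoserver().

```python
def sync_asset_to_platform(asset_suffix, save_to_db, make_public,
                           push_to_geoserver):
    """Sketch of the publication sequence; the three callables are
    illustrative stand-ins, not the real helper APIs."""
    layer = f"surface_waterbodies_{asset_suffix}"
    save_to_db(layer)         # save layer information to the database
    make_public(layer)        # make the GEE asset public
    push_to_geoserver(layer)  # sync the feature collection to GeoServer
    return layer
```

Because the same sequence runs after swb2 and again after swb4, intermediate results become discoverable before the full enrichment finishes.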


Ponds, Wells, And The Merge Step

The SWB family is bigger than generate_swb/ alone.

Two adjacent routes call separate local compute surfaces:

  • generate_ponds/
  • generate_wells/

Then merge_swb_ponds/ combines SWB and ponds outputs into one merged layer.

The merge step in merge_swb_ponds.py is different from the earlier GEE-heavy stages:

  • pulls features into GeoPandas
  • falls back to a GCS export when the getInfo() payload is too large
  • clips everything to the dissolved MWS outer boundary
  • preserves standalone SWBs
  • preserves standalone ponds
  • merges intersecting SWB and pond geometries into one result
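The merge decision logic above can be sketched in plain Python, using axis-aligned bounding boxes as a stand-in for the real GeoPandas geometries; the record shapes and return format are illustrative only.

```python
def boxes_intersect(a, b):
    """Axis-aligned bounding boxes as (minx, miny, maxx, maxy)."""
    return (a[0] <= b[2] and b[0] <= a[2]
            and a[1] <= b[3] and b[1] <= a[3])


def merge_swb_and_ponds(swbs, ponds):
    """Sketch of the merge rules: intersecting SWB/pond pairs become
    one merged record, everything else is preserved standalone."""
    merged, used_ponds = [], set()
    for i, swb in enumerate(swbs):
        partners = [j for j, p in enumerate(ponds)
                    if boxes_intersect(swb, p)]
        if partners:
            used_ponds.update(partners)
            merged.append(("merged", i, tuple(partners)))
        else:
            merged.append(("swb", i))      # standalone SWB preserved
    merged.extend(("pond", j) for j in range(len(ponds))
                  if j not in used_ponds)  # standalone ponds preserved
    return merged
```

In the real merge_swb_ponds.py the intersection test and union run on geometries clipped to the dissolved MWS boundary; the sketch only shows which records survive and which pairs fuse.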

So the SWB family is a useful example of a workflow that crosses multiple execution styles:

  • Earth Engine vector processing
  • Celery orchestration
  • GeoPandas merge logic
  • GeoServer publication

How To Trace This Workflow In Code

If you are debugging or extending this workflow, follow this order:

  1. start in computing/urls.py at generate_swb/
  2. inspect generate_swb() in computing/api.py
  3. move into generate_swb_layer() in swb.py
  4. follow the stages in order: swb1.py, swb2.py, swb3.py, swb4.py
  5. inspect sync_asset_to_db_and_geoserver() to understand publication
  6. inspect merge_swb_ponds.py if your real question is about the merged water layer rather than raw SWB output

This is one of the clearest "read the route, then the task, then the stages, then the publication layer" workflows in the stack.


What Builders Should Learn From It

This workflow is a strong template when you need a new pipeline that:

  • begins from one analytical source layer
  • becomes a vector feature collection
  • gains identifiers in one stage
  • gains external enrichment in another stage
  • gains hydrological context in another stage
  • gets published into delivery surfaces as a reusable layer

That is why it is more useful to study than a one-step clipping workflow when you want to understand real CoRE Stack pipeline architecture.


Good Next Reads