How Our Pipelines Work Algorithmically¶

This page explains the scientific and analytical logic behind the pipeline families.

It answers:

what question does each workflow family answer?
why did CoRE Stack choose to compute these things this way?
what are the key inputs and modeling moves?
what kind of outputs come out?
where can future contributors improve the science or extend the stack?

The Core Pattern¶

Most CoRE Stack pipelines follow the same conceptual shape:

flowchart LR
    A[Source rasters, time series, boundaries, or networks] --> B[Scientific or analytical transformation]
    B --> C[Vectorization or aggregation onto standard units]
    C --> D[Joinable outputs keyed to watersheds, villages, or other entities]
    D --> E[APIs, STAC, dashboards, and reports]

This matters because CoRE Stack is not trying to answer one question once. It is trying to build reusable public infrastructure.

So the stack favors:

standard units
repeatable computations
outputs that are interpretable outside the original notebook or script
data that can be recombined later

Why CoRE Stack Computes In This Shape¶

1. Water planning needs hydrological logic¶

Administrative names are useful for people, but water moves through catchments, drainage lines, and watershed networks.

That is why CoRE Stack starts from hydrological units and then crosswalks back to administrative ones.
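The crosswalk step can be pictured as a weighted transfer from watersheds to administrative units. The following is a minimal sketch, assuming overlap areas between watersheds and villages have already been computed elsewhere. The function name, the identifiers, and the choice of area weighting are illustrative assumptions, not the stack's documented method.

```python
def area_weighted_crosswalk(watershed_values, overlaps):
    """Transfer per-watershed values onto villages using overlap-area weights.

    `overlaps` is a list of (watershed_id, village_id, overlap_area) tuples.
    All identifiers here are hypothetical.
    """
    weighted_sums = {}
    total_areas = {}
    for watershed_id, village_id, area in overlaps:
        if watershed_id not in watershed_values or area <= 0:
            continue  # skip watersheds without a value and empty overlaps
        weighted_sums[village_id] = (
            weighted_sums.get(village_id, 0.0) + watershed_values[watershed_id] * area
        )
        total_areas[village_id] = total_areas.get(village_id, 0.0) + area
    return {v: weighted_sums[v] / total_areas[v] for v in weighted_sums}


# Hypothetical runoff values (mm) per micro-watershed and overlap areas (sq km).
runoff_mm = {"mws_1": 100.0, "mws_2": 200.0}
overlaps = [
    ("mws_1", "village_a", 3.0),
    ("mws_2", "village_a", 1.0),
    ("mws_2", "village_b", 2.0),
]
village_runoff = area_weighted_crosswalk(runoff_mm, overlaps)
```

Here `village_a` is covered mostly by `mws_1`, so its value sits closer to that watershed's value, while `village_b` inherits the value of `mws_2` alone.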

2. Rasters are powerful but hard to use directly¶

Many foundational inputs are raster-like:

elevation
rainfall
satellite observations
land-cover predictions
derived terrain surfaces

But most downstream users want:

tables
vector layers
ranked units
reports

So CoRE Stack repeatedly turns raster evidence into vectorized, indexed, joinable outputs.
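The raster-to-unit step described above is, at its core, a zonal summary. The sketch below assumes a tiny in-memory rainfall grid and a matching grid of unit identifiers, both hypothetical. A real pipeline would read rasters and polygon geometries rather than nested lists.

```python
from collections import defaultdict


def zonal_mean(values, zones):
    """Average raster cell values per zone id.

    `values` and `zones` are equally shaped 2D lists. Cells whose value
    or zone is None are treated as no-data and skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None or zone is None:
                continue
            sums[zone] += value
            counts[zone] += 1
    return {zone: sums[zone] / counts[zone] for zone in sums}


# Hypothetical rainfall grid (mm) and the unit each cell falls in.
rainfall_mm = [
    [10.0, 12.0, 30.0],
    [14.0, None, 34.0],
]
unit_ids = [
    ["mws_1", "mws_1", "mws_2"],
    ["mws_1", "mws_1", "mws_2"],
]
unit_means = zonal_mean(rainfall_mm, unit_ids)
```

The result is a small table keyed by unit id, which is the joinable shape that downstream users ask for.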

3. Public infrastructure should expose reusable building blocks¶

The stack does not try to centrally publish an answer to every possible policy question.

Instead it publishes canonical layers and indicators that others can recombine for:

river rejuvenation
restoration planning
drought analysis
irrigation and cropping questions
community-facing tools

Scientific Ideas Already Embedded In The Stack¶

Idea	How it shows up in CoRE Stack	Why it matters
watershed hydrology	micro-watershed registry, catchments, drainage, runoff-oriented reasoning	water moves through connected landscapes
land-surface classification	LULC workflows and change layers	many planning questions begin with what is on the land now and how it changed
water balance reasoning	rainfall, evapotranspiration, runoff, and related hydrology layers	gives a first planning lens on water availability and stress
zonal statistics	raster signals summarized onto micro-watersheds (MWS), villages, or other units	makes outputs comparable and joinable
network analysis	upstream and downstream watershed connectivity	important for river and catchment-scale interventions
spatial enrichment	admin overlays, aquifer joins, facilities proximity, asset layers	connects hydrological analysis to implementation reality
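The network analysis row in the table above comes down to traversing a drainage graph. This is a minimal sketch, assuming each unit records the single unit it drains into. The mapping and the unit names are hypothetical.

```python
from collections import defaultdict, deque


def upstream_of(target, downstream_of):
    """Return every unit that eventually drains into `target`.

    `downstream_of` maps a unit id to its immediate downstream unit,
    or to None for an outlet.
    """
    contributors = defaultdict(list)
    for unit, downstream in downstream_of.items():
        if downstream is not None:
            contributors[downstream].append(unit)

    found = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for unit in contributors.get(current, []):
            if unit not in found:
                found.add(unit)
                queue.append(unit)
    return found


# Hypothetical connectivity: a and b drain to c; c and e drain to d (the outlet).
links = {"a": "c", "b": "c", "c": "d", "e": "d", "d": None}
upstream_units = upstream_of("d", links)
```

The same reverse-adjacency structure also answers the downstream question by following `downstream_of` forward from a unit until it reaches None.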

The Four Repeated Questions¶

Most CoRE Stack computations are trying to answer one of four broad questions:

What is on the land or water surface now or over time?
How does water move, accumulate, or become available across a landscape?
Which administrative or ecological boundaries matter for planning?
Which places deserve attention, restoration, protection, or follow-up action?

Read By Workflow Family¶

The workflow families below fall into four groups: Core Workflows, Boundary and Enrichment, Raster and Drainage, and Time Series and Ops.

LULC generation

Inputs: satellite imagery, time windows, region of interest
Logic: classify land into stable categories over a place and period, then use those classes to support later derived indicators such as cropping intensity and change
Outputs: land-use rasters, class summaries, downstream planning layers
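Once land is classified for two periods, change can be summarized by counting class transitions. The sketch below assumes two small classified grids with hypothetical class labels and years. It illustrates the idea of a change layer, not the stack's classifier.

```python
from collections import Counter


def class_transitions(before, after):
    """Count (class_before, class_after) pairs across two aligned grids."""
    counts = Counter()
    for row_before, row_after in zip(before, after):
        for cls_before, cls_after in zip(row_before, row_after):
            counts[(cls_before, cls_after)] += 1
    return counts


# Hypothetical classified grids for two periods over the same cells.
lulc_earlier = [["crop", "crop"], ["forest", "water"]]
lulc_later = [["crop", "built"], ["forest", "water"]]

transitions = class_transitions(lulc_earlier, lulc_later)
changed_cells = sum(n for (b, a), n in transitions.items() if b != a)
```

Per-class areas follow from the same counts once each cell's ground area is known.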

Surface water body detection

Inputs: imagery and water-sensitive signals over time
Logic: identify waterbody extent and track changes or supporting measures, then attach those measurements back to stable hydrological units
Outputs: vector waterbodies, area summaries, waterbody analytics
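One common water-sensitive signal is the Normalized Difference Water Index (NDWI), computed from green and near-infrared reflectance. The sketch below assumes NDWI with a zero threshold. The page does not say which index or threshold the stack uses, so both are illustrative assumptions, as are the sample reflectance values.

```python
def ndwi(green, nir):
    """Normalized Difference Water Index; None when the denominator is zero."""
    total = green + nir
    if total == 0:
        return None
    return (green - nir) / total


def water_frequency(observations, threshold=0.0):
    """Fraction of valid observations in which NDWI exceeds the threshold."""
    flags = []
    for green, nir in observations:
        index = ndwi(green, nir)
        if index is not None:
            flags.append(index > threshold)
    return sum(flags) / len(flags) if flags else None


# Hypothetical (green, nir) reflectance for one pixel on four dates.
pixel_history = [(0.30, 0.10), (0.28, 0.12), (0.10, 0.30), (0.12, 0.35)]
frequency = water_frequency(pixel_history)
```

A per-pixel frequency like this separates seasonal from perennial water before the extent is vectorized and attached to hydrological units.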

Hydrology

Inputs: rainfall, terrain, watershed boundaries, derived coefficients
Logic: estimate runoff, recharge, evapotranspiration, and related watershed behavior so that water availability can be reasoned about at watershed scale
Outputs: MWS-level indicators, seasonal summaries, hydrological layers
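The water balance reasoning can be illustrated with a deliberately simple budget: rainfall is split into runoff, evapotranspiration, and a residual available for recharge and storage change. The sketch assumes a single runoff coefficient per unit and season. That simplification, the function name, and the sample numbers are assumptions made for illustration and do not describe the stack's hydrology model.

```python
def simple_water_balance(rainfall_mm, et_mm, runoff_coefficient):
    """Split rainfall into runoff, evapotranspiration, and a residual.

    Runoff is approximated as a fixed fraction of rainfall. The residual
    is clamped at zero so a dry season never reports negative recharge.
    """
    if not 0.0 <= runoff_coefficient <= 1.0:
        raise ValueError("runoff_coefficient must be between 0 and 1")
    runoff_mm = rainfall_mm * runoff_coefficient
    residual_mm = max(rainfall_mm - et_mm - runoff_mm, 0.0)
    return {"runoff_mm": runoff_mm, "et_mm": et_mm, "residual_mm": residual_mm}


# Hypothetical seasonal totals for one micro-watershed.
season = simple_water_balance(rainfall_mm=1000.0, et_mm=600.0, runoff_coefficient=0.25)
```

Computing the same three numbers for every unit and season is what makes watersheds comparable on water availability and stress.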

Terrain analysis

Inputs: DEM-derived surfaces
Logic: derive slope, depressions, terrain classes, and drainage support layers from the elevation surface
Outputs: terrain rasters and planning-oriented derived layers
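Slope is the simplest of these derivatives. The sketch below assumes a small elevation grid with square cells and uses central differences on interior cells only. Production tools typically use a 3x3 kernel with edge handling, so treat this as an illustration of the idea.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees for interior cells, using central differences.

    `dem` is a 2D list of elevations in the same linear unit as `cell_size`.
    Border cells are left as None because they lack a full neighbourhood.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Hypothetical plane rising 10 m per 10 m cell toward the east.
dem = [
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
    [0.0, 10.0, 20.0],
]
slope = slope_degrees(dem, cell_size=10.0)
```

For this plane the centre cell has a gradient of 1, which corresponds to a slope of about 45 degrees.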

Administrative and clipping workflows

Inputs: state, district, block, watershed, or pan-India source layers
Logic: clip, filter, and normalize boundaries or thematic datasets to the requested geography while preserving stable keys
Outputs: ready-to-publish vector layers and metadata

Enrichment workflows

Inputs: base geometries plus tabular or external thematic sources
Logic: attach nearest-distance, class, dominant-condition, or joined-statistics attributes to each base geometry
Outputs: enriched vector layers and planning-ready attributes
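A nearest-distance attribute can be sketched with great-circle distances between points. The example assumes each unit is represented by a single latitude and longitude and each facility by a point. The field names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, adequate for a sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def attach_nearest(unit, facilities):
    """Return a copy of `unit` enriched with its nearest facility and distance."""
    distance_km, name = min(
        (haversine_km(unit["lat"], unit["lon"], f["lat"], f["lon"]), f["name"])
        for f in facilities
    )
    return {**unit, "nearest_facility": name, "nearest_facility_km": distance_km}


# Hypothetical unit centroid and facility points.
unit = {"mws_id": "mws_1", "lat": 20.0, "lon": 78.0}
facilities = [
    {"name": "clinic", "lat": 20.0, "lon": 78.5},
    {"name": "school", "lat": 21.0, "lon": 78.0},
]
enriched = attach_nearest(unit, facilities)
```

The original key (`mws_id` here) is carried through unchanged, which keeps the enriched layer joinable with other outputs.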

Drainage and stream derivatives

Inputs: DEM, hydrological derivatives, watershed context
Logic: derive drainage lines, stream order, catchment structure, connectivity support, and proximity surfaces
Outputs: raster or vector drainage-support layers
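Stream order is one derivative that is easy to show on a small network. The sketch below uses Strahler ordering, which is an assumption since the page does not name the ordering scheme. The segment names are hypothetical.

```python
def strahler_order(segment, tributaries):
    """Strahler order of `segment` given a map of segment -> upstream segments.

    Headwater segments have order 1. Where the two highest-order tributaries
    share the same order, the order increases by one; otherwise the highest
    tributary order carries downstream.
    """
    upstream = tributaries.get(segment, [])
    if not upstream:
        return 1
    orders = sorted((strahler_order(s, tributaries) for s in upstream), reverse=True)
    if len(orders) > 1 and orders[0] == orders[1]:
        return orders[0] + 1
    return orders[0]


# Hypothetical network: two headwaters join into s1, which meets headwater s2.
tributaries = {
    "outlet": ["s1", "s2"],
    "s1": ["h1", "h2"],
}
outlet_order = strahler_order("outlet", tributaries)
```

In this network `s1` reaches order 2, and the outlet stays at order 2 because the joining tributary `s2` is only order 1.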

Raster planning derivatives

Inputs: terrain, hydrology, and related modeled inputs
Logic: produce interpretable surfaces for restoration or prioritization
Outputs: clipped rasters and summary-ready layers

Temporal vegetation workflows

Inputs: NDVI-like series, interpolation support, seasonal time windows
Logic: summarize vegetation dynamics over time so that seasonality, change, and resilience can be compared between units
Outputs: time-series tables, class summaries, comparison-ready outputs
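Vegetation series usually contain gaps, for example from cloud cover, that must be filled before seasons can be compared. The sketch below assumes evenly spaced observations with None for missing values and uses linear interpolation. The actual interpolation method is not stated on this page, and the sample values are hypothetical.

```python
def fill_gaps(series):
    """Linearly interpolate None values that lie between known observations.

    Leading and trailing gaps are left as None because they have only one
    known neighbour.
    """
    filled = list(series)
    known = [i for i, value in enumerate(series) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def window_mean(series, start, end):
    """Mean of the non-missing values in series[start:end], or None if empty."""
    values = [v for v in series[start:end] if v is not None]
    return sum(values) / len(values) if values else None


# Hypothetical NDVI-like observations with cloud gaps marked as None.
ndvi = [0.25, None, 0.75, None, None, 0.0]
ndvi_filled = fill_gaps(ndvi)
first_half_mean = window_mean(ndvi_filled, 0, 3)
```

Summaries such as `window_mean` over fixed seasonal windows are what make units comparable across years.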

Operational helpers

Inputs: credentials, service wiring, reusable pipeline inputs
Logic: make the analytical workflows runnable and publishable
Outputs: successful execution, not new science by itself

Output Thinking¶

One useful way to think about the pipeline families is by output type:

Output type	Typical families	Why it matters
raster	LULC, terrain, drought, slope, restoration	good for continuous or classified surfaces
vector	waterbodies, boundaries, facilities, aquifer joins	good for inspection, download, and attribute tables
tabular	tehsil summaries, MWS indicators, time series	good for ranking, reporting, and joins
mixed	hydrology, restoration, proximity, surface water body (SWB) ecosystems	most useful public products combine multiple output types

Why Public Data Looks The Way It Does¶

The public API is not a random collection of endpoints.

It reflects the fact that CoRE Stack’s pipeline families naturally create:

stable geometries
joinable tables keyed by watershed identifiers
downloadable layers
metadata that points onward to GeoServer, reports, STAC items, or Earth Engine assets
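The value of stable keys shows up when a geometry layer and an indicator table are combined. The sketch below assumes both carry a shared watershed identifier field, named `mws_id` here purely for illustration. The other field names and values are also hypothetical.

```python
def join_on_key(features, indicators, key="mws_id"):
    """Left-join indicator rows onto features that share the same key value."""
    by_key = {row[key]: row for row in indicators}
    joined = []
    for feature in features:
        match = by_key.get(feature[key], {})
        extra = {name: value for name, value in match.items() if name != key}
        joined.append({**feature, **extra})
    return joined


# Hypothetical geometry records and an indicator table from another pipeline.
features = [
    {"mws_id": "mws_1", "area_ha": 950.0},
    {"mws_id": "mws_2", "area_ha": 1200.0},
]
indicators = [{"mws_id": "mws_1", "runoff_mm": 250.0}]

layer = join_on_key(features, indicators)
```

Features without a matching indicator row are kept as they are, so a missing computation never drops a watershed from the published layer.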

That is why pages like Public API References, STAC Specs, and How Current Data Was Computed all depend on the pipeline section.

What Can Become Better¶

This part of the docs should not sound finished, because the stack is intentionally open to improvement.

Important directions for future work include:

better DEMs and updated watershed delineations
richer groundwater-aware modeling
clearer uncertainty reporting and validation notes
more open ground-truth datasets for classification and waterbody workflows
better administrative and hydrological crosswalks for village and panchayat-level planning
new canonical outputs where the public value is high enough to justify long-term maintenance

If you think the theory, assumptions, or modeling choices can be improved, that is a feature request for the stack, not a problem to hide.

Move From Concepts To Code¶

When you are comfortable with the analytical logic, the next useful question is usually:

“Where is this implemented, and how is it exposed?”

That handoff lives in the programmatic page.

If you are ready to trace the code path, continue into How They Work Programmatically. If you are ready to extend the stack, continue into Build New Pipelines.