How Current Data Was Computed

This page explains why CoRE Stack computes the public data products it does, how those products relate to the CoRE Stack data structure, and why the public surface is curated instead of trying to publish every possible derived combination.

Read this page as the bridge between:

CoRE Stack Data Structure
How Our Pipelines Work Algorithmically
the public APIs, STAC assets, and downloadable layers in this section

Why This Page Exists

When people first meet CoRE Stack data, a natural question is:

“Why were these layers precomputed, and not any of a thousand other combinations?”

The answer is that CoRE Stack is building public geospatial infrastructure, not only running analyses for one internal use case.

That means the public products have to be:

scientifically meaningful
structurally consistent with the stack’s registry and joins
useful across many places, not only one project site
understandable enough to publish publicly
stable enough to maintain through APIs, metadata, STAC, styles, and documentation

So the stack tries to precompute the layers that become durable building blocks for many later uses.

Start From The Data Structure

The most important fact behind the current public data is this:

CoRE Stack does not treat each dataset as an isolated map. It treats datasets as layers that should attach to a stable landscape structure.

That structure is explained in CoRE Stack Data Structure, but the practical version is:

people search through state, district, and block
many computations make most sense on micro-watersheds
some outputs are about waterbodies, villages, assets, or enrichment layers
the public surface becomes useful only when those layers can be compared, joined, and reused together

This is why the public data is not centered on arbitrary polygons or ad hoc study areas. It is centered on standard units and repeated crosswalks between:

hydrological correctness
administrative usability

That is also why many public outputs are keyed to:

uid-like micro-watershed identifiers
tehsil or block-level discovery routes
standard layer names
repeatable geometry products
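As a minimal sketch of what such a crosswalk could look like, assume each micro-watershed record carries a uid plus the administrative names used for discovery. The field names (`uid`, `state`, `district`, `block`) and the sample values are hypothetical and chosen only for illustration; they are not the stack's actual schema.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: one row per micro-watershed.
# Field names and values are illustrative assumptions, not the real schema.
CROSSWALK = [
    {"uid": "mws_001", "state": "state_a", "district": "district_x", "block": "block_1"},
    {"uid": "mws_002", "state": "state_a", "district": "district_x", "block": "block_1"},
    {"uid": "mws_003", "state": "state_a", "district": "district_x", "block": "block_2"},
]


def index_by_block(rows):
    """Group micro-watershed uids under their (state, district, block) route."""
    index = defaultdict(list)
    for row in rows:
        route = (row["state"], row["district"], row["block"])
        index[route].append(row["uid"])
    return dict(index)


def index_by_uid(rows):
    """Map each uid back to the administrative route it is discovered through."""
    return {
        row["uid"]: (row["state"], row["district"], row["block"])
        for row in rows
    }


by_block = index_by_block(CROSSWALK)
by_uid = index_by_uid(CROSSWALK)
```

Keeping both directions of the lookup is what lets the same record serve an administrative search and a hydrological join.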

The Main Principle

CoRE Stack usually prefers to precompute:

foundational layers that many other workflows depend on
canonical derived outputs that answer recurring planning questions
joinable indicators that can be recombined in many ways later

It usually avoids precomputing:

every possible weighted index
every possible policy ranking
every one-off overlay combination
every presentation-specific summary

That is a deliberate choice.

If the platform centrally published every possible downstream interpretation, users would face:

too many overlapping layers
unclear provenance
duplicated logic
weak trust in what is “official”
a maintenance burden that would outgrow the actual science

So CoRE Stack computes the layers that are most reusable, and expects many final interpretations to happen one layer above them.
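To make "one layer above" concrete, here is a hedged sketch of a user-side weighted index built from joinable indicators. The indicator names (`runoff_mm`, `cropping_intensity`), the values, the weights, and the choice of min-max scaling are all assumptions for illustration; the platform does not prescribe this method.

```python
def min_max_scale(values):
    """Rescale a {uid: value} mapping to the 0..1 range."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # A constant indicator carries no ranking information.
        return {uid: 0.0 for uid in values}
    return {uid: (v - lo) / (hi - lo) for uid, v in values.items()}


def weighted_index(indicators, weights):
    """Combine scaled indicators into one score per uid.

    indicators: {indicator_name: {uid: value}}
    weights:    {indicator_name: weight}, normalized here to sum to 1
    Only uids present in every indicator receive a score.
    """
    if set(indicators) != set(weights):
        raise ValueError("indicators and weights must name the same layers")
    total = sum(weights.values())
    scaled = {name: min_max_scale(vals) for name, vals in indicators.items()}
    shared = set.intersection(*(set(vals) for vals in scaled.values()))
    return {
        uid: sum(weights[name] / total * scaled[name][uid] for name in scaled)
        for uid in sorted(shared)
    }


# Hypothetical per-micro-watershed indicators, keyed by uid.
indicators = {
    "runoff_mm": {"mws_001": 100.0, "mws_002": 200.0, "mws_003": 300.0},
    "cropping_intensity": {"mws_001": 2.0, "mws_002": 1.0, "mws_003": 1.5},
}
weights = {"runoff_mm": 3, "cropping_intensity": 1}

scores = weighted_index(indicators, weights)
```

Because the weights live in the user's code and not in the published layers, two teams can rank the same micro-watersheds differently without the platform having to maintain either ranking.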

Why These Particular Families Were Prioritized

The current public data families are not random. They reflect recurring planning questions.

1. Land use and land cover

These are foundational because many later questions depend on what is physically on the land:

cropland
vegetation
built-up area
water-related classes
changes through time

Without LULC, many hydrology, drought, cropping, and restoration interpretations become weaker or much harder to compare.

2. Hydrology and water balance

These were prioritized because CoRE Stack is built for water-grounded planning.

The stack therefore needs outputs that help people reason about:

rainfall and runoff behavior
recharge-related logic
watershed response
upstream-downstream relationships

These are not only scientific outputs. They are planning primitives.

3. Surface water and waterbody workflows

These matter because waterbodies are directly visible, socially important, and often central to local planning, rejuvenation, and monitoring.

They also act as feature-level hydrological objects that can be related back to watersheds, zones of influence, and administrative planning.

4. Terrain, drainage, and raster derivatives

Terrain, slope, depressions, catchments, stream order, and drainage support layers were prioritized because they provide the physical logic beneath many water and land-use decisions.

These layers help answer:

where water accumulates
how water moves
which places are steep, flat, connected, or constrained
which restoration or conservation measures are physically plausible

5. Enrichment and implementation context

The stack also computes enrichment layers such as facilities, NREGA assets, aquifers, SOGE (stage of groundwater extraction), and other overlays because planning never happens only in hydrological space.

Implementation happens through institutions, infrastructure, livelihoods, and administrative realities.

So CoRE Stack tries to keep both worlds connected:

the hydrological landscape
the implementation landscape

Why Micro-Watersheds Matter So Much

The micro-watershed is the most important practical planning unit in the current stack.

That is not because villages or administrative boundaries are unimportant. It is because:

water does not move according to administrative boundaries
micro-watersheds are much closer to the real flow logic of landscapes
they are standardized enough to act as a registry
they make outputs comparable across large geographies

Once a signal can be attached to standardized micro-watersheds, many things become possible:

comparison across years
comparison across places
joining hydrology to LULC
joining terrain to restoration logic
joining drought signals to public planning units

This is one of the strongest reasons the current public data looks the way it does. CoRE Stack is not only publishing maps. It is publishing data that can live inside a shared analytical structure.
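The joins and comparisons listed above reduce to simple keyed operations once every signal shares the same micro-watershed uid. The sketch below assumes hypothetical field names (`runoff_mm`, `cropland_pct`) and made-up values; it illustrates the pattern, not the stack's own code.

```python
def join_on_uid(left, right):
    """Inner join two {uid: record} mappings on the uids they share."""
    return {
        uid: {**left[uid], **right[uid]}
        for uid in sorted(left.keys() & right.keys())
    }


def change_between_years(series, start_year, end_year):
    """Per-uid difference of a {uid: {year: value}} signal between two years.

    uids missing either year are skipped instead of being guessed.
    """
    return {
        uid: by_year[end_year] - by_year[start_year]
        for uid, by_year in series.items()
        if start_year in by_year and end_year in by_year
    }


# Hypothetical hydrology and LULC summaries keyed by micro-watershed uid.
hydrology = {"mws_001": {"runoff_mm": 120.0}, "mws_002": {"runoff_mm": 80.0}}
lulc = {"mws_001": {"cropland_pct": 60}, "mws_003": {"cropland_pct": 35}}
joined = join_on_uid(hydrology, lulc)

# Hypothetical yearly cropland share (percent) per micro-watershed.
cropland_series = {
    "mws_001": {2018: 50, 2022: 60},
    "mws_002": {2018: 40},
}
change = change_between_years(cropland_series, 2018, 2022)
```

Both functions drop uids that lack data on one side, which keeps comparisons honest when layers have different coverage.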

Why Administrative Units Still Matter

If micro-watersheds are so important, why does the public API still care so much about state, district, and block?

Because real users usually begin there.

People ask for data through names they know:

state
district
block or tehsil
village

So CoRE Stack’s public surface is designed to let people discover and retrieve data through administrative entry points, while many core computations still attach to hydrologically meaningful units beneath that interface.

This is one of the most important design choices in the platform:

entry through administrative usability
structure through hydrological logic
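A small sketch of this two-layer design: the caller asks by block name, the lookup resolves to micro-watershed uids, and the answer is summarized back to the block. The area-weighted mean, the field names, and the sample values are assumptions for illustration. The sketch also assumes each micro-watershed belongs to exactly one block, which real boundaries do not guarantee.

```python
def summarize_block(block_route, uids_by_block, area_ha, signal):
    """Area-weighted mean of a micro-watershed signal for one block.

    block_route:   (state, district, block) tuple used for discovery
    uids_by_block: {block_route: [uid, ...]}
    area_ha:       {uid: area in hectares}
    signal:        {uid: value}
    Returns None when no micro-watershed in the block has both area and value.
    """
    uids = [
        uid
        for uid in uids_by_block.get(block_route, [])
        if uid in area_ha and uid in signal
    ]
    total_area = sum(area_ha[uid] for uid in uids)
    if total_area == 0:
        return None
    return sum(signal[uid] * area_ha[uid] for uid in uids) / total_area


# Hypothetical lookup tables; names and numbers are illustrative only.
uids_by_block = {("state_a", "district_x", "block_1"): ["mws_001", "mws_002"]}
area_ha = {"mws_001": 100.0, "mws_002": 300.0}
signal = {"mws_001": 10.0, "mws_002": 20.0}

block_value = summarize_block(
    ("state_a", "district_x", "block_1"), uids_by_block, area_ha, signal
)
missing_block = summarize_block(
    ("state_a", "district_x", "block_9"), uids_by_block, area_ha, signal
)
```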

The Actual Computation Shape

Most current public data follows a repeated shape:

start from source rasters, imagery, networks, or pan-India thematic layers
run a scientific or analytical transformation
clip, summarize, vectorize, or aggregate to stable units
store outputs with stable names and geometry conventions
publish them through layer metadata, APIs, GeoServer, and STAC

That means CoRE Stack usually does not stop at “we produced a raster.”
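The repeated shape above can be sketched end to end with plain Python lists standing in for rasters. Everything here is a simplification under stated assumptions: the threshold step is only a stand-in for a real scientific transformation, the zone grid of uids replaces real micro-watershed geometry, the layer name is hypothetical, and the publishing step is not modeled.

```python
from collections import defaultdict


def transform(raster, threshold):
    """Stand-in analytical step: flag cells at or above a threshold."""
    return [[1 if cell >= threshold else 0 for cell in row] for row in raster]


def summarize_to_units(mask, zones):
    """Zonal summary: share of flagged cells per unit id in the zone grid."""
    flagged, total = defaultdict(int), defaultdict(int)
    for mask_row, zone_row in zip(mask, zones):
        for value, uid in zip(mask_row, zone_row):
            total[uid] += 1
            flagged[uid] += value
    return {uid: flagged[uid] / total[uid] for uid in total}


def to_named_records(summary, layer_name):
    """Store the summary as rows with a stable layer name and unit key."""
    return [
        {"layer": layer_name, "uid": uid, "value": value}
        for uid, value in sorted(summary.items())
    ]


# Hypothetical 2x4 source raster and a matching grid of micro-watershed uids.
source = [
    [0.1, 0.8, 0.9, 0.2],
    [0.7, 0.6, 0.1, 0.3],
]
zones = [
    ["mws_001", "mws_001", "mws_002", "mws_002"],
    ["mws_001", "mws_001", "mws_002", "mws_002"],
]

mask = transform(source, threshold=0.5)
summary = summarize_to_units(mask, zones)
records = to_named_records(summary, layer_name="example_layer")
```

The intermediate `mask` is the "we produced a raster" stage; only `records`, with its stable key and name, is in a form other layers can join against.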

To become a public product, a computed layer often also needs:

a predictable layer name
a place in the dataset registry
public metadata
styling or download affordances
documentation that explains what it is

This is why the set of published data is much narrower than the set of things that could be computed internally.
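The checklist above can be read as a gate that a computed layer must pass before publication. The following is a hypothetical sketch of such a gate; the `LayerCandidate` class and its field names are invented for illustration and do not describe an existing CoRE Stack component.

```python
from dataclasses import dataclass


@dataclass
class LayerCandidate:
    """Hypothetical description of a computed layer awaiting publication."""
    layer_name: str = ""
    in_registry: bool = False
    has_public_metadata: bool = False
    has_style_or_download: bool = False
    has_documentation: bool = False


def missing_requirements(candidate):
    """Return the checklist items the candidate does not yet satisfy."""
    checks = [
        ("predictable layer name", bool(candidate.layer_name.strip())),
        ("place in the dataset registry", candidate.in_registry),
        ("public metadata", candidate.has_public_metadata),
        ("styling or download affordance", candidate.has_style_or_download),
        ("documentation", candidate.has_documentation),
    ]
    return [label for label, ok in checks if not ok]


candidate = LayerCandidate(layer_name="example_slope_layer", in_registry=True)
gaps = missing_requirements(candidate)
```

A layer with an empty `gaps` list is ready to publish in this sketch; anything else remains an internal computation.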

What Was Computed, And Why

The best way to understand the current public data is by family.

Family	What CoRE Stack tends to compute	Why this family is publicly valuable	How it fits the data structure
LULC and cropping-related workflows	land-use classes, cropping intensity, change-ready outputs	many downstream planning questions depend on land state and change	becomes joinable with watersheds, blocks, and later indicators
Hydrology	runoff and water-balance-style watershed indicators	supports water planning and watershed reasoning directly	naturally aligns with micro-watersheds and their connectivity
Surface water bodies	waterbody extents, supporting analytics, related summaries	visible, practical, and central to monitoring and rejuvenation	links waterbody features back to hydrological units and public layers
Terrain and drainage	slope, depressions, catchments, stream order, drainage derivatives	gives the physical basis for interpreting water movement and suitability	often summarized to watershed or block-scale structured outputs
Enrichment layers	facilities, NREGA assets, aquifer, SOGE, agroecological overlays	connects scientific layers to implementation and planning context	adds social, infrastructural, or thematic meaning to standard units
Time series	NDVI and temporal vegetation summaries	helps compare seasonality, change, and resilience	produces comparison-ready signals across standard units

Why This Matters For Users Of Precomputed Data

If you are using CoRE Stack data through APIs, STAC, GeoServer, or downloads, this page should change how you read the catalog.

You should not think:

“Where is the exact finished answer to my exact question?”

You should think:

“Which canonical outputs does the stack expose, and how do I combine them responsibly?”

That is the intended use pattern.

The public data was designed so that you can:

start with administrative discovery
move into stable hydrological or implementation layers
join outputs by standard units
build your own planning logic on top

A Good Mental Model

CoRE Stack’s current public data is best understood as:

not a giant warehouse of every derived map
not a single-purpose application dataset
not only a raw raster archive

It is:

a curated set of reusable public layers
organized around a stable landscape structure
computed through repeatable pipeline families
exposed in ways that support both humans and software

If The Data You Need Is Missing

A missing layer does not necessarily mean the stack is deficient. Often it means the exact combination you want sits one layer above the canonical outputs.

Your practical next moves are:

use Public API References and STAC Specs to combine existing outputs yourself
move into CoRE Stack Data Structure if you want to understand how joins should be done
read How Our Pipelines Work Algorithmically if you want the modeling logic behind a family
go to Pipeline Development if the missing output should be computed directly
contribute or open an issue if the missing product deserves to become a canonical public layer