Pipeline Integration Pattern¶
This page explains how existing CoRE Stack pipelines become fully integrated into the stack.
Read it as a pattern library, not as a single algorithm description.
Primary code surfaces: `utilities/gee_utils.py`, `computing/utils.py`, `computing/api.py`, and `computing/urls.py`.
What "Fully Integrated" Means Here¶
An integrated pipeline usually does most of the following:
- accepts a stable ROI such as a state, district, block, or project ROI
- exposes an entry point through an API route, the Django shell, or Celery
- uses shared naming and asset-path helpers
- runs the actual processing step
- saves resulting layer metadata to the database
- publishes vector or raster outputs to GeoServer when needed
- optionally creates STAC or related catalog metadata
- gets documented in this docs site
```mermaid
flowchart LR
    A[Route or shell trigger] --> B[Task or direct function]
    B --> C[Processing logic]
    C --> D[Asset creation or local output]
    D --> E[Metadata in DB]
    D --> F[GeoServer]
    D --> G[STAC or related specs]
```
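The stages in the diagram can be sketched as plain functions. Everything below is an illustrative placeholder, not a real CoRE Stack helper; it only shows how the trigger, processing, bookkeeping, and publication stages hand off to each other.

```python
# Hypothetical stand-ins for the stages in the flowchart above.

def process_roi(roi: dict) -> dict:
    """Processing logic: derive an output artifact from the ROI."""
    return {"roi": roi["name"], "layer": f"{roi['name']}_layer"}

def save_layer_metadata(db: list, artifact: dict) -> None:
    """Metadata in DB: record what was produced."""
    db.append({"layer_name": artifact["layer"], "status": "created"})

def publish_to_geoserver(artifact: dict) -> str:
    """GeoServer publication step (stubbed)."""
    return f"geoserver://workspace/{artifact['layer']}"

def run_pipeline(roi: dict, db: list) -> str:
    """Task or direct function: ties the stages together."""
    artifact = process_roi(roi)
    save_layer_metadata(db, artifact)
    return publish_to_geoserver(artifact)
```

In the real stack the processing stage runs in GEE and the bookkeeping goes through `save_layer_info_to_db()`, but the hand-off shape is the same.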
Reusable Building Blocks¶
| Concern | Typical helpers | Where to look |
|---|---|---|
| GEE setup | `ee_initialize()` | `utilities/gee_utils.py` |
| Naming and path normalization | `valid_gee_text()`, `get_gee_asset_path()`, `get_gee_dir_path()` | `utilities/gee_utils.py` |
| Vector export to GEE | `export_vector_asset_to_gee()`, `upload_shp_to_gee()` | `utilities/gee_utils.py` |
| Raster export and publication | `sync_raster_to_gcs()`, `sync_raster_gcs_to_geoserver()` | `utilities/gee_utils.py` |
| Vector publication | `sync_fc_to_geoserver()`, `sync_layer_to_geoserver()`, `push_shape_to_geoserver()` | `computing/utils.py` |
| Layer bookkeeping | `save_layer_info_to_db()` and sync status helpers | `computing/utils.py` |
| Route exposure | DRF views and task triggers | `computing/api.py`, `computing/urls.py` |
Note
If your pipeline uses ee_initialize() or accepts gee_account_id, document the setup prerequisite explicitly and point operators to Google Earth Engine. In the current stack, that setup is required for most real compute runs.
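To see why the shared naming helpers matter, here is a hypothetical stand-in for them. The real `valid_gee_text()` and `get_gee_asset_path()` live in `utilities/gee_utils.py` and may behave differently; this sketch only illustrates the kind of normalization they centralize.

```python
import re

# Illustrative stand-ins, NOT the real utilities/gee_utils.py helpers.

def normalize_gee_text(text: str) -> str:
    """Replace characters a GEE asset ID would reject with underscores."""
    return re.sub(r"[^a-zA-Z0-9_-]", "_", text.strip())

def build_asset_path(state: str, district: str, block: str) -> str:
    """Compose a deterministic asset path from administrative names."""
    parts = [normalize_gee_text(p.lower()) for p in (state, district, block)]
    return "projects/core-stack/assets/" + "/".join(parts)
```

Because every pipeline goes through the same normalization, an ROI named "Madhya Pradesh" always maps to the same asset path, so re-runs find existing assets instead of duplicating them.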
Recurring Pipeline Shapes¶
1. Vector Clip and Publish¶
Used by pages such as:
Typical pattern:
- build or load ROI
- filter an external vector dataset
- export to GEE or write local shapes
- save layer metadata
- publish to GeoServer
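The steps above can be sketched with plain dicts and bounding boxes standing in for real geometries; none of these names are CoRE Stack helpers, and the GEE export and GeoServer publication steps are stubbed out.

```python
# Minimal sketch of the vector clip-and-publish shape.

def intersects(bbox_a, bbox_b):
    """Axis-aligned bounding-box overlap test."""
    ax0, ay0, ax1, ay1 = bbox_a
    bx0, by0, bx1, by1 = bbox_b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

def clip_features(features, roi_bbox):
    """Filter an external vector dataset to the ROI extent."""
    return [f for f in features if intersects(f["bbox"], roi_bbox)]

def vector_clip_and_publish(features, roi_bbox, layer_name):
    clipped = clip_features(features, roi_bbox)
    metadata = {"layer": layer_name, "feature_count": len(clipped)}
    return clipped, metadata   # export and publish steps omitted
```

A real implementation would clip true geometries (in GEE or with a spatial library) rather than bounding boxes, but the filter-then-record-then-publish order is the same.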
2. Raster Clip and Publish¶
Used by pages such as:
Typical pattern:
- build ROI
- load a source raster
- clip or derive the raster
- export raster to GEE or GCS
- publish through GeoServer
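The raster shape can be sketched with a NumPy array standing in for a source raster; array indices stand in for geographic coordinates, whereas the real pipelines clip in GEE and export through GCS.

```python
import numpy as np

# Illustrative raster clip-and-derive sketch, not CoRE Stack code.

def clip_raster(raster: np.ndarray, rows: slice, cols: slice) -> np.ndarray:
    """Clip a 2-D raster to an ROI window."""
    return raster[rows, cols]

def derive_mask(raster: np.ndarray, threshold: float) -> np.ndarray:
    """Derive a simple binary layer from the clipped raster."""
    return (raster > threshold).astype(np.uint8)

# Usage: window the source grid, then derive the output layer.
elevation = np.arange(16, dtype=float).reshape(4, 4)
window = clip_raster(elevation, slice(1, 3), slice(1, 3))
mask = derive_mask(window, 6.0)
```

The derived layer (here a trivial threshold mask) is what would then be synced to GCS and published through GeoServer.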
3. Vector Enrichment or Spatial Join¶
Used by pages such as:
Typical pattern:
- load ROI or administrative geometry
- join with an external table or vector layer
- compute attributes or dominant classes
- export and publish the resulting feature collection
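The dominant-class step of this shape can be sketched as follows, with plain dicts standing in for feature collections; the names are illustrative, not real helpers.

```python
from collections import Counter

# Illustrative enrichment sketch: assign each ROI its dominant class
# from records joined against it.

def dominant_class(observations):
    """Return the most frequent class label among joined records."""
    return Counter(observations).most_common(1)[0][0]

def enrich_rois(rois, joined):
    """Attach a dominant_class attribute to each ROI feature."""
    out = []
    for roi in rois:
        obs = joined.get(roi["id"], [])
        out.append({**roi, "dominant_class": dominant_class(obs) if obs else None})
    return out
```

The enriched features would then be exported and published like any other vector layer.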
4. Mixed Raster and Vector Outputs¶
Used by pages such as:
These pages are useful when you want to study both raster publication and vector summary logic in one place.
5. Time-Series Helpers¶
Used by pages such as:
These are useful when your new pipeline depends on temporal compositing, per-class aggregation, or derived vegetation signals.
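As a feel for temporal compositing, here is a per-month median reduction in pure Python; a real NDVI pipeline would perform the equivalent reduction over a GEE image collection.

```python
from statistics import median

# Illustrative temporal compositing sketch, not CoRE Stack code.

def monthly_median(samples):
    """samples: iterable of (month, value) pairs.

    Returns {month: median_value}, i.e. one composite value per month.
    """
    by_month = {}
    for month, value in samples:
        by_month.setdefault(month, []).append(value)
    return {m: median(vals) for m, vals in sorted(by_month.items())}
```

Per-class aggregation follows the same pattern with a (month, class) key instead of month alone.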
Builder Workflow¶
Step 1: Decide the smallest useful output¶
Choose early whether the first useful version of your pipeline is:
- vector only
- raster only
- tabular plus geometry
- mixed raster and vector
That choice determines which shared helpers you will reuse.
Step 2: Start with the processing function¶
Write the core logic first. Make it clear what inputs the function needs and what output it returns before you add publication concerns.
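A core function written this way might look like the following sketch, where the dataclass and names are illustrative: inputs and output are explicit, and nothing about GEE, GeoServer, or the database leaks in.

```python
from dataclasses import dataclass

# Illustrative "processing first" function with explicit I/O.

@dataclass
class PipelineResult:
    layer_name: str
    feature_count: int

def compute_layer(roi_name: str, features: list) -> PipelineResult:
    """Pure processing step: data in, result object out."""
    kept = [f for f in features if f.get("valid", True)]
    return PipelineResult(layer_name=f"{roi_name}_layer", feature_count=len(kept))
```

Because the function is pure, it can be exercised from the Django shell or a unit test before any publication wiring exists.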
Step 3: Wrap it in a task or callable entry point¶
Depending on the use case, follow one of these patterns:
- direct function for shell use
- Celery task for async execution
- DRF view in `computing/api.py` for HTTP callers
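One way to keep a single core function usable from both the shell and Celery is to apply the task decorator around it, falling back to a no-op decorator when Celery is absent. This is a sketch with illustrative names, not the pattern used verbatim in `computing/api.py`.

```python
# Guarded import so the module also works in a plain shell session.
try:
    from celery import shared_task
except ImportError:  # no Celery available: fall back to a no-op decorator
    def shared_task(func):
        return func

def compute_layer(roi_name: str) -> dict:
    """Core processing logic, callable directly from a Django shell."""
    return {"layer": f"{roi_name}_layer", "status": "done"}

@shared_task
def compute_layer_task(roi_name: str) -> dict:
    """Async entry point: same logic, queued through Celery."""
    return compute_layer(roi_name)
```

A DRF view would then call `compute_layer_task.delay(...)` (or the function directly for synchronous use) and return the task ID to the HTTP caller.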
Step 4: Reuse shared integration helpers¶
Do not re-invent:
- naming conventions
- asset path generation
- GeoServer upload paths
- layer metadata persistence
Those are already shared across the stack.
Step 5: Document the pattern, not just the result¶
A good pipeline page should tell future contributors:
- where the code lives
- what entry point calls it
- whether it is vector, raster, or mixed
- which shared helpers it relies on
- what cloud dependencies are required for full integration
- how an operator obtains the `gee_account_id` if the pipeline needs one
Good Pages to Borrow From¶
If you are building a new pipeline, compare these examples:
| If you need... | Start from |
|---|---|
| a vector clipping workflow | Drainage Lines |
| a raster clipping workflow | Catchment Area |
| a table-plus-geometry enrichment workflow | Facilities Proximity |
| mixed raster and vector publication | Stream Order |
| time-series processing ideas | NDVI Time Series |
Incremental Adoption¶
A new contributor does not need to build the whole cloud publication chain on day one.
A practical way to start is:
- understand an existing pipeline page in this section
- prototype the data logic locally
- expose a direct function or shell workflow
- add API or Celery integration
- add GEE, GeoServer, or STAC publication only when the science is stable
That staged approach is the same one reflected across the developer pages in this site: start with direct execution, then expose stable entry points, then add publication and metadata integration.