LULC Generation Pipeline

Generate high-resolution Land Use Land Cover (LULC) maps using satellite imagery and machine learning.


Overview

graph TB
    subgraph "Input"
        A[Sentinel-2 Imagery]
        B[Training Samples]
        C[Auxiliary Data]
    end

    subgraph "Processing"
        D[Preprocessing]
        E[Feature Extraction]
        F[Classification]
        G[Post-processing]
    end

    subgraph "Output"
        H[LULC Raster]
        I[Accuracy Report]
        J[Change Detection]
    end

    A --> D
    B --> D
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    G --> I
    G --> J

The LULC pipeline classifies land into categories such as:

  • Agriculture
  • Forest
  • Water Bodies
  • Built-up Areas
  • Barren Land
  • Vegetation

Classification System

pie title LULC Classes Distribution
    "Agriculture" : 45
    "Forest" : 18
    "Water" : 8
    "Built-up" : 12
    "Barren" : 10
    "Vegetation" : 7

Class Definitions

| Class | Code | Description | Color |
|---|---|---|---|
| Agriculture | 1 | Croplands, plantations | #ffff00 |
| Forest | 2 | Dense tree cover | #006400 |
| Water | 3 | Rivers, lakes, reservoirs | #0000ff |
| Built-up | 4 | Urban areas, roads | #ff0000 |
| Barren | 5 | Bare soil, rock | #a52a2a |
| Vegetation | 6 | Grasslands, shrubs | #7cfc00 |
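
As a minimal sketch, the class table above can be held as a lookup that turns a class code into a display color, for example when rendering the color-mapped PNG. The names `LULC_CLASSES`, `hex_to_rgb`, and `class_color` are illustrative and not taken from the pipeline's code.

```python
# Class codes, names, and hex colors copied from the class definitions table.
LULC_CLASSES = {
    1: ("Agriculture", "#ffff00"),
    2: ("Forest", "#006400"),
    3: ("Water", "#0000ff"),
    4: ("Built-up", "#ff0000"),
    5: ("Barren", "#a52a2a"),
    6: ("Vegetation", "#7cfc00"),
}


def hex_to_rgb(color: str) -> tuple:
    """Convert '#rrggbb' to an (r, g, b) tuple of 0-255 integers."""
    value = color.lstrip("#")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def class_color(code: int) -> tuple:
    """Look up the display color for a class code (hypothetical helper)."""
    _name, color = LULC_CLASSES[code]
    return hex_to_rgb(color)
```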

Algorithm Flow

sequenceDiagram
    participant U as User
    participant A as API
    participant G as GEE
    participant C as Classifier
    participant S as Storage

    U->>A: Request LULC Generation
    A->>G: Query Sentinel-2
    G->>A: Return ImageCollection

    A->>A: Cloud Masking
    A->>A: Compute Indices<br/>NDVI, NDWI, NDBI

    A->>C: Train Random Forest
    C->>C: Classify Pixels

    A->>A: Majority Filter
    A->>A: Sieve Small Polygons

    A->>S: Export Results
    S->>U: Download Link

Input Requirements

Required Parameters

{
  "state": "karnataka",
  "district": "raichur", 
  "block": "devadurga",
  "year": 2023,
  "resolution": "10m"
}

Input Data Sources

graph LR
    A[LULC Pipeline] --> B[Sentinel-2 L2A]
    A --> C[Ground Truth Data]
    A --> D[Existing LULC]

    B --> E[10m Resolution<br/>Multi-spectral]
    C --> F[Training Samples<br/>Reference Points]
    D --> G[Validation Data<br/>Comparison]

| Data Source | Purpose | Resolution |
|---|---|---|
| Sentinel-2 L2A | Primary imagery | 10m |
| Training samples | Supervised classification | Point data |
| Existing LULC | Validation | 30-100m |

Processing Steps

1. Data Acquisition

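import ee

ee.Initialize()  # assumes Earth Engine credentials are already configured

# `year` (int) and `geometry` (ee.Geometry) come from the request parameters.
# filterDate treats the end date as exclusive, so end on 1 January of the next year.
start_date = f"{year}-01-01"
end_date = f"{year + 1}-01-01"

# Filter by date, region, and scene-level cloud cover (< 20%)
image_collection = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
    .filterDate(start_date, end_date)
    .filterBounds(geometry)
    .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))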

2. Preprocessing

graph LR
    A[Raw Imagery] --> B[Cloud Masking]
    B --> C[Atmospheric Correction]
    C --> D[Median Composite]
    D --> E[Clip to Region]
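
To illustrate the masking and compositing steps in the diagram above, here is a small pure-Python sketch of a per-pixel median over a time stack. It assumes cloudy pixels have already been masked to `None`; the function name and the list-of-lists raster layout are illustrative, and the pipeline itself would do this on Earth Engine rather than in local Python.

```python
from statistics import median


def median_composite(stack):
    """Per-pixel median over a time stack of 2-D grids.

    `stack` is a list of equally sized grids (lists of rows). Masked
    (cloudy) pixels are None and are skipped. A pixel with no clear
    observation stays None in the composite.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    composite = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            clear = [img[r][c] for img in stack if img[r][c] is not None]
            out_row.append(median(clear) if clear else None)
        composite.append(out_row)
    return composite
```

Taking the median rather than the mean keeps a single unmasked cloud or shadow value from dominating the composite.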

3. Feature Extraction

Calculated spectral indices:

| Index | Formula | Purpose |
|---|---|---|
| NDVI | (NIR-Red)/(NIR+Red) | Vegetation health |
| NDWI | (NIR-SWIR)/(NIR+SWIR) | Water detection |
| NDBI | (SWIR-NIR)/(SWIR+NIR) | Built-up areas |
| EVI | 2.5*(NIR-Red)/(NIR+6*Red-7.5*Blue+1) | Enhanced vegetation |
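
The formulas in the table above can be sketched for a single pixel as follows. This assumes band values are surface reflectance on a 0 to 1 scale (the constants in EVI depend on that scale); the function names are illustrative. Note that with the NIR/SWIR form of NDWI given in the table, NDWI is exactly the negative of NDBI.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(blue: float, red: float, nir: float, swir: float) -> dict:
    """Compute the four indices from the table for one pixel.

    Inputs are assumed to be reflectance values in the range 0-1.
    """
    return {
        "NDVI": normalized_difference(nir, red),
        "NDWI": normalized_difference(nir, swir),
        "NDBI": normalized_difference(swir, nir),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
    }
```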

4. Classification

graph TB
    A[Input Features] --> B[Random Forest]
    A --> C[Training Samples]

    B --> D{Bootstrap<br/>Aggregation}
    C --> D

    D --> E[Decision Tree 1]
    D --> F[Decision Tree 2]
    D --> G[Decision Tree N]

    E --> H[Majority Vote]
    F --> H
    G --> H

    H --> I[Classified Map]
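
The final step in the diagram above, the majority vote across trees, can be sketched in a few lines. This only shows the voting stage, not tree training; the tie-breaking rule (smallest class code wins) is an assumption made for determinism, not something the pipeline documents.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the most common class code; ties go to the smallest code."""
    counts = Counter(tree_predictions)
    best = max(counts.values())
    return min(code for code, n in counts.items() if n == best)


def classify_pixels(per_tree_maps):
    """Combine per-tree label lists (one list of pixel labels per tree)
    into a single label per pixel by majority vote."""
    return [majority_vote(votes) for votes in zip(*per_tree_maps)]
```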

5. Post-processing

  • Majority filter: Remove salt-and-pepper noise
  • Sieve filter: Remove small polygons (< 0.5 ha)
  • Smoothing: Gaussian kernel smoothing
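
As a rough sketch of the first two steps above, the following shows a 3x3 majority filter and the pixel count that corresponds to the sieve threshold. The 3x3 window, the edge handling, and the tie rule are assumptions; the source does not state the kernel size the pipeline uses.

```python
from collections import Counter


def majority_filter(grid):
    """Replace each pixel with the most common class in its 3x3 window.

    Windows are truncated at the raster edge. On a tie the pixel keeps
    its original class if that class is among the winners; otherwise
    the smallest winning class code is used.
    """
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            counts = Counter(window)
            best = max(counts.values())
            winners = {k for k, n in counts.items() if n == best}
            out[r][c] = grid[r][c] if grid[r][c] in winners else min(winners)
    return out


def sieve_threshold_pixels(min_area_ha: float, resolution_m: float) -> int:
    """Number of pixels that make up `min_area_ha` hectares."""
    return round(min_area_ha * 10_000 / resolution_m ** 2)
```

At 10 m resolution one pixel covers 0.01 ha, so the 0.5 ha sieve threshold corresponds to 50 pixels.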

Output Products

Raster Output

graph LR
    A[LULC Raster] --> B[GeoTIFF Format]
    A --> C[Color-mapped PNG]
    A --> D[Class Statistics]

| Property | Value |
|---|---|
| Format | GeoTIFF |
| Resolution | 10m |
| CRS | EPSG:4326 |
| Bands | 1 (class values) |
| Compression | LZW |

Statistics Report

{
  "classification_stats": {
    "total_pixels": 1250000,
    "resolution_m": 10,
    "area_sqkm": 125.0,
    "class_distribution": {
      "agriculture": {"pixels": 562500, "area_sqkm": 56.25, "percentage": 45.0},
      "forest": {"pixels": 225000, "area_sqkm": 22.5, "percentage": 18.0},
      "water": {"pixels": 100000, "area_sqkm": 10.0, "percentage": 8.0},
      "built_up": {"pixels": 150000, "area_sqkm": 15.0, "percentage": 12.0},
      "barren": {"pixels": 125000, "area_sqkm": 12.5, "percentage": 10.0},
      "vegetation": {"pixels": 87500, "area_sqkm": 8.75, "percentage": 7.0}
    }
  },
  "accuracy": {
    "overall": 88.5,
    "kappa": 0.85,
    "producer_accuracy": {...},
    "user_accuracy": {...}
  }
}
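
The area and percentage fields in the report follow directly from pixel counts and resolution: at 10 m, each pixel covers 100 m², so 562,500 pixels is 56.25 km². A minimal sketch of that arithmetic, with an illustrative function name:

```python
def class_statistics(pixel_counts: dict, resolution_m: float = 10) -> dict:
    """Derive per-class area (sq km) and percentage from pixel counts."""
    total = sum(pixel_counts.values())
    return {
        name: {
            "pixels": n,
            # pixel area in square metres, converted to square kilometres
            "area_sqkm": round(n * resolution_m ** 2 / 1_000_000, 4),
            "percentage": round(100 * n / total, 2),
        }
        for name, n in pixel_counts.items()
    }
```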

API Usage

Request

curl -X POST "https://geoserver.core-stack.org/api/v1/lulc_for_tehsil/" \
  -H "Content-Type: application/json" \
  -d '{
    "state": "karnataka",
    "district": "raichur",
    "block": "devadurga",
    "start_year": 2022,
    "end_year": 2023,
    "version": "v3",
    "gee_account_id": 1
  }'

Response

{
  "Success": "generate_lulc_v3_tehsil task initiated"
}
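
The same request can be sketched in Python using only the standard library. The URL and payload fields are copied from the curl example above; the helper names, the default values, and the 30-second timeout are assumptions, and any authentication the endpoint may require is not shown.

```python
import json
import urllib.request

LULC_URL = "https://geoserver.core-stack.org/api/v1/lulc_for_tehsil/"


def build_lulc_request(state, district, block, start_year, end_year,
                       version="v3", gee_account_id=1):
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "state": state,
        "district": district,
        "block": block,
        "start_year": start_year,
        "end_year": end_year,
        "version": version,
        "gee_account_id": gee_account_id,
    }
    return urllib.request.Request(
        LULC_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the response only confirms that a task was initiated, a caller would still need some other way to learn when the output is ready; the source does not describe one.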

Local Mode

Local-first LULC documentation is still being aligned with the backend. For the current route surface, use the task-submission handlers in computing/api.py and treat the Local-First section in this docs repo as roadmap material until that alignment is complete.

graph LR
    A[Local Request] --> B[STAC Catalog]
    B --> C[Download Pre-computed]
    C --> D[Return Result]

Change Detection

Compare LULC between two time periods:

graph TB
    A[LULC 2019] --> C[Change Detection]
    B[LULC 2023] --> C

    C --> D[Change Matrix]
    C --> E[Hotspot Analysis]
    C --> F[Transition Map]

Change Matrix Example

| From/To | Ag | Forest | Water | Built-up |
|---|---|---|---|---|
| Ag | 85% | 5% | 2% | 8% |
| Forest | 10% | 88% | 1% | 1% |
| Water | 5% | 2% | 92% | 1% |
| Barren | 45% | 10% | 3% | 42% |
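
A change matrix like the one above is a row-normalized count of class transitions between two co-registered maps. The sketch below assumes both maps have been flattened to label sequences of equal length; the function name is illustrative.

```python
from collections import Counter


def change_matrix(before, after):
    """Percentage of each 'from' class that ends up in each 'to' class.

    Returns a dict keyed by (from_class, to_class); each row of the
    matrix (fixed from_class) sums to about 100, up to rounding.
    """
    if len(before) != len(after):
        raise ValueError("both maps must cover the same pixels")
    pairs = Counter(zip(before, after))
    row_totals = Counter(before)
    return {
        (src, dst): round(100 * n / row_totals[src], 1)
        for (src, dst), n in pairs.items()
    }
```

This is also why the best practices call for a consistent classification scheme: the comparison is only meaningful when a class code means the same thing in both years.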

Accuracy Assessment

Confusion Matrix

pie title Agriculture Predictions
    "Correct (85)" : 85
    "Forest (5)" : 5
    "Water (2)" : 2
    "Built-up (8)" : 8

Accuracy Metrics

| Metric | Value | Description |
|---|---|---|
| Overall Accuracy | 89.3% | Correctly classified / Total |
| Kappa Coefficient | 0.87 | Agreement beyond chance |
| Producer's Accuracy | 85-92% | Correct / Actual class |
| User's Accuracy | 84-93% | Correct / Predicted class |
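
All four metrics in the table above can be derived from a confusion matrix. The sketch below assumes rows are reference (actual) classes and columns are predicted classes, and that every class has at least one reference sample and one prediction; the function name is illustrative.

```python
def accuracy_metrics(matrix):
    """Compute accuracy metrics from a square confusion matrix.

    matrix[i][j] = number of samples of reference class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(n))
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    overall = diagonal / total
    # Agreement expected by chance, from the row and column marginals
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)

    return {
        "overall": overall,
        "kappa": kappa,
        "producer_accuracy": [matrix[i][i] / row_sums[i] for i in range(n)],
        "user_accuracy": [matrix[j][j] / col_sums[j] for j in range(n)],
    }
```

For example, the two-class matrix `[[45, 5], [10, 40]]` gives an overall accuracy of 0.85 and a kappa of 0.70, because half of the agreement would be expected by chance.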

Troubleshooting

Low Accuracy

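| Issue | Cause | Solution |
|---|---|---|
| Cloud contamination | Poor cloud masking | Tighten the cloud filter (lower the CLOUDY_PIXEL_PERCENTAGE limit) or improve per-pixel masking |
| Class confusion | Similar spectral signatures | Add more training data |
| Mixed pixels | 10m resolution | Use sub-pixel analysis |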

Processing Errors

| Error | Solution |
|---|---|
| "No clear images" | Expand date range |
| "Memory exceeded" | Reduce region size |
| "Export failed" | Check storage quota |

Best Practices

  1. Use dry season imagery - Better visibility of land features
  2. Multi-year composites - Reduce year-to-year variation
  3. Ground truth validation - Essential for accuracy
  4. Consistent classification - Use same scheme for change detection