Crop Data Warehouse Solution Overview
Purpose
The Crop Data Warehouse solution provides a structured process for importing operational crop management data from CSV files into a SQL Server data warehouse for reporting and analysis.
The solution performs the following functions:
Imports source CSV files into raw staging tables.
Validates and transforms source data into typed staging tables.
Loads reporting tables using Slowly Changing Dimension Type 2 (SCD2) processing.
Preserves historical changes to support auditing and trend analysis.
Solution Architecture
The solution consists of three logical layers:
Raw Layer
The raw layer stores data exactly as received from source CSV files.
Tables:
stage.Job_Raw
stage.JobActivity_Raw
stage.JobActivityMaterial_Raw
This layer provides:
Data lineage
Source traceability
Recovery from load failures
Historical source file retention
Staging Layer
The staging layer converts raw text values into business-ready data types and applies validation rules.
Tables:
CrpRpt.Stage_Job
CrpRpt.Stage_JobActivity
CrpRpt.Stage_JobActivityMaterial
This layer provides:
Data cleansing
Data type conversion
Standardisation of source values
Reporting Layer
The reporting layer stores validated business data using Slowly Changing Dimension Type 2 (SCD2) methodology.
Tables:
CrpRpt.Job
CrpRpt.JobActivity
CrpRpt.JobActivityMaterial
This layer provides:
Full history tracking
Current and historical reporting
Auditability of source changes
Data Flow
CSV Files
↓
Raw Tables
↓
Staging Procedures
↓
Stage Tables
↓
SCD2 Load Procedures
↓
Reporting Tables
Key Business Entities
Job
Represents agricultural work performed against a field and crop combination.
Business Key:
Company
CropYear
FieldId
CropId
Job Activity
Represents individual activities performed as part of a job.
Business Key:
ActivityId
Job Activity Material
Represents materials used during an activity.
Business Key:
ActivityId
ActivitySeq
Historical Tracking
The reporting tables use SCD2 methodology.
When a source record changes:
The existing record is marked as inactive.
A new current version is inserted.
Historical records remain available for reporting.
This allows reporting to be performed against both current and historical data states.