Schema Drift Detection

Schema drift occurs when vendors change their log formats without warning: field names shift, structures are reorganized, or data types change. Because normalization pipelines keep running as if nothing happened, these silent changes surface only later as data quality issues downstream.

The Challenge

Organizations ingesting logs from multiple vendors face a critical question: what happens when a vendor changes their field names or log structure? Without detection mechanisms, pipelines continue processing malformed data, leading to:

  • Missing fields in normalized output
  • Type mismatches causing ingestion failures
  • Extra fields consuming storage without value
  • Compliance gaps from incomplete data

Detection Approaches

DataStream supports two validation strategies that can be combined based on operational requirements:

Schema-on-Write

Strict enforcement at ingestion time keeps stored data clean. Events failing validation are flagged immediately, enabling real-time alerting and fallback processing.

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "both"
  - reroute:
      if: "schema_check.is_valid == false"
      destination: "quarantine"

Schema-on-Read

Flexible adaptation allows analytics to continue despite minor deviations. Validation results are stored with the event for later analysis without blocking ingestion.

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "missing"
      validate_recommended: false
      validate_optional: false

Validation Levels

The check_schema processor validates events against official schema definitions, checking for:

Level              | Behavior                                        | Impact on Validity
Required fields    | Always checked when check_mode includes missing | Missing required fields invalidate the event
Recommended fields | Checked only when validate_recommended: true    | Configurable impact on validity
Optional fields    | Checked only when validate_optional: true       | Configurable impact on validity
Extra fields       | Checked when check_mode includes extra          | Never affects validity (informational only)
Type mismatches    | Always checked for present fields               | Follows field requirement level
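
The rules in this table can be sketched in a few lines of Python. This is only an illustration of the described behavior, not DataStream's implementation: FieldSpec, is_valid, and the strict_levels parameter are hypothetical names, Python types stand in for schema types such as INT32, and the requirement levels assigned to the example fields are illustrative rather than taken from the ASIM definition.

```python
from dataclasses import dataclass

# Hypothetical requirement levels, mirroring the table above.
REQUIRED, RECOMMENDED, OPTIONAL = "required", "recommended", "optional"


@dataclass(frozen=True)
class FieldSpec:
    level: str      # REQUIRED, RECOMMENDED, or OPTIONAL
    py_type: type   # Python type standing in for a schema type such as INT32


def is_valid(event, schema, strict_levels=frozenset({REQUIRED})):
    """Return False if a field at a strict level is missing or mistyped.

    Fields that are present in the event but absent from the schema
    (extra fields) are ignored, because they are informational only.
    """
    for name, spec in schema.items():
        if spec.level not in strict_levels:
            continue  # this level is configured not to affect validity
        if name not in event:
            return False  # missing field at a strict level
        if not isinstance(event[name], spec.py_type):
            return False  # type mismatch follows the field's level
    return True


# Illustrative schema; the levels here are assumptions for the example.
schema = {
    "EventVendor": FieldSpec(REQUIRED, str),
    "EventCount": FieldSpec(REQUIRED, int),
    "DvcAction": FieldSpec(RECOMMENDED, str),
}

print(is_valid({"EventVendor": "Acme", "EventCount": 3}, schema))    # True
print(is_valid({"EventVendor": "Acme", "EventCount": "3"}, schema))  # False
```

Passing a larger strict_levels set models the case where recommended or optional fields are configured to affect validity.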

Check Modes

The check_mode parameter controls what the processor validates:

Mode    | Detects Missing Fields | Detects Extra Fields
missing | Yes                    | No
extra   | No                     | Yes
both    | Yes                    | Yes
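
As a rough illustration of the three modes, the following Python sketch computes missing and extra fields with set differences. The function name detect_drift and its return shape are hypothetical and are not part of DataStream; the sketch models field presence only, not types or requirement levels.

```python
def detect_drift(event_fields, schema_fields, check_mode="both"):
    """Compare field names and report drift according to check_mode.

    Hypothetical helper: "missing" reports schema fields absent from the
    event, "extra" reports event fields absent from the schema, and
    "both" reports the two lists together.
    """
    if check_mode not in ("missing", "extra", "both"):
        raise ValueError(f"unsupported check_mode: {check_mode!r}")

    event_fields = set(event_fields)
    schema_fields = set(schema_fields)
    findings = {}
    if check_mode in ("missing", "both"):
        findings["missing"] = sorted(schema_fields - event_fields)
    if check_mode in ("extra", "both"):
        findings["extra"] = sorted(event_fields - schema_fields)
    return findings


event = {"EventVendor", "CustomField1"}
schema = {"EventVendor", "EventSchema"}

print(detect_drift(event, schema, "missing"))  # {'missing': ['EventSchema']}
print(detect_drift(event, schema, "extra"))    # {'extra': ['CustomField1']}
```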

Validation Results

The processor writes a structured result to the specified target_field:

{
  "is_valid": false,
  "missing_required_fields": ["EventSchema", "EventVendor"],
  "missing_recommended_fields": ["DvcAction", "EventSeverity"],
  "missing_optional_fields": ["SrcNatIpAddr"],
  "extra_fields": ["CustomField1", "VendorSpecific"],
  "type_mismatches": [
    {
      "field": "EventCount",
      "expected_type": "INT32",
      "actual_type": "STRING"
    }
  ]
}
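
A downstream consumer can read this result to decide which kinds of drift an event shows. The Python sketch below is one possible way to do that, assuming the result has already been parsed from JSON; the function name summarize_drift and the category labels are hypothetical, while the keys are the ones shown in the result above.

```python
import json


def summarize_drift(result):
    """Return the drift categories present in a check_schema result dict."""
    categories = []
    missing_keys = (
        "missing_required_fields",
        "missing_recommended_fields",
        "missing_optional_fields",
    )
    # Any non-empty "missing_*" list counts as missing-field drift.
    if any(result.get(key) for key in missing_keys):
        categories.append("missing_fields")
    if result.get("extra_fields"):
        categories.append("extra_fields")
    if result.get("type_mismatches"):
        categories.append("type_mismatch")
    return categories


sample = json.loads("""
{
  "is_valid": false,
  "missing_required_fields": ["EventSchema", "EventVendor"],
  "missing_recommended_fields": [],
  "missing_optional_fields": [],
  "extra_fields": [],
  "type_mismatches": [
    {"field": "EventCount", "expected_type": "INT32", "actual_type": "STRING"}
  ]
}
""")

print(summarize_drift(sample))  # ['missing_fields', 'type_mismatch']
```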

Conditional Processing Chains

The check_schema processor can run nested processor chains depending on what it finds. Each chain is attached under its own key (on_missing, on_extra, or on_type_mismatch):

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "both"
      on_missing:
        - set:
            field: "drift_type"
            value: "missing_fields"
      on_extra:
        - set:
            field: "drift_type"
            value: "extra_fields"
      on_type_mismatch:
        - set:
            field: "drift_type"
            value: "type_mismatch"

Automated Response

Detected schema drift can trigger automated responses such as alerting, rerouting to fallback processing, and field enrichment:

Alerting

Send immediate notifications when drift is detected using notification processors like slack or pagerduty:

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "both"
      on_missing:
        - slack:
            title: "Schema Drift Detected"
            message: "Missing fields in {{ .EventVendor }} logs"
            color: "warning"
        - pagerduty:
            summary: "Schema drift: {{ .schema_check.missing_required_fields }}"
            severity: "warning"

Fallback Normalization

Route events with drift to alternative processing:

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "missing"
  - reroute:
      if: "schema_check.is_valid == false"
      destination: "fallback_normalizer"
  - reroute:
      if: "schema_check.is_valid == true"
      destination: "sentinel"

Field Enrichment

Automatically populate missing fields with defaults:

processors:
  - check_schema:
      schema: "ASimNetworkSessionLogs"
      target_field: "schema_check"
      check_mode: "missing"
      on_missing:
        - set:
            if: "EventVendor == null"
            field: "EventVendor"
            value: "Unknown"
        - set:
            if: "EventProduct == null"
            field: "EventProduct"
            value: "Unknown"

Supported Schemas

The processor supports validation against:

  • ASIM schemas: Microsoft Sentinel's Advanced Security Information Model tables (ASimNetworkSessionLogs, ASimAuthenticationEventLogs, etc.)
  • OCSF schemas: Open Cybersecurity Schema Framework categories

Schema names are specified in the schema field and can use template syntax for dynamic selection:

processors:
  - check_schema:
      schema: "{{ .target_table }}"
      target_field: "schema_check"
      check_mode: "both"
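
To make the idea of dynamic selection concrete, here is a simplified Python sketch of how a placeholder such as {{ .target_table }} might be resolved against an event. It is an assumption-laden illustration, not DataStream's template engine: the resolve function is hypothetical, it supports only dotted field lookups, and it assumes placeholders refer to fields of the event being processed.

```python
import re

# Matches placeholders of the form {{ .field }} or {{ .outer.inner }}.
_PLACEHOLDER = re.compile(r"\{\{\s*\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve(template, event):
    """Replace each placeholder with the value found at that path in event.

    Hypothetical helper; raises KeyError if a referenced field is absent.
    """
    def lookup(match):
        value = event
        for key in match.group(1).split("."):
            value = value[key]  # walk one level down per path segment
        return str(value)

    return _PLACEHOLDER.sub(lookup, template)


event = {"target_table": "ASimNetworkSessionLogs", "EventVendor": "Acme"}

print(resolve("{{ .target_table }}", event))
# ASimNetworkSessionLogs
print(resolve("Missing fields in {{ .EventVendor }} logs", event))
# Missing fields in Acme logs
```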

Integration with Multi-Tier Pipelines

Schema drift detection integrates with staged routing to validate data at each normalization tier. See Multi-Tier Pipelines for patterns combining schema validation with progressive normalization.