Architecture Overview
Processing Pipeline
Log Sources → OTel Collector → Drain3 (template extraction) → Feature Engineering → Anomaly Detection → Alerting
1. Ingestion
The OpenTelemetry Collector serves as the ingestion gateway. It supports:
- CloudWatch, GCP Logging, Azure Monitor
- Syslog (UDP/TCP)
- File tailing
- Kafka
- OTLP (gRPC/HTTP)
2. Parsing
Drain3 handles streaming log template extraction. It uses a fixed-depth parse tree to reduce millions of raw log lines to thousands of structured patterns in real time.
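The fixed-depth parse tree idea can be sketched with the standard library alone. This is a deliberately simplified toy, not Drain3's implementation or API: the class name `TinyTemplateMiner`, the two tree levels (token count, then first token), and the 0.5 similarity threshold are assumptions made for illustration.

```python
from collections import defaultdict

WILDCARD = "<*>"


class TinyTemplateMiner:
    """Toy fixed-depth template miner (a simplification, not Drain3 itself)."""

    def __init__(self, sim_threshold=0.5):
        self.sim_threshold = sim_threshold
        # Tree level 1: token count. Level 2: first token. Leaves: template lists.
        self.tree = defaultdict(lambda: defaultdict(list))

    @staticmethod
    def _similarity(template, tokens):
        # Fraction of positions where the template token equals the log token.
        same = sum(1 for t, tok in zip(template, tokens) if t == tok)
        return same / len(tokens)

    def add(self, line):
        """Consume one raw log line and return the template it maps to."""
        tokens = line.split()
        if not tokens:
            return ""
        # A first token containing digits is probably a parameter, so it is
        # routed to a shared wildcard branch instead of its own branch.
        key = WILDCARD if any(c.isdigit() for c in tokens[0]) else tokens[0]
        leaf = self.tree[len(tokens)][key]

        best, best_sim = None, 0.0
        for template in leaf:
            sim = self._similarity(template, tokens)
            if sim > best_sim:
                best, best_sim = template, sim

        if best is None or best_sim < self.sim_threshold:
            # No close match: this line starts a new template.
            leaf.append(list(tokens))
            return " ".join(tokens)

        # Close match: keep equal tokens, replace differing ones with a wildcard.
        for i, tok in enumerate(tokens):
            if best[i] != tok:
                best[i] = WILDCARD
        return " ".join(best)

    def templates(self):
        return [
            " ".join(template)
            for by_first_token in self.tree.values()
            for leaf in by_first_token.values()
            for template in leaf
        ]
```

Because every lookup walks a fixed number of tree levels before comparing against a short leaf list, the cost per line stays roughly constant as the number of templates grows, which is what makes this style of parsing suitable for streaming.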
3. Detection
All algorithms below are online (streaming): each updates with every event and needs no batch retraining.
| Algorithm | Detects | Use Case |
|---|---|---|
| Half-Space Trees | Content anomalies | Unusual log patterns |
| Holt-Winters | Volume anomalies | Traffic spikes/drops |
| CUSUM | Change points | Regime shifts |
| Markov chains | Sequence anomalies | Unusual event ordering |
| DSPOT | Auto-thresholds | Self-tuning alert levels |
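To illustrate what "updates with every event" means in practice, here is a minimal sketch of a two-sided CUSUM change-point detector. It is not taken from the project's code: the class name `OnlineCusum`, the warm-up baseline, and the default `drift` and `threshold` values are assumptions chosen for the example.

```python
class OnlineCusum:
    """Two-sided CUSUM over a stream, using a baseline mean learned during warm-up."""

    def __init__(self, drift=0.5, threshold=5.0, warmup=10):
        self.drift = drift          # slack: deviations smaller than this are ignored
        self.threshold = threshold  # alarm when a cumulative sum exceeds this
        self.warmup = warmup        # number of events used to estimate the baseline
        self.n = 0
        self.mean = 0.0
        self.pos = 0.0  # accumulates upward deviations
        self.neg = 0.0  # accumulates downward deviations

    def update(self, x):
        """Consume one value; return True if a change point is flagged."""
        self.n += 1
        if self.n <= self.warmup:
            # Incremental mean update: constant memory, no stored history.
            self.mean += (x - self.mean) / self.n
            return False

        deviation = x - self.mean
        self.pos = max(0.0, self.pos + deviation - self.drift)
        self.neg = max(0.0, self.neg - deviation - self.drift)

        if self.pos > self.threshold or self.neg > self.threshold:
            # Regime shift detected: reset and re-learn the baseline.
            self.n, self.mean, self.pos, self.neg = 0, 0.0, 0.0, 0.0
            return True
        return False


detector = OnlineCusum()
stream = [10.0] * 20 + [20.0] * 5
flags = [detector.update(value) for value in stream]
first_alarm = flags.index(True)  # position of the first flagged value
```

The detector holds only a handful of floats, so its state is small enough to checkpoint after every event.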
4. Correlation
Correlation uses an entity-centric graph built on igraph (40-250x faster than NetworkX):
- Entity types: users, IPs, hosts, processes, files, domains
- Three strategies: entity-temporal window joins, risk accumulation, graph-structural
- Temporal watermarking for event ordering
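The risk accumulation strategy combined with a temporal window can be sketched as follows. This is an illustrative assumption about how such a strategy might work, not the project's implementation: `RiskAccumulator`, the 300-second window, and the threshold of 10 are hypothetical, and a plain dictionary stands in for the igraph-backed entity graph.

```python
from collections import defaultdict, deque


class RiskAccumulator:
    """Sum risk scores per entity over a sliding time window."""

    def __init__(self, window_seconds=300.0, alert_threshold=10.0):
        self.window = window_seconds
        self.alert_threshold = alert_threshold
        self.events = defaultdict(deque)  # entity -> deque of (timestamp, risk)
        self.totals = defaultdict(float)  # entity -> risk currently inside the window

    def add(self, entity, timestamp, risk):
        """Record one scored event; return True if the entity reaches the threshold."""
        queue = self.events[entity]
        queue.append((timestamp, risk))
        self.totals[entity] += risk

        # Evict events that have aged out of the window.
        # Assumes timestamps arrive in roughly increasing order per entity.
        while queue and queue[0][0] <= timestamp - self.window:
            _, old_risk = queue.popleft()
            self.totals[entity] -= old_risk

        return self.totals[entity] >= self.alert_threshold


acc = RiskAccumulator(window_seconds=60, alert_threshold=10)
acc.add("ip:10.0.0.5", timestamp=0, risk=4)   # total 4, below threshold
acc.add("ip:10.0.0.5", timestamp=30, risk=4)  # total 8, below threshold
alerted = acc.add("ip:10.0.0.5", timestamp=50, risk=4)  # total 12, threshold reached
```

Individually weak signals on the same entity add up within the window, while signals spread far apart in time decay out of the total.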
5. Security
- pySigma with 3,000+ SigmaHQ rules
- MITRE ATT&CK framework mapping
- Kill-chain tracking across entities
Data Model
The core SeerflowEvent struct unifies four log schema standards:
- OpenTelemetry LogRecord
- Elastic Common Schema (ECS)
- OCSF numeric taxonomy
- Sigma logsource categories
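One way to picture a single record carrying all four vocabularies is the dataclass below. The real SeerflowEvent layout is not shown in this document, so every field name here is a hypothetical stand-in chosen to mirror each standard's own terminology, and the class is named `SeerflowEventSketch` to make that explicit.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class SeerflowEventSketch:
    """Hypothetical unified event; not the actual SeerflowEvent definition."""

    # OpenTelemetry LogRecord-style core fields
    time_unix_nano: int
    severity_number: int
    body: str
    attributes: Dict[str, Any] = field(default_factory=dict)

    # Elastic Common Schema fields, keyed by dotted ECS names such as "source.ip"
    ecs: Dict[str, Any] = field(default_factory=dict)

    # OCSF numeric taxonomy
    ocsf_category_uid: Optional[int] = None
    ocsf_class_uid: Optional[int] = None

    # Sigma logsource selectors
    sigma_category: Optional[str] = None
    sigma_product: Optional[str] = None
    sigma_service: Optional[str] = None

    def sigma_logsource(self) -> Dict[str, str]:
        """Return only the logsource keys that are set, as a Sigma rule would match them."""
        candidates = {
            "category": self.sigma_category,
            "product": self.sigma_product,
            "service": self.sigma_service,
        }
        return {key: value for key, value in candidates.items() if value is not None}


event = SeerflowEventSketch(
    time_unix_nano=1_700_000_000_000_000_000,
    severity_number=17,
    body="Failed password for root from 10.0.0.5",
    ecs={"source.ip": "10.0.0.5", "user.name": "root"},
    ocsf_class_uid=3002,
    sigma_product="linux",
    sigma_service="sshd",
)
```

Keeping each standard's identifiers side by side means downstream stages (Sigma matching, OCSF-based grouping, ECS-style queries) can read the same event without a per-stage translation step.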
Storage
Storage uses protocol-based interfaces, so the backend can be switched with one config line:
| Protocol | Methods | Purpose |
|---|---|---|
| LogStore | write_events, query_events, search_text | Event persistence |
| AlertStore | write_alert, query_alerts, update_feedback | Alert management |
| ModelStore | save_state, load_state | ML model checkpoints |
| EntityStore | get_timeline, get_related | Entity exploration |
Backends: SQLite (zero-config default) and PostgreSQL (production scale).
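A minimal sketch of how a protocol-based store with a config-selected backend might look, using `ModelStore` as the example. Only the method names `save_state` and `load_state` come from the table above; the signatures, the `backend` config key, the `make_model_store` factory, and the in-memory backend (standing in for a second backend such as PostgreSQL) are assumptions.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class ModelStore(Protocol):
    """Structural interface: any class with these two methods qualifies."""

    def save_state(self, name: str, state: bytes) -> None: ...

    def load_state(self, name: str) -> Optional[bytes]: ...


class SqliteModelStore:
    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS model_state (name TEXT PRIMARY KEY, state BLOB)"
        )

    def save_state(self, name: str, state: bytes) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO model_state (name, state) VALUES (?, ?)",
                (name, state),
            )

    def load_state(self, name: str) -> Optional[bytes]:
        row = self.conn.execute(
            "SELECT state FROM model_state WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


class InMemoryModelStore:
    """Hypothetical second backend, used here only to show the switch."""

    def __init__(self) -> None:
        self._states: Dict[str, bytes] = {}

    def save_state(self, name: str, state: bytes) -> None:
        self._states[name] = state

    def load_state(self, name: str) -> Optional[bytes]:
        return self._states.get(name)


def make_model_store(config: Dict[str, str]) -> ModelStore:
    """Pick a backend from a single (hypothetical) config key."""
    backend = config.get("backend", "sqlite")
    if backend == "sqlite":
        return SqliteModelStore(config.get("path", ":memory:"))
    if backend == "memory":
        return InMemoryModelStore()
    raise ValueError(f"unknown storage backend: {backend!r}")


store = make_model_store({"backend": "sqlite"})
store.save_state("cusum-volume", b"\x00\x01")
restored = store.load_state("cusum-volume")
```

Because `Protocol` relies on structural typing, the backends do not inherit from `ModelStore`; callers depend only on the two methods, so changing the `backend` value is the only edit needed to swap implementations.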