
Architecture Overview

Pipeline diagram (log sources feed stage L1; events flow up to L5):

  • Log sources: syslog · OTLP · file · webhooks
  • L1 · Ingest: receivers + bounded async queue (10K events, backpressure)
  • L2 · Parse: Drain3 (template extraction), entity extraction (UUID5 resolution)
  • L3 · Detect: ML ensemble (HST · Holt-Winters · CUSUM · Markov), Sigma engine (63 rules, MITRE tags)
  • L4 · Correlate: sliding window (+ watermarks), entity graph (igraph / FalkorDB / AGE), risk register (+ kill-chain)
  • L5 · Surface: storage (SQLite / PostgreSQL), alert router (webhooks · PagerDuty · email · SMS · Telegram · WhatsApp · OTLP), dashboard (React + REST + WebSocket)

All five stages run in a single Python process. No JVM, no broker, no cluster.

Receivers run concurrently and all feed a single bounded async queue (default capacity 10K events, with built-in backpressure):

  • Syslog UDP/TCP (RFC 3164 + RFC 5424)
  • OTLP gRPC and OTLP HTTP (protobuf + JSON)
  • File tailing (glob, rotation, checkpointed offsets, debounce)
  • Webhook HTTP (JSON / form, per-endpoint auth token, field mapping)

Each queued event is then parsed:

  • Drain3 for streaming log template extraction (~120K msgs/sec)
  • Regex-based entity extraction (IPs, users, hosts, files, domains, processes)
  • UUID5 entity resolution: the same entity yields the same ID across all sources, no matter which receiver saw it first
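The bounded queue and UUID5 resolution described above can be sketched with the standard library alone. This is a minimal illustration, not Seerflow's actual code: the namespace UUID, the `entity_id` helper, the normalisation rule, and the receiver/consumer functions are all assumptions made for the example.

```python
import asyncio
import uuid

# Hypothetical namespace; the project's real namespace UUID is not documented here.
ENTITY_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "entities.example.invalid")


def entity_id(kind: str, value: str) -> uuid.UUID:
    """Deterministic ID: the same (kind, value) always maps to the same UUID."""
    # Lower-casing and stripping is an assumed normalisation rule.
    return uuid.uuid5(ENTITY_NAMESPACE, f"{kind}:{value.strip().lower()}")


async def receiver(name: str, queue: asyncio.Queue, entities: list) -> None:
    for kind, value in entities:
        # put() suspends while the queue is full; that wait is the backpressure.
        await queue.put((name, kind, value))


async def run_pipeline(maxsize: int = 10_000) -> dict:
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)
    seen: dict = {}  # receiver name -> set of entity IDs it produced

    async def consumer() -> None:
        while True:
            source, kind, value = await queue.get()
            seen.setdefault(source, set()).add(entity_id(kind, value))
            queue.task_done()

    worker = asyncio.create_task(consumer())
    await asyncio.gather(
        receiver("syslog", queue, [("ip", "10.0.0.5"), ("host", "Web-01")]),
        receiver("otlp", queue, [("host", "web-01 ")]),
    )
    await queue.join()  # wait until every queued event has been processed
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return seen
```

Calling `asyncio.run(run_pipeline(maxsize=1))` forces the receivers to wait on nearly every `put()`, which makes the backpressure path easy to observe; the host seen by both receivers resolves to one ID.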

All algorithms are online/streaming — they update with every event, no batch retraining:

| Algorithm | Detects | Notes |
| --- | --- | --- |
| Half-Space Trees | Content anomalies | River ensemble, constant memory |
| Holt-Winters | Volume anomalies | Trend + seasonal decomposition |
| CUSUM | Change points | Bidirectional cumulative sum |
| Markov chains | Sequence anomalies | Per-entity transition matrices |
| biDSPOT | Auto-thresholds | Bidirectional EVT (scipy GPD) |
| pySigma | Known threat patterns | 63 bundled rules, logsource-indexed dispatch |
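As an illustration of what "online" means for one of these rows, here is a minimal two-sided CUSUM sketch. It is the textbook formulation, not Seerflow's implementation; the class name and the `target`, `slack`, and `threshold` parameters are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoSidedCusum:
    target: float     # expected mean of the monitored signal
    slack: float      # allowance k: drift smaller than this is ignored
    threshold: float  # decision limit h
    pos: float = 0.0  # cumulative evidence of an upward shift
    neg: float = 0.0  # cumulative evidence of a downward shift

    def update(self, x: float) -> Optional[str]:
        """Consume one observation; return 'up' or 'down' on a change point."""
        self.pos = max(0.0, self.pos + (x - self.target - self.slack))
        self.neg = max(0.0, self.neg + (self.target - x - self.slack))
        if self.pos > self.threshold:
            self.pos = self.neg = 0.0  # restart after signalling
            return "up"
        if self.neg > self.threshold:
            self.pos = self.neg = 0.0
            return "down"
        return None
```

Each call does constant work and keeps only two running sums, which is why no batch retraining is needed.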

The DetectionEnsemble blends the four ML scores with z-normalization and per-source weights (weights_content, weights_volume, weights_sequence, weights_pattern).
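One plausible reading of that blending step is sketched below: each score stream is z-normalised against its own running statistics, then combined as a weighted average. The `RunningZ` class, the `blend` function, and the weight values are hypothetical; only the four weight names (content, volume, sequence, pattern) come from the text, and which detector feeds each one is not stated here.

```python
import math


class RunningZ:
    """Welford running mean/variance for z-normalising one score stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def score(self, x: float) -> float:
        # Score against the history seen so far, then fold x into it,
        # so an outlier does not dampen its own z-score.
        z = 0.0
        if self.n >= 2:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0:
                z = (x - self.mean) / std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return z


def blend(raw: dict, normalisers: dict, weights: dict) -> float:
    """Weighted average of z-normalised scores; keys must match across dicts."""
    total = sum(weights[k] for k in raw)
    return sum(weights[k] * normalisers[k].score(v) for k, v in raw.items()) / total


# Illustrative weights only; real defaults are not given in this page.
WEIGHTS = {"content": 0.4, "volume": 0.2, "sequence": 0.2, "pattern": 0.2}
```

Normalising first keeps a detector with a naturally wide score range from dominating the blend regardless of its weight.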

The correlation engine fuses signals across entities and time:

  • Sliding window per entity with watermark-based late-arrival tolerance
  • Risk accumulation — per-entity risk register with exponential decay; catches slow-burn attacks
  • Graph-structural scoring — community-crossing edges, betweenness centrality, fan-out outliers
  • Kill-chain detector — MITRE ATT&CK tactic progression (default: ≥3 distinct tactics within 24h)
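Two of the mechanisms above, decaying risk accumulation and tactic progression, can be sketched as follows. The class names, the six-hour half-life, and the in-order-arrival simplification are assumptions for illustration; only the kill-chain defaults (at least 3 distinct tactics within 24h) come from the text.

```python
import math
from collections import deque


class RiskRegister:
    """Per-entity risk that decays exponentially between contributions."""

    def __init__(self, half_life_s: float = 6 * 3600) -> None:  # assumed half-life
        self.decay = math.log(2) / half_life_s
        self.state: dict = {}  # entity -> (risk, timestamp of last update)

    def add(self, entity: str, score: float, ts: float) -> float:
        risk, last = self.state.get(entity, (0.0, ts))
        # Late events (ts < last) are added without decaying the stored risk.
        risk = risk * math.exp(-self.decay * max(0.0, ts - last)) + score
        self.state[entity] = (risk, max(ts, last))
        return risk


class KillChainDetector:
    """Flags an entity once enough distinct tactics fall inside the window."""

    def __init__(self, min_tactics: int = 3, window_s: float = 24 * 3600) -> None:
        self.min_tactics = min_tactics
        self.window_s = window_s
        self.hits: dict = {}  # entity -> deque of (ts, tactic)

    def observe(self, entity: str, tactic: str, ts: float) -> bool:
        # Simplification: assumes events for one entity arrive in time order.
        q = self.hits.setdefault(entity, deque())
        q.append((ts, tactic))
        while q and q[0][0] < ts - self.window_s:
            q.popleft()
        return len({t for _, t in q}) >= self.min_tactics
```

Because old risk fades rather than resets, many small scores spread over days can still add up, which is the slow-burn case the register is meant to catch.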

Three pluggable backends, switchable via storage.graph_backend:

| Backend | When to use | Extras install |
| --- | --- | --- |
| igraph (default) | Single-process, in-memory, fastest | (none) |
| falkordb | External, shared graph, query via Cypher | uv sync --extra graph-falkordb |
| postgres_age | Co-locate graph with relational store | uv sync --extra graph-postgres-age |

Migrate between backends with seerflow graph migrate --from <a> --to <b>.

  • UEBA — per-user / per-host behavioural baselines with a configurable warm-up (default 7 days / 50 events) before scoring.
  • Threat intelligence — pull IoCs from TAXII feeds and match them against ingested events with a Bloom-filter matcher tuned for low false-positive rate.
  • pySigma with 63 bundled SigmaHQ rules; add custom directories via detection.sigma_rules_dirs.
  • MITRE ATT&CK tagging on every rule (tactic + technique).
  • Logsource-indexed dispatch keeps throughput high even with thousands of rules.
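The idea behind logsource-indexed dispatch can be sketched as a dictionary keyed by the Sigma logsource triple, so an event only meets rules whose logsource could apply to it. This is a generic illustration and not pySigma's API; `LogSource`, `RuleIndex`, and the rule IDs are hypothetical names.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class LogSource(NamedTuple):
    category: Optional[str] = None
    product: Optional[str] = None
    service: Optional[str] = None


class RuleIndex:
    def __init__(self) -> None:
        self._by_source = defaultdict(list)  # LogSource -> list of rule IDs

    def add(self, rule_id: str, source: LogSource) -> None:
        self._by_source[source].append(rule_id)

    def candidates(self, event: LogSource) -> list:
        """Rules whose logsource matches the event's logsource."""
        # A rule field left as None acts as a wildcard, so probe every masked
        # variant of the event's triple: at most 8 lookups, whatever the
        # total number of rules.
        keys = {
            LogSource(
                event.category if mask & 1 else None,
                event.product if mask & 2 else None,
                event.service if mask & 4 else None,
            )
            for mask in range(8)
        }
        found = set()
        for key in keys:
            found.update(self._by_source.get(key, ()))
        return sorted(found)
```

Only the returned candidates need their detection conditions evaluated, so lookup cost stays flat as the rule set grows.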

Outbound channels:

  • Generic webhooks (raw JSON)
  • Slack and Microsoft Teams (formatted)
  • PagerDuty Events API v2
  • Dedup window (default 15 min) on the alert dedup key
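A dedup window of this kind can be sketched in a few lines. The `AlertDeduper` class is hypothetical, and measuring the window from the last alert that was actually sent (rather than from the last suppressed one) is an assumption; the page does not say which variant is used.

```python
class AlertDeduper:
    """Suppress repeat alerts that share a dedup key inside the window."""

    def __init__(self, window_s: float = 15 * 60) -> None:
        self.window_s = window_s
        self._last_sent: dict = {}  # dedup key -> time the last alert went out

    def should_send(self, dedup_key: str, now: float) -> bool:
        last = self._last_sent.get(dedup_key)
        if last is not None and now - last < self.window_s:
            return False  # duplicate inside the window: drop it
        self._last_sent[dedup_key] = now
        return True
```

With the default window, a second alert for the same key 10 minutes after the first is dropped, while one arriving at the 15-minute mark is sent again.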

Bundled with the wheel — no separate uvicorn process needed:

  • React SPA on http://127.0.0.1:8080/
  • REST API on http://127.0.0.1:8080/api/v1/
  • WebSocket live stream on ws://127.0.0.1:8080/api/v1/ws

The CLI’s seerflow start boots receivers, the detection engines, the correlation engine, and the FastAPI dashboard in the same process.

The core SeerflowEvent is a frozen msgspec.Struct that unifies four log schema standards:

  • OpenTelemetry LogRecord (ns timestamps, trace context, severity 1-24)
  • Elastic Common Schema (event.kind / category / type / outcome)
  • OCSF (numeric taxonomy: category_uid / class_uid / type_uid)
  • Sigma (logsource.category / product / service)

Key fields: dual timestamps (event vs observed), Drain3 template metadata, entity references, MITRE ATT&CK mapping, risk and anomaly scores.
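To make the shape of such an immutable event concrete, here is a sketch that uses a standard-library frozen dataclass as a stand-in for msgspec.Struct. Every field name below is an assumption based on the key fields listed above, not the real SeerflowEvent definition.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EventSketch:
    timestamp_ns: int                 # when the event happened (source clock)
    observed_ns: int                  # when the receiver saw it
    body: str
    severity_number: int = 9          # OpenTelemetry range 1-24; 9 is INFO
    template_id: Optional[int] = None  # Drain3 template metadata (assumed shape)
    entity_ids: tuple = ()            # references to resolved entities
    mitre_tactics: tuple = ()
    anomaly_score: float = 0.0
    risk_score: float = 0.0

    @property
    def ingest_lag_ns(self) -> int:
        """Delay between the event occurring and being observed."""
        return self.observed_ns - self.timestamp_ns


event = EventSketch(timestamp_ns=1_000, observed_ns=1_500, body="login failed")
# Frozen instances cannot be mutated; later stages derive a new copy instead.
scored = replace(event, anomaly_score=0.9)
```

Keeping both timestamps lets later stages reason about late arrivals without overwriting the time the source reported.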

Storage is defined by Protocol-based interfaces, so the backend can be switched with one config line. See the Storage page for SQLite vs PostgreSQL details and the schema layout.

| Protocol | Methods | Purpose |
| --- | --- | --- |
| LogStore | write_events, query_events, search_text | Event persistence + FTS |
| AlertStore | write_alert, query_alerts, update_feedback | Alert management + analyst feedback |
| ModelStore | save_state, load_state | ML model checkpoints across restarts |
| EntityStore | get_timeline, get_related | Entity exploration |

Backends: SQLite (zero-config default, WAL + FTS5) and PostgreSQL (asyncpg pool, production scale).
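The Protocol approach can be illustrated with ModelStore, the smallest of the four interfaces. The method names come from the table above; the signatures, the synchronous style, the table schema, and both implementations are assumptions made for this sketch.

```python
import sqlite3
from typing import Optional, Protocol


class ModelStore(Protocol):
    # Assumed signatures: only the method names are documented.
    def save_state(self, name: str, state: bytes) -> None: ...
    def load_state(self, name: str) -> Optional[bytes]: ...


class MemoryModelStore:
    def __init__(self) -> None:
        self._states: dict = {}

    def save_state(self, name: str, state: bytes) -> None:
        self._states[name] = state

    def load_state(self, name: str) -> Optional[bytes]:
        return self._states.get(name)


class SqliteModelStore:
    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS model_state "
            "(name TEXT PRIMARY KEY, state BLOB NOT NULL)"
        )

    def save_state(self, name: str, state: bytes) -> None:
        with self._db:  # commit on success, roll back on error
            self._db.execute(
                "INSERT OR REPLACE INTO model_state (name, state) VALUES (?, ?)",
                (name, state),
            )

    def load_state(self, name: str) -> Optional[bytes]:
        row = self._db.execute(
            "SELECT state FROM model_state WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


def restore_or_default(store: ModelStore, name: str, default: bytes) -> bytes:
    """Caller code depends only on the protocol, never on a concrete backend."""
    state = store.load_state(name)
    return default if state is None else state
```

Because `restore_or_default` is typed against the protocol, either class can be passed in without inheritance, which is what makes a one-line backend switch possible.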