File

Pull

Synopsis

Director polls glob patterns on its own filesystem (or any mounted path) at a configured interval and forwards matched log lines through an optional pipeline. No Agent is required.

Schema

- id: <numeric>
  name: <string>
  description: <string>
  type: file
  tags: <string[]>
  pipelines: <pipeline[]>
  status: <boolean>
  properties:
    path: <string>
    pipeline_name: <string>
    poll_interval: <numeric>
    file_log_concurrency: <numeric>
    start_date: <numeric>
    ignore_cache: <boolean>
    ignore_old_date: <boolean>
    ignore_retention: <boolean>
    ignore_time: <boolean>
    date_format: <string>
    line_parser: <string|map>
    encoding: <string>
    filter_mode: <string>
    filter_rules: <map[]|string[]>

Configuration

Device

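| Field | Required | Default | Description |
|---|---|---|---|
| id | Y | - | Unique numeric identifier |
| name | Y | - | Device name |
| description | N | - | Optional description |
| type | Y | - | Must be file |
| tags | N | - | Optional tags |
| status | N | true | Enable/disable the device |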

File Source

| Field | Required | Default | Description |
|---|---|---|---|
| path | Y | - | Comma-separated glob pattern(s) to scan. Each entry is treated as an independent glob; whitespace around commas is trimmed and empty segments are dropped. Supports ** for recursive directory matches. |
| pipeline_name | N | - | Name of the pipeline that pre-processes matched lines. Empty string passes lines through unprocessed. |

Polling

| Field | Required | Default | Description |
|---|---|---|---|
| poll_interval | N | 60 | Polling cadence in seconds. Must be greater than 0. Changing this value restarts the collector. |
| file_log_concurrency | N | 1 | Maximum number of files read in parallel per poll tick. Higher values increase throughput at the cost of memory. Changing this value restarts the collector. |
| start_date | N | 300 | Lookback window in seconds applied against file modification time. 0 falls back to a 1-second window. -1 (or any negative value) disables time-based filtering entirely. |

Reader Options

| Field | Required | Default | Description |
|---|---|---|---|
| ignore_cache | N | false | Skip the persisted file-position cache and re-read from the beginning of each file. |
| ignore_old_date | N | false | Skip the reader's old-date filter. |
| ignore_retention | N | false | Skip retention-based filtering. |
| ignore_time | N | false | Skip per-line time filtering. |
| date_format | N | - | Custom log timestamp format. Uses Go's reference time layout (2006-01-02T15:04:05Z07:00), not strftime. |

All reader options are hot-reloaded on the next poll tick without restarting the collector.
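Because date_format uses Go's reference-time layout and not strftime, it can help to see how the two notations line up. The following is a minimal Python sketch, not part of the Director, that translates a subset of Go layout tokens into strftime directives. The function name go_layout_to_strftime and the token table are illustrative assumptions; fractional seconds and several other Go tokens are deliberately not covered, and mapping Z07:00 to %z is only appropriate for parsing on Python 3.7 or later.

```python
import re
from datetime import datetime

# Subset of Go reference-time tokens and their closest strftime directives.
# Hypothetical helper table for illustration; not taken from the Director.
GO_TO_STRFTIME = {
    "Z07:00": "%z",  # parsing only: Python's %z accepts "Z" and "+01:00"
    "-0700": "%z",
    "2006": "%Y",
    "Jan": "%b",
    "Mon": "%a",
    "MST": "%Z",
    "01": "%m",
    "02": "%d",
    "15": "%H",
    "04": "%M",
    "05": "%S",
}

# Longest tokens first so "2006" is never split into shorter tokens.
_TOKEN = re.compile(
    "|".join(re.escape(t) for t in sorted(GO_TO_STRFTIME, key=len, reverse=True))
)


def go_layout_to_strftime(layout: str) -> str:
    """Translate the supported Go layout tokens in a single left-to-right pass."""
    return _TOKEN.sub(lambda m: GO_TO_STRFTIME[m.group(0)], layout)


if __name__ == "__main__":
    fmt = go_layout_to_strftime("2006-01-02T15:04:05")
    print(fmt)
    print(datetime.strptime("2024-03-05T10:20:30", fmt))
```

The value written into date_format must still be the Go layout; the translation is only a reading aid for people used to strftime.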

Line Parser

Controls how individual lines are grouped into log entries.

| Field | Required | Default | Description |
|---|---|---|---|
| line_parser | N | - | Line parser definition. Accepts either a map (preferred) or a bare string shorthand. |
| line_parser.type | N* | - | Parser mode: regex, newline (alias new_line), string, or prefix. Numeric aliases: 1 = regex, 2 = newline, 3 = string. |
| line_parser.regex | N* | - | Regex pattern that detects the start of a new log entry. Alias: value. |
| line_parser.date_based | N | false | Use date-based multiline merging. |
| line_parser.has_space | N | false | Treat leading whitespace as a line-continuation marker. |

* Required when using the map form with type: regex.

A bare string value for line_parser is treated as a regex pattern equivalent to type: regex with that pattern.
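To illustrate what "a regex pattern that detects the start of a new log entry" means in practice, here is a minimal Python sketch of start-pattern grouping. It is not the Director's parser: the function name group_entries is hypothetical, and the handling of lines that appear before the first match (they open their own entry) is an assumption.

```python
import re
from typing import Iterable, List


def group_entries(lines: Iterable[str], start_pattern: str) -> List[str]:
    """Merge physical lines into entries; a regex match opens a new entry."""
    start = re.compile(start_pattern)
    entries: List[str] = []
    for line in lines:
        if start.search(line) or not entries:
            # A matching line (or the very first line) begins a new entry.
            entries.append(line)
        else:
            # Any other line is treated as a continuation of the current entry.
            entries[-1] += "\n" + line
    return entries


if __name__ == "__main__":
    sample = [
        "2024-01-01 12:00:00 ERROR request failed",
        "java.lang.IllegalStateException: boom",
        "    at com.example.Handler.run(Handler.java:42)",
        "2024-01-01 12:00:01 INFO recovered",
    ]
    for entry in group_entries(sample, r"^\d{4}-\d{2}-\d{2}"):
        print(repr(entry))
```

With the sample input, the stack trace lines are folded into the ERROR entry and the INFO line starts a second entry.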

Encoding

| Field | Required | Default | Description |
|---|---|---|---|
| encoding | N | - | Character encoding of the source files. Accepts an alias (case-insensitive; -, _, spaces, and dots are stripped) or a numeric decoder ID. |

Supported aliases:

| Alias | Encoding |
|---|---|
| utf8 | UTF-8 |
| utf8bom | UTF-8 with BOM |
| utf16be | UTF-16 Big Endian |
| utf16le | UTF-16 Little Endian |
| utf16bebom | UTF-16 BE with BOM |
| utf16lebom | UTF-16 LE with BOM |
| gbk | GBK (Simplified Chinese) |
| latin1, iso88591 | ISO 8859-1 / Latin-1 |
| windows1250, cp1250 | Windows-1250 (Central European) |
| windows1251, cp1251 | Windows-1251 (Cyrillic) |
| windows1252, cp1252 | Windows-1252 (Western European) |
| windows1256, cp1256 | Windows-1256 (Arabic) |
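The alias matching rule (case-insensitive, with -, _, spaces, and dots stripped) can be sketched in a few lines of Python. This is an illustration of the documented rule only: the names normalize_encoding_alias and is_known_alias are hypothetical, numeric decoder IDs are not handled, and the alias set is copied from the table above.

```python
# Aliases listed in the table above.
KNOWN_ALIASES = {
    "utf8", "utf8bom", "utf16be", "utf16le", "utf16bebom", "utf16lebom",
    "gbk", "latin1", "iso88591",
    "windows1250", "cp1250", "windows1251", "cp1251",
    "windows1252", "cp1252", "windows1256", "cp1256",
}


def normalize_encoding_alias(value: str) -> str:
    """Lowercase the value and strip '-', '_', spaces, and dots."""
    value = value.lower()
    for ch in "-_ .":
        value = value.replace(ch, "")
    return value


def is_known_alias(value: str) -> bool:
    """Check a user-supplied spelling against the documented aliases."""
    return normalize_encoding_alias(value) in KNOWN_ALIASES


if __name__ == "__main__":
    for spelling in ("UTF-8", "Windows_1252", "ISO 8859-1", "utf-16.le"):
        print(spelling, "->", normalize_encoding_alias(spelling))
```

Under this rule, spellings such as UTF-8, utf_8, and Utf 8 all resolve to the same alias, utf8.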

Filtering

| Field | Required | Default | Description |
|---|---|---|---|
| filter_mode | N | - | Filter direction: include keeps only matching lines; exclude drops matching lines. |
| filter_rules | N | - | List of filter rules. Accepts map form or a bare list of strings (treated as regex rules). |
| filter_rules[].type | N* | - | Rule type: regex or string. |
| filter_rules[].regex | N* | - | Regex pattern to match against each line. Required when type: regex. |
| filter_rules[].source | N* | - | Substring or wildcard pattern to match. Alias: value. Required when type: string. |

* Required for each rule entry.

A bare list of strings is accepted as shorthand and treated as regex rules.
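The interaction between filter_mode and filter_rules can be sketched as follows. This Python example is an approximation under stated assumptions, not the Director's implementation: it assumes a line "matches" when any rule matches, that a string rule without wildcards is a substring test, that wildcard patterns follow shell-style * and ? semantics, and that an unset filter_mode keeps every line. The names rule_matches and keep_line are hypothetical.

```python
import fnmatch
import re
from typing import Dict, Sequence, Union

Rule = Union[str, Dict[str, str]]


def rule_matches(rule: Rule, line: str) -> bool:
    """Evaluate one rule; bare strings are treated as regex rules."""
    if isinstance(rule, str):
        return re.search(rule, line) is not None
    if rule["type"] == "regex":
        return re.search(rule["regex"], line) is not None
    if rule["type"] == "string":
        pattern = rule.get("source", rule.get("value", ""))
        if "*" in pattern or "?" in pattern:
            # Assumption: wildcards behave like shell-style patterns.
            return fnmatch.fnmatchcase(line, pattern)
        return pattern in line
    raise ValueError("unknown rule type: %r" % rule["type"])


def keep_line(line: str, mode: str, rules: Sequence[Rule]) -> bool:
    """Apply include/exclude semantics; assumes any-rule-matches."""
    matched = any(rule_matches(rule, line) for rule in rules)
    if mode == "include":
        return matched
    if mode == "exclude":
        return not matched
    return True  # assumption: no filter_mode means no filtering


if __name__ == "__main__":
    rules = [
        {"type": "regex", "regex": "^(ERROR|WARN)"},
        {"type": "string", "source": "*healthcheck*"},
    ]
    for text in ("ERROR disk full", "INFO GET /healthcheck 200", "INFO started"):
        print(text, "->", keep_line(text, "include", rules))
```

Note that a single filter_mode governs the whole rule list, so one device cannot include some patterns and exclude others in the same pass under this model.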

Details

Path Resolution

path accepts a single string that may contain comma-separated glob expressions. Each entry is processed as an independent glob after whitespace trimming; empty segments (e.g., trailing commas) are discarded. Each path is normalized via filepath.Clean before globbing. The ** double-star pattern matches recursively across directory levels.
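The resolution steps described above can be sketched in Python. This is an illustrative approximation, not the Director's code: posixpath.normpath stands in for Go's filepath.Clean (the two agree on common cases such as collapsing // and .. segments, but are not guaranteed to be identical), and the function names are hypothetical.

```python
import glob
import posixpath
from typing import List


def split_path_patterns(path: str) -> List[str]:
    """Split a comma-separated path value into cleaned glob patterns."""
    patterns: List[str] = []
    for segment in path.split(","):
        segment = segment.strip()  # trim whitespace around commas
        if not segment:
            continue  # drop empty segments, e.g. from a trailing comma
        # Rough stand-in for filepath.Clean.
        patterns.append(posixpath.normpath(segment))
    return patterns


def resolve_files(path: str) -> List[str]:
    """Expand each pattern independently; recursive=True enables ** matching."""
    matches: List[str] = []
    for pattern in split_path_patterns(path):
        matches.extend(glob.glob(pattern, recursive=True))
    return matches


if __name__ == "__main__":
    print(split_path_patterns(" /var/log/app/*.log , /var/log//app/archive/*.log, ,"))
```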

Hot Reload vs Restart

Most configuration changes take effect on the next poll tick without interrupting the collector:

  • Hot-reload (no restart): path, start_date, ignore_cache, ignore_old_date, ignore_retention, ignore_time, date_format, line_parser, encoding, filter_mode, filter_rules
  • Restart required: poll_interval, file_log_concurrency
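When automating configuration changes, it can be useful to know ahead of time whether an update will restart the collector. The Python sketch below compares two properties maps against the restart list above; the helper names changed_keys and needs_restart are hypothetical and this is not an API offered by the Director.

```python
from typing import Any, Dict, Set

# Properties documented above as requiring a collector restart.
RESTART_KEYS = {"poll_interval", "file_log_concurrency"}


def changed_keys(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Return the property names whose values differ between two configs."""
    return {key for key in old.keys() | new.keys() if old.get(key) != new.get(key)}


def needs_restart(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """True when at least one changed property is in the restart list."""
    return bool(changed_keys(old, new) & RESTART_KEYS)


if __name__ == "__main__":
    before = {"path": "/var/log/app/*.log", "poll_interval": 60}
    after = {"path": "/var/log/app/**/*.log", "poll_interval": 60}
    print(changed_keys(before, after), needs_restart(before, after))
```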

Time-Based Filtering

start_date is applied against each file's modification time before the file is read:

  • Positive value (e.g., 300): Only files modified within the last N seconds are processed.
  • 0: Falls back to a 1-second lookback window.
  • Negative value (e.g., -1): Disables time-based filtering; all matched files are processed regardless of modification time.
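The three cases above reduce to a small decision function. The following Python sketch is illustrative only: the name passes_lookback is hypothetical, and treating a file exactly on the window boundary as included is an assumption the text does not settle.

```python
import time
from typing import Optional


def passes_lookback(mtime: float, start_date: int, now: Optional[float] = None) -> bool:
    """Decide whether a file's modification time falls inside the lookback window."""
    if now is None:
        now = time.time()
    if start_date < 0:
        return True  # negative value: time-based filtering is disabled
    window = start_date if start_date > 0 else 1  # 0 falls back to 1 second
    return now - mtime <= window  # assumption: boundary is inclusive


if __name__ == "__main__":
    now = 1_000_000.0
    print(passes_lookback(now - 200, 300, now))   # modified 200 s ago, 300 s window
    print(passes_lookback(now - 400, 300, now))   # modified 400 s ago, 300 s window
    print(passes_lookback(now - 400, -1, now))    # filtering disabled
```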

Startup Behavior

At startup the collector sleeps for a random interval of 0–20 seconds to spread load when multiple file devices start simultaneously. The first collection run begins immediately after this delay, then repeats at poll_interval.

A heartbeat monitor checks that the collector reports progress within 120 seconds. If the heartbeat threshold is exceeded, the collector is stopped and the device connection state is set to error.
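The startup jitter and the heartbeat check can be modelled with two small functions. This Python sketch only mirrors the numbers stated above (0 to 20 seconds of jitter, a 120-second heartbeat threshold); the function names, the uniform distribution, and the strict "greater than" comparison are assumptions, not details confirmed for the Director.

```python
import random

STARTUP_JITTER_MAX = 20.0    # seconds, upper bound of the random startup delay
HEARTBEAT_THRESHOLD = 120.0  # seconds without progress before the collector is stopped


def startup_delay(rng=random) -> float:
    """Pick a random delay so that many devices do not start at the same instant."""
    # Assumption: the delay is drawn uniformly from the range.
    return rng.uniform(0.0, STARTUP_JITTER_MAX)


def heartbeat_expired(last_progress: float, now: float,
                      threshold: float = HEARTBEAT_THRESHOLD) -> bool:
    """True when no progress has been reported within the threshold."""
    return now - last_progress > threshold


if __name__ == "__main__":
    print("startup delay:", round(startup_delay(), 2), "s")
    print("expired after 121 s:", heartbeat_expired(0.0, 121.0))
```

One practical consequence: a single poll that takes longer than the heartbeat threshold without reporting progress would be treated as a stall under this model, which is worth keeping in mind when scanning very large files.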

Security

Symlink containment and allow-listed root path enforcement are not implemented. The Director follows symlinks without restriction.

Warning: Operators are responsible for ensuring that configured paths do not expose unintended parts of the filesystem.

Examples

Single Glob

Collecting all .log files under /var/log/app/ every 60 seconds with a 5-minute lookback window...

- id: 1
  name: app-logs
  type: file
  properties:
    path: "/var/log/app/*.log"
    poll_interval: 60
    start_date: 300

Multiple Globs with Pipeline

Scanning two directory trees with a single comma-separated path and routing matched lines through a preprocessing pipeline...

- id: 2
  name: app-and-archive-logs
  type: file
  properties:
    path: "/var/log/app/*.log,/var/log/app/archive/*.log"
    pipeline_name: "normalize-timestamps"
    poll_interval: 120
    file_log_concurrency: 4
    start_date: 600

Multiline Log Entries

Merging Java-style stack traces into single log entries using a date-prefix regex to detect the start of each entry...

- id: 3
  name: java-app-logs
  type: file
  properties:
    path: "/opt/app/logs/*.log"
    line_parser:
      type: regex
      regex: '^\d{4}-\d{2}-\d{2}'
      date_based: true
      has_space: true
    date_format: "2006-01-02T15:04:05.000Z07:00"

Filter Rules

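Keeping only lines matched by the rule list (an ERROR/WARN prefix regex and a healthcheck wildcard), using mixed filter rule types. filter_mode applies to the whole list, so the healthcheck rule selects lines to keep here; dropping health-check entries instead requires filter_mode: exclude...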

- id: 4
  name: filtered-app-logs
  type: file
  properties:
    path: "/var/log/app/*.log"
    filter_mode: include
    filter_rules:
      - type: regex
        regex: '^(ERROR|WARN)'
      - type: string
        source: '*healthcheck*'

Full Historical Scan

Re-reading all matched files from the beginning by disabling time filtering and bypassing the position cache, useful for reprocessing after a pipeline change...

- id: 5
  name: historical-scan
  type: file
  properties:
    path: "/mnt/archive/logs/**/*.log"
    start_date: -1
    ignore_cache: true
    file_log_concurrency: 8
    pipeline_name: "reprocess-pipeline"