Log File Collection with OTeL

The filelogreceiver is a component of the OTeL Collector that tails log files and parses their entries on the fly. Before diving into its configuration, it helps to understand the logs data model, because the receiver's parsing steps exist to populate these fields.

Logs Data Model

The data model specification defines the following fields, grouped here by category:

  • Time

    • Timestamp: The time the event occurred at the source (for example, when an SDK generated the log). Preferred when available.

    • ObservedTimestamp: The timestamp from the collection system (for example, when the collector receives the log message)

  • Trace

    • TraceId, SpanId, TraceFlags: For correlation to traces and spans. TraceFlags carries the W3C trace flags, such as whether the trace is sampled.
  • Severity

    • SeverityText, SeverityNumber: The log level. SeverityText is the level as written by the source; SeverityNumber is a normalized value from 1 to 24 split into six ranges (TRACE, DEBUG, INFO, WARN, ERROR, FATAL). Both fields are optional.
  • Body

    • Body: the log message body.
  • Attribution

    • Resource: Key / Value pairs that define and scope where messages originate from (for example: pods, instances, services, etc)

    • Attributes: Key / Value pairs that give additional information about the message (for example: http method, status, etc)

    • InstrumentationScope: Provides information on emitting libraries (for example: OTeL SDK library and version)
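
To make the fields concrete, here is a conceptual rendering of a single log record. It is only a sketch: the field names follow the data model, but every value (service name, pod name, logger name, message) is invented for illustration, and the timestamps are shown as readable strings rather than the nanosecond integers the model actually uses.

```yaml
# Illustrative log record; all values are hypothetical.
Timestamp: "2024-01-01T12:00:00Z"          # when the source emitted the log
ObservedTimestamp: "2024-01-01T12:00:01Z"  # when the collector saw it
TraceId: "5b8efff798038103d269b633813fc60c"  # 16 bytes, hex encoded
SpanId: "eee19b7ec3c1b174"                   # 8 bytes, hex encoded
TraceFlags: 1                # the sampled bit is set
SeverityText: "ERROR"        # level as written by the source
SeverityNumber: 17           # first value of the ERROR range (17-20)
Body: "payment failed: card declined"
Resource:                    # where the message came from
  service.name: "checkout"
  k8s.pod.name: "checkout-7d9f"
InstrumentationScope:        # which library emitted it
  Name: "example.logger"
  Version: "1.0.0"
Attributes:                  # details about this particular message
  http.request.method: "POST"
  http.response.status_code: 402
```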

File Log Receiver

This receiver has many configuration options. The most important ones, grouped by category, are:

  • File Match Patterns

    • include, exclude: glob patterns that define files to be read, and specific ones to be excluded (for example: /var/log/folder/*.json )
  • Processing

    • multiline: Defines a pattern (line_start_pattern or line_end_pattern) that marks where a record begins or ends, so that several physical lines are combined into one entry. Especially useful for error logs that contain stack traces.

    • storage: The ID of a storage extension (such as file_storage) defined in the Collector configuration. Critical for offset tracking.

    • operators: A list of "operators" that take one of the fields (like the message body) and parse or transform it (for example, parse JSON, time, URI, etc). Operators run in the listed order by default and can be rerouted via the output option. Each operator accepts an if condition, and parsers can embed timestamp and severity parsing.

  • Performance

    • poll_interval: Duration between File System polls

    • max_log_size: Max size of log entry to read to protect from large memory usage

  • Metadata Decoration

    • include_file_*: Boolean parameters such as include_file_name and include_file_path that add the file's name or path as attributes on each log record.

    • attributes / resource: Directly add key / value pairs for all log messages from a file referenced in a receiver.
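
The options above can be combined in a single receiver definition. The following is a minimal sketch, not a recommended production setup: the file paths, the team and service.name values, the operator IDs, and the assumed log line format ("2024-01-01 12:00:00 ERROR message") are all hypothetical.

```yaml
receivers:
  filelog:
    # File match patterns: exclude is applied to the files matched by include
    include: [/var/log/service/*.log]
    exclude: [/var/log/service/debug.log]

    # Performance
    poll_interval: 200ms   # how often the file system is polled
    max_log_size: 1MiB     # upper bound on the size of one log entry

    # Metadata decoration
    include_file_name: true
    include_file_path: true
    attributes:
      team: payments                 # hypothetical attribute
    resource:
      service.name: sample-service   # hypothetical resource value

    # Processing: a new record starts at a line beginning with a date, so
    # stack trace lines are folded into the preceding entry
    multiline:
      line_start_pattern: '^\d{4}-\d{2}-\d{2}'

    operators:
      # (?s) lets "." match newlines so multi-line entries still match
      - type: regex_parser
        id: parse_line
        regex: '(?s)^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<sev>[A-Z]*) (?P<msg>.*)$'
        # Embedded timestamp and severity parsing
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S'
        severity:
          parse_from: attributes.sev
        output: promote_msg   # explicit chaining to the next operator
      # Runs only when the regex produced a msg attribute
      - type: move
        id: promote_msg
        if: 'attributes["msg"] != nil'
        from: attributes.msg
        to: body
```

The output option here only restates the default order; it becomes useful when an operator should skip ahead or branch. The if guard on the move operator keeps entries that did not match the regex from failing a second time.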

Offset Tracking for Reliability

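Offset tracking is crucial to ensure log collection is reliable and accurate if the collector(s) restart. Without it, a restarted collector has no record of how far it had read each file, so it may skip or re-read entries. The file_storage extension persists these offsets to a directory on disk, and the receiver references the extension through its storage option.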

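receivers:
  filelog:
    include: [/var/log/service/sample.log]
    # ID of the storage extension that persists file offsets
    storage: file_storage/filelogreceiver
extensions:
  file_storage/filelogreceiver:
    # Must be writable by the collector process
    directory: /var/lib/otelcol/mydir
service:
  # An extension takes effect only when it is listed here
  extensions: [file_storage/filelogreceiver]
  # pipelines omitted for brevity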