OTel: Effective Process Monitoring Solutions

In the last few posts, we saw how the OTel Collector can collect system and K8s metrics along with their metadata. That base configuration provides simple health metrics across diverse environment types. From here, the number of distinct metrics practitioners can gather becomes vast: there are various receivers, the Prometheus ecosystem, StatsD, custom OTLP, and more. One often-overlooked category is metrics for the processes running on a host.

Configuration

receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      process:
        metrics:
          process.cpu.utilization:
            enabled: true
          process.disk.operations:
            enabled: true
          process.memory.utilization:
            enabled: true
          process.threads:
            enabled: true

The process scraper is part of the hostmetrics receiver, so adding it takes only a few lines. The example configuration above is enough to get started: it enables each metric explicitly, including gauge-like metrics such as CPU and memory utilization.
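
A receiver only runs once it is referenced from a pipeline. As a minimal sketch, the wiring could look like the following. The debug exporter is an assumption, chosen here because it prints metrics to the Collector's own output for inspection (older releases shipped a similar exporter named logging); any metrics exporter could take its place.

```yaml
exporters:
  # Assumption: the debug exporter is included in your Collector distribution.
  debug:
    verbosity: detailed

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]   # the receiver configured above
      exporters: [debug]
```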

Resource Attributes

Valuable resource attributes are populated alongside the metrics. Of these, process.pid, process.owner, process.command and process.command_line are particularly interesting. Process identifiers and owners can help pinpoint which processes are contending for resources on a host. A change in a process's command-line arguments can explain an otherwise anomalous shift in its metrics.

Sensitive Data Scrubbing

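One thing to note is that data collection can leak sensitive command-line arguments, such as passwords passed as flags. Where that is a risk, add the transform processor, reference it in the metrics pipeline's processors list, and scrub the values before they leave the Collector: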

processors:
  transform:
    metric_statements:
      - context: metric
        statements:
          # Match only the value, so whitespace after the argument is preserved.
          - replace_pattern(resource.attributes["process.command_line"], "password=[^\\s]*", "password=***")

In this example, OTTL (the OpenTelemetry Transformation Language) redacts the value of a password argument in the process.command_line resource attribute.
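
If the full command line is not needed at all, dropping the attribute at the source avoids the problem entirely. The following is a sketch under an assumption: that your Collector release exposes per-attribute toggles for the process scraper under resource_attributes. Confirm the attribute names and defaults against the scraper documentation for your version.

```yaml
receivers:
  hostmetrics:
    collection_interval: 10s
    scrapers:
      process:
        # Assumption: resource_attributes toggles are supported by this scraper version.
        resource_attributes:
          # Stop emitting the full argument string.
          process.command_line:
            enabled: false
          # Keep the command itself so processes remain identifiable.
          process.command:
            enabled: true
```

The trade-off is that changes in arguments no longer show up next to the metrics, so redaction is the better fit when the command line is still useful for troubleshooting.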

Conclusion

Process monitoring is an excellent way to isolate long-running processes that exhaust resources and ephemeral processes that cause sudden anomalies. OTel provides a clean and concise way to do just that.