Build an Observability Data Lake with OTeL

Build an Observability Data Lake with OTeL

For the past few years, the data lake ecosystem has grown extensively. Various storage backends, streaming platforms, processing frameworks, etc. I’ve always wondered, while working with data lakes, lake houses, and the traditional data warehouse, filled with customer activity scenarios: where is the observability data?

Various reasons:

  1. No apparent use case for broad spectrum data, long-term storage, or both

  2. Inability or hesitation to take different data structures and normalize (via ETL, etc)

  3. Standardization

There, finally, may be a way to build a so called Observability Data Lake, with OpenTelemetry schemas being the avenue. The data models specified by OTeL are extremely conducive to sustainable broad and long-term storage and unlock exciting capabilities:

  1. Organization wide visibility into incidents, incident impacts, resource usage, and cost forecasts

  2. Using observability data for user analytics and making it “joinable” with other analytics (advertising performance, partner data, etc)

  3. Large datasets for machine learning

We seek to enable this through OTeL standard pipelining into object storage as a primary destination. This strikes a balance between the real-time and nuanced views that current observability vendors critically provide, and the full-resolution observability datasets that can be used for strategic purposes.