For the past few years, the data lake ecosystem has grown extensively: various storage backends, streaming platforms, and processing frameworks have emerged. While working with data lakes, lakehouses, and traditional data warehouses full of customer-activity data, I've always wondered: where is the observability data?
Why isn't it there? A few reasons:
No apparent use case for broad-spectrum data, long-term storage, or both
Inability or hesitation to normalize heterogeneous data structures (via ETL, etc.)
Lack of standardization across telemetry formats
There may finally be a way to build a so-called Observability Data Lake, with OpenTelemetry schemas as the avenue. The data models specified by OTel are well suited to broad, long-term storage and unlock exciting capabilities:
Organization-wide visibility into incidents, incident impacts, resource usage, and cost forecasts
Using observability data for user analytics and making it “joinable” with other analytics (advertising performance, partner data, etc)
Large datasets for machine learning
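To make the "joinable" point above concrete, here is a minimal sketch of an OTLP-shaped log record and how it might flatten into a wide row for columnar storage. The structure mirrors the OTLP JSON encoding, but the specific attribute values, the service name, and the `flatten` helper are illustrative assumptions, not part of any standard tooling:

```python
import json

# An OTLP-shaped log record (JSON encoding). Every record carries a
# Resource block whose attributes (service name, environment, etc.) act
# as stable join keys across signals and against other analytics data.
# The concrete values below are hypothetical examples.
record = {
    "resource": {
        "attributes": [
            {"key": "service.name", "value": {"stringValue": "checkout"}},
            {"key": "deployment.environment", "value": {"stringValue": "prod"}},
        ]
    },
    "logRecord": {
        "timeUnixNano": "1700000000000000000",
        "severityText": "INFO",
        "body": {"stringValue": "order placed"},
        "attributes": [
            {"key": "enduser.id", "value": {"stringValue": "u-123"}},
        ],
    },
}

def flatten(rec):
    """Flatten the nested OTLP structure into one wide row, the shape a
    columnar format in object storage (e.g. Parquet) would want."""
    row = {kv["key"]: kv["value"]["stringValue"]
           for kv in rec["resource"]["attributes"]}
    row.update({kv["key"]: kv["value"]["stringValue"]
                for kv in rec["logRecord"]["attributes"]})
    row["body"] = rec["logRecord"]["body"]["stringValue"]
    return row

print(json.dumps(flatten(record), indent=2))
```

Once flattened like this, `enduser.id` or `service.name` becomes an ordinary join column against advertising, partner, or cost datasets.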
We seek to enable this by pipelining OTel-standard data into object storage as a primary destination. This strikes a balance between the real-time, nuanced views that current observability vendors critically provide and the full-resolution observability datasets that can be used for strategic purposes.
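One way to sketch this dual-destination pipelining is an OpenTelemetry Collector config that tees the same telemetry to a real-time backend and to object storage. This assumes the `awss3` exporter from the collector-contrib distribution; the endpoint, bucket, and prefix names are placeholders:

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  # Real-time view: forward to an observability vendor (placeholder endpoint)
  otlp/vendor:
    endpoint: vendor.example.com:4317
  # Full-resolution copy: contrib awss3 exporter writing to object storage
  awss3:
    s3uploader:
      region: us-east-1
      s3_bucket: my-observability-lake   # placeholder bucket name
      s3_prefix: otlp

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/vendor, awss3]
```

Because both exporters sit in one pipeline, the object-store copy is the same full-fidelity stream the vendor sees, not a sampled or pre-aggregated derivative.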