Skip to main content

Command Palette

Search for a command to run...

The Case Against "Top-Down" AI in Observability

Updated
3 min read
The Case Against "Top-Down" AI in Observability

LLMs are having an ostensibly large effect on SaaS verticals since the proliferation of agentic AI frameworks and the fruition of proper tool use. There are, finally, approaches and startups attacking the problem in the observability space. However, I would argue that this paradigm is unfortunately stuck in a mindset that is:

  • Obsessed, almost reliant, on network and entity complexity

  • Addicted to promoting big-data digestion

  • Suffers from, or completely disregards, context gaps. Or worse, assume you have the all right data.

Engineers, and observability practitioners, ideally like to manage and forecast for complexity. It’s for this reason we are also seeing services and tools that are combating code complexity:

  • Optimization for runtime execution via profiling

  • Code cleanup for brevity/developer experience

  • Dependency and configuration cleanup

I always like to take a gentle, systems theory approach to these things. We can see a “shift-left” moment happening in programming to make the system and its functioning in the environment more predictable and less “reactive” to changes and drift. Let’s consider a variable C that represents complexity and N that can be considered independently but influences C. The purpose of this exercise is to examine how these variables can increase in magnitude, greatly influence compute time factors, and subsequently increase the probability of false negatives and positives.

Factors influencing C:

  1. Quantity of entities (micro-services, hosts, etc). Increases the “cohorts” potentially needed to be analyzed

  2. Variety of request types. Could be simple HTTP requests, or long-running jobs.

  3. Variety of request attributes. Metadata decoration.

  4. The interaction between entities.

Factors influencing N:

  1. Volume of requests

  2. Repetition rate of requests

What one can imagine is a large network visualization with interacting nodes modeled by weighted edges. This has become a hallmark of Observability marketing propaganda…

I believe that it's necessary to accurately model the system by projecting it as this data structure. However, much of the Observability industry has clung to it as the "finished" product, even when it becomes unintelligible ("network hairball"). Others have tried to add some layer of meaning but feel: "if you don't show it in all its glory, you are missing out." I believe the days of using UI/UX to parse this are over, mostly due to the phase shift we are seeing because of LLMs.

Here lies another problem, however. As C and N increase, so will the resources required to distill. The approach I’m seeing is very top-down, similar to the human SRE approach, which suffers from the same fixation on the holy network visualization. What these systems are doing on this graph structure is running a selective/greedy or full depth-first search for the sake of it. Furthermore, there lies the critical instantiation problem. When should this search trigger? A typical metrics alert? An event?

The above image helps illustrate how boundless this agentic search can become. Worse yet, there is a high chance of missing context (unknown unknowns). The problem with observability isn’t only disjointed data; it's not having the data, as illustrated below.

In this case, a top-down agentic system might get “greedy” and miss edge cases or catastrophically disregard them. It’s worth noting, also, these types of agentic systems are highly incentivized to crave this complexity (“we correlate millions of signals, and charge per run”). This will quickly run opposite to what observability practitioners will want to budget for (efficiency).

This post highlights why we must critically examine the new breed of top-down agents, even if they spur intellectual curiosity. In subsequent posts, we will examine alternative approaches.

More from this blog

O

OMLET: Open Metrics Logs Events Traces

20 posts