AI Agent Misidentifies Diagram Elements When Given Raw SVG: The Rendering vs. Semantics Problem
TLDR
An MCP tool that returns raw SVG causes the AI agent to misclassify elements and infer connections from visual proximity. The fix: return a structured JSON payload that separates component types, positions, and explicit connection records from the rendering artifact.
Symptom
An AI agent is asked to list all resistors in a circuit diagram via an MCP tool call. The tool returns the diagram's SVG. The agent returns a list that includes wire junctions (small circles) as resistors, misses resistors nested inside group elements, and describes connections based on which components are visually close rather than which are actually wired.
Why It Happens
SVG encodes rendering, not semantics. A resistor symbol is a <g> element with a specific set of child paths. A wire junction is a <circle> element. A connection is an SVG <path> that happens to start at one component and end at another, with no explicit "this wire connects these two pins" record.
An LLM parsing SVG must infer semantic meaning from visual structure. The inference is unreliable because:
- Group elements (<g>) may or may not be diagram components. They might be layers, system groups, or editor UI.
- Visual proximity is used as a proxy for connectivity, but components near each other are not necessarily connected.
- Class names used to identify component types are not standardized. A resistor class in one file is domain-symbol resistor, in another it is electrical-component r-type.
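The class-name problem can be reproduced without an LLM. The sketch below is illustrative only: the three SVG fragments and the looksLikeResistor heuristic are hypothetical, with the two class strings taken from the list above. It shows the same guess a model has to make, and how it produces both a miss and a false positive.

```javascript
// Hypothetical SVG fragments: the same resistor drawn under two editor
// conventions, plus a layer group that is not a component at all.
const fileA = '<g class="domain-symbol resistor"><path d="M0 0 l5 -5"/></g>';
const fileB = '<g class="electrical-component r-type"><path d="M0 0 l5 -5"/></g>';
const layer = '<g class="layer resistor-layer"><circle r="2"/></g>';

// Naive heuristic: any <g> whose class attribute mentions "resistor"
// is treated as a resistor.
function looksLikeResistor(svgFragment) {
  const match = svgFragment.match(/<g[^>]*class="([^"]*)"/);
  return match !== null && match[1].includes('resistor');
}

const results = {
  fileA: looksLikeResistor(fileA), // true: correct
  fileB: looksLikeResistor(fileB), // false: a real resistor is missed
  layer: looksLikeResistor(layer), // true: a layer is counted as a resistor
};
console.log(results);
```

Tightening the heuristic for one file's convention tends to break it for the other, which is why the type has to come from the tool rather than be recovered from markup.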
The Fix
Return a structured payload instead of SVG:
// wrong: returns rendering artifact
function handleReadDiagram() {
return { content: [{ type: 'text', text: svgEl.innerHTML }] };
}
// correct: returns semantic payload
function handleReadDiagram() {
  const payload = {
    schema: 'diagram-v1',
    components: getComponents().map(c => ({
      id: c.id,
      type: c.symbolType, // 'resistor', 'capacitor', 'ic'
      label: c.label,
      position: { x: c.cx, y: c.cy },
      properties: c.properties, // { value: '10k' }
    })),
    connections: getWires().map(w => ({
      from: { id: w.startComponent, port: w.startPort },
      to: { id: w.endComponent, port: w.endPort },
    })),
  };
  return { content: [{ type: 'text', text: JSON.stringify(payload, null, 2) }] };
}
With this payload, the AI receives type: 'resistor' for every resistor (no inference from class names) and an explicit from/to record for every connection (no inference from visual proximity). Identifying elements and connections becomes a lookup in the payload rather than an inference from the drawing.
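To make the difference concrete, here is a minimal sketch of the two questions from the symptom answered against a diagram-v1 payload. The sample data and the helper names listByType and areDirectlyConnected are assumptions made for illustration; they are not part of any tool API. R2 is deliberately placed next to R1 without a wire between them.

```javascript
// Hypothetical payload in the diagram-v1 shape used by the fix.
const payload = {
  schema: 'diagram-v1',
  components: [
    { id: 'R1', type: 'resistor', label: 'R1', position: { x: 40, y: 20 }, properties: { value: '10k' } },
    { id: 'R2', type: 'resistor', label: 'R2', position: { x: 42, y: 22 }, properties: { value: '4k7' } },
    { id: 'C1', type: 'capacitor', label: 'C1', position: { x: 200, y: 90 }, properties: { value: '100n' } },
  ],
  connections: [
    { from: { id: 'R1', port: 'b' }, to: { id: 'C1', port: 'a' } },
  ],
};

// Component ids of a given type, read straight from the type field.
function listByType(p, type) {
  return p.components.filter(c => c.type === type).map(c => c.id);
}

// True if a connection record joins the two components, in either direction.
function areDirectlyConnected(p, idA, idB) {
  return p.connections.some(w =>
    (w.from.id === idA && w.to.id === idB) ||
    (w.from.id === idB && w.to.id === idA));
}

const resistors = listByType(payload, 'resistor');
const r1ToC1 = areDirectlyConnected(payload, 'R1', 'C1');
const r1ToR2 = areDirectlyConnected(payload, 'R1', 'R2');
console.log(resistors, r1ToC1, r1ToR2); // [ 'R1', 'R2' ] true false
```

R1 and R2 sit two units apart and are reported as not connected, while R1 and C1 are far apart and are reported as connected. Position plays no part in either answer.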
How to Prevent It
Before wiring any AI tool integration, write a test: call the tool, parse the response, and assert that the AI can correctly answer "how many resistors are there?" and "is component A connected to component B?" If the tool returns SVG and either answer requires SVG parsing to verify, the tool is returning the wrong format.
The Generalizable Lesson
AI agents work on text representations of data. Rendering artifacts (SVG, HTML, bitmap descriptions) are optimized for display. Domain-semantic representations (JSON with typed objects and explicit relationships) are optimized for reasoning. Use the format optimized for the consumer, not the format native to the producer.