Schema Editor

Classifying SVG Shapes by Geometry, Not by Class Names

2026-06-04

TLDR

Classifying SVG shapes by tag name or CSS class couples the analyzer to your authoring conventions: two shapes that look identical but were drawn by different tools classify differently. Two metrics derived from path geometry, the isoperimetric quotient (circularity) and a linearity ratio, give the same answer regardless of which tool or domain produced the file. They need no domain-specific rules, only some threshold tuning.

The Problem Class

Any developer parsing SVG diagrams, building a topology analyzer, or extracting structure from vector graphics eventually faces shape classification: given an SVG element, is it a wire, a component, a connector, or something else?

The instinctive answer is to use the information the author put in: class names, element IDs, custom data attributes. A <path class="wire"> is a wire. A <g data-symbol="resistor"> is a component. This works for diagrams your own tool produced. It fails the moment you process diagrams from any other source.

The Naive Approach

Pattern-match on tag name and class:

function classify(el) {
    if (el.classList.contains('wire')) return 'wire';
    if (el.classList.contains('component')) return 'component';
    if (el.tagName === 'circle') return 'connector';
    return 'unknown';
}

This is fast, simple, and correct for a single controlled environment. It breaks for three common cases: diagrams imported from other tools, diagrams drawn in a different domain (electrical vs. construction), and diagrams where the class naming convention changed between versions.

Why It Breaks

Class-based classification is correct only when every shape is guaranteed to carry the expected classes. That guarantee holds only in a closed system where all diagrams are produced by one tool version. In practice, users import SVG from Inkscape, Illustrator, or KiCad; they work in domains with different class naming conventions; they open files saved before a refactor.

In each case, the class-based classifier returns 'unknown' for shapes it cannot recognize. Unknown shapes are excluded from topology analysis, so connectivity is incomplete and the bill of materials (BOM) is wrong.

The more fundamental problem: class names are author intent, not geometric truth. An element can be mis-classed. A wire drawn without the correct class name is still a wire. The geometry makes it a wire; the class name does not.

The Better Model

Derive the classification from the shape's geometry. Two metrics cover the most common diagram element types:

Isoperimetric quotient (circularity): For a closed shape with area A and perimeter P, circularity = 4πA/P². A perfect circle scores 1.0, a square scores π/4 ≈ 0.785, and a long thin rectangle approaches 0. Connectors (dots, junctions) are nearly circular; components (rectangles, complex symbols) are moderately circular; thin wire-like outlines are near 0.
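The figures above can be checked with a few lines of code. This is a minimal sketch, assuming each shape has already been flattened to an array of {x, y} points; the helper names and the sample shapes are illustrative, not part of any library.

```javascript
// Shoelace formula: signed area of a polygon given as [{x, y}, ...].
function signedArea(points) {
    let sum = 0;
    for (let i = 0; i < points.length; i++) {
        const p = points[i];
        const q = points[(i + 1) % points.length]; // wrap around to close the polygon
        sum += p.x * q.y - q.x * p.y;
    }
    return sum / 2;
}

// Perimeter of the closed polygon, including the closing segment.
function perimeter(points) {
    let total = 0;
    for (let i = 0; i < points.length; i++) {
        const p = points[i];
        const q = points[(i + 1) % points.length];
        total += Math.hypot(q.x - p.x, q.y - p.y);
    }
    return total;
}

// Isoperimetric quotient: 4 * pi * A / P^2 (0 for degenerate shapes).
function circularity(points) {
    const p = perimeter(points);
    return p === 0 ? 0 : (4 * Math.PI * Math.abs(signedArea(points))) / (p * p);
}

// Hypothetical sample shapes.
const square = [{ x: 0, y: 0 }, { x: 10, y: 0 }, { x: 10, y: 10 }, { x: 0, y: 10 }];
const thinRect = [{ x: 0, y: 0 }, { x: 100, y: 0 }, { x: 100, y: 1 }, { x: 0, y: 1 }];
const circle32 = Array.from({ length: 32 }, (_, i) => {
    const t = (2 * Math.PI * i) / 32;
    return { x: 5 * Math.cos(t), y: 5 * Math.sin(t) };
});

console.log(circularity(square).toFixed(3));   // 0.785
console.log(circularity(thinRect).toFixed(3)); // 0.031
console.log(circularity(circle32).toFixed(3)); // 0.997
```

A circle flattened to 32 segments already scores about 0.997, so flattening costs very little accuracy for this metric.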

Linearity ratio: For an open path, linearity = (distance between first and last point) / (total path length). A straight line scores 1.0, a curved or bent path scores less, and a closed loop scores 0 because its endpoints coincide. Wires are straight or nearly straight, so the classifier below treats linearity > 0.85 as a wire.
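A small sketch of the ratio, again assuming paths are arrays of {x, y} points. The function names and sample paths are hypothetical, chosen only to show how the score moves.

```javascript
// Total length of a polyline (no closing segment).
function pathLength(points) {
    let total = 0;
    for (let i = 1; i < points.length; i++) {
        total += Math.hypot(points[i].x - points[i - 1].x, points[i].y - points[i - 1].y);
    }
    return total;
}

// Endpoint span divided by total path length (0 for zero-length paths).
function linearity(points) {
    const length = pathLength(points);
    if (length === 0) return 0;
    const first = points[0];
    const last = points[points.length - 1];
    return Math.hypot(last.x - first.x, last.y - first.y) / length;
}

const straight = [{ x: 0, y: 0 }, { x: 50, y: 0 }, { x: 100, y: 0 }];
const elbow = [{ x: 0, y: 0 }, { x: 50, y: 0 }, { x: 50, y: 50 }]; // right-angle bend, equal legs
const loop = [{ x: 0, y: 0 }, { x: 10, y: 0 }, { x: 10, y: 10 }, { x: 0, y: 10 }, { x: 0, y: 0 }];

console.log(linearity(straight));         // 1
console.log(linearity(elbow).toFixed(3)); // 0.707
console.log(linearity(loop));             // 0
```

The elbow scores about 0.707, under the 0.85 wire threshold used in this post, which is worth keeping in mind for orthogonally routed wires.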

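function classifyByGeometry(pathPoints, isClosed) {
    // pathLength (polyline length) and signedArea (shoelace formula)
    // are helper functions defined elsewhere.
    if (pathPoints.length < 2) return 'unknown';

    const first = pathPoints[0];
    const last = pathPoints.at(-1);
    const length = pathLength(pathPoints);
    if (length === 0) return 'unknown';

    const span = Math.hypot(last.x - first.x, last.y - first.y);
    const linearity = span / length;
    if (linearity > 0.85 && !isClosed) return 'wire';

    if (isClosed) {
        // Include the closing segment in case the first point is not repeated.
        const perimeter = length + span;
        const area = Math.abs(signedArea(pathPoints));
        const circularity = (4 * Math.PI * area) / (perimeter * perimeter);
        // Cutoff sits above a square's 0.785 so square symbols stay
        // components; tune it for your shapes.
        if (circularity > 0.85) return 'connector'; // junction dot
        return 'component'; // rectangular or complex symbol
    }

    return 'component'; // open non-linear path
}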

This function takes path points (derived from the element's geometry, not its class) and returns a classification that does not depend on which tool produced the SVG.

The pipeline that feeds it: parse the path d attribute into commands, flatten Bezier curves (and arcs) to polylines, for example with de Casteljau subdivision, then apply the element's transform, including any inherited from ancestor groups, to get points in global coordinate space. The classification then runs on the normalized points.
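The flattening step can be sketched as follows. This is one possible implementation, not necessarily the one used in the analyzer: it splits a cubic Bezier at t = 0.5 with de Casteljau's construction until both control points lie within a tolerance of the chord. The function names, the 0.25 default tolerance, and the depth cap of 16 are assumptions made for the example.

```javascript
// Midpoint of two points.
function mid(a, b) {
    return { x: (a.x + b.x) / 2, y: (a.y + b.y) / 2 };
}

// Distance from point p to the segment a-b.
function distToSegment(p, a, b) {
    const dx = b.x - a.x, dy = b.y - a.y;
    const lenSq = dx * dx + dy * dy;
    let t = lenSq === 0 ? 0 : ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq;
    t = Math.max(0, Math.min(1, t)); // clamp the projection onto the segment
    return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Flatten one cubic Bezier p0..p3 into polyline points (p0 itself is not emitted).
function flattenCubic(p0, p1, p2, p3, tolerance = 0.25, depth = 0, out = []) {
    const flat = Math.max(distToSegment(p1, p0, p3), distToSegment(p2, p0, p3)) <= tolerance;
    if (flat || depth >= 16) {
        out.push(p3);
        return out;
    }
    // de Casteljau split at t = 0.5 into two cubics that share the point m.
    const p01 = mid(p0, p1), p12 = mid(p1, p2), p23 = mid(p2, p3);
    const p012 = mid(p01, p12), p123 = mid(p12, p23);
    const m = mid(p012, p123);
    flattenCubic(p0, p01, p012, m, tolerance, depth + 1, out);
    flattenCubic(m, p123, p23, p3, tolerance, depth + 1, out);
    return out;
}

// Quarter circle of radius 10 approximated by a single cubic (sample data).
const start = { x: 10, y: 0 };
const quarter = [start, ...flattenCubic(start, { x: 10, y: 5.523 }, { x: 5.523, y: 10 }, { x: 0, y: 10 }, 0.01)];
console.log(quarter.length, 'points'); // point count depends on the tolerance
```

A tighter tolerance yields more points and a polyline length closer to the true arc length; for classification by ratios, a fairly coarse tolerance is usually enough.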

For multi-pass analysis: run geometry classification first, then overlay domain-specific rules as a refinement pass. If an element has a data-symbol attribute that says "resistor," trust it. If it does not, the geometry classification provides a reasonable fallback. This makes the system degrade gracefully rather than silently failing.
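The refinement pass might look like the sketch below. It assumes the geometry pass has already produced a class string and that the element's attributes are available as a plain object; the function name refineClassification and the shape of the returned record are hypothetical.

```javascript
// Second pass: let explicit author metadata override the geometric guess.
// `attrs` is a plain object of attribute names to values (an assumption;
// in a browser it could be built from el.getAttribute calls).
function refineClassification(geometryClass, attrs) {
    const symbol = attrs['data-symbol'];
    if (symbol) {
        // The author said what this is, so trust it.
        return { kind: 'component', symbol, source: 'attribute' };
    }
    // No metadata: fall back to what the geometry suggests.
    return { kind: geometryClass, symbol: null, source: 'geometry' };
}

console.log(refineClassification('connector', { 'data-symbol': 'resistor' }));
console.log(refineClassification('wire', { class: 'st0' }));
```

Recording the source of each decision makes it easier to audit later which elements were classified by fallback.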

Tradeoffs

Geometry classification requires computing path length and area, which is linear in the number of flattened points per element. For a diagram with 1000 elements, this takes a few milliseconds. Class-based classification is effectively constant time per element. For real-time classification during drawing, you want the fast path. For batch analysis on load, the geometry path is acceptable.

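Geometry classification also has edge cases. A two-point wire always has linearity 1.0, however short, but a wire with a right-angle bend does not: an elbow with equal legs scores 1/√2 ≈ 0.71, below the 0.85 threshold, so it is classified as a component. Conversely, a long, nearly straight open stroke that belongs to a symbol outline has high linearity and can be mistaken for a wire. Tuning the thresholds for your domain's typical shapes takes some experimentation.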

The One Thing to Watch For

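Raw path coordinates can be far from what is actually drawn. A path with transform="scale(0.1)" looks small, but its raw d attribute has large coordinates. Circularity and linearity are ratios, so uniform scaling, rotation, and translation leave them unchanged; non-uniform scaling and skew do change them, and any absolute measure (path length, endpoint positions used for connectivity) is wrong in local coordinates. Always apply the transform before computing geometry metrics. Use DOMMatrix for this: build the matrix from the transform, convert each point, then classify. Be aware that the DOMMatrix string constructor parses CSS transform syntax, which may reject valid SVG attribute values such as a unitless rotate(45); in a browser, el.getCTM() returns the accumulated matrix up to the nearest svg viewport, ancestors included.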
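To see which transforms actually move the metrics, here is a small sketch that applies an affine matrix by hand. The plain objects stand in for the a, b, c, d, e, f fields that DOMMatrix exposes, so the example runs outside a browser; the function names and sample path are illustrative.

```javascript
// Apply an SVG-style matrix(a b c d e f) to a point:
// x' = a*x + c*y + e,  y' = b*x + d*y + f
function applyMatrix(m, p) {
    return { x: m.a * p.x + m.c * p.y + m.e, y: m.b * p.x + m.d * p.y + m.f };
}

// Endpoint span divided by total polyline length.
function linearityOf(points) {
    let length = 0;
    for (let i = 1; i < points.length; i++) {
        length += Math.hypot(points[i].x - points[i - 1].x, points[i].y - points[i - 1].y);
    }
    const first = points[0], last = points[points.length - 1];
    return length === 0 ? 0 : Math.hypot(last.x - first.x, last.y - first.y) / length;
}

const raw = [{ x: 0, y: 0 }, { x: 100, y: 0 }, { x: 100, y: 100 }]; // elbow in local coordinates
const uniform = { a: 0.1, b: 0, c: 0, d: 0.1, e: 0, f: 0 };         // scale(0.1)
const squash = { a: 1, b: 0, c: 0, d: 0.1, e: 0, f: 0 };            // scale(1, 0.1)

const scaled = raw.map((p) => applyMatrix(uniform, p));
const squashed = raw.map((p) => applyMatrix(squash, p));

console.log(linearityOf(raw).toFixed(3));      // 0.707
console.log(linearityOf(scaled).toFixed(3));   // 0.707 (uniform scale leaves the ratio unchanged)
console.log(linearityOf(squashed).toFixed(3)); // 0.914 (non-uniform scale pushes it past 0.85)
```

The same path lands on different sides of the 0.85 threshold depending on whether the non-uniform scale was applied, which is why classification should run on transformed points.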
