Engineering Journal
Table Formatter

Seven Parsers, One Interface: How TAFNE Handles Every Table Format

2026-05-11

The first question most people ask when they see TAFNE accept HTML, CSV, TSV, Markdown, JSON, ASCII art tables, and SQL INSERT statements from the same input field is: how does it know which one you pasted?

The short answer: it doesn't always. You tell it. Or the file extension tells it. Or for unknown text files, it makes a content-based guess.

The longer answer is that each of these seven formats requires a genuinely different parsing approach, and the differences are interesting.

The Dispatcher

The entry point is a switch statement in parseInput():

switch (inputType) {
    case 'html':     tableHtml = parseHtmlInput(inputData); break;
    case 'ascii':    tableHtml = parseAsciiInput(inputData); break;
    case 'csv':      tableHtml = parseCsvInput(inputData); break;
    case 'text':     tableHtml = parseTextInput(inputData); break;
    case 'markdown': tableHtml = parseMarkdownInput(inputData); break;
    case 'json':     tableHtml = parseJsonInput(inputData); break;
    case 'sql':      tableHtml = parseSqlInput(inputData); break;
}

Every branch takes the same input: a raw text string. Every branch produces the same output: an HTML table string. What happens in between is completely different.
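The shared contract (string in, HTML table string out) can also be sketched as a lookup table. This is a minimal illustration, not TAFNE's code: the `parsers` map, the `dispatch` name, and the two one-line parser bodies are hypothetical stand-ins.

```javascript
// Hypothetical stand-ins: each parser takes a raw string and returns an HTML table string.
const parsers = new Map([
  ['csv', (text) => `<table><tr><td>${text.split(',').join('</td><td>')}</td></tr></table>`],
  ['text', (text) => `<table><tr><td>${text.split('\t').join('</td><td>')}</td></tr></table>`],
]);

function dispatch(inputType, inputData) {
  const parser = parsers.get(inputType);
  // Unlike a switch without a default branch, an unknown type fails loudly here.
  if (!parser) throw new Error(`Unsupported input type: ${inputType}`);
  return parser(inputData);
}
```

Because every parser has the same signature, adding a format in this sketch means adding one map entry rather than touching the dispatch logic.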

Two Parsing Strategies

The seven parsers split into two fundamental camps.

Semantic parsers rely on explicit structure. The format declares its own structure using tags, keys, or reserved keywords. The parser just has to read that declaration.

Heuristic parsers rely on pattern recognition. The format is a convention, not a formal specification. The parser makes educated guesses based on what it sees.

HTML and JSON are semantic. CSV, TSV, and ASCII are heuristic. Markdown and SQL sit in the middle.

The Semantic Parsers

The HTML parser uses a regex to find table elements:

const tablePattern = /<table[\s\S]*?<\/table>/gi;
const matches = html.match(tablePattern);

The structure is already there. The parser's job is to extract and normalize it, not reconstruct it.
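As a small sketch of how that pattern behaves, the helper below wraps it in a function. The `extractTables` name is an assumption for illustration; the regex is the one shown above.

```javascript
// Returns every <table>...</table> block found in the input, or an empty array.
function extractTables(html) {
  const tablePattern = /<table[\s\S]*?<\/table>/gi;
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return html.match(tablePattern) || [];
}
```

The lazy quantifier stops at the first closing tag, so two sibling tables come back as two matches. The same property means a table nested inside another table would be cut off at the inner `</table>`.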

The JSON parser reads explicit keys and values:

let data;
try { data = JSON.parse(json); } catch (e) { ... }
const headers = Object.keys(data[0]);

The column names are the object keys. The rows are the array elements. No guessing involved. If the input is valid JSON, the structure is unambiguous.

The JSON parser also handles a common real-world case: JSON that wraps the array in an outer object. If JSON.parse returns an object rather than an array, the parser looks for the first key whose value is a non-empty array:

const arrayKey = Object.keys(data).find(k => Array.isArray(data[k]) && data[k].length > 0);
data = arrayKey ? data[arrayKey] : [];

This handles API responses that return { "results": [...] } or { "data": [...] } without requiring the user to drill into the JSON manually.
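Put together, the JSON path might look like the sketch below. The `jsonToRows` name, the `{ headers, rows }` return shape, and the choice to return an empty result on invalid input are assumptions made for illustration; the original snippet elides its error handling.

```javascript
function jsonToRows(json) {
  const empty = { headers: [], rows: [] };
  let data;
  try {
    data = JSON.parse(json);
  } catch (e) {
    return empty; // assumption: invalid JSON yields an empty table
  }
  if (!Array.isArray(data)) {
    if (data === null || typeof data !== 'object') return empty;
    // Unwrap { "results": [...] } style responses: first key holding a non-empty array.
    const arrayKey = Object.keys(data).find(k => Array.isArray(data[k]) && data[k].length > 0);
    data = arrayKey ? data[arrayKey] : [];
  }
  if (data.length === 0 || data[0] === null || typeof data[0] !== 'object') return empty;
  // Column names come from the first element; keys missing in later rows become ''.
  const headers = Object.keys(data[0]);
  const rows = data.map(obj => headers.map(h => String(obj?.[h] ?? '')));
  return { headers, rows };
}
```

Taking the headers from the first element only means a key that first appears in a later row is silently dropped, which is a trade-off of this sketch rather than a claim about TAFNE.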

The Heuristic Parsers

The CSV parser assumes commas separate columns and newlines separate rows. That's it.

const cells = line.split(',').map(cell => cell.trim().replace(/^"|"$/g, ''));

Each line is split on commas, surrounding double quotes are stripped from each cell, and the first row is assumed to be headers. This works for the vast majority of real CSV files, but it fails when a quoted field contains a comma, which RFC 4180 permits. The current parser treats every comma as a delimiter regardless of quoting context, so "Smith, John" becomes two cells. For exports without embedded commas, this is fine.
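For comparison, a quote-aware splitter is not much longer. The sketch below is not what TAFNE does today; `splitCsvLine` is a hypothetical name, and it still works one line at a time, so quoted fields that span line breaks are out of scope.

```javascript
// Splits one CSV line on commas that are outside double quotes.
function splitCsvLine(line) {
  const cells = [];
  let cell = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        cell += '"'; // doubled quote inside a quoted field is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        cell += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      cells.push(cell.trim());
      cell = '';
    } else {
      cell += ch;
    }
  }
  cells.push(cell.trim());
  return cells;
}
```

The cost is a character loop with one boolean of state instead of a single `split` call. Note that the final `trim()` also removes leading and trailing whitespace that was inside quotes.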

The ASCII parser skips separator lines:

if (line.includes('+---') || line.includes('+===')) return;
const cells = line.split('|').filter(cell => cell.trim());

ASCII tables use +---+---+ for horizontal separators and | val | val | for data rows. The parser identifies separators by looking for the +--- pattern, skips them, and splits data rows on pipes.
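Applied to a whole table, those two lines might be wrapped as in the sketch below. The `parseAsciiRows` name and the array-of-arrays return value are assumptions for illustration; the separator test and the pipe split are taken from the snippet above.

```javascript
function parseAsciiRows(text) {
  const rows = [];
  text.split('\n').forEach(line => {
    // Horizontal rules such as +---+---+ or +===+===+ carry no data.
    if (line.includes('+---') || line.includes('+===')) return;
    const cells = line.split('|').filter(cell => cell.trim()).map(cell => cell.trim());
    if (cells.length > 0) rows.push(cells);
  });
  return rows;
}
```

One consequence of the `filter` step is worth knowing: it removes the empty strings produced by the leading and trailing pipes, but it also removes a genuinely blank cell in the middle of a row, which shifts the remaining cells left.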

The State Machine: Markdown

Markdown tables have three types of lines: the header row, the separator row, and data rows. The separator row is syntactically distinct but carries no data.

if (/^\|?[\s|:\-]+\|?$/.test(line)) return; // separator

let headerDone = false;
// ...
tableHtml += headerDone ? `<td>${cell}</td>` : `<th>${cell}</th>`;
headerDone = true; // set once the header row has been emitted

The headerDone flag is a minimal state machine. Cells in the first non-separator row become <th>. Cells in every later row become <td>. The separator line is identified by a regex that matches lines containing only pipes, spaces, colons, and dashes.
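A complete, self-contained version of that flow could look like the sketch below. The `parseMarkdownTable` name and the row-splitting details are assumptions; only the separator regex and the headerDone idea come from the snippets above. Cell text is inserted without HTML escaping to keep the sketch short.

```javascript
function parseMarkdownTable(markdown) {
  let tableHtml = '<table>';
  let headerDone = false;
  markdown.split('\n').forEach(rawLine => {
    const line = rawLine.trim();
    if (!line) return;
    if (/^\|?[\s|:\-]+\|?$/.test(line)) return; // separator row: no data
    // Drop the optional outer pipes, then split the rest into cells.
    const cells = line.replace(/^\||\|$/g, '').split('|').map(c => c.trim());
    const tag = headerDone ? 'td' : 'th';
    tableHtml += '<tr>' + cells.map(c => `<${tag}>${c}</${tag}>`).join('') + '</tr>';
    headerDone = true; // flips after the first data-bearing row
  });
  return tableHtml + '</table>';
}
```

The flag flips once per row rather than once per cell, which is what keeps every cell of the first row a header cell.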

The Hunter: SQL

The SQL parser is the most technically specific. It uses a capturing regex to scan for INSERT statements:

const insertRe = /INSERT\s+INTO\s+\S+\s*\(([^)]+)\)\s*VALUES\s*\(([^)]+)\)/gi;
let match;
while ((match = insertRe.exec(sql)) !== null) {
    if (!headers) {
        headers = match[1].split(',').map(c => c.trim().replace(/["']/g, ''));
    }
    const vals = match[2].split(',').map(v => v.trim().replace(/^'|'$/g, ''));
    rows.push(vals);
}

Because the regex carries the g flag, each exec call in the loop resumes from the regex's lastIndex, so successive calls walk through every INSERT statement in the input. Column names come from the first match's first capture group. Values come from every match's second group.

Single quotes around values are stripped. Escaped single quotes ('') are converted back to single quotes. The result is clean cell values, one row per INSERT.
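The whole scan, including the doubled-quote step, can be sketched as one function. The `parseInserts` name, the return shape, and the exact placement of the `''` replacement are assumptions for illustration. The regex here allows optional whitespace before the column list and before VALUES.

```javascript
function parseInserts(sql) {
  const insertRe = /INSERT\s+INTO\s+\S+\s*\(([^)]+)\)\s*VALUES\s*\(([^)]+)\)/gi;
  let headers = null;
  const rows = [];
  let match;
  // With the g flag, each exec call resumes from insertRe.lastIndex.
  while ((match = insertRe.exec(sql)) !== null) {
    if (!headers) {
      headers = match[1].split(',').map(c => c.trim().replace(/["']/g, ''));
    }
    const vals = match[2].split(',').map(v =>
      v.trim()
        .replace(/^'|'$/g, '') // strip the surrounding single quotes
        .replace(/''/g, "'")   // SQL-escaped '' becomes a literal '
    );
    rows.push(vals);
  }
  return { headers: headers || [], rows };
}
```

Two limits follow directly from the pattern: `[^)]+` stops at the first closing parenthesis, and `split(',')` cuts on every comma, so a string value that contains either character will be split incorrectly.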

Auto-Detection for File Loads

When a file is loaded instead of pasted, the format is detected from the extension. For .txt files and unknowns:

if (text.includes('\t')) {
    tableHtml = parseTextInput(text);
} else if (text.includes(',')) {
    tableHtml = parseCsvInput(text);
} else {
    tableHtml = parseTextInput(text);
}

Tabs win over commas. If neither is present, the tab-delimited parser handles it anyway, which will at least produce a single-column table from the line breaks.
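Both steps, extension first and content second, could be combined as in the sketch below. The extension map is hypothetical and TAFNE's real mapping may differ; only the tab-then-comma fallback order comes from the code above. The returned strings match the case labels used by the dispatcher.

```javascript
// Hypothetical extension map; the real mapping in TAFNE may differ.
const EXTENSION_TYPES = new Map([
  ['html', 'html'], ['htm', 'html'], ['csv', 'csv'], ['tsv', 'text'],
  ['md', 'markdown'], ['json', 'json'], ['sql', 'sql'],
]);

function detectInputType(filename, text) {
  const dot = filename.lastIndexOf('.');
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : '';
  if (EXTENSION_TYPES.has(ext)) return EXTENSION_TYPES.get(ext);
  // .txt and unknown extensions: content-based guess, tabs before commas.
  if (text.includes('\t')) return 'text';
  if (text.includes(',')) return 'csv';
  return 'text';
}
```

Checking for tabs first is a reasonable ordering because prose and numbers routinely contain commas, while a literal tab character rarely appears in text that is not tab-delimited.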

Seven formats. One interface. The parsers are the wall between "raw text someone pasted" and "a table you can edit."

Source: github.com/carnworkstudios/TAFNE
