Why I Built a QC Pipeline Tab Into My Table Editor
Most data problems aren't problems with the data itself. They're problems with what you assumed about it.
You get a CSV export from a form tool, a spreadsheet someone filled out manually, or a table scraped from a government site. You load it. It looks fine at a glance. You pass it to the next step in your pipeline. Then something downstream breaks, and you spend an hour tracing it back to a blank email field on row 217, or a revenue figure that somehow came back as -9000 because of a formula error nobody caught.
The data was never clean. You just didn't check.
The Problem With "Just Write a Script"
The standard advice here is to write a validation script. And that advice is fine, technically. But it has a practical problem: you write it once, for that specific file, and then you throw it away. The next time you get a similar export, you start over.
What you actually need isn't a script. It's a pattern. A set of reusable checks you can compose and run against any tabular data before it goes anywhere.
That's what I built into TAFNE as Lab Mode.
What Lab Mode Is
Lab Mode is a pipeline tab inside TAFNE, a browser-based table editor I've been building as part of a larger data engineering project. The tab has three phases that run in sequence: Validate, Transform, Analyze. VTA.
You add steps to a pipeline. You run it. The result pane shows you what happened.

In Validate mode, you're flagging problems without changing anything. There are eight functions:
- flagEmpty: marks rows where a column is blank
- flagDuplicate: marks rows where a value isn't unique
- flagOutOfRange: marks rows where a numeric value falls outside a defined range
- flagPattern: marks rows that don't match a format (email, ISO date, reference designator, etc.)
- flagCrossColumn: marks rows where one column's value fails a condition relative to another (for example, Total doesn't equal Qty times Price)
- flagMissingAny: marks rows where any column in a given list is empty
- warnDuplicate and warnOutOfRange: same as their flag equivalents but at warning level, so they show up yellow instead of red
Stack flagEmpty on Email with flagOutOfRange on Revenue, and the result pane shows both sets of flagged rows at the same time, with inline badges describing which check each row failed.
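Here's a hedged sketch of what stacking two checks looks like. The function bodies are my reconstruction of the pattern, not TAFNE's exact source:

```javascript
// Sketch of two validate steps composed over the same rows.
// Bodies are illustrative reconstructions, not TAFNE's actual code.
function flagEmpty(rows, params) {
  return rows.flatMap((row, i) => {
    const val = row[params.column];
    const empty = val === undefined || val === null || String(val).trim() === '';
    return empty ? [{ rowIndex: i, message: `Empty: ${params.column}`, level: 'error' }] : [];
  });
}

function flagOutOfRange(rows, params) {
  return rows.flatMap((row, i) => {
    const val = Number(row[params.column]);
    const bad = Number.isNaN(val) || val < params.min || val > params.max;
    return bad ? [{ rowIndex: i, message: `Out of range: ${params.column}`, level: 'error' }] : [];
  });
}

const rows = [
  { Email: 'a@example.com', Revenue: 1200 },
  { Email: '', Revenue: -9000 }
];

// Both checks read the same rows; their flag arrays are simply concatenated.
const flags = [
  ...flagEmpty(rows, { column: 'Email' }),
  ...flagOutOfRange(rows, { column: 'Revenue', min: 0, max: 1000000 })
];
// Row 1 fails both checks, so it would get two badges.
```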
In Transform mode, you're reshaping the data. Filter rows, rename columns, sort, split a column on a delimiter, fill empty cells with defaults, merge duplicate rows by a key. Eleven functions, each composable with the others. Each step receives the output of the step before it.
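Composability falls out of one rule: every transform takes rows and returns new rows. A minimal sketch using two of the transforms named above (the bodies are my guesses at the pattern, not TAFNE's implementations):

```javascript
// Each transform step: rows in, new rows out. Chaining is just a fold.
function filterRows(rows, params) {
  return rows.filter(params.keep);
}

function fillEmpty(rows, params) {
  return rows.map(row => {
    const val = row[params.column];
    const empty = val === undefined || val === null || String(val).trim() === '';
    return empty ? { ...row, [params.column]: params.default } : row;
  });
}

const steps = [
  rows => filterRows(rows, { keep: r => r.Status !== 'void' }),
  rows => fillEmpty(rows, { column: 'Region', default: 'unknown' })
];

// Each step receives the output of the step before it.
const out = steps.reduce((rows, step) => step(rows), [
  { Status: 'ok', Region: '' },
  { Status: 'void', Region: 'EU' }
]);
// out → one row, with Region filled in.
```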
In Analyze mode, you're summarizing. Count rows by a column value. Sum or average a column. Get a frequency distribution. Run a pivot-style aggregation that groups by one column and measures several others.
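An analyze step like the frequency distribution reduces to a single pass over the rows. A sketch in the same style, not TAFNE's exact code:

```javascript
// Frequency distribution: count occurrences of each value in a column.
function frequency(rows, params) {
  const counts = {};
  rows.forEach(row => {
    const key = String(row[params.column]);
    counts[key] = (counts[key] || 0) + 1;
  });
  return counts;
}

const dist = frequency(
  [{ Region: 'EU' }, { Region: 'US' }, { Region: 'EU' }],
  { column: 'Region' }
);
// dist → { EU: 2, US: 1 }
```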
Why Pure Functions
Every function in the library is pure. It receives rows and params as arguments, and it returns a result. No global state. No DOM access. No network calls.
This wasn't an aesthetic choice. It was a practical one.
When functions are pure, the pipeline executor can run them in a simple loop without worrying about side effects between steps. A Validate function that runs midway through a Transform pipeline is flagging issues in the partially-transformed data, which is exactly what you want when you're debugging a pipeline. You can read the step list from top to bottom and reason about what the data looks like at each stage.
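The executor really can be that simple. Here's a hedged sketch of the loop; the step object shape and phase names are illustrative, not TAFNE's internals:

```javascript
// Pure steps make the executor a plain loop: validate steps emit flags,
// transform steps replace the working data for the next step.
function runPipeline(rows, steps) {
  const flags = [];
  let current = rows;
  for (const step of steps) {
    const result = step.fn(current, step.params);
    if (step.phase === 'validate') {
      flags.push(...result.map(f => ({ ...f, step: step.name })));
    } else {
      current = result;
    }
  }
  return { rows: current, flags };
}

// Tiny demo: a transform followed by a validate on the transformed data.
const steps = [
  {
    name: 'dropBlanks', phase: 'transform', params: { column: 'Email' },
    fn: (rows, p) => rows.filter(r => String(r[p.column] || '').trim() !== '')
  },
  {
    name: 'checkRevenue', phase: 'validate', params: { column: 'Revenue', min: 0 },
    fn: (rows, p) => rows.flatMap((r, i) =>
      r[p.column] < p.min ? [{ rowIndex: i, message: 'negative', level: 'error' }] : [])
  }
];

const { rows, flags } = runPipeline(
  [{ Email: '', Revenue: 5 }, { Email: 'a@b.com', Revenue: -9000 }],
  steps
);
// dropBlanks removes the first row; checkRevenue then flags the survivor.
```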
It also makes the functions trivially testable. flagEmpty(rows, { column: 'Email' }): you give it rows, it gives you flags. That's the entire contract.
flagEmpty: function(rows, params) {
  const flags = [];
  rows.forEach((row, i) => {
    const val = row[params.column];
    if (val === undefined || val === null || String(val).trim() === '') {
      flags.push({
        rowIndex: i,
        message: `Empty: ${params.column}`,
        level: 'error'
      });
    }
  });
  return flags;
}
The Performance Story
One thing that surprised me while building this: performance wasn't a problem. A 10,000-row table running 10 pipeline steps completes in under 50 milliseconds on any modern browser.
The functions are array operations on plain JavaScript objects. There's no async needed. No loading spinner. No web worker delegation. The browser just does it.
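You can sanity-check this yourself. Here's a rough standalone timing sketch over 10,000 rows and two validation passes; it's not TAFNE's benchmark harness, just the same kind of array work:

```javascript
// Generate 10,000 plain objects: every 100th Email blank,
// Revenue negative for 50 out of every 500 rows.
const rows = Array.from({ length: 10000 }, (_, i) => ({
  id: i,
  Email: i % 100 === 0 ? '' : `u${i}@example.com`,
  Revenue: (i % 500) - 50
}));

const t0 = Date.now();
const emptyFlags = rows.flatMap((r, i) =>
  String(r.Email).trim() === '' ? [{ rowIndex: i, message: 'Empty: Email' }] : []);
const rangeFlags = rows.flatMap((r, i) =>
  r.Revenue < 0 ? [{ rowIndex: i, message: 'Out of range: Revenue' }] : []);
const elapsed = Date.now() - t0;

console.log(emptyFlags.length, rangeFlags.length, `${elapsed}ms`);
```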
This only holds because the functions are pre-defined. Lab Mode doesn't execute arbitrary code; the step picker gives you a fixed set of operations from a dropdown. If it allowed free-form script input, you'd need a sandbox, and sandboxing has real costs.
The grayed-out "Custom Script (requires GINEXYS Cloud)" tile in the function picker is a deliberate boundary. Free pre-defined functions that run locally. Custom scripts when you need them, in a sandboxed environment, as a paid tier. The security model and the business model reinforce each other.
The Workflow in Practice
The typical use case looks like this:
You paste or import a messy table. Switch to Lab Mode. Add a flagEmpty step on whichever columns should never be blank. Add a flagPattern step to check email format. Run. Review the flagged rows and decide what to do with them. Add a filterRows step to drop the unfixable ones. Add a fillEmpty step to give defaults to the ones worth keeping. Run again. Submit the cleaned result as a new sheet.
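That sequence, written down as a step list, could look something like this. The config shape is my guess at what a saved pipeline might be; only the function names come from the prose:

```javascript
// Hypothetical saved-pipeline config mirroring the workflow above.
const pipeline = [
  { phase: 'validate',  fn: 'flagEmpty',   params: { column: 'Email' } },
  { phase: 'validate',  fn: 'flagPattern', params: { column: 'Email', format: 'email' } },
  { phase: 'transform', fn: 'filterRows',  params: { column: 'Email', drop: 'flagged' } },
  { phase: 'transform', fn: 'fillEmpty',   params: { column: 'Region', default: 'unknown' } }
];
```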
The original data is untouched. The pipeline is reproducible. The next time you get a similar export, you can reconstruct the same steps in about 30 seconds.
That's the QC station the data pipeline was missing.
TAFNE is open source and free: github.com/carnworkstudios/TAFNE