EDI Runs Global Trade, and Almost Nobody Can Turn It Into Clean Data

If you have ever shipped, sold, insured, or treated anything at scale, your transaction touched EDI. Electronic Data Interchange is the plumbing of global commerce: the purchase orders, ship notices, invoices, and claims that move between trading partners as terse, delimited text. Trillions of dollars flow through it every year.

And turning that text into clean, analytics-ready structured data is still, in 2026, one of the genuinely unsolved problems in enterprise integration. Teams budget weeks to onboard a single trading partner. A misread quantity or price flows silently into a downstream system and becomes a chargeback, a stockout, or a denied claim.

This write-up explains why EDI is so hard, what the industry has tried, and how Strongly's EDI Translator solves it with a learning system that proposes mappings, never values, so that the review burden decays toward zero.

The problem: the grammar is easy, the meaning is not

A common misconception is that EDI is hard because the format is arcane. It is not. The grammar of X12 or EDIFACT is well defined, and free, deterministic parsers for it have existed for decades. You can tokenize an interchange into segments and elements in an afternoon.

The hard part is that the standard deliberately refuses to pin down what those segments mean and delegates that to the trading partners. This is not an accident or an oversight. X12 explicitly permits repeating the same segment with the same qualifier (several N1 party segments, say) and states that what each one means is "incumbent on the trade parties" to agree. As a result, the same business fact (the bill-to party, the despatch date, the buyer's item number) lands in a different element for each partner.

Where the bill-to name lives, by partner:

Partner A (X12 850): N1*BT loop, element N102
Partner B (X12 850): second N1 loop, keyed by a REF qualifier
Partner C (EDIFACT ORDERS): NAD+BY segment, composite C080

The same business fact lands in a different position for every partner.

The practical consequence: every trading partner is effectively a new dialect. There is no universal map from EDI to your data model. There is one map per partner, and discovering it is manual, slow, and brittle. That is why onboarding a partner traditionally takes weeks to months, and why the map breaks quietly the first time the partner changes a version or a convention.

Two things make this worse than an ordinary data-cleaning problem:

Error tolerance ≈ 0: "usually right" is a failure mode. A transposed qualifier or a quantity off by a decimal is not a cosmetic bug in supply chain or healthcare. It is a wrong shipment or a wrong payment.

Variety is unbounded: every partner adds a dialect. At least five major dialects, hundreds of transaction sets, qualifiers, loops, and nested hierarchies, multiplied by every partner's local conventions.

How the industry has tried to solve it

The incumbents: drag-line mappers. Boomi, IBM Sterling, and Cleo show the partner's tree next to yours, and an engineer draws lines field by field. This approach front-loads all the effort onto a human, is brittle when the partner drifts, and never gets smarter. Every new partner starts from a blank canvas.

The modern wave: API-first platforms. Stedi and Orderful made onboarding faster with API-driven config and partner guides. That is a real improvement, but the core task is unchanged: somebody still configures the per-partner mapping, and the breadth of international standards often falls outside the happy path.

The tempting shortcut: point an LLM at the bytes. Hand the raw EDI to a model and ask for JSON. Several vendors tried it. It fails, and it fails in instructive ways.

Why pointing an LLM at the raw bytes fails

It hallucinates codes and qualifiers, and transposes them.
It produces silently wrong values: a quantity or price that looks plausible and is wrong, which in this domain is the worst possible outcome.
It is nondeterministic: the same input yields different output on different runs, which is disqualifying for an audited financial process.
It produces no compliant acknowledgment (the 997 or 999 the partner expects), and it drowns in context when a partner's implementation guide is hundreds of pages.

The lesson is not that the LLM is useless. The lesson is that copying the actual numbers is the part of the problem that must be deterministic, and that is exactly the wrong place to use a model.

How we solved it: the LLM proposes a mapping, never a value

The EDI Translator is built on a single rule that we treat as inviolable:

The inviolable rule

The model proposes a MAPPING: which parsed source position maps to which canonical field, with a confidence. It never emits a value. Deterministic code copies every number, code, and date verbatim from the parsed tree.

1 · Parsed candidates (real source positions): N1*BT · N102, N1*ST · N102, REF*IT · 02

2 · Model proposes (it chooses an index, no value emitted): { "candidate": 2, "confidence": 0.94 }

3 · Deterministic copy (code copies the value verbatim from the tree): "ACME DIST CENTER 4"

4 · Canonical field (clean, structured): ship_to.name = "ACME DIST CENTER 4"

The value never passes through the model.

This one decision eliminates the failure mode that sank the naive LLM attempts. The model's job is to answer "where does the bill-to name live in this partner's shipment?" by choosing among source positions that already exist in the parsed document. It chooses an index into a list of real candidates. It is structurally incapable of inventing a value, because it is never asked for one. If it picks the wrong position, the value is still copied verbatim from the document, and the mistake shows up as a low-confidence field a human reviews, not as a corrupted number in your warehouse.
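A minimal sketch of that rule, under stated assumptions: the `Candidate` record, the path notation, the proposal shape, and the zero-based index are all hypothetical and not taken from the product. The point it illustrates is that the model's output is only an index and a confidence, while the copy step is ordinary code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str   # where the value lives in the parsed tree (hypothetical notation)
    value: str  # the exact string found at that position


def apply_proposal(candidates: list[Candidate], proposal: dict) -> dict:
    """Copy the chosen candidate's value verbatim and reject anything else."""
    index = proposal["candidate"]
    is_index = isinstance(index, int) and not isinstance(index, bool)
    if not is_index or not 0 <= index < len(candidates):
        raise ValueError(f"proposal points outside the candidate list: {index!r}")
    chosen = candidates[index]
    return {
        "value": chosen.value,        # copied from the document, never generated
        "source_path": chosen.path,
        "confidence": float(proposal["confidence"]),
    }


candidates = [
    Candidate("N1*BT/N102", "ACME CORP ACCOUNTS PAYABLE"),
    Candidate("N1*ST/N102", "ACME DIST CENTER 4"),
]
# Assumed model output: an index into the list and a confidence, nothing more.
bill_to_name = apply_proposal(candidates, {"candidate": 0, "confidence": 0.94})
```

Because an out-of-range index raises instead of falling back to a guess, a malformed proposal cannot turn into a value in this sketch.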

Around that rule we built a system that does three more things the incumbents do not:

It calibrates confidence

A confident model that contradicts the history of how this partner has been mapped scores low and routes to review. Self-reported confidence alone is unreliable, so we never trust it on its own.

It learns from every correction

A confirmed or corrected mapping becomes partner-scoped memory. The next document from that partner auto-applies the field with no model call at all. Review volume decays toward zero.

It cleans recoverable values

A comma-grouped amount, an odd date format, a mainframe overpunch, a full-width digit: it proposes a deterministic recipe to cast it cleanly, always preserving the original verbatim for audit.
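As an illustration of what a deterministic recovery recipe could look like, here is a small sketch covering three of the cases named above. The `Recovered` record, the recipe names, and the `implied_decimals` parameter are assumptions made for the example. The overpunch tables follow the common zoned-decimal convention in which the last character carries both a digit and the sign.

```python
import re
import unicodedata
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

POSITIVE = "{ABCDEFGHI"  # last character encodes +0 .. +9
NEGATIVE = "}JKLMNOPQR"  # last character encodes -0 .. -9
PLAIN = re.compile(r"-?[0-9]+(\.[0-9]+)?")
COMMA_GROUPED = re.compile(r"-?[0-9]{1,3}(,[0-9]{3})+(\.[0-9]+)?")
OVERPUNCH = re.compile(r"([0-9]+)([{}A-R])")


@dataclass(frozen=True)
class Recovered:
    original: str             # kept verbatim for audit
    value: Optional[Decimal]  # None when no recipe applies
    recipe: str


def recover_amount(original: str, implied_decimals: int = 0) -> Recovered:
    # NFKC folds full-width digits and punctuation to their ASCII forms.
    text = unicodedata.normalize("NFKC", original).strip()
    if PLAIN.fullmatch(text):
        recipe = "as_is" if text == original else "normalize"
        return Recovered(original, Decimal(text), recipe)
    if COMMA_GROUPED.fullmatch(text):
        return Recovered(original, Decimal(text.replace(",", "")), "strip_grouping")
    match = OVERPUNCH.fullmatch(text)
    if match:
        head, last = match.groups()
        negative = last in NEGATIVE
        digit = (NEGATIVE if negative else POSITIVE).index(last)
        magnitude = Decimal(f"{head}{digit}").scaleb(-implied_decimals)
        return Recovered(original, -magnitude if negative else magnitude, "overpunch")
    return Recovered(original, None, "unrecoverable")


grouped = recover_amount("1,234.50")
full_width = recover_amount("１２３４")
overpunched = recover_amount("1234N", implied_decimals=2)
```

The original string travels with every result, so a reviewer can always see what the partner actually sent next to what the recipe produced.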

The calibrator is the quiet heart of it. It folds three orthogonal signals into one score with a geometric mean, so any single weak signal sinks the result. That is exactly what sends a confident-but-contradicted proposal to a human instead of into your warehouse.

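[Figure: retrieval support, self-consistency, and squashed confidence are combined (⊗) into a geometric mean of 0.41, which falls on the "route to review" side of the gate rather than the "auto-apply" side.]

One weak signal (self-consistency) sinks the score below the gate, so the field goes to a human.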

How it works, end to end

Deterministic parse

The raw interchange is tokenized into one dialect-agnostic structural tree. Delimiters are read from the envelope, never guessed. An unreadable envelope fails loud with a precise reason rather than producing wrong data. Five dialects collapse into one tree the rest of the system reasons over.
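For X12, reading delimiters from the envelope is possible because the ISA header is fixed width (106 characters including the terminator), so the separators sit at known offsets. The sketch below shows that idea for X12 only. The function and class names are hypothetical, and a real parser would build a tree rather than a flat list.

```python
from __future__ import annotations

from typing import NamedTuple


class EnvelopeError(ValueError):
    """Raised when delimiters cannot be read reliably from the envelope."""


class Delimiters(NamedTuple):
    element: str
    component: str
    segment: str


def read_x12_delimiters(raw: str) -> Delimiters:
    if not raw.startswith("ISA"):
        raise EnvelopeError("interchange does not start with an ISA segment")
    if len(raw) < 106:
        raise EnvelopeError(f"ISA segment truncated: {len(raw)} of 106 characters")
    found = Delimiters(element=raw[3], component=raw[104], segment=raw[105])
    if len(set(found)) != 3:
        raise EnvelopeError(f"delimiters are not distinct: {found!r}")
    if raw[:106].count(found.element) != 16:
        raise EnvelopeError("ISA segment is not the expected fixed width")
    return found


def tokenize(raw: str) -> list[list[str]]:
    """Split an interchange into segments, each a list of elements."""
    delimiters = read_x12_delimiters(raw)
    segments = []
    for chunk in raw.split(delimiters.segment):
        chunk = chunk.strip("\r\n")
        if chunk:
            segments.append(chunk.split(delimiters.element))
    return segments


# Build a well-formed ISA header programmatically so the widths are exact.
isa_fields = ["00", " " * 10, "00", " " * 10, "ZZ", "SENDER".ljust(15), "ZZ",
              "RECEIVER".ljust(15), "260101", "1200", "U", "00401", "000000001",
              "0", "P", ">"]
sample = ("ISA*" + "*".join(isa_fields) + "~"
          + "GS*PO*SENDER*RECEIVER*20260101*1200*1*X*004010~"
          + "ST*850*0001~N1*BT*ACME CORP~")
segments = tokenize(sample)
```

Every failure path raises `EnvelopeError` with a specific reason, which mirrors the fail-loud behavior described above rather than guessing at delimiters.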

Candidate enumeration

For each canonical field, the engine enumerates the source positions in this document that could plausibly supply it. Every candidate is a real, resolvable location. The model only ever chooses from this list.
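One way to picture candidate enumeration is a walk over the tokenized segments that collects every populated position a field could come from. The rule table, the `Candidate` record, and the path notation below are hypothetical, and segments are assumed to be lists whose first item is the segment ID.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical rule table: canonical field -> (segment ID, 1-based element position).
PLAUSIBLE_SOURCES = {
    "bill_to.name": [("N1", 2)],
    "ship_to.name": [("N1", 2)],
}


@dataclass(frozen=True)
class Candidate:
    path: str   # resolvable location in this document
    value: str  # the exact string found there


def enumerate_candidates(segments: list[list[str]], canonical_field: str) -> list[Candidate]:
    rules = PLAUSIBLE_SOURCES.get(canonical_field, [])
    found = []
    for index, segment in enumerate(segments):
        for segment_id, element in rules:
            if segment[0] != segment_id or len(segment) <= element:
                continue
            value = segment[element]
            if value == "":
                continue  # an empty element cannot supply a value
            qualifier = segment[1]
            path = f"{segment_id}*{qualifier}[{index}]/{segment_id}{element:02d}"
            found.append(Candidate(path, value))
    return found


parsed = [
    ["BEG", "00", "SA", "PO-1001"],
    ["N1", "BT", "ACME CORP ACCOUNTS PAYABLE"],
    ["N1", "ST", "ACME DIST CENTER 4"],
    ["N1", "VN", ""],
]
name_candidates = enumerate_candidates(parsed, "bill_to.name")
```

Both N1 segments with a populated name are offered, because which one means "bill-to" for this partner is exactly the judgment the later steps make.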

Recall of priors

Verified mappings from this partner, and semantically similar mappings from other partners (via in-process embeddings), are pulled in as evidence.

The model proposes, the code scores

The model is sampled a few times for self-consistency and returns a chosen candidate index, a confidence, and a rationale. A calibrator folds three orthogonal signals into one score with a geometric mean: retrieval support (do verified priors agree?), model self-consistency, and squashed self-confidence. Because it is a geometric mean, any single weak signal sinks the score, which is exactly what sends a confident-but-contradicted proposal to a human.
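To make the effect of the geometric mean concrete, here is a short sketch. The three signal names come from the description above. The `squash` function, its `strength` parameter, and the sample numbers are assumptions, since the actual squashing function is not specified here.

```python
import math


def squash(self_confidence: float, strength: float = 0.8) -> float:
    """Pull self-reported confidence toward 0.5 (an assumed squashing function)."""
    return 0.5 + strength * (self_confidence - 0.5)


def calibrated_score(retrieval_support: float, self_consistency: float,
                     self_confidence: float) -> float:
    signals = (retrieval_support, self_consistency, squash(self_confidence))
    if any(not 0.0 <= signal <= 1.0 for signal in signals):
        raise ValueError(f"signals must lie in [0, 1]: {signals!r}")
    # Geometric mean: the cube root of the product of the three signals.
    return math.prod(signals) ** (1.0 / len(signals))


# Priors agree, samples agree, and the model is confident.
agreed = calibrated_score(0.90, 0.90, 0.95)
# The model is very confident, but verified priors contradict it.
contradicted = calibrated_score(0.05, 0.90, 0.99)
# For comparison: an arithmetic mean of the same contradicted signals.
arithmetic = (0.05 + 0.90 + squash(0.99)) / 3
```

In this example the contradicted proposal lands well below the agreed one, and also below what an arithmetic mean of the same signals would give, which is the property that keeps one strong signal from masking a weak one.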

Auto-apply or review

Above the confidence gate, the field auto-applies and the value is copied verbatim. Below it, the field is queued for review. The gate is the operator's to tune, globally or per partner, and it adapts to each partner's track record: a partner with a long clean history earns a slightly looser gate, a contested one a stricter gate.
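The adaptation rule itself is not spelled out here, so the following is only one plausible shape for it: a bounded shift around the operator's base gate, driven by how often the partner's reviewed fields were confirmed rather than corrected. Every parameter name and default value is hypothetical, as is treating a score exactly at the gate as auto-apply.

```python
def effective_gate(base_gate: float, confirmed: int, corrected: int,
                   target_clean_rate: float = 0.95, max_shift: float = 0.05,
                   min_history: int = 20) -> float:
    """Loosen the gate slightly for a clean history, tighten it for a contested one."""
    total = confirmed + corrected
    if total < min_history:
        return base_gate  # not enough evidence to move the gate
    clean_rate = confirmed / total
    # Positive pressure means the partner is below target, so the gate tightens.
    pressure = (target_clean_rate - clean_rate) / (1.0 - target_clean_rate)
    pressure = max(-1.0, min(1.0, pressure))
    return min(0.99, max(0.50, base_gate + max_shift * pressure))


def route(score: float, gate: float) -> str:
    return "auto_apply" if score >= gate else "review"


clean_partner_gate = effective_gate(0.85, confirmed=200, corrected=0)
contested_partner_gate = effective_gate(0.85, confirmed=150, corrected=50)
new_partner_gate = effective_gate(0.85, confirmed=3, corrected=0)
```

Clamping the shift keeps the operator's setting in charge: history can nudge the gate, but it cannot move it further than `max_shift` in either direction.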

A review cockpit, not a line-drawing tool

A human sees a read-only structural view of the document on the left and the proposed canonical mapping with a confidence heatmap on the right. They confirm a field, or correct it by clicking a node in the tree or picking from a searchable list. They never draw a line between two trees. The review queue is ordered so the work that teaches the system the most comes first, and a single confirmation can clear the same field across a partner's whole pending backlog.

Learn, and prove it

Every confirmation is partner-scoped memory. A per-partner learning dashboard shows review volume falling and auto-apply rising, the crossover being the proof the system is getting smarter. A calibration read tells the operator whether the gate is too loose (an auto-applied field had to be corrected) or too strict (reviews that just confirm the model), so they can tune it with evidence.
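The memory-before-model behavior can be sketched as a lookup keyed by partner, transaction set, and canonical field. The class, its key structure, and the `propose` callback are illustrative assumptions, and persistence and drift handling are left out.

```python
from __future__ import annotations

from typing import Callable, Optional


class PartnerMemory:
    """Verified mappings, scoped to one partner and one transaction set."""

    def __init__(self) -> None:
        self._verified = {}

    def confirm(self, partner: str, transaction_set: str, field: str, source_path: str) -> None:
        self._verified[(partner, transaction_set, field)] = source_path

    def recall(self, partner: str, transaction_set: str, field: str) -> Optional[str]:
        return self._verified.get((partner, transaction_set, field))


def resolve_mapping(memory: PartnerMemory, partner: str, transaction_set: str,
                    field: str, propose: Callable[[str], str]) -> tuple[str, str]:
    """Return (source_path, origin), calling the model only when memory is empty."""
    remembered = memory.recall(partner, transaction_set, field)
    if remembered is not None:
        return remembered, "memory"
    return propose(field), "model"


model_calls = []


def fake_model(field: str) -> str:
    # Stand-in for the real proposal step, used only to count calls.
    model_calls.append(field)
    return "N1*BT/N102"


memory = PartnerMemory()
first = resolve_mapping(memory, "partner-a", "850", "bill_to.name", fake_model)
memory.confirm("partner-a", "850", "bill_to.name", first[0])  # a human confirms
second = resolve_mapping(memory, "partner-a", "850", "bill_to.name", fake_model)
```

After the confirmation, the second document resolves the field from memory and the stand-in model is not called again.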

A compliant acknowledgment, deterministically

The system generates the functional acknowledgment the partner expects, X12 997 or 999, a TA1, an EDIFACT CONTRL, or an HL7 ACK, from the validation report alone, with no model in the loop, and it re-parses cleanly. The compliance gap the naive approaches ignored is closed by construction.
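A simplified sketch of building an X12 997 from a validation report with plain code is shown below. The `ValidationReport` and `ElementError` records are hypothetical, the ISA/GS envelope is omitted, a single transaction set per group is assumed, and every segment-level error is reported with AK3 code 8 (segment has data element errors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementError:
    segment_id: str        # e.g. "PO1"
    segment_position: int  # segment count within the transaction set
    element_position: int  # position of the element within the segment
    error_code: str        # AK403 code, e.g. "1" = mandatory element missing


@dataclass(frozen=True)
class ValidationReport:
    functional_id: str    # GS01 of the received group, e.g. "PO"
    group_control: str    # GS06
    transaction_set: str  # ST01, e.g. "850"
    set_control: str      # ST02
    errors: tuple = ()


def build_997(report: ValidationReport, ack_control: str = "0001",
              element_sep: str = "*", segment_term: str = "~") -> str:
    status = "A" if not report.errors else "R"
    rows = [
        ["ST", "997", ack_control],
        ["AK1", report.functional_id, report.group_control],
        ["AK2", report.transaction_set, report.set_control],
    ]
    by_segment = {}
    for error in report.errors:  # group element errors under their segment
        key = (error.segment_id, error.segment_position)
        by_segment.setdefault(key, []).append(error)
    for (segment_id, position), errors in by_segment.items():
        rows.append(["AK3", segment_id, str(position), "", "8"])
        rows.extend(["AK4", str(e.element_position), "", e.error_code] for e in errors)
    rows.append(["AK5", status])
    rows.append(["AK9", status, "1", "1", "1" if status == "A" else "0"])
    rows.append(["SE", str(len(rows) + 1), ack_control])  # count includes ST and SE
    return "".join(element_sep.join(row) + segment_term for row in rows)


accepted = build_997(ValidationReport("PO", "1", "850", "0001"))
rejected = build_997(ValidationReport(
    "PO", "1", "850", "0001", errors=(ElementError("PO1", 5, 2, "1"),)))
```

The same report always yields the same bytes, which is the property that matters for an acknowledgment in an audited process.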

What we support

Value-recovery recipes. For messy-but-recoverable values, with the original always kept.

Five dialects, one canonical model. X12, UN/EDIFACT, HL7 v2, UK TRADACOMS, and German automotive VDA. The dialect is detected from the envelope, and we will continue to grow this list.

350 transaction-set schemas. From the 850 purchase order and 856 advance ship notice to the 810 invoice, 214 shipment status, 204 load tender, 940/945 warehouse shipping, 210 freight invoice, healthcare claims, and far beyond.

Type and required-field validation. Surfaced in the UI, graded by severity, and reported in the acknowledgment at the precise AK4 (997) or IK4 (999) element position.

Two-way generation. Regenerate a partner-addressed interchange from canonical JSON, in the partner's own delimiters, with envelope fields validated before assembly.

Acknowledgment reconciliation and transport. Outbound over AS2, SFTP, and HTTPS with retry and backoff; inbound 997 or CONTRL matched by control number and marked accepted or rejected.

Lossless numeric precision. A value becomes a real number only when it round-trips to the exact same string, so leading zeros, values beyond float range, and over-precise decimals stay verbatim. The "silently wrong price" failure mode is structurally prevented.

Reconciliation and scale. Per-partner learning memory, drift detection, three-way reconciliation across PO, ASN, and invoice, multi-transaction-per-file handling, bulk upload, and exports as JSON, NDJSON, XML, and CSV (with formula-injection hardening).

Tunable, globally or per partner. The auto-apply gate, gate adaptation, review queue order, within-document field order, and whether a confirmation propagates across a partner's backlog are all settings, because the operator knows their partners better than we do.
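The lossless numeric precision rule in the list above can be illustrated with a round-trip check: a string becomes a number only if rendering that number gives back the identical string. The function name is hypothetical, and using `str` for integers and `repr` for floats is one reasonable choice of rendering, not necessarily the product's.

```python
import math
from typing import Union


def to_number_if_lossless(text: str) -> Union[int, float, str]:
    """Return a number only when it renders back to exactly the same string."""
    try:
        as_int = int(text)
        if str(as_int) == text:
            return as_int
    except ValueError:
        pass
    try:
        as_float = float(text)
        # Reject inf and nan, then require an exact round trip.
        if math.isfinite(as_float) and repr(as_float) == text:
            return as_float
    except ValueError:
        pass
    return text  # keep the original verbatim


results = {raw: to_number_if_lossless(raw)
           for raw in ["42", "007", "12.5", "0.10", "1e400", "0.1000000000000000055"]}
```

In this sketch "42" and "12.5" become numbers, while the leading-zero, trailing-zero, out-of-range, and over-precise inputs stay as the strings the partner sent.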

The outcome

The drag-line era asked a human to do all of the mapping work up front, forever. The naive LLM era asked a model to do the one job, copying the actual values, that must never be left to a model. We do neither.

The drag-line era: humans map everything, up front. All the effort comes before the first document, it is brittle on drift, and it never gets smarter.

The naive LLM era: the model copies the values. It is handed the one job, emitting the actual numbers, that must never be left to a model.

The EDI Translator: each layer does its own job. Deterministic code is exact. A calibrated model judges. A human reviews only the genuinely uncertain.

Deterministic code handles what must be exact. A calibrated model handles what is genuinely a judgment call, choosing among real candidates, never inventing. A human reviews only what is genuinely uncertain. And every correction makes the next document easier, until a mature partner needs almost no review at all.

[Chart: review volume and auto-apply rate]

That is what it looks like to treat EDI as an intelligent parsing problem instead of a configuration chore: faithful structure, calibrated confidence, a human in the loop only where it counts, and a system that earns its way to zero.

See it on a partner that has been painful to onboard

Bring a real interchange. We will show you the parse, the proposed mapping, and the calibrated confidence on your own files, and where the review burden earns its way to zero.

Schedule a walkthrough
