
NLP applied to legal text has real limits. We tested our own engine against edge cases in 30 UK policy schedules and documented where accuracy drops. This is that report, written for an audience that understands roughly how language models work but doesn't need the academic citations.
What the Engine Actually Does
When you upload a policy document to Rehuman, the pipeline runs through four stages. First, OCR converts the PDF to text — straightforward for digital-native PDFs, error-prone for scanned certificates. Second, a classifier identifies the document type: combined buildings and contents, motor, life, travel, income protection, or public liability. Third, an extraction model pulls structured data fields: insured name, policy number, coverage limits, excess amounts, renewal dates, named exclusions, and endorsements. Fourth, a normalisation layer maps insurer-specific terminology to our standard schema.
Stages one and two are essentially solved. Our document type classifier reaches 99.3% accuracy across the 12 policy types we support, and OCR error rates on digital PDFs are negligible. The interesting problems are in stages three and four.
Where Extraction Works Well
Policy numbers, insured names, renewal dates, and basic coverage limits are reliably extractable from standard UK policies. These fields follow predictable patterns and appear in consistent locations across most insurer formats. Aviva, AXA, Direct Line, Admiral, and LV= all use document structures our extraction model was trained on, and precision on these fields runs above 97%.
Sum insured figures are also reliable when they appear in tables. Tabular data is structurally predictable. Our model reads a premium schedule table and extracts £250,000 buildings cover or £50,000 contents cover with near-certainty.
Renewal date extraction has one known failure mode: policies that quote the renewal date in words ("your policy will expire on the fifteenth of August two thousand and twenty-five") rather than in numerals. This pattern appears in under 3% of documents, but it produces an extraction error when it does.
Where It Gets Complicated: Named Exclusions
Named exclusions are the hard part. A "named exclusion" is a specific carve-out from coverage: a pre-existing medical condition in a travel policy, a specific item excluded from a contents policy, a property defect noted at inception. These appear in different sections depending on the insurer, use wildly inconsistent language, and sometimes only exist in an endorsement addendum rather than the main policy body.
In our test set of 30 policies, our engine correctly identified 87% of named exclusions. The 13% misses were almost entirely in two categories: exclusions embedded in continuation sentences ("the insured property is covered for fire, water damage, and theft except where the property has been..." followed by a page break) and exclusions that appear only in the policy endorsement rather than the schedule.
The continuation sentence problem is a parsing artefact. PDF-to-text conversion doesn't reliably preserve reading order when text wraps across columns or pages, and our extraction model loses the clause boundary. We've reduced this error by reordering the text before extraction, but edge cases remain.
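The reordering step is conceptually simple for the common two-column layout: bucket text spans by horizontal position, then read each column top to bottom. A hedged sketch, assuming the PDF text layer gives us `(x, y, text)` spans (the real layouts are messier, which is where the remaining edge cases live):

```python
def reorder_two_columns(spans: list[tuple[float, float, str]],
                        page_width: float) -> str:
    """Rebuild reading order for a two-column page.

    spans: (x, y, text) tuples from the PDF text layer,
    with y increasing down the page.
    """
    midline = page_width / 2
    left = [s for s in spans if s[0] < midline]
    right = [s for s in spans if s[0] >= midline]
    # Read the left column top-to-bottom, then the right column
    ordered = sorted(left, key=lambda s: s[1]) + sorted(right, key=lambda s: s[1])
    return " ".join(s[2] for s in ordered)
```

The failure mode is exactly the one described above: when a clause spans the column break or a page break, no purely geometric rule can recover the boundary with certainty.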
The Endorsement Problem
UK insurance policies frequently consist of a base policy wording (a standard document used for all policies of that type) plus one or more endorsements that modify the standard terms for a specific insured. The endorsement might add cover, restrict cover, or replace a specific clause entirely.
Our document pipeline treats each PDF as a single document. When a policy includes an endorsement as a separate attachment or as a distinct section with its own header, we parse it correctly. When an endorsement is embedded mid-document without a clear structural marker — which happens in approximately 18% of commercial policies we've processed — it can be missed entirely.
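When an endorsement does carry a structural marker, detecting it is a pattern-matching problem. A minimal sketch; the header patterns here are illustrative, not our production list, and it is precisely the documents that match none of them that fall into the 18%:

```python
import re

# Lines we treat as the start of an endorsement section (illustrative subset)
ENDORSEMENT_HEADER = re.compile(
    r"^\s*(ENDORSEMENT|Endorsement)(\s+No\.?\s*\d+)?\s*$",
    re.MULTILINE)

def find_endorsement_sections(text: str) -> list[int]:
    """Return the character offsets where endorsement sections begin."""
    return [m.start() for m in ENDORSEMENT_HEADER.finditer(text)]
```

An endorsement spliced into the body text with no header of its own produces an empty result here, which is why the fix needs a trained detector rather than a longer pattern list.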
This is the biggest accuracy gap in our current system on commercial policies, and it's what drives the difference between our 94.1% accuracy on standard consumer policies and the ~81% we see on bespoke commercial documents. The commercial sector is next on our engineering roadmap.
Condition Clauses and Implied Exclusions
Some of the most consequential policy terms are not explicit exclusions at all — they're conditions that must be met for cover to apply. A burglar alarm warranty ("cover is conditional on alarm being set between 10pm and 6am") doesn't look like an exclusion in a naive text search. Neither does a security condition ("windows must have approved locks") or a maintenance requirement ("heating system must be serviced annually").
We classify these as "condition flags" and surface them separately from named exclusions. Our extraction of condition clauses currently runs at around 79% recall: roughly four in five are found. The ones we miss most often are conditions embedded in the definitions section ("burglar alarm means a device that is activated and maintained in working order"). That phrasing pattern is on our training roadmap for Q2 2025.
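The first-pass signal for a condition flag is lexical: certain trigger phrases reliably introduce a warranty or condition precedent. A sketch with a toy trigger list (our production system layers a classifier on top of this kind of cue):

```python
import re

# Phrases that often introduce a warranty or condition precedent.
# Toy subset for illustration only.
CONDITION_TRIGGERS = re.compile(
    r"\b(conditional on|provided that|warranted that|must be|it is a condition)\b",
    re.IGNORECASE)

def flag_conditions(clauses: list[str]) -> list[str]:
    """Return the clauses that look like conditions rather than exclusions."""
    return [c for c in clauses if CONDITION_TRIGGERS.search(c)]
```

The definitions-section misses described above are exactly the clauses this kind of cue fails on: "burglar alarm means a device that is activated and maintained in working order" contains no trigger phrase at all.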
When Plain-English Questions Go Wrong
The Q&A interface — where you ask "Am I covered if my laptop is stolen from my car?" — is built on a retrieval-augmented generation model that pulls the relevant policy clauses and synthesises an answer. This approach works well when the relevant clause is clearly identifiable and the question maps directly to it.
It fails in three scenarios. The first: when the answer requires combining two or more clauses, neither of which alone answers the question. A question about coverage for a home gym involves the personal property limit, the away-from-home extension, and potentially a sporting equipment sub-limit. Retrieving only one of these gives an incomplete answer.
The second: when the question contains an implicit assumption the policy doesn't share. "Am I covered if a contractor damages my property?" assumes the user means third-party liability cover, but the policy may not call it that, and if the retrieval step returns the "damage by the insured" clause instead, the answer is wrong.
The third: when the policy genuinely doesn't say. Many UK policies are silent on specific edge cases because they were drafted for a prior use pattern. When our model can't find a relevant clause, it currently says "I can't find a specific clause covering this — contact your insurer directly." That's the right answer. A confident wrong answer is worse.
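The refusal behaviour in that third scenario comes down to a gate on retrieval confidence. A simplified sketch: the threshold value is illustrative, and in production a generation model synthesises across the surviving clauses rather than returning one directly.

```python
MIN_SCORE = 0.35  # illustrative retrieval-similarity threshold
FALLBACK = ("I can't find a specific clause covering this — "
            "contact your insurer directly.")

def answer(question: str, retrieved: list[tuple[str, float]]) -> str:
    """retrieved: (clause_text, similarity_score) pairs from the retriever."""
    relevant = [clause for clause, score in retrieved if score >= MIN_SCORE]
    if not relevant:
        # An honest refusal beats a confident wrong answer
        return FALLBACK
    # Stand-in for the generation step: return the best-scoring clause
    return relevant[0]
```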
What This Means for You as a User
For standard UK consumer policies from the major insurers, Rehuman's extraction is accurate enough to rely on for understanding your coverage picture. For commercial policies, niche products (specialist liability, high-value art, agricultural), or policies with complex endorsement structures, treat the extracted data as a first pass and verify significant items against the original document.
We show confidence indicators on every extracted field. A field marked green was extracted from clear structured text. A field marked amber was inferred from less structured prose. We don't mark anything red and display it — if we're not confident, we don't show the value, and we tell you the field wasn't found.
The AI Q&A interface shows the specific policy clause it drew from in every answer. If the source clause doesn't look relevant to your question, that's a signal to verify manually. We'd rather you double-check than trust an answer that doesn't hold up.
Where We're Going Next
Our Q2 2025 engineering priorities are: improving condition clause extraction, building a specific model for endorsement detection in commercial policies, and reducing the continuation sentence parsing errors in complex multi-page schedules. We'll publish accuracy benchmarks quarterly.
The honest position is that AI-assisted policy analysis is useful but not flawless, and the margin matters most in the edge cases — which are usually the ones that produce denied claims. That's why we show source clauses, confidence ratings, and uncertainty indicators. Making the AI's limitations visible is part of what it means to build a tool people can actually trust.
See what your policies actually say
Upload any UK insurance document and ask plain-English questions about your coverage.
Try the AI Analyser