Let me describe a scenario that every procurement professional will recognize.
You're evaluating a major tender. Five proposals, each between 100 and 300 pages. The RFP has 45 evaluation criteria across 3 lots. Your team has two weeks, and this isn't the only procurement on your desk.
You start strong. Proposal one, criterion one. You read carefully, cross-reference with the requirements, take notes. By the third proposal on the twelfth criterion, you're reading faster. By the second day, you're scanning for keywords rather than reading sentences. By the end of the week, the annexes are getting a look only if something in the main body raised a flag.
This isn't a failure of professionalism. It's a failure of biology. Human attention is a depleting resource. We know this from decades of research on cognitive fatigue, and we see it play out in procurement evaluation every single day.
The math nobody wants to talk about
Let's be specific. A typical complex procurement with 5 bidders and 40 criteria means 200 individual evaluations (5 bidders × 40 criteria). Each evaluation requires reading the relevant proposal section, understanding the requirement, comparing the two, and documenting a finding.
If each evaluation takes just 15 minutes (an optimistic figure for technical criteria), you're looking at 50 hours of pure evaluation work. For one procurement.
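To make the arithmetic explicit, here's a back-of-the-envelope sketch using the figures above. The script is purely illustrative:

```python
# Back-of-the-envelope workload for one complex procurement,
# using the figures from the scenario above.
bidders = 5
criteria = 40
minutes_per_evaluation = 15  # an optimistic figure for technical criteria

evaluations = bidders * criteria                        # 200 individual evaluations
total_hours = evaluations * minutes_per_evaluation / 60

print(f"{evaluations} evaluations, {total_hours:.0f} hours of evaluation work")
# -> 200 evaluations, 50 hours of evaluation work
```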
Now multiply that by the number of procurements a team handles per quarter. Latvia processed 11,421 procurement procedures in 2024, resulting in 21,558 contracts. And here's the stat that tells the real story: 73% of those contracts were awarded on lowest price alone. Not because quality doesn't matter, but because proper quality evaluation requires time that doesn't exist.
Something always gives. Usually it's depth.
What "thoroughness" actually costs
Here's what bothers us about how AI in procurement is typically marketed: the focus on speed. "Evaluate proposals 10x faster!" "Cut evaluation time by 80%!"
Speed isn't the point. Thoroughness is.
When a procurement specialist spends 4 hours evaluating a bid manually, the problem isn't that 4 hours is too long. The problem is that 4 hours isn't enough to read 200 pages carefully, cross-reference against 40 requirements, and document every finding with evidence.
What you actually want is 12 hours of thorough analysis. You just can't afford 12 hours per bid when you have 5 bids and other procurements queued behind this one.
An AI agent that analyzes a proposal in 2 hours isn't valuable because it saved 2 hours. It's valuable because it read every page with the same level of attention — something that's structurally impossible for a human evaluator working under real-world constraints.
The types of things that hide on page 200
We've been running AI evaluations on real procurement documents for months. Certain patterns keep showing up in what the agent catches that manual evaluation typically doesn't:
Contradictions between sections. A bidder commits to 99.9% uptime in the executive summary but defines "uptime" differently in the SLA annex, effectively lowering the commitment to 99.5%.
Missing specifics behind general claims. "Our team has extensive experience in similar projects" without naming a single project, reference, or relevant credential.
Partial compliance dressed as full compliance. The requirement asks for ISO 27001 certification. The bidder says they "follow ISO 27001 practices" — which is not the same thing as being certified.
Pricing inconsistencies. A line item in the financial proposal that doesn't account for a service described as included in the technical proposal. Or per-unit prices that don't multiply correctly to the totals (a check simple enough to sketch in code after this list).
Conditional language. "Subject to..." or "assuming that..." clauses that quietly limit what the bidder is actually committing to, buried deep in technical specifications.
None of these are things evaluators are incapable of catching. They're things evaluators are unlikely to catch under normal time pressure, because catching them requires reading multiple sections with perfect recall and comparing language precisely.
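To make the pricing check concrete, here is a minimal sketch of the arithmetic cross-check mentioned above. It's illustrative only: the line-item layout, the sample figures, and the rounding tolerance are assumptions made for this sketch, not a description of how any particular agent works.

```python
# Illustrative cross-check: do per-unit prices multiply out to the stated totals?
# The field layout, sample figures, and 0.01 tolerance are assumptions for this sketch.
line_items = [
    # (description, unit_price, quantity, stated_total)
    ("Server licences, year 1", 1200.00, 25, 30000.00),
    ("Support hours, year 1",    450.00, 12,  5600.00),  # 450 * 12 = 5400, not 5600
]

for description, unit_price, quantity, stated_total in line_items:
    computed_total = unit_price * quantity
    if abs(computed_total - stated_total) > 0.01:
        print(f"Flag: {description}: stated {stated_total:.2f}, "
              f"computed {computed_total:.2f}")
# -> Flag: Support hours, year 1: stated 5600.00, computed 5400.00
```

The arithmetic itself is trivial; any spreadsheet can do it. The hard part, and the reason such errors slip through under time pressure, is extracting the figures from two different documents and remembering to compare them at all.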
The uncomfortable implication
If we accept that human evaluators structurally cannot give equal attention to every page of every proposal — and we should accept this, because it's simply how attention works — then we have to ask what that means for the integrity of procurement decisions.
Not every missed issue changes the outcome. But some do. A hidden condition that limits a commitment. A compliance claim that doesn't hold up under scrutiny. A pricing inconsistency that understates the real cost.
When public money is at stake, "we probably caught the important stuff" is a weaker position than "we can show you exactly what was checked and what was found."
What this means in practice
We're not arguing that AI should evaluate bids. We're arguing that AI should read bids — fully, carefully, with evidence — so that procurement specialists can evaluate based on complete information rather than whatever they had time to catch.
The specialist's job is judgment. Whether a finding is material. Whether a gap can be addressed. Whether the risk is acceptable. Those are human calls that require experience, context, and professional discretion.
The AI's job is making sure those calls are based on everything in the documents, not just what someone had time to read before the deadline.
That's not about replacing anyone. It's about fixing a structural problem that everyone in the profession knows exists but has had no way to solve. Until recently.