Last Tuesday, one of our test users uploaded procurement documents at 4 in the afternoon. A large multi-lot tender — 3 lots, 2 bidders per lot, combined documentation running to about 600 pages.
At 3am, while everyone involved was asleep, the AI agent was on its second bidder for Lot 3. Methodically comparing the bidder's proposed project team qualifications against the RFP requirements for that lot. Pulling quotes. Flagging a missing certification for one of the proposed subcontractors.
By 7am, the full analysis was done. Findings organized by lot, by bidder, by severity. Every issue linked to specific pages in both the RFP and the proposals. The procurement team walked in to find a complete evaluation package.
There's something both useful and slightly unsettling about that. An analysis running at 3am, nobody watching, churning through documents in the dark. It's not romantic. It's not what anyone pictures when they think about "the future of procurement." But it's genuinely useful.
The always-on question
The concept of AI availability isn't new. Chatbots have been "available 24/7" for years. But there's a difference between a chatbot waiting for your question and an agent actively working through a complex analysis while you're not there.
A procurement chatbot answers: "What are the qualification requirements for Lot 2?" An agent goes through every qualification requirement for every lot, compares them against what every bidder actually submitted, and delivers a finding for each one. Without being asked.
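To make the shape of that work concrete, here's a minimal sketch of the loop the agent effectively runs. Everything in it is illustrative: the `Finding` structure, the `compare_requirement` helper, and the field names are stand-ins, not our actual pipeline or API.

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    lot_id: str
    bidder: str
    requirement_id: str
    severity: str                                        # "critical", "major", "minor", or "ok"
    summary: str
    rfp_pages: list = field(default_factory=list)        # where the RFP states the requirement
    proposal_pages: list = field(default_factory=list)   # where the proposal addresses it

def compare_requirement(lot_id, requirement, bidder):
    # Stand-in for the real step: prompt the analysis model with the requirement
    # text plus the relevant proposal excerpts, and ask for a structured verdict
    # with supporting quotes and page references.
    return Finding(lot_id, bidder["name"], requirement["id"],
                   severity="ok", summary="(model verdict goes here)")

def evaluate_tender(lots):
    """Every lot, every requirement, every bidder: one comparison each, none skipped."""
    findings = []
    for lot in lots:
        for requirement in lot["requirements"]:
            for bidder in lot["bidders"]:
                findings.append(compare_requirement(lot["id"], requirement, bidder))
    # Organize the output the way the team reads it: by lot, then bidder, then severity.
    return sorted(findings, key=lambda f: (f.lot_id, f.bidder, f.severity))
```

The sorting at the end mirrors how the results get presented: findings grouped by lot, then bidder, then severity, each carrying its page references back into the source documents.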
The 3am part isn't the point. The point is that the analysis runs to completion regardless of business hours, team availability, vacation schedules, or competing priorities. The system doesn't have a Friday afternoon mode where attention drops.
This matters more for public procurement than for most other domains. Public procurement has deadlines. Hard ones. Miss the evaluation deadline, and you're rebidding or requesting extensions — both of which cost time and credibility.
What "tireless" actually means in practice
We use the word "tireless" in our marketing, and I want to be specific about what that means in practice, because "tireless AI" sounds like a buzzword.
A human evaluator doing focused document comparison — the kind where you're reading requirement A, then reading proposal section B, then comparing them — can sustain about 3-4 hours before quality drops measurably. After 6 hours, you're skimming. After 8 hours, you're making errors you won't catch until someone else reviews your work.
The AI agent doesn't have that curve. Its comparison of criterion 45 is as rigorous as its comparison of criterion 1. Its reading of bidder 5 is as thorough as its reading of bidder 1. There's no fatigue penalty.
That's not a feature you appreciate when you have one small procurement to evaluate. It's a feature that changes your capacity when you're handling a large, complex tender with tight timelines and multiple lots.
The verification layer (because trust is earned)
Here's the part I find genuinely interesting about building this kind of system: you can't just have the agent analyze and trust its output. That would be irresponsible.
So we built a verification layer. A separate AI model — different from the one doing the main analysis — reviews every finding flagged as critical. It checks whether the evidence actually supports the finding. It verifies that the quotes are accurate. It confirms that the severity assessment makes sense.
Roughly speaking: the analysis agent does the work, and the verification agent checks the work. Two different models, so they don't share the same blind spots.
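A rough sketch of how that split can look in code, assuming a generic `call_model(model, prompt)` function; the model names, prompt wording, and helper names here are placeholders, not our production setup.

```python
ANALYSIS_MODEL = "model-a"       # the model doing the main analysis
VERIFICATION_MODEL = "model-b"   # deliberately a different model family

def verify_finding(finding, rfp_excerpt, proposal_excerpt, call_model):
    """Ask the second model to independently check one critical finding."""
    prompt = (
        "You are reviewing another model's evaluation finding.\n"
        f"Finding: {finding.summary} (severity: {finding.severity})\n"
        f"RFP excerpt: {rfp_excerpt}\n"
        f"Proposal excerpt: {proposal_excerpt}\n"
        "Answer: (1) Does the evidence support the finding? "
        "(2) Are the quoted passages accurate against the excerpts? "
        "(3) Is the severity assessment reasonable?"
    )
    return call_model(VERIFICATION_MODEL, prompt)

def review_critical_findings(findings, get_excerpts, call_model):
    # Only findings the analysis model flagged as critical get the second pass.
    reviews = {}
    for finding in findings:
        if finding.severity == "critical":
            rfp_text, proposal_text = get_excerpts(finding)
            reviews[finding.requirement_id] = verify_finding(
                finding, rfp_text, proposal_text, call_model
            )
    return reviews
```

Passing `call_model` in as a parameter keeps the sketch independent of any particular provider; the point is simply that the checking model is not the model that produced the finding.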
Is this perfect? No. Nothing is. But it's a better quality control process than most manual evaluations have, where one person reads and one person maybe reviews if there's time.
The single-bidder problem, from a different angle
Latvia has a well-documented single-bidder problem. Out of over 182,000 active companies, only 4,321 unique companies won procurement contracts in 2024. That's 2.4%. The Foreign Investors' Council in Latvia (FICIL) made two public interventions in 2025, in June and November, calling for "urgent" reform and flagging favoritism concerns.
There are many causes — market size, specification design, industry concentration. But here's one angle that doesn't get discussed enough: the evaluation burden itself discourages participation.
When suppliers know that a complex tender means months of preparation and that 73% of contracts go to the lowest bidder anyway, the risk-reward calculation for smaller companies collapses. Why invest weeks preparing a quality proposal when evaluation is often just a price comparison?
More transparent, more thorough, more evidence-based evaluation could, over time, help shift that balance. When bidders know their technical strengths will actually be read and weighed — all of them, properly — the calculus changes. Quality-focused companies start competing again.
This isn't a quick fix. It's a structural argument. But it's one worth making.
The part we're still figuring out
I want to be honest about something. This technology is new, and we're still learning what it does well and where it stumbles.
Multi-language documents — where the RFP is in Latvian and some supporting documentation is in English — can create friction. The AI handles both languages, but nuances in Latvian legal terminology sometimes need human review.
Very large procurements with 10+ lots and hundreds of criteria push the system's capacity. The analysis runs longer, and the cost in AI tokens goes up. We're working on optimization, but there's a real computational cost to thorough analysis.
And occasionally, the agent flags something as a concern that turns out to be a non-issue when a domain expert reviews it. False positives happen. The goal is to minimize them without missing genuine issues — and that's a constant balance.
We'd rather the system over-flags than under-flags. A human expert can quickly dismiss a false positive. They can't find an issue that was never flagged.
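In threshold terms, that preference is just a deliberately low bar for flagging. The numbers below are placeholders to show the shape of the trade-off, not tuned values from our system.

```python
# Placeholder threshold, chosen low on purpose: err toward flagging.
FLAG_THRESHOLD = 0.3

def should_flag(issue_likelihood):
    """Flag a potential issue for human review if the model's estimated
    likelihood that something is wrong clears the (deliberately low) bar.

    A lower threshold means more false positives, which a reviewer can dismiss
    in seconds, and fewer false negatives, which nobody goes looking for.
    """
    return issue_likelihood >= FLAG_THRESHOLD
```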
Why we built this
I get asked sometimes why we built an AI agent specifically for procurement evaluation, of all things. It's not exactly a glamorous market.
The answer is pretty simple: EUR 2 trillion in public money flows through EU procurement every year. In Latvia alone, it's EUR 5.45 billion. How that money gets spent affects infrastructure, healthcare, education, defense — basically everything a government does.
And the people responsible for evaluating how it gets spent are working with PDFs and spreadsheets, under constant time pressure, with no way to read every page of every proposal.
That felt like a problem worth solving.