From Vibes to Ground Truth: Why Procurement Savings Claims Fail Finance Scrutiny

Why most procurement decisions cannot survive a serious question from finance.

The short version. Most procurement savings claims fall apart under a finance review. The reason is foundational: spend data is partial, supplier records are stale, baselines are missing, and the working is held in someone’s head. This article shows what a structured data foundation looks like in supplier risk, savings verification, and category strategy – and quantifies what running on intuition actually costs.

Procurement had claimed $14.2m of savings for the financial year. Finance was prepared to credit $3.1m, with another $2.4m if procurement could produce baselines. The remaining $8.7m, roughly sixty-one percent of the claim, went into a category called “unverified” and stayed there.

The setting was a savings review at a mid-market professional services group with $680m of addressable spend across legal panels, contingent labour, technology, and facilities. Forty-three minutes in, the CFO had asked one question. “Of that fourteen point two, how much landed in my P&L?” The pause that followed was the kind that finishes careers.

Both sides were convinced they were right. Neither could prove it.

This is the gap I want to talk about. It is not a tooling gap. It is not a talent gap. It is a gap in the data foundation underneath the function. In most procurement organisations I work with, decisions are recorded as outcomes, not as evidence chains. The contract is signed, the supplier is selected, the saving is claimed, and the only artefact left behind is the conclusion. The reasoning, the alternatives, the assumptions about the market, the baseline price, the index reference, the risk weighting, all of it lives in someone’s head, in a sourcing lead’s working file, or in a slide deck that nobody can find six months later.

When the CFO, the audit committee, or the regulator asks “show me the working”, procurement reaches for the deck and finds it does not answer the question.

That is the difference between vibes and ground truth.

The Procurement Data Foundation Problem

Procurement runs, in the main, on opinion dressed as analysis. I do not say that to be unkind. I say it because the operating data underneath most functions is genuinely incapable of supporting anything else. Spend data is partial. Supplier master data is stale. Contract metadata is missing or unstructured. Index references are not linked to the categories they should be benchmarking. Should-cost models, where they exist, are spreadsheets owned by one analyst who left two years ago.

In that environment, a category manager has three choices. Act on intuition dressed up as expertise. Borrow conviction from a supplier or a consultant. Or stop and try to build the evidence base from scratch, which the calendar will not permit. So they choose option one or two, they make defensible-sounding decisions, and the decisions go onto a slide.

The slide does not record what was missing. That is the trap.

What you want is a function where any decision, taken on any day, can be reconstructed later from the data alone. Why this supplier and not the other two. What the baseline was. What the market was doing. What the risk profile looked like. What the should-cost said. What changed between sign-off and renewal. Without that foundation, you do not have a procurement function. You have a sequence of judgements that happened to produce contracts. Some of those judgements were good. You cannot tell which.

Structured data is what gives procurement interpretability – the ability to look inside a decision after the fact and ask whether it held up. It is the only way the answer to “show me the working” exists at all. Most functions discover where this foundation actually sits on the maturity curve only when they run a structured procurement maturity assessment against peers, rather than against their own assumptions.

Three Places the Vibes-vs-Ground-Truth Gap Costs Money

The gap costs real money in three places, and in each of them the difference between the two ways of working is visible.

Supplier Risk Management

In the unstructured version, supplier risk lives in the relationship. “Mark says they are reliable. We have used them for nine years. They have never let us down.” That sentence appears in risk reviews more often than I can count, and I have heard it from procurement leads I respect. It is not stupid. It encodes real information. The problem is that it cannot be stress-tested, it cannot be aggregated across a portfolio, and it cannot survive Mark leaving the business.

In the structured version, supplier risk is a composite of evidence. On-time-in-full performance over the last twelve and twenty-four months. Defect or rework rates by SKU or service line. Financial health from a third-party data provider, including payment behaviour, credit deterioration, and changes in directorship. Concentration risk by tier-one and, where you have line of sight, tier-two. Geographic and geopolitical exposure. Cyber posture, in categories where it matters. Scope 3 emissions data where you have it, with provenance recorded. ESG and modern slavery flags, refreshed on a defined cadence. Each of these is a field. Each field has a source. Each source has a refresh date. The risk score is a function of those fields, and you can reproduce it.
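To make "each field has a source, each source has a refresh date" concrete, here is a minimal Python sketch. The field names, the 0-to-1 scoring scale, the weights, and the 365-day staleness cut-off are all illustrative assumptions, not a recommended model. The only point is that a score built this way can be reproduced from its inputs, and that stale inputs are visible rather than hidden.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RiskField:
    name: str        # hypothetical field name, e.g. "financial_health"
    score: float     # assumed scale: 0.0 = low risk, 1.0 = high risk
    source: str      # where the value came from
    refreshed: date  # when that source was last refreshed


def stale_fields(fields, as_of, max_age_days=365):
    """Names of fields whose source is older than the allowed age."""
    return [f.name for f in fields if (as_of - f.refreshed).days > max_age_days]


def composite_risk(fields, weights):
    """Weighted average of the field scores, using assumed weights."""
    total_weight = sum(weights[f.name] for f in fields)
    return sum(f.score * weights[f.name] for f in fields) / total_weight


# Illustrative record for one supplier; every value here is made up.
supplier = [
    RiskField("otif_12m", 0.2, "ERP delivery records", date(2024, 5, 1)),
    RiskField("financial_health", 0.8, "third-party credit data", date(2024, 4, 15)),
    RiskField("concentration", 0.5, "spend cube", date(2022, 11, 30)),
]
weights = {"otif_12m": 2, "financial_health": 3, "concentration": 1}

score = composite_risk(supplier, weights)
stale = stale_fields(supplier, as_of=date(2024, 6, 30))
print(f"composite risk: {score:.2f}, stale fields: {stale}")
```

Keeping the staleness check separate from the score is a deliberate choice in this sketch: a supplier can look acceptable on the composite while one of its inputs has quietly aged out, which is the situation the relationship method cannot detect.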

Put the two versions side by side. In every diagnostic I have run, somewhere between fifteen and twenty-five percent of suppliers rated “low risk” by the relationship method are flagged as elevated by the structured method, usually because financial deterioration or concentration risk has moved underneath a long-standing commercial relationship. The relationship is real. The risk has changed. Mark has not noticed.

Procurement Savings Verification

The most expensive sentence in procurement is “we negotiated five percent off”. It is expensive because it cannot be defended.

Five percent off what. Off the previous contract price, which itself was negotiated against a market that has since moved. Off list price, which the supplier sets and adjusts at will. Off a should-cost model, which someone built two years ago and has not refreshed. Off a benchmark, which may or may not reflect the relevant geography, volume tier, and service specification. Each of these answers gives a different number, and only one of them, if any, is the answer the CFO will accept.

The structured version of the same claim is line-item price variance tracked against an indexed baseline. If you are buying a copper-bearing component, the baseline moves with LME copper plus a fabrication margin you have negotiated and recorded. If you are buying transport, the baseline moves with a published freight index plus a fuel pass-through you have specified. If you are buying contingent labour, the baseline moves with a regional rate index for the role family, refreshed quarterly. If you are buying packaging, the baseline moves with the relevant ABS PPI series or its equivalent in your geography. The savings claim is then the delta between what you actually paid, line by line, and what the indexed baseline would have predicted you would pay, holding volume and mix constant. (For the data and dashboarding side of this, Purchasing Index is where the indexed baseline and KPI work sits in the Comprara group.)
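The arithmetic behind an indexed baseline is simple enough to sketch in Python. This is an illustration under assumptions, not a prescribed method: it assumes the unit price splits into an index-linked material component and a fixed negotiated margin, and the class name, field names, and all the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    material_cost_at_baseline: float  # index-linked part of the unit price
    fixed_margin: float               # negotiated, recorded margin per unit
    index_at_baseline: float          # index value on the baseline date
    index_now: float                  # index value for the period being claimed
    actual_unit_price: float          # what was actually paid per unit
    volume: float                     # same volume applied to both sides


def indexed_baseline_price(item):
    """Unit price the indexed baseline predicts for the current period."""
    indexed_material = (
        item.material_cost_at_baseline * item.index_now / item.index_at_baseline
    )
    return indexed_material + item.fixed_margin


def verified_saving(item):
    """Positive means less was paid than the indexed baseline predicted."""
    return (indexed_baseline_price(item) - item.actual_unit_price) * item.volume


# Unit price rose from 100 to 110 while the index rose 20%.
rising_market = LineItem(80, 20, 100, 120, 110, 1_000)
# Unit price fell from 100 to 95 while the index fell 10%.
falling_market = LineItem(80, 20, 100, 90, 95, 1_000)

print(verified_saving(rising_market))
print(verified_saving(falling_market))
```

In the first made-up case the price went up and the claim is still positive, because the baseline moved further than the price did. In the second the price went down and the claim is negative: the reduction was smaller than the market move, so counting it as a saving would be claiming a tailwind.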

That number is defensible. You can walk a CFO through it. You can show that the saving is real even though the absolute price went up, because the index went up faster. You can show where the saving is concentrated, which suppliers delivered it, and which categories absorbed it. You can show where a claimed saving was actually a market tailwind that procurement had nothing to do with, and remove it from the claim. The credibility you buy by removing those line items is worth more than the savings you give up.

In the services group I started with, when we rebuilt the FY claim against indexed baselines for the categories where indices made sense, and against contracted rate cards refreshed against external benchmarks for the rest, the verifiable savings number came in at $9.6m. Lower than the original claim. Higher than what finance had been willing to credit. Defensible at every line.

That is what good looks like.

Category Strategy

The third place it shows up is in category strategy itself. The unstructured version sounds like this. “The steel market is tight. We should lock in a two-year deal now.” Or, “the contingent labour market has loosened, we should retender.” These statements are not wrong, exactly. They are just under-specified. Tight relative to what. Loosened how much, over what window, in which segments, in which geographies, against which substitutes.

The structured version starts with a category data sheet. For a steel-bearing category, that means the relevant index series (HRC, scrap, iron ore, energy input where it matters), refreshed at the cadence the market trades on. Volatility over the last twelve months. Forward curve where one exists. Capacity utilisation in the relevant region. Trade flow data and any tariff exposure. A should-cost model that takes those inputs and produces a target price for the specification you actually buy, not a generic spec. Supplier capacity at the tier you transact at. Substitute material economics where they are credible.

When you swap one for the other, the conclusions change. I have seen “the steel market is tight, lock in now” become “the input cost stack is tight, but mill margin has compressed, the right play is a shorter contract with an indexed pass-through and a renegotiation trigger if scrap moves more than fifteen percent”. I have seen “contingent labour has loosened, retender” become “headline rates have softened in two of seven role families, the rest are flat or up, retendering the whole panel will cost us more in transition than it saves, we should retender selectively”. Those are different decisions. They lead to different contracts and different outcomes. The data did not change the direction of travel. It changed where you spent your effort and what you committed to.
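A renegotiation trigger of the kind described in the steel example is easy to state precisely once the index reference is recorded. The Python sketch below is an assumption-laden illustration: it treats a move in either direction as a trigger and measures the move against the index value at signing, neither of which is specified above, and the function name and sample values are hypothetical.

```python
def renegotiation_triggered(index_at_signing, index_now, threshold=0.15):
    """True when the index has moved more than the threshold since signing.

    Assumption: the move is measured relative to the value at signing,
    and a fall counts the same as a rise.
    """
    move = abs(index_now - index_at_signing) / index_at_signing
    return move > threshold


# Made-up scrap index values, for illustration only.
print(renegotiation_triggered(400, 470))  # 17.5% rise
print(renegotiation_triggered(400, 450))  # 12.5% rise
print(renegotiation_triggered(400, 330))  # 17.5% fall
```

The useful part is not the code but what it forces you to write down: which index, which reference date, which threshold, and whether the clause is symmetric.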

The Cost of Running Procurement on Vibes: A Worked Example

This is the number sceptical CPOs ask me for, and I am going to show the maths so it is defensible rather than rhetorical. One caveat before I run it: this is not a benchmark. It is a worked diagnostic model using assumptions I would expect to test in discovery on any specific client. The point of showing the working is so a CPO can swap their own assumptions in and rerun it for their own function, not so they can quote my numbers back at the board.

Take a mid-market enterprise with $500m in addressable spend. Reasonable assumptions, drawn from the diagnostics I run, are as follows.

First, savings claim leakage. In organisations without indexed baselines, the gap between procurement-claimed savings and finance-credited savings runs at fifty to seventy percent. If procurement claims savings of three percent of addressable spend, that is $15m of claimed value. If finance credits forty percent of it, a sixty percent gap at the midpoint of that range, $9m of claimed value is rejected or unverified. The leakage is not all real saving foregone, because some of those claims were never real to begin with, but the credibility cost is real. For modelling, assume half of the rejected claims represent genuine value that the function failed to defend, so $4.5m of real, foregone, defensible savings.

Second, decisions made on stale market data. In categories without live index tracking, my rough working estimate is that fifteen to twenty percent of contract decisions are made against a market reference that is more than six months out of date. On a $500m base, with roughly forty percent of spend in categories that move with traded inputs, that is $200m of spend exposed. A conservative mispricing of two percent on those decisions, weighted by the share that goes wrong, gives $1.5m to $2m a year in value left on the table.

Third, supplier risk events that were foreseeable. Across the diagnostics I have run, the median client experiences one to two material supplier disruptions a year that, in hindsight, were visible in third-party data eighteen to thirty-six months before they happened. The cost of a single disruption, including expedited freight, line-down events, customer penalty, and emergency resourcing, runs from $500k for a small supplier to multiple millions for a strategic one. Assume one foreseeable event a year at a blended cost of $1.2m.

Fourth, category strategy mis-prioritisation. When effort is allocated by intuition rather than by data-driven category segmentation, between ten and twenty percent of sourcing capacity is spent on categories where the upside is small relative to the work involved. On a function of fifteen FTE at a fully loaded $180k each, that is between $270k and $540k of capacity misapplied. Use the midpoint, rounded to $400k.

Add it up. $4.5m of foregone defensible savings. $1.75m of mispriced contract decisions. $1.2m of foreseeable supplier risk events. $400k of mis-prioritised effort. That is approximately $7.85m a year, or roughly 1.6 percent of addressable spend, as the cost of running the function on vibes rather than ground truth.

You can quarrel with each of those assumptions. Halve all of them and you still get close to four million a year. Double them, which is what I see in functions where the data foundation is genuinely poor, and you are at fifteen million. The point is not the precision. The point is that the cost-of-inaction is a real number, it is bigger than what most CPOs are arguing for in their tooling and data budgets, and a sceptical CFO will respond to it more readily than to another upside savings pitch.
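For anyone who wants to swap in their own assumptions, the roll-up above can be written as a short Python script. This is a sketch of the worked example only: the first and fourth components are derived from the stated assumptions, while the second and third are used as stated in the text rather than re-derived. Variable names are my own labels, and the halve-and-double check scales each component directly.

```python
# All figures in whole dollars, taken from the four assumptions above.
ADDRESSABLE_SPEND = 500_000_000

# First: savings claim leakage.
claimed = ADDRESSABLE_SPEND * 3 // 100   # 3% of addressable spend claimed
rejected = claimed * 60 // 100           # finance credits 40%, so 60% is rejected
foregone_savings = rejected // 2         # half assumed to be genuine value

# Second and third: used as stated in the text, not re-derived here.
mispriced_decisions = 1_750_000          # midpoint of the $1.5m to $2m range
foreseeable_risk_event = 1_200_000       # one event a year at a blended cost

# Fourth: mis-prioritised sourcing capacity.
team_cost = 15 * 180_000                 # 15 FTE, fully loaded
misapplied_low = team_cost * 10 // 100
misapplied_high = team_cost * 20 // 100
misprioritised_effort = 400_000          # the text's rounded midpoint

components = {
    "foregone defensible savings": foregone_savings,
    "mispriced contract decisions": mispriced_decisions,
    "foreseeable supplier risk events": foreseeable_risk_event,
    "mis-prioritised effort": misprioritised_effort,
}
total = sum(components.values())
share_of_spend = total / ADDRESSABLE_SPEND


def scaled_total(factor):
    """Total cost if every component is scaled by the same factor."""
    return sum(value * factor for value in components.values())


print(f"total: ${total:,} ({share_of_spend:.2%} of addressable spend)")
print(f"halved: ${scaled_total(0.5):,.0f}, doubled: ${scaled_total(2):,.0f}")
```

Changing `ADDRESSABLE_SPEND`, the team size, or any of the stated components and rerunning is the whole exercise; the structure matters more than the particular numbers.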

If you walk into your next budget conversation with “doing nothing costs us approximately one and a half percent of addressable spend annually, here is the working”, you are having a different conversation than if you walk in with “platform X promises twelve percent savings”.

Self-Assessment: Four Questions Every CFO Will Ask

Take the last three savings claims your function reported. Pick them at random. For each one, ask whether you could walk a sceptical CFO through the working. Then ask the four questions a sceptical CFO will actually ask.

1. What was the baseline, and where did it come from? What good looks like: a baseline price tied to a specific contract line, a specific volume assumption, and either an external index or a documented benchmark with a date attached.

2. What did the market do over the same period, and how do you know? What good looks like: an index series or basket of references, refreshed at a cadence the market trades on, with the saving expressed as a delta against the indexed expectation rather than against the absolute prior price.

3. Where did the saving land, line by line, and which P&L did it flow to? What good looks like: a line-item bridge from baseline to actual, mapped to cost centres and to GL lines, with the share that hit P&L distinguished from cost avoidance and from working capital impact.

4. What changed in the supplier’s risk profile while you were saving the money? What good looks like: a structured supplier risk record showing financial, operational, and concentration risk, with refresh dates, before-and-after the contract event, so you can show the saving did not come at the cost of a risk you failed to surface.

If three out of four answers are “we would need to go and pull that together”, you are running on vibes. That is not a moral failing. It is a foundation problem, and foundations are fixable. They are just not fixable on the timeline of a single sourcing event, which is why most functions never get to it.

If your last savings claim would not survive this self-assessment, the conversation to have is not with another sourcing tool vendor. It is with whoever owns the architecture that produced the data you tried to defend. If you would rather start with your team’s capability gaps before the architecture conversation, the Skills Gap Analysis diagnostic is the other entry point.

What’s Next in This Series

Post 3 takes the argument one step further. Even if you do the work to structure your data, the question of who owns the structure is rarely the one being asked. When your spend taxonomy, your supplier master, your savings ontology, and your risk schema all live inside a vendor’s data model, you have not gained ground truth. You have rented it. The next post is about the switching cost trap, and what it looks like when you realise, three years in, that your data lives in their model.
