The Anomaly: A FinOps Detective Story

This case is reconstructed from a customer engagement; identifying details have been removed. It begins with a 14% month-over-month spend spike on what should have been a stable workload. It ends, eleven days later, with a recovery of roughly 1.4 million BDT and a FinOps practice that did not exist before the investigation started. Names are fictional; the sequence is real. The story is told here as a detective procedural rather than as a framework summary, because the framework summary is what every FinOps vendor delivers and is not, in our experience, what teaches the discipline.

+14%: Month-over-month spend spike that started it

1.4M BDT: Recovered after the investigation

11 days: Time from anomaly detection to fix

Tagging: Single discipline that made the rest possible

The investigation

Eleven days, six discoveries, one FinOps practice

Day 0: Anomaly fires

Cost-and-usage dashboard flags a 14% spend spike vs a stable trailing baseline.

Day 1: Untagged drill-down

Roughly 11% of the spike sits in untagged resources. We cannot allocate them to a BU.

Day 2: Tagging audit

Tagging coverage is 84%. The 16% gap explains most of the visibility loss.

Day 4: Idle disks discovered

320 unattached EBS-equivalent volumes from a six-month-old test campaign. Combined: 28 TB at premium tier.

Day 6: Oversized DB instance

A reporting database had been sized for a launch peak in early 2024 and never resized. Memory utilisation was 12%.

Day 8: Reserved instance shortfall

Steady-state compute that should have been on RIs was on demand. ~22% over-spend on that footprint alone.

Day 11: Recovery + practice spec

1.4M BDT recovered. FinOps charter signed: 1 central role, 4 BU ambassadors, monthly review cadence.

Source: Reconstructed customer engagement, 2025.
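The Day 0 trigger can be sketched as a comparison of the latest month against a trailing baseline. This is a minimal illustration, not the customer's dashboard logic: the function name `spend_anomaly`, the three-month window, and the 10% threshold are assumptions, and the spend figures are invented to reproduce a 14% jump.

```python
from statistics import mean

def spend_anomaly(monthly_spend, window=3, threshold=0.10):
    """Return (is_anomaly, pct_change) for the latest month vs a trailing mean.

    monthly_spend: list of monthly totals, oldest first.
    window and threshold are illustrative defaults, not values from the engagement.
    """
    if len(monthly_spend) < window + 1:
        raise ValueError("need at least window + 1 months of data")
    # Baseline is the mean of the trailing months, excluding the latest one.
    baseline = mean(monthly_spend[-(window + 1):-1])
    latest = monthly_spend[-1]
    pct_change = (latest - baseline) / baseline
    return pct_change > threshold, pct_change

# Hypothetical figures: a flat baseline followed by a 14% jump.
flagged, change = spend_anomaly([100.0, 100.0, 100.0, 114.0])
print(flagged, round(change, 2))  # True 0.14
```

A trailing mean is the simplest possible baseline; a workload with seasonality would likely need a comparison against the same period in earlier cycles instead.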

What the investigation taught

84%: Starting tag coverage that hid most of the recoverable spend

The target was 95% or higher within a quarter. Below that threshold, allocation breaks and chargeback fictions take over.
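A coverage figure like the 84% above can be computed in a few lines once a resource inventory exists. The sketch below assumes coverage is measured by resource count (measuring by spend is another common choice) and that a resource counts as covered only when it carries all three tags the article names. The inventory schema and the exact tag-key spellings are hypothetical.

```python
# Tag keys named in the article; the exact key spellings are assumed.
REQUIRED_TAGS = ("owner", "environment", "cost-centre")

def tag_coverage(resources, required=REQUIRED_TAGS):
    """Fraction of resources carrying a non-empty value for every required tag."""
    if not resources:
        return 1.0  # an empty inventory is treated as fully covered
    tagged = sum(
        1 for r in resources
        if all(r.get("tags", {}).get(key) for key in required)
    )
    return tagged / len(resources)

# Hypothetical inventory: 84 fully tagged resources and 16 with gaps.
full = {"owner": "a", "environment": "prod", "cost-centre": "cc1"}
inventory = (
    [{"id": f"r{i}", "tags": dict(full)} for i in range(84)]
    + [{"id": f"u{i}", "tags": {"owner": "a"}} for i in range(16)]
)
coverage = tag_coverage(inventory)
print(f"{coverage:.0%}")  # 84%
```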

What an idle disk taught about ownership

The 320 unattached volumes are worth a closer look. They were created six months earlier, during a marketing-led test campaign, by an engineer who had since rotated off the team. Nobody owned them: they carried no owner, environment, or cost-centre tag. The cleanup itself was trivial (fifteen lines of Terraform), but the reason they existed was the practice’s first lesson. Without tagging at provision time and a regular orphan-resource sweep, idle assets accumulate at roughly the rate of organisational change. The cleaner the provisioning workflow, the smaller the orphan tail.
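A regular orphan-resource sweep can be as simple as filtering an inventory for unattached volumes past a grace period. The sketch below is an illustration under assumptions, not the engagement's tooling: the volume schema, the `find_orphan_volumes` name, and the 30-day grace period are hypothetical, and in practice the inventory would come from the cloud provider's API. It only reports candidates; deletion is left to a human decision.

```python
from datetime import date

def find_orphan_volumes(volumes, today, min_age_days=30):
    """Return unattached volumes at least min_age_days old, largest first.

    Each volume is a dict with 'id', 'attached_to' (None when unattached),
    'size_gb', 'created' (a date), and 'tags'. This schema is hypothetical.
    """
    orphans = [
        v for v in volumes
        if v["attached_to"] is None
        and (today - v["created"]).days >= min_age_days
    ]
    return sorted(orphans, key=lambda v: v["size_gb"], reverse=True)

# Invented sample data: one old orphan, one attached disk, one recent orphan.
volumes = [
    {"id": "vol-a", "attached_to": None, "size_gb": 100, "created": date(2025, 1, 10), "tags": {}},
    {"id": "vol-b", "attached_to": "vm-1", "size_gb": 500, "created": date(2025, 1, 10), "tags": {"owner": "data"}},
    {"id": "vol-c", "attached_to": None, "size_gb": 50, "created": date(2025, 6, 28), "tags": {}},
]
orphans = find_orphan_volumes(volumes, today=date(2025, 7, 1))
print([v["id"] for v in orphans])  # ['vol-a']
```

The grace period keeps the sweep from flagging disks that were detached minutes ago during routine maintenance.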

Where the savings actually came from

Distribution of the 1.4M BDT recovery, by source

Idle resource cleanup (stopped VMs, orphaned disks, unused IPs): 31%

Compute rightsizing (smaller SKU, fewer cores): 24%

Commitments (RI / Savings Plans): 21%

Storage class & retention tuning: 13%

Spot / preemptible adoption: 11%

Source: Single-engagement breakdown; consistent with the broader FinOps composite.
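To make the shares concrete, the arithmetic below converts each percentage into an approximate BDT amount. Because the 1.4M total is itself a rounded figure, the results are indicative only; the variable names are ours.

```python
# The article's rounded figure ("roughly 1.4 million BDT"), so results are approximate.
TOTAL_RECOVERED_BDT = 1_400_000

SHARES_PCT = {
    "Idle resource cleanup": 31,
    "Compute rightsizing": 24,
    "Commitments (RI / Savings Plans)": 21,
    "Storage class & retention tuning": 13,
    "Spot / preemptible adoption": 11,
}

# The published shares account for the whole recovery.
assert sum(SHARES_PCT.values()) == 100

# Integer arithmetic avoids floating-point rounding on the currency amounts.
amounts = {
    source: TOTAL_RECOVERED_BDT * pct // 100
    for source, pct in SHARES_PCT.items()
}

for source, amount in amounts.items():
    print(f"{source:<34} {amount:>9,} BDT")
```

On these figures, idle resource cleanup alone accounts for about 434,000 BDT, more than storage tuning and spot adoption combined (about 336,000 BDT).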

The maturity that emerged

The investigation made one thing concrete: FinOps is not a tool you buy, it is a practice you run. A small central group sets standards and runs the data pipeline; engineering ambassadors in each BU implement optimisations on the ground; finance handles invoicing, chargeback mechanics, and reconciliation. The role mix matters more than the headcount. A central team without engineering credibility is ignored; ambassadors without central guidance freelance into inconsistent practice; finance without engineering partnership invoices people for things they cannot control.

Why tagging is the y-axis of every FinOps decision

The investigation’s second lesson, and the one we now teach first to new customers, is that tagging is the foundational discipline. Without tags, allocation breaks. Without allocation, chargeback fictions take over. Without chargeback, the feedback loop that drives behaviour change does not exist. Mature FinOps practices enforce tagging at provisioning time — Terraform modules that fail without owner, environment, and cost-centre tags; cloud-native policy engines that block untagged resources; nightly reports that name the senior leader of any BU below 95% coverage. Each of these is unglamorous and high-leverage.
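The two enforcement points described above, a provision-time gate and a nightly coverage report, can be sketched in plain Python. This is an illustration of the logic only; a real deployment would express the gate in Terraform validation or a cloud policy engine. The function names, BU names, and coverage values are hypothetical.

```python
# Tag keys named in the text; the exact key spellings are assumed.
REQUIRED_TAGS = ("owner", "environment", "cost-centre")

def missing_tags(tags, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty."""
    return [key for key in required if not tags.get(key)]

def check_provision_request(name, tags):
    """Provision-time gate: raise ValueError if required tags are missing."""
    missing = missing_tags(tags)
    if missing:
        raise ValueError(f"{name}: missing required tags: {', '.join(missing)}")

def bus_below_target(coverage_by_bu, target=0.95):
    """Nightly report: BUs whose tag coverage is under target, worst first."""
    below = [(bu, cov) for bu, cov in coverage_by_bu.items() if cov < target]
    return sorted(below, key=lambda item: item[1])

# Hypothetical BU names and coverage values.
report = bus_below_target({"retail": 0.97, "payments": 0.84, "logistics": 0.91})
print(report)  # [('payments', 0.84), ('logistics', 0.91)]
```

Failing the request outright, rather than warning, is what makes the gate effective: an untagged resource never exists long enough to become an orphan.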

What the FinOps Foundation framework gets right

The framework’s three phases — Inform, Optimise, Operate — are the right framing. Inform is visibility, allocation, and benchmarking. Optimise is rightsizing, commitment management, and cleanup. Operate is the continuous improvement loop, the engineer-facing cost signals, and the cultural posture that makes the savings durable. Most organisations get stuck at the boundary between Inform and Optimise, because Inform is largely a data problem and Optimise is a behaviour-change problem. The FinOps practice that ships savings is the one that takes the behaviour-change problem seriously.

The first ninety days

The ninety-day plan covers tagging to 95% coverage, allocation to BU, the first of three quarterly rightsizing waves, a commitment-management cadence, and anomaly-detection alerts flowing to BU leadership. Six months in, the practice pays for itself. Two years in, the practice is one of the most strategically important cost-management capabilities the CFO has, because it is the only capability that scales linearly with the cloud bill.

The cultural posture that distinguishes a working practice

Three traits show up consistently in FinOps practices that work in their second and third year. The CFO and the platform-team head meet monthly with the same dashboard in front of them. Engineers see cost-per-environment in their CI pipeline output, alongside test results. And the architecture review board includes a cost criterion on every new design — not as an after-the-fact challenge but as a day-zero constraint. Each of these takes time to build; none of them is hard once the practice exists.