
Key takeaways
Maintenance backlogs grow for boring reasons. A work order opens for "vibration on Line 3 motor", the operator reports it, the technician swaps a bearing, the symptom is gone, but the original work order is never closed because closing it requires opening the CMMS, finding the right row, and clicking three things. A week later another operator reports the same symptom and opens a new work order. Now there are two. Multiply by 200 assets and 18 months and the backlog is unrecognisable.
The second source is duplicates from different reporters. Production opens "Line 3 packer slow." Maintenance opens "Line 3 packer bearing." Quality opens "Line 3 packer rejects up." All three are about the same asset, possibly the same root cause, but they sit as three separate rows. Without a deduplication pass, the backlog inflates without reflecting more actual work.
The third source is the absence of a clear close criterion. A work order is "in progress" because someone is working on it; it stays "in progress" indefinitely because nobody defined what "done" means for that work order type. The article on work order management systems covers the lifecycle states this method depends on.
The whole backlog gets exported to a spreadsheet and sorted by age, with rows older than 90 days at the top. This is the only step where the CMMS UI is not the primary tool: sorting and bulk-editing in a spreadsheet is faster for triage, and the results get re-imported afterwards.
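For teams that prefer a script to a spreadsheet, the age sort can be sketched in a few lines. This is a minimal illustration, not a CMMS integration: the column names `wo_id`, `asset` and `opened_on`, the ISO date format, and the sample rows are all assumptions about what an export might look like.

```python
import csv
import io
from datetime import date

STALE_DAYS = 90  # the 90-day threshold used in the text


def sort_by_age(csv_text: str, today: date) -> list[dict]:
    """Return rows oldest-first, each annotated with its age and a stale flag."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        opened = date.fromisoformat(row["opened_on"])  # assumes YYYY-MM-DD
        row["age_days"] = (today - opened).days
        row["stale"] = row["age_days"] > STALE_DAYS
    # Largest age first, so the oldest rows land at the top of the sheet
    return sorted(rows, key=lambda r: r["age_days"], reverse=True)


# Hypothetical export; a real one would come from the CMMS
export = """wo_id,asset,opened_on
WO-101,Line 3 motor,2024-01-10
WO-102,Line 3 packer,2024-05-20
WO-103,Line 1 conveyor,2023-11-02
"""

triage_sheet = sort_by_age(export, today=date(2024, 6, 1))
for row in triage_sheet:
    print(row["wo_id"], row["age_days"], row["stale"])
```

Passing `today` explicitly keeps the result reproducible, which helps when the same export is re-sorted later in the cleanup.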
For every row, one of:
Most of the 30-day work happens here. A two-person triage team (one maintenance lead, one production representative) can move through 300 rows in a week if they are not also closing the work orders themselves. Resist the urge to merge triage with execution; they require different brains.
Ghosts get closed with a "post-hoc verified, already done" reason code. Duplicates get merged into the oldest row. Wrong-type rows get routed to the right system (PM schedule for recurring work, parts request system for parts-only items). At the end of day 10 the backlog should be 40-60% smaller than the starting count, before any actual fix has been done.
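The three bulk actions above can be sketched as one pass over the labelled rows. This is an illustration under assumptions: the label names (`ghost`, `duplicate`, `wrong_type`, `real`), the `dup_group` field that ties duplicates together, and the status values are hypothetical, and a real CMMS will have its own fields and reason codes.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class WorkOrder:
    wo_id: str
    asset: str
    opened_on: date
    label: str                 # triage label: ghost / duplicate / wrong_type / real
    dup_group: str = ""        # hypothetical key shared by rows judged to be duplicates
    status: str = "open"
    note: str = ""
    merged_ids: list = field(default_factory=list)


def apply_triage(rows):
    """Close ghosts, route wrong-type rows, merge each duplicate group into its oldest row."""
    groups = {}
    for row in rows:
        if row.label == "ghost":
            row.status = "closed"
            row.note = "post-hoc verified, already done"
        elif row.label == "wrong_type":
            row.status = "routed"  # e.g. to the PM schedule or parts request system
        elif row.label == "duplicate":
            groups.setdefault(row.dup_group, []).append(row)
    for members in groups.values():
        members.sort(key=lambda r: r.opened_on)
        survivor, rest = members[0], members[1:]
        for dup in rest:
            dup.status = "merged"
            survivor.merged_ids.append(dup.wo_id)
    # Whatever is still open is the backlog that goes forward to re-ranking
    return [r for r in rows if r.status == "open"]


backlog = [
    WorkOrder("WO-1", "Line 3 motor", date(2023, 1, 5), "ghost"),
    WorkOrder("WO-2", "Line 3 packer", date(2023, 3, 1), "duplicate", "packer"),
    WorkOrder("WO-3", "Line 3 packer", date(2023, 3, 4), "duplicate", "packer"),
    WorkOrder("WO-4", "Line 3 packer", date(2023, 3, 9), "duplicate", "packer"),
    WorkOrder("WO-5", "Line 1 pump", date(2023, 4, 2), "wrong_type"),
    WorkOrder("WO-6", "Line 2 conveyor", date(2023, 5, 7), "real"),
]
survivors = apply_triage(backlog)
print(len(backlog), "->", len(survivors))
```

Keeping the merged row IDs on the survivor preserves the reporting history, so the three original reports remain traceable after the merge.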
Most CMMS priorities are wrong. They were set by the original reporter, who had context only for that one work order. Now that the backlog is half its original size, the maintenance manager can re-rank the survivors against each other:
A clean rule of thumb: if the plant has 100 assets, P1 should be under 10 rows. If P1 has 40 rows, "P1" no longer means anything and the whole priority system has collapsed. Re-rank until P1 fits the rule.
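The rule of thumb above is easy to turn into a check. The sketch below generalises "100 assets, under 10 P1 rows" to one P1 row per 10 assets; that linear scaling is an assumption for illustration, not something the rule itself states for other plant sizes.

```python
ASSETS_PER_P1_ROW = 10  # assumed generalisation of "100 assets -> under 10 P1 rows"


def p1_cap(asset_count: int) -> int:
    """P1 row count that the priority list should stay under."""
    return asset_count // ASSETS_PER_P1_ROW


def p1_within_rule(p1_rows: int, asset_count: int) -> bool:
    """True when the P1 list is still small enough to mean something."""
    return p1_rows < p1_cap(asset_count)


print(p1_within_rule(8, 100))   # 8 rows against a cap of 10
print(p1_within_rule(40, 100))  # the collapsed case described in the text
```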
Every surviving row gets a named owner: one person, not a department. Without this, P1 work sits because everyone assumes someone else has it. This is the second-highest leverage move in the method. See the framing in our piece on manufacturing KPIs for how ownership ties into trend metrics.
For the last 10 days, every working day has one named technician responsible for closing two P1 rows. Not "the team will close" but one person and two specific row IDs. At two rows a day, this produces 20 closed P1s in 10 days, which is usually most of the surviving P1 stack.
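A close-out schedule of this shape can be generated rather than hand-written. The sketch below is one possible approach: the technician names, the row IDs, the round-robin rotation, and numbering the days 21 to 30 are all assumptions made for the example.

```python
from itertools import cycle


def build_closeout_schedule(p1_ids, technicians, first_day=21, days=10, per_day=2):
    """Assign one named owner and `per_day` specific P1 row IDs to each working day."""
    schedule = []
    owners = cycle(technicians)  # simple rotation; a real roster may differ
    for offset in range(days):
        chunk = p1_ids[offset * per_day:(offset + 1) * per_day]
        if not chunk:
            break  # fewer P1 rows than slots, so the schedule ends early
        schedule.append({"day": first_day + offset, "owner": next(owners), "rows": chunk})
    return schedule


p1_ids = [f"WO-{n}" for n in range(1, 21)]   # 20 hypothetical P1 rows
technicians = ["Ana", "Ben", "Chidi"]         # hypothetical names
schedule = build_closeout_schedule(p1_ids, technicians)
for slot in schedule:
    print(slot["day"], slot["owner"], slot["rows"])
```

Each entry names exactly one person and the specific rows they own for that day, which is the property the method relies on.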
This is what makes the cleanup last. Three standing rules to lock in:
For a typical 100-asset mid-market plant, a healthy backlog is 30-60 open rows. Bigger than that and demand is overwhelming capacity (or duplicates are creeping back). Smaller than that and the system is no longer capturing real demand. Watch the count weekly; the trend matters more than the absolute number. Pair the backlog count with the broader preventive maintenance schedule so the team knows what proportion of effort is reactive vs planned.
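A weekly check against that band, plus the week-over-week change, might look like the sketch below. The 30 and 60 bounds come from the text for a 100-asset plant; the status labels and the sample counts are invented for illustration.

```python
def backlog_status(open_rows: int, low: int = 30, high: int = 60) -> str:
    """Classify the open-row count against the healthy band for a 100-asset plant."""
    if open_rows > high:
        return "over"     # demand exceeding capacity, or duplicates creeping back
    if open_rows < low:
        return "under"    # the system may not be capturing real demand
    return "healthy"


def weekly_trend(counts: list[int]) -> list[int]:
    """Week-over-week change in open rows; negative means the backlog shrank."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


weekly_counts = [220, 150, 95, 58]  # hypothetical weekly snapshots
print(backlog_status(weekly_counts[-1]))
print(weekly_trend(weekly_counts))
```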
A plant that starts at 220 open work orders typically lands at:
The "real" backlog turns out to be much smaller than the open-row count suggested. The plant has not added capacity. It has just stopped pretending that 220 rows reflected 220 separate units of work.
The triage steps above are tool-agnostic: they work in any CMMS, including a spreadsheet. What changes when the CMMS is built on a unified OEE+CMMS foundation is the prevention side. Cluster threshold rules and OEE-event linking (covered in the article on work order management systems) catch the recurring causes that produce duplicates in the first place. Fabrico is built for that workflow. To see what cleanup looks like against your live data, book a demo.
Scale the triage team. The method is the same, but the day-1-to-10 phase needs more bodies. A 500-row backlog with two triagers takes ~3 weeks of triage; with four, it fits in 10 days. Do not let the method stretch to 60+ days, because the freshness of context decays.
Yes. In plants that have not done a cleanup in 18+ months, ghosts are usually the single largest delete bucket. The longer the gap since the last cleanup, the higher the ghost share. Plants on a quarterly cleanup cadence keep it low.
Open PM tasks are a separate population. They have their own due dates and their own success measure (on-time completion rate). Mixing them into the reactive backlog hides the picture of both. Keep them separate even if they live in the same CMMS.
Yes, with care. The number can drop dramatically in the first 30 days and look like a miracle. Then it stabilises in the 30-60 range and looks like nothing is happening. Pair the count with the trend of P1 rows specifically, which is the more meaningful signal.
Letting duplicate detection lapse. If the reporter is not prompted to confirm or merge on every new open, duplicates reappear within a quarter and the backlog re-inflates. The duplicate-detection rule is the one with the lowest engineering cost and the highest payoff over time.
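As a rough sketch of what such a rule could look like at intake, the function below returns the open rows a reporter would be asked to confirm or merge into before a new work order is created. Matching on asset name alone is an assumption made for the example; a production rule would likely also use time windows or symptom text, and the field names are hypothetical.

```python
def merge_candidates(new_asset: str, open_rows: list[dict]) -> list[dict]:
    """Open work orders on the same asset, to show the reporter before a new row is created."""
    key = new_asset.strip().lower()
    return [
        row for row in open_rows
        if row["status"] == "open" and row["asset"].strip().lower() == key
    ]


open_rows = [
    {"wo_id": "WO-201", "asset": "Line 3 packer", "status": "open"},
    {"wo_id": "WO-202", "asset": "Line 3 packer", "status": "closed"},
    {"wo_id": "WO-203", "asset": "Line 1 conveyor", "status": "open"},
]

candidates = merge_candidates("line 3 packer ", open_rows)
if candidates:
    print("Existing open work orders on this asset:", [c["wo_id"] for c in candidates])
```

An empty result means the new work order can be opened without a prompt.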