What Is an Equipment Failure Mode in Manufacturing? A Plain-English Guide


Key Takeaways

 

  • A failure mode is a specific way in which a piece of equipment fails to perform its required function. The same equipment can have multiple distinct failure modes, each with different causes, different detection characteristics, and different appropriate maintenance responses.
  • Understanding failure modes is the foundation of every effective maintenance program. Preventive maintenance (PM) intervals, condition monitoring strategies, spare parts stocking decisions, and inspection task content all flow from failure mode analysis.
  • Failure modes are not the same as failure causes or failure effects. These three concepts are distinct, and confusing them produces maintenance interventions that address symptoms rather than root causes.
  • Most manufacturing assets have between three and eight significant failure modes that account for the majority of their maintenance cost and unplanned downtime.
  • The failure mode determines the appropriate maintenance strategy. A failure mode with a detectable precursor signal warrants condition-based maintenance. A failure mode with no detectable precursor signal warrants preventive replacement or run-to-failure depending on the failure consequence.

What a Failure Mode Is

 

A failure mode is a specific way in which an asset fails to perform what it is required to do.

The word "specific" is important here.

"The pump stopped working" is not a failure mode.

It is a failure event.

The failure modes that could produce that failure event are many and distinct.

The motor driving the pump failed due to bearing seizure.

The impeller eroded through abrasive service and can no longer produce the required flow rate.

The mechanical seal failed, allowing process fluid to leak into the motor.

The pump casing cracked due to water hammer from valve operation upstream.

Each of these is a specific failure mode.

Each has a different physical cause, a different rate of development, a different detectability characteristic, and a different appropriate maintenance response.

Lumping them together as "pump failure" and applying a single PM interval to all of them is the maintenance planning error that over-maintains components for one failure mode while missing failures from another.

 

Failure Mode vs. Failure Cause vs. Failure Effect

These three terms are frequently confused in manufacturing maintenance discussions, and the confusion produces maintenance programs that address the wrong level of the failure chain.

 

Failure effect

The failure effect is what happens as a result of the failure mode occurring.

Production stops.

Product quality falls outside specification.

A safety hazard is created.

The failure effect is what the operator or supervisor notices first and what generates the maintenance work request.

 

Failure mode

The failure mode is the specific way the asset failed to perform its required function.

The impeller eroded and can no longer produce the required flow rate.

The bearing seized and the shaft can no longer rotate.

The seal failed and process fluid is leaking.

The failure mode is what the technician diagnoses when they arrive at the asset.

Failure cause

The failure cause is the specific physical, chemical, or operational event that produced the failure mode.

The impeller eroded because the process fluid contains abrasive particles above the design specification for this pump model.

The bearing seized because the lubrication interval was too long for the actual duty cycle of this pump.

The seal failed because the process fluid temperature exceeded the seal material's operating range.

The failure cause is what root cause analysis reveals after the failure mode is diagnosed.

 

Why the distinction matters for maintenance

A maintenance program designed to address the failure effect replaces the pump every time it stops working.

A maintenance program designed to address the failure mode inspects and replaces the impeller when erosion measurement shows it approaching minimum thickness, lubricates the bearing at the correct interval for the duty cycle, and monitors seal condition as a leading indicator of failure.

A maintenance program designed to address the failure cause installs an upstream filtration system to remove abrasive particles, adjusts the lubrication frequency to match actual duty cycle, and upgrades the seal material specification to match the process temperature.

Each level of intervention is more effective than the previous one.

Each requires more specific knowledge of the failure chain.

Failure mode analysis is the step that makes cause-level intervention possible.

 

How to Identify Failure Modes for Manufacturing Assets

Failure mode identification is the analytical starting point for improving any maintenance program.

Four information sources provide the input to failure mode identification.

 

Maintenance work order history

The corrective work order records from the last 12 to 24 months contain the most operationally relevant failure mode information available for a specific asset in a specific facility.

Reviewing closed work orders for a specific asset and grouping them by fault description reveals the failure modes that have historically occurred, their frequency, and their associated repair cost and downtime.

The quality of this analysis depends directly on the quality of fault code specificity in the work order records.

Work orders closed with "mechanical fault" as the failure code cannot support failure mode analysis.

Work orders closed with "bearing failure, outer race fatigue, motor drive end" provide the specific failure mode information that analysis requires.
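
The grouping step described above can be sketched in a few lines of Python. This is a minimal illustration, not a CMMS integration: the asset tag `P-101`, the field names (`fault`, `cost`, `downtime_h`), and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical closed corrective work orders. The field names are
# illustrative only and do not follow any particular CMMS schema.
WORK_ORDERS = [
    {"asset": "P-101", "fault": "bearing failure, outer race fatigue, motor drive end",
     "cost": 1800.0, "downtime_h": 6.0},
    {"asset": "P-101", "fault": "seal failure, mechanical seal, process side",
     "cost": 950.0, "downtime_h": 3.0},
    {"asset": "P-101", "fault": "bearing failure, outer race fatigue, motor drive end",
     "cost": 2100.0, "downtime_h": 8.0},
    {"asset": "P-101", "fault": "mechanical fault",
     "cost": 400.0, "downtime_h": 1.5},
]


def summarize_by_fault(work_orders, asset):
    """Group one asset's work orders by fault description and total them."""
    summary = defaultdict(lambda: {"count": 0, "cost": 0.0, "downtime_h": 0.0})
    for wo in work_orders:
        if wo["asset"] != asset:
            continue
        entry = summary[wo["fault"]]
        entry["count"] += 1
        entry["cost"] += wo["cost"]
        entry["downtime_h"] += wo["downtime_h"]
    # Most frequent fault first; ties are broken by total downtime.
    return sorted(
        summary.items(),
        key=lambda kv: (kv[1]["count"], kv[1]["downtime_h"]),
        reverse=True,
    )


for fault, totals in summarize_by_fault(WORK_ORDERS, "P-101"):
    print(fault, totals)
```

Note that the vague "mechanical fault" record still appears in the summary, but it cannot be assigned to any failure mode, which is exactly the data quality problem described above.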

 

Manufacturer technical documentation

Equipment manufacturers typically document known failure modes for their equipment in maintenance manuals, service bulletins, and reliability data sheets.

This documentation provides a starting point for failure mode identification, particularly for new equipment with limited operating history in the specific facility.

Manufacturer documentation should be validated against actual facility experience rather than accepted uncritically, because the failure modes a manufacturer documents reflect average operating conditions that may differ significantly from the specific facility's duty cycle, process fluids, and environmental conditions.

 

Operator and technician knowledge

The experienced technician who has worked on a specific asset type for several years carries detailed knowledge of how it fails in practice in this facility's operating conditions.

Structured conversations with experienced technicians, often called knowledge elicitation or expert interviews, surface failure mode knowledge that has never been formally documented and would otherwise disappear when the technician retires or changes roles.

 

Engineering analysis

For assets where operating history is limited and manufacturer documentation is insufficient, engineering analysis including stress analysis, material property assessment, and process chemistry review can identify failure modes from first principles.

This approach is most commonly used for new equipment introduction or for assets operating in unusual conditions that their original design did not anticipate.

 

Failure Mode Characteristics That Determine Maintenance Strategy

Once failure modes are identified, two characteristics of each mode determine the appropriate maintenance strategy.

 

Characteristic 1: Detectability — does the failure mode produce a detectable precursor signal?

Some failure modes develop gradually and produce detectable changes in measurable parameters before functional failure occurs.

Bearing wear produces increasing vibration amplitude and characteristic fault frequencies in the vibration spectrum weeks before the bearing fails.

Cutting tool wear produces increasing cycle time deviation and surface finish degradation before the tool fails to hold dimensional tolerance.

Hydraulic seal degradation produces gradually increasing oil consumption before complete seal failure.

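For these failure modes, condition monitoring is technically feasible. The P-F interval, the time between the point where a developing failure first becomes detectable (P) and the point of functional failure (F), provides a window for planned maintenance intervention.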
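
As a rough sketch of how the P-F interval (the time from first detectable precursor to functional failure) translates into an inspection schedule, the Python below applies a common rule of thumb: inspect at no more than half the P-F interval. That rule is an assumption added here for illustration, and the 42-day P-F interval in the example is hypothetical.

```python
def max_inspection_interval(pf_interval_days, inspections_within_pf=2):
    """Longest inspection interval that still fits the requested number
    of inspections inside the P-F interval.

    The default of 2 reflects the rule-of-thumb assumption of inspecting
    at half the P-F interval.
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    if inspections_within_pf < 1:
        raise ValueError("need at least one inspection inside the P-F interval")
    return pf_interval_days / inspections_within_pf


def worst_case_planning_window(pf_interval_days, inspection_interval_days):
    """Time left to plan the repair in the worst case.

    Worst case: the precursor becomes detectable just after an inspection,
    so it is only found one full inspection interval later.
    """
    return pf_interval_days - inspection_interval_days


# Hypothetical example: a precursor detectable about six weeks before failure.
pf_days = 42
interval = max_inspection_interval(pf_days)
print("inspect every", interval, "days")
print("worst-case planning window:", worst_case_planning_window(pf_days, interval), "days")
```

If the worst-case planning window is shorter than the time needed to order parts and schedule the work, the inspection interval has to be tightened further.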

Other failure modes occur suddenly with no detectable precursor signal.

Electrical relay failure is often sudden and random, producing no detectable change in the relay's external characteristics before it opens or fails to open on command.

Brittle fracture from an impact event produces instant failure with no precursor development period.

For these failure modes, condition monitoring cannot prevent the failure. The maintenance strategy must be either preventive replacement before the failure can occur or run-to-failure with a documented emergency response plan.

 

Characteristic 2: Failure consequence — how severe is the impact when this failure mode occurs?

The consequence of each failure mode determines how much maintenance investment is justified to prevent it.

A failure mode that stops a Tier 1 production line, creates a safety hazard, or produces a regulatory compliance breach justifies significant maintenance investment including continuous condition monitoring, conservative PM intervals, and pre-staged critical spare parts.

A failure mode that causes minor inconvenience and a low-cost repair with no production impact justifies run-to-failure with a simple documented repair procedure.

 

The matrix of detectability and consequence determines the maintenance strategy.

  • High consequence, detectable precursor: condition-based maintenance with planned intervention within the P-F interval.
  • High consequence, no detectable precursor: time-based preventive replacement at an interval shorter than the characteristic failure age, or design modification to reduce failure probability.
  • Low consequence, detectable precursor: periodic inspection with run-to-failure acceptance if the precursor is not detected.
  • Low consequence, no detectable precursor: run-to-failure with documented repair procedure.
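
The four-cell matrix above can be expressed as a small lookup function. This is a minimal sketch: the names `Strategy` and `select_strategy` are made up for illustration, and reducing consequence to a single high/low flag is a simplification of a real criticality assessment.

```python
from enum import Enum


class Strategy(Enum):
    CONDITION_BASED = "condition-based maintenance, intervene within the P-F interval"
    PREVENTIVE_OR_REDESIGN = "time-based preventive replacement, or design modification"
    INSPECT_ACCEPT_RTF = "periodic inspection, accept run-to-failure if the precursor is missed"
    RUN_TO_FAILURE = "run-to-failure with a documented repair procedure"


def select_strategy(high_consequence: bool, detectable_precursor: bool) -> Strategy:
    """Map the two failure mode characteristics to a maintenance strategy."""
    if high_consequence:
        if detectable_precursor:
            return Strategy.CONDITION_BASED
        return Strategy.PREVENTIVE_OR_REDESIGN
    if detectable_precursor:
        return Strategy.INSPECT_ACCEPT_RTF
    return Strategy.RUN_TO_FAILURE


# Illustrative inputs: (high_consequence, detectable_precursor)
EXAMPLES = {
    "bearing wear on a critical line": (True, True),
    "relay failure on a critical line": (True, False),
}
for name, (high, detectable) in EXAMPLES.items():
    print(name, "->", select_strategy(high, detectable).value)
```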

 

Failure Mode Documentation in Practice

Failure mode knowledge that exists only in people's heads is organizational risk.

When the experienced technician who knows how an asset fails retires or leaves, the knowledge leaves with them.

Failure mode documentation converts that knowledge into institutional memory that persists regardless of personnel changes and serves as the foundation for PM task design, condition monitoring configuration, and spare parts stocking decisions.

A failure mode record for each significant failure mode on each critical asset should contain four elements.

Failure mode description. A specific, observable description of how the asset fails to perform its required function. It should be specific enough that any technician reading the description can recognize this failure mode when they encounter it.

Failure cause. The specific physical, chemical, or operational event that produces this failure mode. It should be specific enough to identify a corrective action that addresses the root cause rather than only the symptom.

Detectable precursor signals. The specific condition parameters that indicate this failure mode is developing before functional failure occurs, including the measurement method and the expected signal change associated with developing failure.

Consequence description. The specific production, safety, quality, and regulatory consequences that occur when this failure mode reaches functional failure. This description is the basis for the criticality-weighted maintenance investment decision.
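
One way to hold these four elements in a structured form is sketched below. The class and field names are hypothetical, and the sample record reuses the impeller erosion example from earlier in this guide. Precursor signals are allowed to be empty, because some failure modes have none.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrecursorSignal:
    parameter: str           # condition parameter to watch
    measurement_method: str  # how the parameter is measured
    expected_change: str     # what a developing failure looks like


@dataclass
class FailureModeRecord:
    asset_id: str
    description: str
    cause: str
    consequence: str
    precursor_signals: List[PrecursorSignal] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Names of the required text elements that are still blank."""
        required = {
            "description": self.description,
            "cause": self.cause,
            "consequence": self.consequence,
        }
        return [name for name, value in required.items() if not value.strip()]


# Hypothetical record; the asset tag and wording are illustrative.
record = FailureModeRecord(
    asset_id="P-101",
    description="Impeller eroded; pump can no longer produce the required flow rate",
    cause="Abrasive particles in the process fluid above the pump's design specification",
    consequence="Loss of required flow rate; production stops",
    precursor_signals=[
        PrecursorSignal(
            parameter="impeller thickness",
            measurement_method="erosion measurement during inspection",
            expected_change="thickness approaching the minimum allowed value",
        )
    ],
)
print(record.missing_elements())
```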

 

 

Failure Modes and OEE: The Connection

Every equipment-related loss event in the Six Big Losses framework of Overall Equipment Effectiveness (OEE) traces back to one or more specific failure modes on specific assets.

An Availability loss from an unplanned equipment stop is produced by a specific failure mode on a specific asset.

A Performance loss from reduced production speed is produced by a specific failure mode whose mechanical degradation reduces the asset's output capability.

A Quality loss from increased defect rates is produced by a specific failure mode in tooling condition, seal integrity, or process parameter control.

 

OEE monitoring that is not connected to failure mode knowledge produces data without diagnosis.

It shows that OEE declined.

It does not reveal which failure modes are responsible.

 

Failure mode knowledge that is not connected to OEE data produces diagnosis without prioritization.

It identifies what can go wrong.

It does not reveal which failure modes are producing the largest OEE losses and therefore warrant the most urgent attention.

The combination of machine-connected OEE data and structured failure mode knowledge produces both diagnosis and prioritization.

The failure modes producing the largest OEE losses are identified.

The appropriate maintenance intervention for each failure mode is designed based on its detectability and consequence characteristics.

The improvement program addresses the right failure modes with the right interventions in the right order of financial priority.
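
The prioritization step can be sketched as a simple Pareto ranking of loss events that have already been tagged with a failure mode. The event records, asset tags, and the use of lost minutes as a stand-in for financial impact are all assumptions made for illustration.

```python
# Hypothetical OEE loss events, each already attributed to a failure mode.
LOSS_EVENTS = [
    {"asset": "P-101", "failure_mode": "bearing seizure", "lost_minutes": 240.0},
    {"asset": "P-101", "failure_mode": "mechanical seal leak", "lost_minutes": 60.0},
    {"asset": "M-7", "failure_mode": "cutting tool wear", "lost_minutes": 90.0},
    {"asset": "P-101", "failure_mode": "bearing seizure", "lost_minutes": 210.0},
]


def rank_failure_modes(loss_events):
    """Total lost minutes per (asset, failure mode), largest first.

    Returns tuples of (asset, failure_mode, lost_minutes, cumulative_share),
    where cumulative_share is the running fraction of all lost minutes.
    """
    totals = {}
    for event in loss_events:
        key = (event["asset"], event["failure_mode"])
        totals[key] = totals.get(key, 0.0) + event["lost_minutes"]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

    result = []
    cumulative = 0.0
    for (asset, mode), minutes in ranked:
        cumulative += minutes
        result.append((asset, mode, minutes, cumulative / grand_total))
    return result


for asset, mode, minutes, share in rank_failure_modes(LOSS_EVENTS):
    print(f"{asset:6} {mode:22} {minutes:6.0f} min  cumulative {share:.0%}")
```

In a real program the ranking key would more likely be lost margin or total cost per failure mode, but the structure of the calculation stays the same.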

 

Frequently Asked Questions

 

How many failure modes should be analyzed for each asset?

Analysis depth should be proportionate to asset criticality.

For Tier 1 assets, analyze all significant failure modes that have occurred in the last 24 months plus any failure modes identified through engineering analysis that have not yet occurred but would produce significant consequences if they did.

For Tier 2 assets, focus the analysis on the three to five failure modes with the highest historical frequency and consequence.

For Tier 3 assets, failure mode analysis adds limited value relative to the analysis effort. Run-to-failure is typically the appropriate strategy regardless of specific failure mode characteristics.

 

What is the difference between a failure mode and a fault code?

A fault code is the code assigned to a maintenance event in a computerized maintenance management system (CMMS) work order, typically from a predefined list that technicians select from when closing the work order.

A failure mode is the specific way the asset failed to perform its required function, regardless of how that failure was coded.

The relationship between fault codes and failure modes depends entirely on how specifically the fault code list is designed.

A fault code list with entries like "electrical fault," "mechanical fault," and "operator error" does not map to failure modes.

A fault code list with entries like "bearing failure, rolling element," "seal failure, mechanical seal, process side," and "impeller erosion, abrasive service" maps closely to failure modes and supports failure mode analysis from work order data.

 

How often should failure mode analysis be updated?

Failure mode analysis should be reviewed when new failure modes are discovered in operation that were not in the original analysis.

It should be updated when equipment modifications change the failure mode profile.

It should be refreshed when operating conditions change significantly, such as a process fluid change, a production rate increase, or a product mix change that affects equipment duty cycles.

A static failure mode analysis that accurately described an asset five years ago may be meaningfully incomplete today if the asset's operating context has changed.

 

Every PM interval, every condition monitoring threshold, every spare parts stocking decision, and every inspection task in a manufacturing maintenance program is only as good as the failure mode knowledge it is built on. Getting the failure modes right is the analytical step that makes every subsequent maintenance decision defensible rather than arbitrary.
