FMEA

Learn how to strengthen your systems by spotting weak points before they lead to costly problems.

Faults in an organisation’s processes and systems can be hard to identify.

Failing to find these risks before they cause problems can be costly, and it often leads to a reactive approach to problem solving rather than a proactive one.

This is why it is so important to adopt risk assessment methods that cover our systems, our processes and, indeed, our products.

FMEA stands for Failure Modes and Effects Analysis. It's a structured approach used to identify and evaluate potential failures and their consequences.

It is a proactive risk assessment tool designed to:

Identify ways a process or product might fail (failure modes)
Analyze the effects of those failures
Prioritize the risks based on severity, occurrence, and detection
Mitigate or eliminate those risks

Let’s break down the key aspects of an FMEA.

Failure Mode

➤ What could go wrong?

A failure mode is essentially the way something can go wrong—how a part, process, or system might fail to do what it’s supposed to. It could be anything from a leaking seal to a software glitch or a part breaking under stress.

By pinpointing these specific types of failures, teams can better understand what might cause them and what the impact could be. This step is key in FMEA because it helps catch issues before they actually happen, making products and processes safer and more reliable.

By anticipating issues before they occur, you can put preventative measures in place proactively.

Effect of Failure

➤ What happens if it fails?

The effect of failure is what happens when something goes wrong—it’s the consequence of a failure mode actually occurring.

This could mean anything from a minor inconvenience, like a slow app, to something serious, like a machine shutting down or a safety hazard.

It’s all about understanding the impact: who or what is affected, and how badly. In FMEA, knowing the effect helps teams gauge how critical a failure really is, so they can prioritize what needs fixing or redesigning first.

The reality is that every process can go wrong somewhere. The critical factor is identifying the bigger risks, those that pose the largest threat to the organisation and its personnel.

Cause of Failure

➤ Why does it happen?

The cause of failure is the underlying reason why a failure happens in the first place. It’s what triggers the problem—maybe a design flaw, poor material quality, human error, or even environmental conditions.

Figuring out the cause helps teams trace the issue back to its source, rather than just treating the symptoms. Identifying the cause is crucial because it points to where improvements or controls should be put in place to prevent the failure from happening again.

Identifying causes requires a combination of conceptual thinking and foresight gained through experience.

Current Controls

➤ What are we doing to prevent it?

Current controls are the measures already in place to prevent a failure from happening or to catch it if it does.

These could be anything from inspections, alarms, and testing procedures to design features or software safeguards.

They’re basically the safety nets built into a process or product. Looking at current controls helps teams understand how well protected they already are—and whether those controls are strong enough or need improvement to reduce the risk of failure.

Severity (S)

➤ How bad is the effect?

Severity is a measure of how serious the impact would be if a failure actually occurred. It looks at the consequences: would it just cause a minor annoyance, or could it lead to a major safety risk, system shutdown, or customer dissatisfaction?

In FMEA, severity is rated on a scale, usually from 1 (no effect) to 10 (catastrophic), with higher numbers meaning more critical effects. This rating helps teams prioritize which issues need the most urgent attention, based on how badly things could go wrong.

🔴 Severity (S) – How bad is it if it happens?

Rating	Description
1	No effect
2–3	Very minor
4–6	Moderate – some impact
7–8	High – affects performance or safety
9–10	Very high – hazardous or regulatory failure
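A rating table like this can be encoded as a simple lookup so that every rating in a worksheet maps to the same description. The sketch below is a minimal illustration in Python; the names `SEVERITY_BANDS` and `describe_severity` are assumptions made for this example, and the bands are copied from the table above.

```python
# Severity bands copied from the table above: (lowest rating, highest rating, description).
SEVERITY_BANDS = [
    (1, 1, "No effect"),
    (2, 3, "Very minor"),
    (4, 6, "Moderate – some impact"),
    (7, 8, "High – affects performance or safety"),
    (9, 10, "Very high – hazardous or regulatory failure"),
]


def describe_severity(rating: int) -> str:
    """Return the table description for a severity rating from 1 to 10."""
    for low, high, description in SEVERITY_BANDS:
        if low <= rating <= high:
            return description
    raise ValueError(f"severity rating must be between 1 and 10, got {rating}")


print(describe_severity(7))
```

The same pattern would work for the occurrence and detection scales by swapping in their own bands.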

Occurrence (O)

➤ How likely is the failure to happen?

Occurrence refers to how likely a failure is to happen. It’s an estimate of the probability or frequency of a specific cause leading to a failure mode.

It is rated from 1 (rare) to 10 (very likely), where a higher number means the failure is more likely to happen often or repeatedly. By assessing occurrence, teams can focus on addressing the most frequent or probable issues first, helping to reduce how often problems arise in the real world.

🟠 Occurrence (O) – How likely is it to happen?

Rating	Description
1	Remote – failure almost never happens
2–3	Low – happens rarely
4–6	Moderate – occasional issues
7–8	High – happens frequently
9–10	Very high – almost certain

Detection (D)

➤ How likely are we to detect it before it causes a problem?

Detection is all about how likely it is that a failure will be caught before it reaches the customer or causes a bigger issue.

It looks at how effective the current controls are at spotting the problem in time. A rating of 1 means we’re almost certain to detect it; 10 means detection is unlikely. In other words, a lower number means the failure is easy to catch, and a higher number means it’s more likely to slip through unnoticed.

Understanding detection helps teams see where gaps exist in monitoring or quality checks, and where they might need stronger safeguards.

🟡 Detection (D) – How well can we detect it before it happens?

Rating	Description
1	Almost certain to detect – proven detection method in place
2–3	High chance of detection – controls likely to catch it
4–6	Moderate – some chance of detection, but not reliable
7–8	Low – unlikely to detect before it reaches the customer
9–10	Very low – no detection method, or detection is not possible

SOD Formula

To calculate and assess these process risks, the SOD formula is used.

In FMEA, SOD stands for Severity × Occurrence × Detection. It's used to calculate the Risk Priority Number (RPN):

RPN = Severity (S) × Occurrence (O) × Detection (D)
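As a minimal sketch, the multiplication can be written in Python. The function name `risk_priority_number` and the example ratings are assumptions for illustration only; the range check reflects the 1 to 10 scales described earlier.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three 1-10 ratings to get the RPN (between 1 and 1000)."""
    ratings = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, rating in ratings:
        if not 1 <= rating <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {rating}")
    return severity * occurrence * detection


# Hypothetical ratings for a leaking seal: S=7, O=4, D=6.
print(risk_priority_number(7, 4, 6))  # 7 * 4 * 6 = 168
```

Because each rating runs from 1 to 10, the RPN always falls between 1 and 1000.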

FMEA Matrix

The FMEA matrix is a structured table used to identify potential failure modes in a product or process, evaluate their impact, and prioritize corrective actions.

It organizes the key elements within a clear, systematic layout: Failure Mode, Effects, Causes, Severity, Occurrence, Detection, and the resulting Risk Priority Number (RPN).

By ranking risks based on their RPN scores, the matrix helps teams focus on the most critical issues, enhance reliability, and prevent problems before they occur.

Example of an effective FMEA Matrix.

Process or product failure modes with the highest RPN should be prioritised for proactive improvement actions aimed at reducing the likelihood of the failure occurring and increasing the effectiveness of detection.

For example, if you had 10 process requirements, apply the Pareto principle (take the vital few): select the 3 with the highest RPN values and prioritise improvement actions that will reduce their risk.
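The selection step can be sketched in Python as follows. The `FailureMode` class, the `vital_few` function, and all of the failure modes and ratings are hypothetical, invented purely to illustrate the ranking; a shorter list of five is used instead of ten to keep the example brief.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # RPN = Severity x Occurrence x Detection
        return self.severity * self.occurrence * self.detection


def vital_few(modes, count=3):
    """Return the `count` failure modes with the highest RPN, highest first."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)[:count]


# Hypothetical failure modes and ratings, invented for illustration only.
modes = [
    FailureMode("Leaking seal", 7, 4, 6),              # RPN 168
    FailureMode("Software glitch", 5, 6, 3),           # RPN 90
    FailureMode("Part breaks under stress", 9, 2, 4),  # RPN 72
    FailureMode("Slow app", 2, 7, 2),                  # RPN 28
    FailureMode("Machine shutdown", 8, 3, 8),          # RPN 192
]

for mode in vital_few(modes):
    print(f"{mode.name}: RPN {mode.rpn}")
```

With these made-up ratings, the three modes selected for improvement action are the machine shutdown, the leaking seal, and the software glitch.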

After formulating improvement actions, the improved process can be re-evaluated using the same Severity, Occurrence and Detection ratings until a suitable RPN is achieved.
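A before-and-after comparison might look like the sketch below. The ratings and the `TARGET_RPN` threshold are assumptions for illustration; what counts as a "suitable" RPN is for each organisation to decide. In this made-up case the severity stays the same, while the improvement actions lower the occurrence and detection ratings.

```python
TARGET_RPN = 100  # assumed threshold, chosen only for this example


def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: Severity x Occurrence x Detection."""
    return severity * occurrence * detection


def is_acceptable(ratings, target=TARGET_RPN) -> bool:
    """True when the RPN for (severity, occurrence, detection) is at or below the target."""
    return rpn(*ratings) <= target


# Hypothetical (S, O, D) ratings before and after improvement actions.
before = (8, 3, 8)  # RPN 192
after = (8, 2, 3)   # RPN 48: better prevention (O) and better detection (D)

for label, ratings in (("before", before), ("after", after)):
    status = "acceptable" if is_acceptable(ratings) else "needs further action"
    print(f"{label}: RPN {rpn(*ratings)} ({status})")
```

If the re-evaluated RPN were still above the target, the cycle of action and re-rating would simply repeat.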

Summary

FMEA is a robust and effective risk assessment technique that can be utilised for:

Risk management
Quality improvement
Product or process design
Preventing costly errors or safety issues

FMEA helps teams spot potential failures before they happen, making it easier to prevent problems rather than react to them.

By evaluating how serious each issue could be, how often it might occur, and how likely it is to go unnoticed, teams can focus their efforts where the risk is highest.

The result is a smarter, more efficient way to improve reliability, cut costs, and build safer, more dependable systems.

By utilising FMEA as a tool, you can create detailed action plans that guide your organisation’s improvement actions before new systems and processes are implemented.

Effective use of this tool will allow you to test the feasibility of a project in relation to your company’s capabilities.
