The Road to Maintenance and Reliability: What You Need to Know
12 Reliability Challenges Your Organization Should Address for Successful Asset Reliability Management
When it comes to reliability and maintenance, how can you achieve world-class results? The concept is simple: maximize the availability of equipment at the lowest possible cost. But how do you accomplish it?
Here are 12 key reliability topics for your organization to consider on the journey to implementing asset integrity solutions, strategy and process planning.
1- Good Maintenance Does Not Equal Good Reliability
Designing for Asset Reliability
Good maintenance, including the proper and timely execution of well-written and well-conceived preventive plans, does not always equal good reliability. There is a reason for this: the design of the equipment itself.
Your equipment cannot be maintained to be more reliable than it was designed to be. You can redesign it or change parts to a better design, but then it is no longer the same equipment. That is why the correct design is important to consider before construction. Ask yourself these critical questions:
- When was the last time your company reviewed the lifecycle costs of equipment already in operation to help decide what new equipment should be purchased?
- Do you talk to the people who maintain the equipment to get their input on how it could be made easier to maintain?
- Do you run a RAM (reliability, availability and maintainability) analysis, or set reliability and availability standards that suppliers must meet?
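Setting an availability standard for suppliers is easier when the target is a number. A common measure is inherent availability, MTBF / (MTBF + MTTR), which counts only corrective repair time and ignores planned downtime and logistics delays. The minimal Python sketch below compares two bids against a required value; the supplier figures and the 0.997 target are hypothetical.

```python
def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the asset can run, counting only corrective repair downtime."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical supplier bids for the same duty: (MTBF hours, MTTR hours)
bids = {"pump_a": (4000.0, 8.0), "pump_b": (6000.0, 24.0)}
required = 0.997  # hypothetical availability standard written into the purchase spec

for name, (mtbf, mttr) in bids.items():
    a = inherent_availability(mtbf, mttr)
    print(f"{name}: A_i = {a:.4f}, meets spec: {a >= required}")
```

In this made-up case the pump with the longer MTBF fails the standard because it takes three times as long to repair, which is exactly the kind of issue that maintainer input during design review can catch.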
We’ve identified that poor design hinders asset reliability. Now we need to address a touchy subject that may point the finger back at ourselves. We often put in the work, complete the required FMEAs, know what will fail and when, and have detailed instructions on how to perform inspections and repairs, but still have failures! Perform a self-evaluation right now:
- Do you have personnel trained and qualified to do the specific work you are asking them to do?
- Do you have an alignment standard?
- Do you have a vibration standard?
The reality is that we often let our workforces down by not giving them the training and guidelines required to perform precision maintenance.
2- Reliable Operations - Who Is Responsible?
Our Maintenance Department Needs Help
What if you were told that maintenance only accounts for about 20% of failures? The other failures come from two places: design and operations.
Operator care is a vital part of maintenance and can save your business money. Think about the relationship you have with your car. Many of us do some amount of maintenance and monitoring on our vehicles: we listen for odd noises and watch for warnings, and some of us even change our own oil and brakes. While most of us stop just short of becoming full maintainers, the point stands: if we can do this for our own cars, why can’t it be done in our workplace?
If you wait until maintenance knows about a problem, it is generally past a simple repair, similar to running your car out of oil. Operations must realize that the equipment belongs to them and the maintenance department is there for support. If the equipment fails, we all fail together. Who does your organization hold responsible for breakdowns?
3- Procedures for Asset Management Practices
A Treatise for Organizational Alignment and Continuity
We often hear organizations say, "We don’t have a procedure for that," followed by, "Our people know what they are doing." But what happens if your people are not here tomorrow?
Procedures standardize and align work so that everyone does the same job in the same way and produces a consistent output. They also allow qualified personnel who are unfamiliar with your organization to be effective from the start. In addition, procedures can be written to make decisions in advance, so that action can be taken rapidly and lag time avoided. Do you need to write or update any procedures to align your asset management practices?
4- Organizational Support for Reliability Excellence
The Power of Organization
The journey to improved asset reliability mirrors the journey to improved safety. Good safety records don’t come from hiring a group of safety experts and calling it a day. They take a shift in organizational culture, supported by senior management. Safety becomes the responsibility of everyone in the organization, and each person is empowered to act if they have a concern.
Developing an asset reliability strategy will take the same level of involvement from senior management. Understanding the benefits that reliability excellence brings to an organization is a vital first step. Implementing best practices in reliability and maintenance will result in lower operating costs and increased availability, creating a competitive advantage for your company. It also results in safer working conditions and increased employee satisfaction by addressing the risks associated with asset failure. How have you demonstrated these benefits to your senior management?
5- Master Asset Lists
The Starting Point for Your Organization
So, where should you start? With a master asset list: the maintainable assets within the boundaries of your facility. Your organization needs to define in advance what counts as a maintainable asset; each one should meet at least one of the following criteria:
- Is it regularly maintained to preserve the function for which it was acquired?
- Is it within the scope of regulatory requirements to track maintenance history?
- Is it repaired rather than discarded when it fails?
- Does it provide a level of detail desired or required for analytics?
This list will form the basis for all further reliability-based maintenance processes, work reporting and analytics. Do you have a master asset list ready to support your reliability journey?
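The screening rule ("at least one of the following") is simple enough to apply mechanically once each candidate item has been answered against the four questions. A minimal sketch, assuming those yes/no answers are already recorded; the class, field names and asset tags are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class CandidateAsset:
    # Hypothetical fields mirroring the four screening questions
    tag: str
    regularly_maintained: bool
    regulatory_history_required: bool
    repaired_not_discarded: bool
    needed_for_analytics: bool


def is_maintainable(asset: CandidateAsset) -> bool:
    """An item qualifies if it meets at least one of the screening criteria."""
    return any((
        asset.regularly_maintained,
        asset.regulatory_history_required,
        asset.repaired_not_discarded,
        asset.needed_for_analytics,
    ))


def build_master_asset_list(candidates: list[CandidateAsset]) -> list[str]:
    """Return the sorted tags of every candidate that qualifies as maintainable."""
    return sorted(a.tag for a in candidates if is_maintainable(a))


candidates = [
    CandidateAsset("P-101", True, False, True, True),      # process pump
    CandidateAsset("PSV-204", False, True, False, False),  # relief valve with regulatory test history
    CandidateAsset("LAMP-17", False, False, False, False), # disposable light fixture
]
print(build_master_asset_list(candidates))  # ['P-101', 'PSV-204']
```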
6- Asset Criticality Ranking
Setting Priorities for Efficient Operations
Asset criticality ranking helps us set priorities for both reliability and maintenance, which makes a quality criticality ranking imperative to effective and efficient operations. The criticality we are referencing is called “business criticality”; it is different from safety criticality, and the two are not interchangeable. Business criticality combines multiple factors, such as failure effects on operations, safety, environment, quality and more, to produce a numerical score that allows direct comparison with other assets. This score reflects the relative severity of impact of reasonable worst-case failures, not how likely they are.
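One common way to produce such a score is a weighted sum of consequence ratings. The sketch below assumes a 1-5 severity rating in each category and site-chosen weights; the categories, weights and 0-100 scaling are illustrative assumptions, not a standard, and likelihood is deliberately left out.

```python
# Hypothetical consequence categories and weights (sum to 1.0); each site defines its own.
WEIGHTS = {
    "operations": 0.35,
    "safety": 0.25,
    "environment": 0.20,
    "quality": 0.10,
    "repair_cost": 0.10,
}


def business_criticality(ratings: dict[str, int]) -> float:
    """Weighted severity score for a reasonable worst-case failure, scaled so 5s everywhere = 100.

    Likelihood is not an input: this ranks consequence only.
    """
    missing = WEIGHTS.keys() - ratings.keys()
    if missing:
        raise ValueError(f"missing ratings: {sorted(missing)}")
    for category, rating in ratings.items():
        if not 1 <= rating <= 5:
            raise ValueError(f"{category} rating must be 1-5, got {rating}")
    weighted = sum(WEIGHTS[c] * ratings[c] for c in WEIGHTS)
    return round(weighted * 20, 1)


feed_pump = {"operations": 5, "safety": 3, "environment": 2, "quality": 4, "repair_cost": 3}
sump_pump = {"operations": 2, "safety": 1, "environment": 3, "quality": 1, "repair_cost": 2}
for tag, ratings in (("P-101 feed pump", feed_pump), ("P-102 sump pump", sump_pump)):
    print(tag, business_criticality(ratings))
```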
The numbers derived are important for calculations and for prioritizing corrective actions. However, for improvement purposes, we must also consider how likely the failure is to occur to determine where to invest resources in preventing it. By also looking at the resources currently invested in each asset, we can temper the relative ranking to identify opportunities for prioritizing work and for improving the level of effort put into reliability and maintenance activities. Can your organization point to a procedure that demonstrates why each asset has its assigned criticality ranking?
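As one hypothetical way to temper the ranking, the sketch below multiplies a business criticality score by an estimated failure frequency to get a risk figure, then divides by the maintenance hours already spent on the asset. The field names, data and the "risk per hour of effort" metric are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AssetRisk:
    tag: str
    criticality: float        # business criticality score (consequence only)
    failures_per_year: float  # estimated likelihood, e.g. from CMMS history
    pm_hours_per_year: float  # maintenance effort currently invested


def improvement_priority(assets: list[AssetRisk]) -> list[tuple[str, float, float]]:
    """Rank assets by risk (criticality x likelihood) per hour of effort already invested."""
    ranked = []
    for a in assets:
        risk = a.criticality * a.failures_per_year
        # +1 avoids division by zero for assets with no planned effort at all
        leverage = risk / (a.pm_hours_per_year + 1)
        ranked.append((a.tag, round(risk, 1), round(leverage, 2)))
    return sorted(ranked, key=lambda r: r[2], reverse=True)


ranked = improvement_priority([
    AssetRisk("P-101", 72.0, 0.5, 40.0),
    AssetRisk("C-300", 55.0, 2.0, 4.0),
    AssetRisk("P-102", 37.0, 1.0, 0.0),
])
for tag, risk, leverage in ranked:
    print(f"{tag}: risk {risk}, risk per effort-hour {leverage}")
```

A high value flags an asset for review, not automatically for more PM work: the top item could reflect a deliberate run-to-failure decision.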
7- Maintenance Strategy Determination
Going Beyond OEM Recommendations
Original Equipment Manufacturers (OEMs) are slowly changing with the times; however, not all of us are dealing with new equipment or with the OEMs that are changing. This is one reason why simply following an OEM’s maintenance recommendations may not be optimal.
Imagine that you are the OEM. You have a piece of equipment you are selling that contains a greased bearing. You know the bearing won’t last forever, so you need to tell the customer when to replace it. You sell thousands of these pieces of equipment, so you don’t create customized maintenance plans based on the customer’s abilities or the operating context of the equipment; you simply write generic maintenance guidelines. You know the L10 life of the bearing is 2000 hours of operation (90% of the bearings will make it at least this long) and the mean time between failures (MTBF) of the bearing is around 5800 hours. When do you tell the customer to replace the bearing?
We can assume the right answer is, "When it needs to be replaced based on vibration measurement," but you as the OEM cannot count on the customer having this capability. The real answer is probably going to be, "I set the interval at the point that generates the fewest complaints." If you set it too low, customers won’t like changing bearings constantly only to find they were fine. If you set it too high, there will be too many failures. If you set it somewhat high, the failures that do occur will be accepted by those who rely on OEM maintenance because "these things just happen." Advanced customers will develop their own predictive maintenance plans and replace the bearing only when it needs replacing. How much of your OEM-based planned maintenance (PM) is not adding value?
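The two numbers in the bearing example say more than they first appear to. Assuming (our assumption, not the text's) that bearing life follows a two-parameter Weibull distribution, L10 and MTBF together pin down its shape and scale, and from there you can estimate what share of bearings would fail before any replacement interval the OEM might choose. A minimal sketch that solves for the shape parameter by bisection:

```python
import math

# Assumption: bearing life follows a two-parameter Weibull distribution.
L10_HOURS = 2000.0   # 90% of bearings survive at least this long (from the example)
MTBF_HOURS = 5800.0  # mean life (from the example)
NEG_LN_090 = -math.log(0.9)


def mean_to_l10_ratio(beta: float) -> float:
    """MTBF / L10 for a Weibull with shape beta; the scale parameter cancels out."""
    return math.gamma(1 + 1 / beta) / NEG_LN_090 ** (1 / beta)


def fit_weibull(l10: float, mtbf: float) -> tuple[float, float]:
    """Find shape (beta) and scale (eta) that reproduce both L10 and MTBF."""
    target = mtbf / l10
    lo, hi = 0.5, 10.0  # the ratio falls as beta rises over this range
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_to_l10_ratio(mid) > target:
            lo = mid  # ratio too high -> beta too small
        else:
            hi = mid
    beta = (lo + hi) / 2
    eta = mtbf / math.gamma(1 + 1 / beta)
    return beta, eta


def fraction_failed(t: float, beta: float, eta: float) -> float:
    """Weibull CDF: share of bearings expected to fail before t hours."""
    return 1 - math.exp(-((t / eta) ** beta))


beta, eta = fit_weibull(L10_HOURS, MTBF_HOURS)
print(f"shape ~ {beta:.2f}, scale ~ {eta:.0f} h")
for interval in (2000, 3000, 4000, 5000):
    print(f"replace at {interval} h -> ~{fraction_failed(interval, beta, eta):.0%} fail first")
```

With these inputs the shape parameter comes out close to 2 (wear-out behavior), and roughly a third of bearings would fail before a 4,000-hour replacement interval. This is the trade-off the OEM is balancing when it picks a single generic number.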
8- How an Insurance Mentality Leads to Overstocking
Anticipation vs. Reaction
The question: "Why do you keep spare parts?" The typical answer: "To fix stuff when it breaks." If this is your logic, you may be missing some key maintenance concepts and end up reacting to events instead of anticipating them. When we are in reactive mode, the only way to stock spare parts is to buy at least one of everything that might break. This is called an "insurance mentality." You don’t want to get caught without the part when the equipment breaks! This typically occurs when the reasons for equipment failure are not understood and adequate activities are not in place to prevent the failure or detect it as it starts to develop.
On the surface, this may not appear to be a bad thing. Many organizations have operated this way for so long that it is normal to them. It is even possible that the majority of failures are so minor and fixed so quickly (or there is enough redundancy) that operations are barely affected. So, what’s the problem?
Most organizations fail to consider the costs of holding inventory, known as "inventory carrying costs." These include storage, maintenance of stored parts, labor for counting and organizing, and less tangible items such as the cost of capital (based on the weighted average cost of capital, reflecting money tied up in inventory that could earn a return elsewhere in the business). These costs typically range from 20% to 30% of the inventory value per year. Does your organization have an "insurance mentality" when it comes to spare parts?
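The arithmetic is simple but often sobering. The sketch below applies the 20-30% range to a hypothetical storeroom value, then shows what a single hypothetical "insurance" spare costs to hold at 25% per year.

```python
def annual_carrying_cost(inventory_value: float, rate: float) -> float:
    """Yearly cost of holding spares, given the carrying rate as a fraction of value."""
    if not 0 <= rate <= 1:
        raise ValueError("rate is a fraction, e.g. 0.25 for 25%")
    return inventory_value * rate


stores_value = 2_000_000  # hypothetical value of the maintenance storeroom
low, high = (annual_carrying_cost(stores_value, r) for r in (0.20, 0.30))
print(f"Carrying cost: ${low:,.0f} to ${high:,.0f} per year")

part_price = 5_000    # hypothetical spare bought "just in case" and never used
years_on_shelf = 5
print(annual_carrying_cost(part_price, 0.25) * years_on_shelf)  # 6250.0
```

In this example the $5,000 part costs more to carry over five years than it did to buy.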
9- Determining Spare Parts
Don't Wait Until it Breaks
We discussed that having one of everything "just in case" is not the way to stock a warehouse. So, what is? If we don't plan to simply fix things after they break, then we should probably fix things before they break (most of the time)! It all starts with identifying the appropriate maintenance strategy, which includes developing maintenance tasks to prevent or mitigate the consequences of functional failure. For example:
- If the strategy is fixed-time replacement, you should develop a repair bill of materials.
- If the strategy is condition-directed, a planned corrective task with a repair bill of materials should also exist.
- If the strategy is no planned maintenance, there should be a planned corrective maintenance task with a repair bill of materials.
Notice this is the caveat mentioned earlier; we do sometimes wait for stuff to break and then fix it, but it is only a strategy if we planned to do it!
These repair bills of materials establish the list of inventory parts, but not the order point or economic order quantity. To determine these, we use equations with inputs like lead time, part cost and frequency of use to mathematically determine when and how many parts to order. Are your stocking levels set correctly?
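A minimal sketch of those calculations, using the classic economic order quantity formula EOQ = sqrt(2DS/H), where D is annual usage, S is the cost of placing one order and H is the annual holding cost per unit (unit cost times the carrying rate), plus a reorder point equal to expected usage over the lead time plus safety stock. The part data are hypothetical, and EOQ assumes fairly steady demand; slow-moving critical spares are often better handled with probabilistic (for example Poisson-based) stocking models.

```python
import math


def economic_order_quantity(annual_usage: float, order_cost: float,
                            unit_cost: float, carrying_rate: float) -> int:
    """Classic EOQ: sqrt(2 * D * S / H), with H = unit cost * carrying rate."""
    holding_cost = unit_cost * carrying_rate
    return max(1, round(math.sqrt(2 * annual_usage * order_cost / holding_cost)))


def reorder_point(annual_usage: float, lead_time_days: float, safety_stock: int = 0) -> int:
    """Reorder when stock on hand falls to expected lead-time usage plus a buffer."""
    usage_during_lead_time = annual_usage / 365 * lead_time_days
    return math.ceil(usage_during_lead_time) + safety_stock


# Hypothetical mechanical seal: 24 used per year, $75 per order, $450 each,
# 25% carrying rate, 45-day lead time, one seal of safety stock.
eoq = economic_order_quantity(24, 75, 450, 0.25)
rop = reorder_point(24, 45, safety_stock=1)
print(f"Order {eoq} seals when stock falls to {rop}")  # Order 6 seals when stock falls to 4
```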
10- Utilizing Reliability Analytics
The Power of Data
Reliability analytics are the tools we use not only to determine how well our reliability program has been set up but also to guide us through our weak areas to make improvements. In short, reliability analytics is about starting with the end in mind. The analytics we perform are based on data, so where does the data come from, and how can we make sure it is valuable?
Fortunately, we have these incredible things called Computerized Maintenance Management Systems, or CMMS (sometimes referred to as PMS or EAM systems, depending on where you live and what industry you are in), which allow us to store data in structured databases that make analytics easy, if we use them properly. Setting up equipment classes and failure code/cause lists, and standardizing how work is entered into the system, are some of the basics for analytics. Simply putting these in place does not generally result in good data, however; training, monitoring and reinforcement are all necessary to achieve it.
What's most important is that we achieve results. We must use the data provided and demonstrate how it was useful to gain the trust of the people we are asking to enter it. Seeing the data used, and used in a way that makes their lives better, is the fastest way to achieve quality data in the future. How is your data quality?
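One simple way to start answering that question is to measure how many corrective work orders carry valid failure and cause codes from the standardized lists. The sketch below assumes a work order record with hypothetical field names and code lists; a real CMMS export would need its own mapping.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical standardized code lists for one equipment class
VALID_FAILURE_CODES = {"LEAK", "VIB", "OVERHEAT", "NOISE"}
VALID_CAUSE_CODES = {"WEAR", "MISALIGN", "LUBE", "CONTAM"}


@dataclass
class WorkOrder:
    wo_id: str
    is_corrective: bool
    failure_code: Optional[str]
    cause_code: Optional[str]


def coding_quality(orders: list[WorkOrder]) -> float:
    """Share of corrective work orders carrying a valid failure code AND cause code."""
    corrective = [wo for wo in orders if wo.is_corrective]
    if not corrective:
        return 1.0  # no corrective work to score
    good = sum(
        1 for wo in corrective
        if wo.failure_code in VALID_FAILURE_CODES and wo.cause_code in VALID_CAUSE_CODES
    )
    return good / len(corrective)


orders = [
    WorkOrder("WO-1", True, "VIB", "MISALIGN"),
    WorkOrder("WO-2", True, "LEAK", None),     # cause left blank
    WorkOrder("WO-3", True, "OTHER", "WEAR"),  # code not on the standard list
    WorkOrder("WO-4", False, None, None),      # PM work order, not scored
]
print(f"{coding_quality(orders):.0%}")  # 33%
```

Tracking this figure over time, and sharing what the coded data revealed, is one concrete way to show the people entering data that it is being used.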
11- Performing Root Cause Analysis
Starting Your RCA Program
So, you took all the steps towards a successful reliability and maintenance program, but you are still suffering some failures. What’s going on? The good news is that you have set yourself up with a strong asset hierarchy and quality data to help pinpoint the areas we need to investigate!
One of the most important questions to ask at this time is, "When should we perform a Root Cause Analysis (RCA)?" The answer: It depends on whether this is a chronic or acute issue and the severity of it.
- Chronic issues are identified using tools such as Pareto charts. Think of this as "death by a thousand paper cuts." We first need to determine where the cuts are coming from before we can work on identifying why they are happening.
- Acute issues require the development of severity criteria. Low impact issues can then be handled by a 5-why analysis performed by the person in the field. Medium impact issues may warrant a small team to investigate. High impact issues may necessitate a larger investigation with engineering support. The purpose of these analyses is to identify the true causes and implement corrective measures to eliminate future occurrences.
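For chronic issues, a Pareto analysis is often just a group-by and a running total. The sketch below sums downtime by failure mode and keeps the modes that together account for the first 80% of it; the failure modes, hours and 80% cutoff are hypothetical.

```python
from collections import Counter


def pareto(events: list[tuple[str, float]], cutoff: float = 0.8) -> list[tuple[str, float, float]]:
    """Sum downtime per failure mode, sort descending and stop once the cumulative share reaches cutoff.

    Returns (mode, total hours, cumulative share) tuples.
    """
    totals = Counter()
    for mode, hours in events:
        totals[mode] += hours
    grand_total = sum(totals.values())
    result, cumulative = [], 0.0
    for mode, hours in totals.most_common():
        cumulative += hours
        result.append((mode, hours, round(cumulative / grand_total, 2)))
        if cumulative / grand_total >= cutoff:
            break
    return result


# Hypothetical downtime events (failure mode, hours lost) pulled from work history
events = [
    ("seal leak", 2.0), ("seal leak", 3.5), ("coupling wear", 1.0), ("seal leak", 2.5),
    ("bearing noise", 4.0), ("instrument drift", 0.5), ("coupling wear", 1.5), ("seal leak", 3.0),
]
print(pareto(events))  # [('seal leak', 11.0, 0.61), ('bearing noise', 4.0, 0.83)]
```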
The key to starting an RCA program is to set reasonable severity levels. Overloading staff with RCAs will typically yield poor results as they rush to complete them. Set the severity levels higher to begin and as things start to improve, lower your thresholds. Is your RCA program on track to drive improvement?
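Severity criteria can be written as explicit thresholds so that they are easy to lower over time. The sketch below uses a single severity measure, estimated event cost in dollars, as a simplifying assumption; real criteria usually also weigh safety, environmental and quality consequences. All threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RcaThresholds:
    """Hypothetical event-cost thresholds (USD) for each RCA tier."""
    five_why: float
    team: float
    full_investigation: float


def rca_level(event_cost: float, t: RcaThresholds) -> str:
    """Map an acute event to an investigation tier based on its estimated cost."""
    if event_cost >= t.full_investigation:
        return "full investigation with engineering support"
    if event_cost >= t.team:
        return "small team RCA"
    if event_cost >= t.five_why:
        return "5-why by the person in the field"
    return "no RCA (logged for trending)"


# Start with high thresholds so staff are not overloaded...
starting = RcaThresholds(five_why=25_000, team=100_000, full_investigation=500_000)
# ...and lower them as results improve.
mature = RcaThresholds(five_why=10_000, team=50_000, full_investigation=250_000)

for cost in (15_000, 60_000, 300_000):
    print(f"${cost:,}: {rca_level(cost, starting)} -> {rca_level(cost, mature)}")
```

Lowering the thresholds later moves the same events into more thorough tiers without rewriting the procedure itself.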
12- Continuous Improvement: A Living Process
Breaking the Cycle
Although we have determined root causes and what should be done to keep failures from recurring, we often find ourselves repeating the cycle. Weak or missing continuous improvement processes are typically why so many initiatives fail to achieve the desired outcome.
It is imperative to address the root causes, put the appropriate operational controls in place to mitigate further occurrences, then measure the success of those actions and adjust them if required. The piece lacking in most facilities is the follow-up process. Systems change, materials change, technologies change and people change. Therefore, Continuous Improvement must be a living process. When organizations do not make necessary changes to their business processes, work practices, policies and governance, project success is short-lived, no matter which advanced tools are applied to improve performance.
Has your organization seen a recurrence of identified root causes in your facility that should have been corrected through Continuous Improvement?
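The follow-up step can start as a simple query against failure history: after a corrective action is closed, did the same cause come back on the same asset? A minimal sketch, assuming consistent cause coding in the CMMS; the asset tag, cause codes and dates are hypothetical.

```python
from datetime import date

# Each record: (asset tag, cause code, failure date), e.g. exported from the CMMS
FailureRecord = tuple[str, str, date]


def recurrences_after_fix(failures: list[FailureRecord], asset: str,
                          cause: str, fix_closed: date) -> list[date]:
    """Dates of failures on the same asset with the same cause after the fix was closed."""
    return sorted(d for a, c, d in failures
                  if a == asset and c == cause and d > fix_closed)


history = [
    ("P-101", "MISALIGN", date(2023, 2, 10)),
    ("P-101", "MISALIGN", date(2023, 5, 3)),
    ("P-101", "LUBE", date(2023, 9, 1)),
    ("P-101", "MISALIGN", date(2024, 1, 22)),
]
# Hypothetical corrective action (e.g. a new alignment standard) closed on 1 June 2023
hits = recurrences_after_fix(history, "P-101", "MISALIGN", date(2023, 6, 1))
if hits:
    print(f"Cause recurred {len(hits)} time(s) after closure; reopen the action.")
```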