The Case for Condition Monitoring
What exactly is condition monitoring? How does it differ from other maintenance philosophies? And in what way does it impact the discipline of control engineering in the typical plant? Steve Sabin provides some answers.
For years, conventional wisdom held that the longer a piece of equipment was in service, the more likely it was to fail. Thus, it was believed that as long as maintenance was performed at the correct interval, asset failure could be avoided. While it is true that one of the primary goals of maintenance is to perform maintenance activities at the right time, the historical approach to this challenge was to rely solely on calendar intervals – whether running hours or simply total hours.
As it turns out, the calendar is a very poor basis for scheduling maintenance activities on the majority of assets. Indeed, the data compiled across several decades, dozens of industries, and hundreds of thousands of assets, suggests that a time-based approach to maintenance is appropriate for only 11 percent of the assets in a typical plant.
Surprising? To many, yes. But to the people who have actually worked in a reliability or maintenance discipline, probably not. They understand that for many items, the most likely time for failure to occur is immediately after maintenance has been performed – either because the parts have hidden defects or because the work was performed improperly. They understand that certain items, such as brake shoes, chains, and sprockets, wear at a rate that is strongly correlated to usage hours, but other items – most notably electronic instruments – do not behave that way at all.
Depicted below are various categories of wear, along with the types of assets typical of each category and the approximate percentage of such assets in a typical plant. Arguably, the single most valuable message conveyed by this table is that for nearly 90 percent of the assets in a typical plant, a calendar-based approach to maintenance is exactly the wrong approach.
Choosing technologies
Certain assets lend themselves to certain condition monitoring technologies. Ideally, direct measurement is preferred, but this is often not possible. For example, when looking for broken or shorted turns in a motor or generator, physical examination of the windings requires the unit to be taken out of service and visually inspected.
Consequently, indirect methods are used, such as thermal imaging cameras that examine the rotor in-situ, looking for temperature variations indicative of increased resistance (hot spots) or no conduction (cool spots). Thermal imaging can also be used on static assets, such as switchgear and transformers, which do not have moving parts. Another technology used on some large motors and generators measures partial discharge as insulation fails and allows arcing to occur.
The key to choosing the correct technology lies in Failure Modes and Effects Analysis (FMEA) studies, where the types of failures are categorized and the failure mode is ascertained. For example, unbalance on a rotating machine is a common problem, as is misalignment. Both manifest as changes in vibration. Other technologies could also detect these problems, but often only after the failure has progressed significantly.
For example, severe misalignment will cause premature bearing wear on machines with rolling element bearings, or gear tooth wear on machines that incorporate contacting gear teeth. As the bearings or gear teeth wear, they will typically deposit metallic debris in the lubricating oil. As such, lube oil analysis – although an important condition monitoring technology in its own right – would be considered a much less sensitive indicator of unbalance or misalignment than would vibration because it relies on collateral damage (bearing or gear wear) rather than direct measurement of the forces acting on the machine (vibration).
Another outcome of the FMEA study is that it quantifies the effects of failure. While a failure mode helps isolate the particular technology that might be useful, the effects of failure help determine whether it is appropriate to let the machine simply run to failure or whether the ramifications of failure are so significant – such as a total loss of production, an environmental leak, or an explosion – that the most sophisticated and comprehensive suite of monitoring technologies is warranted.
It is beyond the scope of this article to provide an in-depth discussion regarding the specific technology to use on a specific type of equipment. However, listed and described below are the most common technologies:
Vibration monitoring
Used primarily on rotating machinery because it is an excellent indicator of the primary forces acting on the machine. Many failure types such as unbalance, misalignment, rubs, resonances, instabilities, bearing wear, and loose or broken parts can be detected and differentiated from one another using vibration analysis.
It is also a particularly effective technology for machinery protection, allowing operators to automatically shut down the unit when vibration becomes excessive. Because vibration monitoring is often already present on machinery for protection purposes, extending its use from strictly protection to condition monitoring is straightforward, which makes it the most prevalent of all condition monitoring technologies.
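As a minimal sketch of how a single vibration signal can serve both protection and condition monitoring, the Python below computes an RMS amplitude from a block of samples and compares it against two setpoints. The function names, the units (mm/s), and the setpoint values are illustrative assumptions, not figures taken from any standard or product.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a block of vibration samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def classify(samples, alert=4.0, danger=8.0):
    """Return 'ok', 'alert', or 'trip' for a block of samples.

    alert and danger are hypothetical setpoints in mm/s RMS.
    'alert' is the condition monitoring use (flag for analysis);
    'trip' is the protection use (shut the machine down).
    """
    level = rms(samples)
    if level >= danger:
        return "trip"
    if level >= alert:
        return "alert"
    return "ok"
```

In this sketch the lower setpoint only raises a flag for an analyst, while the upper one represents the automatic shutdown described above. A real system would set both values per machine.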
Lubrication analysis
Another very common condition monitoring technology. By examining the particulates present in machine lubricating oil, including their metallurgy and shape, certain problems can be ascertained and isolated to the affected parts. By looking at the chemistry of the lubricant, contaminants such as water can be found, which can indicate a different class of problems unrelated to wear.
This is a very sophisticated technology and hundreds of commercial lubricant analysis labs exist. It is very commonly used with assets such as heavy construction equipment that does not easily lend itself to permanent monitoring. Instead, oil is manually sampled at regular intervals and sent to a lab for processing. Oil analysis can also be used to assess the condition of electrical oil-filled transformers.
Thermography
This is essentially an infrared camera that records images, looking for temperature anomalies in things such as motor, generator, or transformer windings.
Current analysis
By looking at the waveforms of electrical current, voltage, and power in motors and generators, a number of problems can be ascertained where technologies such as vibration monitoring and oil analysis are less effective.
Non-destructive testing
Eddy current transducers can be used to inspect the surface of turbine blades, pressure vessels, and other structures where hairline cracks, invisible to the naked eye, may be present.
Ultrasonic detection
High-frequency acoustic energy can be useful for detecting some types of failures, such as leaking valves.
The foregoing is only a partial list. Other technologies such as pressure measurements in compressor cylinders, partial discharge measurements on motors/generators, temperature spreads on gas turbine exhaust nozzles, and many others exist. The key point is that the condition monitoring technology is chosen as a function of the equipment failure mode(s).
Continuous categories
The ways in which condition monitoring technologies are applied fall into one of three basic categories:
• Continuous
The measurement is made continuously on the machine in an automated fashion, without the need for human intervention.
• Non-continuous
The measurement is performed manually at specific intervals or when another measurement suggests that independent verification is needed. Oil sampling is one example of a non-continuous technology. Thermography is another. Vibration, temperature, and pressure measurements can fall into any of the three categories, and the choice is usually a function of the criticality of the machine.
• Quasi-continuous
These systems are online and do not require manual intervention; however, usually for cost reasons, the instrumentation architecture will employ a so-called “sensor bus” where a centralized processor will intermittently sample readings from hundreds or thousands of connected sensors. These systems are generally used for assets that warrant more than periodic manual data collection, but cannot justify the expense of a continuous monitoring system.
The nature of the architecture is that processing of signals and samples is done in serial rather than parallel, and wiring is shared rather than dedicated to each individual sensor. As a result, overall system cost is lower, but performance capabilities are more limited than a continuous system.
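The performance trade-off of serial processing can be made concrete with a little arithmetic: if one processor visits each sensor in turn, the time before any given sensor is read again grows with the number of sensors on the bus. The sketch below assumes a simple round-robin scan with a fixed time per sensor. The function names and the example figures are hypothetical.

```python
def revisit_interval_s(sensor_count, seconds_per_sensor):
    """Time between successive readings of the same sensor when a single
    processor samples every sensor on the bus one after another."""
    return sensor_count * seconds_per_sensor


def checks_per_hour(sensor_count, seconds_per_sensor):
    """How many times per hour each sensor is read under the same assumption."""
    return 3600.0 / revisit_interval_s(sensor_count, seconds_per_sensor)


def scan_order(sensor_ids, cycles=1):
    """Yield sensor ids in round-robin order for the given number of cycles."""
    for _ in range(cycles):
        for sensor_id in sensor_ids:
            yield sensor_id
```

Under these assumptions, 600 sensors at 2 seconds each gives a revisit interval of 1,200 seconds, so each sensor is checked three times an hour. A continuous system with dedicated channels has no such dependence on sensor count.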
While continuous and non-continuous systems have been in use for many years, there is increasing interest in quasi-continuous systems, as they enable assets that have traditionally been addressed by labor-intensive manual data collection schemes to be addressed with online systems instead. Wireless technology is playing a significant role in reducing the installation costs of quasi-continuous systems.
Assessing criticality
The choice of condition monitoring is not simply choosing the correct technology. It also involves determining whether the asset warrants any condition monitoring at all. This, too, is a function of a properly performed FMEA study. The failure modes of some assets simply don’t lend themselves to a condition-based approach, as they fall primarily in categories A-C in the diagram above. For these, a maintenance strategy that relies upon calendar intervals may well be appropriate.
Other assets simply do not represent sufficient safety or financial impact to warrant much more than a run-to-failure approach. As such, there will always be a certain population of assets for which “change the oil every quarter and tighten bolts” will be adequate. The economics for such assets dictate little more than performing minimal maintenance and allowing the machine to run until it fails.
Once it has been determined that an asset is an appropriate candidate for condition monitoring, the technologies are chosen and a decision must be made as to whether the asset warrants continuous, quasi-continuous, or non-continuous monitoring. This is a function of the asset’s criticality, which falls into one of three broad categories: critical, non-essential, and essential.
Critical assets represent one or more (but often all) of the following attributes:
- substantial or total process interruption if they fail
- significant safety risk if they fail, such as a fire, toxic leak, or explosion
- significant repair costs and/or long lead times for parts
Clearly, the costs and repercussions of failures on such assets dictate continuous monitoring and multiple condition monitoring technologies. The main air blower for a catalytic cracker unit in a refinery is one such example. Another would be the main turbine-generator trains in a large power plant. For such assets, it is the cost of failure that is of primary concern.
Non-essential assets are at the opposite end of the spectrum from critical assets. The primary driving factor for monitoring these assets (if monitored at all) is not the impact of safety or process downtime. Instead, it is simply the ability to reduce maintenance costs by planning maintenance proactively and eliminating root causes rather than just replacing parts at regular intervals.
Whereas a typical industrial plant may have – at most – a few dozen critical machines, it will often have thousands of non-essential machines. Combined, the maintenance investment required to fix many small problems when these machines fail unexpectedly can be quite significant. Thus, for such assets, it is the cost of maintenance that is of primary concern.
Essential assets are somewhere along the continuum between critical and non-essential. A good example would be the numerous small process pumps in a typical petrochemical facility. Any one pump will not have a significant impact on production – indeed, the pumps may even be spared. However, a seal or bearing failure could result in a leak, overheating, and subsequent fire. Manual data collection has proven to be inadequate, while fully continuous monitoring would be considered “overkill.” A quasi-continuous system is often employed instead, allowing condition to be checked several times an hour, which is generally adequate as the failure mode develops over tens of minutes, not fractions of a second.
For these assets, cost of safety is typically among the primary concerns. The asset failure itself may represent a safety hazard, or the ability to safely collect data manually might be involved, such as on the hot end of a large paper machine.
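The three categories can be summarized as a simple decision rule. The sketch below is a deliberate simplification under stated assumptions: the boolean inputs, the function name, and the lookup tables are hypothetical, and the monitoring approach listed for non-essential assets is an assumption, since such assets may not be monitored at all.

```python
from enum import Enum


class Criticality(Enum):
    CRITICAL = "critical"
    ESSENTIAL = "essential"
    NON_ESSENTIAL = "non-essential"


# Primary concern for each category, as described in the text above.
PRIMARY_CONCERN = {
    Criticality.CRITICAL: "cost of failure",
    Criticality.ESSENTIAL: "cost of safety",
    Criticality.NON_ESSENTIAL: "cost of maintenance",
}

# Typical monitoring approach. The non-essential entry is an assumption.
TYPICAL_MONITORING = {
    Criticality.CRITICAL: "continuous",
    Criticality.ESSENTIAL: "quasi-continuous",
    Criticality.NON_ESSENTIAL: "non-continuous or none",
}


def classify_asset(stops_process, major_safety_risk, costly_repair,
                   local_hazard):
    """Assign a criticality category from four yes/no answers.

    Any one of the first three attributes makes the asset critical.
    local_hazard covers the essential case: a failure that could cause a
    leak or fire, or data that cannot be collected manually in safety.
    """
    if stops_process or major_safety_risk or costly_repair:
        return Criticality.CRITICAL
    if local_hazard:
        return Criticality.ESSENTIAL
    return Criticality.NON_ESSENTIAL
```

A spared process pump whose seal failure could start a fire would be classified here as essential, with a quasi-continuous system as the typical choice. In practice these answers come out of the FMEA study rather than a four-question checklist.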
The process link
There are several reasons why it is becoming increasingly important for control engineers to understand the elements of condition monitoring.
Firstly, asset wear is often a function of process conditions, which means that an asset can age more in the span of a few minutes when run at off-spec conditions than during years of normal conditions. One example is process pumps that might inadvertently run in cavitation conditions. Another example occurs in the hydro industry where units that were once run under base load conditions are suddenly used for peaking and run up and down numerous times each day, subjecting the asset to stresses and transient process conditions for which it was not originally designed.
The relationship between machine conditions (such as bearing temperatures and vibration) and process conditions is vital, and most modern condition monitoring software using continuous technologies will have a facility for the integration and correlation of process conditions with mechanical conditions.
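One simple way to quantify such a relationship is a Pearson correlation coefficient between a process variable and a mechanical one sampled at the same instants. The sketch below is only an illustration of the idea, not a description of how any particular condition monitoring package performs its correlation. The function name is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series,
    for example pump flow (xs) and bearing vibration (ys)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)
```

A value near +1 or -1 suggests the mechanical reading moves with the process variable, which may point to a process cause rather than a developing mechanical fault. A value near zero does not rule out a relationship, since the coefficient only captures linear dependence.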
Operators are often the first line of defense when upset conditions occur in process or machinery. Consequently, the process control system typically becomes the single “dashboard” in which condition monitoring alarms are displayed. It is increasingly important for operators and control engineers to understand the adjustments that can be made to the process to mitigate asset malfunctions and to prolong asset life. And control and instrumentation engineers are often called upon to perform system integration between the condition monitoring system, the process control system, and the process historian system.
However, it should be noted that the unique data capture requirements of certain condition monitoring technologies warrant that they remain separate and distinct from the basic process control system, while still allowing integration between the two. For example, for machinery engineers to properly diagnose certain malfunctions, a continuous stream of high-resolution vibration data must be collected. The bandwidth and waveform sampling capabilities required are very similar to those employed in the world of consumer audio, and they exceed the capabilities of process control systems and historians.
By understanding the unique capabilities of various condition monitoring technologies, the knowledgeable control engineer will be less likely to use a process control or automation system for an application for which it is not well suited.
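A rough calculation shows the scale of the gap. The sketch below assumes a vibration bandwidth of 20 kHz (about the upper limit of consumer audio) and a process tag recorded once per second. Both figures are illustrative assumptions, as are the function names. The sampling rate uses the Nyquist lower bound of twice the highest frequency of interest.

```python
SECONDS_PER_DAY = 86_400


def min_sample_rate_hz(bandwidth_hz):
    """Nyquist lower bound: sample at no less than twice the highest
    frequency of interest to capture the waveform without aliasing."""
    return 2.0 * bandwidth_hz


def samples_per_day(rate_hz):
    """Number of samples produced by one channel in 24 hours."""
    return rate_hz * SECONDS_PER_DAY


# Assumed figures: 20 kHz vibration bandwidth, 1 Hz process tag.
vibration_rate = min_sample_rate_hz(20_000)   # 40000.0 samples per second
ratio = samples_per_day(vibration_rate) / samples_per_day(1.0)
```

Under these assumptions a single vibration channel produces 40,000 times as many samples per day as a once-per-second process tag, before multiplying by the number of channels on a machine.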
Increasing emphasis
Condition monitoring continues to grow in emphasis for many users – often in direct proportion to the competitive environment within which their industry operates. This article has provided an introduction to basic condition monitoring, showing why it is applied, where it is applied, and how it is applied. It is instructive to reiterate the key points.
An FMEA study is typically performed in order to categorize assets according to criticality, to identify the appropriate technologies, and to apply the correct maintenance strategy to the correct assets based on their criticality.
Condition monitoring is not warranted for all assets; however, as technological capabilities increase and installation costs decrease, the population of assets for which condition monitoring can be justified continues to increase.
Finally, process conditions and asset conditions are intimately intertwined, requiring that the control engineer understand how condition monitoring systems and process control and trending systems must work together to optimize the overall performance of the plant.
-----------------
Steve Sabin is with the Optimization & Control division of GE Energy (www.ge-energy.com)
Effective Early Warning
Thanks to an online condition-based monitoring system, annual maintenance expenses decreased significantly for this South China Sea oil producer.
The CACT Operators Group, an international consortium comprising China National Offshore Oil Corporation (CNOOC), Agip (Italy), Chevron (USA) and Texaco (USA), was formed to develop hydrocarbon resources in the Pearl River Basin of the South China Sea. The group’s first exploratory well was drilled in 1984 and today CACT produces more than 100,000 barrels of crude oil each day, destined for refineries in China.
Ensuring continuous, safe operation of equipment is a particularly significant challenge for offshore oil producers like CACT. Not surprisingly then, maintenance is a top priority on offshore platforms, where costly equipment is typically located hundreds of kilometers from shore. Efficient, non-stop operation of pipeline pumps is key to cost-effective oil production, and initially, preventive maintenance was CACT’s standard approach to keeping the pumps online. However, the effectiveness of CACT’s maintenance system was limited. CACT was forced to react to equipment problems that went undetected during routine inspections.
First steps
Looking to take a more proactive approach, CACT first began to use condition monitoring products in the mid-1990s for the pumps on three of its offshore platforms, basing the system on portable data collection equipment, namely Datapac and VISTeC. Datapac collects field data, including process variables and vibration information; VISTeC measures vibration in units of velocity and acceleration and can also take Spike Energy measurements, which can be used for early detection of surface flaws in rolling element bearings.
While CACT was pleased with the performance of this offline system, there were limitations. For example, the Datapac portable could detect vibration changes but could not track and analyze the pump’s condition in real time. Also, the data acquisition process itself was very costly, with CACT having to send maintenance personnel by helicopter to the platforms at a cost of US$2,000 per visit. Nor could the offline system provide the real-time pump protection outlined by American Petroleum Institute (API) Standard 670, the global standard for machinery protection.
Going online
Rockwell Automation, working with CACT to develop a solution that would meet the oil producer’s needs but still preserve the initial investment, proposed a remotely accessible condition monitoring solution that would incorporate the existing portable equipment with an Enwatch system and XM modules.
Via an onboard Ethernet network, the Enwatch system provides scheduled monitoring of all the pumps on the platform, with measurement parameters including vibration and process variables. On the same network, XM intelligent modules process critical parameters used to assess the current health, and predict the future health, of the pumps in real time.
The XM Series comprises DIN rail-mounted measurement, relay, and process modules. Ideal for critical machinery, the XM system includes protection capabilities, which can be used to safely shut down a machine before significant damage occurs. For example, upon detecting vibration outside of set parameters, it will send a signal to the motor control center (MCC) to turn off the relevant motors and protect the pump. Appropriately configured, XM meets the API 670 standard.
CACT operators onshore can remotely configure the XM modules via a DeviceNet network and view the equipment status through PlantLink, which provides a graphical representation of the health of all the machinery being monitored online. Information from the condition monitoring system is also integrated with Rockwell Software Maintenance Automation Control Center (RSMACC), where appropriate maintenance is scheduled in accordance with equipment requirements.
Downtime down
With the online condition monitoring system in place, CACT has eliminated the need for manual data acquisition – and the associated costs. Onshore operators 200 km from the platform collect, configure and analyze data just as they would on a local server.
“The Rockwell Automation solution suits CACT. It saves a great deal of manpower and expense,” said Guo Jinwen, Maintenance Supervisor, CACT. “We can monitor equipment hundreds of kilometers away from our office. The solution preserved our initial investment, while meeting our expectations.”
Since applying the Enwatch and XM systems, CACT has reduced its unscheduled downtime from 2.43 to 0.67 percent – a significant 72 percent decrease. In fact, during a five-year period, the system has prevented catastrophic machine failures more than 20 times. Annual maintenance expenses have also decreased dramatically, with the drop-off in service time allowing CACT to save US$100,000 in annual third-party maintenance costs.
Based on information provided by Rockwell Automation (www.rockwellautomation.com/services/conditionmonitoring)