by Alan Friedman
Presented October 2009 at CMVA
It has been said many times, but it can never be said enough: when it comes to vibration, trend, trend, trend! I cannot tell you how many times I have been asked a question like “What are the acceptable levels of vibration in a 200 HP air compressor?” or “I have 0.02 in/s vibration on this blower, is that too much?”, or how often someone has handed me a vibration graph with no other information and asked me what is wrong with their machine. Heck if I know! What I hope to explain in this article is how vibration analysis works, and why, within reason, general alarm levels do not work, will not work, cannot work.
Vibration Analysis – The Big Picture
Figure 1: The Big Picture Diagram
Operating State / Test Conditions
Most of this article will pertain to Figure 1 above, so let’s begin with an explanation of this very important diagram. Beginning on the left we have “operating state / test conditions”, which can be thought of as input forces. If we are talking about a machine, say a motor driven pump, then this part of the diagram corresponds to the RPM and load of the machine during the test. Going into a bit more detail, we are putting a certain amount of electrical energy into this motor and both the motor and pump are under a certain amount of load. This is the driving force of the system. If we consider nothing else, it should be clear that if we put more RPM and load, and therefore more energy, into this system, then we should get more vibration out of it. Think about this for a second. If more energy in equals more vibration out, then how can we create alarms or acceptable limits to vibration independent of test speed and load? If you really understand this, then you can probably stop reading now!
If we want to get into a bit more detail we can also understand that the machine is part of a system in that it is plugged into an electrical outlet that is part of a big electrical grid and the pipes coming out of the pump are connected to other pipes and valves and maybe other pumps that are contributing forces to this system. We can simplify this by stating that the operating state of the machine may be dependent on the larger process within the plant.
Moving to the right we have “Forcing Frequencies”. In the motor pump example, these come from components in the machine that, when rotating or moving, cause forces at particular frequencies. The most obvious of these is the shaft itself: if it is rotating at 3600 RPM, the movement of the shaft, caused by some amount of inherent unbalance, will transmit forces to the bearings at this frequency. If the motor is running a centrifugal pump with 6 blades, then the blade rate will be another forcing frequency, causing vibration at 6x shaft rate. Six blades pass the pump outlet in one rotation of the shaft, causing 6 pulsations per revolution; the frequency of this is 6x shaft rate. If we had a gear with 32 teeth on the motor shaft, then 32 teeth will hit the secondary gear with every rotation of the motor shaft, and these 32 hits are actual forces being transmitted through the machine. The topic of forcing frequencies is covered in all CAT II equivalent vibration courses.
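The arithmetic in this example can be written down as a short Python sketch. The function name and dictionary keys are made up for illustration, and the sketch assumes every component turns at motor shaft speed, which would not hold for the driven side of a gear pair.

```python
def forcing_frequencies(shaft_rpm, pump_vanes=0, gear_teeth=0):
    """Return simple forcing frequencies in Hz for a motor-pump train.

    Assumes (for illustration only) that the vanes and the gear are
    mounted on a shaft turning at shaft_rpm.
    """
    shaft_hz = shaft_rpm / 60.0  # revolutions per second
    freqs = {"shaft_rate_1x": shaft_hz}
    if pump_vanes:
        # One pulsation per vane per revolution
        freqs["vane_pass"] = pump_vanes * shaft_hz
    if gear_teeth:
        # One tooth contact per tooth per revolution
        freqs["gear_mesh"] = gear_teeth * shaft_hz
    return freqs


# The example from the text: 3600 RPM, 6 pump vanes, 32 gear teeth
example = forcing_frequencies(3600, pump_vanes=6, gear_teeth=32)
for name, hz in example.items():
    print(f"{name}: {hz:.1f} Hz ({hz * 60:.0f} CPM)")
```

Change the RPM and every one of these frequencies moves with it, which is one reason the test speed has to be known before the data can be interpreted.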
The input forces in the machine now pass through the machine’s structure. Unbalance forces in the shaft cause the shaft to vibrate against the bearings; pulsations from the pump vanes pass into the pump structure; gear teeth meshing send vibration through the gears to the shafts, into the bearings and then to the machine structure. Balls hitting pits or cracks on a bearing race will send pulses of energy into the bearing housing and from there into the machine structure. Don’t spend a lot of time contemplating this right now, but just imagine for a moment that the machine structure is made out of a really rigid metal, like a bell; now imagine the machine structure is made out of rubber or firm Jello. Do you think it will vibrate differently in response to these forcing frequencies in these two cases? The rigidity or mobility of the structure will cause it to respond differently to the same input vibration, which is to say that the vibration will be affected by the characteristics of the machine structure.
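One way to get a feel for this is a textbook single-degree-of-freedom model (a mass on a spring with a damper). This is a simplification brought in purely for illustration; real machine structures are far more complicated, and all of the numbers below are invented. The sketch only shows that the same force at the same frequency produces a different amount of motion when the stiffness changes.

```python
import math


def steady_state_amplitude(force_n, freq_hz, mass_kg,
                           stiffness_n_per_m, damping_ns_per_m):
    """Displacement amplitude (m) of a mass-spring-damper driven by a
    sinusoidal force: X = F / sqrt((k - m*w^2)^2 + (c*w)^2)."""
    omega = 2.0 * math.pi * freq_hz  # rad/s
    stiffness_term = stiffness_n_per_m - mass_kg * omega ** 2
    damping_term = damping_ns_per_m * omega
    return force_n / math.hypot(stiffness_term, damping_term)


# Same hypothetical 100 N unbalance force at 60 Hz (3600 RPM) in both cases
force_n, freq_hz, mass_kg, damping = 100.0, 60.0, 100.0, 1000.0

stiff = steady_state_amplitude(force_n, freq_hz, mass_kg, 1e9, damping)  # "bell"
soft = steady_state_amplitude(force_n, freq_hz, mass_kg, 1e6, damping)   # "Jello"

print(f"stiff structure: {stiff:.3e} m")
print(f"soft structure:  {soft:.3e} m")
```

With these made-up values the soft structure moves considerably more than the stiff one, even though nothing about the forcing has changed.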
If one wishes to measure the vibration in this machine or structure, one needs to use some sort of sensor, and this sensor needs to be somehow connected to the structure. There are non-contact eddy probes, microphones and lasers that measure vibration without physically touching the machine structure; in the case of these sensors, we can just ignore sensor mounting and move to the next section. In this example, let’s say we are using an accelerometer and we have to decide what sort of mounting to use. Some common options are hand held probes, various sorts of magnet mounts (with or without permanently installed mounting pads), threaded screw mounts and soldered or permanent mounts. As you may remember from your CAT II equivalent course, different mounting techniques give different frequency responses. This is to say that they have an effect on the measurements.
As mentioned in the prior section, there are a large number of options for sensors. Even if we stick to accelerometers for this example, there are high frequency accelerometers, low frequency accelerometers, general purpose accelerometers, charge amplified high temperature sensors etc. When we purchase a sensor of any type, it should come with a diagram that shows its frequency response characteristics (similar to its calibration information). This describes the frequencies at which the sensor measures accurately and the frequencies which it measures less accurately. No sensor measures perfectly at every frequency. There is always a trade off.
Think about putting the same CD in two different stereo systems. Will they sound exactly the same? The music on the CD is just a series of 1s and 0s that don’t vary depending on the system on which it is played. Different stereos will sound different, however. This is because speakers (just like sensors or microphones) cannot perfectly reproduce every frequency as it appears on the CD. Big subwoofers do better at low frequencies; little tweeters do better at high frequencies. None of them reproduce the music exactly as it appears on the CD.
In any case, at this point in our diagram, a sensor is converting mechanical vibration energy into electricity, imperfectly at best, and transmitting this electric signal to a data collector. In conclusion, we can say that the sensor is affecting the measurement.
Within the data collector, one has presumably defined the type of test one wants to take. At this stage electrical signals are being transformed into digital signals, which are then manipulated in a variety of ways depending on what aspects of the data we are interested in viewing. The vibration information will be sampled, filtered, perhaps integrated, transformed to a spectrum via an FFT, demodulated, converted into an RMS reading or any number of other things. The resolution of the data will depend on sampling rate and frequency range selections. Signal-to-noise ratio will depend on the number of bits the data collector has and how it handles auto ranging and gain settings. What we see in the “data” at the next step will largely depend on how we have configured the data collector to manipulate and present the data. In other words, our choice of data collection parameters and the attributes of the data collector will affect the data.
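To make the point about resolution concrete, here is a small sketch using the usual FFT analyzer relationships: the spacing between spectral lines is the frequency range (Fmax) divided by the number of lines, and one block of data takes roughly lines divided by Fmax seconds to acquire. The function name and the settings are illustrative, and the sketch ignores averaging, overlap and windowing, all of which change the real test time and effective resolution.

```python
def spectrum_settings(fmax_hz, lines):
    """Return (line spacing in Hz, time in seconds for one data block)."""
    resolution_hz = fmax_hz / lines   # spacing between spectral lines
    record_time_s = lines / fmax_hz   # equals 1 / resolution_hz
    return resolution_hz, record_time_s


# Illustrative setups; these are not recommendations
setups = [(400, 400), (400, 1600), (20000, 400), (20000, 1600)]
results = {setup: spectrum_settings(*setup) for setup in setups}

for (fmax_hz, lines), (res_hz, time_s) in results.items():
    print(f"Fmax {fmax_hz:>5} Hz, {lines:>4} lines: "
          f"{res_hz:g} Hz per line, {time_s:g} s per block")
```

Two peaks that are a few Hz apart would show up separately in the 400 Hz setups but would merge into a single line at 20,000 Hz, so the same machine can produce quite different looking graphs depending on these choices.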
Now we arrive at the “data” stage, perhaps in the form of a graph or set of graphs, and at this stage there are also a number of variables that will determine how the data appears or how it is presented. One can change units and scales: log, linear or dB; normalized or unnormalized plots; Hz or CPM; velocity or acceleration; metric units, etc. The data we view at this stage will be limited to an extent by the decisions that were made at the last stage on what data to collect and with what resolution. In any case, we can say that the way the data is presented can have an effect on how we view it or interpret it.
If we are not completely overwhelmed yet then we may remember that the reason we are doing this (in this example) is to turn our data into actionable information, i.e. to determine if this motor pump set has any mechanical problems, and if so, how bad they are, how long the machine may continue to run and what we should do next. We need to determine if, based on the data we have collected, the machine is healthy or not. And so, somewhere between “data” and “information”, analysis takes place. We are going to need some alarms or acceptance criteria as well as some idea of how to relate the information in the data back to the forcing frequencies and the structure. This relating back is how we will be able to determine what mechanical faults the machine has, if any.
Let’s review for a second, and although I don’t want you to feel confused, I do want you to feel just slightly overwhelmed. We are talking about a system here with a lot of variables and we are expecting to be able to view one portion of this system – i.e. the “data” and determine based on this if there are mechanical problems in the machine. What I would like to point out here is that at each stage in the diagram we have a new set of variables whose values will ultimately affect the data and yet we want to be able to view the data and be able to say with some amount of certainty that this machine has a problem! How are we ever going to be able to do that? How can we have one alarm that could possibly account for all of those variables and all of the changes this energy went through from the input forces to the vibration graph? How can we create one standard alarm that is independent of all of those variables?
To review: On the far left we have input forces, defined here as RPM, load and other process parameters. These input forces cause the shaft to rotate. The rotating shaft and other moving machine components create forces at various frequencies called forcing frequencies. These forces travel through physical material, i.e. the bearings and the machine structure, which includes the machine foundation and mounting. This structure could be like Jello or rubber or like a giant church bell for all we know, which is to say that it will respond in a certain way to these forces based on its own characteristics of rigidity and mobility, or mass, stiffness and damping. We are attaching a sensor that has its own response characteristics to the structure via some sort of mounting that will alter the vibration reading depending on its type. The sensor will convert this mechanical movement into electricity in an imperfect way, and then this electricity will be sampled and digitized based on how we have programmed our data collector. The data will then be manipulated in a number of ways by various algorithms. Data will be displayed in a variety of ways depending on our test setup and graph preferences, and then we will have to somehow convert this data into actionable information.
The point I am hoping to make here is that there are too many variables. The data we are going to analyze is dependent on all of the other stages leading up to its production. It will look different if we use a hand held probe or a stud mounted sensor, if we test the machine at 3500 RPM or 3600 RPM, at 50 psi or 80 psi, with the sensor mounted near the bearing or on the motor cowling, with the sensor mounted vertically on top of the machine or horizontally on the side of the machine, if we look at 1,600 lines of data or 400 lines with an Fmax of 20,000 Hz or 400 Hz, if the machine is mounted on springs or on a cement base. The goal here is to look at the data and say “this data shows that the machine has a moderate amount of unbalance or misalignment or pump vane wear”. But this is impossible to do when there are so many other factors that will affect the way the data appears.
So: What is the Solution?
Trend, trend, trend! Over the years, my soap box issue has been to explain why many PdM programs fail, and I have seen over and over again that successful programs are the product of good methodology, good program organization and standard procedures, not vibration analysis prowess. Now we can use the example and diagram in this article to explain how we can relate data to machine faults. The simple answer is: by controlling all of the other variables we mentioned. If we control all of the other variables, then we can confidently say that a change in the data has been caused by a change in the machine (defined as the structure and the forcing frequencies). If we cannot or do not control all of the other variables, there will be no easy way to confidently relate the data to machine condition.
Thus, in a successful vibration monitoring program, each machine will be tested in exactly the same way every time. The operating state and test conditions will be well defined, and the person collecting the data will not collect vibration data on the machine if it is not running in the proper state. In some cases, the machine will need to be set up or aligned in a specific way for testing. Test points should be well defined by permanently mounted test studs. Test procedures must be developed, documented and taught to plant personnel such that they will be adhered to month to month and year to year for many years hence, no matter which individual is collecting the data.
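As a rough sketch of what “well defined” might look like in practice, the check below refuses a measurement unless the machine is running close to its documented test state. The class, the field names and the tolerances are all hypothetical; a real program would document whichever process parameters matter for each machine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingState:
    """Hypothetical record of the conditions a machine is tested under."""
    rpm: float
    discharge_psi: float


def within_tolerance(measured, nominal, tolerance_fraction):
    """True if measured is within +/- tolerance_fraction of nominal."""
    return abs(measured - nominal) <= tolerance_fraction * nominal


def ok_to_collect(nominal, actual, rpm_tol=0.01, pressure_tol=0.05):
    """Only collect data when speed and load match the documented state.

    The 1% and 5% tolerances are placeholders, not recommendations.
    """
    return (within_tolerance(actual.rpm, nominal.rpm, rpm_tol)
            and within_tolerance(actual.discharge_psi,
                                 nominal.discharge_psi, pressure_tol))


documented = OperatingState(rpm=3600, discharge_psi=80)
good = ok_to_collect(documented, OperatingState(rpm=3590, discharge_psi=79))
bad = ok_to_collect(documented, OperatingState(rpm=3500, discharge_psi=50))
print(good, bad)
```

The important part is not the code but that the nominal state is written down once and checked every time, whoever happens to be holding the data collector.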
Machine faults are related to the structure and the forcing frequencies, i.e. changes in either one may indicate a machine fault. Therefore, one must calculate the forcing frequencies in each asset (i.e. count the number of fan blades, gear teeth, pump vanes etc.) and document this information in your vibration analysis software.
Regarding test points, these should also be well documented so that in five years’ time, a new technician will test the machine in exactly the same place, preferably on a permanently mounted stud. And when this stud falls off or is removed, a new person in the plant will have the documentation required to replace the sensor mount in exactly the same place in order to remove this variable from the equation. Stud mounted sensors are preferred over magnet mounted sensors as they remove one more variable from the equation: an untrained person collecting data won’t be able to screw the sensor into anything but the test pad, whereas they may be able to stick a magnet mount anywhere on the machine.
Regarding sensors: pick an appropriate one and stick with it. Regarding the data collector, perhaps everyone will be impressed by your ability to take 100 different tests on a particular machine and display the data in 100 different ways. Avoid the temptation to play, or if you must play, do so outside the context of your PdM program. Consult with your PdM service provider to select the most appropriate tests to take on this asset, balancing test time against program goals, and from then onwards always test the machine in exactly the same way. Everyone wants a data collector with a big screen and a thousand options. The people who understand PdM prefer a data collector that is a black box with a sensor and one button or barcode reader to collect data in a standard, predefined way for each test point.
Regarding data, always display your data in the same format with the same scaling. This will help train your mind to pick out the faults. A log or dB scale with normalized data will allow you to view all of the pertinent information at once without having to rescale to see tiny components (like bearing tones!) in a linear scale. Regarding alarms: use data from this machine, tested under exactly the same conditions, with all of the variables fixed, and then look for changes over time. Data from the same machine tested in the same way will include and account for all of the variables we have discussed in this article. If none of the test variables change but the data looks different, we can confidently say that something in the machine has changed.
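Looking for changes against a machine’s own baseline can be sketched in a few lines. The amplitudes, the peak names and the 6 dB threshold (roughly a doubling of amplitude) are invented for illustration; how much change matters for a given machine is something the program has to decide, and the sketch assumes both tests were taken under identical, documented conditions.

```python
import math


def change_db(current, baseline):
    """Change in amplitude relative to baseline, in dB (20*log10 ratio)."""
    return 20.0 * math.log10(current / baseline)


def flag_changes(baseline, current, threshold_db=6.0):
    """Return peaks whose amplitude rose by at least threshold_db.

    baseline and current map a peak name to its amplitude, in the same
    units (for example in/s), from the same test point and test setup.
    """
    flagged = {}
    for name, base_amp in baseline.items():
        delta = change_db(current[name], base_amp)
        if delta >= threshold_db:
            flagged[name] = round(delta, 1)
    return flagged


# Hypothetical amplitudes at three forcing frequencies
baseline = {"1x": 0.05, "vane_pass": 0.02, "gear_mesh": 0.01}
current = {"1x": 0.055, "vane_pass": 0.02, "gear_mesh": 0.04}

flagged = flag_changes(baseline, current)
print(flagged)
```

Here only the gear mesh peak is flagged. Whether 0.04 is “high” in some absolute sense never enters into it; what matters is that it has quadrupled on this machine under the same test conditions.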
What if the machine has a problem the first time we test it? Don’t worry about it. Remember why we are doing this: we want to know if a machine is failing over time. If it is running today but has faults, it is still running. If its condition begins to change or its health deteriorates further, we should see changes in the data from this first test. These changes are the information that is important to us. At some point the machine will be overhauled or repaired; at this time the baseline can be reset to better describe the machine when it is healthy.
Is this really the only way?
This article is intended to convince you of the importance of testing machines in the context of a PdM program, over time, under repeatable conditions and looking for changes. This is the easiest, best and most cost effective way of monitoring one’s equipment. That said, there are ways to troubleshoot machines, to glean from all of these variables some indication that a problem exists, to learn something about the structure, to identify forcing frequencies that would not be present at all if the machine were healthy and therefore indicate it is not, regardless of alarm levels. The problem is that not everyone who thinks they are capable of doing this is really capable of doing it, and even if they are, in order to do it correctly they will need to collect a lot of data and spend a lot of time looking at it. Then sometimes they also do a bit of guessing. Good mechanics can often make a good diagnosis without taking any data, so it is not beyond possibility. But this approach is not efficient, objective or applicable to maintaining thousands of machines with few personnel, whereas the approach I am promoting is.
Trend! Trend! Trend! If you need to hire someone to run your PdM program, whether an in-house employee or external service provider, make sure they are taking the correct approach. If you consider the information in this article, it should be clear that 99% of the program is defining and documenting things like standard test conditions, fixed test points, forcing frequencies, baseline configuration etc. If done correctly, analysis becomes just a tiny part of the process (it can even be mostly automated): it is just a matter of looking for changes in a standard set of graphs. So beware of analysts who seem to spend more time playing with their data collector and poring over graphs than documenting standard test procedures and defining test points.
The key to a successful monitoring program is methodology, repeatability, organization and education: making sure that no matter who tests the machine or when, it is tested in exactly the same way so that the data is meaningful and can be trended, month after month, year after year. This way you fix all of the variables we have been talking about and include them in your baseline. Then you can confidently say that changes in vibration indicate a machine fault. Program starts and stops, and changes in management, personnel and commitment to the program, often cause programs to fail; this is also partly due to the technical constraints defined here. A lack of consistency cannot produce good results. This is just how it works, so stop playing with your data collector and poring over reams of graphs. Instead, spend some time organizing your PdM program or ask for help from someone with a long track record of managing successful programs. Good luck!