No matter what tools are available to assist you in problem determination, you will have to select the appropriate ones when the problem occurs. Knowing how to approach problems logically is your best hope of resolving them.
A student once told me his first professor in the computer science curriculum said the three most important things you can learn about computers are:
• Read the Screen
• Read the Screen
• Read the Screen
He was right. Many times administrators miss their most important clue because they did not fully understand the message or the error on the screen.
How you became aware of the problem may help you determine which tools to use first. For instance, an automated error report would logically direct you to the AIX error log.
When a user reports a problem, the odds are there really is one. On the other hand, the problem often exists in a different place than the user has assumed. Make sure you have the clearest possible description of the problem. This means asking more questions.
People reporting problems do not normally intend to confuse the issue, but if you stop asking questions before you are sure you understand what the problem is, you are not likely to find it.
The specialist was Leroy Kaump. I knew him from 1965 to 1985, and I never knew him to be wrong about a machine problem in all that time. He was the absolute best diagnostician I have ever encountered. Most of what I learned from Leroy came from observation and questioning. Leroy did not believe you could teach someone to be a better problem solver. He said, “You either have it or you don’t.” That is the one thing I believe Leroy was wrong about.
If you know that the answer to “What Changed” is most likely going to be “nothing,” it is tempting to skip the question and go straight to the machine. But the person reporting the problem knows more about it than he thinks he does. Your job is to discover what he knows, because he may not be able to tell you on his own.
Most people learning something new are intimidated by a structured process. Not only do they have to learn how to approach it, but some guy is trying to get them to memorize a list of actions by their names.
But if you follow this approach, including the note-taking, and stay disciplined as you do so, I am confident you will be a better problem solver than you are today, no matter how good you are right now.
If your name is Leroy Kaump, I retract the previous statement.
Since it is almost impossible to get someone to tell you accurately what changed, your best option is to know what the system should do in the circumstances. Comparing what it should do with what it is actually doing provides essentially the same information that an answer to “What Changed” might have provided.
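The comparison of expected and actual behavior can be sketched in a few lines of Python. This is a hypothetical illustration only: the check names and values below are invented for the example and do not come from any AIX or HACMP command. In practice you would fill in `expected` from the documentation and `observed` from the data you collect.

```python
def deviations(expected: dict, observed: dict) -> dict:
    """Return every check whose observed value differs from what the system
    should be doing, as {check: (expected_value, observed_value)}.
    A check you have not observed yet is reported with None."""
    return {
        check: (want, observed.get(check))
        for check, want in expected.items()
        if observed.get(check) != want
    }


# Hypothetical baseline: what the system should do in these circumstances.
expected = {
    "cluster_services_active": True,
    "shared_disk_accessible": True,
    "application_responding": True,
}

# Hypothetical observations gathered while investigating.
observed = {
    "cluster_services_active": True,
    "shared_disk_accessible": False,
    "application_responding": False,
}

for check, (want, got) in deviations(expected, observed).items():
    print(f"{check}: expected {want}, observed {got}")
```

Each deviation is a place where something has changed, even if nobody can tell you what it was.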
Yes, it was Leroy. The Area Technical Specialist on the system had worked on the problem for two hours and only knew the general area of the problem. But he had to leave for important personal reasons, and Leroy was the only person available to relieve him.
Leroy knew he could not solve the problem until he knew what was supposed to happen. So he set out to learn what he needed to know. He really did read the manual, find the problem and repair it in just over an hour.
By the way, you should note that a principles of operation manual exists for HACMP at the level you are using.
Don’t just make the list in your head; write it on a notepad. Review the list often in case you have missed something that could be the problem. As new possibilities present themselves, write them on the pad also.
There are often several items that could cause a given problem. Some of them will cause other symptoms as well. Make sure you understand both the symptoms you are seeing and the ones you are not. Eliminate the items that would produce symptoms you do not see.
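The elimination step works like a filter over your list of possible causes. The sketch below is a hypothetical illustration, assuming you record each candidate cause with the symptoms it would produce, plus the symptoms you have checked for and confirmed are absent. The causes and symptoms are invented examples, not HACMP diagnostics.

```python
from __future__ import annotations


def remaining_candidates(candidates: dict[str, set[str]],
                         confirmed_absent: set[str]) -> list[str]:
    """Drop every cause that would produce a symptom you checked for and
    did not see. Only symptoms confirmed absent count; a symptom you have
    not checked yet is not evidence against a cause."""
    return sorted(
        cause
        for cause, symptoms in candidates.items()
        if not (symptoms & confirmed_absent)
    )


# Hypothetical list of possible causes and the symptoms each would produce.
candidates = {
    "failed network adapter": {"node unreachable", "adapter errors logged"},
    "full file system": {"application write failures", "file system at 100%"},
    "application misconfiguration": {"application write failures"},
}

# Hypothetical symptoms you looked for and did not find.
confirmed_absent = {"file system at 100%", "adapter errors logged"}

print(remaining_candidates(candidates, confirmed_absent))
# ['application misconfiguration']
```

Keeping "not seen" separate from "not checked" matters: eliminating a cause because you never looked for its symptoms is how the real problem gets crossed off the list.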
There are six books of commands for AIX, and the HACMP software adds more. Learn about the commands that will allow you to collect data. If you are not sure what command you need, try the man -k option. The HACMP documentation includes a problem determination manual; it can help you determine which commands will be helpful while investigating a problem.
If a problem takes more than a few minutes to solve, you are not likely to remember everything you have done or what the results were. Get in the habit of making notes as you conduct tests. Document what you did and what the results were. When you need to know again, check your notes.
People who are not sure how to approach a problem are likely to just start trying things they have seen others do or have had success with in the past. If an action does not get them further toward the solution they abandon it, but seldom remember to undo any changes they have made. It is not surprising that they then notice a “change” in the problem.
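Notes become most useful when each entry records not only what you did and what happened, but also how to reverse any change you made. The Python sketch below is a hypothetical note-taking aid built on that idea; the `Journal` class and the recorded actions are invented for illustration and are not part of AIX or HACMP.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    action: str                 # what you did
    result: str                 # what you observed afterward
    undo: Optional[str] = None  # how to reverse it, if the action changed the system


@dataclass
class Journal:
    steps: List[Step] = field(default_factory=list)

    def record(self, action: str, result: str, undo: Optional[str] = None) -> None:
        """Write down each test and its result as you conduct it."""
        self.steps.append(Step(action, result, undo))

    def pending_undos(self) -> List[str]:
        """Undo instructions for every recorded change, newest first,
        so changes are backed out in the reverse order they were made."""
        return [s.undo for s in reversed(self.steps) if s.undo is not None]


journal = Journal()
journal.record("Checked the error log", "No new entries since the last restart")
journal.record("Changed hypothetical setting A from 1 to 2", "No change in symptoms",
               undo="Set hypothetical setting A back to 1")
journal.record("Changed hypothetical setting C from off to on", "No change in symptoms",
               undo="Set hypothetical setting C back to off")

for step in journal.pending_undos():
    print("Undo:", step)
```

Read-only checks carry no undo entry, so the list of pending undos contains only the actions that actually changed the system.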
If you use a process and it does not produce the results you want, there are two possibilities: it might be the wrong process, or you may not have noticed all of the indications that would have made it work. Keep in mind that something has changed in the system, or it would still be doing what it has always done.