Beechglen Development Inc.
3862 Race Road - Cincinnati, Ohio 45211 - (513) 922-0509 - Fax (513) 347-2834
The online home of the company bringing you quality HP hardware and software support
| Handling System Aborts and System Failures |
|
Computer system interruptions, while rare, are unfortunately inevitable. It really isn’t a question of if you will have one, but when. For the purposes of this letter, system interruptions consist of system aborts/failures, system hangs, and hardware failures. My goal is to give you general guidelines and procedures for handling such interruptions. You may already have written procedures in place. Hopefully the procedures contained herein can extend your documentation, leading to a smoother transition from a down system to one that is usable and stable.

I will begin by defining each type of interruption.

A system abort or system failure is MPE’s defense mechanism. When the operating system determines that an event has occurred that is unexpected and uncorrectable and may cause data integrity problems, it aborts the system to protect itself. A message appears on the console with the abort or failure number, all system activity is halted, and a red halt light should appear on the front of the system. This varies depending upon the model of HP 3000 you are running. Furthermore, on an MPE/iX system there should be a rolling hex display on the console. A memory dump can tell which program/process/session/job was executing when the system aborted.

A system hang occurs when the machine appears to be running and the green run light is still lit, but you cannot access it. This can mean that a process is consuming all CPU time, leaving none available even for high-priority system processes. Another type of hang is when the entire system is paused, waiting for some resource to become available. These resources can range from system buffers to console messages to I/O on a disk drive that is offline. Hung systems are the most difficult problems to diagnose.

Hardware failures can be categorized into two groups.
Some failures that occur on hardware contained in the system, such as device adapters and controllers, are detected by the CPU itself (not MPE) and will immediately halt the system. You may have seen these with such messages as ‘WCS Parity Error’ or ‘Machine Check’, or simply a rolling hex display on the console that includes ‘FLT DEAD’. These types of errors should be reported to your hardware support supplier. Other times, hardware failures are not detected at the CPU level, but MPE detects the corruption and causes a system abort (see above).

What should you do if you experience a system interruption? First, don’t panic. Most system aborts are isolated events that do not recur and can be ignored. A motto I frequently use, "Once is a fluke, twice is a trend," can be applied to system aborts. If, however, you have a rash of system aborts, then certainly they should not be ignored.

When you call to report a system interruption you should be prepared to answer the following questions:
Depending on the answers to the above questions, it may be suggested that you take a memory dump. A memory dump writes the system state to tape(s) and is often the only way to get to the root of a problem. Depending on the model and the amount of memory in the system, this process can take from 15 to 45 minutes, which may weigh heavily in the decision. Can you afford for your system to be down for that length of time for a possible one-time occurrence?

If a memory dump is deemed necessary, the following procedure can be followed at the system console:
Reading a memory dump can be a fairly CPU-intensive process and may have a negative impact on your system. In most cases we will ask you to send us the dump via overnight delivery service so we can view it on a system in our lab. We have found this to be the most effective method for reading memory dumps.

While HP 3000 hardware is extremely durable and the MPE operating system is probably the most reliable and stable in the business, failures are inescapable. Please make sure you have an action plan to follow when one occurs. |
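If you keep your action plan alongside a simple incident log, the "once is a fluke, twice is a trend" rule can be checked mechanically. The following is only a minimal sketch in Python, meant to run on a workstation rather than on the HP 3000 itself. Every name in it (`Kind`, `Interruption`, `is_trend`, `cpu_detected_hardware_failure`) is hypothetical, the 30-day window is an assumption rather than a figure from this letter, and the console strings it looks for are only the three examples quoted above, not a complete list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Kind(Enum):
    """The three interruption types defined in this letter."""
    SYSTEM_ABORT = "system abort/failure"
    SYSTEM_HANG = "system hang"
    HARDWARE_FAILURE = "hardware failure"


@dataclass
class Interruption:
    when: datetime
    kind: Kind
    console_message: str  # text copied from the console, e.g. the abort number


# Only the example messages quoted in this letter; real systems may show others.
CPU_DETECTED_MARKERS = ("WCS PARITY ERROR", "MACHINE CHECK", "FLT DEAD")


def cpu_detected_hardware_failure(console_text: str) -> bool:
    """True if the console text contains one of the quoted CPU-level messages,
    which this letter says should go to the hardware support supplier."""
    text = console_text.upper()
    return any(marker in text for marker in CPU_DETECTED_MARKERS)


def is_trend(log, kind, window_days=30):
    """True if `kind` occurred at least twice within `window_days` of each other.

    The window length is an assumption; choose one that suits your site.
    """
    times = sorted(entry.when for entry in log if entry.kind is kind)
    window = timedelta(days=window_days)
    # If any two events fall inside the window, two neighbouring ones do too.
    return any(later - earlier <= window
               for earlier, later in zip(times, times[1:]))
```

Recording each interruption with its date, type, and console message as it happens also means the details are on hand when you call to report the problem. A single logged abort would leave `is_trend` returning `False`, while a second abort inside the window would flip it to `True` and signal that the aborts should no longer be ignored.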
Send mail to webmaster@beechglen.com with questions or comments about this web site.
Copyright © 2006 Beechglen Development Inc.
Last modified: 01/20/06