System Triage

System problems are inevitable. At some point the network will not respond, a disk will fail, data will become corrupted, or data will be modified with no indication of who is changing it.

We are here to help solve your problems. This month I offer planning suggestions and front-line troubleshooting steps that you can perform before logging a support call. Armed with this information, together we can solve your problems quickly and efficiently.

Disaster Planning

Everyone should have a written disaster recovery plan. Moreover, everyone should perform occasional tests of the disaster recovery plan. In dynamic IT environments a plan can quickly become out of date.
Don’t forget that we can help get your backup system up and running even if you are at a disaster recovery Hot Site.

Please take advantage of your AMS service by requesting a review of your backup strategy and how it fits into your overall disaster recovery plan. Backup validation testing is also included with your support.

Please remember that we offer Disaster Recovery Hot Site service contracts at very competitive rates.

Network Troubles

When users are unable to access the HP3000, check the following items.
Answers to some of these questions help differentiate between a network problem, a performance problem, or a hung system.

  • Do you have a network diagram?
  • Is the console still functioning?
  • Are batch jobs still running?
  • Check the NIC in the 3000. Is the NET FAIL light flashing?
  • Does the hub or switch have a link light?
  • Are serial users or printers affected?
  • Does the problem affect local and remote users or just remote? All remote users or exclusively a specific location?
  • Has anything changed on the LAN? On the WAN? Items such as new routers, hubs, or switches, print servers, workstations.
  • Are other non-HP3000 servers or workstations experiencing the same difficulties?
  • Be prepared to provide output from the following commands:
    :linkcontrol @,all
    :nettool.net.sys "nameaddr;routing;routing;gatelist;quit"
    :nettool.net.sys "resource;display;quit"
    
  • Is the problem inbound, outbound, or both?
  • Is the problem limited to just one protocol (e.g. FTP, Telnet, VT)?
  • Can you ping the device by name? By IP address?
    :nettool.net.sys "ping;ping 192.168.52.8;quit"
    :nettool.net.sys "ping;ping ws42.beechglen.com;quit"
  • Can you reproduce the problem from another machine on the same subnet as the HP? For instance, if you cannot connect via FTP to a server from the HP, can you from a workstation?
  • Are there any out of the ordinary console messages?
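
The name-versus-address and per-protocol questions can also be answered from a workstation on the same subnet. What follows is a minimal Python sketch, not an MPE tool. The function names are hypothetical, and it substitutes a TCP connection attempt for a true ping, so a host that is up but not listening on the port looks the same as a host that is down.

```python
import socket

def resolves(name):
    """Return the IPv4 address for name, or None if the lookup fails."""
    try:
        return socket.gethostbyname(name)
    except OSError:
        return None

def port_open(address, port, timeout=3.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False

def diagnose(name_resolved, answers_by_address):
    """Turn the two answers into a first guess at where the fault lies."""
    if answers_by_address and not name_resolved:
        return "name resolution problem (DNS or hosts file)"
    if not answers_by_address:
        return "no TCP answer by address (link, routing, host, or service)"
    return "name and address both work; look at the specific protocol"
```

A possible use, with the host name and address from the ping examples standing in for your own device and 21 being the standard FTP port:

    diagnose(resolves("ws42.beechglen.com") is not None, port_open("192.168.52.8", 21))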

Armed with answers to these questions, we can promptly resolve your network problems. And don’t forget we are more than happy to speak directly with your network administrator or network consultants. Sometimes MPE has a different name for a concept or protocol. For instance, what is commonly known as an ARP table is called a mapping table on MPE. We can help translate between their nomenclature and MPE to facilitate faster problem solving.

For some problems, access to a network analyzer or packet sniffer can be very important. If you don’t have one, you can build one on a Linux system using tcpdump or ethereal. Alternatively, you can get an inexpensive packet sniffer that runs on Win98/NT/2K from UFASOFT (http://www.ufasoft.com/sniffer/). I have used it myself. It has decent filtering capabilities and you can’t beat the price at $39.00.
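
As an illustration of the tcpdump route, here is a minimal Python sketch that assembles a capture command limited to traffic to or from one host. The function names and the interface name eth0 are assumptions; the address is the one from the ping example. The tcpdump options used are -n (no name lookups), -i (interface), and -w (write raw packets to a file), followed by a host and port filter.

```python
import subprocess

def tcpdump_args(interface, host, port=None, outfile=None):
    """Build a tcpdump command line that captures traffic to or from host."""
    args = ["tcpdump", "-n", "-i", interface]
    if outfile:
        args += ["-w", outfile]  # save raw packets for later analysis
    expr = ["host", host]
    if port is not None:
        expr += ["and", "port", str(port)]
    return args + expr

def capture(interface, host, port=None, outfile=None):
    """Run the capture until interrupted; this normally requires root."""
    return subprocess.call(tcpdump_args(interface, host, port, outfile))
```

For example, capture("eth0", "192.168.52.8", port=21) would watch only FTP control traffic between the Linux interface's network and that address.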

Data Corruption

Data corruption comes in many flavors. Here are some examples and how, together, we would solve each problem.

  • Missing files. Files on the system have been purged and nobody can say who or why.
    We would enable File Close Logging and analyze the system log files to determine who (user) purged the files, with what (program), and when. Unfortunately, if File Close Logging was not already enabled, there is no way to determine after the fact who purged a file. In that case we would analyze system backup listings to find the last time the file(s) were backed up.
  • Directory errors
    Be prepared to run FSCHECK to verify the extent of directory corruption. Some problems are resolved by FSCHECK; others require intervention.
  • Data in Image databases changing (or worse, disappearing) unexpectedly
    Enable Image Transaction Logging to record all modifications to the Image database(s) in question. We can assist you in analyzing the log files to zero in on exactly when the modification occurred.

Contrary to popular belief, Image Logging has very little impact on the system. The overhead for Image Logging is approximately 30%, but that is 30% of the time spent in DBPUT, DBUPDATE, and DBDELETE. Programs spend much more of their time gathering data (DBFIND, DBGET) and manipulating it than updating it, not to mention all of the other non-Image processes the system deals with. The overall overhead for Image Logging is actually less than 10% on most systems.
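
The arithmetic behind that estimate can be sketched in a few lines of Python. The 30% figure is the one quoted above; the update shares in the loop are illustrative assumptions, not measurements from any system. The overall cost is simply the share of run time spent in DBPUT, DBUPDATE, and DBDELETE multiplied by 30%.

```python
def overall_overhead(update_fraction, logging_overhead=0.30):
    """Overall slowdown from logging, as a fraction of total run time.

    update_fraction: share of run time spent in DBPUT, DBUPDATE, DBDELETE
    logging_overhead: extra cost logging adds to that share (30% per the text)
    """
    if not 0.0 <= update_fraction <= 1.0:
        raise ValueError("update_fraction must be between 0 and 1")
    return update_fraction * logging_overhead

# Illustrative shares only; measure your own workload for real numbers.
for share in (0.10, 0.25, 0.33):
    print(f"{share:.0%} of time in updates -> {overall_overhead(share):.1%} overall")
```

By this reckoning, the overall cost stays under 10% as long as updates account for less than about one third of run time.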