The iSeries just sits there most of the time, plugging away without being challenged. Plus, the system never seems to go down. Why not use it to monitor those unreliable servers sitting around it? I created some CL programs and one RPG program that loop through and check the e-mail route and each of our servers, routers, etc. Alert messages are then sent to our on-call digital phone (which is the reason for the cryptic messages) or to any e-mail address we choose. I have included an e-mail checker and a device checker. The e-mail checker works because we have a second e-mail server that is otherwise unused; alerts are sent out through that server, not the one being checked.
NetMonitor
1. Introduction
The NetMonitor program is intended to check the e-mail route and a list of specified IP addresses. Its purpose is to detect outages quickly so that the systems can be kept at the highest availability possible.
2. Programs
MonitorM
This is the main program, which acts as a platform for calling the other programs that perform the specific checks. MonitorM runs two separate cycles, one for e-mail and one for devices. Data areas determine the length of each cycle. The e-mail cycle switches between two loop values depending on whether the current time falls within off-hours. The off-hours window is also defined through data areas, which allows MIS to set a period when the cycle can be a little more relaxed.
This program will send an e-mail to the Your Relay account, then wait.
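The two-cycle loop described for MonitorM can be sketched roughly as follows. This is only an illustration in Python, not the CL source: the one-minute tick, the function name run_monitor, and all parameter names are assumptions made for the example. The interval callables stand in for re-reading the data areas on each pass.

```python
import time
from typing import Callable, Optional


def run_monitor(
    check_email: Callable[[], None],
    check_devices: Callable[[], None],
    email_interval: Callable[[], int],    # minutes; stands in for a data area read
    device_interval: Callable[[], int],   # minutes; stands in for a data area read
    sleep: Callable[[float], None] = time.sleep,
    max_minutes: Optional[int] = None,    # None means run forever
) -> None:
    """Tick once a minute and run each check when its own cycle is due."""
    next_email = 0
    next_device = 0
    minute = 0
    while max_minutes is None or minute < max_minutes:
        if minute >= next_email:
            check_email()
            # Re-read the interval so a changed value takes effect next cycle.
            next_email = minute + email_interval()
        if minute >= next_device:
            check_devices()
            next_device = minute + device_interval()
        sleep(60)
        minute += 1
```

Passing the sleep function in as a parameter is a convenience for trying the loop out without actually waiting a minute per tick.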
MonitorE
This program is called from MonitorM to check for the return e-mail from the Your Relay loop. All return messages are sent to QSYSOPR at Your400. This program also deletes the distributions for QSYSOPR, getting it ready for the next check. A variable is sent back to MonitorM to let the main program know if a return message was sent or not.
MonitorD
This program is also called from MonitorM. It reads through the file DeviceD in Your Library and pings each IP address. If the ping is unsuccessful, the loop counter for that address is incremented. Once the defined threshold is met, a message is sent to the on-call phone and to the distribution list Your List on the AS/400, giving the IP address and name of the device that is not responding. The date and time of the outage are also recorded in the message, and a down flag is set.
Whether the message is also sent to the distribution list on the AS/400 is determined by a flag set on each device.
When a device has a down flag, the system goes into aggressive mode. Instead of pinging once per defined cycle, the system will start pinging every minute until the device is up. This will happen no matter what the threshold value for the device is set to.
If the connection is subsequently restored, a message is sent to the on-call phone informing them of the recovery. The date and time of the recovery are also sent in the message.
However, if the device goes down and recovers before the threshold is reached, no message is sent.
The loop cycle value for devices is also stored in a data area.
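The counter, threshold, and down-flag behavior described above can be sketched as a small state update. This is an illustration in Python rather than the actual CL/RPG logic. It assumes the down flag is set at the moment the threshold is reached (one reading of the description), and the names Device, record_ping, next_delay, and the message layout are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    address: str         # IP address to ping
    name: str            # device name used in the alert
    threshold: int       # failed cycles to wait before alerting
    fail_count: int = 0  # consecutive failed pings so far
    down: bool = False   # down flag; triggers aggressive mode


def record_ping(dev: Device, ping_ok: bool, stamp: str) -> Optional[str]:
    """Update the device after one ping; return alert text, or None."""
    if ping_ok:
        was_down = dev.down
        dev.fail_count = 0
        dev.down = False
        # A recovery message is sent only if an outage was reported earlier.
        return f"UP {dev.name} {dev.address} {stamp}" if was_down else None
    if dev.down:
        return None  # outage already reported; keep pinging
    dev.fail_count += 1
    if dev.fail_count >= dev.threshold:
        dev.down = True
        return f"DOWN {dev.name} {dev.address} {stamp}"
    return None


def next_delay(dev: Device, cycle_minutes: int) -> int:
    """Minutes until the next ping: every minute while the device is down."""
    return 1 if dev.down else cycle_minutes
```

A device that fails once and then answers again before reaching its threshold produces no message at all, which matches the behavior described for short outages.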
DeviceFlg
This is the RPG program called out of MonitorD to increment the counter or reset it, depending on the situation. It also updates the file with the date and time of the initial outage and sets the down flag.
3. Files to Update
DeviceD
This is the Device Definitions file. Any device with an IP address can be defined in this file to allow MonitorD to check it.
Fields
DEVADR - IP address of the device to check
DEVNAM - Name of the device to check
DEVFLG - Flag to allow the program to check or skip this device (Y - Check, N - Skip)
DEVTHR - Threshold, set this to the number of cycles to wait to alert MIS (each cycle is determined by the value in the data area)
DEVLPF - Loop Flag, leave as is.
DEVDAT - The last date the line was not up. Leave as is for new devices.
DEVTIM - The last time the line was not up. Leave as is for new devices.
DEVDFL - This is the Down Flag used to send the program into aggressive mode for this device.
DEVMSG - This is the flag used to determine if a break message is sent to the Your List distribution list or not.
4. Data Areas
NETEMAILSS - Net Monitor off-hour Start Time. This is set to the start time of the off-hour for e-mail checking.
NETEMAILSE - Net Monitor off-hour End Time. This is set to the ending time of the off-hour for e-mail checking.
NETMONOFFH - Net Monitor Loop Value. This is used to set the e-mail loop cycle value during off-hours.
NETMONPRIM - Net Monitor Loop Value. This is used to set the e-mail loop cycle value during prime time.
NETMONDEVL - Net Monitor Loop Value. This is used to set the device loop cycle value.
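How the e-mail loop value might be chosen from these data areas can be sketched as below. This is a Python illustration under assumptions: the function names are made up for the example, the start and end values are treated as times of day, and the off-hour window is allowed to wrap past midnight (for instance 22:00 to 06:00), which the original description does not spell out.

```python
from datetime import time


def in_off_hours(now: time, start: time, end: time) -> bool:
    """True if 'now' falls in the off-hour window [start, end)."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end


def email_loop_value(
    now: time,
    start: time,       # value held in NETEMAILSS
    end: time,         # value held in NETEMAILSE
    prime_value: int,  # value held in NETMONPRIM
    off_value: int,    # value held in NETMONOFFH
) -> int:
    """Pick the e-mail loop cycle value for the current time of day."""
    return off_value if in_off_hours(now, start, end) else prime_value
```

For example, with an off-hour window of 22:00 to 06:00, a prime-time value of 5, and an off-hour value of 30, a check at 23:00 would use 30 and a check at noon would use 5.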
To view the source code click here.
This was first published in July 2001