Ken Graap, System i backup and recovery expert, has been answering Search400.com's member questions for more than five years. Ken has seen many of the same issues and concerns raised over and over, so we've compiled the top 10 questions asked about System i backup and recovery for you. Do you have a question that's not listed here? Ask Ken your backup or recovery questions.
TABLE OF CONTENTS
1. Can the OS/400 SAVSYS command be run unattended as a submitted batch job?
2. How do I automate a full system backup on my i5 server?
3. How do I prepare my system for a Save_While_Active save?
4. How do I save and restore spool file data as part of my backup/recovery process?
5. How can I set up journaling for my production files?
6. How can I use BRMS to manage my tape library?
7. Why does my BRMS backup process run hours longer than before I used BRMS?
8. How do I know if I have completely saved my system?
9. Should operators and system administrators be the only users authorized to *SAVSYS authority?
10. During a disaster is it important to restore objects in a certain order?
1. Can the OS/400 SAVSYS command be run unattended as a submitted batch job?
This is probably one of the most common questions asked about backup and recovery on the iSeries/i5.
The native SAVSYS process on the i5 system can be run only from the system console while the system is in a restricted state. It cannot be submitted to batch in the QCTL subsystem.
Several third-party backup and recovery products have a feature where the SAVSYS process can be run unattended in a controlled interactive environment, but it still must be run from the system console in a restricted state. One recent enhancement: In V5R3, Backup Recovery & Media Services (BRMS) allows you to submit a batch job to the controlling subsystem even in a restricted state.
2. How do I automate a full system backup on my i5 server?
The backup process on an i5 server can run for a long time, especially if you have a medium to large server environment with lots of data. Most operators and system administrators don't want to hang around while a system backup runs for 10 hours!
Therefore, automating this process is high on the list of priorities for just about every systems administrator. IBM has provided basic save processes within OS/400 via the SAVE menu. This menu can be accessed with the command GO SAVE.
If you page down through this menu, you will find a section titled "Save System and User Data." Option 21 "Entire System" can be used to completely back up your i5 server. When Option 21 is executed, the following things happen:
End all subsystems
Save the Licensed Internal Code
Save the operating system
Save the security data
Save the device configuration objects
Save all user libraries (including libraries for licensed programs)
Save all documents and folders
Save all distribution and mail objects
Save all directories
Start the controlling subsystem
This process will use the defaults defined via Option 20 "Define save system and user data defaults". All of these steps will execute from an interactive job running at the system console.
The advantage to running an Option 21 process is that everything on your system (except for the contents of job queues, output queues, or data queues) will be completely saved.
The major disadvantage is that the system will remain unavailable to users during the entire backup process.
If further automation is required (and of course it is), IBM and several third-party vendors have created very comprehensive program products for backing up and restoring an i5 server.
These products utilize system functionality such as journaling, Save_While_Active and a secured interactive console environment that allows you to develop the completely automated backup process you probably want to have on your midsize to large i5 server.
You may want to consider what I have done to ensure a complete backup of my i5 system.
I run SAVSYS only once a month, along with a full IPL of the system in order to apply *DELAYED PTFs. The monthly SAVE process is an "attended" event. However, the operator who runs it is usually onsite for only about two hours, while the system IPLs and the SAVSYS command runs. If there are any issues, someone is there to attend to them immediately. When the SAVSYS command finishes and our SAVE_WHILE_ACTIVE backups reach their synchronization point, the operator can leave once he has determined that our users have access to the applications. The backups continue to run throughout the night unattended. We have a relatively large system, so the entire backup process can take up to 10 hours. If a problem does occur during this phase of the process, someone is contacted via pager so they can connect from home to resolve the issue.
Between full system saves, I run automated BRMS application backups along with the SAVSECDTA and SAVCFG commands.
I've found this strategy to work very well for me.
3. How do I prepare my system for a Save_While_Active save?
The Save_While_Active (SWA) feature of OS/400 is a wonderful tool to use for saving an i5 system.
Some things to consider before you try this are:
1. Implementing SWA can be a bit complicated to configure but well worth the trouble.
2. You could configure your backup process to eliminate ALL downtime, but in my opinion it isn't worth the effort.
3. Quiesce your system for a short time instead, in order to obtain a synchronization point.
4. In the event of a recovery, you will be able to recover data to the time of this synchronization point.
5. You can NOT use SWA for SAVSYS. Executing a SAVSYS command still requires that the system be in a restricted state.
6. As with any backup, run it during a time of light system use.
7. Because of the extra processing required for SWA, processing your backup will run longer, but users can use the system while it is running.
8. Full implementation details can be found in the OS/400 Backup and Recovery Manual, SC41-5304.
In my opinion it is best to quiesce the system prior to running a SWA backup.
Quiesce comes from the Latin quiescere, "to become quiet." In the context of an i5 backup, it means "to bring the system to a quiet state," one in which there is very little activity occurring on the system. This is not the same as a "restricted state," which, as you know, is required for a SAVSYS process.
There is a very important difference between quiesced and restricted states. If you don't take the system down to a restricted state, you can still run a backup process as an unattended batch job. (Note: In V5R3, BRMS allows you to submit a batch job to the controlling subsystem even in a restricted state.)
Prior to starting my unattended SWA backup process, I end the following subsystems:
QINTER, QSPL, QHTTPSVR, QSNADS, QSVCDRCTR, QSERVER and QUSRWRK.
I also end the Mail Server Framework via the ENDMSF command.
I'm now in a "quiesced state." Note, my backup process is still running in the QBATCH subsystem.
In my opinion, it is very important that you have journaling active for any backup/recovery process, too. If you don't, you can never recover any changes made between backups. However, it isn't a requirement of SWA if you "quiesce" the system prior to starting the backup. Once the SWA process has established a synchronization point, you can resume normal system operations. If you had to use these backup tapes to recover from a disaster though, you would only be able to recover to the time of the SWA synchronization. If you had journaling active you would be able to recover to the point of your last journal receiver backup.
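To make the difference between the two recovery points concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration: the function name and all of the timestamps are hypothetical, and it assumes that saved journal receivers let you recover up to the moment they were saved.

```python
from datetime import datetime

def recovery_point(sync_point, last_receiver_save=None):
    """Return the latest time to which data could be recovered from tape.

    Without journaling (last_receiver_save is None), recovery stops at the
    SWA synchronization point. With journaling, it extends to the last
    journal receiver backup, if that is later than the sync point.
    """
    if last_receiver_save is None or last_receiver_save < sync_point:
        return sync_point
    return last_receiver_save

# Hypothetical timeline, for illustration only
sync = datetime(2005, 5, 15, 21, 30)            # SWA synchronization point
receivers_saved = datetime(2005, 5, 16, 12, 0)  # last receiver save to tape
failure = datetime(2005, 5, 16, 14, 0)          # time of the outage

print(failure - recovery_point(sync))                   # 16:30:00 of changes lost
print(failure - recovery_point(sync, receivers_saved))  # 2:00:00 of changes lost
```

The more often receivers are saved, the smaller the second window becomes.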
4. How do I save and restore spool file data as part of my backup/recovery process?
i5/OS spool support is just a collection of files, members and data records. Output queues contain spool file entries that are pointers to this file system. Therefore, spool files are not individual objects that can be saved like other objects on the system. Any save/restore process prior to V5R4 of OS/400 is unable to restore spool file data and maintain the original user job attributes. Once a spool file is restored, the original user won't be able to access it using the WRKSPLF *CURRENT command.
IBM's BRMS product has implemented a solution that allows you to archive spool file data, search for these archived spool files based on several spool file attributes and recreate the spool file.
Type choices, press Enter.

Output queue . . . . . . . . . . OUTQ      *ALL
  Library  . . . . . . . . . . .           *ALL
Auxiliary storage pool . . . . . ASP       *ALL
File . . . . . . . . . . . . . . FILE      *ALL
Job name . . . . . . . . . . . . JOB       *ALL
User . . . . . . . . . . . . . . USER      *ALL
User data  . . . . . . . . . . . USRDTA    *ALL
Select dates:                    SLTDATE
  From date  . . . . . . . . . .           *BEGIN
  To date  . . . . . . . . . . .           *END
Save status  . . . . . . . . . . SAVSTS    *ALL
Sequence option  . . . . . . . . SORT      *DATE
From system  . . . . . . . . . . FROMSYS   *LCL
Output . . . . . . . . . . . . . OUTPUT    *
Again, this archive is just a 'copy' of the spool file's data along with certain format-related attributes. Any restore (actually recreation) of the spool file will still result in the loss of the original job-related attributes.
The latest release of OS/400 (V5R4) does contain a new feature that allows you to save an *OUTQ and the spool file data referenced by it. This new functionality has also been integrated into the latest release of BRMS. This native operating system enhancement only allows saves of entire output queues and can't be used to save individual spool files. Again the BRMS product comes to the rescue. Even though you can only save whole output queues, BRMS provides an interface that lets you find and restore individual spool files back into the same output queue they were saved from. Along with the new spool file save capability there are a couple of new printer file attributes:
Expiration date for file (EXPDATE)
Specifies the expiration date for the spooled file. The spooled file will expire at 23:59:59, system local time on the date specified.
The expiration date does not change.
No expiration date is specified.
The expiration date is to be calculated using the value specified for the Days until file expires (DAYS) parameter.
Specify the date after which the spooled file will be eligible for removal from the system by the Delete Expired Spooled Files (DLTEXPSPLF) command. The date must be enclosed in apostrophes if date separator characters are used in the value.
Days until file expires (DAYS)
Specifies the number of days to keep the spooled file.
Note: A value must be specified for this parameter if the Expiration date for file (EXPDATE) parameter has a value of *DAYS. If the EXPDATE parameter has a value other than *DAYS, no value is allowed for this parameter.
Specify an interval in days after which the spooled file will be eligible for removal from the system by the Delete Expired Spooled Files (DLTEXPSPLF) command. The actual expiration date applied to the spooled file is calculated by adding the number of days specified to the date this command is executed.
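As a rough illustration of the date arithmetic described above (not the system's own implementation), this Python sketch adds a DAYS value to the date the command is run. The function name and the sample date are hypothetical, and the check that DAYS is at least 1 is an assumption.

```python
from datetime import date, timedelta

def expiration_date(run_date: date, days: int) -> date:
    """Return the date on which a spooled file would expire.

    Follows the rule stated above: expiration date = date the command
    is executed + DAYS. The file then expires at 23:59:59 on that date.
    """
    if days < 1:  # assumption: DAYS must be a positive number of days
        raise ValueError("days must be at least 1")
    return run_date + timedelta(days=days)

# Hypothetical example: command run on 15 May 2006 with DAYS(30)
print(expiration_date(date(2006, 5, 15), 30))  # 2006-06-14
```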
The DLTEXPSPLF command has been added to V5R4. It will evaluate the spool files on the system and delete them when they have expired.
The following command can be used to create a job schedule entry which causes the DLTEXPSPLF command to delete all expired spooled files on your system every day:
ADDJOBSCDE JOB(DLTEXPSPLF) CMD(DLTEXPSPLF ASPDEV(*ALL)) FRQ(*WEEKLY) SCDDATE(*NONE) SCDDAY(*ALL) SCDTIME(010000) JOBQ(QSYS/QSYSNOMAX) TEXT('DELETE EXPIRED SPOOLED FILES SCHEDULE ENTRY')
It took 30 years, but IBM has finally given the system administrator a good way to manage system spool files. Good job, IBM!
5. How can I set up journaling for my production files?
The preferred way is to do this by library. The steps for setting this up are quite easy, too. Here is how I do it on my system. (Note: I do this process for each individual application library I want to start journaling on):
1. Create a message queue to receive journal messages:
2. Create a journal receiver (Tip: Put receivers in a different library)
3. Create a journal referencing the receiver just created:
CRTJRN JRN(App1Lib/App1JRN) JRNRCV(JrnRcvLib/App1JR0001) MSGQ(QGPL/Jrn_Name) MNGRCV(*SYSTEM) DLTRCV(*NO) RCVSIZOPT(*RMVINTENT *MAXOPT2)
Note: MNGRCV(*SYSTEM) lets the system manage the receiver names.
4. Start journaling for all files in the library. (Note: STRJRNLIB is a TAATOOL. If you don't have TAATOOLS on your system, then you will have to start journaling manually using the appropriate system commands:
STRJRNAP Start Journal Access Path
STRJRNOBJ Start Journal Object
STRJRNPF Start Journal Physical File)
STRJRNLIB LIB(App1Lib) JRN(App1Lib/App1JRN) STRJRNAP(*NO) IMAGES(*BOTH) OMTJRNE(*OPNCLO)
You might notice that in step three I also specified DLTRCV(*NO). This is because I prefer to delete journal receivers after I have saved them to tape. I don't want the system to automatically delete them for me.
Journaling your production objects is a good way to enhance your ability to recover in the event of an outage. However, if you lose the system and you haven't saved your receivers to tape, you're out of luck. Save your receivers to tape often and move these tapes off site along with your regular backup tapes. Better yet, use "remote journaling" to replicate your journal receivers to another system in real time. (That's a whole topic I won't go into right now, though.)
Another advantage to journaling data is that you have a detailed audit record of everything that has happened to your production data. This can be an invaluable resource to tap when you are trying to figure out "what happened" when something goes wrong. The performance penalty for journaling has been decreasing significantly with just about every new release of OS/400. In today's environments, with faster disk drives and processors, I'd venture to say you wouldn't even be able to notice that you have turned on journaling.
I personally have created a separate ASP (Auxiliary Storage Pool) for my production journal receivers, too. This moves the disk I/O associated with journaling to a separate set of disk arms. It also provides additional protection for your data. For example, if you were unfortunate enough to lose two disk drives in a RAID set that contained your production data, you would have to recover this data from tape. If your receivers are in a separate ASP, then after restoring your data, you could apply the journaled transactions, thereby fully recovering your production application data to the point of failure.
You ask, "How much disk space will this take?" It all depends on how "active" your data is. If you have 2,500 changes a day to a journaled set of objects, very little disk space is required. If you have 25,000,000 changes a day (like we do), you could eat up 60-100GB of disk space every few days! You do have control over how much disk space is used, though. Back up your receivers to tape often, and then remove them from disk using the DLTJRNRCV command.
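For a back-of-the-envelope feel for these numbers, here is a small Python sketch. The average journal entry size of 1,000 bytes and the three-day retention on disk are assumptions chosen purely for illustration; measure your own receivers before relying on any estimate.

```python
def journal_space_gb(changes_per_day: int, avg_entry_bytes: int, days_on_disk: float) -> float:
    """Estimate receiver disk usage in GB (1 GB taken as 10**9 bytes).

    avg_entry_bytes is an assumed average size per journal entry; the real
    figure depends on record lengths and on options such as IMAGES(*BOTH).
    """
    return changes_per_day * avg_entry_bytes * days_on_disk / 1e9

# Assumed: 1,000 bytes per entry, receivers kept on disk for 3 days
print(journal_space_gb(2_500, 1_000, 3))        # 0.0075
print(journal_space_gb(25_000_000, 1_000, 3))   # 75.0
```

Under these assumptions the busy system lands inside the 60-100GB range mentioned above, while the quiet one uses a few megabytes.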
6. How can I use BRMS to manage my tape library?
A while back I was asked the following question:
"We're implementing BRMS on one of our machines. We have 31 tapes pre-assigned for 3583. We use a tape library and the required volumes will be loaded into the library on a daily basis. How can we force BRMS to write specific data, to a specific tape, on a specific day, for incremental backups?"
I thought, "What a perfect time to describe how BRMS manages tapes!" I suggested that this system administrator consider this...
Instead of trying to "force" BRMS to work with your old backup tape control process, let BRMS work as it is designed. Let BRMS decide where to place data on your backup tapes. All you have to do then is assign unique volume serial numbers to each tape and make sure "scratched" tapes are mounted on your tape drive prior to running a backup.
Having BRMS manage your tape media and the placement of data on each volume is just like allowing OS/400 to manage your disk drives and the placement of data on them. It doesn't matter where the data is put, as long as you can easily find it when you need it. Would you ever consider trying to force data stored on disk to a particular disk drive? Absolutely not! The system can do it much more efficiently than you ever could. The same idea applies when using tape storage and BRMS.
You don't need to worry about mounting certain tapes for certain days or placing certain kinds of data on specific tapes. BRMS will remember where everything is for you.
Let BRMS schedule moving backup data offsite, too.
Once you set up the BRMS policies to accomplish all of this, your backups will run themselves -- it is really slick when it is all set up.
7. Why does my BRMS backup process run hours longer than before I used BRMS?
This great question came my way last year. It provided me the opportunity to explain a rather hidden 'feature' of BRMS.
"I am using BRMS to do a *SAVSYS, but it takes too long so I kill it. I use the BRMS Console. If I use a standard SAVSYS from a straightforward save, it only takes about 10 minutes max. After I kill the BRMS save, I check the tape and all the saves are on there. What's going on?"
There is some additional overhead associated with a BRMS save. However, the only scenario I can think of that would add a lot of time to your SAVSYS would be a system with multiple Auxiliary Storage Pools. In this kind of configuration, BRMS has a default option to save authority for every object in each user ASP. If you have a lot of ASPs, with a lot of objects, this can take considerable time. There is a way around this, though. You can tell BRMS NOT to save authority data for user ASPs.
From your main BRMS menu enter:
From the Backup Policy Menu, take Option 2. Work with items to omit from backup.
Add the following entries to omit saving user ASP authority from a SAVSYS or SAVSECDTA save:
Type Backup item
Your backup should run much faster now!
8. How do I know if I have completely saved my system?
There is a great toolset available for purchase called TAATOOL. It contains a command called CHKSAV that will check to see if you have saved all the libraries on your system! If you aren't able to acquire this set of tools, you could create something that might suffice using the DSPOBJD command to display information for all the libraries on your system to an OUTFILE and then query this file for "SAVE Date / SAVE Time / SAVE Command" etc...
Example: DSPOBJD OBJ(QSYS/*ALL) OBJTYPE(*LIB) OUTPUT(*OUTFILE) OUTFILE(QTEMP/XX)
runqry *none xx
You will then see all the information available to you. You will probably want to select the following fields for a report:
Display  Display  Object  Object
Date     Time             Type
051605   111141   MyLib   *LIB

Save    Save    Save     Save    Saved
Date    Time    Command  Device  Volume
051505  212320  SAVLIB   Tape    E00140
Your auditors will then be able to see when each library on your system was saved, which SAV command was used and what tape volume the library was saved to. Of course, my favorite backup/recovery product, IBM's BRMS, provides a set of comprehensive reports that show what is saved and, more importantly, what hasn't been saved on your system. Along with these reports is the system recovery report, which provides step-by-step instructions on how to completely recover your system in the event of a catastrophic disaster.
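If you export the DSPOBJD outfile data to another machine, the same check can be sketched in a few lines of Python. This is only an illustration: the dictionary keys (library, save_date and so on) are hypothetical names rather than the actual outfile field names, the OLDLIB and NEWLIB rows are made up, and the dates are assumed to be in the MMDDYY form shown in the sample report.

```python
from datetime import date, datetime

def stale_libraries(rows, as_of: date, max_age_days: int):
    """Return (library, reason) pairs for libraries with no recent save."""
    problems = []
    for row in rows:
        raw = row["save_date"].strip()
        if not raw:  # a blank save date means the library was never saved
            problems.append((row["library"], "never saved"))
            continue
        saved = datetime.strptime(raw, "%m%d%y").date()  # MMDDYY
        age = (as_of - saved).days
        if age > max_age_days:
            problems.append((row["library"], f"last saved {age} days ago"))
    return problems

# Hypothetical rows; only MYLIB mirrors the sample report above
rows = [
    {"library": "MYLIB", "save_date": "051505", "save_command": "SAVLIB", "volume": "E00140"},
    {"library": "OLDLIB", "save_date": "040105", "save_command": "SAVLIB", "volume": "E00077"},
    {"library": "NEWLIB", "save_date": "", "save_command": "", "volume": ""},
]

for name, reason in stale_libraries(rows, as_of=date(2005, 5, 16), max_age_days=7):
    print(name, reason)
# OLDLIB last saved 45 days ago
# NEWLIB never saved
```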
9. Should operators and system administrators be the only users authorized to *SAVSYS authority?
As with most special user authorities, the *SAVSYS special authority grants a significant level of additional capability to a user and you should think twice about granting it to anyone.
Any user of the iSeries has the necessary rights to use the OS/400 SAV* commands. Therefore, they can save any objects they have sufficient authority to. Sufficient authority in this case means having *OBJEXIST rights to the object. Object existence rights provide authority to control the object's existence and ownership.
By default, users have *OBJEXIST rights to all the objects they own. Therefore, they can save any of their own data. They don't have the authority to restore anything, though, because access to the RST* commands is *PUBLIC *EXCLUDE by default.
The *SAVSYS special authority gives a user the additional ability to save objects they don't have *OBJEXIST rights to.
One very important implication of this is the fact that when an object is saved, you can also specify that its storage be freed.
IBM's definition of this feature is: "Freeing storage during a save means that the storage occupied by the data portion of the specified objects being saved is freed as part of the save operation."
In effect, you can remove from the system everything but the header of an object saved in this manner.
Therefore someone with *SAVSYS authority could do the following:
1. Save any object on the system to a save file specifying that its storage be freed.
2. Delete the save file.
Wow! ... Anyone with *SAVSYS authority has the ability to delete any object from the system.
Do you really want anyone other than an operator or system administrator to have this kind of authority on your system?
10. During a disaster is it important to restore objects in a certain order?
Restore things in this order:
Note: If all objects are in the same library, they will automatically be restored in the correct order.
1. Journals
2. Based-on physical files
3. Other journaled objects associated with those journals (*DTAARA's and stream files)
4. Dependent logical files
5. Journal receivers
Note: Journal receivers can be restored at any time after the journals. They do not have to be restored after the journaled objects.
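The ordering rule can be sketched as a simple sort. This Python example is only an illustration: it assumes journals come first (as the note about journal receivers implies), it places receivers last even though they may be restored any time after the journals, and the object names and category labels are hypothetical.

```python
# Restore priority, lowest index first. Category labels are hypothetical.
RESTORE_ORDER = [
    "journal",
    "physical_file",           # based-on physical files
    "other_journaled_object",  # e.g. data areas and stream files
    "logical_file",            # dependent logical files
    "journal_receiver",
]

def restore_sequence(objects):
    """Sort (name, category) pairs into a safe restore order.

    sorted() is stable, so objects in the same category keep the
    order in which they were listed.
    """
    rank = {category: i for i, category in enumerate(RESTORE_ORDER)}
    return sorted(objects, key=lambda obj: rank[obj[1]])

# Hypothetical objects, listed in no particular order
objects = [
    ("ORDERSL1", "logical_file"),
    ("APP1JR0001", "journal_receiver"),
    ("ORDERS", "physical_file"),
    ("APP1JRN", "journal"),
    ("NEXTORDER", "other_journaled_object"),
]

for name, category in restore_sequence(objects):
    print(name, category)
```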
This information is available with additional details, in this IBM manual:
Backup and Recovery
The "bible" of save/restore for the iSeries system!