System Administration Guide

Administration tasks

In normal operations, only a minimal amount of administration is necessary after you have completed the configuration steps. The administration job is made easier because the queue manager tolerates database managers being unavailable. In particular, this means that:

The queue manager can start at any time without first starting each of the database managers.
The queue manager does not need to stop and restart if one of the database managers becomes unavailable.

This allows you to start and stop the queue manager independently of the database managers, and vice versa if the database manager supports it.

Whenever contact is lost between the queue manager and a database manager, the two need to resynchronize when both become available again. Resynchronization is the process by which any in-doubt units of work involving that database are completed. In general, this occurs automatically, without the need for user intervention: the queue manager asks the database manager for a list of units of work that are in doubt, and then instructs the database manager to either commit or roll back each of them.
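The two-step resynchronization exchange can be sketched in Python as follows. This is a minimal illustrative model, not a WebSphere MQ or DB2 interface: the DatabaseManager class, its recover, commit, and rollback methods, and the decisions table are all hypothetical. It also assumes a presumed-abort rule, under which a unit of work for which the queue manager has no recorded commit decision is rolled back.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Outcome(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


@dataclass
class DatabaseManager:
    """Hypothetical stand-in for a database manager acting as a participant."""
    name: str
    in_doubt: Set[str] = field(default_factory=set)   # prepared, outcome not yet known
    committed: Set[str] = field(default_factory=set)
    rolled_back: Set[str] = field(default_factory=set)

    def recover(self) -> List[str]:
        """Step 1: report the units of work that are in doubt."""
        return sorted(self.in_doubt)

    def commit(self, xid: str) -> None:
        self.in_doubt.discard(xid)
        self.committed.add(xid)

    def rollback(self, xid: str) -> None:
        self.in_doubt.discard(xid)
        self.rolled_back.add(xid)


def resynchronize(db: DatabaseManager, decisions: Dict[str, Outcome]) -> None:
    """Step 2: deliver the queue manager's outcome for each in-doubt unit of work."""
    for xid in db.recover():
        # Assumed presumed-abort rule: no recorded commit decision means roll back.
        if decisions.get(xid) is Outcome.COMMIT:
            db.commit(xid)
        else:
            db.rollback(xid)


# Usage: the queue manager decided to commit xid-1 and has no record of xid-2.
bank_db = DatabaseManager("DB2 MQBankDB", in_doubt={"xid-1", "xid-2"})
resynchronize(bank_db, {"xid-1": Outcome.COMMIT})
```

After the call, xid-1 is committed, xid-2 is rolled back, and nothing remains in doubt.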

When the queue manager stops, it needs to resynchronize with each database manager instance during restart. When an individual database manager becomes unavailable, only that database manager needs to be resynchronized the next time the queue manager notices that the database manager is available again.

The queue manager regains contact with an unavailable database manager automatically as new global units of work are started. Alternatively, you can use the rsvmqtrn command to explicitly resolve all in-doubt units of work.

In-doubt units of work

A database manager might be left with in-doubt units of work if contact with the queue manager is lost after the database manager has been instructed to prepare. Until the database manager receives the commit or back out (rollback) outcome from the queue manager, it needs to retain the database locks associated with the updates.

Because these locks prevent other applications from updating or reading database records, resynchronization needs to take place as soon as possible.

If for some reason you cannot wait for the queue manager to resynchronize with the database automatically, you can use facilities provided by the database manager to commit or roll back the database updates manually. This is called making a heuristic decision. Use it only as a last resort because of the possibility of compromising data integrity; you might end up committing the database updates when all the other participants roll back, or vice versa.

It is far better to restart the queue manager, or use the rsvmqtrn command when the database has been restarted, to initiate automatic resynchronization.

Displaying outstanding units of work

While a database manager is unavailable, you can use the dspmqtrn command to check the state of outstanding units of work involving that database.

If a database manager becomes unavailable before the two-phase commit process has started, any in-flight units of work (UOWs) in which it was participating are rolled back. The database manager itself rolls back its in-flight units of work when it next restarts.

The dspmqtrn command displays only those units of work in which one or more participants are in doubt, awaiting the commit or rollback decision from the queue manager.

For each unit of work, the state of each participant is displayed. If the unit of work did not update the resources of a particular resource manager, it is not displayed.

With respect to an in-doubt unit of work, a resource manager is in one of the following states:

Prepared: The resource manager is prepared to commit its updates.
Committed: The resource manager has committed its updates.
Rolled back: The resource manager has rolled back its updates.
Participated: The resource manager is a participant, but has not prepared, committed, or rolled back its updates.
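The display rules described above can be modelled as a small filter. This sketch assumes that a participant counts as in doubt when it is in the prepared state, and represents a resource manager whose resources were not updated by None; ParticipantState, UnitsOfWork, and dspmqtrn_view are hypothetical names, not part of any MQ interface.

```python
from enum import Enum
from typing import Dict, Optional


class ParticipantState(Enum):
    PREPARED = "prepared"          # prepared to commit, awaiting the outcome
    COMMITTED = "committed"
    ROLLED_BACK = "rolled back"
    PARTICIPATED = "participated"  # not yet prepared, committed, or rolled back


# Transaction number -> {RMID: state}; None means the unit of work did not
# update that resource manager's resources.
UnitsOfWork = Dict[str, Dict[int, Optional[ParticipantState]]]


def dspmqtrn_view(uows: UnitsOfWork) -> Dict[str, Dict[int, ParticipantState]]:
    """Keep only units of work with an in-doubt participant, and within them
    only the participants whose resources were updated."""
    view: Dict[str, Dict[int, ParticipantState]] = {}
    for txn, participants in uows.items():
        updated = {rmid: s for rmid, s in participants.items() if s is not None}
        if ParticipantState.PREPARED in updated.values():
            view[txn] = updated
    return view


uows: UnitsOfWork = {
    "0,1": {0: ParticipantState.COMMITTED, 1: ParticipantState.PREPARED, 2: None},
    "0,2": {0: ParticipantState.COMMITTED, 1: ParticipantState.COMMITTED},
}
print(dspmqtrn_view(uows))  # only "0,1" is shown, and without RMID 2
```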

The queue manager does not remember the individual states of the participants across a restart. If the queue manager restarts but cannot contact a database manager, the in-doubt units of work in which that database manager was participating are not resolved during restart. In this case, the database manager is reported as being in prepared state until resynchronization has occurred.

Whenever the dspmqtrn command displays an in-doubt UOW, it first lists all the resource managers that could be participating. Each is allocated a unique identifier, the RMID, which is used instead of the resource manager's Name when reporting its state with respect to an in-doubt UOW.

Figure 25 shows the result of issuing the following command:

dspmqtrn -m MY_QMGR

Figure 25. Sample dspmqtrn output

AMQ7107: Resource manager 0 is WebSphere MQ.
AMQ7107: Resource manager 1 is DB2 MQBankDB
AMQ7107: Resource manager 2 is DB2 MQFeeDB
 
AMQ7056: Transaction number 0,1.
    XID: formatID 5067085, gtrid_length 12, bqual_length 4
         gtrid [3291A5060000201374657374]
         bqual [00000001]
AMQ7105: Resource manager 0 has committed.
AMQ7104: Resource manager 1 has prepared.
AMQ7104: Resource manager 2 has prepared.

The output from Figure 25 shows that there are three resource managers associated with the queue manager. The first is resource manager 0, which is the queue manager itself. The other two resource manager instances are the MQBankDB and MQFeeDB DB2 databases.

The example shows only a single in-doubt unit of work. A message is issued for all three resource managers, which means that updates were made to the queue manager and both DB2 databases within the unit of work.

The updates made to the queue manager, resource manager 0, have been committed. The updates to the DB2 databases are in prepared state, which means that DB2 must have become unavailable before it was called to commit the updates to the MQBankDB and MQFeeDB databases.

The in-doubt unit of work has an external identifier called an XID. This is the identifier that DB2 associates with the updates.
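If you want to process dspmqtrn output in a script, a parser along the following lines can extract the resource manager list, the XID, and the participant states. It is a sketch that assumes the exact message layout shown in Figure 25; the regular expressions, the Transaction class, and parse_dspmqtrn are hypothetical, and message wording might differ between releases and national languages. The xid_consistent check relies on gtrid and bqual being printed as hexadecimal, two digits per byte, which agrees with the figure (24 hex digits for a gtrid_length of 12).

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

RM_RE = re.compile(r"AMQ\d+: Resource manager (\d+) is (.+?)\.?$")
TXN_RE = re.compile(r"AMQ\d+: Transaction number (\d+,\d+)")
XID_RE = re.compile(r"XID: formatID (\d+), gtrid_length (\d+), bqual_length (\d+)")
GTRID_RE = re.compile(r"gtrid \[([0-9A-Fa-f]*)\]")
BQUAL_RE = re.compile(r"bqual \[([0-9A-Fa-f]*)\]")
STATE_RE = re.compile(r"AMQ\d+: Resource manager (\d+) has (.+)\.$")


@dataclass
class Transaction:
    number: str
    format_id: int = 0
    gtrid_length: int = 0
    bqual_length: int = 0
    gtrid: bytes = b""
    bqual: bytes = b""
    states: Dict[int, str] = field(default_factory=dict)

    def xid_consistent(self) -> bool:
        """Compare the decoded hex against the declared lengths."""
        return len(self.gtrid) == self.gtrid_length and len(self.bqual) == self.bqual_length


def parse_dspmqtrn(text: str) -> Tuple[Dict[int, str], List[Transaction]]:
    """Return {RMID: name} and the transactions listed in dspmqtrn output."""
    rms: Dict[int, str] = {}
    txns: List[Transaction] = []
    for raw in text.splitlines():
        line = raw.strip()
        if m := RM_RE.match(line):
            rms[int(m.group(1))] = m.group(2)
        elif m := TXN_RE.match(line):
            txns.append(Transaction(m.group(1)))
        elif (m := XID_RE.match(line)) and txns:
            t = txns[-1]
            t.format_id, t.gtrid_length, t.bqual_length = map(int, m.groups())
        elif (m := GTRID_RE.match(line)) and txns:
            txns[-1].gtrid = bytes.fromhex(m.group(1))
        elif (m := BQUAL_RE.match(line)) and txns:
            txns[-1].bqual = bytes.fromhex(m.group(1))
        elif (m := STATE_RE.match(line)) and txns:
            txns[-1].states[int(m.group(1))] = m.group(2)
    return rms, txns


FIGURE_25 = """\
AMQ7107: Resource manager 0 is WebSphere MQ.
AMQ7107: Resource manager 1 is DB2 MQBankDB
AMQ7107: Resource manager 2 is DB2 MQFeeDB

AMQ7056: Transaction number 0,1.
    XID: formatID 5067085, gtrid_length 12, bqual_length 4
         gtrid [3291A5060000201374657374]
         bqual [00000001]
AMQ7105: Resource manager 0 has committed.
AMQ7104: Resource manager 1 has prepared.
AMQ7104: Resource manager 2 has prepared.
"""

rms, txns = parse_dspmqtrn(FIGURE_25)
for t in txns:
    for rmid, state in t.states.items():
        print(f"Transaction {t.number}: {rms[rmid]} has {state}")
```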

Resolving outstanding units of work

The output in Figure 25 shows a single in-doubt unit of work in which the commit decision has yet to be delivered to both DB2 databases.

To complete this unit of work, the queue manager and DB2 need to resynchronize when DB2 next becomes available. The queue manager uses the start of new units of work as an opportunity to regain contact with DB2. Alternatively, you can instruct the queue manager to resynchronize explicitly by using the rsvmqtrn command.

Do this soon after DB2 has been restarted so that any database locks associated with the in-doubt unit of work are released as quickly as possible. Use the -a option, which tells the queue manager to resolve all in-doubt units of work. In the following example, DB2 has restarted, so the queue manager can resolve the in-doubt unit of work:

> rsvmqtrn -m MY_QMGR -a
Any in-doubt transactions have been resolved.
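The check-then-resolve procedure could be wrapped in a small script to run after DB2 restarts. This sketch uses only the command forms shown in this section (dspmqtrn -m MY_QMGR and rsvmqtrn -m MY_QMGR -a). It assumes that dspmqtrn prints no "Transaction number" message when nothing is in doubt, and it deliberately does not rely on either command's exit code. The resolve_after_restart function and the injectable runner are hypothetical; the runner makes it possible to try the logic without a queue manager.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def run_command(args: Sequence[str]) -> str:
    """Run a control command and return its standard output (exit code ignored)."""
    result = subprocess.run(list(args), capture_output=True, text=True, check=False)
    return result.stdout


def resolve_after_restart(qmgr: str, run: Runner = run_command) -> bool:
    """Resolve all in-doubt units of work if dspmqtrn reports any.

    Returns True if rsvmqtrn was invoked.
    """
    listing = run(["dspmqtrn", "-m", qmgr])
    if "Transaction number" not in listing:
        return False  # assumed to mean that nothing is in doubt
    run(["rsvmqtrn", "-m", qmgr, "-a"])
    return True
```

For example, calling resolve_after_restart("MY_QMGR") on a system with the in-doubt unit of work from Figure 25 would issue the same rsvmqtrn command as the example above.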

Mixed outcomes and errors

Although the queue manager uses a two-phase commit protocol, this does not completely remove the possibility of some units of work completing with mixed outcomes, that is, with some participants committing their updates and others backing them out.

Units of work that complete with a mixed outcome have serious implications because shared resources are no longer in a consistent state.

Mixed outcomes are mainly caused by making heuristic decisions about units of work instead of allowing the queue manager to resolve in-doubt units of work itself.

Whenever the queue manager detects heuristic damage, it produces FFST information and documents the failure in its error logs with one of two messages:

If a database manager rolled back instead of committing:

AMQ7606 A transaction has been committed but one or more resource
        managers have rolled back.

If a database manager committed instead of rolling back:

AMQ7607 A transaction has been rolled back but one or more resource
        managers have committed.

Further messages identify the databases that are heuristically damaged. It is then your responsibility to restore consistency to the affected databases locally. This is a complicated procedure: you must first isolate the update that was wrongly committed or rolled back, and then undo or redo the database change manually.
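The rule that selects between these two messages can be summarized as a function of the queue manager's decision and what each participant actually did. This is an illustrative sketch: the Decision enumeration and the heuristic_damage function are hypothetical, and only the message numbers come from this section.

```python
from enum import Enum
from typing import Iterable, Optional


class Decision(Enum):
    COMMIT = "commit"
    ROLLBACK = "rollback"


def heuristic_damage(decision: Decision, outcomes: Iterable[Decision]) -> Optional[str]:
    """Return the message reported for a mixed outcome, or None if consistent.

    decision: what the queue manager decided for the unit of work.
    outcomes: what each participant actually did.
    """
    outcomes = list(outcomes)
    if decision is Decision.COMMIT and Decision.ROLLBACK in outcomes:
        return "AMQ7606"  # committed, but one or more resource managers rolled back
    if decision is Decision.ROLLBACK and Decision.COMMIT in outcomes:
        return "AMQ7607"  # rolled back, but one or more resource managers committed
    return None


# A heuristic rollback of one database while the queue manager decided to commit:
print(heuristic_damage(Decision.COMMIT, [Decision.COMMIT, Decision.ROLLBACK]))
```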

Damage caused by software errors is less likely. The transaction number of a unit of work affected in this way is reported by message AMQ7112, and its participants might be in an inconsistent state. Figure 26 shows an example.

Figure 26. Sample dspmqtrn output for a transaction in error

dspmqtrn -m MY_QMGR
 
AMQ7107: Resource manager 0 is WebSphere MQ.
AMQ7107: Resource manager 1 is DB2 MQBankDB
AMQ7107: Resource manager 2 is DB2 MQFeeDB
 
AMQ7112: Transaction number 0,1 has encountered an error.
    XID: formatID 5067085, gtrid_length 12, bqual_length 4
         gtrid [3291A5060000201374657374]
         bqual [00000001]
AMQ7105: Resource manager 0 has committed.
AMQ7104: Resource manager 1 has prepared.
AMQ7104: Resource manager 2 has rolled back.

The queue manager does not try to recover from such failures until the next queue manager restart. In Figure 26, this means that the updates to resource manager 1, the MQBankDB database, would be left in prepared state even if the rsvmqtrn command were issued to resolve the unit of work.
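To see which participants of a transaction in error remain in prepared state, and so keep holding their database locks until the next queue manager restart, you could scan the dspmqtrn output for AMQ7112 messages. This sketch assumes the message layout shown in Figure 26; the stuck_participants function and its regular expressions are hypothetical.

```python
import re
from typing import Dict, List, Optional

ANY_TXN = re.compile(r"AMQ\d+: Transaction number (\d+,\d+)")
STATE = re.compile(r"AMQ\d+: Resource manager (\d+) has (.+)\.$")


def stuck_participants(text: str) -> Dict[str, List[int]]:
    """Map each transaction reported in error (AMQ7112) to its RMIDs still prepared."""
    stuck: Dict[str, List[int]] = {}
    current: Optional[str] = None  # set only while inside an AMQ7112 transaction
    for raw in text.splitlines():
        line = raw.strip()
        if m := ANY_TXN.match(line):
            current = m.group(1) if line.startswith("AMQ7112:") else None
            if current is not None:
                stuck[current] = []
        elif current is not None and (s := STATE.match(line)) and s.group(2) == "prepared":
            stuck[current].append(int(s.group(1)))
    return stuck


FIGURE_26 = """\
AMQ7107: Resource manager 0 is WebSphere MQ.
AMQ7107: Resource manager 1 is DB2 MQBankDB
AMQ7107: Resource manager 2 is DB2 MQFeeDB

AMQ7112: Transaction number 0,1 has encountered an error.
    XID: formatID 5067085, gtrid_length 12, bqual_length 4
         gtrid [3291A5060000201374657374]
         bqual [00000001]
AMQ7105: Resource manager 0 has committed.
AMQ7104: Resource manager 1 has prepared.
AMQ7104: Resource manager 2 has rolled back.
"""

print(stuck_participants(FIGURE_26))  # {'0,1': [1]}
```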

Changing configuration information

After the queue manager has successfully started to coordinate global units of work, do not change any of the resource manager configuration information.

If you need to change the configuration information, you can do so at any time, but the changes do not take effect until the queue manager has been restarted. For example, if you need to alter the XA open string passed to a database manager, you must restart the queue manager for your change to take effect.

If you remove the resource manager configuration information for a database, you are effectively removing the ability for the queue manager to contact that database manager.

Never change the Name attribute in any of your resource manager configuration information. This attribute uniquely identifies that database manager instance to the queue manager. If you change this unique identifier, the queue manager assumes that the database manager instance has been removed and a completely new instance has been added. The queue manager still associates outstanding units of work with the old Name, possibly leaving the database in an in-doubt state.
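Why a renamed entry leaves work stranded can be shown with a small model in which outstanding units of work are keyed by the Name attribute. This is a hypothetical sketch of the bookkeeping described above, not the queue manager's actual data structures; split_by_config and the sample transaction numbers are illustrative.

```python
from typing import Dict, Set, Tuple

# Hypothetical record: resource manager Name -> transaction numbers still in doubt.
Outstanding = Dict[str, Set[str]]


def split_by_config(configured: Set[str], outstanding: Outstanding) -> Tuple[Outstanding, Outstanding]:
    """Separate in-doubt work the queue manager can still resolve (its Name is
    configured) from work that is orphaned (its Name no longer exists)."""
    reachable = {name: txns for name, txns in outstanding.items() if name in configured}
    orphaned = {name: txns for name, txns in outstanding.items() if name not in configured}
    return reachable, orphaned


outstanding: Outstanding = {"DB2 MQBankDB": {"0,1"}, "DB2 MQFeeDB": {"0,1"}}
# The MQBankDB stanza's Name has been changed to "DB2 BankDB":
reachable, orphaned = split_by_config({"DB2 BankDB", "DB2 MQFeeDB"}, outstanding)
print(orphaned)  # the unit of work is still associated with the old Name
```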

Removing database manager instances

If you need to remove a database or database manager from your configuration permanently, ensure that the database is not in doubt before you restart the queue manager. Most database managers provide commands for listing in-doubt transactions. If there are any in-doubt transactions, allow the queue manager to resynchronize with the database manager before you remove its resource manager configuration information.

If you fail to observe this procedure, the queue manager still remembers all in-doubt units of work involving that database, and issues warning message AMQ7623 every time it is restarted. If you are never going to configure this database with the queue manager again, use the -r option of the rsvmqtrn command to instruct the queue manager to forget about the database's participation in its in-doubt transactions. The queue manager forgets about such transactions only when syncpoint processing has been completed with all participants.
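The forgetting rule can be modelled as follows. In this hypothetical sketch, each transaction maps participant Names to states, marking a participant as "forgotten" stands in for the effect the text attributes to rsvmqtrn -r, and the transaction record can be discarded only when every participant has completed syncpoint processing. The state strings, forget_participant, and can_discard are illustrative, not MQ interfaces.

```python
from typing import Dict

COMPLETE = {"committed", "rolled back", "forgotten"}


def can_discard(txn: Dict[str, str]) -> bool:
    """True only when syncpoint processing is complete for every participant."""
    return all(state in COMPLETE for state in txn.values())


def forget_participant(txn: Dict[str, str], name: str) -> bool:
    """Forget one participant, then report whether the record can be discarded."""
    txn[name] = "forgotten"
    return can_discard(txn)


txn = {"WebSphere MQ": "committed", "DB2 MQBankDB": "prepared", "DB2 MQFeeDB": "prepared"}
print(forget_participant(txn, "DB2 MQBankDB"))  # False: MQFeeDB is still prepared
txn["DB2 MQFeeDB"] = "committed"                # MQFeeDB resynchronizes later
print(can_discard(txn))                         # True
```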

There are times when you might need to remove some resource manager configuration information temporarily. On UNIX systems, this is best achieved by commenting out the stanza so that it can be easily reinstated later. You might decide to do this if there are errors every time the queue manager contacts a particular database or database manager. Temporarily removing the resource manager configuration information concerned allows the queue manager to start global units of work involving all the other participants. An example of a commented-out XAResourceManager stanza follows:

Figure 27. Commented-out XAResourceManager stanza on UNIX systems

# This database has been temporarily removed
#XAResourceManager:
#  Name=DB2 MQBankDB
#  SwitchFile=/usr/bin/db2swit
#  XAOpenString=MQBankDB

On Windows systems, you cannot comment out configuration information. Instead, use the WebSphere MQ Services snap-in to delete the information about the database manager instance.