Locally determined errors

The three most common causes of errors that the queue manager can report immediately are:

Failure of an MQI call

The queue manager can immediately report any errors in the coding of an MQI call. It does this by using a set of predefined return codes, which are divided into completion codes and reason codes.

To show whether a call is successful, the queue manager returns a completion code when the call completes. There are three completion codes, indicating success (MQCC_OK), partial completion (MQCC_WARNING), and failure (MQCC_FAILED) of the call. The queue manager also returns a reason code that indicates the reason for the partial completion or the failure of the call.

The completion and reason codes for each call are listed with the description of that call in the WebSphere MQ Application Programming Reference, which also gives further information (including some ideas for corrective action) for each completion and reason code. Design your programs to handle all the return codes that can arise from each call.
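One way to make sure every return code is handled is to route the completion and reason codes of each call through a single decision point. The C sketch below does this with a hypothetical enum `next_action` and a hypothetical helper `classify_result`. The MQCC_* and MQRC_* names follow cmqc.h, but they are redefined here with illustrative values only so that the sketch compiles without WebSphere MQ installed; a real program includes cmqc.h instead.

```c
/* Stand-ins for cmqc.h definitions so that this sketch compiles on its own.
   Illustrative only: a real program includes cmqc.h and does not redefine them. */
#define MQCC_OK                0
#define MQCC_WARNING           1
#define MQCC_FAILED            2
#define MQRC_NONE              0
#define MQRC_CONNECTION_BROKEN 2009
#define MQRC_NO_MSG_AVAILABLE  2033
#define MQRC_Q_MGR_QUIESCING   2161

/* Hypothetical follow-up actions chosen by the application. */
typedef enum {
    ACTION_CONTINUE,     /* call succeeded */
    ACTION_LOG_WARNING,  /* partial completion: inspect the reason code */
    ACTION_RETRY_LATER,  /* transient condition, such as no message yet */
    ACTION_TERMINATE,    /* queue manager quiescing or connection lost */
    ACTION_REPORT_ERROR  /* any other failure: report it and recover */
} next_action;

/* Hypothetical helper: decide what to do after an MQI call returns. */
next_action classify_result(long comp_code, long reason)
{
    if (comp_code == MQCC_OK)
        return ACTION_CONTINUE;
    if (comp_code == MQCC_WARNING)
        return ACTION_LOG_WARNING;

    /* MQCC_FAILED (or an unexpected completion code): use the reason code. */
    switch (reason) {
    case MQRC_NO_MSG_AVAILABLE:
        return ACTION_RETRY_LATER;
    case MQRC_Q_MGR_QUIESCING:
    case MQRC_CONNECTION_BROKEN:
        return ACTION_TERMINATE;
    default:
        return ACTION_REPORT_ERROR;
    }
}
```

A caller might check the result after each call, for example `if (classify_result(CompCode, Reason) == ACTION_TERMINATE) break;`. Keeping the mapping in one function makes it easier to confirm that no reason code listed for a call is left unhandled.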

System interruptions

Your application may be unaware of any interruption if the queue manager to which it is connected has to recover from a system failure. However, you must design your application to ensure that your data is not lost if such an interruption occurs.

The methods you can use to make sure that your data remains consistent depend on the platform on which your queue manager is running:

z/OS
In the CICS and IMS environments, you can make MQPUT and MQGET calls within units of work that are managed by CICS or IMS. In the batch environment, you can make MQPUT and MQGET calls in the same way, but you must either declare syncpoints by using the WebSphere MQ for z/OS MQCMIT and MQBACK calls (see Chapter 13, Committing and backing out units of work), or use the z/OS Transaction Management and Recoverable Resource Manager Services (RRS) to provide two-phase syncpoint support. RRS allows you to update both WebSphere MQ resources and the resources of other RRS-enabled products, such as DB2 stored procedure resources, within a single logical unit of work. For information on RRS syncpoint support, see Transaction management and recoverable resource manager services.

OS/400
You can make your MQPUT and MQGET calls within global units of work that are managed by OS/400 commitment control. You can declare syncpoints by using the native OS/400 COMMIT and ROLLBACK commands or the language-specific commands. Local units of work are managed by WebSphere MQ via the MQCMIT and MQBACK calls.

Compaq OpenVMS Alpha, UNIX systems and Windows systems
In these environments, you can make your MQPUT and MQGET calls in the normal way, but you must declare syncpoints by using the MQCMIT and MQBACK calls (see Chapter 13, Committing and backing out units of work). In the CICS environment, the MQCMIT and MQBACK calls are disabled, because your MQPUT and MQGET calls are made within units of work that are managed by CICS.

Compaq NonStop Kernel
You can make your MQPUT and MQGET calls within units of work that are managed by the Compaq NonStop Kernel TM/MP product.

VSE/ESA
CICS controls the unit of work in the VSE/ESA environment. If the system fails and is restarted, any logical unit of work that was in progress is rolled back automatically.
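Where you declare syncpoints yourself with MQCMIT and MQBACK, the usual pattern is to retrieve messages under syncpoint, commit only when all of them have been processed, and back out otherwise. The C sketch below simulates that control flow in memory. The type `sim_queue` and the functions `get_under_syncpoint`, `commit`, `backout`, `accept_non_negative`, and `process_batch` are all hypothetical stand-ins (for MQGET with MQGMO_SYNCPOINT, MQCMIT, MQBACK, and the application's own processing); they are not the MQI itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory model of a queue and one unit of work.
   get_under_syncpoint(), commit() and backout() stand in for MQGET with
   MQGMO_SYNCPOINT, MQCMIT and MQBACK; they are not the MQI itself. */
#define QUEUE_CAPACITY 16

typedef struct {
    int msgs[QUEUE_CAPACITY]; /* message payloads, oldest first */
    size_t head;              /* index of the first committed message */
    size_t tail;              /* one past the last message on the queue */
    size_t uncommitted;       /* messages taken in the current unit of work */
} sim_queue;

/* Take the next message inside the unit of work; false if none is left. */
bool get_under_syncpoint(sim_queue *q, int *out)
{
    size_t next = q->head + q->uncommitted;
    if (next >= q->tail)
        return false;
    *out = q->msgs[next];
    q->uncommitted++;
    return true;
}

/* Commit: messages taken in the unit of work are removed for good. */
void commit(sim_queue *q)
{
    q->head += q->uncommitted;
    q->uncommitted = 0;
}

/* Back out: messages taken in the unit of work become available again. */
void backout(sim_queue *q)
{
    q->uncommitted = 0;
}

/* Example processing rule (hypothetical): negative payloads are bad data. */
bool accept_non_negative(int msg)
{
    return msg >= 0;
}

/* Process up to n messages as one unit of work. If any message is
   rejected, back out so that nothing is lost; otherwise commit. */
bool process_batch(sim_queue *q, size_t n, bool (*process)(int))
{
    int msg;
    for (size_t i = 0; i < n && get_under_syncpoint(q, &msg); i++) {
        if (!process(msg)) {
            backout(q);
            return false;
        }
    }
    commit(q);
    return true;
}
```

After a rejected batch, `head` is unchanged, so every message taken in that unit of work is still on the queue for the next attempt. Only the messages are modeled; in a real program the back-out also undoes any other recoverable updates made in the same unit of work.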

You should use persistent messages to carry all data that you cannot afford to lose. Persistent messages are reinstated on queues if the queue manager has to recover from a failure. With WebSphere MQ on UNIX systems, MQSeries for OS/2 Warp, and WebSphere MQ for Windows, note that if all the log files fill up, an MQGET or MQPUT call within your application fails with reason code MQRC_RESOURCE_PROBLEM. For more information on log files on AIX, HP-UX, Linux, OS/2, Solaris, and Windows systems, see the WebSphere MQ System Administration Guide; for z/OS, see the WebSphere MQ for z/OS Concepts and Planning Guide; for other platforms, see the appropriate System Management Guide.

If an operator stops the queue manager while an application is running, the quiesce option is normally used. The queue manager enters a quiescing state in which applications can continue to do work, but they should terminate as soon as it is convenient. Small, quick applications can probably ignore the quiescing state and continue until they terminate as normal. Longer-running applications, or ones that wait for messages to arrive, should use the fail if quiescing option (MQOO_FAIL_IF_QUIESCING, MQPMO_FAIL_IF_QUIESCING, or MQGMO_FAIL_IF_QUIESCING) when they use the MQOPEN, MQPUT, MQPUT1, and MQGET calls. With these options, the calls fail when the queue manager quiesces, but the application may still have time to terminate cleanly by issuing calls that ignore the quiescing state. Such applications can also commit or back out the changes they have made, and then terminate.
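For an application that waits for messages, the fail if quiescing option turns a queue manager shutdown into an ordinary failed call that the main loop can act on. The C sketch below shows that loop under stated assumptions: `get_fn` is a hypothetical signature for a wrapper around an MQGET issued with MQGMO_WAIT and MQGMO_FAIL_IF_QUIESCING, and `scripted_get` is a hypothetical stand-in that replays fixed reason codes instead of talking to a queue manager. The constant values are illustrative stand-ins for cmqc.h. The sketch checks only MQRC_Q_MGR_QUIESCING; a real program should also handle the other reason codes listed for MQGET in the WebSphere MQ Application Programming Reference.

```c
#include <stdbool.h>

/* Stand-ins for cmqc.h values (illustrative; real code includes cmqc.h). */
#define MQCC_OK               0
#define MQCC_FAILED           2
#define MQRC_NONE             0
#define MQRC_NO_MSG_AVAILABLE 2033
#define MQRC_Q_MGR_QUIESCING  2161

/* Hypothetical wrapper signature for an MQGET issued with MQGMO_WAIT and
   MQGMO_FAIL_IF_QUIESCING; it reports completion and reason codes. */
typedef void (*get_fn)(void *ctx, long *comp_code, long *reason);

/* Serve messages until a call fails for a reason other than "no message".
   Sets *quiesced if that reason was quiescing. Returns messages processed. */
int serve_until_quiescing(get_fn get, void *ctx, bool *quiesced)
{
    int processed = 0;
    long cc, rc;
    *quiesced = false;
    for (;;) {
        get(ctx, &cc, &rc);
        if (cc == MQCC_OK) {
            processed++;          /* process the message here */
            continue;
        }
        if (rc == MQRC_NO_MSG_AVAILABLE)
            continue;             /* wait interval expired: wait again */
        if (rc == MQRC_Q_MGR_QUIESCING)
            *quiesced = true;     /* commit or back out, then end cleanly */
        return processed;         /* quiescing or another failure: stop */
    }
}

/* Hypothetical scripted stand-in for MQGET: replays a list of reason codes. */
typedef struct {
    const long *reasons;
    int count;
    int next;
} script;

void scripted_get(void *ctx, long *comp_code, long *reason)
{
    script *s = ctx;
    /* Past the end of the script, behave as if quiescing had begun. */
    *reason = (s->next < s->count) ? s->reasons[s->next++] : MQRC_Q_MGR_QUIESCING;
    *comp_code = (*reason == MQRC_NONE) ? MQCC_OK : MQCC_FAILED;
}
```

Passing the get operation as a function pointer keeps the shutdown logic testable without a queue manager; in production the wrapper would issue the real MQGET and pass back its CompCode and Reason.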

If the queue manager is forced to stop (that is, stop without quiescing), applications will receive the MQRC_CONNECTION_BROKEN reason code when they make MQI calls. At this point you must exit the application or, alternatively, on WebSphere MQ for iSeries, WebSphere MQ on UNIX systems, MQSeries for OS/2 Warp, and WebSphere MQ for Windows, you can issue an MQDISC call.

Messages containing incorrect data

Note:
In MQSeries for VSE/ESA, BackoutCount is a reserved field. It cannot be used as described in this section.

When you use units of work in your application, if a program cannot successfully process a message that it retrieves from a queue, the MQGET call is backed out. The queue manager counts the number of times this happens, in the BackoutCount field of the message descriptor of each affected message. This count can provide valuable information about the efficiency of an application. Messages whose backout counts increase over time are being repeatedly rejected; design your application so that it analyzes the reasons for this and handles such messages accordingly.
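A common way to act on BackoutCount is to stop retrying a message once its count reaches a limit the application chooses, and to set it aside for analysis instead. The C sketch below simulates this without the MQI. The type `sim_message`, the limit `BACKOUT_LIMIT`, and the functions `accept_even` and `deliver_until_settled` are hypothetical; each back-out is modeled by incrementing the count, as the queue manager does for the real field.

```c
#include <stdbool.h>

/* Hypothetical message record: a payload plus the BackoutCount value
   that MQGET returns in the message descriptor. */
typedef struct {
    int payload;
    long backout_count;
} sim_message;

typedef enum { HANDLE_NORMALLY, DIVERT_FOR_ANALYSIS } disposition;

/* Application-chosen limit (hypothetical): a message backed out this many
   times is treated as one the program cannot process. */
#define BACKOUT_LIMIT 3

/* Example processing rule (hypothetical): only even payloads are valid. */
bool accept_even(int payload)
{
    return payload % 2 == 0;
}

/* Simulate redelivery of one message. Each failed attempt is backed out,
   modeled here by incrementing backout_count as the queue manager would.
   Returns the number of times the message was retrieved. */
int deliver_until_settled(sim_message *m, bool (*process)(int),
                          disposition *final)
{
    int attempts = 0;
    for (;;) {
        attempts++;
        if (m->backout_count >= BACKOUT_LIMIT) {
            /* Repeatedly rejected: set it aside for analysis (for example,
               put it on another queue) and commit, instead of retrying. */
            *final = DIVERT_FOR_ANALYSIS;
            return attempts;
        }
        if (process(m->payload)) {
            *final = HANDLE_NORMALLY;   /* processed: commit the unit of work */
            return attempts;
        }
        m->backout_count++;             /* back out: message is made available again */
    }
}
```

With a limit of 3, a message that always fails is retrieved four times: three failed attempts are backed out, and the fourth retrieval sees the count at the limit and diverts the message rather than looping forever.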

In WebSphere MQ for z/OS, to make the backout count survive restarts of the queue manager, set the HardenGetBackout queue attribute to MQQA_BACKOUT_HARDENED; otherwise, if the queue manager has to restart, it does not maintain an accurate backout count for each message. Setting the attribute this way incurs the cost of extra processing.

In WebSphere MQ for iSeries, MQSeries for OS/2 Warp, WebSphere MQ for Windows, and WebSphere MQ on UNIX systems, the backout count always survives restarts of the queue manager.

Also, in WebSphere MQ for z/OS, when you remove messages from a queue within a unit of work, you can mark one message so that it is not made available again if the unit of work is backed out by the application. The marked message is treated as if it has been retrieved under a new unit of work. You mark the message that is to skip backout using the MQGMO_MARK_SKIP_BACKOUT option (in the MQGMO structure) when you use the MQGET call. See Skipping backout for more information about this technique.



© IBM Corporation 1993, 2002. All Rights Reserved