The three most common causes of errors that the queue manager can report immediately are failure of an MQI call, system interruptions, and messages containing incorrect data.
The queue manager can report immediately any errors in the coding of an MQI call. It does this using a set of predefined return codes. These are divided into completion codes and reason codes.
To show whether or not a call is successful, the queue manager returns a completion code when the call completes. There are three completion codes, indicating success, partial completion, and failure of the call. The queue manager also returns a reason code which indicates the reason for the partial completion or the failure of the call.
The completion and reason codes for each call are listed with the description of that call in WebSphere MQ Application Programming Reference, which also gives further information (including some ideas for corrective action) for each completion and reason code. Design your programs to handle all the return codes that could arise from each call.
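The handling of completion and reason codes described above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the `call_outcome` type and the `check_mqi_result` function are hypothetical names invented for this example, and the `MQCC_*` constants are defined locally only so that the sketch compiles without the MQ headers. A real program would take them from `cmqc.h`.

```c
#include <stdio.h>

/* Local stand-ins so the sketch compiles without the MQ headers.
   A real program would include <cmqc.h>, which defines these names. */
#define MQCC_OK       0L
#define MQCC_WARNING  1L
#define MQCC_FAILED   2L

/* Hypothetical outcomes an application might distinguish. */
typedef enum { CALL_SUCCEEDED, CALL_PARTIAL, CALL_FAILED } call_outcome;

/* Classify the result of an MQI call from its completion code and
   log the reason code when the call did not fully succeed. */
call_outcome check_mqi_result(const char *call, long comp_code, long reason)
{
    switch (comp_code) {
    case MQCC_OK:
        return CALL_SUCCEEDED;
    case MQCC_WARNING:
        /* Partial completion: the reason code says what was incomplete. */
        fprintf(stderr, "%s completed with a warning, reason %ld\n", call, reason);
        return CALL_PARTIAL;
    case MQCC_FAILED:
    default:
        /* Failure, or a completion code this sketch does not recognize. */
        fprintf(stderr, "%s failed, reason %ld\n", call, reason);
        return CALL_FAILED;
    }
}
```

An application would typically call such a helper after every MQI call, passing the completion and reason codes that the call returned, and then branch on the outcome.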
Your application may be unaware of any interruption if the queue manager to which it is connected has to recover from a system failure. However, you must design your application to ensure that your data is not lost if such an interruption occurs.
The methods you can use to make sure that your data remains consistent depend on the platform on which your queue manager is running.
Use persistent messages to carry all data that you cannot afford to lose. Persistent messages are reinstated on queues if the queue manager has to recover from a failure. With WebSphere MQ on UNIX systems, MQSeries for OS/2 Warp, and WebSphere MQ for Windows, note that an MQGET or MQPUT call within your application fails with the reason code MQRC_RESOURCE_PROBLEM when all the log files are full. For more information on log files on AIX, HP-UX, Linux, OS/2, Solaris, and Windows systems, see WebSphere MQ System Administration Guide; for z/OS see WebSphere MQ for z/OS Concepts and Planning Guide; for other platforms, see the appropriate System Management Guide.
If an operator stops the queue manager while an application is running, the quiesce option is normally used. The queue manager enters a quiescing state in which applications can continue to do work, but they should terminate as soon as it is convenient. Small, quick applications can probably ignore the quiescing state and continue until they terminate as normal. Longer-running applications, or ones that wait for messages to arrive, should use the fail-if-quiescing option when they use the MQOPEN, MQPUT, MQPUT1, and MQGET calls. With this option, the calls fail when the queue manager quiesces, but the application may still have time to terminate cleanly by issuing calls that ignore the quiescing state. Such applications could also commit, or back out, changes they have made, and then terminate.
If the queue manager is forced to stop (that is, stop without quiescing), applications will receive the MQRC_CONNECTION_BROKEN reason code when they make MQI calls. At this point you must exit the application or, alternatively, on WebSphere MQ for iSeries, WebSphere MQ on UNIX systems, MQSeries for OS/2 Warp, and WebSphere MQ for Windows, you can issue an MQDISC call.
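One way an application might react to the two shutdown cases just described is sketched below in C. The sketch assumes that a call made with a fail-if-quiescing option (for example MQGMO_FAIL_IF_QUIESCING on MQGET) reports the reason code MQRC_Q_MGR_QUIESCING when the queue manager quiesces. The `shutdown_action` type and the `action_for_reason` function are hypothetical names for this example, and the reason-code constants are local stand-ins for the definitions in `cmqc.h`.

```c
/* Local stand-ins so the sketch compiles without the MQ headers.
   A real program would include <cmqc.h>, which defines these names. */
#define MQRC_CONNECTION_BROKEN  2009L
#define MQRC_Q_MGR_QUIESCING    2161L

/* Hypothetical actions for the application's main loop. */
typedef enum {
    KEEP_RUNNING,     /* the reason code is not shutdown-related */
    FINISH_AND_EXIT,  /* quiescing: commit or back out, then terminate */
    EXIT_NOW          /* forced stop: exit (or issue MQDISC where the
                         platform allows it) */
} shutdown_action;

/* Map the reason code returned by an MQI call to a shutdown action. */
shutdown_action action_for_reason(long reason)
{
    switch (reason) {
    case MQRC_Q_MGR_QUIESCING:
        return FINISH_AND_EXIT;
    case MQRC_CONNECTION_BROKEN:
        return EXIT_NOW;
    default:
        return KEEP_RUNNING;
    }
}
```

Keeping this decision in one function means that every MQI call site in a long-running application can share the same shutdown policy.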
When you use units of work in your application, if a program cannot successfully process a message that it retrieves from a queue, the MQGET call is backed out. The queue manager maintains a count of the number of times this happens in the BackoutCount field of the message descriptor of each affected message. This count can provide valuable information about the efficiency of an application. Messages whose backout counts are increasing over time are being repeatedly rejected. Design your application so that it analyzes the reasons for this and handles such messages accordingly.
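A simple way to act on the backout count is to compare it with a limit before processing each message, as in the following C sketch. The limit `MAX_BACKOUTS`, the `message_action` type, and the `action_for_backout_count` function are all hypothetical and chosen only for illustration. The argument stands for the BackoutCount field of the message descriptor that MQGET returns.

```c
/* Hypothetical limit chosen by the application; not an MQ constant. */
#define MAX_BACKOUTS 3L

/* Hypothetical actions for a retrieved message. */
typedef enum { PROCESS_MESSAGE, SET_ASIDE_MESSAGE } message_action;

/* Decide what to do with a message from its backout count.
   backout_count stands for the BackoutCount field of the message
   descriptor returned by MQGET. */
message_action action_for_backout_count(long backout_count)
{
    /* A message backed out this often is probably being rejected
       repeatedly, so set it aside for analysis instead of retrying. */
    return (backout_count >= MAX_BACKOUTS) ? SET_ASIDE_MESSAGE
                                           : PROCESS_MESSAGE;
}
```

What "setting aside" means is left to the application; for example, it might move the message to another queue for later inspection.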
In WebSphere MQ for z/OS, to make the backout count survive restarts of the queue manager, set the HardenGetBackout attribute to MQQA_BACKOUT_HARDENED; otherwise, if the queue manager has to restart, it does not maintain an accurate backout count for each message. Setting the attribute this way adds the penalty of extra processing.
In WebSphere MQ for iSeries, MQSeries for OS/2 Warp, WebSphere MQ for Windows, and WebSphere MQ on UNIX systems, the backout count always survives restarts of the queue manager.
Also, in WebSphere MQ for z/OS, when you remove messages from a queue within a unit of work, you can mark one message so that it is not made available again if the unit of work is backed out by the application. The marked message is treated as if it has been retrieved under a new unit of work. You mark the message that is to skip backout using the MQGMO_MARK_SKIP_BACKOUT option (in the MQGMO structure) when you use the MQGET call. See Skipping backout for more information about this technique.