Updated: July 13, 2020 7:51am

Troubleshooting Replication

This topic covers information that can help when troubleshooting replication issues:

  • Replication UI elements
  • Show Successful Messages
  • Reprocess Errors
  • Failed or Stopped Initialization, Replication
  • Pause/Resume Initialization
  • GuaranteedInitMessageDelivery setting
  • Pending Tab
  • Limiting the Number of Rows Fetched by GET Requests
  • Restart PrismMQ after Deleting Replication Records


Replication UI Elements

Element | Description
to server batches | Records that have been sent in to this server. Click the link to display a list of the records sent in to this server.
from server batches | Records that have been sent out from this server. Click the link to display a list of the records sent out from this server.
error count | Errors. Click the link to display a list of the errors.
filter installation name | When several servers are available, you can filter the list. Type the name of the machine. Click the icon to filter for only servers with errors.
total link count | Shows the total number of links sent to this server and the number of errors.
batches | Displays the number of batches included in the replication results.


Initialization Hierarchy
You can navigate the various levels of the initialization using the buttons at the top of the UI: Server, Store, Resource, SID.

Element | Description
server_list | Takes you to the top level of initialization batches, showing a list of servers.
link to server resource list | Displays that server's list of resources sent in the most recent initialization batch for the server. In this case, the server is "HQ".
vendor resource link | When you drill down to the individual record level, use this button to go back to the list of records sent for the selected resource. In this case, the list of vendor records will be displayed.
resource link | Link for an individual resource record.

Navigate between batches
When viewing the list of initialization batches for a server, use the arrow buttons to navigate back and forth among the batches. To identify which batch you are viewing, check the date and time the batch launched.

Show Successful Messages
By default, the successful records sent during replication are not preserved, which keeps hard drives from filling up with files. During troubleshooting, however, you may want to change this setting so that you can drill down and see individual records.
The default value for Show Successful Messages is False, meaning that successful records are not preserved; only errors are preserved. Keeping the default value of False is important for customers with large data sets. If the value is set to True, large quantities of unneeded data (sometimes millions of rows) could be preserved, which has the potential to overwhelm system memory. When you run initialization with the property set to False (the default) and everything succeeds, the Status screen shows only the completed message.
For troubleshooting, you can change the setting so that success records are preserved.
To show success records:

  1. Edit the PrismMQService.ini file so that the PreserveSuccessRecords setting is set to TRUE.
  2. Select the Show Successful Messages checkbox on the Prism Dashboard > Initializations screen.
  3. Initialize the server. Important! You must set PreserveSuccessRecords to TRUE in the .ini file BEFORE initializing. If you edited the .ini file before initialization, then selecting the Show Successful Messages checkbox displays a list of successfully replicated resources.
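
Before initializing, it can help to confirm that the .ini edit from step 1 actually took effect. The following is a minimal Python sketch, not a Prism tool. It assumes the file parses as a standard sectioned INI file, and because this topic does not say which section holds PreserveSuccessRecords, it simply scans every section. The function name is hypothetical.

```python
import configparser


def preserve_success_records_enabled(ini_text: str) -> bool:
    """Return True if PreserveSuccessRecords is set to a true value.

    Assumption: the setting may live in any section, so the first
    section that defines it decides the result.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    for section in parser.sections():
        # Option names are matched case-insensitively by configparser.
        if parser.has_option(section, "PreserveSuccessRecords"):
            return parser.getboolean(section, "PreserveSuccessRecords")
    return False  # setting not present anywhere in the file


if __name__ == "__main__":
    # Hypothetical file content; the [PRISM] placement is an assumption.
    sample = "[PRISM]\nPreserveSuccessRecords=TRUE\n"
    print(preserve_success_records_enabled(sample))
```

To check a real file, read its text first (for example with `open(path).read()`) and pass the result to the function.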


Reprocess Errors
If any errors occur during initialization, you can correct the problem and then reprocess the specific links that failed. Click a server's link to drill down. Click the Failed column header to sort the list by that column, which brings the errors to the top of the list. Next, go into the resource with the errors. Select the individual elements you want to reprocess and then click the Reprocess Selected button. To see details about the selected resource, click the View button. This screen has many columns, so scroll to the right to display more of them.

If Replication, Initialization Fails or is Stopped
If replication services are interrupted (for example, by a reboot or a computer freeze), Day-to-Day replication will resume where it left off. Initialization can take a long time for larger databases and can sometimes fail to complete successfully. If an initialization fails or is stopped, do this:
Create a new Sender profile that starts from the resource after the last COMPLETED resource. For example, if the initialization was in the middle of the Inventory resource when the failure occurred, the new Sender profile should include Inventory and the rest of the resources to the bottom of the list. Run initialization again using the new Sender profile.
It may take a while to process the first resource (the resource that was being initialized when the failure occurred). This is because the program must do a slower UPDATE operation on each of the resource's records that are already in the tables. Once the program finishes the updates and reaches the unprocessed records for the resource, it switches to the much faster INSERT operation. The entire resource in which the failure occurred must be sent again.
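
The rule for choosing the resources in the new Sender profile can be sketched in a few lines of Python. This is only an illustration of the selection logic: the resource names and their order are hypothetical, and the actual profile is built in the Prism UI, not in code.

```python
def resources_to_resend(ordered_resources, failed_resource):
    """Return the failed resource plus everything after it in the list.

    The resource that was in progress at the time of failure is
    included, because the entire resource must be sent again.
    """
    start = ordered_resources.index(failed_resource)
    return ordered_resources[start:]


if __name__ == "__main__":
    # Hypothetical resource order, for illustration only.
    profile = ["Vendor", "Customer", "Inventory", "Document"]
    print(resources_to_resend(profile, "Inventory"))
```

With the hypothetical list above, a failure during Inventory yields a new profile containing Inventory and Document.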

Guaranteed Init Message Delivery
This setting, when used in combination with the RESUMEINITONSTARTUP setting, ensures that if initialization fails, no messages are lost and initialization automatically resumes at the point of failure.

Three entries in the PrismMQService.ini file relate to this feature. Both INITGUARANTEEDMESSAGEDELIVERY and RESUMEINITONSTARTUP are set to True by default. You can find these settings in the [PRISM] section of the PrismMQService.ini file. INITGUARANTEEDMESSAGEDELIVERY also appears in the [RIL] section of the same file.

[PRISM]
INITGUARANTEEDMESSAGEDELIVERY=True
RESUMEINITONSTARTUP=True
[RIL]
INITGUARANTEEDMESSAGEDELIVERY=True

When RESUMEINITONSTARTUP is set to True, a consumer that goes down during an initialization resumes initialization automatically when it is restarted. In tandem with this property, you must also set INITGUARANTEEDMESSAGEDELIVERY to True on the sending server for each initialization type that you want protected, so that no messages are lost if RabbitMQ goes down.
Here's a typical use case: you start an initialization and realize that the consumer's default of 20 threads is too low for the power of the machine. You can pause that consumer, change the thread count, and then resume the consumer. Initialization picks up where it left off with the new thread count. If guaranteed message delivery (GMD) is set to False, some messages may be lost if there is a RabbitMQ failure (not just a PMQ failure). In that case, even a restart of initialization will likely get stuck and not finish, and the initialization will have to be restarted from the last completed resource. Note that turning on GMD for a system that has both sender and consumer on the same system (i.e., RIL and Prism on the same system) slows down initialization for that system and for any others included in an initialization batch with it.
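
A quick way to review these three entries is to parse the file and list anything that is not explicitly True. The Python sketch below is an assumption-laden helper, not part of Prism: the function name is hypothetical, and it treats a missing entry as something to review even though the documented default is True.

```python
import configparser

# (section, key) pairs taken from the PrismMQService.ini example above.
REQUIRED = [
    ("PRISM", "INITGUARANTEEDMESSAGEDELIVERY"),
    ("PRISM", "RESUMEINITONSTARTUP"),
    ("RIL", "INITGUARANTEEDMESSAGEDELIVERY"),
]


def settings_needing_review(ini_text):
    """Return a list of messages for entries that are missing or not True."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    problems = []
    for section, key in REQUIRED:
        if not parser.has_section(section) or not parser.has_option(section, key):
            # Missing entries fall back to the product default; flag them anyway.
            problems.append(f"[{section}] {key} is missing")
        elif not parser.getboolean(section, key):
            problems.append(f"[{section}] {key} is not True")
    return problems


if __name__ == "__main__":
    sample = (
        "[PRISM]\nINITGUARANTEEDMESSAGEDELIVERY=True\nRESUMEINITONSTARTUP=True\n"
        "[RIL]\nINITGUARANTEEDMESSAGEDELIVERY=True\n"
    )
    print(settings_needing_review(sample))
```

An empty list means all three entries are explicitly set to True.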

Pause/Resume Initialization
You can pause and resume the initialization process. You cannot pause the Sender, but each downstream system that is consuming the resources being sent can be paused and resumed.

  1. Start initialization. The Server List is displayed.
  2. Click a server.
  3. Click the Pause button. The initialization consumer stops consuming messages. (THIS WILL NOT STOP THE SENDER.) The button caption changes to Resume.
  4. Click the button again to resume the initialization consumer.

Cancel or Delete Batch
You can cancel or delete a batch. Select the batch and then click the Cancel or Delete button as needed.
If you cancel an initialization batch, all running initializations will be stopped. If you delete an initialization batch, all messages currently in the queue will be lost.

Pending Tab
On the Day-to-Day tab of both the V9 Dashboard and the Prism Dashboard is a pair of tabs: Completed and Pending. By default, the Completed tab is selected, showing information for completed Day-to-Day replication records. The Pending tab, on the other hand, can be useful for verifying the messages that are about to be put on the RabbitMQ bus.
The screen shows summary information (for all connections):

  • Total number of new messages pending
  • Messages being processed on the RabbitMQ bus (i.e. "in process")
  • Messages placed on the RabbitMQ bus (i.e. completed)

Limiting the Number of Rows Fetched by GET Requests
You can set up limiting for a resource by adding a section to either the PrismBackOffice.ini or the PrismCommon.ini file (depending on which Windows service the resource is namespaced to).
[GETLIMIT]
# >0: limit number of rows fetched
#  0: no limit (default)
# -1: fetch but log a warning
# -2: do not fetch and log a warning
# -3: error out
customer=100
document=100

Example
Adding this to the PrismBackOffice.ini file limits the TransferSlip top-level resource.
[GETLIMIT]
# Limits slip GET to a maximum of 100 slips
Transferslip=100
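
As a reading aid for the value codes listed in the [GETLIMIT] comments, here is a small Python sketch that maps a configured value to the behavior this topic describes. It does not reproduce Prism's own handling. The function name is hypothetical, and values below -3 are treated as undocumented because this topic does not define them.

```python
def describe_get_limit(value: int) -> str:
    """Translate a [GETLIMIT] value into the documented behavior."""
    if value > 0:
        return f"limit GET to {value} rows"
    documented = {
        0: "no limit (default)",
        -1: "fetch but log a warning",
        -2: "do not fetch and log a warning",
        -3: "error out",
    }
    if value in documented:
        return documented[value]
    # Assumption: anything else is outside the documented range.
    raise ValueError(f"undocumented GETLIMIT value: {value}")


if __name__ == "__main__":
    for candidate in (100, 0, -1, -2, -3):
        print(candidate, "->", describe_get_limit(candidate))
```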

Restart PrismMQ after Deleting Replication Records
The key service involved with replication, PrismMQ, caches records aggressively to improve performance. If the record is in the cache, it is assumed that the record exists and PrismMQ doesn't need to confirm its existence in the database. This is usually not a problem; however, it can become a problem if you have deleted replication-related records (e.g. replication_status) from tables while troubleshooting. The record you removed as part of your troubleshooting efforts may still be in the cache. Therefore, you should restart PrismMQ as soon as possible after deleting replication-related records. Restarting PrismMQ will clear the cache.
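
The stale-cache problem described above can be illustrated with a toy model in Python. This is not PrismMQ's implementation; the class, its methods, and the record ID are all hypothetical and exist only to show why a deleted record can still appear to exist until the cache is cleared.

```python
class ExistenceCache:
    """Toy cache that trusts earlier lookups instead of re-checking the database."""

    def __init__(self, database):
        self._database = database  # a set of record IDs standing in for a table
        self._seen = set()         # IDs already confirmed to exist

    def exists(self, record_id):
        if record_id in self._seen:
            return True  # cached: assumed to exist, database is not consulted
        if record_id in self._database:
            self._seen.add(record_id)
            return True
        return False

    def clear(self):
        """Stands in for the effect of restarting the service."""
        self._seen.clear()


if __name__ == "__main__":
    table = {"status-1"}                # hypothetical record ID
    cache = ExistenceCache(table)
    print(cache.exists("status-1"))     # record is looked up and cached
    table.discard("status-1")           # record deleted during troubleshooting
    print(cache.exists("status-1"))     # cache still reports the record
    cache.clear()                       # restart clears the cache
    print(cache.exists("status-1"))     # lookup now reflects the deletion
```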