Retail Pro Prism Replication Troubleshooting - Common Issues

This table of replication troubleshooting tips comes from the 2022 Retail Pro Prism 2 Workshop.

ISSUE | POTENTIAL ROOT CAUSE | IDENTIFY | SOLUTION
Missing/stuck data | Initialization process is running/stuck
  • Look at the Connection Manager to identify where the data is (sent or still processing) and check the producer/consumer cache tables for data.
  • Check the replication status table on the store to see whether the init session is in progress, paused, or canceled.
  • If data is still moving and processing, you will need to wait. Init takes priority over D2D.
  • If data isn't moving/processing, determine why if possible.
  • Cancel initialization if it is paused or can't be resumed.
Missing/stuck data | Error: constraint violation or possibly a DB optimistic lock, if the defined retry count is exceeded
  • Enable log level 3 and resend the document to capture additional details and determine exactly which constraint is involved.
  • Possible server performance issue.
  • An optimistic lock usually means the same record was sent more than once (e.g., the same customer being resent over and over).
Correct the issue on the document or resend the data.
Missing/stuck data | Backlog of data on the POA/RIL producer tables | Identify root cause: performance, init session in progress, locked queue cycling.
  • Depends on the root cause.
  • If a locking queue is the issue, address it. Avoiding this is key.
  • Init session in progress or stuck: see the comments above on init priority and stuck queues.
Missing/stuck data | Malformed custom JSON file (integrations)
  • Identify in the PrismMQ logs.
Example: !Error | Data was not readable, likely a serializer error. Cannot report replication status details
Contact the developer.
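Before contacting the developer, it can help to confirm that the custom file is actually malformed and where. The following is a minimal sketch, assuming the integration file is plain UTF-8 JSON stored on disk; the function name `check_json_file` is hypothetical and is not part of Retail Pro Prism.

```python
import json
from pathlib import Path


def check_json_file(path):
    """Return None if the file parses as JSON, else a short error description."""
    try:
        text = Path(path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError) as exc:
        return f"cannot read {path}: {exc}"
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno are the 1-based position reported by the parser
        return f"{path}: line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

A return value of None means the file parsed cleanly; any other value gives the line and column to pass on to the developer.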
Missing/stuck data | DB data file size capacity (OS file size limit) | Seen in DB logs (Oracle alert_rproods.log file) and possibly in PrismMQ logs. | Add an additional data file (see RIL TTK).
Missing/stuck data | RabbitMQ Mnesia DB files corrupt | Logging into the RMQ management console gives an error, even after restarting the RabbitMQ service.
  • Delete the RabbitMQ queues in the queue folder until you find the corrupt queue/queues.
  • C:\ProgramData\RetailPro\Server\RabbitMQ\db\rabbit@"hostname"mnesia\msg_stores\vhosts\628WB79CIFDYO9LJI6DKMI09L\queues
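Because the queues are removed one at a time, an inventory of the queue folder makes it easier to work through them in a known order. The sketch below only lists the subdirectories and their sizes; it does not detect corruption and deletes nothing. The function name `list_queue_dirs` and its `queues_root` parameter are hypothetical, and the folder to pass in is the queues path shown above for your server.

```python
from pathlib import Path


def list_queue_dirs(queues_root):
    """Return (name, size_in_bytes) for each queue directory, largest first."""
    root = Path(queues_root)
    result = []
    for entry in root.iterdir():
        if not entry.is_dir():
            continue  # ignore stray files next to the queue directories
        # Sum the size of every file below this queue directory
        size = sum(f.stat().st_size for f in entry.rglob("*") if f.is_file())
        result.append((entry.name, size))
    return sorted(result, key=lambda item: item[1], reverse=True)
```

Size is only a way to order the list; nothing in the source says that the largest queue is the corrupt one.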
Process out of memory | Memory limitation: 32-bit memory address limit (1.8+ GB max) | Task Manager (Details: peak or current memory usage), noting memory usage for PMQ processes | Depends on the process. Restart the service in most cases.
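The Task Manager check can also be approximated from the command line. This is a minimal sketch, assuming the output of the Windows command `tasklist /FO CSV /NH` has been captured as text; it reports current (not peak) memory, and the 80% warning fraction is an arbitrary illustration, not a Retail Pro recommendation. The function names are hypothetical.

```python
import csv
import io

LIMIT_KB = int(1.8 * 1024 * 1024)  # the 1.8 GB limit expressed in KB


def parse_tasklist_csv(output):
    """Parse `tasklist /FO CSV /NH` output into (image_name, pid, mem_kb) tuples."""
    rows = []
    for fields in csv.reader(io.StringIO(output)):
        if len(fields) < 5 or not fields[1].isdigit():
            continue  # skip blank lines, headers, or unexpected rows
        # "Mem Usage" looks like "12,345 K"; keep only the digits
        digits = "".join(ch for ch in fields[4] if ch.isdigit())
        if digits:
            rows.append((fields[0], int(fields[1]), int(digits)))
    return rows


def near_limit(rows, fraction=0.8, limit_kb=LIMIT_KB):
    """Return the rows whose memory usage is at or above fraction * limit_kb."""
    return [row for row in rows if row[2] >= fraction * limit_kb]
```

Filter the result by the image names of the PMQ processes on your server; the source does not list those names, so they are left to the reader.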
Process out of memory
  • Customer UDF
  • Extremely large document
Memory limit: identify the message size in the producer cache/consumer cache table | Reduce the number of consuming threads. Don't send the offending data.
RabbitMQ lost connections (repeatedly). Possible lost messages or initialization failures.
  • Known issue (fixed in the latest release of Retail Pro Prism 1.14.7)
  • RabbitMQ queue setup: Heartbeat check is out of sync with connection timeout.
  • Seen in the RMQ logs every couple of seconds. This occurs over and over on all connected systems.
  • Client unexpectedly closed TCP connection
Upgrade to Retail Pro Prism 1.14.7.2153 or later.
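To confirm that a server shows this symptom, the RMQ log can be scanned for the message quoted above. The following is a minimal sketch, assuming the log is a plain-text file read line by line; the function name `count_unexpected_closes` is hypothetical.

```python
import re

# The message text comes from the symptom described above; matching is case-insensitive
PATTERN = re.compile(r"client unexpectedly closed TCP connection", re.IGNORECASE)


def count_unexpected_closes(lines):
    """Count the lines that contain the unexpected-close message."""
    return sum(1 for line in lines if PATTERN.search(line))
```

The function accepts any iterable of lines, so an open log file can be passed directly. A count that keeps climbing every few seconds matches the pattern described for this issue.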
Preferences overwritten | Core resources replicated from store to POA
  • New store was published before joining the enterprise.
  • Changes made at the store replicate to the POA.
  • The scheduler will trigger core resources to be sent with some tasks (Update active season).
  • The Retail Pro Prism 2.1 release can turn off core resources (in the PMQ config file).
  • Disable the scheduler task "update active season" (set active = 0) on all store servers (ideally before joining the enterprise).
  • Clean out producer_cache before joining the enterprise.
Data stuck in RabbitMQ on the sending side | Firewall | See if you can establish a telnet connection or otherwise verify that the ports are open. | Correct the firewall setup.
Data stuck in RabbitMQ on the sending side | Store server networking issue | Ping or attempt to establish any connection to the store/receiving server. | Correct the network issue.
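Where telnet is not installed, the same port check can be scripted. This is a minimal sketch using only the Python standard library; the function name `port_is_open` is hypothetical, and the host and port to test are whatever your deployment uses (5672 is RabbitMQ's default AMQP port and is mentioned only as an example, not as a confirmed Prism setting).

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False
```

A False result does not distinguish a firewall block from a stopped service or a network fault, so it narrows the problem rather than identifying it.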
Join Enterprise Error - Invalid controller Data | Invalid controller data (restoring or reinstalling a system previously joined) | The controller table likely has a record of this system with a different SID, or with the same SID and controller ID. Other possible issues could also exist (see KB). | KB: Resolving Invalid Controller Data Error in RP Prism's Enterprise Manager
Join Enterprise Error - Invalid controller Data | Init of core resources failed/stuck
  • Join fails or gets stuck on the last step, where it is initializing the core resources.
  • Check the replication_status table and the producer/consumer cache tables to determine whether data is really stuck or has completed and is just missing the end-of-init message.
Kill the TTK session and clean up the init session if it remains. Then initialize the core resources manually.

Published on Jul 19, 2022 in Configuration & Settings, Data Replication

 
