V9 Database Troubleshooting Outline
V9 Database Troubleshooting Outline
The purpose of this outline is to help business partners initially troubleshoot a client?s Retail Pro V9 Oracle database, in the event that they are unexpectedly unable to connect to the database. This guide will devise a way to identify certain issues and resolve them in a timely manner. Simply follow the steps below.
TROUBLESHOOTING THE DATABASE
1.Reboot the machine
If by any chance resources are being taken up at the time of a system down, rebooting the machine will free up resources and also cleanly restart the services.
2.Check hard drive space
a.Typically in the event of a mid-day system down, the hard drive could get full thus no longer enabling the archive logs to create anymore files and in result causing a system down issue. For example, the LGWR (Log Writer) writes to Redo Logs in a circular fashion.When the current redo log file fills, LGWR begins writing to the next available redo log file. When the last available redo log file is filled, LGWR returns to the first redo log file and writes to it, starting the cycle again. If archiving is enabled (the database is in ARCHIVELOG
mode), a filled redo log file is available to LGWR after the changes recorded in it have been written to the data files and the file has been archived. If the disk is full the data in the redo logs cannot be written to an archive file, the LGWR cannot start over and write to the first Redo Log, and the database will crash. In that event, you will need to clear up any space that is needed at that time and then restart the services.
b.When a redo log is overwritten, all changes stored in the redo log are no longer recoverable; therefore, before Oracle overwrites the file, Retail Pro can copy the contents to an archive log. Archive log files are then used to perform database recovery.
-
- You can modify the location of this file (with the restriction that the target path must exist, because Oracle will not attempt to create the path if it does not exist).
- Archive logs need to be managed. If you choose to ignore the archive logs then they will eventually fill up your hard disk (or other backup media). It is only necessary to keep the archive logs that have been created since the last full backup. This means that each time you perform a cold (complete) backup you can safely delete all of your archive logs.
- Note: The database will be backed up before and after enabling of archiving.
3.Check Event Viewer
a.DO NOT CLEAR EXISTING LOGS! This would prevent a technician from seeing any issues that may have been encountered on the machine regarding to the system down.
b.Ensure that the correct configuration is present (set to overwrite logs). See Knowledgebase #21 for correct configuration - http://knowledgebase.retailpro.com/questions.php?questionid=21
c.While viewing the logs, check for any viruses, bad hard drive errors, Retail Pro Licensing errors, etc.
4.Check Services (Control Panel>Administrative Tools>Services)
a.OracleServiceRproODS, OracleODS11gr1TNSListener, and RetailPro License Server are the main services. May have to check the Oracle BI services if present. Initially, check to see if these services are running. Then proceed to restart the services.
b.Shutting down and restarting in this order:
RetailPro License Server (stop)
OracleODS11gr1TNSListener (stop)
OracleServiceRproODS (stop, and then start)
OracleODS11gr1TNSListener (wait till the RproODS has started and then start)
RetailPro License Server (start)
c.If you encounter an error when attempting to restart the RetailPro License Server service, you may need to manually kill the service. Go to Task Manager>Processes tab and look for LicenseServer.exe and right-click and click on End Process Tree.
5.In the event of encountering a "Unable to seat workstation..." error.
There may be a time, when the user encounters an error such as "Unable to seat workstation: maximum number of seats exceeded". This is caused by licensing restricting the number of Rpro9.exe connections based on the number of seats allocated for that station. In order to resolve such an issue, there are a couple of sql queries that will help identify the connected workstations along with clearing up the appropriate workstations. You can find this and more following KB #40. http://knowledgebase.retailpro.com/questions.php?questionid=40
6.No errors in logging and still able to log into TTK and licensing is correct, but get the ?cannot connect to database? error. You may want to double-check the workgroup or domain name to ensure that there are no underscores in the name. See KB Article 101 for a quick understanding on this topic - http://knowledgebase.retailpro.com/questions.php?questionid=101
REPAIRING THE DATABASE
1.Create a cold back up of the database in its present state
a.Before any attempts of making any changes to the database, such as rebuilding the control file, it is imperative to create a back up of the present database. This would prevent from possibly damaging the database any further.
i.Ensure to shutdown the Oracle Services prior to creating a back up (see steps outlined above).
ii.Make a copy of the :\Oracle\oradata\rproods directory. If a bad drive is suspected, ensure to copy over to a different drive.
iii.Then restart the Oracle Services.
2.Attempt to log into the Tech Tool Kit
a.Log into Tech Tool Kit and this will determine if there is a licensing issue or a database issue.
i.If you attempt to log into the TTK and encounter the prompt that the database is not mounted or is closed, this would definitely be a database issue. However, if you can get into TTK, the database is online and accessible and the issue may be license related.
ii. In order to log into TTK Offline, you will require a default dbsec.dat file in the RetailPro9 folder. See Knowledgebase 148 on how to create -http://knowledgebase.retailpro.com/questions.php?questionid=148 Copy this file and drop it in the root of the RetailPro9 directory. Then you will be able to log into TTK Offline using the default Sysadmin credentials.
iii. From TTK Offline, you would be able to Rebuild the Password File or even Rebuild the Control File and conduct limited recovery processes.
3.Checking the Licensing Log
a.In the event that accessing TTK is good and that the database is online, the user may need to check the status of the installation next by going into the Service Manager and into the licensing server and clicking on Configure. Is it in trial mode and has expired, has it been activated? Has this database been properly restored onto a different machine? If there are no visible discrepancies or explanations, you may need to check the status of communication with the licensing, by checking the licensing log in the :\RetailPro9\Logs directory.
4.Check the Alert Log
a.Looking for Oracle Errors
i.For R3 and higher installations, the Alert log is located in:
:\Oracle\diag\rdbms\rproods\rproods\trace
For R1 installations, the Alert log is located in:
:\oracle\admin\rproods\bdump
ii.Open the alert_rproods.log in Notepad and start from the bottom and use the ?Find? function in the ?Up? direction using a keyword search of ?ORA-?. Then you will encounter any ORA errors that may come up. It is also important to look at the dates of the ORA errors to determine the timeframe as to when the errors have started. By knowing some of these ORA errors will then determine which point of action must be done next. Keep in mind, Oracle errors from days ago could possibly cause corruption during the present time of the system down. However, there have been some ORA errors that may come up that can be ignored. Such as an ORA-00283-Media Recovery Failed, followed by an ORA-00264-No media recovery required. This is created by simply following the rebuild of the control file within the Tech Tool Kit.
Media Recovery Start
Fast Parallel Media Recovery enabled
Media Recovery failed with error 264
ORA-283 signaled during: ALTER DATABASE RECOVER DATABASE..
Some examples of Oracle errors:
ORA-00600 is typically an Oracle error that can be caused:
Time Outs
File Corruption
Failed data checks in memory, hardware, or I/O messages
Incorrectly restored files
The possible causes above are purely an indication of another problem. The alert log and trace files should indicate an accompanying error. The ORA-00600 error is a general error that is serious in nature, and would require in-depth analysis of the database environment. In most occasions the ORA-00600 error indicates a non-recoverable error in which a backup would need to be restored in order for normal operations to resume.
However there are some occasions where an ORA-00600 error has a known good solution
For Example:
ORA-00600 internal error code, arguments: [dbkif_find_next_record_1] [],[],[],[],[],[]...
The KB article located below contains a solution to that error
http://partners.retailpro.com/partnerservices/knowledgebase.php?a=145
Another Example:
ORA-00600: internal error code, arguments: [kcratr1_lastbwr], [], [], [], [], [], [], [], [], [], [], []
The key is [kcratr1_lastbwr]. It is a known oracle bug (9156272) and is fixed in 11.2.0.2 patch. Retail Pro is currently using 11.1.0.7.0 in R3. To work around this problem when it is encountered, a trained Retail Pro Inc support technician must log into the database of the downed system via Sqlplus and issue a series of commands.
Example of another ORA error..
?ORA-1113 signaled during: alter database open?
This message indicates that one or more of the files of the database is preventing the database from opening and likely need recovery.� The procedure to resolve this problem is to follow the file recovery steps outlined in the tech toolkit documentation to identify the offline database file and recover using file or database recovery.
5. In the event that an Oracle error is encountered and depending how severe the database error may be, a recover from backup may need to be done. We recommend reading the documentation regarding ?Best Practices for Backup and Recovery? located on the Documentation Portal. http://partners.retailpro.com/Documentation/view.php?id=257
If no backup is available, reinstalling and regenerating may be the only option if this happens to be a store database that routinely transmits everyday transactions to a corporate database. Please note, that this is not a valid backup scheme.
6. Option of Last Resort
However, in some cases in order to recover a database may need the experience of an Oracle DBA. By enlisting our Professional Services staff who will attempt to repair the database for a fee. The repair attempt will be billed at our normal rate of $180 (US)/hour. The repair process usually takes between 2 and 6 hours ($360 to $1,080). We cannot guarantee this repair process will restore the database to its original state as this process may not be successful, the customer will be charged regardless of the outcome. The key to a successful recovery is having a recent good backup and archive files.