Issue
I recently came across an issue where some projects were publishing fine for some users while others were failing. A project could publish successfully several times and then, on a later attempt, fail with the dreaded 26000 error!
<?xml version="1.0" encoding="utf-16"?>
<errinfo>
<general>
<class name="Queue">
<error id="26000" name="GeneralQueueJobFailed" uid="8bf04958-b8c1-47ac-bfc9-a0bbebb59238" JobUID="cf6c2adb-22c4-43a8-bed8-e52113705c66" ComputerName="TAVMPS01" GroupType="ProjectPublish" MessageType="PublishProjectMessage" MessageId="135" Stage=""/>
</class>
</general>
</errinfo>
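When many of these queue errors need comparing across users and projects, it can help to pull the attributes out programmatically. The following is a minimal Python sketch, not part of the original troubleshooting; the function name `parse_queue_errors` is hypothetical, and it assumes the error text has been copied out of the queue as a string in the shape shown above.

```python
import re
import xml.etree.ElementTree as ET

# Sample payload copied from the queue error above.
SAMPLE = """<?xml version="1.0" encoding="utf-16"?>
<errinfo>
<general>
<class name="Queue">
<error id="26000" name="GeneralQueueJobFailed" uid="8bf04958-b8c1-47ac-bfc9-a0bbebb59238" JobUID="cf6c2adb-22c4-43a8-bed8-e52113705c66" ComputerName="TAVMPS01" GroupType="ProjectPublish" MessageType="PublishProjectMessage" MessageId="135" Stage=""/>
</class>
</general>
</errinfo>"""


def parse_queue_errors(text):
    """Return one dict of attributes per <error> element (hypothetical helper)."""
    # The declaration claims utf-16, but the text is already a decoded string,
    # so drop the declaration before parsing to avoid an encoding mismatch.
    body = re.sub(r"^\s*<\?xml[^>]*\?>", "", text)
    root = ET.fromstring(body)
    return [dict(err.attrib) for err in root.iter("error")]


errors = parse_queue_errors(SAMPLE)
for err in errors:
    # Print the fields most useful when correlating with the ULS logs.
    print(err["id"], err["name"], err["GroupType"], err["JobUID"])
```

The JobUID and ComputerName values are the ones worth searching for in the ULS logs on the named server.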
Investigation
The only other time I had seen this error was when the RTM update had failed, leaving the binaries at a different patch level from the databases. Not this time!
I checked the patch levels of each database and of the client apps, and they all matched. This was confusing! How can a project publish perfectly well as many as 10 times and then suddenly fail? The inconsistency was worrying and made the problem hard to troubleshoot. Other checks also came back clean:
- The SharePoint Config Wizard (SCW) indicated no problems with the farm.
- Projects were saving and checking in fine.
On closer examination of the ULS logs (with logging throttled to verbose), I determined that there was a possible issue with a procedure in the Published SQL database. This was not a good sign.
I then ran the database integrity check on the Published database, and it pointed to database corruption! Error 8967 is not a good sign, but at least I now knew the cause of the issue.
This error was somewhat similar to another SQL issue, http://support.microsoft.com/kb/960791/en-us. That article did not cover my exact scenario, but it was an indicator of the type of issue I was facing.
Resolution
To resolve the issue, the options were:
- Repair the SQL database (REPAIR_ALLOW_DATA_LOSS, REPAIR_FAST, REPAIR_REBUILD). Note: use REPAIR_ALLOW_DATA_LOSS only as a last resort, as it may render your system unusable. Be aware that from SQL Server 2005 onwards REPAIR_FAST is kept for backward compatibility only and performs no repair actions. Test the other repair options on a test/development environment before applying them to live.
- Restore a previous ‘good’ backup of the SQL databases
- Restore from a previous SharePoint farm backup
I went for option (2) above and everything started working fine. We had to redo some of the changes, as the ‘good’ DB backup copy was slightly older.
Recommendation
Avoid using option (1) above, i.e. repairing the DB with data loss, unless it is absolutely necessary and no other options are viable. Even then, it is no guarantee of a fix. If you want to try the other SQL database repair options, test them thoroughly on a test environment before applying them to live.
I would recommend option (2) to fix this type of issue. Note: remember to restore all four Project Server databases and the associated content DB from the same backup set. Do not be tempted to mix and match, i.e. restore a single DB on its own.
This just goes to show how important it is to have a proper SQL database backup maintenance plan set up (with verification), in line with your business data loss (and DR) policy. It is also important to do a restore test on a test/dev environment on a regular basis, say every quarter. Stay tuned for a post on a recommended strategy for Disaster Recovery.
Note: This posting is provided "AS IS" with no warranties, and confers no rights.