Category Archives: Troubleshooting

XPost: Event 18054 errors in the SQL application log

Here is a great post by Kevin Holman addressing an issue you may come across if you have moved your SCOM databases or recovered them to a new SQL Server.

Sample error:

Log Name:      Application
Source:        MSSQL$I01
Date:          10/23/2010 5:40:14 PM
Event ID:      18054
Task Category: Server
Level:         Error
Keywords:      Classic
User:          OPSMGR\msaa
Computer:      SQLDB1.opsmgr.net
Description:
Error 777980007, severity 16, state 1 was raised, but no message with that error number was found in sys.messages. If error is larger than 50000, make sure the user-defined message is added using sp_addmessage.

In essence, as Kevin explains, this happens because the SCOM messages are created in sys.messages in the master database at installation. These messages are therefore missing after a database move or a recovery to a new SQL Server.

Kevin has also provided a script to re-add these messages for you here. CAUTION: it is for SCOM 2012 R2 only.

SCOM: Alerting on Exchange 2013 Database Failover

As it turns out, the Exchange 2013 MP will not alert you when your Exchange 2013 databases fail over. This is by design, as Microsoft does not consider this condition to be an issue.

There is a great article by Scott at flobee.net which addresses this issue for Exchange 2010, and it is quite simple to apply the same thinking to Exchange 2013. The event is the same; the target just needs to be an Exchange 2013 server.


 

SCOM: The System Center Management service terminated with service-specific error %%-2130771964

Just a quick issue which bears noting.

A colleague of mine had an issue where the health service on one of his management servers would not start. The error displayed was “The System Center Management service terminated with service-specific error %%-2130771964”.

The resolution is simple: rename the Health Service State folder and then start the service.

This issue is caused by corruption in the health service cache, which prevents the service from starting.

SCOM: Error in SNMP GET response from IP Address: Status: noSuchInstance(129)

Today I was addressing this error in the Operations Manager event log at one of my customers:

Error in SNMP GET response from IP Address: Status: noSuchInstance(129)

According to “What Gets Monitored with System Center Operations Manager 2012 Network Monitoring”, this can be caused by several things:

Possible Resolutions

  • Stale Discovery Data – Device has been reconfigured since the last discovery and Operations Manager is attempting to monitor a component that no longer exists on the device.
  • If the device doesn’t support the workflow, then a solution is to disable the workflows utilizing the value for the device. This will prevent these workflows loading and failing in the future.
  • Possibly a device issue; try updating the firmware and OS on the device.
  • Possibly a discovery issue where the instance is being discovered incorrectly. For example, Operations Manager is expecting to monitor a performance counter, but this is a virtual interface and the counters are not present for the interface. Try running a re-discovery for the device.

In my client's case, the network devices in question did not respond to the information requests of certain workflows.

I resolved the issue by creating a group and adding the ports which were not returning data to that group. Then I checked each of the event log errors for the name of the workflow that was executing the SNMP GET. See the example below:

Log Name:      Operations Manager
Source:       Health Service Modules
Event ID:      11009
Error in SNMP GET response from IP Address: 10.11.11.1, Status: noSuchInstance(129).
One or more workflows were affected by this.
OID: .1.3.6.1.2.1.10.7.2.1.2.268
Workflow name: System.NetworkManagement.MIB2.NetworkAdapter.InputPacketBroadcastPct
Instance name: Port-37

There is a great reference at http://mpdb.azurewebsites.net which you can use to match the workflow name to the corresponding rule or monitor. Using the above example and searching the page for System.NetworkManagement.MIB2.NetworkAdapter.InputPacketBroadcastPct, I can see it correlates to a rule called Input Broadcast Packets Percentage (netcor), which I can disable with an override targeted at the group I created earlier.
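When there are many of these events, pulling the workflow and instance names out by hand gets tedious. The following is a minimal Python sketch, not part of the original procedure, that assumes you have copied the event 11009 descriptions out as plain text in the layout shown in the sample event. The function names `parse_event` and `group_instances_by_workflow` are made up for illustration.

```python
import re
from collections import defaultdict

# Sample text copied from the event 11009 description shown in this post.
SAMPLE_EVENT = """\
Error in SNMP GET response from IP Address: 10.11.11.1, Status: noSuchInstance(129).
One or more workflows were affected by this.
OID: .1.3.6.1.2.1.10.7.2.1.2.268
Workflow name: System.NetworkManagement.MIB2.NetworkAdapter.InputPacketBroadcastPct
Instance name: Port-37
"""

# Matches the three labelled lines at the end of the event description.
FIELD_PATTERN = re.compile(
    r"^(OID|Workflow name|Instance name):\s*(.+?)\s*$", re.MULTILINE
)


def parse_event(description):
    """Return the labelled fields (OID, workflow, instance) of one event."""
    return {label: value for label, value in FIELD_PATTERN.findall(description)}


def group_instances_by_workflow(descriptions):
    """Map each workflow name to the sorted list of instances it failed on."""
    grouped = defaultdict(set)
    for description in descriptions:
        fields = parse_event(description)
        if "Workflow name" in fields and "Instance name" in fields:
            grouped[fields["Workflow name"]].add(fields["Instance name"])
    return {workflow: sorted(names) for workflow, names in grouped.items()}


if __name__ == "__main__":
    for workflow, names in group_instances_by_workflow([SAMPLE_EVENT]).items():
        print(workflow, "->", ", ".join(names))
```

The grouped result gives you one list of ports per workflow, which lines up with the approach of one override per rule against a group of ports.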

It can be a lengthy process and there are other causes which are easier to address so you will need to do some testing in each individual case.

 

Cross post: Quick note on an issue you might encounter after installing Microsoft Security Update 3004375

A potential issue has been highlighted on the Microsoft Operations Manager Engineering Blog.

“We’ve seen an issue relating to 3004375 that occurs due to a regression, and while it’s already been fixed (by installing 3023562), we wanted to take a minute and let you know about some of the details in case you happen to see it. ”

Full details are available here

Short version: If you have KB3004375 installed on your management servers, you may need to install KB3023562.

 

SCOM: Dashboards are blank when opened in a particular console

It’s always the bugs you come across when you are trying to do something else that can be the most frustrating. Today while creating some dashboards for one of my clients the views returned blank after saving them.

I then had a look at the built-in SQL dashboards to see how widespread this issue was and had the same results. The dashboard pane was blank and the tasks also did not display.


First I tried restarting the console with the /clearcache switch, which usually addresses display issues, but had the same result.

I then tried a console on a different server and also tested the web console. Both loaded the dashboards properly, so I was dealing with a server issue or a profile issue.

A bit of research suggested that this problem can sometimes be caused by a corrupt .NET installation. As addressing that requires change approval, I first tested the profile option by logging on to the server where I experienced the issue with a different user account. This time the dashboards loaded correctly, so it was probably not a .NET issue.

Considering that /clearcache didn’t work, I first wanted to try removing the momcache.mdb cache file located in C:\Users\<user account>\AppData\Local\Microsoft\Microsoft.EnterpriseManagement.Monitoring.Console, in order to test all options before recreating the user profile.

After deleting momcache.mdb and re-opening the console all the dashboards now loaded correctly.
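If you need to repeat this for several profiles, the step can be scripted. The following is a minimal Python sketch under a few assumptions: the console is closed for that user, the cache lives in the folder named in this post, and you prefer to rename the file to momcache.mdb.bak rather than delete it outright (a cautious variation, not what I did above). The function name `set_aside_console_cache` is made up for illustration.

```python
import tempfile
from pathlib import Path

# Folder and file name as given in the post, relative to the user profile.
CACHE_SUBDIR = Path(
    "AppData/Local/Microsoft/Microsoft.EnterpriseManagement.Monitoring.Console"
)
CACHE_FILE = "momcache.mdb"


def set_aside_console_cache(profile_dir):
    """Rename momcache.mdb to momcache.mdb.bak under the given profile folder.

    Returns the backup path, or None if no cache file was found.
    Assumption: the Operations console is closed, otherwise the file is locked.
    """
    cache = Path(profile_dir) / CACHE_SUBDIR / CACHE_FILE
    if not cache.is_file():
        return None
    backup = cache.with_name(CACHE_FILE + ".bak")
    # replace() overwrites an older backup if one is already present.
    cache.replace(backup)
    return backup


if __name__ == "__main__":
    # Demonstration against a throwaway folder instead of a real profile.
    with tempfile.TemporaryDirectory() as fake_profile:
        folder = Path(fake_profile) / CACHE_SUBDIR
        folder.mkdir(parents=True)
        (folder / CACHE_FILE).write_bytes(b"stale cache")
        print(set_aside_console_cache(fake_profile))
```

Against a real profile you would pass the profile folder, for example the C:\Users\<user account> path from above, and the console should build a fresh cache the next time it is opened.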

SCOM: Unable to view overrides in Authoring pane when no scope is selected.

I encountered an odd issue today at one of my customers that recently completed a migration to SCOM 2012 R2. When trying to open the Overrides node with no scope selected, I got the following error: “An object of class ManagementPackClass with ID <GUID Removed> was not found”. The view then shows 0 overrides until the console is reopened with /clearcache.

It turns out this error is caused by an override that references a class that no longer exists. To track it down you first need access to the overrides, which you can no longer see in the console, so do the following:

1. Export your overrides to a text file using PowerShell by running Get-SCOMOverride | Out-File c:\SCOMOverrides.txt
2. Search the output text file for the GUID which was in the original error “An object of class ManagementPackClass with ID <GUID Removed> was not found”.
3. The GUID will appear on a line labeled Context. Below that line, look for a line labeled Identifier. In my case it looked as follows: “Identifier: 1|Windows.Operating.System.Custom.Monitors”
4. Generally this is enough information to identify which management pack needs to be deleted. If it is not, continue to step 5.
5. In SQL, run the following query against your OperationsManager database, remembering to edit it for your Identifier: select * from ManagementPack where ManagementPackSystemName like 'Windows.Operating.System.Custom.Monitors'
6. Once you have identified which management pack is causing your issue, it needs to be deleted. Once the new configuration is processed you will be able to see your overrides again.

Solution provided by Microsoft on the TechNet forum.
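Steps 2 and 3 can also be scripted rather than done by eye. The following is a minimal Python sketch that assumes the export contains “Context” and “Identifier” lines in a “label : value” layout, as described in step 3. The exact spacing in your file may differ, and the GUIDs and the second pack name in the sample are placeholders, not real values. The function names are made up for illustration.

```python
import re

# Assumed layout: "Context : <guid>" followed later by "Identifier : <value>".
CONTEXT_RE = re.compile(r"^\s*Context\s*:\s*(.*)$", re.IGNORECASE)
IDENTIFIER_RE = re.compile(r"^\s*Identifier\s*:\s*(.*?)\s*$", re.IGNORECASE)


def find_identifiers(lines, guid):
    """Return the Identifier values that follow a Context line containing guid."""
    found = []
    awaiting_identifier = False
    wanted = guid.lower()
    for line in lines:
        context = CONTEXT_RE.match(line)
        if context:
            # Each Context line starts a new override record.
            awaiting_identifier = wanted in context.group(1).lower()
            continue
        identifier = IDENTIFIER_RE.match(line)
        if identifier and awaiting_identifier:
            found.append(identifier.group(1))
            awaiting_identifier = False
    return found


def management_pack_name(identifier):
    """Return the part after '|' in a value such as '1|Some.Management.Pack'."""
    return identifier.split("|", 1)[-1]


if __name__ == "__main__":
    # Placeholder data in the assumed layout; GUIDs are not real.
    sample = [
        "Context    : 11111111-2222-3333-4444-555555555555\n",
        "Enforced   : False\n",
        "Identifier : 1|Windows.Operating.System.Custom.Monitors\n",
        "Context    : aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee\n",
        "Identifier : 1|Some.Other.Pack\n",
    ]
    for value in find_identifiers(sample, "11111111-2222-3333-4444-555555555555"):
        print(management_pack_name(value))
```

To run it against the real export, read c:\SCOMOverrides.txt into a list of lines first. Windows PowerShell's Out-File writes UTF-16 by default, so opening the file with encoding="utf-16" is likely needed unless you exported with a different encoding.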

SCOM: A System Center service may not start after applying the update in KB2677070

An issue to keep an eye out for in your SCOM and SCSM environments.

This KB article contains 5 potential workarounds if you find yourself experiencing the below problem:

After applying the following update, the System Center Data Access service or the System Center Management Configuration service may fail to start with a time-out error.

2677070 – An automatic updater of revoked certificates is available for Windows Vista, Windows Server 2008, Windows 7, and Windows Server 2008 R2 (https://support.microsoft.com/en-us/kb/2677070/)

Also, after opening the Service Manager console the following error may be displayed:

Reporting Data Warehouse management Server is currently unavailable. You will be unable to view reports or administer the Data Warehouse until this server is available. Please contact your system administrator. After the server becomes available please close your console and re-open to connect.

SCOM: Cluster maintenance mode, resource groups and you

One of my colleagues brought an issue to my attention whereby placing a cluster server into maintenance mode was causing a flood of “Cluster resource or group offline or partially online” alerts from that cluster's resource groups.

It turns out that the default maintenance mode setting for the Resource Group dependency rollup monitor is “Rollup monitor in maintenance mode as error”. This means that any objects rolling up as maintenance mode will cause the resource group monitor to go critical and generate “Cluster resource or group offline or partially online” alerts.

Changing the value to maintenance mode alleviates this issue. I’ve also had success placing the resource groups into maintenance mode before the cluster objects.


SCOM: Updated Exchange 2010 MP version 14.3.210.2

A new version of the Exchange 2010 MP was released during this month's wave of updates. The new version is 14.3.210.2 and it is available for download here. It addresses a particular issue related to Exchange 2010 servers running PowerShell 2.0 side by side with PowerShell 3.0+.

Updated in this version:

  • Added a new MSI (Exchange2010PowershellFix) that should be used if the management pack doesn’t work on an Exchange 2010 server that has Powershell 2.0/3.0+ installed side by side. Please refer to the “Changes included in 14.3.210.2 (PS 3.0+ Update)” section for more information.

The Exchange2010PowershellFix.msi is also available for download at the same location here.

Extract from the guide:

Exchange 2010 MP versions 14.03.0038.004 and earlier required ONLY Powershell 2.0 to be installed on the Exchange server for it to work. When Powershell 3.0 or higher is installed on Exchange 2010 servers that were working with only Powershell 2.0 installed, Exchange MP stops working.

The new MSI (Exchange2010PowershellFix.MSI) that has been included in this release enables Exchange 2010 MP to work on servers that have Powershell 2.0 installed side by side with Powershell 3.0+. This new MSI should be used only if your existing Exchange 2010 MP isn’t able to monitor your Exchange 2010 server that has Powershell 2.0 and Powershell 3.0+ installed side by side. This will NOT work if the server has only Powershell 3.0 or higher installed.

 If you already have 14.03.0038.004 installed, please execute the 14.3.210.2 “Exchange2010PowershellFix” MSI and import the MP’s contained within it.

 Note:  You must install the 14.03.0038.004 package (Exchange2010ManagementPackForOpsMgr2007-x64.msi/ Exchange2010ManagementPackForOpsMgr2007-x86.msi) prior to applying the 14.3.210.2 update (Exchange2010PowershellFix) in order for the Exchange 2010 MP to function correctly.