Category Archives: Troubleshooting

SCOM: Agents not submitting performance data

Sometimes all of your agents show as healthy in the console, but when you try to run a performance report, data is missing.

The SQL query below, developed by my colleague Gerrie Louw, identifies any agent that has not submitted performance data in the past 4 hours. It does so by checking the following performance counters:

Processor > % Processor Time
LogicalDisk > % Free Space > C:
Memory > Available MBytes

Note: You will probably have to change the DisplayName_ and IsVirtualNode_ column names (including their GUID suffixes) to match your OperationsManager database.

-- Windows computers known to SCOM, excluding virtual nodes
if object_id('tempdb..#temptable') IS NOT NULL
DROP TABLE #temptable

SELECT     distinct bmetarget.Name into #temptable
FROM        OperationsManager.dbo.BaseManagedEntity AS BMESource WITH (nolock) INNER JOIN
OperationsManager.dbo.Relationship AS R WITH (nolock) ON
R.SourceEntityId = BMESource.BaseManagedEntityId INNER JOIN
OperationsManager.dbo.BaseManagedEntity AS BMETarget WITH (nolock) ON
R.TargetEntityId = BMETarget.BaseManagedEntityId inner join mtv_computer d on bmetarget.name=d.[DisplayName_55270A70_AC47_C853_C617_236B0CFF9B4C]
and d.IsVirtualNode_E817D034_02E8_294C_3509_01CA25481689 is null
WHERE     (bmetarget.fullname like 'Microsoft.Windows.Computer%')

-- Managed, availability and health state for every agent
if object_id('tempdb..#healthstate') IS NOT NULL
DROP TABLE #healthstate

select  megv.path, megv.ismanaged, megv.isavailable, megv.healthstate into #healthstate
from managedentitygenericview as megv with (nolock) inner join managedtypeview as mtv with (nolock)
on megv.monitoringclassid=mtv.id
where mtv.name ='microsoft.systemcenter.agent'

-- Paths that reported a CPU sample in the last 240 minutes (4 hours)
if object_id('tempdb..#perfcpudata') IS NOT NULL
DROP TABLE #perfcpudata

select Path, 'CPU' as 'Cat' into #perfcpudata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='Processor' and countername='% Processor Time'

-- Paths that reported a memory sample in the last 240 minutes
if object_id('tempdb..#perfmemdata') IS NOT NULL
DROP TABLE #perfmemdata

select Path,'Memory' as 'Cat' into #perfmemdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='Memory' and countername='Available MBytes'

-- Paths that reported a C: free-space sample in the last 240 minutes
if object_id('tempdb..#perfdiskdata') IS NOT NULL
DROP TABLE #perfdiskdata

select Path,'Disk' as 'Cat' into #perfdiskdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='LogicalDisk' and countername='% Free Space' and instancename='C:'

-- One row per computer and counter category that has no recent sample
if object_id('tempdb..#temptable1') IS NOT NULL
DROP TABLE #temptable1
create table #temptable1 (
name nvarchar(250),
cat nvarchar(20),
val nvarchar(2)
)
insert into #temptable1
select name, 'CPU' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfcpudata)

insert into #temptable1
select name, 'Memory' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfmemdata)

insert into #temptable1
select name, 'Disk' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfdiskdata)

-- Final result: one row per available, managed agent; 1 marks a missing counter
if object_id('tempdb..#output') IS NOT NULL
DROP TABLE #output
create table #output (
name nvarchar(250),
cpu nvarchar(2),
memory nvarchar(2),
disk nvarchar(2)
)

insert into #output
select distinct tt.name ,'0','0','0'
from #temptable1 as tt, #healthstate as hs
where tt.name=hs.path collate SQL_Latin1_General_CP1_CI_AS
and hs.isavailable=1
and hs.ismanaged=1
and hs.healthstate is not null

update #output set cpu=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='CPU')
update #output set memory=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='Memory')
update #output set disk=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='Disk')

select * from #output
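
The core of the query is a set difference: for each counter, take the computers that should be reporting and subtract those that did report inside the window. The following is only a rough Python sketch of that logic, not part of the original query. The function name, the host names and the sample data are all made up for illustration, and unlike the case-insensitive collation used in the SQL, the comparison here is case-sensitive.

```python
def missing_counters(agents, samples_by_counter):
    """Return {agent: {counter: 0 or 1}} for agents missing at least one counter.

    agents: names of the agents that are managed and available.
    samples_by_counter: counter label -> set of agent names that reported
    that counter inside the time window.
    A flag of 1 means "no data received", mirroring the #output table.
    """
    report = {}
    for agent in sorted(agents):
        flags = {
            counter: 0 if agent in reported else 1
            for counter, reported in samples_by_counter.items()
        }
        # Agents that reported every counter are left out, as in the SQL.
        if any(flags.values()):
            report[agent] = flags
    return report


# Hypothetical sample data, not taken from a real management group.
agents = {"web01.contoso.local", "sql01.contoso.local", "dc01.contoso.local"}
samples = {
    "cpu": {"web01.contoso.local", "sql01.contoso.local", "dc01.contoso.local"},
    "memory": {"web01.contoso.local", "dc01.contoso.local"},
    "disk": {"web01.contoso.local"},
}

result = missing_counters(agents, samples)
for name, flags in result.items():
    print(name, flags)
```

In this made-up data set, web01 reported all three counters and so does not appear in the result, while sql01 and dc01 are listed with a 1 against each counter they failed to submit.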

You can use this query to build a report such as the sample below:

[Image: No Perf Report]

SCOM: Top Support Solutions for System Center 2012 Operations Manager

Here is a fantastic TechNet blog with the top Microsoft Support solutions for the most common issues experienced when using System Center 2012 Operations Manager (updated quarterly).

Definitely one to look through occasionally to see which issues are most commonly being experienced with SCOM.

http://blogs.technet.com/b/topsupportsolutions/archive/2014/02/04/top-support-solutions-for-system-center-2012-operations-manager.aspx

There are also similar support pages for other Microsoft products: http://blogs.technet.com/b/topsupportsolutions/

SCOM: Bug with SQL MP 6.4.1.0

With version 6.4.1.0 of the SQL management pack, the SQL 2012 DB Engine group does not contain all SQL 2012 servers. This is because the group is populated based on a SQL registry key, looking for a version value of 11.0.xxxx.x; however, when SQL 2012 is updated to SP1 the version changes to 11.1.xxx.x.

Kevin Holman has written a nice blog entry about this particular issue. He also provides an addendum management pack that contains a new group population discovery set to “11.*”, along with an override to disable the built-in group; it is available for download at the bottom of his article.
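
To see why the narrower pattern drops SP1 servers, here is a small Python sketch using shell-style wildcards. It is only an illustration: the real discovery reads a registry value and may match it differently, the `in_group` helper is hypothetical, and the two version strings are example values chosen to fit the 11.0 and 11.1 forms described above.

```python
from fnmatch import fnmatchcase


def in_group(version, pattern):
    """Return True if the version string matches the wildcard pattern."""
    return fnmatchcase(version, pattern)


# Example version strings (assumed for illustration only).
rtm_version = "11.0.2100.60"
sp1_version = "11.1.3000.0"

for pattern in ("11.0.*", "11.*"):
    print(pattern, in_group(rtm_version, pattern), in_group(sp1_version, pattern))
```

With "11.0.*" only the first version matches, so the SP1 server falls out of the group; with "11.*" both match.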

SCOM 2012: Agent on Windows 2012 R2 servers can stop responding

There is a known issue where SCOM 2012 agents stop responding on Windows 2012 R2 domain controllers; it can affect other Windows 2012 R2 servers as well. Kevin Holman has posted an article with the resolution to this issue:

“This is caused by an issue in the Server OS (Windows Server 2012 R2), which is outlined at http://support.microsoft.com/kb/2923126
There is a hotfix, which addresses the issue, which is included in the Feb 2014 update rollup hotfix: http://support.microsoft.com/kb/2919394”

SCOM: Upgrade to Operations Manager 2012 R2 may result in Data Warehouse synchronization failures

Brian McDermott highlighted an issue to watch out for when upgrading to SCOM 2012 R2 where you may get Data Warehouse synchronization failure errors after the upgrade.

His article gives solid reasoning as to the cause and the solution.

Please note that Event ID 31565, shown below, is a very generic error; you should only run the SQL below if the description identifies the TfsWorkItemId column as the problem.

The error:

Log Name:      Operations Manager
Source:        Health Service Modules
Date:
Event ID:      31565
Task Category: Data Warehouse
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      OMMS.domain.com
Description:
Failed to deploy Data Warehouse component. The operation will be retried.
Exception 'DeploymentException': Failed to perform Data Warehouse component deployment operation: Install; Component: DataSet, Id: '0d698dff-9b7e-24d1-8a74-4657b86a59f8', Management Pack Version-dependent Id: '29a3dd22-8645-bae5-e255-9b56bf0b12a8'; Target: DataSet, Id: '23ee52b1-51fb-469b-ab18-e6b4be37ab35'. Batch ordinal: 3; Exception: Sql execution failed. Error 207, Level 16, State 1, Procedure vAlertDetail, Line 18, Message: Invalid column name 'TfsWorkItemId'.

This issue can be fixed with the SQL query below. As always, BACKUP your databases and proceed at your own risk:
USE OperationsManagerDW
 
DECLARE @GuidString NVARCHAR(50)
SELECT @GuidString = DatasetId FROM StandardDataset
WHERE SchemaName = 'Alert'
 
-- update all tables that were already created
DECLARE
   @StandardDatasetTableMapRowId int
  ,@Statement nvarchar(max)
  ,@SchemaName sysname
  ,@TableNameSuffix sysname
  ,@BaseTableName sysname
  ,@FullTableName sysname
 
SET @StandardDatasetTableMapRowId = 0
 
WHILE EXISTS (SELECT *
              FROM StandardDatasetTableMap tm
              WHERE (tm.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId)
                AND (tm.DatasetId = @GuidString)
             )
BEGIN
  SELECT TOP 1
     @StandardDatasetTableMapRowId = tm.StandardDatasetTableMapRowId
    ,@SchemaName = sd.SchemaName
    ,@TableNameSuffix = tm.TableNameSuffix
    ,@BaseTableName = sdas.BaseTableName
  FROM StandardDatasetTableMap tm
          JOIN StandardDataset sd ON (tm.DatasetId = sd.DatasetId)
          JOIN StandardDatasetAggregationStorage sdas ON (sdas.DatasetId = tm.DatasetId) AND (sdas.AggregationTypeId = tm.AggregationTypeId)
  WHERE (tm.StandardDatasetTableMapRowId > @StandardDatasetTableMapRowId)
    AND (tm.DatasetId = @GUIDString)
    AND (sdas.TableTag = 'detail')
    AND (sdas.DependentTableInd = 1)
  ORDER BY tm.StandardDatasetTableMapRowId
 
  SET @FullTableName = @BaseTableName + '_' + @TableNameSuffix
 
  IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @FullTableName AND TABLE_SCHEMA = @SchemaName
    AND COLUMN_NAME = N'TfsWorkItemId')
  BEGIN
    SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemId nvarchar(256) NULL'
    EXECUTE (@Statement)
  END
 
  IF NOT EXISTS (SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = @FullTableName AND TABLE_SCHEMA = @SchemaName
    AND COLUMN_NAME = N'TfsWorkItemOwner')
  BEGIN
    SET @Statement = 'ALTER TABLE ' + QUOTENAME(@SchemaName) + '.' + QUOTENAME(@FullTableName) + ' ADD TfsWorkItemOwner nvarchar(256) NULL'
    EXECUTE (@Statement)
  END
END
 
-- alter cover views
EXEC StandardDatasetBuildCoverView @GUIDString, 0
GO

 

SCOM: When updating the IBM Storage Management Pack to 2.1.0

When updating the IBM Storage Management Pack to 2.1.0 there are a few things to be aware of, most of which are covered in the documentation.

After completing the installation, running the upgrade configuration, removing the old management packs and importing the new ones, we still weren’t able to re-discover the IBM SAN.

The documentation recommends, in the configuration section, that “The IBM storage configuration must be synchronized with Management Server manually if there is storage configuration left after upgrading from previous version to version 2.1.0. The IBM storage configuration also should be synchronized with the Management Server manually after the management pack is deleted and re-imported.”

Except that, when we ran it, the synchronization command skipped all of our SANs.

Checking the SCOM configuration revealed that something which shouldn’t have happened had happened: the SCOM configuration had been lost during the upgrade.

Using the --sc-set command to redo the configuration was successful, which allowed the migration to complete, and in short order the SANs were discovered and being monitored.

SCOM 2012: The System Center Management service stops responding after an instance of SQL Server goes offline

Update: this issue also applies to SCOM 2012 R2, as confirmed in an article by Kevin Holman.

Microsoft released a KB article on the 5th of December on how to deal with a particular issue in SCOM 2012 SP1. It may be worth applying this fix preemptively if you are still running SP1, in order to avoid unnecessary downtime.

“After an instance of Microsoft SQL Server that hosts the OperationsManager database goes offline, the System Center Management service of the Microsoft System Center 2012 Operations Manager Service Pack 1 (SP1) management server stops responding.

For example, the System Center Management service stops responding after the instance of SQL Server disconnects, restarts, or fails. To recover from this issue after the instance of SQL Server is available again, you must restart the System Center Management service.”

KB Article

NB: As always, back up your registry before making any changes.

To resolve this issue, you can enable the automatic recovery feature in System Center 2012 Operations Manager SP1. By default, this automatic recovery feature is disabled. 

To enable the automatic recovery feature on the management server, follow these steps:

  1. Start Registry Editor.
  2. Locate and then click the following registry subkey:
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\DAL
  3. Create the following two registry entries:
    • DALInitiateClearPool
      Type: DWORD
      Decimal value: 1
    • DALInitiateClearPoolSeconds
      Type: DWORD
      Decimal value: 60
    Note: The DALInitiateClearPoolSeconds setting controls when the management server drops the current connection pool and when the management server tries to reestablish an SQL connection. We recommend that you set this setting to 60 seconds or more to avoid performance issues.
  4. Restart the System Center Management service on the management server.
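
If you need to apply the same two values on several management servers, one option is to generate a .reg file rather than clicking through Registry Editor each time. The Python sketch below only builds the text of such a file from the key and values listed in the steps above; the `build_reg_file` helper is my own illustration, not part of the KB article, and saving the file, importing it and restarting the service are left to you.

```python
KEY_PATH = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\System Center\2010\Common\DAL"


def build_reg_file(key_path, dword_values):
    """Build .reg file text that sets DWORD values under one registry key.

    dword_values maps value names to non-negative integers; each one is
    written as an 8-digit hexadecimal dword, which is the .reg file syntax.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    for name, value in dword_values.items():
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\r\n".join(lines) + "\r\n"


reg_text = build_reg_file(
    KEY_PATH,
    {
        "DALInitiateClearPool": 1,          # enable automatic recovery
        "DALInitiateClearPoolSeconds": 60,  # decimal 60 is hex 3c
    },
)
print(reg_text)
```

Note that the .reg format stores DWORDs in hexadecimal, so the decimal value 60 from the steps above appears as 0000003c in the generated text.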

SCOM: Windows 2008 also running monitors for Windows 2003, Orphaned class

Now here’s something you certainly don’t see every day. It all started when I was asked to investigate a flood of memory alerts for a particular server at one of my customers. When I opened Health Explorer I noticed the following:

The server was running the monitors for Windows 2008 and Windows 2003.

As it turns out, the server had recently been re-installed from 2003 to 2008 with the same name, without the agent being uninstalled or removed from the console. This caused a bit of confusion in the back end. A quick look at the Windows Server 2003 Operating System inventory showed another server which had been “upgraded” in the same fashion.

What’s happened here is that the class for Windows Server 2003 Operating System is still being loaded by the agent, and this is causing all of the related rules and monitors to load as well. In the past when I’ve come across this particular issue I’ve been able to solve it with the remove-disabledmonitoringobject PowerShell cmdlet.
All that you need to do is override the discovery rule in question to false for your object (in this case “Discover Windows Server 2003 Operating System”) and then open OpsMgr Shell and run remove-disabledmonitoringobject. After a short delay the offending objects are removed.

However, in this case the above did not work. Eventually I deleted the agent from the console, waited for grooming to run (you can force it if you are in a hurry), cleared the local agent cache and then approved the agent. Now only the correct objects are being discovered.

SCOM / SCVMM: PRO group names contain Chinese characters in System Center Operations Manager

If you have integrated SCOM 2012 and SCVMM 2012 and are using the PRO management packs, you may have noticed some groups containing Chinese characters.

In an English-language version of Microsoft System Center 2012, you connect Virtual Machine Manager (VMM) to Operations Manager. However, group names in the Performance and Resource Optimization (PRO) management packs contain Chinese characters.

Cause: This behavior occurs because the LanguageCode option for the management group is not set in the Operations Manager database. Therefore, when a management pack contains multiple languages, its display names appear in the last language that is included.

Microsoft has published a KB article with the solution to this issue.
