Tag Archives: #Troubleshooting

SCOM: Agent error Keyset does not exist

An issue to be aware of when you package your SCOM agent with your server build image is that when the server is built a certificate is generated for the agent to use, this certificate resides in the Operation Manager Certificate Store. If the server is then renamed due to it having a temporary build name you will see the below error in your Operations Manager event log.

Event: 7022
Source: HealthService

The Health Service has downloaded secure configuration for management group <MG Name>, and processing the configuration failed with error code Keyset does not exist(0x80090016).

Re-installing the agent will fix this issue but there is a simpler solution by Gerrie Louw, open your certificate MMC, navigate to the Operation Manager Store and delete the certificate, then restart your Healthservice.

The symptoms can occur with all versions of the SCOM / MMA agent under the agent packaged with a server image scenario.

Loading

SCOM: Guided walkthrough for troubleshooting UNIX and Linux agent discovery

Microsoft has just released a Guided Walkthrough for troubleshooting UNIX and Linux agent discovery issues in System Center 2012 Operations Manager.

It is available here:KB2993901

It’s nice to see this type of guide being released, cross-platform monitoring with SCOM is still not very common and it can be tricky to get right.

Loading

SCOM: Caution with management packs in large environments

Kevin Holman recently published a great article about the inherent pitfalls of importing a management pack into an environment without understanding the intended scope, scalability, and any known/common issues.

Specifically he discusses the Dell  Hardware Management Pack (Detailed Edition) which has a small scalability limitation of 300 agents.

The lesson to learn here is – be careful when importing MP’s.  A badly written MP, or an MP designed for small environments, might wreak havoc in larger ones.  Sometimes the recovery from this can be long and quite painful.   An MP that tests out fine in your Dev SCOM environment might have issues that wont be seen until it moves into production.  You should always monitor for changes to a production SCOM deployment after a new MP is brought in, to ensure that you don’t see a negative impact.  Check the management server event logs, MS CPU performance, database size, and disk/CPU performance to see if there is a big change from your established baselines.

 

Go here for the full article, definitely worth the read.

Loading

SCOM *nix Monitoring: The WinRM client cannot process the request because the server name cannot be resolved

All Linux monitored servers in a critical state is not an ideal way to start a Monday morning. Especially when none of the servers are actually experiencing an issue.

The issue at hand:
All of the Linux servers generated a heartbeat failure at the same time. Looking through the health explorer revealed the following error:

 The WinRM client cannot process the request because the server name cannot be resolved.

Testing WinRM with the following command also yielded the same result, and testing with DNS resolved the server name successfully.

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:username -password:password -remote:https://servername:1270/wsman -auth:basic -skipCACheck -encoding:utf-8 -format:#pretty

The Solution:

WinRM uses the windows proxy to resolve host names, I checked the windows proxy settings on the Management Server using the following command.

netsh winhttp show proxy

and discovered that my proxy was set correctly but the bypass list for excluded servers had been replaced with a single server, using the below command I was able to amend the bypass list to include all of the local domain servers.

netsh winhttp set proxy proxy-server=”http=<proxy FQDN” bypass-list=”*<Domain Suffix>”

One that was completed the WinRM test returned the correct data and the servers started to turn green again.

 

Loading

SCOM 2012: Survival Guide

I came across a great article on TechNet which is essentially a compilation of useful SCOM information, it includes everything from the basics and key concepts, to deployment guidelines and information on how to configure and use different features.

This is definitely one to keep in your favorites

Go here for the full article

Loading

SCOM: Some or all Cluster resources are not discovered by the OpsMgr agent

Here is an article from the OpsMgr Engineering Blog detailing an issue with SCOM discovering Ckuster Resources

“If a Cluster has orphaned object entries in ClusterHive registry key, the System Center Operations Manager agent may not discover some or all Cluster resources. This can occur with System Center Operations Manager 2007 (OpsMgr 2007) or System Center 2012 Operations Manager (OpsMgr 2012).”

 

The article details a fix which envolves locating and then removing the orphaned objects:

“First, use the Failover Cluster PowerShell commands Get-ClusterResource and Get-ClusterGroup to get the list of Resources and Groups. Then, using the output, check for Resources/Groups that appear as Offline and verify if these can be seen in the Failover Cluster Console. Verify with the Cluster administrator whether these are still valid, then assuming they are not and you’ve identified which ones that are orphaned, delete them using these commands from an elevated CMD Prompt (Run As Administrator):

For orphaned resources: Cluster RES “<RESOURCE_NAME>” /DELETE

For orphaned groups: Cluster GROUP “<GROUP_NAME>” /DELETE

Once this is done the missing cluster resources should now be discovered.”

Loading

SCOM: Troubleshooting Flow For Slow SCOM 2012x Consoles

A while ago I had an issue at a customer where their SCOM console would experience huge performance degradation at random intervals.

This is a slow and complex situation to troubleshoot which is why I am glad to see a comprehensive and well laid out Troubleshooting plan from Marnix Wolf.  This is a must read to better understand the areas that can impact your SCOM environment performance and more importantly your user experience, as people don’t want to use a console that’s slow.

Click here for the full article.

Loading

SCOM: Agents not submitting performance data

Sometimes you might have a situation where all of your agents are showing as healthy in the console but when you try and draw a performance report data is missing.

The below SQL query which has been developed by my colleague Gerrie Louw will identify any agent that has not submitted performance data in the past 4 hours. It does so by checking the following performance counters:

Processor > % Processor Time
LogicalDisk > % Free Space > C:
Memory > Available MBytes

Note: You will probably have to change the DisplayName_ and IsVirtualNode for your OperationsManager database.

if object_id(‘tempdb..#temptable’) IS NOT NULL
DROP TABLE #temptable

SELECT     distinct bmetarget.Name into #temptable
FROM        OperationsManager.dbo.BaseManagedEntity AS BMESource WITH (nolock) INNER JOIN
OperationsManager.dbo.Relationship AS R WITH (nolock) ON
R.SourceEntityId = BMESource.BaseManagedEntityId INNER JOIN
OperationsManager.dbo.BaseManagedEntity AS BMETarget WITH (nolock) ON
R.TargetEntityId = BMETarget.BaseManagedEntityId inner join mtv_computer d on bmetarget.name=d.[DisplayName_55270A70_AC47_C853_C617_236B0CFF9B4C]
and d.IsVirtualNode_E817D034_02E8_294C_3509_01CA25481689 is null
WHERE     (bmetarget.fullname like ‘Microsoft.Windows.Computer%’)

if object_id(‘tempdb..#healthstate’) IS NOT NULL
DROP TABLE #healthstate

select  megv.path, megv.ismanaged, megv.isavailable, megv.healthstate into #healthstate
from managedentitygenericview as megv with (nolock) inner join managedtypeview as mtv with (nolock)
on megv.monitoringclassid=mtv.id
where mtv.name =’microsoft.systemcenter.agent’

if object_id(‘tempdb..#perfcpudata’) IS NOT NULL
DROP TABLE #perfcpudata

select Path, ‘CPU’ as ‘Cat’ into #perfcpudata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname =’Processor’ and countername=’% Processor Time’

if object_id(‘tempdb..#perfmemdata’) IS NOT NULL
DROP TABLE #perfmemdata

select Path,’Memory’ as ‘Cat’ into #perfmemdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname =’Memory’ and countername=’Available MBytes’

if object_id(‘tempdb..#perfdiskdata’) IS NOT NULL
DROP TABLE #perfdiskdata

select Path,’Disk’ as ‘Cat’ into #perfdiskdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname =’LogicalDisk’ and countername=’% Free Space’ and instancename=’C:’

if object_id(‘tempdb..#temptable1′) IS NOT NULL
DROP TABLE #temptable1
create table #temptable1 (
name nvarchar(250),
cat nvarchar(20),
val nvarchar(2)
)
insert into #temptable1
select name, ‘CPU’ as ‘cat’, ’1′ as ‘val’
from #temptable where name not in
(select path from #perfcpudata)

insert into #temptable1
select name, ‘Memory’ as ‘cat’, ’1′ as ‘val’
from #temptable where name not in
(select path from #perfmemdata)

insert into #temptable1
select name, ‘Disk’ as ‘cat’, ’1′ as ‘val’
from #temptable where name not in
(select path from #perfdiskdata)

if object_id(‘tempdb..#output’) IS NOT NULL
DROP TABLE #output
create table #output (
name nvarchar(250),
cpu nvarchar(2),
memory nvarchar(2),
disk nvarchar(2)
)

insert into #output
select distinct tt.name ,’0′,’0′,’0′
from #temptable1 as tt, #healthstate as hs
where tt.name=hs.path collate SQL_Latin1_General_CP1_CI_AS
and hs.isavailable=1
and hs.ismanaged=1
and hs.healthstate is not null

update #output set cpu=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat=’CPU’)
update #output set memory=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat=’Memory’)
update #output set disk=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat=’Disk’)

select * from #output

You can use this query to build a report such as the one sampled below:

No Perf Report edited

Loading

SCOM: Top Support Solutions for System Center 2012 Operations Manager

Here is a fantastic technet blog with the top Microsoft Support solutions for the most common issues experienced when using System Center 2012 Operations Manager (updated quarterly).

Definitely one to have a look through occasional to see what the top issues are, that are being experienced with SCOM.

http://blogs.technet.com/b/topsupportsolutions/archive/2014/02/04/top-support-solutions-for-system-center-2012-operations-manager.aspx

There are also similar support pages for other Microsoft products, http://blogs.technet.com/b/topsupportsolutions/

On a side note if you find this blog useful I encourage you to follow me on Twitter  and / or add me on LinkedIn

Loading