Category Archives: Troubleshooting

SCOM: Updated Exchange 2010 MP version 14.3.210.2

A new version of the Exchange 2010 MP, 14.3.210.2, was released in this month's wave of updates and is available for download here. It addresses an issue affecting Exchange 2010 servers that run PowerShell 2.0 side by side with PowerShell 3.0+.

Updated in this version:

  • Added a new MSI (Exchange2010PowershellFix) that should be used if the management pack doesn't work on an Exchange 2010 server that has PowerShell 2.0 and 3.0+ installed side by side. Please refer to the "Changes included in 14.3.210.2 (PS 3.0+ Update)" section for more information.

The Exchange2010PowershellFix.msi is also available for download at the same location here.

Extract from the guide:

Exchange 2010 MP versions 14.03.0038.004 and earlier required ONLY PowerShell 2.0 to be installed on the Exchange server to work. When PowerShell 3.0 or higher is installed on Exchange 2010 servers that were working with only PowerShell 2.0 installed, the Exchange MP stops working.

The new MSI (Exchange2010PowershellFix.MSI) included in this release enables the Exchange 2010 MP to work on servers that have PowerShell 2.0 installed side by side with PowerShell 3.0+. Use this new MSI only if your existing Exchange 2010 MP isn't able to monitor an Exchange 2010 server that has PowerShell 2.0 and PowerShell 3.0+ installed side by side. It will NOT work if the server has only PowerShell 3.0 or higher installed.

If you already have 14.03.0038.004 installed, run the 14.3.210.2 "Exchange2010PowershellFix" MSI and import the MPs it contains.

Note: You must install the 14.03.0038.004 package (Exchange2010ManagementPackForOpsMgr2007-x64.msi / Exchange2010ManagementPackForOpsMgr2007-x86.msi) before applying the 14.3.210.2 update (Exchange2010PowershellFix) for the Exchange 2010 MP to function correctly.
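The conditions in the guide extract above are easy to get backwards, so here they are condensed into a small decision helper. This is a hypothetical sketch and does not ship with the MP. The function name, parameters and return strings are illustrative, and you still need to find out yourself which PowerShell versions and which MP version are installed.

```python
def needs_powershell_fix(ps_versions, installed_mp_version, mp_working):
    """Hypothetical helper encoding the guide's rules for Exchange2010PowershellFix.msi.

    ps_versions: set of major PowerShell versions present, e.g. {2, 3}
    installed_mp_version: installed Exchange 2010 MP version, e.g. "14.03.0038.004"
    mp_working: True if the existing MP is already monitoring the server
    """
    def as_tuple(version):
        # Compare numerically so "14.03.0038.004" becomes (14, 3, 38, 4)
        return tuple(int(part) for part in version.split("."))

    has_ps2 = 2 in ps_versions
    has_ps3_plus = any(v >= 3 for v in ps_versions)

    if mp_working:
        return "not needed"
    if not has_ps2:
        # The fix does NOT work when only PowerShell 3.0+ is installed
        return "unsupported: PowerShell 2.0 is required"
    if not has_ps3_plus:
        # The fix targets side-by-side installs only
        return "not applicable: only PowerShell 2.0 is installed"
    installed = as_tuple(installed_mp_version)
    if installed >= as_tuple("14.3.210.2"):
        return "fix already imported"
    if installed < as_tuple("14.03.0038.004"):
        # The 14.03.0038.004 package is a prerequisite for the fix
        return "install 14.03.0038.004 first"
    return "apply Exchange2010PowershellFix"


print(needs_powershell_fix({2, 3}, "14.03.0038.004", False))
```

Comparing version parts as integers rather than strings matters here: as strings, "14.03.0038.004" and "14.3.210.2" would not sort in release order.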


SCOM: Supercharge console performance

I came across a recent article by Marnix Wolf and S. Carrilho about a little-known SQL Server setting called Max Degree of Parallelism (MDoP).

When SQL Server runs on a computer with more than one microprocessor or CPU, it detects the best degree of parallelism, that is, the number of processors employed to run a single statement, for each parallel plan execution. You can use the max degree of parallelism option to limit the number of processors to use in parallel plan execution.

This becomes an issue on servers with hyper-threading-enabled processors. Because of how hyper-threading works, the system sees more logical processors than there are physical cores.

This setting can be found in the SQL Server instance's Server Properties, on the Advanced page under Parallelism (Max Degree of Parallelism).

To calculate the recommended value for this setting, use the MDoP calculator. It needs two inputs:

1. Run the following query against the SQL Server instance:

SELECT COUNT(DISTINCT memory_node_id) AS NUMA_Nodes FROM sys.dm_os_memory_clerks WHERE memory_node_id!=64

2. Launch PowerShell and run the following command:

Get-WmiObject -Namespace "root\CIMV2" -Class Win32_Processor -Property NumberOfCores | Select-Object NumberOfCores

3. Enter these values into the calculator.

In this example, the SQL query returned 1 (NUMA node) and the PowerShell command returned 4 cores, so the calculator recommends an MDoP setting of 4.
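The calculator's exact formula isn't reproduced here. The sketch below assumes the older form of Microsoft's MAXDOP guidance for SQL Server 2005 through 2014 (KB2806535), which caps the value at the number of processors per NUMA node and never goes above 8. Following the article's point about hyper-threading, it applies that rule to the physical core count. It also assumes two things about the inputs:

- total_cores is the server total. Win32_Processor returns one row per socket, so add the rows up on multi-socket servers.
- Cores are spread evenly across NUMA nodes.

It reproduces the example above.

```python
def recommended_maxdop(numa_nodes: int, total_cores: int) -> int:
    """Sketch of a MAXDOP recommendation (assumed rule, see lead-in).

    numa_nodes: result of the sys.dm_os_memory_clerks query (step 1)
    total_cores: sum of NumberOfCores from Win32_Processor (step 2)
    """
    if numa_nodes < 1 or total_cores < 1:
        raise ValueError("numa_nodes and total_cores must be positive")
    # Assume cores are spread evenly across the NUMA nodes
    cores_per_node = max(1, total_cores // numa_nodes)
    # Keep a parallel plan within one NUMA node, and never above 8
    return min(cores_per_node, 8)


print(recommended_maxdop(1, 4))   # 4, matching the example above
print(recommended_maxdop(2, 24))  # 8
```

If the calculator gives a different figure for your hardware, trust the calculator. This is only an approximation of the published guidance.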

I've had situations in the past where no amount of tweaking seemed to improve console performance, and this is certainly a factor I will take into account in the future. Definitely give Marnix's article a read. He covers the topic in more detail, with additional findings from his field experience.

There are other ways to improve console performance, and I will bring them together in a future blog post about complete console tuning.


SCOM: Performance views still show counters when performance collection rules are disabled

Microsoft recently released KB3002249, which describes an issue where, after performance collection rules are disabled in SCOM, performance views still show their counters even after all of the data has been groomed out.

This affects all versions of SCOM and can make PerformanceDataAllView difficult to read because of the clutter.

"This issue is by design. The Operations Manager grooming process does not groom the PerformanceSource table."

Before you run the delete script, you can use the query below to see which performance counters, and for which objects, will be deleted:

Use OperationsManager
select PS.PerformanceSourceInternalId, BME.BaseManagedEntityId, BME.DisplayName, PC.CounterName, PC.ObjectName, PS.TimeAdded, PS.LastModified, PDA.PerformanceSourceInternalId
from PerformanceSource PS
left join PerformanceDataAllView PDA on PDA.PerformanceSourceInternalID = PS.PerformanceSourceInternalId
join PerformanceCounter PC on PC.PerformanceCounterId = PS.PerformanceCounterId
join BaseManagedEntity BME on BME.BaseManagedEntityId = PS.BaseManagedEntityId
where PDA.PerformanceSourceInternalId IS NULL

 

The following small SQL script removes the entries from the PerformanceSource table that have no data recorded in PerformanceDataAllView.

Note: Stop all of the Operations Manager services on all Management Servers before you run the script, and always back up your OperationsManager database first.

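Use OperationsManager
-- Delete PerformanceSource rows that have no matching data in PerformanceDataAllView
delete from PerformanceSource where PerformanceSourceInternalId in 
(
-- The LEFT JOIN returns NULL on the PDA side for sources with no data left
select PS.PerformanceSourceInternalId from PerformanceSource PS
left join PerformanceDataAllView PDA on PDA.PerformanceSourceInternalID = PS.PerformanceSourceInternalId
where PDA.PerformanceSourceInternalId IS NULL
)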
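If the LEFT JOIN ... IS NULL ("anti-join") pattern in the delete script is unfamiliar, the toy reproduction below shows what it does. It uses Python's built-in sqlite3 module against an in-memory database. The two-table schema is heavily simplified and hypothetical: PerformanceData stands in for PerformanceDataAllView. This is not the real OperationsManager schema, and the sketch never touches a SCOM database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified stand-ins for the real OperationsManager objects
cur.execute("CREATE TABLE PerformanceSource (PerformanceSourceInternalId INTEGER PRIMARY KEY, CounterName TEXT)")
cur.execute("CREATE TABLE PerformanceData (PerformanceSourceInternalId INTEGER, SampleValue REAL)")

cur.executemany(
    "INSERT INTO PerformanceSource VALUES (?, ?)",
    [(1, "% Processor Time"), (2, "Available MBytes"), (3, "Disk Queue Length")],
)
# Sources 1 and 2 still have collected data; source 3 is orphaned
cur.executemany(
    "INSERT INTO PerformanceData VALUES (?, ?)",
    [(1, 12.5), (1, 14.0), (2, 2048.0)],
)

# Same anti-join pattern as the script: a source with no data rows comes back
# from the LEFT JOIN with NULLs on the data side, so IS NULL selects it
cur.execute(
    """
    DELETE FROM PerformanceSource WHERE PerformanceSourceInternalId IN (
        SELECT PS.PerformanceSourceInternalId
        FROM PerformanceSource PS
        LEFT JOIN PerformanceData PD
          ON PD.PerformanceSourceInternalId = PS.PerformanceSourceInternalId
        WHERE PD.PerformanceSourceInternalId IS NULL
    )
    """
)
conn.commit()

remaining = [row[0] for row in cur.execute(
    "SELECT CounterName FROM PerformanceSource ORDER BY PerformanceSourceInternalId")]
print(remaining)  # ['% Processor Time', 'Available MBytes']
conn.close()
```

Only the orphaned source is removed. Sources that still have at least one data row survive, no matter how many rows they have.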


SCOM: Agent error Keyset does not exist

There is an issue to be aware of when you package the SCOM agent with your server build image. When the server is built, a certificate is generated for the agent to use, and it is stored in the Operations Manager certificate store. If the server is then renamed (for example, because it was built under a temporary name), you will see the error below in your Operations Manager event log.

Event: 7022
Source: HealthService

The Health Service has downloaded secure configuration for management group <MG Name>, and processing the configuration failed with error code Keyset does not exist(0x80090016).

Reinstalling the agent will fix this issue, but Gerrie Louw describes a simpler solution. Open the Certificates MMC snap-in, navigate to the Operations Manager store, delete the certificate, and then restart the Health Service.

These symptoms can occur with all versions of the SCOM / MMA agent whenever the agent is packaged with a server image.


SCOM: Guided walkthrough for troubleshooting UNIX and Linux agent discovery

Microsoft has just released a Guided Walkthrough for troubleshooting UNIX and Linux agent discovery issues in System Center 2012 Operations Manager.

It is available here: KB2993901

It's nice to see this type of guide being released. Cross-platform monitoring with SCOM is still not very common, and it can be tricky to get right.


SCOM: Caution with management packs in large environments

Kevin Holman recently published a great article about the inherent pitfalls of importing a management pack into an environment without understanding the intended scope, scalability, and any known/common issues.

Specifically, he discusses the Dell Hardware Management Pack (Detailed Edition), which has a scalability limit of only 300 agents.

The lesson to learn here is: be careful when importing MPs. A badly written MP, or an MP designed for small environments, might wreak havoc in larger ones. Sometimes the recovery from this can be long and quite painful. An MP that tests out fine in your Dev SCOM environment might have issues that won't be seen until it moves into production. You should always monitor for changes to a production SCOM deployment after a new MP is brought in, to ensure that you don't see a negative impact. Check the management server event logs, MS CPU performance, database size, and disk/CPU performance to see if there is a big change from your established baselines.

 

Go here for the full article, definitely worth the read.


SCOM *nix Monitoring: The WinRM client cannot process the request because the server name cannot be resolved

Every monitored Linux server in a critical state is not an ideal way to start a Monday morning, especially when none of the servers is actually experiencing an issue.

The issue at hand:
All of the Linux servers generated a heartbeat failure at the same time. Looking through Health Explorer revealed the following error:

 The WinRM client cannot process the request because the server name cannot be resolved.

Testing WinRM with the following command returned the same error, yet a DNS lookup resolved the server name successfully.

winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:username -password:password -remote:https://servername:1270/wsman -auth:basic -skipCACheck -encoding:utf-8 -format:#pretty

The Solution:

The WinRM client sends its requests through the WinHTTP proxy configuration unless the target is on the proxy bypass list. I checked the WinHTTP proxy settings on the Management Server using the following command:

netsh winhttp show proxy

The proxy server itself was set correctly, but the bypass list of excluded servers had been replaced with a single server. Using the command below, I amended the bypass list to include all of the local domain servers:

netsh winhttp set proxy proxy-server="http=<proxy FQDN>" bypass-list="*<Domain Suffix>"

Once that was done, the WinRM test returned the correct data and the servers started to turn green again.
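Before changing a bypass list, a rough check can show which of your monitored host names it would exempt from the proxy. The sketch below is an approximation of WinHTTP's matching, not a reimplementation of it, and rests on these assumptions:

- Each semicolon-separated entry is treated as a case-insensitive wildcard pattern, using Python's fnmatch.
- The special <local> entry is treated as matching host names that contain no dot.
- The host names and the contoso.local domain are hypothetical.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host: str, bypass_list: str) -> bool:
    """Approximate check: would `host` skip the proxy under `bypass_list`?"""
    host = host.lower()
    for entry in bypass_list.split(";"):
        entry = entry.strip().lower()
        if not entry:
            continue
        if entry == "<local>":
            # <local> covers plain host names without a dot
            if "." not in host:
                return True
        elif fnmatchcase(host, entry):
            # Both sides are lower-cased, so the wildcard match is case-insensitive
            return True
    return False


# Hypothetical values: the broken list held a single server,
# the corrected one covers the whole (made-up) contoso.local domain
broken = "mgmt01.contoso.local"
fixed = "*.contoso.local"
print(bypasses_proxy("linux01.contoso.local", broken))  # False
print(bypasses_proxy("linux01.contoso.local", fixed))   # True
```

Any host that returns False is sent via the proxy. That matches the symptom above: the Linux servers were only reachable once the bypass list covered the domain again.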

 


SCOM 2012: Survival Guide

I came across a great article on TechNet that is essentially a compilation of useful SCOM information. It covers everything from the basics and key concepts to deployment guidelines and how to configure and use the different features.

This is definitely one to keep in your favorites.

Go here for the full article


SCOM: Some or all Cluster resources are not discovered by the OpsMgr agent

Here is an article from the OpsMgr Engineering Blog detailing an issue with SCOM discovering cluster resources:

“If a Cluster has orphaned object entries in ClusterHive registry key, the System Center Operations Manager agent may not discover some or all Cluster resources. This can occur with System Center Operations Manager 2007 (OpsMgr 2007) or System Center 2012 Operations Manager (OpsMgr 2012).”

 

The article details a fix that involves locating and then removing the orphaned objects:

"First, use the Failover Cluster PowerShell commands Get-ClusterResource and Get-ClusterGroup to get the list of Resources and Groups. Then, using the output, check for Resources/Groups that appear as Offline and verify if these can be seen in the Failover Cluster Console. Verify with the Cluster administrator whether these are still valid, then assuming they are not and you've identified which ones are orphaned, delete them using these commands from an elevated CMD Prompt (Run As Administrator):

For orphaned resources: Cluster RES "<RESOURCE_NAME>" /DELETE

For orphaned groups: Cluster GROUP "<GROUP_NAME>" /DELETE

Once this is done the missing cluster resources should now be discovered.”


SCOM: Troubleshooting Flow For Slow SCOM 2012x Consoles

A while ago I had an issue at a customer where their SCOM console would experience huge performance degradation at random intervals.

This is a slow and complex situation to troubleshoot, which is why I am glad to see a comprehensive and well laid out troubleshooting plan from Marnix Wolf. It is a must-read for understanding the areas that can affect your SCOM environment's performance and, more importantly, your users' experience, because people don't want to use a slow console.

Click here for the full article.
