Kevin Holman recently published a great article about the inherent pitfalls of importing a management pack into an environment without understanding the intended scope, scalability, and any known/common issues.
Specifically, he discusses the Dell Hardware Management Pack (Detailed Edition), which has a scalability limit of 300 agents.
The lesson to learn here is: be careful when importing MPs. A badly written MP, or an MP designed for small environments, might wreak havoc in larger ones. Sometimes the recovery from this can be long and quite painful. An MP that tests out fine in your Dev SCOM environment might have issues that won’t be seen until it moves into production. You should always monitor for changes to a production SCOM deployment after a new MP is brought in, to ensure that you don’t see a negative impact. Check the management server event logs, MS CPU performance, database size, and disk/CPU performance to see if there is a big change from your established baselines.
Go here for the full article, definitely worth the read.
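As a rough illustration of the baseline check described above, the sketch below compares a few metrics captured after an MP import against their baseline values and reports anything that grew by more than a chosen tolerance. The metric names, the sample values, and the 20% tolerance are all hypothetical; how you collect the numbers (performance counters, SQL queries, reports) is up to you.

```python
def find_regressions(baseline, current, tolerance=0.20):
    """Return metrics whose value grew by more than `tolerance` (as a fraction)."""
    regressions = {}
    for name, base_value in baseline.items():
        # Skip metrics that were not re-measured or have no usable baseline.
        if name not in current or base_value <= 0:
            continue
        change = (current[name] - base_value) / base_value
        if change > tolerance:
            regressions[name] = round(change, 2)
    return regressions


# Hypothetical measurements taken before and after importing a new MP.
baseline = {"ms_cpu_percent": 25.0, "opsdb_size_gb": 40.0, "disk_queue_length": 1.0}
after_import = {"ms_cpu_percent": 55.0, "opsdb_size_gb": 42.0, "disk_queue_length": 1.1}

print(find_regressions(baseline, after_import))
```

With these sample numbers only the management server CPU is flagged, since it more than doubled while the other two metrics stayed within the tolerance.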
It is always great when the capabilities of Operations Manager are extended to monitor new areas that were previously outside of the general WinTel scope. The latest addition from NiCE is a zLinux management pack.
The NiCE zLinux MP is the first product of its class that provides monitoring of Linux server distributions on the IBM System z platform using Microsoft System Center. IBM System z is a universal name used by IBM for all its mainframe computers. These mainframe computers have gone through a number of name changes and are also known as System/390 or zSeries.
This management pack allows for inclusion of your zLinux components into your SCOM system which enables you to get a complete picture of the systems that are dependent on zLinux in a single end-to-end view.
• Perform ‘Logical Disk Health checks’ i.e. easily ascertain the availability and performance of your Logical Disk (File System) instances
• Carry out ‘Network Adapter Health checks’ i.e. determine the availability and performance of your Network Adapter instances
• Execute ‘Operating System Health assessments’ i.e. effortlessly discover the availability and performance for Red Hat Enterprise Linux Server Operating System instances
• Determine the health of your processor by effectively monitoring your processor instances
More information is available at the NiCE Customer Portal
Having all monitored Linux servers in a critical state is not an ideal way to start a Monday morning, especially when none of the servers are actually experiencing an issue.
The issue at hand:
All of the Linux servers generated a heartbeat failure at the same time. Looking through the health explorer revealed the following error:
The WinRM client cannot process the request because the server name cannot be resolved.
Testing WinRM with the following command yielded the same error, even though a DNS lookup resolved the server name successfully.
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:username -password:password -remote:https://servername:1270/wsman -auth:basic -skipCACheck -encoding:utf-8 -format:#pretty
WinRM uses the Windows proxy to resolve host names, so I checked the Windows proxy settings on the Management Server using the following command.
netsh winhttp show proxy
This showed that my proxy was set correctly, but the bypass list for excluded servers had been replaced with a single server. Using the command below, I was able to amend the bypass list to include all of the local domain servers.
netsh winhttp set proxy proxy-server="http=<proxy FQDN>" bypass-list="*<Domain Suffix>"
Once that was completed, the WinRM test returned the correct data and the servers started to turn green again.
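To catch this kind of change before it turns into a wall of heartbeat failures, a small check can confirm that every monitored Linux host is still covered by the bypass list. The sketch below is only an approximation of how a bypass list is matched: it assumes a semicolon-separated list with `*` wildcards and the special `<local>` entry for names without a dot. The host names and domain suffix are hypothetical, and the bypass list string would need to be copied from the output of `netsh winhttp show proxy`.

```python
from fnmatch import fnmatchcase


def is_bypassed(hostname, bypass_list):
    """Approximate check: does the host match any entry in the bypass list?"""
    host = hostname.lower()
    for entry in bypass_list.split(";"):
        pattern = entry.strip().lower()
        if not pattern:
            continue
        if pattern == "<local>":
            # Assumption: <local> covers short names that contain no dot.
            if "." not in host:
                return True
            continue
        if fnmatchcase(host, pattern):
            return True
    return False


def not_bypassed(hostnames, bypass_list):
    """Return the hosts that would still be sent through the proxy."""
    return [h for h in hostnames if not is_bypassed(h, bypass_list)]


# Hypothetical monitored servers and bypass lists.
servers = ["linux01.example.internal", "linux02.example.internal"]
broken_list = "mgmt01.example.internal"   # list replaced with a single server
fixed_list = "*.example.internal"         # wildcard for the whole domain suffix

print(not_bypassed(servers, broken_list))
print(not_bypassed(servers, fixed_list))
```

With the single-server list both Linux hosts are reported as not bypassed, which matches the symptom above; with the wildcard entry the result is empty.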