It is always great when the capabilities of Operations Manager are extended to monitor new areas that were previously outside the general WinTel scope. The latest addition from NiCE is a zLinux management pack.
The NiCE zLinux MP is the first product of its class to provide monitoring of Linux server distributions on the IBM System z platform using Microsoft System Center. IBM System z is the umbrella name IBM uses for all its mainframe computers. These mainframes have gone through a number of name changes and are also known as System/390 or zSeries.
This management pack lets you bring your zLinux components into SCOM, giving you a complete picture of the systems that depend on zLinux in a single end-to-end view. With it you can:
• Perform ‘Logical Disk Health checks’ i.e. easily ascertain the availability and performance of your Logical Disk (File System) instances
• Carry out ‘Network Adapter Health checks’ i.e. determine the availability and performance of your Network Adapter instances
• Execute ‘Operating System Health assessments’ i.e. effortlessly discover the availability and performance of Red Hat Enterprise Linux Server Operating System instances
• Determine the health of your processor by effectively monitoring your processor instances
More information is available at the NiCE Customer Portal.
Finding every monitored Linux server in a critical state is not an ideal way to start a Monday morning, especially when none of the servers is actually experiencing an issue.
The issue at hand:
All of the Linux servers generated a heartbeat failure at the same time. Looking through Health Explorer revealed the following error:
The WinRM client cannot process the request because the server name cannot be resolved.
Testing WinRM with the following command yielded the same result, even though a DNS lookup resolved the server name successfully.
winrm enumerate http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/SCX_OperatingSystem?__cimnamespace=root/scx -username:username -password:password -remote:https://servername:1270/wsman -auth:basic -skipCACheck -encoding:utf-8 -format:#pretty
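If you need to run the same test against several agents, the command line can be assembled programmatically. The following is a minimal Python sketch, not part of the original troubleshooting: the function name `build_winrm_test_command` and the example host and account are hypothetical, while the resource URI, port and switches are copied from the command above.

```python
from typing import List


def build_winrm_test_command(server: str, username: str, password: str) -> List[str]:
    """Return the winrm test command from this post as an argument list.

    The function name is hypothetical; the switches mirror the command above.
    """
    resource = (
        "http://schemas.microsoft.com/wbem/wscim/1/cim-schema/2/"
        "SCX_OperatingSystem?__cimnamespace=root/scx"
    )
    return [
        "winrm", "enumerate", resource,
        f"-username:{username}",
        f"-password:{password}",
        f"-remote:https://{server}:1270/wsman",
        "-auth:basic", "-skipCACheck", "-encoding:utf-8", "-format:#pretty",
    ]


# Example with placeholder values; print the command for review before running it.
print(" ".join(build_winrm_test_command("linux01.contoso.local", "monitor", "secret")))
```

On a management server the returned list could be handed to `subprocess.run`, but printing it first makes it easy to confirm the server name and credentials are what you expect.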
WinRM uses the Windows proxy settings when resolving host names, so I checked the proxy settings on the management server using the following command:
netsh winhttp show proxy
I discovered that my proxy was set correctly, but the bypass list for excluded servers had been replaced with a single server. Using the command below, I was able to amend the bypass list to include all of the local domain servers:
netsh winhttp set proxy proxy-server="http=<proxy FQDN>" bypass-list="*<Domain Suffix>"
Once that was completed, the WinRM test returned the correct data and the servers started to turn green again.
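To sanity-check whether a given agent would be covered by a bypass list before changing anything, a rough check like the one below can help. It is only a sketch under stated assumptions: it treats the bypass list as a semicolon-separated set of `*` wildcard patterns and treats `<local>` as "names without a dot", which approximates but does not reproduce WinHTTP's own matching. The function name and host names are hypothetical.

```python
from fnmatch import fnmatchcase


def is_bypassed(host: str, bypass_list: str) -> bool:
    """Approximate check: does `host` match any entry in a WinHTTP-style bypass list?"""
    host = host.lower()
    for pattern in bypass_list.split(";"):
        pattern = pattern.strip().lower()
        if not pattern:
            continue
        if pattern == "<local>":
            # Assumption: <local> covers short names that contain no dot.
            if "." not in host:
                return True
            continue
        if fnmatchcase(host, pattern):
            return True
    return False


# A list that was replaced with a single server no longer covers the other agents.
print(is_bypassed("linux01.contoso.local", "onlyserver.contoso.local"))
# A domain-suffix wildcard covers every server in the domain.
print(is_bypassed("linux01.contoso.local", "*.contoso.local"))
```

The first call illustrates the broken state described above (a single-server list), the second the corrected wildcard entry.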
The first update for System Center 2012 R2 has been released. Update Rollup 1 is available here.
This update contains 10 fixes for System Center 2012 Operations Manager and adds support for Oracle Solaris 11, as well as fixing 8 issues with UNIX and Linux monitoring.
It’s nice to see the release of an update so soon after the initial launch.
Update: Kevin Holman has written a great step-by-step article for applying this update. There are some things to be aware of.
Having recently come across another patch that can cause critical issues with SCOM, I’ve decided to create a page to record the KB numbers along with any relevant additional information.
1. KB2585542 – This patch will break UNIX monitoring by causing WS-Management connections to UNIX/Linux agents to fail. If this patch is installed on your management servers, you can either uninstall it or do one of the following:
- Edit the registry to add this 32-bit DWORD value: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\SendExtraRecord = 2
- Or use the “FixIt” package, available in the KB article under the Known Issues section, to disable the security update
2. KB2775511 – Marnix Wolf has a great article on this issue. “After installing KB2775511 on Operations Manager Management Servers, agents or servers may be affected by a deadlock.
Once in deadlock, Management Servers will generate Heart Beat failures and will go into a “greyed out” state. As a result, devices managed by these Management Servers will also go into a “greyed out” or “not monitored” state.”
This patch is a combination of 89 hotfixes, so ideally you want to avoid installing it. Even though the issue doesn’t occur on all SCOM systems, it would be advisable to wait for an updated bulletin from the MS System Center team before installing it.
Note: Microsoft has released a hotfix to address this issue, but I’d still recommend approaching with caution. Link – “SCOM 2012 or SCOM 2007 R2 throws a “Heartbeat Failure” message and then goes into a greyed out state in Windows Server 2008 R2 SP1”
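For the SendExtraRecord registry change listed under KB2585542 above, one way to keep the edit reviewable is to generate a small .reg file rather than typing the value by hand. This is a minimal Python sketch; the function name `schannel_reg_file` is hypothetical, and the key, value name and data are taken from the entry above. Treat it as an illustration and check the KB article before applying anything to a management server.

```python
def schannel_reg_file(value: int = 2) -> str:
    """Build .reg file text that sets SCHANNEL\\SendExtraRecord to a DWORD value."""
    key = (
        r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
        r"\SecurityProviders\SCHANNEL"
    )
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        # DWORD data in a .reg file is written as eight hexadecimal digits.
        f'"SendExtraRecord"=dword:{value:08x}',
        "",
    ]
    return "\r\n".join(lines)


# Print the file content so it can be reviewed and saved with a .reg extension.
print(schannel_reg_file())
```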
The link below is an extremely useful article when it comes to troubleshooting your UNIX/Linux discoveries with SCOM 2012.
One thing I came across that I did not see in this article: you might get the following error during your discovery:
Unexpected DiscoveryResult.ErrorData type. Please file bug report. ErrorData: Microsoft.SystemCenter.CrossPlatform.ClientLibrary.MPAbstractions.InvalidWSManTaskResponseException Failed to parse output from WSMan discovery. Output from task was: <DataItem type=”Microsoft.SystemCenter.WSManagement.WSManData” time=”2013-02-26T15:35:00.1849706+02:00″ sourceHealthServiceId=”16F56055-5671-604C-AF9D-088444BA4B6E”><WsManData><ErrorCode>0x800703fa</ErrorCode><ErrorMessage>Illegal operation attempted on a registry key that has been marked for deletion. </ErrorMessage></WsManData></DataItem>. at System.Activities.WorkflowApplication.Invoke(Activity activity, IDictionary`2 inputs, WorkflowInstanceExtensionManager extensions, TimeSpan timeout) at System.Activities.WorkflowInvoker.Invoke(Activity workflow, IDictionary`2 inputs, TimeSpan timeout, WorkflowInstanceExtensionManager extensions) at Microsoft.SystemCenter.CrossPlatform.ClientActions.DefaultDiscovery.InvokeWorkflow(IManagedObject managementActionPoint, DiscoveryTargetEndpoint criteria, IInstallableAgents installableAgents)
If this happens, all you need to do is restart the SCOM services on the management server that is running your discovery. The discovery will then complete.
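The useful part of a long discovery exception like the one above is the error code and message buried inside the `<DataItem>` element. The following Python sketch pulls those two fields out of the raw text. It is an illustration only: the function name `extract_wsman_error` is hypothetical, and it assumes the output contains a single well-formed `<DataItem>` fragment (typographic quotes, as often introduced by copy and paste, are normalised first).

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Map typographic quote characters back to plain double quotes.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2033": '"'})


def extract_wsman_error(output: str) -> Optional[Tuple[str, str]]:
    """Return (error code, error message) from a WSMan discovery failure, if present."""
    match = re.search(r"<DataItem\b.*?</DataItem>", output, re.DOTALL)
    if match is None:
        return None
    root = ET.fromstring(match.group(0).translate(_QUOTES))
    code = root.findtext("WsManData/ErrorCode")
    if code is None:
        return None
    message = root.findtext("WsManData/ErrorMessage") or ""
    return code.strip(), message.strip()


sample = (
    'Output from task was: <DataItem type="Microsoft.SystemCenter.WSManagement.WSManData" '
    'time="2013-02-26T15:35:00.1849706+02:00" '
    'sourceHealthServiceId="16F56055-5671-604C-AF9D-088444BA4B6E"><WsManData>'
    "<ErrorCode>0x800703fa</ErrorCode><ErrorMessage>Illegal operation attempted on a "
    "registry key that has been marked for deletion. </ErrorMessage></WsManData></DataItem>. "
    "at System.Activities.WorkflowApplication.Invoke(...)"
)
print(extract_wsman_error(sample))
```

Seeing 0x800703fa with the "marked for deletion" message is the case described here, where restarting the SCOM services on the management server clears the problem.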