Microsoft has released a new Update Rollup for the Microsoft Monitoring Agent. It is available through Windows Update or via the catalog link in the KB article here.
Issues that are fixed in this update rollup
Support for Windows Server 2008 and Windows Vista. Add support for Windows Server 2008 and Windows Vista on-boarding to the Microsoft Operations Management Suite.
Maintain original identification after Operations Management Suite is disabled and then re-enabled or after agent is reinstalled. Preserve the identification of the system during upgrades or when enabling and disabling the Operations Management Suite. Previously, disabling and then re-enabling a connection to the Operations Management Suite caused a new agent identification to be generated.
Always use the Microsoft .NET Framework 4.0 for loading managed modules if it is available. Set a preference for the .NET Framework 4.0 when managed code is loaded in management packs. The .NET Framework 2.0 is used if the .NET Framework 4.0 is not available. Previously, the .NET Framework 4.0 would be used on Windows Server 2012, Windows 8, or later versions of Windows operating systems while the .NET Framework 2.0 was loaded on earlier versions of Windows.
Fix the Apply button that is not working in the Microsoft Monitoring Agent control panel.
Remove a dialog box that is displayed during a silent installation when a restart was required.
Microsoft has just announced a new update rollup for the Microsoft Monitoring Agent; the KB article and download link are here.
Issues that are fixed in this update rollup
Proxy settings are added to the agent setup wizard.
Users can now provide proxy settings while they complete the agent installation wizard.
Account validation steps are added to the wizard.
After users enter an account and workspace to connect to Operational Insights, the wizard performs a test to verify that the account information is valid.
Connection status is added to Control Panel.
A connection status feature is added to Control Panel settings for the Microsoft Monitoring Agent.
I often find it interesting, particularly among younger engineers, how little emphasis is placed on the grey agent icon. Discussions usually center around phrases like “it’s only one grey agent”, and there is little urgency in fixing that agent.
I use one agent as my example because it raises the question: what does that one agent really mean? It does of course depend on each environment and requires some understanding of the client’s business. For example, it’s the end of the month and that agent is the payroll server, or perhaps you are supporting a web-based business such as an online retailer and it is their web front end. Management servers may be the heart and soul of Operations Manager, but without the lifeblood of the agents you don’t have much to work with.
An agent watches data sources on the monitored computer and collects information according to the configuration that is sent to it from its management server. The agent also calculates the health state of the monitored computer and objects on the monitored computer and reports back to the management server. When the health state of a monitored object changes or other criteria are met, an alert can be generated from the agent. This lets operators know that something requires attention. By providing health data about the monitored object to the management server, the agent provides an up-to-date picture of the health of the device and all the applications that it hosts.
Below is a diagram of the data flow from the agent to the management server and from there through to the Ops and DW databases.
I posted an article earlier in the year about agents not submitting performance data, which shows how to pick up agents that may not be working even if they show green in the console. It emphasises that just because agents look like they are working, it doesn’t mean they are 100% healthy.
Hopefully this article helps to raise awareness of the importance of happy SCOMing.
An issue to be aware of when you package your SCOM agent with your server build image is that a certificate is generated for the agent to use when the server is built. This certificate resides in the Operations Manager certificate store. If the server is then renamed because it had a temporary build name, you will see the error below in your Operations Manager event log.
Event: 7022 Source: HealthService
The Health Service has downloaded secure configuration for management group <MG Name>, and processing the configuration failed with error code Keyset does not exist(0x80090016).
Re-installing the agent will fix this issue, but there is a simpler solution from Gerrie Louw: open your certificate MMC, navigate to the Operations Manager store, delete the certificate, and then restart your HealthService.
The symptoms can occur with all versions of the SCOM / MMA agent whenever the agent is packaged with a server image.
Sometimes you might have a situation where all of your agents are showing as healthy in the console, but when you try to draw a performance report, data is missing.
The SQL query below, developed by my colleague Gerrie Louw, identifies any agent that has not submitted performance data in the past 4 hours. It does so by checking the following performance counters:
Processor > % Processor Time
LogicalDisk > % Free Space > C:
Memory > Available MBytes
Note: You will probably have to change the GUID-suffixed DisplayName_ and IsVirtualNode_ column names to match your OperationsManager database.
-- Build the list of Windows computers to check
if object_id('tempdb..#temptable') IS NOT NULL
DROP TABLE #temptable
SELECT distinct bmetarget.Name into #temptable
FROM OperationsManager.dbo.BaseManagedEntity AS BMESource WITH (nolock) INNER JOIN
OperationsManager.dbo.Relationship AS R WITH (nolock) ON
R.SourceEntityId = BMESource.BaseManagedEntityId INNER JOIN
OperationsManager.dbo.BaseManagedEntity AS BMETarget WITH (nolock) ON
R.TargetEntityId = BMETarget.BaseManagedEntityId inner join mtv_computer d on bmetarget.name=d.[DisplayName_55270A70_AC47_C853_C617_236B0CFF9B4C]
and d.IsVirtualNode_E817D034_02E8_294C_3509_01CA25481689 is null
WHERE (bmetarget.fullname like 'Microsoft.Windows.Computer%')
-- Capture the current state of every agent
if object_id('tempdb..#healthstate') IS NOT NULL
DROP TABLE #healthstate
select megv.path, megv.ismanaged, megv.isavailable, megv.healthstate into #healthstate
from managedentitygenericview as megv with (nolock) inner join managedtypeview as mtv with (nolock)
on megv.monitoringclassid=mtv.id
where mtv.name ='microsoft.systemcenter.agent'
-- Computers that submitted CPU data in the last 240 minutes
if object_id('tempdb..#perfcpudata') IS NOT NULL
DROP TABLE #perfcpudata
select Path, 'CPU' as 'Cat' into #perfcpudata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='Processor' and countername='% Processor Time'
-- Computers that submitted memory data in the last 240 minutes
if object_id('tempdb..#perfmemdata') IS NOT NULL
DROP TABLE #perfmemdata
select Path,'Memory' as 'Cat' into #perfmemdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='Memory' and countername='Available MBytes'
-- Computers that submitted C: free space data in the last 240 minutes
if object_id('tempdb..#perfdiskdata') IS NOT NULL
DROP TABLE #perfdiskdata
select Path,'Disk' as 'Cat' into #perfdiskdata
from PerformanceDataAllView pdv with (NOLOCK)
inner join PerformanceCounterView pcv on pdv.performancesourceinternalid = pcv.performancesourceinternalid
inner join BaseManagedEntity bme on pcv.ManagedEntityId = bme.BaseManagedEntityId
where (TimeSampled < GETUTCDATE() AND TimeSampled > DATEADD(MINUTE,-240, GETUTCDATE()))
and objectname ='LogicalDisk' and countername='% Free Space' and instancename='C:'
-- One row per computer and counter category that is missing data
if object_id('tempdb..#temptable1') IS NOT NULL
DROP TABLE #temptable1
create table #temptable1 (
name nvarchar(250),
cat nvarchar(20),
val nvarchar(2)
)
insert into #temptable1
select name, 'CPU' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfcpudata)
insert into #temptable1
select name, 'Memory' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfmemdata)
insert into #temptable1
select name, 'Disk' as 'cat', '1' as 'val'
from #temptable where name not in
(select path from #perfdiskdata)
-- Final result: one row per computer, with a flag per counter (1 = no data received)
if object_id('tempdb..#output') IS NOT NULL
DROP TABLE #output
create table #output (
name nvarchar(250),
cpu nvarchar(2),
memory nvarchar(2),
disk nvarchar(2)
)
-- Only report computers whose agent is managed, available and has a health state
insert into #output
select distinct tt.name ,'0','0','0'
from #temptable1 as tt, #healthstate as hs
where tt.name=hs.path collate SQL_Latin1_General_CP1_CI_AS
and hs.isavailable=1
and hs.ismanaged=1
and hs.healthstate is not null
update #output set cpu=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='CPU')
update #output set memory=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='Memory')
update #output set disk=1 where #output.name in (select name from #temptable1 where #temptable1.name=#output.name and #temptable1.cat='Disk')
select * from #output
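To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea, written against in-memory data rather than the OperationsManager database. The function name, the tuple layout of the samples, and the server names are all made up for illustration; only the 4-hour window, the three counter categories and the "1 means no data" flag come from the query.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(hours=4)           # same 240-minute window as the query
COUNTERS = ("cpu", "memory", "disk")  # the three counter categories checked


def find_silent_agents(agents, samples, now):
    """Return {agent: {counter: 0 or 1}} for agents missing at least one counter.

    agents  -- names of agents that are managed and available (hypothetical input)
    samples -- iterable of (agent_name, counter, time_sampled) tuples
    now     -- reference time in UTC
    A flag of 1 means no sample arrived inside the window.
    """
    cutoff = now - WINDOW
    seen = {counter: set() for counter in COUNTERS}
    for name, counter, sampled in samples:
        if counter in seen and cutoff < sampled < now:
            seen[counter].add(name)

    report = {}
    for name in agents:
        flags = {c: 0 if name in seen[c] else 1 for c in COUNTERS}
        if any(flags.values()):       # fully reporting agents are left out
            report[name] = flags
    return report


# Example with invented servers: web01 reports everything, sql01 does not.
now = datetime(2015, 1, 1, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=10)
samples = [
    ("web01", "cpu", recent),
    ("web01", "memory", recent),
    ("web01", "disk", recent),
    ("sql01", "cpu", now - timedelta(hours=5)),   # too old to count
    ("sql01", "memory", now - timedelta(hours=1)),
]
report = find_silent_agents(["web01", "sql01"], samples, now)
```

As in the query, an agent only appears in the result when at least one of its three counters has gone quiet.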
You can use this query to build a report such as the one sampled below:
There is a known issue where SCOM 2012 agents stop responding on Windows Server 2012 R2 domain controllers; it can affect other Windows Server 2012 R2 servers as well. Kevin Holman has posted an article with the resolution to this issue here.
Now here’s something you certainly don’t see every day. It all started when I was asked to investigate a flood of memory alerts for a particular server at one of my customers. When I opened Health Explorer I noticed the following:
The server was running the monitors for Windows 2008 and Windows 2003.
As it turns out, the server had recently been re-installed from 2003 to 2008 with the same name, without the agent being uninstalled or removed from the console. This caused a bit of confusion in the back end. A quick look at the Windows Server 2003 Operating System inventory showed another server that had been “upgraded” in the same fashion:
What’s happened here is that the class for Windows Server 2003 Operating System is still being loaded by the agent, and this is causing all of the related rules and monitors to load as well. In the past, when I’ve come across this particular issue, I’ve been able to solve it with the Remove-DisabledMonitoringObject PowerShell cmdlet.
All that you need to do is override the discovery rule in question to false for your object (in this case “Discover Windows Server 2003 Operating System”) and then open the OpsMgr Shell and run Remove-DisabledMonitoringObject. After a short delay the offending objects are removed.
However, in this case the above did not work. Eventually I deleted the agent from the console, waited for grooming to run (you can force it if you are in a hurry), cleared the local agent cache and then approved the agent. Now only the correct objects are being discovered.
With the upcoming release of System Center 2012 R2, Microsoft will be replacing the familiar SCOM agent with a new Microsoft Monitoring Agent which can be downloaded here.
The key feature is that the agent can operate in a standalone configuration, which generates an IntelliTrace file viewable with Visual Studio 2013 Release Candidate or any other version of Visual Studio 2013.
It is worth noting that the current SCOM agent is not compatible with SCOM 2012 R2. The MMA agent is, however, able to connect to SCOM 2012 SP1 and RTM management groups, so it is worthwhile to upgrade your agents ahead of your management group if you want to minimize monitoring downtime.
Marnix Wolf has created a table detailing the primary differences between the two agents here.
In the case of an agent that is managing a large number of objects, you may find that not all of them are discovered or, if they are, that some of them remain in a Not Monitored state. This can be caused by a couple of things.
If you find this error in your OpsMgr event log: “The health service has removed some items from the send queue for management group since it exceeded the maximum allowed size of 15 megabytes”
Then the registry keys below need to be adjusted:
Set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Persistence Version Store Maximum to 80 MB (5120). Default = 60 MB
Set HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\HealthService\Parameters\Management Groups\<MG Name>\maximumQueueSizeKb to 100 MB. Default = 15 MB
Set HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft Operations Manager\3.0\Modules\Global\PowerShell\ScriptLimit\QueueMinutes to 120 mins
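If you need to apply the two HealthService queue values to several agents, one option is to generate a .reg file once and import it on each server. The Python sketch below only builds the file text. It assumes both values are stored as REG_DWORD, and the function name and the management group name "MG1" are placeholders; check the value types on one of your own agents before importing anything. The PowerShell QueueMinutes setting is left out of this sketch.

```python
# Registry path taken from the text above.
HEALTH_SERVICE = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet"
    r"\services\HealthService\Parameters"
)


def build_reg_text(management_group):
    """Return .reg file text for the two HealthService queue settings.

    management_group -- your management group name (placeholder in the example)
    Assumption: both values are REG_DWORD.
    """
    settings = [
        # (key path, value name, value)
        (HEALTH_SERVICE, "Persistence Version Store Maximum", 5120),
        (
            HEALTH_SERVICE + "\\Management Groups\\" + management_group,
            "maximumQueueSizeKb",
            100 * 1024,  # 100 MB expressed in KB
        ),
    ]
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, value in settings:
        lines.append("[" + key + "]")
        # .reg files store DWORDs as eight hexadecimal digits
        lines.append('"{}"=dword:{:08x}'.format(name, value))
        lines.append("")
    return "\r\n".join(lines)


reg_text = build_reg_text("MG1")  # "MG1" is a made-up management group name
```

Save the returned text to a .reg file, review it, and restart the health service after importing so the new limits take effect.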
However if you find this error: In memory container (hash table System.Health.EntityStateChangeData) had to drop data because it reached max limit. Possible data loss.
Then the following registry key needs to be adjusted:
HKLM\System\CurrentControlSet\Services\HealthService\Parameters\State Queue Items. The default value for this key is 1024. Depending on the server load, double it to 2048 or, if the error continues to occur, to 4096.
I have come across instances where both of these errors occur. After the adjustments were made and the health service was restarted, all objects were discovered and monitored correctly.
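The doubling approach for State Queue Items can be written down as a tiny helper, shown here as a Python sketch. The function name is invented, and treating 4096 as the stopping point is an assumption: it is simply the highest value suggested in this post, not a documented limit.

```python
DEFAULT_STATE_QUEUE_ITEMS = 1024   # default value for the key
HIGHEST_SUGGESTED_VALUE = 4096     # highest value suggested in this post


def next_state_queue_items(current=DEFAULT_STATE_QUEUE_ITEMS):
    """Return the next value to try, or None once 4096 has been reached."""
    if current >= HIGHEST_SUGGESTED_VALUE:
        return None
    return min(current * 2, HIGHEST_SUGGESTED_VALUE)
```

Raise the value one step at a time and only move to the next step if the error keeps appearing after the health service has been restarted.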
Not a common error by any means and there are several blog posts out there pertaining to other error codes.
Marnix Wolf has a great article about Error 2147500037
If you get error 5 (0x5), however, this means that SCOM is unable to create a self-signed certificate.
In our case, Local System did not have full permissions to the server’s C:\ProgramData\Microsoft\Crypto\RSA\S-1-5-18 directory. We added them and the service started right up.