The Return of the Search Application Topology Component Health State Error
Today I was prepping a SharePoint 2013 VM and noticed the Search Service Administration page in Central Admin didn’t display the Search Application Topology health indicators. Instead it showed an error I hadn’t seen in a while:
“Unable to retrieve topology component health states. This may be because the admin component is not up and running”
I had encountered this error several times during the SharePoint 2013 preview phase. Back then it turned out to be a genuine bug, and fixing it required installing several hotfixes, as detailed in the SharePoint Server 2013 Known Issues article. However, on this particular VM I had installed the RTM versions of Windows Server 2012, SQL Server 2012 and SharePoint 2013. I had also installed the latest updates, and the SharePoint 2013 Prerequisites Installer should have taken care of anything else that was missing. So this bug shouldn’t have happened again.
Just to be sure I tried installing the four hotfix packages, but (of course) this failed as they were either already installed or no longer relevant. Now, I was pretty sure I had seen fully working Search Application Topology health indicators the night before (yeah, we SharePointies are nocturnal creatures). So what could have changed?
Then I remembered I had performed a little tweak to minimize SharePoint 2013’s memory usage. As you may know, SharePoint 2013 has some very high memory requirements, especially around Search. Specifically, the search engine’s NodeRunner.exe processes are memory-hungry beasts. A SharePoint 2013 server configured for Search typically has five such NodeRunner.exe instances running, each consuming hundreds of megabytes of memory. For production environments this may be just fine, but on my development and demo VMs I like them a little less greedy.
From earlier blog posts I learned it is possible to limit the amount of memory a NodeRunner.exe instance uses by adjusting the memoryLimitMegabytes parameter in the C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\NodeRunner.exe.config XML file. In fact, in the SharePoint 2013 Preview the NodeRunners suffered from a memory leak, and adjusting this config was a quick hack to work around it. One of the last things I did last night was change this value from 0 (unlimited) to 100 MB. However, I hadn’t actually gotten around to testing the outcome of this change (yes, I do need some sleep).
...
<nodeRunnerSettings memoryLimitMegabytes="100" />
...
So the first thing I did was start SysInternals Process Explorer (which I always use instead of the built-in Task Manager) and search for the NodeRunner.exe processes.
To my surprise there weren’t any…
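On a box without Process Explorer, you can get the same answer from plain old `tasklist /fo csv`. The parsing can be sketched in Python (the sample output in the test is fabricated for illustration; on the server you would feed it the real `tasklist` output):

```python
# Sketch: find NodeRunner.exe PIDs in the output of `tasklist /fo csv`.
# On Windows you would obtain the output with, e.g.:
#   out = subprocess.check_output(["tasklist", "/fo", "csv"], text=True)
import csv

def noderunner_pids(tasklist_csv):
    pids = []
    for row in csv.reader(tasklist_csv.splitlines()):
        # Column 0 is the image name, column 1 the PID
        if len(row) > 1 and row[0].lower() == "noderunner.exe":
            pids.append(int(row[1]))
    return pids
```

An empty list means no NodeRunner instances are alive, which was exactly my situation.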
The NodeRunner.exe instances are spawned by SharePoint’s Search Host Controller service, so the next thing I did was go to the “Manage Services on Server” page in Central Admin and restart that service. Thanks to Process Explorer’s handy ‘Difference Highlighting’ feature it was clearly visible that several NodeRunner instances got started, but almost immediately stopped running.
To get a little more information I started SysInternals Process Monitor (another indispensable tool) and logged the system while restarting the Search Host Controller Service. From Process Monitor’s ‘Process Tree’ dialog it was clear (too) many NodeRunner instances were started, but somehow weren’t able to survive for long.
Process Explorer also showed me the NodeRunner processes were started with the following command-line parameters:
"C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\NodeRunner.exe" --noderoot "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1" --addfrom "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1\Configuration\Local\Node.ini" --tracelog "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1\Logs\NodeRunner.log"
From the --tracelog parameter above I learned NodeRunners keep a trace log. When I checked that file, the cause of the failing NodeRunners became pretty clear:
Caught exception in node activator: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.Ceres.CoreServices.Node.NodeActivator.ActivateNode(IDictionary`2 configuration)
--- End of inner exception stack trace ---
Since I figured this must be due to my changing NodeRunner’s memory limits, I restored the NodeRunner config file (hey, always keep a backup!) and again restarted the Search Host Controller service. This time the NodeRunner instances kept running just fine. So it became clear to me that there apparently is some minimum amount of memory a NodeRunner instance needs in order to run, and in my case that was more than 100 MB.
With the NodeRunner instances running just fine, the Topology Health indicators also worked again, showing my farm’s Search components were in perfect health. I experimented with the NodeRunner memory limits some more and finally settled on a 250 MB limit. Not as small as I’d have liked, but still a significant decrease compared to their original memory footprint.
I do want to make clear that even with this 250 MB limit I experienced some NodeRunner crashes. The general advice is to NOT change the NodeRunner memory limit configuration. And NEVER EVER do this in a production environment!
Hopefully this blog post helps out other people who might stumble upon this problem.
Posted on November 8, 2012, in 2013, sharepoint, Uncategorized and tagged search, sharepoint, sharepoint 2013, vm.
Hi,
Great post. I just had the same issue with the RTM bits on Server 2008 R2 SP1. The mentioned hotfix KB2533623 does not apply to my system. But I had already configured the noderunner limit to 256 MB before provisioning the service application, and it always failed with your error in the ULS log. When provisioning via PowerShell it said:
Exception calling “Activate” with “0” argument(s): “Topology activation
failed. Failed to connect to system manager. SystemManagerLocations:
net.tcp://srv1/90C38B/AdminComponent1/Management”
At line:1 char:1
+ $clone.Activate()
+ ~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : SearchTopologyActivationException
Releasing the noderunner limit did the trick.
Hi,
Awesome post. It just helped me fix an issue I created myself. Nice job working through that, too.
Hello,
Great post–you just helped me solve another self-created problem and saved me the trouble of troubleshooting. Nice work and thank you.
Ray
I had the same issue – my farm was a three-tier, streamlined topology – DB, App and WFE, all with Windows Server 2012, SQL 2012 and SharePoint 2013 RTM. I tried everything I found on the internet, but the host controller service didn’t start. My resolution was to add all features from .NET 3.5 and .NET 4.5, including all kinds of WCF activation (except MSMQ activation).
Awesome analysis. I got sick of deleting and recreating the Search service as advised by other posts on the web as the problem always came back and this clearly explains why.
Nice job. You just made my day.
I gave my test VM 10 GB of RAM, installed the December 2013 CU, set 500 MB as the limit for noderunner, and got about 50% of memory taken by all apps. But this crap throws an OutOfMemoryException every 20 seconds in the event log 😦
Guys – I ran into this issue and it took almost a week to fix without reinstalling SP 2013. I had to recreate the search application multiple times, but the problem still wasn’t fixed. Finally, using a ULS log viewer, I noticed that my SharePoint Search Host Controller service was running under a different service account. After changing it to the same account used for the Search Service, everything was fixed within minutes.
Thanks! It works!!
It corrected all the error symptoms that raise that kind of message:
• sharepoint 2013 error id 2548
• Content Plugin can not be initialized – list of CSS addresses is not set
• Failed to extract required parameter FastConnector:ContentDistributor, hr=0x80070002 [pluginconfig.cpp:81]
Thank you! This post helped me finally get to the bottom of the issue… which was caused by an earlier update I had made to the noderunner config.