The Return of the Search Application Topology Component Health State Error
Today I was prepping a SharePoint 2013 VM and noticed the Search Service Administration page in Central Admin didn’t display the Search Application Topology health indicators. Instead it showed an error I hadn’t seen in a while:
“Unable to retrieve topology component health states. This may be because the admin component is not up and running”
I had encountered this error several times during the SharePoint 2013 Preview phase. Back then it turned out to be a genuine bug, and fixing it required installing several hotfixes, as detailed in the SharePoint Server 2013 Known Issues article. On this particular VM, however, I had installed the RTM versions of Windows Server 2012, SQL Server 2012 and SharePoint 2013. I had also installed the latest updates, and the SharePoint 2013 Prerequisites Installer should have taken care of anything else that was missing. So this bug shouldn’t have come back.
Just to be sure, I tried installing the four hotfix packages, but (of course) this failed because they were either already installed or no longer relevant. Now, I was pretty sure I had seen fully working Search Application Topology health indicators the night before (yeah, we SharePointies are nocturnal creatures). So what could have changed?
Then I remembered I had made a little tweak to minimize SharePoint 2013’s memory usage. As you may know, SharePoint 2013 has some very high memory requirements, especially around Search. In particular, the Search Engine’s NodeRunner.exe processes are memory-hungry beasts. A SharePoint 2013 server configured for Search typically runs five NodeRunner.exe instances, each consuming hundreds of megabytes of memory. For production environments this may be just fine, but on my development and demo VMs I like them a little less greedy.
From earlier blog posts I learned that it is possible to limit the amount of memory a NodeRunner.exe instance uses by adjusting the memoryLimitMegabytes parameter in the C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\NodeRunner.exe.config XML file. In fact, in the SharePoint 2013 Preview version the NodeRunners suffered from a memory leak, and adjusting this setting was a quick hack to work around it. One of the last things I did last night was change this value from 0 (unlimited) to 100 MB. However, I hadn’t actually gotten around to testing the outcome of this change (yes, I do need some sleep).
<nodeRunnerSettings memoryLimitMegabytes="100" />
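If you want to script this edit on a dev VM, the following Python sketch shows one way to do it. It is my illustration, not part of the original setup, and it assumes the config file is well-formed XML containing a nodeRunnerSettings element like the one above, that it runs from an elevated prompt, and that a .bak copy next to the original is an acceptable backup (the helper names and backup naming are hypothetical). ElementTree keeps comments inside the root element here, but not the XML declaration or the exact formatting, so compare the result with the backup before restarting the Search Host Controller Service.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

# Location from the post; adjust if SharePoint is installed elsewhere.
CONFIG_PATH = Path(r"C:\Program Files\Microsoft Office Servers\15.0"
                   r"\Search\Runtime\1.0\NodeRunner.exe.config")


def set_memory_limit(xml_text: str, megabytes: int) -> str:
    """Set memoryLimitMegabytes on every <nodeRunnerSettings> element.

    0 means unlimited. Raises ValueError if no such element exists.
    """
    # Keep comments inside the root element (TreeBuilder option, Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    settings = list(root.iter("nodeRunnerSettings"))
    if not settings:
        raise ValueError("no <nodeRunnerSettings> element found")
    for element in settings:
        element.set("memoryLimitMegabytes", str(megabytes))
    return ET.tostring(root, encoding="unicode")


def update_config_file(path: Path, megabytes: int) -> Path:
    """Copy the config to <name>.bak, then write the new limit; returns the backup path."""
    backup = path.with_name(path.name + ".bak")  # hypothetical backup naming
    shutil.copy2(path, backup)
    text = path.read_text(encoding="utf-8-sig")  # tolerate a UTF-8 BOM
    path.write_text(set_memory_limit(text, megabytes), encoding="utf-8")
    return backup


def restore_config_file(path: Path, backup: Path) -> None:
    """Put the original config back, e.g. when NodeRunners start crashing."""
    shutil.copy2(backup, path)
```

Something like update_config_file(CONFIG_PATH, 250) followed by a restart of the Search Host Controller Service would apply a new limit; keep the returned backup path so restore_config_file can undo it.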
So the first thing I did was start Sysinternals Process Explorer (which I always use instead of the built-in Task Manager) and look for the NodeRunner.exe processes.
To my surprise there weren’t any…
The NodeRunner.exe instances are spawned by SharePoint’s Search Host Controller Service, so I went to the “Manage Services on Server” page in Central Admin and restarted that service. Thanks to Process Explorer’s handy ‘Difference Highlighting’ feature, I could clearly see several NodeRunner instances start up and then exit almost immediately.
To get a little more information, I started Sysinternals Process Monitor (another indispensable tool) and captured system activity while restarting the Search Host Controller Service. Process Monitor’s ‘Process Tree’ dialog made it clear that (too) many NodeRunner instances were being started, but that they somehow weren’t able to survive for long.
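The same observation can be made from a Process Monitor capture after the fact. The Python sketch below is only an illustration: it assumes the captured events have already been filtered to NodeRunner.exe and reduced to a hypothetical (time in seconds, PID, operation) record, using Process Monitor’s "Process Start" and "Process Exit" operation names; reading the actual export format is left out. It reports each process that exited within a chosen number of seconds of starting.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProcessEvent:
    time: float      # seconds since the capture started (hypothetical simplified form)
    pid: int
    operation: str   # Process Monitor operation name, e.g. "Process Start" / "Process Exit"


def short_lived_pids(events: list[ProcessEvent], max_lifetime: float) -> dict[int, float]:
    """Return {pid: lifetime in seconds} for processes that exited within max_lifetime."""
    started: dict[int, float] = {}
    short_lived: dict[int, float] = {}
    for event in sorted(events, key=lambda e: e.time):
        if event.operation == "Process Start":
            started[event.pid] = event.time
        elif event.operation == "Process Exit" and event.pid in started:
            lifetime = event.time - started.pop(event.pid)
            if lifetime <= max_lifetime:
                short_lived[event.pid] = lifetime
    return short_lived
```

Processes still running when the capture stops have no exit event, so they are never reported as short-lived.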
Process Explorer also showed me that the NodeRunner processes were started with the following command line:
"C:\Program Files\Microsoft Office Servers\15.0\Search\Runtime\1.0\NodeRunner.exe" --noderoot "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1" --addfrom "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1\Configuration\Local\Node.ini" --tracelog "C:\Program Files\Microsoft Office Servers\15.0\Data\Office Server\Applications\Search\Nodes\9E4952\IndexComponent1\Logs\NodeRunner.log"
From the --tracelog parameter in that command line I learned that NodeRunners keep a trace log. When I checked that file, the reason the NodeRunners were failing became pretty clear:
Caught exception in node activator: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at Microsoft.Ceres.CoreServices.Node.NodeActivator.ActivateNode(IDictionary`2 configuration)
--- End of inner exception stack trace ---
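Both steps, finding the trace log from the NodeRunner command line and scanning it for exceptions, can be scripted. The Python sketch below rests on a few assumptions of mine: the command line uses simple Windows-style quoting (quoted values contain no embedded quotes) and every --option is followed by exactly one value, as in the command line shown earlier; any dotted identifier ending in Exception is treated as a .NET exception type name; and the log is read as UTF-8, which you may need to change. The helper names are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter
from pathlib import Path
from typing import Iterable

# A token is either "a double-quoted string" or a run of non-whitespace.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')
# Dotted .NET type names ending in "Exception", e.g. System.OutOfMemoryException.
EXCEPTION_RE = re.compile(r"\b(?:[A-Z]\w*\.)+\w*Exception\b")


def parse_noderunner_args(command_line: str) -> dict[str, str]:
    """Map each --option in a NodeRunner.exe command line to the value after it."""
    tokens = [quoted or bare for quoted, bare in TOKEN_RE.findall(command_line)]
    options: dict[str, str] = {}
    i = 1  # tokens[0] is the executable path
    while i < len(tokens):
        if tokens[i].startswith("--") and i + 1 < len(tokens):
            options[tokens[i][2:]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return options


def exception_counts(lines: Iterable[str]) -> Counter:
    """Count how often each exception type name appears in the given lines."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(EXCEPTION_RE.findall(line))
    return counts


def scan_trace_log(command_line: str) -> Counter:
    """Open the --tracelog file named on the command line and tally its exceptions."""
    log_path = Path(parse_noderunner_args(command_line)["tracelog"])
    # UTF-8 is an assumption; errors="replace" keeps odd bytes from aborting the scan.
    with log_path.open(encoding="utf-8", errors="replace") as log:
        return exception_counts(log)
```

In the scenario from this post, a non-zero count for System.OutOfMemoryException right after restarting the Search Host Controller Service is the signal that pointed at the memory limit.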
Since I figured this had to be caused by my change to NodeRunner’s memory limit, I restored the NodeRunner config file from my backup (hey, always keep a backup!) and restarted the Search Host Controller Service once more. This time the NodeRunner instances kept running just fine. Apparently a NodeRunner instance needs some minimum amount of memory to be able to run at all, and in my case that was more than 100 MB.
With the NodeRunner instances running again, the Topology Health indicators worked again too, showing that my farm’s Search stuff was in perfect health. I experimented with the NodeRunner memory limit some more and finally settled on 250 MB. Not as small as I’d have liked, but still a significant decrease compared to their original memory footprint.
I do want to make clear that even with this 250 MB limit, I still experienced some NodeRunner crashes. The general advice is to NOT change the NodeRunner memory limit configuration. And NEVER EVER do this in a production environment!
Hopefully this blog post helps out other people who might stumble upon this problem.